How is the main score calculated?

jgciudad · April 19, 2020, 6:41pm

Hi, I’ve been stuck for some time now trying to figure out how the goal score is obtained so I can fully interpret how good (or bad) my submissions are, but I don’t get how it’s done.

In the evaluation webpage, the Balanced Multiclass Accuracy is defined as follows:

Normalized (or balanced) multi-class accuracy is defined as the accuracies of each category, weighted by the category prevalence. Specifically, it is the arithmetic mean of the (<category>_true_positives / <category>_positives) across each of the diagnostic categories. This metric is semantically equivalent to the average recall score.

Hence, the Balanced Multiclass Accuracy should be equal to the mean sensitivity (recall) given in the chart, but for most of the teams on the leaderboard (including mine) it is not.
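To make that reading of the definition concrete, here is a minimal sketch of "mean of true_positives / positives per category". It is not the challenge's official scoring code; the function name, the category labels, and the toy data are all made up for illustration.

```python
from collections import Counter

def balanced_multiclass_accuracy(y_true, y_pred):
    """Mean over categories of true_positives / positives (i.e. mean recall)."""
    positives = Counter(y_true)  # number of images that truly belong to each category
    true_positives = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    recalls = [true_positives[c] / positives[c] for c in positives]
    return sum(recalls) / len(recalls)

# Hypothetical toy labels (not challenge data): 2 MEL, 6 NV, 2 BCC images.
y_true = ["MEL", "MEL"] + ["NV"] * 6 + ["BCC", "BCC"]
y_pred = ["MEL", "NV"] + ["NV"] * 6 + ["BCC", "NV"]

score = balanced_multiclass_accuracy(y_true, y_pred)
plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(score, plain_accuracy)
```

In this toy case the per-category recalls are 0.5, 1.0 and 0.5, so the balanced score is 2/3 while the plain accuracy is 0.8. Every category counts equally, regardless of how many images it has.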

I suspect it has something to do with the binary threshold and sigmoid conversion explained on the submission page, since I guess some teams have implemented it and others have not. However, I’ve uploaded the same predictions with and without this sigmoid conversion and all the stats were exactly the same. I think my error is here, since I’m not completely sure how to implement the sigmoid conversion, specifically which ‘a’ value I should use.

I’d be really grateful if someone could explain how the sigmoid conversion is done and also confirm whether, when it is used, the mean sensitivity and the balanced multiclass accuracy are equal.

Thanks in advance.

minhthienap · August 10, 2020, 8:21pm

I have the same question. Are balanced multiclass accuracy and mean sensitivity equal? (Currently the results shown in the table are not equal.)
@brianhelba, could you help answer this for us? Thank you very much.

kurtansn · September 23, 2020, 5:10pm

For each image, the diagnosis category with the greatest score is taken as the predicted category. The mean recall of the resulting multiclass confusion matrix (i.e. the mean of the diagonal entries, each divided by the number of images that truly belong to that category) is the balanced multi-class accuracy.

In other words, the sensitivity of each disease class is not calculated from the algorithm’s continuous outputs, but from binarized vectors: for the output vector of a particular image, the positive prediction is attributed to the disease class with the maximum output value.
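A rough sketch of why the two numbers can disagree, comparing the argmax rule described above with a per-class threshold. This is an illustration only, not the official evaluation code: the scores are invented, the helper names are hypothetical, and the 0.5 threshold is an assumption about how the per-class sensitivities in the chart are binarized.

```python
def mean_recall_argmax(y_true, scores, n_classes):
    """Balanced accuracy: each image is assigned to its highest-scoring class."""
    y_pred = [max(range(n_classes), key=row.__getitem__) for row in scores]
    recalls = []
    for c in range(n_classes):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / n_classes

def mean_sensitivity_threshold(y_true, scores, n_classes, threshold=0.5):
    """Mean per-class sensitivity when each class score is thresholded on its own."""
    recalls = []
    for c in range(n_classes):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(scores[i][c] >= threshold for i in idx) / len(idx))
    return sum(recalls) / n_classes

# Invented scores for 6 images and 3 classes (class indices 0, 1, 2).
y_true = [0, 0, 1, 1, 2, 2]
scores = [
    [0.45, 0.30, 0.25],  # argmax is class 0 (correct), but 0.45 is below 0.5
    [0.80, 0.10, 0.10],
    [0.20, 0.70, 0.10],
    [0.10, 0.60, 0.30],
    [0.10, 0.20, 0.90],
    [0.30, 0.40, 0.35],  # missed under both rules
]

balanced = mean_recall_argmax(y_true, scores, 3)
thresholded = mean_sensitivity_threshold(y_true, scores, 3)
print(balanced, thresholded)
```

Here the first image counts as a hit under argmax but as a miss under the threshold, so the argmax-based mean recall is 5/6 while the thresholded mean sensitivity is 2/3. If the chart's sensitivities are computed with a threshold, that would account for the mismatch with the balanced multiclass accuracy.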