Hi, I’ve been stuck for a while now trying to figure out how the goal score is computed so I can properly interpret how good (or bad) my submissions are, but I can’t work out how it’s done.
In the evaluation webpage, the Balanced Multiclass Accuracy is defined as follows:
Normalized (or balanced) multi-class accuracy is defined as the accuracies of each category, weighted by the category prevalence. Specifically, it is the arithmetic mean of the
(<category>_true_positives / <category>_positives)
across each of the diagnostic categories. This metric is semantically equivalent to the average recall score.
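As I read that definition, it should just be the unweighted mean of the per-class recalls. Here is a minimal sketch of that computation with scikit-learn (the toy labels are made up for illustration); if my reading is right, all three values should be identical:

```python
import numpy as np
from sklearn.metrics import balanced_accuracy_score, recall_score

# Toy ground truth and predictions for a 3-class problem (purely illustrative).
y_true = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2, 2])
y_pred = np.array([0, 0, 1, 1, 2, 2, 2, 2, 0, 2])

# Per-class recall: <category>_true_positives / <category>_positives.
per_class_recall = recall_score(y_true, y_pred, average=None)

# Balanced multiclass accuracy = arithmetic mean of the per-class recalls,
# i.e. exactly the "average recall" the evaluation page mentions.
print(per_class_recall.mean())
print(balanced_accuracy_score(y_true, y_pred))
print(recall_score(y_true, y_pred, average="macro"))
```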
Hence, the Balanced Multiclass Accuracy should equal the mean sensitivity (recall) shown in the chart, but for most teams on the leaderboard (including mine) it doesn’t.
I suspect it has something to do with the binary threshold and the sigmoid conversion explained on the submission page, since I guess some teams have implemented it and others haven’t. However, I’ve uploaded the same predictions with and without the sigmoid conversion and all the stats were exactly the same. I think my error is here, since I’m not completely sure how to implement the sigmoid conversion, specifically what value of ‘a’ I should use.
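For reference, this is how I’ve been interpreting it; I’m assuming the submission page means the standard logistic function centred on the binary threshold, with ‘a’ as the steepness parameter (please correct me if the intended form is different):

```python
import numpy as np

def sigmoid_rescale(scores, a=1.0, threshold=0.5):
    """Map raw confidence scores to (0, 1) with a logistic sigmoid centred
    on the binary threshold. This is my assumed reading of the submission
    page's 'sigmoid conversion'; `a` is the steepness parameter I'm unsure
    about (larger `a` -> sharper transition around the threshold)."""
    scores = np.asarray(scores, dtype=float)
    return 1.0 / (1.0 + np.exp(-a * (scores - threshold)))

# Raw scores around the 0.5 threshold, squashed with two steepness values.
raw = np.array([0.1, 0.4, 0.5, 0.6, 0.9])
print(sigmoid_rescale(raw, a=1.0))   # gentle squashing towards 0.5
print(sigmoid_rescale(raw, a=10.0))  # sharper separation at the threshold
```

(If that is the right form, it would also explain why my stats didn’t change: the function is monotonic and centred on the threshold, so it never moves a score from one side of 0.5 to the other.)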
I’d be really grateful if someone could explain how the sigmoid conversion is done, and also confirm that when it is used the mean sensitivity and the Balanced Multiclass Accuracy are equal.
Thanks in advance.