You mentioned that the validation score (a non-representative, pre-determined subset of images) is not intended to be used for algorithm ranking or evaluation, but is provided as a sanity check of the submission data. How close can we expect the final evaluation score to be to this validation score?
We noticed that our own cross-validation score (5-fold) is very different from the system-computed validation score. Do you have any idea why this might be the case?
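For reference, this is roughly how we compute our internal 5-fold score; a minimal sketch where the data, the model, and the roc_auc metric are placeholders of ours, not the actual challenge pipeline or metric:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

# Placeholder features, labels, and model; our real pipeline and the
# challenge's evaluation metric are not shown here.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))
y = rng.integers(0, 2, size=200)

# 5-fold cross-validation: average the held-out fold scores.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                         cv=cv, scoring="roc_auc")
print(scores.mean(), scores.std())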
Thanks for your response. We understand that you cannot comment further.
We have another question regarding the evaluation:
On the submission page, you mention the following:
Note that arbitrary score ranges and thresholds can be converted to the range of 0.0 to 1.0, with a threshold of 0.5, trivially using the following sigmoid conversion:
1 / (1 + e^(-(a(x - b))))
where x is the original score, b is the binary threshold, and a is a scaling parameter (i.e. the inverse measured standard deviation on a held-out dataset). Predicted responses should set the binary threshold b to a value where the classification system is expected to achieve 89% sensitivity, although this is not required.
We do not understand how changing the value of b can change the sensitivity. Could you please provide more explanation and instructions on this?
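To make our question concrete, here is a minimal sketch of how we currently apply the conversion in Python; the function name, the example raw scores, the threshold of 2.0, and the held-out standard deviation of 0.8 are placeholders of ours, not values from the submission page:

import numpy as np

def convert_scores(x, b, a):
    # Sigmoid conversion quoted above: 1 / (1 + e^(-a(x - b))),
    # where b is the binary threshold in the raw score space and
    # a is the scaling parameter (e.g. 1 / held-out standard deviation).
    x = np.asarray(x, dtype=float)
    return 1.0 / (1.0 + np.exp(-a * (x - b)))

# Placeholder raw scores, threshold b = 2.0, scaling a = 1 / 0.8.
raw_scores = np.array([1.2, 1.9, 2.0, 2.4, 3.1])
converted = convert_scores(raw_scores, b=2.0, a=1.0 / 0.8)
print(converted)          # raw scores above b map to values above 0.5
print(converted >= 0.5)   # raw scores at or above b are classified positive

From this sketch, it looks to us as if b only determines which raw scores end up above the 0.5 threshold after conversion, so we would like to understand how b should be chosen so that the system is expected to achieve 89% sensitivity.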