You mentioned that the validation score (a non-representative, pre-determined subset of images) is not intended to be used for algorithm ranking or evaluation, but is provided as a sanity check of the submission data. How close can we expect the final evaluation score to be to this validation score?
We noticed that our own cross-validation score (5-fold) is very different from the system-computed validation score. Do you have any idea why this might be the case?
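For reference, this is roughly how we compute our internal 5-fold score; a minimal sketch where the data, the model, and the roc_auc metric are placeholders of ours, not the actual challenge pipeline or metric:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

# Placeholder features, labels, and model; our real pipeline and the
# challenge's evaluation metric are not shown here.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))
y = rng.integers(0, 2, size=200)

# 5-fold cross-validation: average the held-out fold scores.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                         cv=cv, scoring="roc_auc")
print(scores.mean(), scores.std())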
Thanks for your response. We understand that you cannot comment further.
We have another question regarding the evaluation:
On the submission page, you mention the following:
Note that arbitrary score ranges and thresholds can be converted to the range of 0.0 to 1.0, with a threshold of 0.5, trivially using the following sigmoid conversion:
1 / (1 + e^(-(a(x - b))))
where x is the original score, b is the binary threshold, and a is a scaling parameter (i.e. the inverse measured standard deviation on a held-out dataset). Predicted responses should set the binary threshold b to a value where the classification system is expected to achieve 89% sensitivity, although this is not required.
We do not understand how changing the value of b can change the sensitivity. Could you please provide more explanation and instructions on this?
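To make our question concrete, here is a minimal sketch of how we currently apply the conversion in Python; the function name, the example raw scores, the threshold of 2.0, and the held-out standard deviation of 0.8 are placeholders of ours, not values from the submission page:

import numpy as np

def convert_scores(x, b, a):
    # Sigmoid conversion quoted above: 1 / (1 + e^(-a(x - b))),
    # where b is the binary threshold in the raw score space and
    # a is the scaling parameter (e.g. 1 / held-out standard deviation).
    x = np.asarray(x, dtype=float)
    return 1.0 / (1.0 + np.exp(-a * (x - b)))

# Placeholder raw scores, threshold b = 2.0, scaling a = 1 / 0.8.
raw_scores = np.array([1.2, 1.9, 2.0, 2.4, 3.1])
converted = convert_scores(raw_scores, b=2.0, a=1.0 / 0.8)
print(converted)          # raw scores above b map to values above 0.5
print(converted >= 0.5)   # raw scores at or above b are classified positive

From this sketch, it looks to us as if b only determines which raw scores end up above the 0.5 threshold after conversion, so we would like to understand how b should be chosen so that the system is expected to achieve 89% sensitivity.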