Dear Organizers, two nitpicking clarifications for this final sprint:
(1) The deadline says 11:59:59pm EDT, but the countdown clock is not synchronised to EDT. Which one is correct?
(2) Task 3 says “Diagnosis confidences are expressed as floating-point values in the closed interval [0.0, 1.0], where 0.5 is used as the binary classification threshold.” Does this mean that the assigned class has to have probability higher than 0.5? Or is it sufficient that the assigned class has the highest probability?
Hi @dr.eduardo.valle,
Thank you for pointing these things out.
(1) Unfortunately, the countdown clock is fixed to the WordPress instance's time zone, so I have removed it to avoid confusion close to the deadline.
(2) For the primary (ranking) metric, the highest probability is sufficient; the 0.5 threshold can be disregarded. The 0.5 threshold will only be used for secondary metrics. As a test, you can divide all of your validation-set predictions (which lie in [0.0, 1.0]) by 2.1, so that no value is above 0.5, and submit again - the evaluation metric should stay identical.
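For anyone who wants to run that sanity check, a minimal sketch with pandas (the filename and the image column name are assumptions; the rest follows the suggestion above):

```python
import pandas as pd

# Hypothetical filename; use whatever your Task 3 validation CSV is called.
df = pd.read_csv("task3_validation.csv")

# Divide every probability column by 2.1, so no value can exceed 0.5.
prob_cols = [c for c in df.columns if c != "image"]
df[prob_cols] = df[prob_cols] / 2.1
assert (df[prob_cols] <= 0.5).all().all()

# Submitting this file should give exactly the same primary (ranking) metric.
df.to_csv("task3_validation_scaled.csv", index=False)
```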
As a note: although I don't have the code for the evaluation container, the allocation of values to a diagnosis should be dependent on the column headers (you can test this by switching them in validation submissions). In your case, the header should be something like image,MEL,NV,BCC,AKIEC,BKL,DF,VASC for that row to count as melanoma. I am only raising this in case someone sorts prediction values in alphabetical order of the classes - but misaligned submissions should become evident from exceedingly low validation-set scores (<0.200) anyway.
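To illustrate writing a submission where alignment depends on the column names rather than their order, here is a rough sketch; the image IDs and probability array are dummy stand-ins for your own pipeline's output:

```python
import numpy as np
import pandas as pd

# Class columns exactly as named in the ground-truth format (see the post above).
CLASSES = ["MEL", "NV", "BCC", "AKIEC", "BKL", "DF", "VASC"]

# Dummy stand-ins: your list of image IDs and a (num_images, 7) array of
# class probabilities whose column order matches CLASSES.
image_names = ["ISIC_0000001", "ISIC_0000002"]
probabilities = np.random.rand(len(image_names), len(CLASSES))

# Writing the header explicitly ties each value to a diagnosis by name,
# so the scoring container can align columns even if their order differs.
submission = pd.DataFrame(probabilities, columns=CLASSES)
submission.insert(0, "image", image_names)
submission.to_csv("task3_submission.csv", index=False)
```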
Is validation on Task 3 working correctly at the moment? Because I was getting under-0.15 scores, and I just ran a test of submitting my predictions, then moving the first column into second, third, and so on place, and somehow getting 0.106 every time… I'm puzzled, because I'm getting decent local validation scores (and I'm not splitting images of the same lesions between my training and local validation sets).
Since scores this low suggest almost random guessing, getting them is almost always a problem with your predictions not being aligned correctly with the image-name rows and/or class columns.
I guess you will have to search for problems in your pipeline; e.g. in this thread, people described how the default shuffling of Keras' flow_from_directory() caused exactly such a problem.
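For reference, a minimal sketch of the usual fix, assuming a trained Keras classifier named model and a val_images/ directory (both placeholders); the key point is shuffle=False, which keeps flow.filenames in the same order as the returned predictions:

```python
from keras.preprocessing.image import ImageDataGenerator

# `model` and the directory path are placeholders for your own pipeline.
datagen = ImageDataGenerator(rescale=1.0 / 255)
flow = datagen.flow_from_directory(
    "val_images",
    target_size=(224, 224),  # match your model's input size
    batch_size=32,
    class_mode=None,         # prediction only, no labels needed
    shuffle=False,           # crucial: keep filenames and predictions aligned
)

predictions = model.predict_generator(flow)  # model.predict(flow) in newer Keras
image_order = flow.filenames                 # row i of predictions corresponds to image_order[i]
```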
For testing purposes, I have previously uploaded validation predictions in scientific notation successfully. The quickest way to check this yourself is to actually try a validation-set submission.
To provide a slightly more definitive answer to some questions here:
A header row is required for a valid Task 3 CSV, and the order of columns doesn’t matter. The names of the columns themselves must exactly match the ground truth format.
Scientific notation is parsed by the scoring system. Specifically, we use pandas.read_csv to load Task 3 CSV files, so if your number format is parsed by that, it will be scored by us. Non-parsable / non-numeric values should trigger a scoring rejection anyway.
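If you want to verify a file locally before submitting, a rough sketch along those lines (the filename is a placeholder; the class column names are the ones listed earlier in this thread):

```python
import pandas as pd

EXPECTED_COLUMNS = {"image", "MEL", "NV", "BCC", "AKIEC", "BKL", "DF", "VASC"}

# Read the submission the same way the scoring system does (pandas.read_csv);
# scientific notation such as 1e-05 parses to a float here.
df = pd.read_csv("task3_submission.csv")

# Column names must exactly match the ground-truth format (order doesn't matter).
assert set(df.columns) == EXPECTED_COLUMNS, df.columns

# All probability values should be numeric and within [0.0, 1.0].
probs = df.drop(columns=["image"]).astype(float)
assert ((probs >= 0.0) & (probs <= 1.0)).all().all()
```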
@tetelias I took a manual look at one of your submitted validation CSVs. The file appears to be of a superficially correct format, so I'd follow the advice of the others and make sure your pipeline isn't outputting values in the wrong columns or labeling rows with the wrong images.
This is precisely the sort of issue (valid format but scores much too low) that the validation system is designed to detect!