Submission procedure


I am participating for the first time and cannot find detailed submission instructions anywhere. Can someone please explain the submission procedure?

Detailed submission instructions are forthcoming, but for now this is the process:

Making a submission is a three-step process, starting with the “New Submission” button on the task page:

At this point you can create a team to submit on behalf of (or, if you’re already a member of one, pick it on the right-hand side).

Next, your submission needs an approach: a name and a manuscript (PDF) explaining how you’re solving the problem. Note that you can reuse an approach you’ve already created, or create a brand-new one.

Finally, you should submit your data to have it scored:

Let me know if that answers your immediate concerns.

Good morning. Can you explain what information the CSV file should contain? Is it similar to ISIC_2019_Training_GroundTruth.csv? Should we send the probabilities of the labels? Can you provide an example file so that the information is completely clear?

Exact instructions will be made clear, but expect the CSV file to look similar to the CSV file for the training set, with the exception that there will be an additional out-of-distribution class.

When will the sample submission CSV be released?


Would you be so kind as to publish a sample CSV file showing how to submit the results?
Thank you!

Attached is an example of what the header and leftmost column should look like (the image order does not matter as long as the image names are correct). Classifier output should go in the appropriate cells of the CSV (left blank in this example), and scores must be in floating-point format. submission_example_format.csv (185.1 KB)
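As a rough sketch of assembling such a file (the category names below are illustrative, modeled on the ISIC 2019 training ground truth; the attached example file defines the real header):

```python
import csv

# Illustrative category names modeled on the ISIC 2019 training ground
# truth; check the attached example file for the exact header.
CATEGORIES = ["MEL", "NV", "BCC", "AK", "BKL", "DF", "VASC", "SCC", "UNK"]

def write_submission(path, scores):
    """scores maps image name -> list of per-category floats in [0.0, 1.0]."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["image"] + CATEGORIES)
        for image, row in scores.items():
            writer.writerow([image] + [f"{s:.6f}" for s in row])

write_submission("submission.csv",
                 {"ISIC_0000000": [0.9, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.05]})
```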

Please also note the following:

Diagnosis confidences should not sum to 1.0; instead, they should be expressed as floating-point values in the closed interval [0.0, 1.0], where 0.5 is used as the binary classification threshold. Note that arbitrary score ranges and thresholds can be trivially converted to the range 0.0 to 1.0, with a threshold of 0.5, using the following sigmoid conversion:

1 / (1 + e^(-(a(x - b))))

where x is the original score, b is the binary threshold, and a is a scaling parameter (e.g. the inverse of the standard deviation measured on a held-out dataset). Ideally, predicted responses should set the binary threshold b to a value where the classification system is expected to achieve 89% sensitivity, although this is not required.
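A minimal sketch of that conversion (the values of b and a below are placeholders; in practice b would be the raw score at which your system reaches the target sensitivity, and a the inverse of a standard deviation measured on held-out data):

```python
import math

def to_confidence(x, b, a):
    """Map a raw classifier score x to [0.0, 1.0] so that x == b maps to 0.5."""
    return 1.0 / (1.0 + math.exp(-a * (x - b)))

# A raw score exactly at the chosen threshold b maps to 0.5:
to_confidence(2.3, b=2.3, a=1.7)  # -> 0.5
```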



I don’t understand your comment. If I use a softmax at the end of my model, then the “probabilities” of the classes sum to 1.0 for each sample. Also, I don’t understand why such diagnosis confidences are needed to compute the balanced accuracy metric.

Do you use these diagnosis confidences to compute another evaluation measure? If I report the probabilities given by the softmax, will that alter the results when computing the balanced accuracy?

Best regards


A doctor can’t interpret a softmax output: even if the classifier calls something benign, how likely is a melanoma diagnosis? With a softmax, the doctor cannot tell.

Therefore, each category prediction must be independent of the others, with 0.5 as the binary threshold (higher than 0.5 means more likely than not). In this manner, the output can be interpreted: the most probable diagnosis is still clear (the max prediction), and other possible disease states to rule out are also clear (those above 0.5 or some other threshold).
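To illustrate (the category names and scores below are made up), independent per-category confidences let you read off both the top diagnosis and everything else above threshold:

```python
# Hypothetical independent per-category confidences for one image;
# each is its own binary score with threshold 0.5, so they need not sum to 1.
scores = {"MEL": 0.62, "NV": 0.71, "BCC": 0.08, "BKL": 0.55}

top = max(scores, key=scores.get)                       # most probable diagnosis
rule_out = [c for c, s in scores.items() if s >= 0.5]   # everything to consider

print(top)       # NV
print(rule_out)  # ['MEL', 'NV', 'BKL']
```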

Softmax versus independent probabilities will not affect the balanced accuracy metric; however, there are many secondary metrics on which your method will perform poorly if it relies on softmax alone.

Also remember that your system must classify the out-of-distribution category for which no training data is provided.
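One simple heuristic for that category (purely illustrative, not prescribed by the challenge): score the out-of-distribution class low when some known class is confident, and high otherwise.

```python
# Illustrative only: derive an out-of-distribution confidence from the
# known-class confidences. This is not the required or official method.
def unknown_score(known_scores):
    return 1.0 - max(known_scores)

unknown_score([0.9, 0.1, 0.05])  # confident known class -> low unknown score
```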


Thank you! But this information should have been communicated much earlier.


Also, just one question: let’s say the winning team did an amazing job on balanced accuracy while their other scores are bad. Would they still win the competition?

Ranking will be performed on balanced accuracy.

FYI I’ve also posted here re: validation:

Completely agree with you. This information describes a different problem from the one I had understood since the beginning.

Since the labels of the database are mutually exclusive (every image is labeled with exactly one class), the softmax activation function is adequate for training the network.

The softmax activation function won’t alter the balanced accuracy score, so it won’t change your position in the ranking.

As far as the secondary binary metrics are concerned, replacing the softmax with a single fitted sigmoid applied across all the categories should suffice.
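A sketch of the difference (the logits below are invented; this shows per-class sigmoids in place of a softmax, so each score is independently interpretable against the 0.5 threshold):

```python
import math

def softmax(logits):
    """Mutually exclusive scores: always sum to 1.0."""
    exps = [math.exp(v) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def independent_sigmoids(logits):
    """Each class gets its own sigmoid; scores need not sum to 1.0."""
    return [1.0 / (1.0 + math.exp(-v)) for v in logits]

logits = [2.0, 0.5, -1.0]
soft = softmax(logits)               # sums to 1.0
indep = independent_sigmoids(logits) # each independently in [0.0, 1.0]
```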

FYI there may be some additional clarity available here, which we’ve updated based on your helpful forum questions. Thanks to all.