I am curious to know whether it is possible to get the labels for either the test set or the validation set for Task 3. As the competition is done, I feel it is fair enough to release the labels for at least one of them. Hoping for a positive reply.
We currently don’t plan a public release of the ground-truth labels of the test set. We do have in mind reopening the possibility of limited submissions after a complete workup of the challenge, but not anytime soon.
Unless there’s a compelling reason against it (e.g., the same test set will be used for the 2019 Challenge), I feel it would be very important for the community that the ground truths are released.
Our current reasoning is that follow-up machine-learning studies with known ground truth have a tendency to overfit to test sets, and their results may therefore be hard to compare with the original ones. At the same time, we don’t want to close things off, and think allowing limited submissions would serve researchers well.
Ultimately, we want to do what is best for the community and for advancing knowledge in this area, so we are of course open to discussion.
I agree that a hidden ground truth could be useful in that sense, but to preserve validity the limited submissions would have to be really limited (once a day or less); otherwise, even the very limited information provided by the score reported on the ISIC site can be used (quite successfully) to fit the models, as has been done on Kaggle.
Definitely, I agree once a day is still too frequent.
Indeed, concern about ‘wacky boosting’ is part of the reason we kept Test scores totally secret and don’t consider Validation scores to be worthwhile for ranking. Ideas here are definitely welcome, as we want this Test set data (and ground-truth) to be useful for the community into the future.
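The ‘wacky boosting’ concern can be made concrete with a toy sketch. Everything here is a hypothetical setup (plain accuracy on random binary labels, not the actual ISIC metric, and made-up function names): random submissions that happen to score above chance are majority-voted into a prediction that scores well above chance, with no real model behind it.

```python
import random

random.seed(0)

# Hypothetical toy setup: a hidden binary ground truth over n test
# cases, and a leaderboard that returns only a single accuracy score.
n = 1000
hidden = [random.randint(0, 1) for _ in range(n)]

def leaderboard_accuracy(pred):
    """The only feedback a submitter gets: one scalar per submission."""
    return sum(p == h for p, h in zip(pred, hidden)) / n

# "Wacky boosting": submit random guesses, keep the ones scoring above
# 0.5, then take a per-case majority vote over the kept guesses. No
# real model is involved, yet the final score climbs well above chance.
kept = []
for _ in range(300):
    guess = [random.randint(0, 1) for _ in range(n)]
    if leaderboard_accuracy(guess) > 0.5:
        kept.append(guess)

final = [int(sum(col) > len(kept) / 2) for col in zip(*kept)]
print(f"boosted accuracy: {leaderboard_accuracy(final):.3f}")
```

With 300 probes against a 1000-case set, the voted prediction typically lands around 0.6 accuracy, which is why throttling submission frequency (or keeping scores secret) matters.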
Given the limited size of the test set, I don’t think it makes sense to allow more than monthly submissions.
Personally, I would favour a quarterly ranking.
Quarterly seems to me a bit too restrictive. In another spirit, it would be interesting if it served as “mini-quarterly challenges”, with a streamlined procedure that overtaxes neither the organizers nor the researchers, to keep the community on its toes between the more formal annual challenges.
I agree with @dr.eduardo.valle, as I assume research groups will need to test their approaches continuously for their ongoing research projects. Between-challenge events are an interesting thought.
As Philip said, we are debating this internally.
Another reason for keeping the ground truth for the phases “hidden”, at least for now, is to allow it to be used for teaching purposes in general. If we reopen the submission engine in the future, classes, students, and anyone else interested could use this platform to learn and improve on existing models with the current data set. Once the ground truth is “in the wild”, it will be very complicated for most people interested in the data set to keep track of which images were part of the validation set.
Also, as we grow the parent database (the ISIC archive), we will be able to generate additional training/testing sets. So given the relatively small size of the validation image set (thousands) relative to the number of images in the archive (10,000+), being able to see how things change over time may be valuable.
Hopefully it’s clear that the whole mission of all the work we’ve put into the archive is to directly support these sorts of machine-learning exercises and knowledge advancement. So I think these discussions are extremely helpful.
Is it possible to see how we would have scored on the validation or test set? I just now stumbled upon this challenge and would like to try it out!
Hi @robzuazua, as you can read above we will make this possible in the future. We are still figuring out the details and will also try to get input from users at the upcoming MICCAI conference.
Availability will therefore likely be later this fall (not earlier than October). If you need results for urgent reasons, e.g. your thesis or a pending article revision, send me a PM.
Thanks for the quick reply, I just wanted to double check.
I am using this competition as practice, so I will just use 20% of the training data as my validation set for now. I’m looking forward to participating in the next challenge!
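A local holdout like that can be sketched as follows. The image IDs and labels here are placeholders, not the real data files; a stratified split is assumed, since the Task 3 diagnosis classes are heavily imbalanced and a plain random 80/20 cut could leave a rare class almost absent from the validation half.

```python
import random
from collections import defaultdict

random.seed(42)

# Hypothetical stand-in for the training data: (image_id, label) pairs
# with an imbalanced label distribution, roughly like the lesion classes.
samples = [(f"img_{i:05d}", "MEL" if i % 10 == 0 else "NV") for i in range(2000)]

# Stratified 80/20 split: shuffle and cut within each class so both
# halves keep the same class proportions.
by_label = defaultdict(list)
for item in samples:
    by_label[item[1]].append(item)

train_set, val_set = [], []
for items in by_label.values():
    random.shuffle(items)
    cut = int(0.8 * len(items))   # 80% train, 20% local validation
    train_set.extend(items[:cut])
    val_set.extend(items[cut:])

print(len(train_set), len(val_set))  # 1600 400
```

The same idea is available off the shelf as `train_test_split(..., stratify=labels)` in scikit-learn, but the manual version makes the per-class bookkeeping explicit.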