Here is how I did it

I don’t intend to submit my model since it will be automatically dropped. I am participating only as a learning exercise. Plus I don’t have the time or desire to write a ‘scientific’ manuscript.

So I thought I’d share how I developed my model.

For starters, I used PyTorch and a companion package. I experimented with multiple pre-trained models (VGG, Inception, ResNet, etc.). ResNeXt101-64 seemed to produce the best result.

I used a package for data augmentation and set the learning rate to 0.0004, which I found with the lr_find method. The library also supports differential learning rates, which let me train different parts of the architecture at different rates. I used three different learning rates for different parts of the architecture (0.0004/6, 0.0004/2 and 0.0004).
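To make the three learning rates concrete, here is a minimal sketch of how the layer groups could be paired with their rates. It is not the code from my model: the function name, the `divisors` argument and the placeholder parameter names are made up for illustration. The list-of-dicts layout with `params` and `lr` keys is the per-group format that PyTorch optimizers accept.

```python
def discriminative_param_groups(layer_groups, base_lr=0.0004, divisors=(6, 2, 1)):
    """Pair each layer group with its own learning rate.

    layer_groups: one list of parameters per group, ordered from the
    layers closest to the input image up to the classification head.
    Returns a list of dicts in the per-group format used by torch.optim.
    """
    if len(layer_groups) != len(divisors):
        raise ValueError("need exactly one divisor per layer group")
    return [
        {"params": list(params), "lr": base_lr / divisor}
        for params, divisor in zip(layer_groups, divisors)
    ]


# Hypothetical stand-ins for real parameter tensors.
groups = discriminative_param_groups([["early_w"], ["middle_w"], ["head_w"]])
for group in groups:
    print(group["params"], group["lr"])
```

With real parameters, a list like this could be passed as the first argument to an optimizer such as `torch.optim.SGD`, so that the earliest layers move the least and the head moves the most.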

I had to reduce the batch size to 13 because I was running out of memory on my setup, which is Linux with 8 cores, 32 GB of RAM and a P4000 GPU with 8 GB of memory. Images were scaled to 256x256.

The best balanced accuracy I was able to get with this model was 0.87 on my holdout set and 0.85 on the test dataset (100 images). The actual code for this model is about 50 lines.
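For anyone unfamiliar with the metric, here is a small sketch of balanced accuracy, assuming the usual definition (the mean of the per-class recalls). The labels below are invented toy data, not results from my model.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so rare classes count as much as common ones."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Toy labels: class "a" is twice as common as class "b".
y_true = ["a", "a", "a", "a", "b", "b"]
y_pred = ["a", "a", "a", "b", "b", "a"]
score = balanced_accuracy(y_true, y_pred)
print(score)
```

Because each class contributes equally, a model that ignores the rare classes scores poorly here even if its plain accuracy looks good.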

I can’t wait to see the leaderboard and learn about some of the other approaches.

Good luck everyone.


I used three different learning rates for different parts of the architecture (0.0004/6, 0.0004/2 and 0.0004).

Could you state what the intuition behind this is?

The doc explains it best. This appears to be mostly relevant to ImageNet-trained models, where we want to alter the layers closest to the images by much smaller amounts, hence 0.0004/6 for the first group, etc.

Please be aware that if you did not control for duplicate images in the training set, results from a validation set or test set drawn randomly from the training set will probably be biased and overestimate accuracy. For details, please refer to this thread.
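One way to guard against this, sketched under the assumption that the duplicates are byte-identical files: group images by a content hash and split by group, so copies of the same image can never land on both sides. The function name and the toy file names are hypothetical, and near-duplicates (resized or re-encoded copies) would need a perceptual hash instead.

```python
import hashlib
import random


def split_without_leakage(images, holdout_fraction=0.2, seed=0):
    """Split image names into train/holdout, keeping exact duplicates together.

    images: dict mapping a file name to that file's raw bytes.
    """
    groups = {}
    for name, data in images.items():
        digest = hashlib.sha256(data).hexdigest()
        groups.setdefault(digest, []).append(name)

    # Shuffle the unique contents, not the individual files.
    digests = sorted(groups)
    random.Random(seed).shuffle(digests)
    n_holdout = max(1, round(len(digests) * holdout_fraction))

    holdout = [name for d in digests[:n_holdout] for name in groups[d]]
    train = [name for d in digests[n_holdout:] for name in groups[d]]
    return train, holdout


# Toy data: a_copy.jpg is a byte-for-byte duplicate of a.jpg.
images = {
    "a.jpg": b"one", "a_copy.jpg": b"one", "b.jpg": b"two",
    "c.jpg": b"three", "d.jpg": b"four", "e.jpg": b"five",
}
train, holdout = split_without_leakage(images)
print(train, holdout)
```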

Your solution is for task 3, right?

Yes. Sorry. I should’ve mentioned it.

I did control for duplicate images. I forgot to mention that I used ‘test time augmentation’ (TTA), which improved my result a little. With TTA, the image being scored is augmented (in my case I created 4 augmented images) and all 5 images are scored. The probabilities for all 5 are then averaged for the final submission file.
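The TTA averaging can be sketched as follows. This is an illustration only: the flips and the toy classifier are stand-ins I made up here, not the augmentations or model I actually used, and images are plain nested lists to keep the example dependency-free.

```python
def tta_predict(predict_fn, image, augmentations):
    """Score the original image plus each augmented copy, then average."""
    views = [image] + [augment(image) for augment in augmentations]
    probs = [predict_fn(view) for view in views]
    n_classes = len(probs[0])
    return [sum(p[c] for p in probs) / len(probs) for c in range(n_classes)]


# Four simple augmentations on an image stored as a list of rows.
def hflip(img):
    return [row[::-1] for row in img]

def vflip(img):
    return img[::-1]

def transpose(img):
    return [list(col) for col in zip(*img)]

def rot180(img):
    return vflip(hflip(img))


def toy_predict(img):
    # Hypothetical two-class model: class 1 probability is the top-left pixel.
    p1 = img[0][0]
    return [1.0 - p1, p1]


image = [[0.1, 0.2], [0.3, 0.4]]
avg = tta_predict(toy_predict, image, [hflip, vflip, transpose, rot180])
print(avg)
```

With 4 augmentations, 5 views are scored in total, matching the description above. In practice `predict_fn` would be the trained network's softmax output.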