Indeed, concern about ‘wacky boosting’ is part of the reason we kept Test scores totally secret and don’t consider Validation scores to be worthwhile for ranking. Ideas here are definitely welcome, as we want this Test set data (and ground truth) to be useful for the community into the future.