I recently developed a machine-learning-based algorithm to identify specific regions of interest in a series of images. All images have the same resolution, but the size of the regions of interest varies across the stack. After performing this classification task, I tried to validate the algorithm's performance using ROC curves.
The problem is that these curves cannot be used to compare classification quality across images. In images where the targeted regions are smaller than in the rest of the stack, there is a huge number of true-negative pixels, which can spuriously increase the AUC regardless of how well the algorithm identifies the true-positive pixels. As a result, an image with a small region of interest and a terrible classification outcome may get a higher AUC than an image with a large region of interest and very good classification results.
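To make the concern concrete, here is a minimal sketch with made-up pixel counts (hypothetical numbers, not taken from my data). It assumes two 100x100 images evaluated at a single threshold, with the same number of false-positive pixels in each. The helper name `rates` is only for illustration.

```python
def rates(tp, fp, fn, tn):
    """Return (TPR, FPR) for one image at one threshold."""
    tpr = tp / (tp + fn)  # fraction of ROI pixels that were found
    fpr = fp / (fp + tn)  # fraction of background pixels that were flagged
    return tpr, fpr

# Hypothetical pixel counts; each image has 10,000 pixels in total.
# Large ROI: 5,000 ROI pixels, 90% of them detected.
large_roi = dict(tp=4500, fn=500, fp=200, tn=4800)
# Small ROI: 100 ROI pixels, only 50% of them detected.
small_roi = dict(tp=50, fn=50, fp=200, tn=9700)

tpr_large, fpr_large = rates(**large_roi)
tpr_small, fpr_small = rates(**small_roi)

print(f"large ROI: TPR={tpr_large:.3f}, FPR={fpr_large:.4f}")
print(f"small ROI: TPR={tpr_small:.3f}, FPR={fpr_small:.4f}")
```

With identical false-positive counts, the small-ROI image ends up with the lower false positive rate simply because its denominator contains many more true negatives, even though it has four times as many false positives as true positives. This only shows one operating point, not a full ROC curve, but it is the dilution effect I am worried about.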
Does anyone know how we can overcome this limitation of ROC curves?