Detecting Mitoses with a Convolutional Neural Network for MIDOG 2022 Challenge
Abstract
This work presents a mitosis detection method that uses only one vanilla Convolutional Neural Network (CNN). Our approach consists of two steps: given an image, we first apply a CNN with a sliding window technique to extract patches that contain mitoses; we then calculate each extracted patch’s class activation map to obtain the precise location of the mitosis. To increase the model's generalizability, we train the CNN with a series of data augmentation techniques, a loss that copes with noisily labeled images, and an active learning strategy. Our approach achieved an F1 score of 0.7323 with an EfficientNet-b3 model in the preliminary test phase of the MIDOG 2022 challenge.
Gu
Mitosis detection | domain shift | convolutional neural network | class activation map
xac@ucla.edu
Introduction
Mitotic activity is a crucial pathological indicator related to cancer malignancy and patients’ prognosis (cree2021counting). Because of its importance, a considerable amount of literature has proposed datasets (ludovic2013mitosis, veta2019predicting, aubreville2020completely, bertram2019large) and deep learning models (cirecsan2013mitosis, li2018deepmitosis, mahmood2020artificial) for mitosis detection. State-of-the-art methods use a two-stage approach: a localization model (e.g., RetinaNet) extracts candidate locations, and a classification model then verifies whether these locations contain mitoses (li2018deepmitosis, mahmood2020artificial, aubreville2020completely). Such a two-stage setup was reported to improve mitosis detection performance compared to using the localization model alone (aubreville2020completely).
Since adding a classification model can improve the performance, we argue that using only one CNN model for mitosis detection is also viable. Because CNNs cannot directly report the location of mitosis, previous works either modified the structure of CNNs (cirecsan2013mitosis), or used CNNs with a small input size to reduce the localization errors (8327641). Instead, our approach extracts the location of mitoses with the class activation map (CAM) (zhou2016learning), which allows CNNs to accept a larger input size for more efficient training. Also, our approach can work with vanilla CNNs because calculating CAMs does not require changing the network structure.
We validated our proposed method in the MItosis DOmain Generalization (MIDOG) 2022 Challenge (midog2022). The challenge training set consists of 403 Hematoxylin & Eosin (H&E) stained regions of interest (ROIs, average size= pixels), covering six tumor types scanned with multiple scanners. Of the 403 ROIs, 354 are labeled and contain 9,501 mitotic figures. The preliminary test set includes 20 cases from four tumor types, and the final test set has 100 independent tumor cases from ten tumor types. Given the dataset’s high variance, we employed three techniques to improve the CNN’s generalizability:
- An augmentation pipeline with balance-mixup (galdran2021balanced) and stain augmentation (8327641);
- An Online Uncertainty Sample Mining (OUSM) (xue2019robust) + COnsistent RAnk Logits (CORAL) loss (cao2020rank) to cope with noisy labels;
- An active learning strategy that adds false-positive, false-negative, and hard-negative patches after each round of training.
Methods
0.1 Extracting Patches for Initial Training
We randomly used of the image instances in the MIDOG 2022 Challenge to generate the training set and for the validation set. To maximally utilize the dataset, we included unlabelled images in the training set and treated them as negative images (i.e., no mitoses inside). For each image, we extracted patches with the size of pixels surrounding the center of each annotation (provided by the challenge) and placed them into the train/validation set.
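As a minimal sketch of the patch extraction step, the crop box around an annotation center can be computed as below. The function name `patch_box`, the 240-pixel patch size, and the choice to shift boxes that would cross the image border are assumptions made for illustration only; the text above does not specify how border cases were handled.

```python
def patch_box(cx, cy, patch_size, img_w, img_h):
    """Return (x0, y0, x1, y1) of a square crop centred on (cx, cy).

    Assumption: a crop that would cross the image border is shifted
    back inside the image. The image must be at least patch_size wide
    and high.
    """
    half = patch_size // 2
    # Clamp the top-left corner so the crop never leaves the image.
    x0 = min(max(cx - half, 0), img_w - patch_size)
    y0 = min(max(cy - half, 0), img_h - patch_size)
    return x0, y0, x0 + patch_size, y0 + patch_size


# Hypothetical values: a 240-pixel patch in a 1000 x 800 image.
print(patch_box(500, 400, 240, 1000, 800))  # (380, 280, 620, 520)
print(patch_box(10, 790, 240, 1000, 800))   # (0, 560, 240, 800)
```

Shifting the box keeps every patch at the same size, which avoids padding; padding with a constant color would be an equally plausible alternative.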
0.2 Model Training
We trained an EfficientNet-b3 (tan2019efficientnet) model (input size: ) from pre-trained ImageNet weights. We improved model generalization by constructing an online data augmentation pipeline. The pipeline includes general image augmentation techniques: random rotation, flip, elastic transform, grid distortion, affine transform, color jitter, Gaussian blur, and Gaussian noise. In addition, we added two augmentation methods, stain augmentation (8327641) and balance-mixup (galdran2021balanced), to deal with the domain shift in pathology images. Examples of augmented patches are shown in Figure 1(b). The model was trained with an SGD optimizer with momentum 0.9 and a cosine annealing learning rate scheduler with warm restarts (max LR=). Since we treated all unlabeled images as negative, we used an OUSM (xue2019robust) + CORAL (cao2020rank) loss to deal with noisy labels. Each round of training had 100 epochs, and we selected the model with the highest F1 score on the validation set for inference.
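To illustrate the idea behind OUSM, the sketch below averages the per-sample losses of a mini-batch after discarding the `k` largest ones, on the assumption that the highest-loss samples are the most likely to carry wrong labels. This reflects our reading of the general OUSM principle, not the exact training code; the function name `ousm_mean_loss`, the value of `k`, and the example losses are hypothetical, and the CORAL component is omitted.

```python
def ousm_mean_loss(per_sample_losses, k):
    """Average per-sample losses after dropping the k largest ones.

    Assumption: the k highest-loss samples in a mini-batch are treated
    as likely label noise and excluded from the update.
    """
    n = len(per_sample_losses)
    if k <= 0 or n <= k:
        # Too few samples to drop any: fall back to the plain mean.
        return sum(per_sample_losses) / n
    kept = sorted(per_sample_losses)[: n - k]
    return sum(kept) / len(kept)


# Hypothetical mini-batch in which one sample has an outlying loss.
batch_losses = [0.1, 0.2, 0.3, 4.0]
print(ousm_mean_loss(batch_losses, k=1))  # mean of the three smallest losses
print(ousm_mean_loss(batch_losses, k=0))  # plain mean, for comparison
```

In this example, dropping the single outlier lowers the batch loss from about 1.15 to about 0.2, so a possibly mislabeled patch no longer dominates the gradient.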
0.3 Inferencing
We slid the trained EfficientNet over the train and validation images with window size and step-size 30. We then cross-referenced the CNN predictions with the ground truth. Here, we define a positive window classification as a true positive if at least one mitosis lies inside the window and as a false positive otherwise. We further define a false negative as a mitosis annotation that is not surrounded by any positive window.
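The window enumeration and the true-positive, false-positive, and false-negative definitions above can be sketched as follows. The helper names and the 100-pixel window size are hypothetical; only the step size of 30 comes from the text, and windows are represented by their top-left corner.

```python
def window_origins(img_w, img_h, win_size, step=30):
    """Top-left corners of all windows that fit inside the image."""
    return [
        (x, y)
        for y in range(0, img_h - win_size + 1, step)
        for x in range(0, img_w - win_size + 1, step)
    ]


def contains(window, point, win_size):
    """True if the annotation point lies inside the window."""
    x0, y0 = window
    px, py = point
    return x0 <= px < x0 + win_size and y0 <= py < y0 + win_size


def score_windows(positive_windows, annotations, win_size):
    """Count TP and FP windows and FN annotations."""
    tp = sum(
        1 for w in positive_windows
        if any(contains(w, a, win_size) for a in annotations)
    )
    fp = len(positive_windows) - tp
    fn = sum(
        1 for a in annotations
        if not any(contains(w, a, win_size) for w in positive_windows)
    )
    return tp, fp, fn


# Hypothetical example: three positive windows, two annotated mitoses.
positives = [(0, 0), (30, 0), (300, 300)]
mitoses = [(40, 40), (500, 500)]
print(score_windows(positives, mitoses, win_size=100))  # (2, 1, 1)
```

Note that true and false positives are counted per window while false negatives are counted per annotation, so one mitosis can contribute several true-positive windows.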
0.4 Incrementing the Patch Dataset with Active Learning
We employed a multi-round active learning process to boost the performance of the EfficientNet (Figure 1(a)). Each round started with training the model on the current train/validation set (Section 0.2). Then, the best model was selected and applied to the images (Section 0.3). After that, false-positive, false-negative, and hard-negative patches were added to the train/validation set. The procedure was repeated six times, until the model’s F1 score on the validation set no longer increased. The final training set contains 103,816 patches, and the final validation set contains 23,638.
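The control flow of this multi-round procedure can be sketched as below. The callables `train_round` and `mine_patches` are hypothetical stand-ins for the training step (Section 0.2) and the sliding-window mining step (Section 0.3), and the stubbed F1 values are invented purely to exercise the stopping rule.

```python
def active_learning(train_round, mine_patches, dataset, max_rounds=10):
    """Train, mine hard patches, and repeat until validation F1 stops rising."""
    best_model, best_f1 = None, float("-inf")
    for _ in range(max_rounds):
        model, f1 = train_round(dataset)
        if f1 <= best_f1:
            break  # no improvement: keep the previous best model
        best_model, best_f1 = model, f1
        # Add false-positive, false-negative, and hard-negative patches.
        dataset = dataset + mine_patches(model, dataset)
    return best_model, best_f1, dataset


# Stubs with made-up F1 scores, only to demonstrate the loop.
fake_f1 = iter([0.60, 0.65, 0.70, 0.69])

def train_stub(dataset):
    return "model-on-%d-patches" % len(dataset), next(fake_f1)

def mine_stub(model, dataset):
    return ["mined_patch"]

model, f1, data = active_learning(train_stub, mine_stub, ["patch_a", "patch_b"])
print(model, f1, len(data))  # model-on-4-patches 0.7 5
```

The loop stops at the first round that fails to improve the validation F1 score and returns the model from the preceding round.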
0.5 Extracting Mitosis Locations with CAMs
We used the best model from the final round in Section 0.4 for the test images. A window with a CNN probability > 0.84 was considered positive, and non-maximum suppression with a threshold of 0.22 was used to eliminate the overlapping windows. For each positive window, we calculated the CAM with GradCAM++ (gradcam++), and extracted the hotspot’s centroid as the mitosis location (Figure 1(c)).
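The post-processing above can be sketched in two small steps: greedy non-maximum suppression over the positive windows, followed by a centroid computation on each remaining window's CAM. Treating the 0.22 threshold as an intersection-over-union (IoU) threshold, and defining the hotspot as all CAM pixels at or above half of the peak value (`rel_thr`), are assumptions made for this illustration; the function names are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x0, y0, x1, y1)."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0


def nms(boxes, scores, iou_thr=0.22):
    """Greedy NMS: keep the highest-scoring box, drop overlapping ones."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_thr for j in keep):
            keep.append(i)
    return keep


def hotspot_centroid(cam, rel_thr=0.5):
    """Centroid (x, y) of CAM pixels at or above rel_thr * peak.

    Assumes a non-negative CAM given as a list of rows.
    """
    peak = max(max(row) for row in cam)
    hot = [
        (x, y)
        for y, row in enumerate(cam)
        for x, v in enumerate(row)
        if v >= rel_thr * peak
    ]
    return (sum(x for x, _ in hot) / len(hot), sum(y for _, y in hot) / len(hot))


boxes = [(0, 0, 100, 100), (30, 0, 130, 100), (300, 300, 400, 400)]
scores = [0.90, 0.95, 0.85]
print(nms(boxes, scores))  # [1, 2]
print(hotspot_centroid([[0, 0, 0], [0, 1, 1], [0, 0, 0]]))  # (1.5, 1.0)
```

The centroid is expressed in CAM coordinates within the window, so it has to be offset by the window's top-left corner to obtain the location in the full image.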
Results
On the preliminary test phase of the MIDOG 2022 Challenge, our approach achieved an overall F1 score of 0.7323, with 0.7313 precision and 0.7333 recall. More specifically, our model has F1 scores of 0.7467, 0.7593, 0.6963, and 0.7407 for detecting mitoses in four types of tumors, respectively.
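As a quick consistency check, the reported overall F1 score follows from the reported precision and recall through the harmonic mean. The helper name `f1_score` is ours.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Overall precision and recall from the preliminary test phase.
print(round(f1_score(0.7313, 0.7333), 4))  # 0.7323
```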
Discussion
Although CAMs are primarily used for explaining CNN classifications, we demonstrate that they can also locate mitoses precisely in H&E images. It is noteworthy that CAMs might fail to highlight all mitoses when an image contains more than one. To this end, we suggest that future work consider aligning mitosis locations and CAMs to improve detection performance and explanation quality.