Detecting Mitoses with a Convolutional Neural Network for MIDOG 2022 Challenge

Hongyan Gu HCI Research, University of California, Los Angeles, USA Mohammad Haeri Department of Pathology and Laboratory Medicine, University of Kansas Medical Center, USA Shuo Ni HCI Research, University of California, Los Angeles, USA Christopher Kazu Williams Department of Pathology and Laboratory Medicine, UCLA David Geffen School of Medicine, USA Neda Zarrin-Khameh Pathology and Immunology, Baylor College of Medicine, USA Shino Magaki Department of Pathology and Laboratory Medicine, UCLA David Geffen School of Medicine, USA Xiang ‘Anthony’ Chen HCI Research, University of California, Los Angeles, USA
Abstract

This work presents a mitosis detection method that uses only one vanilla Convolutional Neural Network (CNN). Our approach consists of two steps: given an image, we first apply the CNN in a sliding-window fashion to extract patches that contain mitoses; we then compute the class activation map of each extracted patch to obtain the precise location of the mitosis. To improve the model's generalizability, we train the CNN with a series of data augmentation techniques, a loss that copes with noisily labeled images, and an active learning strategy. Our approach achieved an F1 score of 0.7323 with an EfficientNet-b3 model in the preliminary test phase of the MIDOG 2022 challenge.

Keywords: Mitosis detection | domain shift | convolutional neural network | class activation map

Correspondence: xac@ucla.edu

Introduction

Mitotic activity is a crucial pathological indicator of cancer malignancy and patient prognosis (cree2021counting). Because of its importance, a considerable body of literature has proposed datasets (ludovic2013mitosis, veta2019predicting, aubreville2020completely, bertram2019large) and deep learning models (cirecsan2013mitosis, li2018deepmitosis, mahmood2020artificial) for mitosis detection. State-of-the-art methods use a two-stage approach: a localization model (e.g., RetinaNet) first extracts candidate locations, and a classification model then determines whether each candidate contains a mitosis (li2018deepmitosis, mahmood2020artificial, aubreville2020completely). Such a two-stage setup was reported to improve mitosis detection performance compared with using the localization model alone (aubreville2020completely).

Since adding a classification model improves performance, we argue that using a single classification CNN for mitosis detection is also viable. Because a classification CNN cannot directly report where a mitosis is, previous works either modified the CNN architecture (cirecsan2013mitosis) or used CNNs with a small input size to reduce localization errors (8327641). Instead, our approach extracts mitosis locations from the class activation map (CAM) (zhou2016learning), which allows the CNN to accept a larger input size for more efficient training. Our approach also works with vanilla CNNs, because computing CAMs does not require changing the network structure.
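To make the CAM idea concrete, the following is a minimal NumPy sketch of the original CAM formulation (zhou2016learning): the map for class c is the sum of the last convolutional feature maps, weighted by the class-c weights of the linear layer that follows global average pooling. This is not the authors' code (the method in this work uses Grad-CAM++, which derives the weights from gradients instead). The function name, the nearest-neighbour upsampling, and the clip-and-normalize post-processing are our assumptions.

```python
import numpy as np

def class_activation_map(features, fc_weights, class_idx, input_size):
    """Original CAM (Zhou et al., 2016), upsampled to a square input window.

    features:   (K, h, w) activations of the last convolutional layer
    fc_weights: (C, K) weights of the linear layer after global average pooling
    class_idx:  index of the target class (e.g., "mitosis")
    input_size: side length in pixels of the square input window
    """
    # Weighted sum over the K feature maps using the target class's weights.
    cam = np.tensordot(fc_weights[class_idx], features, axes=1)  # shape (h, w)
    # Common post-processing (an assumption here): clip negatives, scale to [0, 1].
    cam = np.maximum(cam, 0.0)
    if cam.max() > 0:
        cam = cam / cam.max()
    # Nearest-neighbour upsampling; bilinear interpolation is also common.
    h, w = cam.shape
    rows = np.arange(input_size) * h // input_size
    cols = np.arange(input_size) * w // input_size
    return cam[np.ix_(rows, cols)]

# Toy example: one active unit in feature map 0, which class 1 weights positively.
features = np.zeros((2, 4, 4))
features[0, 1, 2] = 1.0
fc_weights = np.array([[0.0, 0.0], [1.0, 0.0]])
cam = class_activation_map(features, fc_weights, class_idx=1, input_size=8)
peak = np.unravel_index(np.argmax(cam), cam.shape)  # (row, col) of the hottest pixel
```

The hottest pixel of the upsampled map points back to where the class-relevant feature fired, which is what lets a plain classifier report a location without architectural changes.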

We validated our method on the MItosis DOmain Generalization (MIDOG) 2022 Challenge (midog2022). The challenge training set consists of 403 Hematoxylin & Eosin (H&E) stained regions of interest (ROIs) covering six tumor types and scanned with multiple scanners. Of these, 354 ROIs are labeled and contain 9,501 mitotic figures. The preliminary test set includes 20 cases from four tumor types, and the final test set has 100 independent tumor cases from ten tumor types. Given the dataset's high variance, we employed three techniques to improve the CNN's generalizability:

  1. An augmentation pipeline with balance-mixup (galdran2021balanced) and stain augmentation (8327641);

  2. An Online Uncertainty Sample Mining (OUSM) (xue2019robust) + COnsistent RAnk Logits (CORAL) loss (cao2020rank) to cope with noisy labels;

  3. An active learning strategy that adds false-positive, false-negative, and hard-negative patches after each round of training.
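Balance-mixup (galdran2021balanced) combines mixup with class-balanced sampling: each training batch is a convex combination of an instance-sampled batch and a class-balanced batch, which counteracts the scarcity of mitotic patches. The NumPy sketch below assumes the mixing coefficient is drawn as λ ~ Beta(α, 1) and uses α = 0.2 purely as an illustrative value; the function names and defaults are hypothetical and not taken from the authors' pipeline.

```python
import numpy as np

def class_balanced_indices(labels, batch_size, rng):
    """Sample indices so that every class is equally likely to be drawn."""
    labels = np.asarray(labels)
    classes = np.unique(labels)
    picked = rng.choice(classes, size=batch_size)  # uniform over classes
    return np.array([rng.choice(np.flatnonzero(labels == c)) for c in picked])

def balanced_mixup(x_inst, y_inst, x_bal, y_bal, alpha=0.2, rng=None):
    """Convex combination of an instance-sampled and a class-balanced batch.

    Assumes lambda ~ Beta(alpha, 1); alpha = 0.2 is an illustrative default.
    Labels are mixed with the same coefficient, giving soft targets.
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, 1.0)
    x = lam * x_inst + (1.0 - lam) * x_bal
    y = lam * y_inst + (1.0 - lam) * y_bal
    return x, y, lam

# Usage: 10 patches, only one of which is a mitosis (label 1).
rng = np.random.default_rng(0)
labels = np.array([0] * 9 + [1])
patches = rng.random((10, 8, 8, 3))  # stand-in for image patches
inst_idx = rng.integers(0, len(labels), size=4)   # ordinary (instance) sampling
bal_idx = class_balanced_indices(labels, 4, rng)  # class-balanced sampling
x, y, lam = balanced_mixup(patches[inst_idx], labels[inst_idx].astype(float),
                           patches[bal_idx], labels[bal_idx].astype(float), rng=rng)
```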

Methods

0.1 Extracting Patches for Initial Training

We randomly split the image instances of the MIDOG 2022 Challenge into a training set and a validation set. To make maximal use of the dataset, we included the unlabeled images in the training set and treated them as negative images (i.e., images without mitoses). For each image, we extracted a fixed-size patch centered on each annotation provided by the challenge and placed it into the training or validation set.
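Cropping a patch around an annotation can be sketched as follows. Because the patch size is not stated here, it is a parameter; the (x, y) pixel-coordinate convention and zero padding at image borders are assumptions, and `extract_patch` is a hypothetical helper name.

```python
import numpy as np

def extract_patch(image, center_xy, size):
    """Crop a size x size patch centered on (x, y); zero-pad outside the image."""
    h, w = image.shape[:2]
    half = size // 2
    x, y = int(round(center_xy[0])), int(round(center_xy[1]))
    patch = np.zeros((size, size) + image.shape[2:], dtype=image.dtype)
    # Requested window, possibly extending past the image border.
    x0, y0 = x - half, y - half
    x1, y1 = x0 + size, y0 + size
    # Part of the window that actually lies inside the image.
    sx0, sy0 = max(x0, 0), max(y0, 0)
    sx1, sy1 = min(x1, w), min(y1, h)
    if sx0 < sx1 and sy0 < sy1:
        patch[sy0 - y0:sy1 - y0, sx0 - x0:sx1 - x0] = image[sy0:sy1, sx0:sx1]
    return patch

# Usage on a toy 10 x 10 single-channel image.
img = np.arange(1, 101).reshape(10, 10)
inner = extract_patch(img, (5, 5), 4)   # fully inside the image
corner = extract_patch(img, (0, 0), 4)  # annotation at the corner: padded
```

Padding keeps every patch the same size, so annotations near ROI borders are not discarded.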

Figure 1: (a) Illustration of the active learning training strategy used in this work; (b) Examples of augmented patches according to our augmentation pipeline; (c) Overall data processing pipeline of our approach: detecting mitoses with a convolutional neural network and the class activation map.

0.2 Model Training

We trained an EfficientNet-b3 (tan2019efficientnet) model initialized with ImageNet-pretrained weights. To improve generalization, we built an online data augmentation pipeline. It includes general image augmentation techniques: random rotation, flip, elastic transform, grid distortion, affine transform, color jitter, Gaussian blur, and Gaussian noise. In addition, we added two augmentation methods, stain augmentation (8327641) and balance-mixup (galdran2021balanced), to deal with the domain shift in pathology images. Examples of augmented patches are shown in Figure 1(b). The model was trained with an SGD optimizer (momentum 0.9) and a cosine annealing learning rate scheduler with warm restarts. Since we treated all unlabeled images as negative, some training labels are likely wrong; we therefore used an OUSM (xue2019robust) + CORAL (cao2020rank) loss to cope with these noisy labels. Each round of training ran for 100 epochs, and we selected the model with the highest F1 score on the validation set for inference.
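One way to combine the two losses is sketched below in NumPy. CORAL (cao2020rank) turns a K-class ordinal label into K-1 binary "label > k" tasks, and OUSM (xue2019robust) drops the k samples with the largest loss in each mini-batch, on the assumption that they are the most likely to be mislabeled. The number of classes, the value of k, and the omission of CORAL's optional task weights are our assumptions; with K = 2 the CORAL term reduces to binary cross-entropy.

```python
import numpy as np

def log_sigmoid(z):
    """Numerically stable log(sigmoid(z))."""
    return -np.logaddexp(0.0, -z)

def coral_per_sample_loss(logits, labels, num_classes):
    """CORAL loss per sample (Cao et al., 2020) without task importance weights.

    logits: (N, K-1) array, one logit per binary task "label > k"
    labels: (N,) integer ordinal labels in [0, K-1]
    """
    thresholds = np.arange(num_classes - 1)
    levels = (labels[:, None] > thresholds[None, :]).astype(float)  # (N, K-1)
    # Binary cross-entropy summed over the K-1 tasks; log(1 - sigmoid(z)) = log_sigmoid(-z).
    log_lik = levels * log_sigmoid(logits) + (1.0 - levels) * log_sigmoid(-logits)
    return -log_lik.sum(axis=1)

def ousm_loss(per_sample_loss, k):
    """OUSM: discard the k largest losses in the mini-batch, average the rest."""
    k = min(max(k, 0), len(per_sample_loss) - 1)  # always keep at least one sample
    kept = np.sort(per_sample_loss)[: len(per_sample_loss) - k]
    return float(kept.mean())

# Usage: binary task (K = 2); the last sample looks mislabeled (label 0, confident logit).
logits = np.full((4, 1), 5.0)
labels = np.array([1, 1, 1, 0])
per_sample = coral_per_sample_loss(logits, labels, num_classes=2)
loss = ousm_loss(per_sample, k=1)  # the outlier sample is dropped from the average
```

Without OUSM, the single suspicious sample would dominate the batch average; with k = 1 it is simply excluded from that step's gradient.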

0.3 Inferencing

We slid the trained EfficientNet over the training and validation images with a step size of 30 and cross-referenced the CNN predictions with the ground truth. A window classified as positive counts as a true positive if it contains at least one mitosis and as a false positive otherwise. A mitosis annotation counts as a false negative if no positive window contains it.
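The window bookkeeping described above can be sketched in plain Python. Windows are (x, y, size) tuples with a top-left corner in pixels, annotations are (x, y) points, the step is taken to be in pixels, and "contains" is a half-open box test; all of these are assumptions. The window size is a parameter because its value is not stated here.

```python
from typing import Iterator, List, Tuple

Window = Tuple[int, int, int]   # (x, y, size): top-left corner and side length, pixels
Point = Tuple[float, float]     # (x, y) centre of a mitosis annotation

def sliding_windows(width: int, height: int, size: int, step: int) -> Iterator[Window]:
    """Top-left corners of size x size windows moved by `step` pixels.

    Windows that would cross the right/bottom border are skipped (a simplification).
    """
    for y in range(0, max(height - size, 0) + 1, step):
        for x in range(0, max(width - size, 0) + 1, step):
            yield (x, y, size)

def contains(window: Window, point: Point) -> bool:
    x, y, s = window
    return x <= point[0] < x + s and y <= point[1] < y + s

def label_windows(positive: List[Window], annotations: List[Point]):
    """Split positive windows into TP / FP and collect missed annotations (FN)."""
    tp = [w for w in positive if any(contains(w, p) for p in annotations)]
    fp = [w for w in positive if not any(contains(w, p) for p in annotations)]
    fn = [p for p in annotations if not any(contains(w, p) for w in positive)]
    return tp, fp, fn

# Usage with a hypothetical 50-pixel window and the step size of 30 from the text.
windows = list(sliding_windows(100, 100, size=50, step=30))
positive = [(0, 0, 50), (30, 30, 50)]  # windows the CNN called positive
tp, fp, fn = label_windows(positive, [(10.0, 10.0), (90.0, 90.0)])
```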

0.4 Incrementing the Patch Dataset with Active Learning

We employed a multi-round active learning process to boost the performance of the EfficientNet (Figure 1(a)). Each round starts by training the model on the current training/validation set (Section 0.2). The best model is then selected and applied to the images (Section 0.3), and the resulting false-positive, false-negative, and hard-negative patches are added to the training/validation set. We repeated this procedure for six rounds, until the model's F1 score on the validation set stopped increasing. The final training set contains 103,816 patches, and the validation set contains 23,638.
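The round structure can be summarized as a loop with a stopping rule. In this sketch, `train` and `mine` are hypothetical callbacks standing in for Sections 0.2 and 0.3; only the "stop when validation F1 no longer increases" logic reflects the text.

```python
from typing import Callable, List, Tuple

def active_learning(
    train: Callable[[List], Tuple[object, float]],
    mine: Callable[[object], List],
    initial_patches: List,
    max_rounds: int = 10,
):
    """Multi-round active learning loop (sketch).

    train(patches) -> (best_model, validation_f1)          # hypothetical callback
    mine(model)    -> new FP / FN / hard-negative patches  # hypothetical callback
    """
    patches = list(initial_patches)
    best_model, best_f1 = None, float("-inf")
    for _ in range(max_rounds):
        model, f1 = train(patches)
        if f1 <= best_f1:
            break  # no improvement: keep the previous round's model
        best_model, best_f1 = model, f1
        patches.extend(mine(model))  # grow the dataset with mined patches
    return best_model, best_f1, patches

# Usage with stand-in callbacks: the "model" is just the dataset size it saw.
scores = iter([0.50, 0.60, 0.65, 0.64])
model, f1, data = active_learning(
    train=lambda p: (len(p), next(scores)),
    mine=lambda m: ["mined"] * 10,
    initial_patches=["seed"] * 5,
)
```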

0.5 Extracting Mitosis Locations with CAMs

We applied the best model from the final round of Section 0.4 to the test images. A window with a CNN probability > 0.84 was considered positive, and non-maximum suppression with a threshold of 0.22 was used to eliminate overlapping windows. For each positive window, we computed the CAM with GradCAM++ (gradcam++) and took the centroid of its hotspot as the mitosis location (Figure 1(c)).
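A sketch of this post-processing follows: probability thresholding, greedy non-maximum suppression, and mapping a hotspot centroid back to image coordinates. We assume the 0.22 threshold is an intersection-over-union (IoU) threshold and define the hotspot as the CAM pixels at or above 50% of the map's maximum; both are assumptions. The CAM itself (GradCAM++ in this work) is taken as a given array, and all function names are hypothetical.

```python
import numpy as np

def iou(a, b):
    """Intersection over union of two square windows given as (x, y, size)."""
    ax, ay, asz = a
    bx, by, bsz = b
    iw = max(0, min(ax + asz, bx + bsz) - max(ax, bx))
    ih = max(0, min(ay + asz, by + bsz) - max(ay, by))
    inter = iw * ih
    union = asz * asz + bsz * bsz - inter
    return inter / union if union > 0 else 0.0

def select_windows(windows, probs, prob_thr=0.84, nms_thr=0.22):
    """Keep windows with probability > prob_thr, then apply greedy NMS."""
    cands = sorted(
        [(p, w) for w, p in zip(windows, probs) if p > prob_thr],
        key=lambda pw: pw[0],
        reverse=True,
    )
    kept = []
    for p, w in cands:
        # Suppress a window that overlaps an already-kept, higher-scoring one.
        if all(iou(w, k) <= nms_thr for _, k in kept):
            kept.append((p, w))
    return kept

def hotspot_centroid(cam, window, rel_thr=0.5):
    """Centroid of CAM pixels >= rel_thr * max, in image coordinates."""
    ys, xs = np.nonzero(cam >= rel_thr * cam.max())
    x0, y0, size = window
    sx, sy = size / cam.shape[1], size / cam.shape[0]  # CAM -> window pixel scale
    # +0.5 maps a pixel index to that pixel's centre.
    return (x0 + (xs.mean() + 0.5) * sx, y0 + (ys.mean() + 0.5) * sy)

# Usage: two overlapping detections, one distant detection, one low-probability window.
windows = [(0, 0, 100), (10, 0, 100), (300, 300, 100), (500, 500, 100)]
kept = select_windows(windows, [0.95, 0.90, 0.88, 0.50])
cam = np.zeros((10, 10))
cam[2, 6] = 1.0  # toy CAM with a single hot pixel
cx, cy = hotspot_centroid(cam, kept[0][1])
```

A single centroid per window is exactly why a window holding two mitoses can report only one of them, the limitation raised in the Discussion.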

Results

On the preliminary test phase of the MIDOG 2022 Challenge, our approach achieved an overall F1 score of 0.7323, with a precision of 0.7313 and a recall of 0.7333. Broken down by tumor type, our model achieved F1 scores of 0.7467, 0.7593, 0.6963, and 0.7407 on the four tumor types in the preliminary test set.
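As a quick consistency check, F1 is the harmonic mean of precision and recall, and the reported precision and recall reproduce the reported overall F1:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

print(round(f1_score(0.7313, 0.7333), 4))  # 0.7323
```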

Discussion

Although CAMs are primarily used to explain CNN classifications, we show that they can also localize mitoses precisely in H&E images. Note, however, that CAMs might fail to highlight every mitosis when a window contains more than one. To address this, we suggest that future work consider aligning CAMs with mitosis locations to improve both detection performance and explanation quality.


References