Fully Automated Segmentation of Head CT Neuroanatomy Using Deep Learning

Jason Cai, Kenneth Philbrick, Zeynettin Akkus, Bradley Erickson

Radiology Informatics Lab, Mayo Clinic, Rochester MN

 

[GitHub Download]

Abstract

 

Semantic segmentation of the brain on CT can assist in diagnosis (1-7) and treatment planning (8,9). We present a 2D U-Net that simultaneously segments 16 intracranial structures from head CT. Our model generalized to external scans from the RSNA Hemorrhage Detection Challenge (10), as well as to scans demonstrating idiopathic normal pressure hydrocephalus (iNPH). Overall Dice coefficients were comparable to those of expert annotations and higher than those of existing segmentation methods. Although the training dataset consisted of noncontrast studies, the model handled contrast-enhanced studies equally well on visual inspection. Developers can leverage transfer learning and fine-tuning to further adapt the model to their specific needs.
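
As a rough illustration of that transfer-learning workflow (not the exact code, file names, or hyperparameters released with the model), fine-tuning in Keras might look like the sketch below; the weight file name and layer split are placeholders.

```python
# Hypothetical fine-tuning sketch; file name, layer split, and hyperparameters
# are placeholders, not the paper's exact configuration.
import tensorflow as tf

# Load the pretrained 2D U-Net weights released on GitHub (hypothetical name).
model = tf.keras.models.load_model("headct_unet.h5", compile=False)

# Freeze roughly the encoder half of the layers and fine-tune the rest.
for layer in model.layers[: len(model.layers) // 2]:
    layer.trainable = False

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# train_ds / val_ds: tf.data.Dataset objects yielding (ct_slice, label_mask)
# pairs, preprocessed the same way as the original training data.
# model.fit(train_ds, validation_data=val_ds, epochs=20)
```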

 


Dataset

 

Primary Dataset (Training, Validation, and Testing)

  · 62 normal non-contrast head CTs. Mean patient age: 73.4 years old (range: 27-95).

  · Training: 40 volumes; validation: 10 volumes; testing: 12 volumes.

 

Secondary Datasets (Testing Only)

   · 12 non-contrast head CTs demonstrating iNPH. Mean patient age: 74.3 years old (range: 60-84).

   · 30 normal non-contrast head CTs from the RSNA Hemorrhage Detection Challenge (10).

 

Dataset annotation

All slices were annotated using RIL-Contour (11) by a team of trained analysts under the supervision of a neuroradiologist.

Ground Truth Masks: From each test volume, 3 observers segmented the same 5 slices independently (capturing all 16 structures). From these annotations, a set of multi-rater consensus labels was constructed using STAPLE (12).
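
For illustration only, a multi-rater consensus label map can be built along these lines with SimpleITK's MultiLabelSTAPLE filter; the file names are hypothetical, and this is not necessarily the exact tooling used for the paper's annotations.

```python
# Sketch: building multi-rater consensus labels with the STAPLE approach (12).
# Hypothetical file names; labels 0-16 encode background + 16 structures.
import SimpleITK as sitk

# The same slice (or volume) segmented independently by the three observers.
rater_masks = [
    sitk.ReadImage("observer1_labels.nii.gz", sitk.sitkUInt8),
    sitk.ReadImage("observer2_labels.nii.gz", sitk.sitkUInt8),
    sitk.ReadImage("observer3_labels.nii.gz", sitk.sitkUInt8),
]

# Multi-label STAPLE estimates a consensus segmentation across raters;
# pixels the algorithm cannot decide are assigned the given label.
UNDECIDED = 255
consensus = sitk.MultiLabelSTAPLE(rater_masks, UNDECIDED)

sitk.WriteImage(consensus, "consensus_labels.nii.gz")
```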

 

To protect patient privacy, the datasets are not available for download.

 

DICOM information

 

 

|                                    | Primary Dataset (Normal Scans): Training & Validation (40+10 volumes) | Primary Dataset (Normal Scans): Test (12 volumes) | Secondary Dataset (iNPH Scans): Test (12 volumes) |
|------------------------------------|---|---|---|
| Manufacturer and model             |   |   |   |
| · GE Discovery CT750 HD            | 11 | 5 | - |
| · GE Optima CT660                  | 2  | 1 | - |
| · Siemens Sensation 64             | 4  | - | 3 |
| · Siemens SOMATOM Definition AS    | 1  | - | - |
| · Siemens SOMATOM Definition Edge  | 2  | - | 5 |
| · Siemens SOMATOM Definition Flash | 27 | 6 | 3 |
| · Siemens SOMATOM Force            | 1  | - | - |
| · Toshiba Aquilion                 | 1  | - | - |
| · Toshiba Aquilion Prime SP        | 1  | - | - |
| · Toshiba Aquilion ONE             | -  | - | 1 |
| Slice Thickness (mm)               |   |   |   |
| · 1.5                              | 1  | - | - |
| · 3                                | 1  | - | 1 |
| · 3.75                             | 12 | 6 | - |
| · 4                                | 20 | 6 | - |
| · 5                                | 16 | - | 11 |
| Tube Voltage (kVp)                 | 120 (all scans) | 120 (all scans) | 120 (all scans) |
| Tube Current (mA)                  | Mean: 334; Range: 150-570 | Mean: 400; Range: 339-500 | Mean: 213; Range: 150-368 |
| Pixel Spacing (mm)                 | Mean: 0.46; Range: 0.39-0.49 | Mean: 0.43; Range: 0.38-0.45 | Mean: 0.48; Range: 0.43-0.51 |
| Image Dimensions                   | 512x512 (all scans) | 512x512 (all scans) | 512x512 (all scans) |
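
The acquisition parameters tabulated above correspond to standard DICOM attributes; as a minimal example, they can be read from a single slice with pydicom (the file name is hypothetical).

```python
# Sketch: reading the acquisition parameters listed above from one DICOM slice.
import pydicom

ds = pydicom.dcmread("slice_001.dcm")  # hypothetical file name

print("Manufacturer/model:", ds.Manufacturer, ds.ManufacturerModelName)
print("Slice thickness (mm):", ds.SliceThickness)
print("Tube voltage (kVp):", ds.KVP)
print("Tube current (mA):", ds.XRayTubeCurrent)
print("Pixel spacing (mm):", ds.PixelSpacing)
print("Image dimensions:", ds.Rows, "x", ds.Columns)
```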

 


Model (13)

Schematic of the 2D U-Net architecture (click to enlarge).

Sample Images

 

Sample images from the primary test dataset (click to enlarge).

 

Sample images from the iNPH dataset (click to enlarge).

In our paper, the iNPH dataset was used solely for testing. We have additionally trained the model on these examinations and made the resulting weights available separately on our GitHub page.

 

Test dataset workflow (click to enlarge): DICE - Dice coefficients, reported below; VOL - Differences in structure volume, reported in our paper.
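
For reference, the two per-structure metrics in this workflow can be computed along the following lines (a NumPy sketch with illustrative variable names, not the exact evaluation code).

```python
# Sketch: per-structure Dice coefficient and volume difference between a
# predicted label map and a reference label map (integer arrays of the same
# shape, with voxel values 1-16 for the 16 structures).
import numpy as np

def dice_and_volume_diff(pred, ref, label, voxel_volume_ml):
    """Dice coefficient and volume difference (mL) for one structure label."""
    p = pred == label
    r = ref == label
    denom = p.sum() + r.sum()
    if denom == 0:          # structure absent from both masks
        return 1.0, 0.0
    dice = 2.0 * np.logical_and(p, r).sum() / denom
    vol_diff = (p.sum() - r.sum()) * voxel_volume_ml
    return dice, vol_diff

# Example: evaluate all 16 structures of one test volume.
# for label in range(1, 17):
#     d, dv = dice_and_volume_diff(pred_volume, ref_volume, label, voxel_ml)
```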


Results

 

Box and whisker plot comparing the model's predictions with the observers' annotations, using Ground Truth Masks as a reference (click to enlarge).

Ground Truth Masks: From each test volume, 3 observers segmented the same 5 slices independently (capturing all 16 structures, n=59 total slices). From these annotations, a set of multi-rater consensus labels was constructed using STAPLE (12).
Statistical analysis: categorical linear regression, significance threshold p < 0.05.
Red asterisk: the observer's Dice coefficients were higher than the model's (p < 0.05).
Green asterisk: the observer's Dice coefficients were lower than the model's (p < 0.05).
Blue asterisk: the observer's Dice coefficients were higher than the model's (p < 0.05); however, the Ground Truth Masks contain many slices that include the boundary between the temporal and parietal lobes, which is defined posteriorly by an arbitrary straight line in the sagittal plane. Overall Dice coefficients for the parietal and temporal lobes were higher when the model was evaluated on full volumes from the primary test dataset (see chart below).

 

Box and whisker plot comparing the model's performance between the primary and secondary test datasets (click to enlarge).

Red boxes: Dice coefficients of the model on the primary test dataset vs. the iNPH dataset. These results were analyzed together because both datasets contained fully annotated volumes.
Blue boxes: Dice coefficients of the model on the Ground Truth Masks vs. the RSNA dataset. These results were analyzed together because both datasets contained 5 annotated slices per volume representing all 16 structures.
Red asterisk: lower Dice coefficients compared with the primary test dataset or Ground Truth Masks (p < 0.05).
Green asterisk: higher Dice coefficients compared with the primary test dataset or Ground Truth Masks (p < 0.05).

 

Note: The model could not consistently identify the central sulcus in iNPH patients because ventricular enlargement severely distorted its appearance. Two volumes were excluded because the central sulcus could not be identified manually either.


Citation

 

JC Cai, Z Akkus, KA Philbrick, A Boonrod, S Hoodeshenas, AD Weston, P Rouzrokh, GM Conte, A Zeinoddini, DC Vogelsang, Q Huang, BJ Erickson
“Fully Automated Segmentation of Head CT Neuroanatomy Using Deep Learning”
Radiol Artif Intell. 2020 Sep; 2(5):e190183. https://doi.org/10.1148/ryai.2020190183

Click here to download citation data.

 

References

 

1. Frisoni GB, Geroldi C, Beltramello A, Bianchetti A, Binetti G, Bordiga G, et al. Radial Width of the Temporal Horn: A Sensitive Measure in Alzheimer Disease. Am J Neuroradiol [Internet]. 2002;23(1):35. Available from: http://www.ajnr.org/content/23/1/35.abstract
2. Diprose WK, Diprose JP, Wang MTM, Tarr GP, McFetridge A, Barber PA. Automated Measurement of Cerebral Atrophy and Outcome in Endovascular Thrombectomy. Stroke [Internet]. 2019;50(12):3636–8. Available from: https://doi.org/10.1161/STROKEAHA.119.027120
3. Anderson RC, Grant JJ, de la Paz R, Frucht S, Goodman RR. Volumetric measurements in the detection of reduced ventricular volume in patients with normal-pressure hydrocephalus whose clinical condition improved after ventriculoperitoneal shunt placement. J Neurosurg.
4. Toma AK, Holl E, Kitchen ND, Watkins LD. Evans’ Index Revisited: The Need for an Alternative in Normal Pressure Hydrocephalus. Neurosurgery [Internet]. 2011;68(4):939–44. Available from: https://doi.org/10.1227/NEU.0b013e318208f5e0
5. Relkin N, Marmarou A, Klinge P, Bergsneider M, Black PM. Diagnosing Idiopathic Normal-Pressure Hydrocephalus. Neurosurgery [Internet]. 2005 Sep 1;57(suppl_3):S2-4-S2-16. Available from: https://doi.org/10.1227/01.NEU.0000168185.29659.C5
6. Kauw F, Bennink E, de Jong HWAM, Kappelle LJ, Horsch AD, Velthuis BK, et al. Intracranial Cerebrospinal Fluid Volume as a Predictor of Malignant Middle Cerebral Artery Infarction. Stroke [Internet]. 2019 Jun [cited 2020 Feb 19];50(6):1437–43. Available from: https://www.ahajournals.org/doi/10.1161/STROKEAHA.119.024882
7. Takahashi N, Shinohara Y, Kinoshita T, Ohmura T, Matsubara K, Lee Y, et al. Computerized identification of early ischemic changes in acute stroke in noncontrast CT using deep learning [Internet]. Vol. 10950, SPIE Medical Imaging. SPIE; 2019. Available from: https://doi.org/10.1117/12.2507351
8. Fritscher KD, Peroni M, Zaffino P, Spadea MF, Schubert R, Sharp G. Automatic segmentation of head and neck CT images for radiotherapy treatment planning using multiple atlases, statistical appearance models, and geodesic active contours. Med Phys [Internet]. 2014;41(5):51910. Available from: https://aapm.onlinelibrary.wiley.com/doi/abs/10.1118/1.4871623
9. Golby AJ. Image-Guided Neurosurgery [Internet]. Elsevier Science; 2015. 536 p. Available from: https://books.google.com/books?id=2v7IBAAAQBAJ
10. AI challenge [Internet]. [cited 2020 Apr 4]. Available from: https://www.rsna.org/en/education/ai-resources-and-training/ai-image-challenge
11. Philbrick KA, Weston AD, Akkus Z, Kline TL, Korfiatis P, Sakinis T, et al. RIL-Contour: a Medical Imaging Dataset Annotation Tool for and with Deep Learning. J Digit Imaging. 2019;32(4):571–81.
12. Warfield SK, Zou KH, Wells WM. Simultaneous truth and performance level estimation (STAPLE): an algorithm for the validation of image segmentation. IEEE Trans Med Imaging. 2004;23(7):903–21.
13. Ronneberger O, Fischer P, Brox T. U-Net: Convolutional Networks for Biomedical Image Segmentation [Internet]. arXiv e-prints. 2015. Available from: https://ui.adsabs.harvard.edu/abs/2015arXiv150504597R