Reprint requests: Jason Samarasena, MD, Division of Gastroenterology, University of California Irvine, 333 City Blvd West, Suite 400, Orange, CA 92868.
Affiliations
H. H. Chao Comprehensive Digestive Disease Center, Division of Gastroenterology & Hepatology, Department of Medicine, University of California, Irvine, Orange, California, USA
Background and Aims
The visual detection of early esophageal neoplasia (high-grade dysplasia and T1 cancer) in Barrett’s esophagus (BE) with white-light and virtual chromoendoscopy remains challenging. The aim of this study was to assess whether an artificial intelligence system based on a convolutional neural network can aid in the recognition of early esophageal neoplasia in BE.
Methods
Nine hundred sixteen images of histology-proven early esophageal neoplasia (high-grade dysplasia or T1 cancer) in BE were collected from 65 patients. The area of neoplasia was masked using image annotation software. Nine hundred nineteen control images were collected of BE without high-grade dysplasia. A convolutional neural network (CNN) algorithm was pretrained on ImageNet and then fine-tuned with the goal of providing the correct binary classification of “dysplastic” or “nondysplastic.” We developed an object detection algorithm that drew localization boxes around regions classified as dysplasia.
Results
The CNN analyzed 458 test images (225 dysplasia and 233 nondysplasia) and correctly detected early neoplasia with a sensitivity of 96.4%, specificity of 94.2%, and accuracy of 95.4%. For the object detection algorithm across all images in the validation set, the system achieved a mean average precision of .7533 at an intersection over union of .3.
Conclusions
In this pilot study, our artificial intelligence model was able to detect early esophageal neoplasia in BE images with high accuracy. In addition, the object detection algorithm was able to draw a localization box around the areas of dysplasia with high precision and at a speed that allows for real-time implementation.
Esophageal cancer is the eighth most common cancer and the sixth leading cause of cancer death worldwide, with an estimated incidence of 52,000 cases in 2012.
However, more than 40% of patients with esophageal adenocarcinoma are diagnosed after the disease has metastasized, and the 5-year survival rate is less than 20%.
Current guidelines recommend endoscopic surveillance in patients with BE, with random 4-quadrant biopsy specimens obtained every 1 to 2 cm to detect dysplasia. However, this protocol is laborious and prone to sampling error. As a result, the American Society for Gastrointestinal Endoscopy PIVI (Preservation and Incorporation of Valuable Endoscopic Innovations) on imaging in Barrett’s esophagus set a performance threshold for optical diagnosis: for an imaging technology to guide targeted biopsy sampling in place of random biopsies, it should achieve a per-patient sensitivity of 90%, a negative predictive value of 98%, and a specificity of 80% for detecting early esophageal neoplasm. To meet this threshold, a great number of technologies and imaging enhancements have been studied and developed to help improve detection. Among these are chromoendoscopy, endocytoscopy, probe-based confocal laser–induced endomicroscopy, image-enhanced endoscopy, volumetric laser endomicroscopy, magnification endoscopy, and wide-area transepithelial sampling with computer-assisted 3-dimensional analysis.
Many of these have disadvantages with respect to learning curve, cost, and time issues.
A simple real-time diagnosis support system would be of great help to endoscopists in the detection of Barrett’s dysplasia. Recently, artificial intelligence (AI) using deep learning (DL) with convolutional neural networks (CNNs) has emerged and has shown promising results in the diagnosis and detection of lesions in the esophagus. However, no study has been reported on an application of DL for detection of early neoplasia within BE. We therefore conducted a pilot study of the endoscopic detection of early esophageal neoplasia in BE using a DL system, with promising results.
Methods
Upper endoscopy and dysplasia image dataset
We retrospectively reviewed all endoscopic images of patients with early esophageal neoplasia in BE, defined in this case as high-grade dysplasia or T1 stage adenocarcinoma proven by histology at our institution between January 2016 and November 2018, using the electronic endoscopy database. The endoscope used over this time period was the Olympus 190 series upper endoscope (190 HQ and 190 H; Olympus, Center Valley, Pa, USA). The images were categorized by image quality (excellent, adequate, or poor) and by the following imaging techniques: white-light imaging (WLI), narrow-band imaging (NBI), standard focus, and Near Focus (Olympus). Near Focus is an imaging technique proprietary to Olympus endoscopes that changes the focal length of the scope lens such that objects within 2 mm of the endoscope tip are maintained in clear focus. Only excellent and adequate images were used.
Nine hundred sixteen images from 70 patients were retrospectively collected of histology-proven dysplasia (high-grade dysplasia and T1 adenocarcinoma) in BE, and 916 control images from 30 patients were collected of either histology-proven or confocal laser endomicroscopy–proven BE without dysplasia. We did not include cases with low-grade dysplasia for this pilot study. A large diversity of dysplastic lesions was chosen for training in this study to optimize the algorithm. Neoplastic lesions in the image dataset ranged from 3 to 20 mm in size, and most lesions were deemed subtle, such that a lesser-trained endoscopist could miss them.
Annotation of images
All neoplastic lesions in the selected images were annotated independently by 2 expert endoscopists (R.H. and J.S.) who were not blinded to endoscopic findings and pathology. Images were annotated with a single rectangular box using image annotation software (Fig. 1, red box). When multiple neoplastic lesions were noted in an image, multiple rectangular boxes were used. This annotation was used as the ground truth for the training of the automated detection algorithm.
Figure 1. White-light endoscopic image of a dysplastic area of Barrett’s esophagus. Red box is expert annotation. Green box is prediction by the artificial intelligence algorithm.
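For illustration, a single ground-truth annotation of the kind described above could be represented as a record pairing one image with one rectangular box per lesion; the field names, file name, and coordinates below are hypothetical and are not the format produced by the annotation software.

```python
# Hypothetical ground-truth record for one annotated image: a binary label
# plus one expert-drawn rectangular box per lesion, in pixel coordinates.
annotation = {
    "image": "case_012_nbi_near_focus.png",  # illustrative file name
    "patient_id": "P012",                    # used later for patient-level splits
    "label": "dysplastic",
    "boxes": [
        {"x1": 412, "y1": 220, "x2": 640, "y2": 415},  # one box per neoplastic lesion
    ],
}
```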
We developed and designed a CNN model architecture from 2 primary modules: feature extraction and classification. The base module of our algorithm is responsible for automated feature extraction and borrows from the Inception-ResNet-v2 algorithm developed by Google AI (Mountain View, Calif, USA), which achieves state-of-the-art accuracy on popular image classification benchmarks. The head module of our algorithm transforms the features extracted by the base layers (50 million+ parameters) into a graded scale that allows for pathologic classification. The sigmoid activation function maps the model’s output logits to a float value between 0 and 1. An NVIDIA GTX 1070 GPU (NVIDIA, Santa Clara, Calif, USA) was used for algorithm development. Running inference on a frame requires 13 ms (77 frames per second), essentially real time.
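A minimal sketch of this 2-module design, assuming TensorFlow/Keras and its stock ImageNet-pretrained Inception-ResNet-v2 (not the authors' production code), is shown below.

```python
import tensorflow as tf

def build_binary_classifier(input_shape=(299, 299, 3)):
    # Base module: ImageNet-pretrained Inception-ResNet-v2 feature extractor.
    base = tf.keras.applications.InceptionResNetV2(
        include_top=False, weights="imagenet", input_shape=input_shape)
    # Head module: pool the extracted features and map them, via a sigmoid,
    # to a single probability between 0 and 1 (dysplastic vs nondysplastic).
    x = tf.keras.layers.GlobalAveragePooling2D()(base.output)
    x = tf.keras.layers.Dropout(0.5)(x)  # dropout regularization, as used in training
    out = tf.keras.layers.Dense(1, activation="sigmoid")(x)
    return tf.keras.Model(inputs=base.input, outputs=out)

model = build_binary_classifier()
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss="binary_crossentropy", metrics=["accuracy"])
```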
CNN training and validation
CNN training and validation proceeded in 3 stages: preprocessing, training, and inference. In the preprocessing stage, data were formatted for input into the algorithm by resizing images, normalizing pixels, and then using transfer learning to initialize the base layer weights. The base layer weights were pretrained on ImageNet, a large visual database designed for object recognition software. In the training stage, the CNN was trained with TensorFlow, an open-source machine learning framework. Data augmentation standardized the training data on the fly, including various image adjustments. To prevent overfitting, we used various regularization techniques including dropout, weight decay (L2 regularization), and data augmentation (horizontal/vertical flips, rotations, scaling, color augmentations, Gaussian noise, and so on). Additionally, to prevent data leakage, we carefully split the training and validation sets such that each set contained only unique images from different patients; this ensured the algorithm metrics were validated on patients who were not part of the training set. In the inference stage, the model produced by training was used to generate test set predictions. Various augmentations of the test set generated an additional layer of predictions. Predictions were then averaged to create a composite probability, which was then fed into an Adam optimizer to produce a final value between 0 and 1 for classification. Any value ≥.5 was classified as the presence of dysplasia and <.5 as no dysplasia.
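The patient-level split and the averaging of test-set augmentations described above could be sketched as follows; the record format, augmentation callables, and function names are illustrative assumptions rather than the study code.

```python
import numpy as np

def split_by_patient(records, val_patient_ids):
    """Keep all images from a given patient in exactly one set to avoid data leakage."""
    train = [r for r in records if r["patient_id"] not in val_patient_ids]
    val = [r for r in records if r["patient_id"] in val_patient_ids]
    return train, val

def predict_with_tta(model, image, augmentations):
    """Average predictions over augmented copies of one image and threshold at .5."""
    probs = [model.predict(aug(image)[None, ...], verbose=0)[0, 0]
             for aug in augmentations]
    composite = float(np.mean(probs))
    label = "dysplastic" if composite >= 0.5 else "nondysplastic"
    return label, composite
```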
Algorithm design
Our algorithm was composed of 2 steps. The first was a binary classification assessing the presence of any neoplastic lesion and/or area on the image. If the binary classifier labeled the image as containing neoplasia, the second step was object detection (ie, localization of the lesion). For our 2-stage strategy, we used a CNN based on the Xception architecture for binary classification to quickly flag frames of interest, and YOLO v2 (IEEE Conference on Computer Vision and Pattern Recognition 2017:7263-71; available at https://arxiv.org/abs/1612.08242, accessed November 22, 2019) for detection and localization of neoplasia on positively identified frames from the binary classifier. The algorithm was designed to draw a predicted rectangular bounding box over the area of neoplasia (Figs. 1 and 2).
Figure 2. Illustration of the deep learning system. CNN, Convolutional neural network; NMS, non-maximum suppression.
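A simplified view of the 2-step strategy is sketched below; `classifier` and `detector` stand in for the trained Xception and YOLO v2 models, and their interfaces are assumptions for illustration only.

```python
def analyze_frame(frame, classifier, detector, threshold=0.5):
    """Step 1: binary classification; step 2: localization only on flagged frames."""
    prob = classifier.predict_probability(frame)   # probability of neoplasia
    if prob < threshold:
        return {"dysplastic": False, "probability": prob, "boxes": []}
    boxes = detector.detect(frame)                 # e.g., [(x, y, w, h, confidence), ...]
    return {"dysplastic": True, "probability": prob, "boxes": boxes}
```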
Using the trained algorithm, we performed internal validation experiments on 458 test images (Supplementary Fig. 1, available online at www.giejournal.org). For binary classification, sensitivity, specificity, and accuracy were calculated per image, per patient, and by WLI, NBI, standard focus, and near focus.
For localization accuracy, we used mean average precision (mAP), a standard metric in AI. Average precision is the area under the precision-recall curve and carries a value between 0 and 1. Calculating mAP requires the intersection over union (IoU), which measures the overlap between the prediction and the ground truth. We predefined an IoU threshold of .3 to classify whether a box prediction by the algorithm was a true positive or a false positive. An IoU threshold of .3 was chosen because it was believed that in real-time clinical endoscopy this level of lesion targeting is more than sufficient to aid in the clinical identification of a region of neoplasia during a screening or surveillance BE examination (Fig. 3).
Figure 3. Endoscopic images with varying intersection over union (IoU) values: A, IoU .3. B, IoU .5. C, IoU .8. In this study, we set the IoU threshold as ≥.3 for a true positive on object detection.
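The IoU calculation and the ≥.3 true-positive rule can be illustrated with the short sketch below (boxes given as corner coordinates; this is a sketch, not the evaluation code used in the study).

```python
def iou(box_a, box_b):
    """Intersection over union for two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / float(area_a + area_b - inter)

def is_true_positive(predicted_box, ground_truth_box, threshold=0.3):
    # A predicted box counts as a true positive when IoU >= .3, as predefined above.
    return iou(predicted_box, ground_truth_box) >= threshold
```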
All statistical analyses were performed using R software (version 3.3.3; The R Foundation for Statistical Computing, Vienna, Austria). Categorical data were compared with the χ2 and t tests and continuous variables with the Mann-Whitney U test. Differences between variables with P < .05 were considered significant.
Ethics
This study was approved by the Institutional Review Board of the University of California Irvine Medical Center (UCI IRB HS no. 2008-6258).
Results
Binary classification per image
A total of 458 images, 225 images of dysplastic lesions from 26 patients and 233 images of nondysplastic BE from 13 patients, were used for validation. For binary classification (dysplasia vs nondysplasia), the sensitivity, specificity, and accuracy per image were 96.4%, 94.2%, and 95.4%, respectively (Fig. 4, Table 1). The sensitivity per image for WLI only was 98.6% and for NBI was 92.4%. The sensitivity per image for standard focus images only was 96.6% and for near focus was 96.2%. NBI specificity (99.2%) was higher than WLI specificity (88.8%) (P = .0007). Near focus specificity (98.4%) was higher than standard focus specificity (89.9%, P = .005).
Figure 4. A 2 × 2 table showing the performance of the artificial intelligence algorithm for binary classification of dysplasia versus no dysplasia per image.
Table 1. Results of AI binary diagnosis: per-image analysis

Modality                               Sensitivity (%)    P value    Specificity (%)    P value
AI diagnosis by white-light imaging    98.6 (144/146)     .023       88.8 (95/107)      .0007
AI diagnosis by narrow-band imaging    92.4 (73/79)                  99.2 (125/126)
AI diagnosis by standard focus         96.6 (141/146)     .89        89.9 (98/109)      .005
AI diagnosis by near focus             96.2 (76/79)                  98.4 (122/124)
Comprehensive AI diagnosis             96.4 (217/225)                94.2 (220/233)

Values in parentheses are n/N. A total of 458 images (dysplastic 225, nondysplastic 233) were used for validation. The numbers of dysplastic images by white-light imaging, narrow-band imaging, standard focus, and near focus were 146, 79, 146, and 79 and of nondysplastic images 107, 126, 109, and 124, respectively. AI, Artificial intelligence.
Binary classification per patient
The CNN correctly diagnosed 24 of 26 patients (92.3%) as having dysplasia. The sensitivities of dysplasia diagnosis per patient by WLI, NBI, standard focus, and near focus were 94.7%, 91.7%, 95.2%, and 91.7%, respectively (Table 2).
Table 2. Results of AI binary diagnosis: per-patient analysis

Modality                               Sensitivity (%)    P value
AI diagnosis by white-light imaging    94.7 (18/19)       .74
AI diagnosis by narrow-band imaging    91.7 (11/12)
AI diagnosis by standard focus         95.2 (20/21)       .68
AI diagnosis by near focus             91.7 (11/12)
Comprehensive AI diagnosis             92.3 (24/26)

Values in parentheses are n/N. The validation set includes images of dysplasia from 26 patients in total. Images by white-light imaging, narrow-band imaging, standard focus, and near focus came from 19, 12, 21, and 12 patients, respectively. AI, Artificial intelligence.
False-negative and -positive lesions on binary classification
Eight of 225 dysplastic images (3.6%), from 4 patients, gave a false-negative result in the binary classification. Six of the 8 false-negative images were from 2 patients who each had another true-positive image, so per-patient sensitivity was not affected. The other 2 false-negative images, from 2 patients, had no accompanying true-positive images. All false negatives were small, subtle lesions less than 5 mm in size (Fig. 5). For most false-positive lesions, it appears that the algorithm detected an area of nodularity and/or raised tissue as dysplastic.
Figure 5. Two cases with a negative binary result. Both lesions showed high-grade dysplasia on histopathologic examination. A, A 4-mm nodular lesion from buried Barrett’s esophagus. B, A 4-mm slightly depressed lesion.
In our validation set, with an intersection over union threshold of .3, the overall mAP was .7533. The mAP for NBI alone was higher at .802, and the mAP for near focus images was .819. The overall sensitivity (ie, recall) was 95.6%. The sensitivity for WLI was 94.1%, NBI 98.2%, standard focus 96.8%, and near focus 97.5%. The overall positive predictive value (ie, precision) was 89.2%. The positive predictive value for WLI was 87.8%, NBI 92.1%, standard focus 86.3%, and near focus 96.3% (Table 3).
Table 3. Results of AI object detection for binary positive images: per-image analysis

Modality                               Sensitivity (%)    P value    Positive predictive value (%)    P value
AI diagnosis by white-light imaging    94.1 (318/338)     .018       87.8 (318/362)                   .14
AI diagnosis by narrow-band imaging    98.2 (164/166)                92.1 (164/178)
AI diagnosis by standard focus         96.8 (328/346)     .24        86.3 (328/380)                   .0003
AI diagnosis by near focus             97.5 (154/158)                96.3 (154/160)
Comprehensive AI diagnosis             95.6 (482/504)                89.2 (482/540)

Values in parentheses are n/N. AI, Artificial intelligence.
On a GTX 1070 GPU, the binary classifier runs at around 72 frames per second (299 × 299-pixel image size), and YOLO v2 runs at around 45 frames per second (416 × 416-pixel image size). Video 1 (available online at www.giejournal.org) demonstrates how this algorithm could work in real time.
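A rough timing harness of the kind that produces such frame-rate figures might look like the following sketch; `run_inference` is a placeholder for either model's per-frame prediction call.

```python
import time

def measure_fps(run_inference, frame, n_runs=100):
    """Average frames per second over repeated single-frame inference calls."""
    start = time.perf_counter()
    for _ in range(n_runs):
        run_inference(frame)
    elapsed = time.perf_counter() - start
    return n_runs / elapsed  # eg, approximately 72 fps for the classifier on a GTX 1070
```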
Discussion
Our pilot study for detection of early esophageal neoplasm in BE using a CNN demonstrated high sensitivity (96.4%), specificity (94.2%), and accuracy (95.4%) per image for binary classification (dysplasia vs no dysplasia) in an internal validation study. Our localization algorithm also detected most lesions correctly, with a mAP of .7533, sensitivity of 95.6%, and positive predictive value of 89.2%.
DL for the detection of lesions in the GI tract is rapidly emerging. One group reported a study of automated detection of early esophageal neoplasia in BE using machine learning with a hand-crafted algorithm; using 100 endoscopic images from 44 patients, they achieved a sensitivity of 83% and a specificity of 83%.
They also published a prospective pilot study of this algorithm with promising data; however, the algorithm was not fast enough to be used in real time. Also reported was the application of AI to volumetric laser endomicroscopy, resulting in a sensitivity and specificity of 90% and 93%, respectively.
However, volumetric laser endomicroscopy requires time and equipment. A Japanese group recently reported a pilot study using DL with a CNN for the detection of mainly early squamous cell carcinoma, with a sensitivity of 98%.
Another group used computer-aided diagnosis in the evaluation of early esophageal adenocarcinoma in BE. This system showed a sensitivity and specificity of 97% and 88%, respectively, and was trained on only 148 images, significantly fewer than our training set. In a follow-up study piloting this system in real time, the same group assessed the system in 14 patients with promising results; that system was trained on 129 images. The speed of the algorithm's prediction required pausing the live video display to make a prediction. Our current prototype, as seen in Video 1, does not require pausing and also does not require a second monitor to display the user interface, as in the system shown in their publication.
To improve the impact of endoscopic surveillance of BE, a real-time AI video overlay assisting an endoscopist in detecting neoplastic lesions is certainly desirable. Our group has already achieved a real-time video overlay with colonoscopy algorithms, including polyp detection and optical pathology, and we believe this is certainly feasible.
The advantage of real-time AI over other technologies aimed at improving dysplasia detection is that the learning curve is negligible. Ideally, the system would simply flag areas of concern for the endoscopist to interrogate further with more detailed imaging or biopsy sampling. An algorithm well trained by expert endoscopists has the potential to elevate the image interpretation skill set of an endoscopist to a level near that of an imaging expert. If an AI algorithm were able to achieve the American Society for Gastrointestinal Endoscopy Preservation and Incorporation of Valuable Endoscopic Innovations thresholds for the detection of early esophageal neoplasia, it could decrease the number of unnecessary biopsy samplings performed during surveillance examinations.
There were performance differences in binary classification among specific imaging modalities. NBI showed better specificity than WLI, and near focus showed better specificity than standard focus. Although the effect of NBI on detection of neoplasia is controversial, our study suggests that NBI and near focus may alter tissue imaging characteristics and detail in such a way that subtle imaging differences between dysplastic and nondysplastic tissue are enhanced and better differentiated by the algorithm.
Because overfitting and spectrum bias can cause overestimation of accuracy, we were extremely careful to select a highly diverse group of images from each patient. Furthermore, images of specific patients used in the training set were not used in the validation set. Rectangular polygons for annotation and prediction were chosen to minimize processing power such that the algorithm can make predictions at >40 frames per second and real-time evaluation is possible (as evidenced in Video 1). More complex polygons with independent corners and/or vertices require a significantly increased level of processing power, slowing the frame rate for prediction and diminishing the utility in real time.
Detection performance depends on the neural network architecture used. Ghatwary et al compared several DL object detection methods to identify esophageal adenocarcinoma using 100 WLI images from 39 patients and concluded that single-shot multibox detector methods outperformed regional-based CNN, fast regional-based CNN, and faster regional-based CNN, with a sensitivity of .90, specificity of .88, precision of .70, and recall of .79.
We chose YOLO v2, which is similar to the single-shot multibox detector but predicts at a higher frame rate, allowing the algorithm to be used in real time.
Our study has limitations. First, it was a single-center retrospective study and lacked external validation; although the algorithm worked very well on our validation set of images, it was not tested on endoscopic images from other centers or from other scope manufacturers, and this is a future goal for this project. Second, the number of cases of early esophageal neoplasia in BE was not large.
In summary, this is among the first studies in which AI using CNNs was examined for detection of early esophageal neoplasm in BE. Although this is a pilot study, the results were very promising. The speed of the algorithm is fast enough to enable true real-time detection, and our results suggest a real-time AI detection system for dysplasia in BE can likely be achieved in the near future.
Appendix
Supplementary Figure 1. Study flow diagram. CNN, Convolutional neural network.
DISCLOSURE: The following authors disclosed financial relationships: J. Requa: Employee of Docbot Inc. D. Tyler, A. Ninh, W. E. Karnes: Co-founder and equity holder of Docbot Inc. K. J. Chang: Consultant for Olympus, Cook Medical, Medtronics, Endogastric Solution, Erbe, Apollo, Mederi, Ovesco, Mauna Kea, and Pentax. J. Samarasena: Co-founder and equity holder of Docbot Inc; consultant for Medtronic, Olympus, US Endoscopy, Mauna Kea, Motus, and Pentax. All other authors disclosed no financial relationships.
If you would like to chat with an author of this article, you may contact Dr Samarasena at [email protected]