Original article | Clinical endoscopy | Volume 97, Issue 1, P121-129.e1, January 2023

Validation of a natural language processing algorithm to identify adenomas and measure adenoma detection rates across a health system: a population-level study

      Background and Aims

      Measuring adenoma detection rates (ADRs) at the population level is challenging because pathology reports are often written in an unstructured format and reporting methods vary considerably across institutions. Natural language processing (NLP) can be used to extract relevant information from text-based records. We aimed to develop and validate an NLP algorithm to identify colorectal adenomas that could be used to report ADR at the population level in Ontario, Canada.


      Methods

      The sampling frame included pathology reports from all colonoscopies performed in Ontario in 2015 and 2016. Two random samples of 450 and 1000 reports were selected as the training and validation sets, respectively. Expert clinicians reviewed each report and classified it as adenoma or other. The training set was used to develop an NLP algorithm to identify adenomas, which was then evaluated on the validation set. The NLP algorithm test characteristics were calculated using expert review as the reference. We used the algorithm to measure ADR for all endoscopists in Ontario in 2019.
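
      The final step, measuring ADR for each endoscopist, can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes ADR is the proportion of an endoscopist's colonoscopies in which at least one adenoma was identified (the abstract does not state the exact denominator or eligibility rules). The function name, the record layout, and the toy data are hypothetical.

```python
from collections import defaultdict
from statistics import quantiles


def adr_by_endoscopist(colonoscopies):
    """Return {endoscopist_id: ADR} from (endoscopist_id, has_adenoma) pairs.

    Each pair represents one colonoscopy; has_adenoma is True when the
    classifier flagged at least one adenoma in that procedure's pathology.
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for endoscopist_id, has_adenoma in colonoscopies:
        totals[endoscopist_id] += 1
        if has_adenoma:
            positives[endoscopist_id] += 1
    return {e: positives[e] / totals[e] for e in totals}


# Hypothetical toy data: three endoscopists, four colonoscopies each.
records = [
    ("A", True), ("A", False), ("A", False), ("A", True),
    ("B", True), ("B", False), ("B", False), ("B", False),
    ("C", True), ("C", True), ("C", False), ("C", True),
]

rates = adr_by_endoscopist(records)

# Summarize the distribution across endoscopists as median and quartiles.
q1, med, q3 = quantiles(list(rates.values()), n=4, method="inclusive")
print(f"median ADR {med:.0%} (interquartile range, {q1:.0%}-{q3:.0%})")
```

      Aggregating at the colonoscopy level rather than the report level matters because one procedure can generate several pathology reports; how reports are linked to procedures is outside what the abstract describes.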


      Results

      The 1450 pathology reports were derived from 62 laboratories, 266 pathologists, and 532 endoscopists. In the training set, the NLP algorithm for any adenoma had a sensitivity of 99.60% (95% confidence interval [CI], 97.77-99.99), specificity of 99.01% (95% CI, 96.49-99.88), positive predictive value of 99.19% (95% CI, 97.12-99.90), and F1 score of 0.99. Similar results were obtained for the validation set. Across endoscopists in Ontario in 2019, the median ADR was 33% (interquartile range, 26%-40%).
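
      The test characteristics above all derive from a 2x2 table of algorithm output against expert review. The sketch below shows the arithmetic under two assumptions that are not from the abstract: the confidence intervals are computed with the exact (Clopper-Pearson) method, and the cell counts are hypothetical, chosen only so that they sum to the 450-report training set and the point estimates round to the reported values. The function names are illustrative.

```python
from math import comb


def diagnostic_metrics(tp, fn, fp, tn):
    """Test characteristics of a binary classifier against a reference."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "f1": 2 * tp / (2 * tp + fp + fn),
    }


def binom_pmf(i, n, p):
    """P(X = i) for X ~ Binomial(n, p)."""
    return comb(n, i) * p**i * (1 - p) ** (n - i)


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided (1 - alpha) interval for the proportion k/n."""

    def bisect(func, increasing):
        # Find p in [0, 1] at which func(p) equals alpha / 2.
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            if (func(mid) < alpha / 2) == increasing:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower limit: P(X >= k) = alpha / 2 (this tail increases with p).
    lower = 0.0 if k == 0 else bisect(
        lambda p: sum(binom_pmf(i, n, p) for i in range(k, n + 1)), True)
    # Upper limit: P(X <= k) = alpha / 2 (this tail decreases with p).
    upper = 1.0 if k == n else bisect(
        lambda p: sum(binom_pmf(i, n, p) for i in range(k + 1)), False)
    return lower, upper


# Hypothetical cell counts (NOT reported in the abstract); they sum to 450.
tp, fn, fp, tn = 246, 1, 2, 201

metrics = diagnostic_metrics(tp, fn, fp, tn)
sens_lo, sens_hi = clopper_pearson(tp, tp + fn)

print(f"sensitivity {metrics['sensitivity']:.2%} "
      f"(95% CI, {sens_lo:.2%}-{sens_hi:.2%})")
print(f"specificity {metrics['specificity']:.2%}")
print(f"PPV {metrics['ppv']:.2%}, F1 {metrics['f1']:.2f}")
```

      An exact interval is a natural choice when the estimate sits close to 100%, because normal-approximation intervals can extend past 1 in that range.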


      Conclusions

      In a population-based sample from Ontario, our NLP algorithm was highly accurate and was used at the system level to measure ADR.


      Abbreviations: ADR (adenoma detection rate), CI (confidence interval), CCO (Cancer Care Ontario), eMaRC (Electronic Mapping Reporting and Coding), NLP (natural language processing), NPV (negative predictive value), OHIP (Ontario Health Insurance Plan), PPV (positive predictive value)

