Abstract
Background and Aims
Colonoscopy is commonly performed for colorectal cancer screening in the United States.
Reports are often generated in a non-standardized format and are not always integrated
into electronic health records. Thus, this information is not readily available for
streamlining quality management, participating in endoscopy registries, or reporting
of patient- and center-specific risk factors predictive of outcomes. We aim to demonstrate
a new hybrid approach that applies natural language processing to charts first converted
to text by optical character recognition (OCR/NLP hybrid) to obtain relevant clinical
information from scanned colonoscopy and pathology reports. The technology was co-developed
by Cleveland Clinic and eHealth Technologies (West Henrietta, NY, USA).
Methods
This was a retrospective study conducted at Cleveland Clinic, Cleveland, Ohio, and
the University of Minnesota, Minneapolis, Minnesota. Outpatient screening colonoscopy
procedures and pathology reports were randomly sampled. Two researchers first manually
reviewed the reports for the desired variables. Then, the OCR/NLP algorithm was used
to obtain the same variables from
3 electronic health records in use at our institution: Epic (Verona, Wisc, USA), ProVation
(Minneapolis, Minn, USA) used for endoscopy reporting, and Sunquest PowerPath (Tucson,
Ariz, USA) used for pathology reporting.
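The abstract does not describe the internals of the OCR/NLP algorithm. As a rough illustration only, the sketch below shows one simple way findings could be flagged in report text that OCR has already converted to a string. The patterns, the negation rule, and the function name `extract_findings` are all hypothetical and are not the study's method.

```python
import re

# Hypothetical keyword patterns; the study's actual rules are not given in the abstract.
PATTERNS = {
    "polyp": re.compile(r"\bpolyps?\b", re.IGNORECASE),
    "adenoma": re.compile(r"\badenomas?\b", re.IGNORECASE),
}

# Very crude negation cue: any of these words anywhere in the sentence.
NEGATION = re.compile(r"\b(no|without|negative for)\b", re.IGNORECASE)


def extract_findings(report_text):
    """Return {finding: bool} for text already produced by an OCR step."""
    findings = {name: False for name in PATTERNS}
    # Split on periods and newlines so negation is scoped to one sentence.
    for sentence in re.split(r"[.\n]+", report_text):
        negated = NEGATION.search(sentence) is not None
        for name, pattern in PATTERNS.items():
            if pattern.search(sentence) and not negated:
                findings[name] = True
    return findings


example = "A 5 mm polyp was removed from the sigmoid colon. No adenoma identified."
print(extract_findings(example))
```

Sentence-level negation of this kind is deliberately naive: a phrase such as "tubular adenoma without high-grade dysplasia" would be wrongly treated as negated, which is one reason real clinical NLP systems use more careful rules.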
Results
Compared with manual data extraction, the accuracy of the hybrid OCR/NLP approach
to detect polyps was 95.8%, adenomas 98.5%, sessile serrated polyps 99.3%, advanced
adenomas 98%, inadequate bowel preparation 98.4%, and failed cecal intubation 99%.
Comparison of the dataset collected via NLP alone with that collected using the hybrid
OCR/NLP approach showed that the accuracy for almost all variables was >99%.
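The accuracy figures above compare automated extraction against manual review, report by report. A minimal sketch of that calculation, assuming each variable is recorded as one true/false value per report (the function name and the sample data are hypothetical, not study data):

```python
def accuracy(manual, automated):
    """Fraction of reports where the automated value matches manual review."""
    if len(manual) != len(automated):
        raise ValueError("both lists must cover the same reports")
    if not manual:
        raise ValueError("at least one report is required")
    agreements = sum(m == a for m, a in zip(manual, automated))
    return agreements / len(manual)


# Made-up labels for four reports: was a polyp detected?
manual_polyp = [True, False, True, True]
automated_polyp = [True, False, False, True]
print(accuracy(manual_polyp, automated_polyp))
```

Accuracy treats manual review as the reference standard and counts agreement on both positive and negative reports.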
Conclusions
Our study is the first to validate the use of a unique hybrid OCR/NLP technology to
extract desired variables from scanned procedure and pathology reports contained in
image format with an accuracy >95%.
Abbreviations:
ACG (American College of Gastroenterology), ADR (adenoma detection rate), ASGE (American Society for Gastrointestinal Endoscopy), CRC (colorectal cancer), EHR (electronic health record), NLP (natural language processing), OCR (optical character recognition), PPV (positive predictive value), SQL (Structured Query Language), SSP (sessile serrated polyp)
Article info
Publication history
Published online: September 02, 2020
Accepted: August 27, 2020
Received: September 25, 2018
Footnotes
If you would like to chat with an author of this article, you may contact Dr Rizk at [email protected]
DISCLOSURE: All authors disclosed no financial relationships.
Copyright
© 2021 by the American Society for Gastrointestinal Endoscopy
Linked Article
- Will machines decipher colonoscopy quality from endoscopists’ notes? Gastrointestinal Endoscopy, Vol. 93, Issue 3
- Preview: Colonoscopy has been shown to reduce incidence and mortality of colorectal cancer; however, its effectiveness is highly dependent on the quality.1-3 Therefore, it is widely recognized that quality assessment, assurance, and improvement tools in colonoscopy are essential to ensure its effectiveness. There are clearly defined and measurable quality indicators such as the endoscopist’s adenoma detection rate (ADR), rate of adequate bowel preparation, cecal intubation rate, and mean withdrawal time, which have been proved to be associated with important outcomes for patients.