
Artificial intelligence for disease diagnosis: the criterion standard challenge

Open Access. Published: April 27, 2022. DOI: https://doi.org/10.1016/j.gie.2022.04.057

      Abbreviations:

      AI (artificial intelligence), CADe (computer-aided detection), CADx (computer-aided diagnosis)
      Artificial intelligence (AI) for disease diagnosis is a hot topic. Recent studies have shown that AI tools for computer-aided detection (CADe) of polyps during colonoscopy increase the adenoma detection rate from 19% to 29%, a relative increase of approximately 50% (Barua et al, 2021).
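As a quick check of the figures quoted above: the relative increase is the absolute gain divided by the baseline rate. The minimal Python sketch below uses only the two rounded rates given in the text (19% and 29%). Its result of about 53% is consistent with the stated "approximately 50%", but it does not reproduce the meta-analysis's own pooled estimate.

```python
def relative_increase(baseline, new):
    """Relative change of `new` over `baseline`, expressed as a fraction of the baseline."""
    return (new - baseline) / baseline

# Rounded adenoma detection rates quoted in the text, in percent.
adr_standard = 19.0
adr_cade = 29.0

absolute_gain = adr_cade - adr_standard  # percentage points
relative_gain = relative_increase(adr_standard, adr_cade)
print(f"absolute: {absolute_gain:.0f} points, relative: {relative_gain:.0%}")
# -> absolute: 10 points, relative: 53%
```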
      Detecting and removing more adenomas may increase the benefit of screening in reducing colorectal cancer incidence and mortality (Mori et al, 2021).
      Despite the promising benefits of AI-aided detection, the new tools have downsides: AI-guided detection may increase polyp overdiagnosis and overtreatment. Of particular concern, current CADe tools mainly increase the detection of small polyps, some of which are nonneoplastic and thus do not need removal (Wang et al, 2020; Wang et al, 2019).
      Unless endoscopists are well experienced in the optical diagnosis of polyps, detecting more of these polyps may increase patient burden, the risk of adverse events, and the costs of histopathologic assessment of the additional polyps.
      Computer-aided diagnosis (CADx) tools for the real-time diagnosis of polyp histology offer a way to overcome this challenge. Recent evidence suggests that CADx tools may reduce the number of polyp removals by identifying nonneoplastic polyps (Barua et al, 2022).

      However, further development of CADx tools is hampered by a critical challenge, namely, unreliable pathologic diagnoses. Developing CADx requires accurate pathologic diagnoses of polyps, which serve as the criterion standard labels for machine learning (Barua et al, 2022).

      Yet these pathologic diagnoses are subject to substantial interobserver variability among pathologists (κ coefficient range, 0.4-0.7) (Vennelaganti et al, 2021; Mollasharifi et al, 2020; Lee et al, 2021).
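The κ coefficient corrects raw agreement for the agreement expected by chance: κ = (p_o - p_e) / (1 - p_e). Here p_o is the observed proportion of agreement, and p_e is the proportion expected from each rater's label frequencies. The minimal Python sketch below implements the two-rater (Cohen) form. The cited studies may have used multirater variants. The labels are hypothetical (HP, hyperplastic polyp; SSL, sessile serrated lesion) and were invented for illustration, not taken from any study.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same items with categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # p_o: observed proportion of items on which the two raters agree.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # p_e: agreement expected by chance, from each rater's label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    if p_e == 1.0:
        raise ValueError("kappa is undefined when both raters use one identical label")
    return (p_o - p_e) / (1 - p_e)

# Hypothetical labels for 10 polyps (HP = hyperplastic polyp, SSL = sessile serrated lesion).
pathologist_1 = ["SSL", "SSL", "HP", "HP", "SSL", "HP", "HP", "SSL", "HP", "HP"]
pathologist_2 = ["SSL", "HP", "HP", "HP", "SSL", "SSL", "HP", "SSL", "HP", "HP"]

raw_agreement = sum(a == b for a, b in zip(pathologist_1, pathologist_2)) / len(pathologist_1)
kappa = cohens_kappa(pathologist_1, pathologist_2)
print(f"raw agreement {raw_agreement:.2f}, kappa {kappa:.2f}")  # -> raw agreement 0.80, kappa 0.58
```

The two hypothetical pathologists agree on 8 of 10 polyps. Even so, κ is only about 0.58, because much of that agreement would be expected by chance.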

      This variability precludes correct labeling of the polyp images used to train the computer. Because the performance of a CADx tool depends on the accuracy of the histopathologic diagnoses it is trained on, it is difficult to create a reliable CADx system for colonoscopy.
      Technical developers of AI tools in colonoscopy are unlikely to be aware of these challenges to their criterion standard and may take pathologists’ diagnoses at face value. As more advanced multiclass CADx tools are developed, this problem will worsen. Multiclass CADx tools are expected to enable highly personalized medicine. A CADx tool that provides information on multiple histologic classes (eg, nonneoplastic vs low-grade adenoma vs high-grade adenoma vs sessile serrated lesion vs invasive cancer) would help endoscopists choose appropriate treatment (nonremoval, endoscopic treatment, or surgery) and predict the surveillance interval without any histologic examination. Overcoming the challenge described above is therefore a prerequisite for improving the quality of colonoscopy practice with the aid of AI.
      Again, this hurdle for the reliable development of CADx tools in colonoscopy is not widely recognized in the research community (to our knowledge, there have been no studies in the AI-colonoscopy field that elaborate on how to manage unreliable criterion standards). Some may choose to sidestep this problem by discarding images with uncertain pathologic diagnoses. But who decides what is uncertain? Any intentional selection of images for machine learning will reduce the generalizability of the developed CADx tools (a so-called overfitted model) and thus is not a promising way out.
      So, what is the solution? We suggest using semisupervised learning when developing advanced CADx tools in medicine. Supervised learning trains a model only on labeled data, here, polyps for which reliable histologic information is available. Unsupervised learning uses polyps without requiring reliably labeled diagnoses. As the term indicates, semisupervised learning combines both, learning from labeled and unlabeled data together (Anand et al, 2021).
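The editorial does not prescribe a particular semisupervised algorithm. As one illustration, the Python sketch below uses self-training, a common semisupervised technique chosen here purely as an assumption. A simple nearest-centroid classifier is fitted to the labeled data. Unlabeled points that it classifies confidently are then added with pseudo-labels, and the model is refitted. The 2-dimensional feature vectors, the class names, and the confidence threshold `min_margin` are hypothetical stand-ins for features extracted from colonoscopy images.

```python
import math

def centroids(points, labels):
    """Mean feature vector of each class."""
    sums, counts = {}, {}
    for p, y in zip(points, labels):
        s = sums.setdefault(y, [0.0] * len(p))
        for i, v in enumerate(p):
            s[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in s] for y, s in sums.items()}

def predict(cents, point):
    """Nearest-centroid class, plus the margin by which it beats the runner-up."""
    ranked = sorted((math.dist(point, c), y) for y, c in cents.items())
    margin = ranked[1][0] - ranked[0][0] if len(ranked) > 1 else math.inf
    return ranked[0][1], margin

def self_train(labeled_x, labeled_y, unlabeled_x, min_margin=1.0, max_rounds=10):
    """Self-training: pseudo-label confident unlabeled points, refit, repeat."""
    x, y = list(labeled_x), list(labeled_y)
    pool = list(unlabeled_x)
    for _ in range(max_rounds):
        cents = centroids(x, y)
        confident, uncertain = [], []
        for p in pool:
            label, margin = predict(cents, p)
            (confident if margin >= min_margin else uncertain).append((p, label))
        if not confident:
            break  # nothing left that the model is sure about
        x.extend(p for p, _ in confident)
        y.extend(label for _, label in confident)
        pool = [p for p, _ in uncertain]
    return centroids(x, y), pool

# Hypothetical 2-D features: two labeled polyps per class, three unlabeled ones.
labeled_x = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (6.0, 5.0)]
labeled_y = ["nonneoplastic", "nonneoplastic", "neoplastic", "neoplastic"]
unlabeled_x = [(0.5, 1.0), (5.5, 4.0), (3.0, 3.0)]

final_centroids, still_unlabeled = self_train(labeled_x, labeled_y, unlabeled_x)
print(still_unlabeled)  # -> [(3.0, 3.0)]
```

The ambiguous point midway between the classes stays unlabeled rather than being forced into a category. In a real system, the classifier and the confidence rule would be design choices to validate, not fixed parts of the method.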
      The concept of semisupervised learning is well suited to the discouraging situation we face, in which colonoscopy images cannot be labeled properly because of uncertain pathologic diagnoses. With unsupervised learning, images with uncertain diagnoses can be grouped into meaningful clusters on the basis of polyp appearance instead of being discarded.
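Grouping unlabeled images by appearance is a clustering problem. The editorial does not name a clustering method. The minimal sketch below assumes k-means (Lloyd's algorithm) on hypothetical 2-dimensional appearance features, with the number of clusters set by the initial centers passed in. Clusters found this way carry no diagnosis by themselves.

```python
import math

def kmeans(points, initial_centers, max_iterations=20):
    """Plain k-means (Lloyd's algorithm); returns (centers, cluster index of each point)."""
    centers = [tuple(c) for c in initial_centers]
    assignments = []
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest center.
        assignments = [min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
                       for p in points]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for k, old in enumerate(centers):
            members = [p for p, a in zip(points, assignments) if a == k]
            if members:
                new_centers.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centers.append(old)  # an empty cluster keeps its previous center
        if new_centers == centers:
            break  # converged: assignments can no longer change
        centers = new_centers
    return centers, assignments

# Hypothetical appearance features of six polyps whose pathologic diagnosis was disputed.
unlabeled = [(0.2, 0.1), (0.0, 0.4), (0.3, 0.3), (4.8, 5.1), (5.2, 4.9), (5.0, 5.3)]
centers, assignments = kmeans(unlabeled, initial_centers=[unlabeled[0], unlabeled[3]])
print(assignments)  # -> [0, 0, 0, 1, 1, 1]
```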
      Despite wide variation in pathologic diagnoses, each pathologic polyp category has “true,” biologically distinct features (such as genetic features and malignant potential). For example, one study suggested that the endoscopic appearance of sessile serrated lesions was closely associated with the extent of DNA methylation (Kimura et al, 2012).
      Recent evidence suggests that these biologic features may correspond to the morphologic appearance of each polyp during colonoscopy. The endoscopic diagnosis of small polyps by multiple expert endoscopists was reported to be closer to the true nature of the polyp than a single pathologist’s diagnosis (Ponugoti et al, 2019).
      Another study reported high interobserver agreement among expert endoscopists (κ value, 0.716-0.862), even for advanced lesions (Huang et al, 2004).
      Semisupervised learning to develop reliable CADx tools for polyps in colorectal cancer screening involves the following steps:
      1. Two (or more) expert endoscopists review the endoscopic images and the corresponding pathologic diagnosis of each polyp used in the training stage of the machine learning approach.
      2. When all evaluators agree with the preassigned pathologic diagnosis of a lesion, its images are handled as “labeled” data. If there is disagreement about the pathologic diagnosis, the diagnosis is removed from the images, which are then handled as “unlabeled” data.
      3. In semisupervised learning, the unlabeled data are automatically classified into clusters using the information obtained from the labeled data and the similarity among the colonoscopy images (Figure 1).
      Figure 1. Concept of semisupervised learning for endoscopic images with uncertain pathologic diagnoses. Unlabeled data (images without reliable pathologic diagnoses) are automatically classified into clusters using the information obtained from labeled data (images with reliable pathologic diagnoses) and the similarity among the images.
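The three steps above can be sketched end to end in Python. Step 1 (expert review) happens outside the code and is represented by per-reviewer agreement flags. The editorial does not specify the algorithm for step 3, so the sketch uses graph-based label propagation as one plausible choice. All diagnoses, reviewer votes, feature vectors, and the similarity bandwidth `sigma` are hypothetical. In practice, the features would come from an image model (an assumption).

```python
import math

def consensus_labels(pathology, reviewer_votes):
    """Step 2: keep a pathologic diagnosis only if every reviewer agrees with it.

    reviewer_votes holds one list per reviewer; element i is True if that reviewer
    agrees with the diagnosis of polyp i. Disputed polyps get None ("unlabeled").
    """
    return [dx if all(votes[i] for votes in reviewer_votes) else None
            for i, dx in enumerate(pathology)]

def propagate_labels(features, labels, sigma=1.0, iterations=100):
    """Step 3, one possible choice: label propagation on a Gaussian similarity graph.

    Labeled polyps keep their label (clamped); each unlabeled polyp repeatedly
    takes the similarity-weighted average of the other polyps' class scores.
    """
    classes = sorted({y for y in labels if y is not None})
    if not classes:
        raise ValueError("at least one polyp must keep a consensus label")
    n = len(features)
    # Pairwise image similarity, computed here from hypothetical feature vectors.
    w = [[0.0 if i == j else math.exp(-math.dist(features[i], features[j]) ** 2 / (2 * sigma ** 2))
          for j in range(n)] for i in range(n)]
    # Initial class scores: one-hot for labeled polyps, uniform for unlabeled ones.
    scores = [[1.0 if y == c else 0.0 for c in classes] if y is not None
              else [1.0 / len(classes)] * len(classes)
              for y in labels]
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            total = sum(w[i])
            if labels[i] is not None or total == 0.0:
                new_scores.append(scores[i])  # clamp labeled (or isolated) polyps
            else:
                new_scores.append([sum(w[i][j] * scores[j][k] for j in range(n)) / total
                                   for k in range(len(classes))])
        scores = new_scores
    # Each polyp takes the class with the highest score.
    return [classes[max(range(len(classes)), key=lambda k: s[k])] for s in scores]

# Hypothetical example: six polyps, two reviewers.
pathology = ["adenoma", "adenoma", "hyperplastic", "hyperplastic", "hyperplastic", "adenoma"]
reviewer_a = [True, True, True, True, False, True]
reviewer_b = [True, True, True, True, True, False]
features = [(0.0, 0.0), (0.5, 0.0), (3.0, 3.0), (3.5, 3.0), (0.2, 0.4), (3.2, 2.6)]

labels = consensus_labels(pathology, [reviewer_a, reviewer_b])
final = propagate_labels(features, labels)
print(labels)  # the last two polyps are disputed and become None
print(final)
```

The two disputed polyps end up with the class of the images they most resemble, regardless of the pathologic label originally assigned to them.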
      Another possible solution is to improve the reliability of the criterion standard itself, namely, pathologic diagnosis, with the aid of AI. A recent study showed that AI assistance improved the accuracy of 4-class histologic classification of colorectal polyps from 74% to 81% (Nasir-Moin et al, 2021).
      Combining semisupervised learning with AI-aided pathologic diagnosis may be the best way to overcome this problem.
      A limitation of our proposed approach is that it creates a “virtual” criterion standard that differs from current standard histopathologic assessment. However, given the limited agreement among pathologists in polyp diagnosis, new disruptive methods are needed to develop advanced CADx tools in colonoscopy. The concept is also applicable to other cancer screening tests, such as CADx for mammography and for smear cytology, in which pathologic diagnoses vary in the same way (Stoler and Schiffman, 2001; van Seijen et al, 2021).
      The proposed approach may help overcome the overdiagnosis and overtreatment introduced by CADe tools in cancer screening and may thereby help optimize cancer screening programs.

      Disclosure

      Dr Mori, Dr Misawa, and Dr Kudo are consultants for, and recipients of speaking fees from, Olympus Corp and have ownership interests in Cybernet System Corp. Dr Rastogi is a consultant for Boston Scientific, Cook Medical, and Olympus America and the recipient of a research grant from Olympus America. Dr Esparrach is a consultant for MiWendo Solutions SL. The remaining authors disclosed no financial relationships.

      References

      Barua I, Vinsard DG, Jodal HC, et al. Artificial intelligence for polyp detection during colonoscopy: a systematic review and meta-analysis. Endoscopy. 2021;53:277-284.

      Mori Y, Bretthauer M, Kalager M. Hopes and hypes for artificial intelligence in colorectal cancer screening. Gastroenterology. 2021;161:774-777.

      Wang P, Liu X, Berzin TM, et al. Effect of a deep-learning computer-aided detection system on adenoma detection during colonoscopy (CADe-DB trial): a double-blind randomised study. Lancet Gastroenterol Hepatol. 2020;5:343-351.

      Wang P, Berzin TM, Glissen Brown JR, et al. Real-time automatic detection system increases colonoscopic polyp and adenoma detection rates: a prospective randomised controlled study. Gut. 2019;68:1813-1819.

      Barua I, Wieszczy P, Kudo S, et al. Real-time AI-based optical diagnosis of neoplastic polyps during colonoscopy. NEJM Evid. Epub 2022 Apr 13.

      Vennelaganti S, Cuatrecasas M, Vennalaganti P, et al. Interobserver agreement among pathologists in the differentiation of sessile serrated from hyperplastic polyps. Gastroenterology. 2021;160:452-454.e1.

      Mollasharifi T, Ahadi M, Jamali E, et al. Interobserver agreement in assessing dysplasia in colorectal adenomatous polyps: a multicentric Iranian study. Iran J Pathol. 2020;15:167-174.

      Lee M, Kudose S, Del Portillo A, et al. Invasive carcinoma versus pseudoinvasion: interobserver variability in the assessment of left-sided colorectal polypectomies. J Clin Pathol. Epub 2021 Apr 12.

      Anand D, Yashashwi K, Kumar N, et al. Weakly supervised learning on unannotated H&E-stained slides predicts BRAF mutation in thyroid cancer with high accuracy. J Pathol. 2021;255:232-242.

      Kimura T, Yamamoto E, Yamano HO, et al. A novel pit pattern identifies the precursor of colorectal cancer derived from sessile serrated adenoma. Am J Gastroenterol. 2012;107:460-469.

      Ponugoti P, Rastogi A, Kaltenbach T, et al. Disagreement between high confidence endoscopic adenoma prediction and histopathological diagnosis in colonic lesions ≤ 3 mm in size. Endoscopy. 2019;51:221-226.

      Huang Q, Fukami N, Kashida H, et al. Interobserver and intra-observer consistency in the endoscopic assessment of colonic pit patterns. Gastrointest Endosc. 2004;60:520-526.

      Nasir-Moin M, Suriawinata AA, Ren B, et al. Evaluation of an artificial intelligence-augmented digital system for histologic classification of colorectal polyps. JAMA Netw Open. 2021;4:e2135271.

      Stoler MH, Schiffman M. Interobserver reproducibility of cervical cytologic and histologic interpretations: realistic estimates from the ASCUS-LSIL Triage Study. JAMA. 2001;285:1500-1505.

      van Seijen M, Jóźwiak K, Pinder SE, et al. Variability in grading of ductal carcinoma in situ among an international group of pathologists. J Pathol Clin Res. 2021;7:233-242.