Machine learning–based personalized prediction of gastric cancer incidence using the endoscopic and histologic findings at the initial endoscopy

Published: January 05, 2022

      Background and Aims

      Accurate risk stratification for gastric cancer is required for optimal endoscopic surveillance in patients with chronic gastritis. We aimed to develop a machine learning (ML) model that incorporates endoscopic and histologic findings for an individualized prediction of gastric cancer incidence.


      Methods

      We retrospectively evaluated 1099 patients with chronic gastritis who underwent EGD and biopsy sampling of the gastric mucosa. Patients were randomly divided into training and test sets (4:1). We constructed a conventional Cox proportional hazard model and 3 ML models. Baseline characteristics, endoscopic atrophy, and Operative Link on Gastritis-Intestinal Metaplasia Assessment (OLGIM)/Operative Link on Gastritis Assessment (OLGA) stage at initial EGD were comprehensively assessed. Model performance was evaluated using Harrell’s c-index.
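
      As an illustration of the evaluation metric named above, the following is a minimal sketch of Harrell's c-index for right-censored data. It is not the authors' code. The function name, the argument names, and the simplified handling of tied follow-up times (such pairs are skipped) are assumptions made for this example.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Sketch of Harrell's concordance index for right-censored data.

    times        follow-up time for each patient
    events       1 if gastric cancer was observed, 0 if censored
    risk_scores  model output, higher meaning higher predicted risk
    Assumption: pairs with identical follow-up times are skipped.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Let a be the patient with the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the shorter time is an observed event.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0   # earlier event had the higher predicted risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical, not from the study).
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_scores = [0.9, 0.7, 0.5, 0.1]
toy_c = harrell_c_index(toy_times, toy_events, toy_scores)
```

      A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of patients by risk.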


      Results

      During a mean follow-up of 5.63 years, 94 patients (8.55%) developed gastric cancer. The gradient-boosting decision tree (GBDT) model achieved the best performance (c-index from the test set, .84) and showed high discriminative ability in stratifying the test set into 3 risk categories (P < .001). Age, OLGIM/OLGA stage, endoscopic atrophy, and history of malignant tumors other than gastric cancer were important predictors of gastric cancer incidence in the GBDT model. Furthermore, the proposed GBDT model enabled the generation of a personalized cumulative incidence prediction curve for each patient.
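
      One way a per-patient cumulative incidence curve can be derived from a model's risk score is sketched below. This is an illustration under stated assumptions, not the method reported in the article: it assumes a proportional-hazards-type score, uses a Breslow estimate of the baseline cumulative hazard, and ignores competing risks. All function names and the toy data are hypothetical.

```python
import math


def breslow_baseline_hazard(times, events, scores):
    """Breslow estimate of the baseline cumulative hazard.

    Returns a list of (event_time, cumulative_baseline_hazard) pairs,
    sorted by time. scores are log-relative-risk values (assumption).
    """
    event_times = sorted({t for t, e in zip(times, events) if e})
    cumulative = 0.0
    curve = []
    for t in event_times:
        # Number of events observed exactly at time t.
        d = sum(1 for ti, ei in zip(times, events) if ei and ti == t)
        # Sum of relative risks over patients still under follow-up at t.
        at_risk = sum(math.exp(s) for ti, s in zip(times, scores) if ti >= t)
        cumulative += d / at_risk
        curve.append((t, cumulative))
    return curve


def cumulative_incidence_curve(baseline, score):
    """Predicted cumulative incidence 1 - exp(-H0(t) * exp(score))."""
    return [(t, 1.0 - math.exp(-h0 * math.exp(score))) for t, h0 in baseline]


# Toy training data (hypothetical, not from the study).
train_times = [1, 2, 3]
train_events = [1, 0, 1]
train_scores = [0.0, 0.0, 0.0]

baseline = breslow_baseline_hazard(train_times, train_events, train_scores)
low_risk_curve = cumulative_incidence_curve(baseline, score=-1.0)
high_risk_curve = cumulative_incidence_curve(baseline, score=1.0)
```

      Under these assumptions, two patients share the same baseline hazard and differ only through their scores, so a higher score shifts the whole curve upward.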


      Conclusions

      We developed a novel ML model that incorporates endoscopic and histologic findings at initial EGD for personalized risk prediction of gastric cancer. This model may lead to the development of effective and personalized follow-up strategies after initial EGD.

      Graphical abstract


      Abbreviations: CPH, Cox proportional hazard; GBDT, gradient-boosting decision tree; OLGA, Operative Link on Gastritis Assessment; OLGIM, Operative Link on Gastritis-Intestinal Metaplasia Assessment; ML, machine learning

