Original article: Clinical endoscopy | Volume 93, Issue 4, P914-923, April 2021

Development and initial validation of an instrument for video-based assessment of technical skill in ERCP

      Background and Aims

      The accurate measurement of technical skill in ERCP is essential for endoscopic training, quality assurance, and coaching of this procedure. Hypothesizing that technical skill can be measured by analysis of ERCP videos, we aimed to develop and validate a video-based ERCP skill assessment tool.


      Methods

      Based on review of procedural videos, the task of ERCP was deconstructed into its basic components by an expert panel that developed an initial version of the Bethesda ERCP Skill Assessment Tool (BESAT). Subsequently, 2 modified Delphi panels and 3 validation exercises were conducted with the goal of iteratively refining the tool. Fully crossed generalizability studies investigated the contributions of assessors, ERCP performance, and technical elements to reliability.


      Results

      Twenty-nine technical elements were initially generated from task deconstruction. Ultimately, after iterative refinement, the tool comprised 6 technical elements and 11 subelements. The developmental process achieved consistent improvements in the performance characteristics of the tool with every iteration. For the most recent version of the tool, BESAT-v4, the generalizability coefficient (a reliability index) was 0.67. The largest share of variance in BESAT scores (43.55%) was attributed to differences in endoscopists’ skill, indicating that the tool can reliably differentiate between endoscopists based on video analysis.
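
      To illustrate how a generalizability coefficient relates to variance components, the following is a minimal sketch, not the authors' analysis. It assumes a fully crossed random-effects design in which ERCP performances (p) are the object of measurement and assessors (r) and technical elements (i) are facets, and it uses the standard textbook formula for the relative G coefficient. The function name and every variance component value are hypothetical and are not taken from the study.

```python
def relative_g_coefficient(var_p, var_pr, var_pi, var_pri_e, n_raters, n_items):
    """Relative G coefficient for a fully crossed p x r x i random design.

    var_p     : variance due to the object of measurement (performances)
    var_pr    : performance-by-rater interaction variance
    var_pi    : performance-by-item interaction variance
    var_pri_e : three-way interaction confounded with residual error
    """
    # Relative error variance: only terms that interact with p contribute,
    # each divided by the number of conditions averaged over.
    relative_error = (
        var_pr / n_raters
        + var_pi / n_items
        + var_pri_e / (n_raters * n_items)
    )
    return var_p / (var_p + relative_error)


# Hypothetical variance components, chosen for illustration only
# (NOT the values estimated in the study).
components = dict(var_p=0.40, var_pr=0.10, var_pi=0.15, var_pri_e=0.30)

# A simple decision (D) study: project reliability for different rater counts.
for n_raters in (1, 2, 4):
    g = relative_g_coefficient(**components, n_raters=n_raters, n_items=6)
    print(f"{n_raters} rater(s), 6 elements: G = {g:.2f}")
```

      Under these assumed values, the projected coefficient rises as assessors are added, because each added assessor shrinks the error terms that involve the rater facet.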


      Conclusions

      Video-based assessment of ERCP skill appears to be feasible with a novel instrument that demonstrates favorable validity evidence. Future steps include determining whether the tool can discriminate between endoscopists of varying experience levels and predict important outcomes in clinical practice.


      Abbreviations: BESAT (Bethesda ERCP Skill Assessment Tool), D (decision), G (generalizability), SVI (stent versus indomethacin [trial])



      Linked Article

      • ERCP and video assessment: Can video judge the endoscopy star?
        Gastrointestinal Endoscopy, Vol. 93, Issue 4
        • Preview
          ERCP continues to be one of the most technically challenging and complex endoscopic procedures performed. ERCP carries higher risk and rates of adverse events than traditional endoscopy, and, as with other endoscopic procedures, effective training is critical. As with most procedures, it is believed that volume, skill level, and competency affect outcomes and adverse events; however, this has largely been speculative or based on indirect evidence. Formal assessment of ERCP competency is lacking, and traditional measures have previously relied on volume thresholds as a surrogate.