
ERCP and video assessment: Can video judge the endoscopy star?

      Abbreviations:

      BESAT (Bethesda ERCP Skill Assessment Tool); G-coefficient (generalizability coefficient)

      ERCP remains one of the most technically challenging and complex endoscopic procedures performed. It carries higher risk and higher rates of adverse events than conventional endoscopy, and, as with other endoscopic procedures, effective training is critical. Volume, skill level, and competency are believed to affect outcomes and adverse events; however, this belief has largely rested on speculation or indirect evidence. Formal assessment of ERCP competency is lacking, and traditional measures have relied on volume thresholds as a surrogate. Yet the number of ERCPs a practitioner needs to become competent has varied greatly across studies.
      Initial studies estimated that a practitioner would need between 35 and 100 ERCPs to become competent.[1,2]
      This increased to 180 after a landmark prospective trial by Jowell et al,[3] a number adopted by the American Society for Gastrointestinal Endoscopy (ASGE). A more recent systematic review of 9 studies assessing 137 trainees found that overall competency was achieved after 70 to 400 ERCPs.[4]
      • Shahidi N.
      • Ou G.
      • Telford J.
      • et al.
      When trainees reach competency in performing ERCP: a systematic review.
      This variability in volume data suggests that markers beyond numbers alone are important in assessing ERCP competency. Technical skill and proficiency may be a more important component of assessment, but whether they can be measured reliably is unknown. Other procedures, including colonoscopy, have used video analysis as an effective way to measure and improve skills.[5,6]
      Before this study, an ERCP-specific procedural skill video assessment tool had yet to be established.
      In this issue of Gastrointestinal Endoscopy, Elmunzer et al[7] present a novel study describing the development of a video-based ERCP skill assessment tool. Inspired by other procedural videos and assessments, the authors deconstructed ERCP into its basic major components to create the Bethesda ERCP Skill Assessment Tool (BESAT). The creation of the video-based tool, the methodology (4 versions), and the time invested (multiple years) are the true strengths of the present study. Development began with 9 investigators reviewing 8 videos to break ERCP into basic elements and to determine the feasibility of judging these skill elements on video rather than in person in real time. These elements were then sent to 20 additional ERCP endoscopists, who graded each element on a scale of 1 to 5. Delphi consensus methodology was used, and all elements judged important or very important by >75% of endoscopists were retained in the model. This second, revised model was then used by 6 investigators to judge 6 new videos that included adverse outcomes related to the procedure, resulting in the third version of the tool. The present version, BESAT-v4, was created when 6 assessors judged 8 new videos that included patient demographics and descriptive anchors added to each technical element. A second Delphi process was performed, with 12 endoscopists refining the tool. The tool initially comprised 29 separate elements of ERCP performance but was ultimately refined to 6 technical elements and 1 subelement after several iterations over a few years. The strength of the study by Elmunzer et al,[7] for which the authors should be warmly congratulated, is undoubtedly the methodology: the creation, development, redevelopment, and refinement of a tool to judge ERCP technical skill based on video analysis.
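      The Delphi retention step lends itself to a simple illustration. Below is a minimal sketch of the rule as described, assuming (hypothetically) that ratings of 4 and 5 on the 1-to-5 scale correspond to "important" and "very important"; the element names and ratings are invented for illustration and are not the study's actual items.

```python
# Minimal sketch of the Delphi retention rule described in the study:
# keep an element only if >75% of raters judge it important (4) or
# very important (5) on a 1-to-5 scale. Element names and ratings
# below are hypothetical, for illustration only.

RETENTION_THRESHOLD = 0.75  # ">75% of endoscopists"

ratings = {  # element -> ratings from 20 endoscopists (1-5 scale)
    "cannulation technique": [5, 5, 4, 5, 4, 5, 5, 4, 4, 5,
                              5, 4, 5, 5, 4, 5, 4, 5, 5, 4],
    "fluoroscopy posture":   [3, 2, 4, 3, 2, 3, 4, 2, 3, 3,
                              2, 4, 3, 2, 3, 3, 4, 2, 3, 3],
}

def retained(scores: list[int]) -> bool:
    """True if the share of 'important'/'very important' votes exceeds 75%."""
    share = sum(s >= 4 for s in scores) / len(scores)
    return share > RETENTION_THRESHOLD

for element, scores in ratings.items():
    print(f"{element}: {'keep' if retained(scores) else 'drop'}")
```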
      The first question to ask, then, after so much work was put into creating an instrument for video-based assessment of ERCP skill, is this: does it work? The authors show that the most recent version of the tool, BESAT-v4, achieved a high generalizability coefficient (G-coefficient) of 0.67, nearing the ≥0.70 threshold at which a tool is considered reliable. Simply put, generalizability theory is a framework for evaluating the reliability of measurements, and a G-coefficient is a measure of that reliability. Thus, by its fourth version, the tool is approaching the threshold for what would be considered reliable. At present, the tool is good but does not reach a G-coefficient of ≥0.80, which would be a more acceptable level of reliability. Examining the tool and its elements demonstrates the challenge of developing a reliable instrument for judging video-based ERCP skill. Despite the multiple refinements, there will always be subjectivity among assessors: several categories and subcategories still invite very subjective interpretation, such as the "gentleness" of cannulation and the endoscopist's "procedural judgment." The authors freely acknowledge the challenges in developing their tool, admitting that it is not a finished model and is still undergoing refinement. However, the tool, even in its present unfinished form, is already in the realm of reliability of other validated video-based tools in endoscopy and surgery.[8-10]
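      For readers unfamiliar with generalizability theory, the intuition can be captured in one formula. As a sketch only, assuming a simple single-facet design in which each video (the object of measurement) is scored by n_r raters, the G-coefficient compares the variance attributable to true differences between performances with the rater-related error variance; the actual variance design in the study may be more elaborate.

```latex
% Relative G-coefficient for a single-facet (performance x rater) design:
% sigma^2_p      = variance among performances (the "signal")
% sigma^2_{pr,e} = performance-by-rater interaction plus residual error
% n_r            = number of raters whose scores are averaged
E\rho^2 \;=\; \frac{\sigma^2_p}{\sigma^2_p + \dfrac{\sigma^2_{pr,e}}{n_r}}
```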
      The other part of the question of whether the model works is not just whether it is a reliable assessment but whether a video-based assessment can practically and truly evaluate an endoscopist's ERCP skills. One issue can be found in the study results for the final version of the tool. Modeling showed that it took 8 reviewers of a single video for the tool to reach the reliability threshold of 0.70; with only 2, 3, or 4 assessors rating the video, the assessments were not deemed reliable. In real life, if it takes 8 evaluators examining a single video of 25 minutes on average to make the test reliable, this is unlikely to be practical when multiple videos of multiple endoscopists must be evaluated to compare skill levels between endoscopists. As the authors again state, further work on the model and further training of assessors are needed to make it reliable with fewer assessors, or this will not be a practical model for assessing ERCP skill.
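      The authors' observation that reliability depends on the number of raters follows directly from the formula above: averaging over more raters shrinks the error term. A minimal sketch of this decision-study-style projection appears below; the variance components are invented so that 8 raters land near 0.70, roughly mirroring the pattern the authors report, and are not the study's actual estimates.

```python
# Decision-study-style projection: how the G-coefficient grows as more
# raters score each video. Variance components below are hypothetical,
# chosen so that ~8 raters reach the 0.70 reliability threshold; they
# are NOT the estimates from Elmunzer et al.

VAR_PERFORMANCE = 0.226  # sigma^2_p: true variance between performances
VAR_RESIDUAL = 0.774     # sigma^2_{pr,e}: rater interaction + error

def g_coefficient(n_raters: int) -> float:
    """Relative G-coefficient when scores are averaged over n_raters."""
    return VAR_PERFORMANCE / (VAR_PERFORMANCE + VAR_RESIDUAL / n_raters)

for n in (1, 2, 4, 8, 16):
    print(f"{n:>2} raters -> G = {g_coefficient(n):.2f}")

# Output (approximately):
#  1 raters -> G = 0.23
#  2 raters -> G = 0.37
#  4 raters -> G = 0.54
#  8 raters -> G = 0.70
# 16 raters -> G = 0.82
```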
      Another challenge for the video-based assessment tool is whether it actually judges an endoscopist's skill. The tool has been shown to be valid and reliable in assessing technical skill across multiple assessors of a single video. However, 1 video is nowhere near an accurate judge of an endoscopist's skill. As stated, ERCP is the most challenging and at times humbling procedure we perform as endoscopists. The greatest ERCP "master" can look quite foolish on any given ERCP; on a single video of a challenging cannulation, an experienced endoscopist may not appear especially "gentle" or "efficient." Alternatively, on some ERCPs a novice interventional endoscopy fellow can cannulate freely on the first pass and appear to be an expert. Thus, judging 1 video does not judge an endoscopist. To achieve discriminative validity, that is, for the tool to differentiate the technical skill of different endoscopists, how many videos must be judged for an individual endoscopist: 5, 10, 50, 100? If 50 videos must be assessed, each a 25-minute video reviewed by 8 assessors, to yield a true judgment of an endoscopist's skills, is this a practical tool?
      However, any present limitations of the described ERCP video assessment tool and any challenges of applying it to ERCP training and practice are outweighed by the unique and novel development of a tool that can assess ERCP skill on video. The tool, although it has taken 3 years to develop to its present point, is still in its infancy; this is essentially a pilot study, and the tool will only improve. Having a reliable way to judge an endoscopist's skill through video assessment has multiple implications going forward. If video analysis can differentiate an endoscopist's skill at different levels of training and expertise (discriminative validity), and if video assessment of ERCP skill can predict outcomes of success and adverse events (predictive validity), it could play a major role in training, in determining competence and hospital privileging, and in demonstrating maintenance of skills. Videos that reliably capture skill could be shared between trainers and mentors and across institutions and could be incorporated into training protocols. Everyone who has taught and practiced through the COVID-19 pandemic knows that virtual patient care and video-based training and assessment are already part of our lives as endoscopists and will only continue to grow in importance.
      In conclusion, live training of ERCP and assessment of ERCP skills, with the cumulative and summative feedback driven by the mentor-mentee relationship during training, will remain the dominant and most important way we train and evaluate ERCP skills. However, video-based assessment of ERCP, and of all endoscopy skills, will continue to grow and will complement and improve in-person assessment and training. Elmunzer et al[7] provide an early and reliable tool offering a new way to assess competency among endoscopists performing ERCP, but for video-based assessment this is just the beginning.

      Disclosure

      Both authors disclosed no financial relationships.

      References

        1. Watkins JL, Etzkorn KP, Wiley TE, et al. Assessment of technical competence during ERCP training. Gastrointest Endosc. 1996;44:411-415.
        2. Health and Public Policy Committee, American College of Physicians. Clinical competence in diagnostic endoscopic retrograde cholangiopancreatography. Ann Intern Med. 1988;108:142-144.
        3. Jowell PS, Baillie J, Branch MS, et al. Quantitative assessment of procedural competence: a prospective study of training in endoscopic retrograde cholangiopancreatography. Ann Intern Med. 1996;125:983-989.
        4. Shahidi N, Ou G, Telford J, et al. When trainees reach competency in performing ERCP: a systematic review. Gastrointest Endosc. 2015;81:1337-1342.
        5. Scaffidi MA, Grover SC, Carnahan H, et al. A prospective comparison of live and video-based assessments of colonoscopy performance. Gastrointest Endosc. 2018;87:766-775.
        6. Patel SG, Duloy A, Kaltenbach T, et al. Development and validation of a video-based cold snare polypectomy assessment tool (with videos). Gastrointest Endosc. 2019;89:1222-1230.
        7. Elmunzer B, Guiton G, Walsh C, et al. Development and initial validation of an instrument for video-based assessment of technical skill in ERCP. Gastrointest Endosc. 2021;93:914-923.
        8. Gupta S, Anderson J, Bhandari P, et al. Development and validation of a novel method for assessing competency in polypectomy: direct observation of polypectomy skills. Gastrointest Endosc. 2011;73:1232-1239.
        9. Knight S, Aggarwal R, Agostini A, et al. Development of an objective assessment tool for total laparoscopic hysterectomy: a Delphi method among experts and evaluation on a virtual reality simulator. PLoS One. 2018;13:e0190580.
        10. Palter VN, Grantcharov TP. A prospective study demonstrating the reliability and validity of two procedure-specific evaluation tools to assess operative competence in laparoscopic colorectal surgery. Surg Endosc. 2012;26:2489-2503.
