Artificial Intelligence in Orthopedic Surgery Education: Current Concepts

Karam Samplay; Saqib Rehman

doi:10.58616/001c.137610

Samplay K, Rehman S. Artificial Intelligence in Orthopedic Surgery Education: Current Concepts. SurgiColl. 2025;3(3). doi:10.58616/001c.137610

Abstract

Artificial intelligence (AI) creates new opportunities to streamline orthopedic surgery medical education. There are four important ways in which AI can be used in medical education: adaptive learning using AI, surgical simulation, performance improvement, and clinical adjuncts. Adaptive learning is a practical AI application for improving electronic learning (E-learning) - delivery of educational content through digital devices like computers, tablets, and smartphones. JBJS (Journal of Bone and Joint Surgery) Clinical Classroom is an example of an E-learning platform utilizing adaptive learning for residents and practicing physicians. Companies like Khan Academy and Quizlet have also developed AI-based tutors for E-learning. Surgical simulation has become an effective way to recreate key components of surgery that trainees should understand to safely and confidently develop technical skills and perform such procedures. Haptic feedback (using touch and vibrations to communicate physical sensations or feelings to a user) is helpful for these simulators, providing tactile feedback expected with orthopedic procedures. Modern clinical education extends from the classroom to the point of care, and performance improvement is a major component. Predictive analytics involves collecting and analyzing data to predict the likelihood of different outcomes. Since many of these technologies are in their early stages, further testing and refinement are needed to improve security, accuracy, and validity.

INTRODUCTION

Orthopaedic surgical education has evolved from the days when one needed only to read a few textbooks and learn via the “see one, do one, teach one” approach. The explosion of information and the increasing number of procedures that trainees are exposed to, combined with online learning platforms, have changed how graduate, postgraduate, and continuing medical education (CME) is delivered and consumed. At the same time, increasing financial pressures on academic health systems and orthopaedic practices have made it increasingly challenging for faculty to have dedicated teaching time with their learners.^1,2 There is a simultaneous emphasis on performance improvement and patient safety at institutions and orthopaedic practices, partly due to these same pressures. Technology has made its way into orthopaedic education at multiple levels, meeting some of the needs of both learners and educators. That, coupled with the rise of artificial intelligence, will potentially change the face of orthopaedic education.

Artificial intelligence (AI) involves using complex algorithms to generate useful outputs that resemble those of human cognition.³ These algorithms can process massive amounts of information from electronic medical record (EMR) systems and performance data for analysis. Applications of AI, such as natural language processing (NLP) and machine learning (ML), can analyze different types of information to provide valuable outputs that augment learning [Table 1]. These outputs can be used to streamline data analysis and develop tools and platforms for key facets of medical education, such as didactic education, clinical simulation, and practice improvement.

There has been a greater focus on independent learning tools in response to the Coronavirus disease (COVID-19) pandemic.⁴ E-learning and surgical simulations have especially gained popularity,⁵ particularly for residents, as work hour restrictions and limited resources have created concern about limited training opportunities, experience, and education.⁴ There is an increasing need for residents and practicing physicians to hone their skills using such technology to optimize patient safety when performing newer non-invasive procedures.^4,6

While the previously mentioned advancements in education have shown promising results,⁷ there is still a need to enhance the quality of medical education. AI is emerging as a potential tool for medical education at multiple levels. This creates opportunities to improve the quality and efficiency of medical education while improving patient safety.³ This concepts review examines how AI can be, and has been, utilized to enhance the quality of orthopedic surgical training by improving current education and medical practice.

Table 1. Key Definitions

Artificial intelligence (AI)	Application of complex algorithms to generate sound output, excluding the need for human cognitive intelligence³
Machine Learning (ML)	Application of AI involving analyzing large datasets to identify patterns, correlations, and insights⁸
Natural Language Processing (NLP)	Application of AI involving understanding human language text, using algorithms and statistical models to receive, interpret, and generate human language^9,10
Deep Learning (DL)	Subcategory of machine learning made of numerous complex layers of algorithms meant to mirror neural networks seen in the brain³
Convolutional Neural Networks (CNN)	Neural network based on deep learning where layers of processing extract data from images, synthesize learnable parameters, and automatically propagate inputs to learn spatial hierarchies of features³

REVIEW

Adaptive Learning Using Artificial Intelligence

Adaptive learning is an effective AI application for improving E-learning.⁸ This approach aims to identify knowledge gaps in learners and adjust the material to address gaps in knowledge and fit the learner’s style.⁸ ML has been utilized for data collection and processing to determine adjustments in material. This allows for continuous refinement of the learning process instead of requiring the learner to adjust how the material is presented.⁸
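The gap-identification step described above can be illustrated with a minimal sketch. It is a simple rule-based stand-in, not the ML approach used by any platform named in this review. The function names, the 0.8 mastery threshold, and the sample responses are all hypothetical.

```python
from collections import defaultdict


def topic_accuracy(responses):
    """Return the fraction of correct answers per topic.

    `responses` is an iterable of (topic, is_correct) pairs.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for topic, is_correct in responses:
        totals[topic] += 1
        if is_correct:
            correct[topic] += 1
    return {topic: correct[topic] / totals[topic] for topic in totals}


def next_topics(responses, mastery_threshold=0.8):
    """List topics below the (assumed) mastery threshold, weakest first."""
    accuracy = topic_accuracy(responses)
    gaps = [t for t, a in accuracy.items() if a < mastery_threshold]
    return sorted(gaps, key=lambda t: accuracy[t])


# Hypothetical quiz history for one learner.
responses = [
    ("hip fracture", True), ("hip fracture", True),
    ("ACL reconstruction", False), ("ACL reconstruction", True),
    ("pediatric elbow", False), ("pediatric elbow", False),
]

print(next_topics(responses))  # ['pediatric elbow', 'ACL reconstruction']
```

A production system would likely replace the fixed threshold with a learned model of each learner's mastery, but the loop is the same: measure performance, find the gaps, and reorder the material.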

Personalized teaching, tailored to individual learning styles, has proven more effective than traditional methods.⁸ A review by Shorten et al. details how utilizing learning analytics to assess performance and knowledge gaps has multiple uses within medical training. For trainees, this is a means to gain insight regarding their mastery of the material and to develop an adaptive learning curriculum based on their knowledge gaps.¹¹ This would increase how efficiently such gaps are addressed, improving the use of independent learning time. This is especially useful in residency, where time is already limited. Furthermore, implementing such analytics allows residency directors to assess the mastery of material across trainees better, helping them critically evaluate areas of improvement within the curriculum and the preparation of trainees.¹¹

JBJS Clinical Classroom is an example of an E-learning platform utilizing adaptive learning for residents and practicing physicians. Companies like Khan Academy and Quizlet have also developed AI-based tutors for E-learning. As additional platforms become more mainstream, one of the major obstacles to consider is training teachers on effectively utilizing AI within the curriculum.¹² Institutions must also develop the infrastructure to support integrating this technology into the curriculum.

Clinical Simulation

Surgical simulation has become an effective way to recreate key components of surgery that trainees should understand to safely and confidently develop technical skills and perform such procedures.^13,14 Haptic feedback is helpful for these simulators, providing tactile feedback expected with orthopedic procedures.⁶ A study by Ershad et al. has demonstrated how different styles of haptic feedback can be applied to correct poor performance during simulated tasks.¹⁵ A similar study by Wang et al. used DL to efficiently assess motions of suturing, needle passing, and knot tying.¹⁶ Studies have also utilized haptic feedback with a VR-based bone drilling task, demonstrating improved performance.¹⁷ Similar training models have been enhanced via DL systems.¹⁸ Although still in their early stages, AI algorithms that monitor performance can streamline surgical training.¹⁹

Curricula can also be developed using AI assessment of training. In a study by Mariani et al., gaps identified in the trainee’s performance were used to create a personalized curriculum that provided tailored tasks. Compared to the control group, the study group showed improved overall performance and better performance in skill transference tasks (p = 0.02 and p = 0.0152, respectively).²⁰ Residents and physicians would benefit from independently practicing and assessing their surgical capability in a structured setting.

AI tools, such as chatbot patient simulations, are increasingly used to develop patient interaction skills.¹² Work has been done to expand the cases such bots can simulate, with programs developed to analyze and synthesize real patient records into virtual patients that students can interact with.¹² More advanced models have also been developed, such as the Hybrid Language model used by Ng et al.²¹ Users have reported that patient simulations using AI-based VR feel like real-life patient interactions and would be helpful in their studies.²² However, Mir et al. explain that AI-based bots cannot effectively replicate the psychosocial experiences of patients.¹²

Many studies of AI-based surgical simulation face limitations. Wang et al. note that, in their study, actions like knot tying were more challenging to assess than other skills.¹⁶ Developing reliable and validated models requires training with large volumes of surgical cases.^13,23,24 Other issues include limited study participants, AI’s difficulty in measuring subjective qualities of actions, and AI’s difficulty recognizing how different approaches can accomplish a single goal.²³ There is limited availability of orthopedic-specific AI simulators and studies compared to other specialties.¹⁴ Given how applicable the current AI-based surgical simulators are to orthopedics, it is reasonable to expect further development of orthopedic-specific AI-based simulators.³ Regarding patient simulations using chatbots, there is concern about accuracy in interpreting the intention of the user’s questions and responses. Bots also have difficulty understanding conversational cues and lack the nuance of patient-doctor interactions.¹³ While such issues are being addressed, these failures raise reasonable concerns about their utility.^21,22

Performance Improvement

Modern clinical education extends from the classroom to the point of care, and performance improvement is a major component. Predictive analytics involves collecting and analyzing data to predict the likelihood of different outcomes.^3,12 In medicine, predictive analytics have been utilized in processing EMR systems, public data sets, and surveys to determine how patient populations with comorbidities and demographic characteristics may have an increased likelihood of specific outcomes. Franklin et al. used this technology in a study where ML models were used to process patient outcomes to generate predictive reports for hip and knee replacements.²⁵ While meant to be for informed decision-making, providing predictive information tailored to specific populations can also improve medical education.
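As a minimal sketch of the idea behind predictive analytics, the example below estimates the likelihood of an outcome within patient subgroups from historical records. It is not the ML modeling used by Franklin et al. The records are synthetic, and the function name and smoothing parameter are assumptions made for illustration.

```python
from collections import defaultdict


def outcome_rates(records, smoothing=1.0):
    """Estimate P(outcome | group) with additive (Laplace) smoothing.

    `records` is an iterable of (group, had_outcome) pairs. Smoothing keeps
    estimates for small groups away from the extremes of 0 and 1.
    """
    events = defaultdict(int)
    totals = defaultdict(int)
    for group, had_outcome in records:
        totals[group] += 1
        events[group] += int(had_outcome)
    return {
        group: (events[group] + smoothing) / (totals[group] + 2 * smoothing)
        for group in totals
    }


# Synthetic records: (comorbidity group, readmitted within 30 days).
records = [
    ("diabetes", True), ("diabetes", False),
    ("diabetes", True), ("diabetes", False),
    ("no diabetes", False), ("no diabetes", False),
    ("no diabetes", False), ("no diabetes", True),
]

rates = outcome_rates(records)
for group, rate in rates.items():
    print(f"{group}: estimated readmission probability {rate:.2f}")
```

Real models combine many patient variables at once and must be validated on independent data, but the output has the same form: an estimated probability of an outcome for a given patient profile.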

The efficiency and potential of such analytics have expanded with AI. ML has been used to develop algorithms for the processing of clinical data.²⁶ Recent studies have shown ML’s ability to predict the results of orthopedic procedures, such as shoulder replacement outcomes, post-ACL reconstruction nerve block efficacy, and probability of hip arthroplasty dislocation.¹⁹ Physicians can use these tools to better understand how certain factors contribute to specific outcomes and to identify better treatment practices for certain groups. As the technology advances, this will be especially useful for expanding the understanding of outcomes for groups underrepresented in scientific literature.

This technology is also being applied to smaller datasets. ML models have been developed, compared, and implemented using publicly available datasets to predict unplanned readmissions for total shoulder arthroplasty (TSA) with acceptable accuracy.²⁷ This study used smaller databases for TSA than those available for total knee and hip arthroplasty.²⁷ Another study used EMR data from a single tertiary care center to predict operative time.²⁸ This illustrates the potential of using AI within a tertiary center, and possibly within physicians’ practice, to analyze care patterns and predict outcome probabilities. This could help inform providers of the quality of their care for certain populations, allowing them to critically assess how to improve care. Such analytics would also be useful in assisting attending physicians in assessing orthopedic residents and finding gaps in education or performance that should be addressed.

NLP has also been utilized for efficient and accurate analysis of patient feedback, imaging reports, surgical reports, and health documentation.^9,10 This is especially useful because many pieces of non-numerical health information, which require more hands-on processing to interpret, can hold valuable information about patients, their conditions, and their management.¹⁰

This technology can assist researchers in analyzing trends, risks, and outcomes when further literature is needed on an emerging topic, as exemplified by the pandemic. Jungmann et al. utilized NLP to process radiology reports to assess trends in fracture frequency during the COVID-19 pandemic.²⁹ NLP algorithms provide increasingly efficient and accurate means to process EMR systems and large databases, helping discover findings that can build upon medical education and improve patient care.
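To illustrate how free-text reports can be turned into countable data, the sketch below flags reports that mention a fracture without a preceding negation cue and tallies them by month. It is a deliberately simple rule-based approach, not the NLP method used by Jungmann et al. The report texts, the negation cues, and the function names are all hypothetical.

```python
import re
from collections import Counter

# Assumed negation cues; a real system would need a far richer set.
NEGATED = re.compile(r"\b(no|without|negative for)\b[^.]*\bfracture", re.IGNORECASE)
MENTION = re.compile(r"\bfracture", re.IGNORECASE)


def reports_fracture(text):
    """True if any sentence mentions a fracture that is not negated."""
    for sentence in text.split("."):
        if MENTION.search(sentence) and not NEGATED.search(sentence):
            return True
    return False


def monthly_counts(reports):
    """Count fracture-positive reports per month from (month, text) pairs."""
    counts = Counter()
    for month, text in reports:
        if reports_fracture(text):
            counts[month] += 1
    return dict(counts)


# Synthetic report snippets for illustration only.
reports = [
    ("2020-03", "Displaced distal radius fracture. No dislocation."),
    ("2020-03", "No acute fracture or dislocation."),
    ("2020-04", "Nondisplaced fracture of the fifth metatarsal."),
    ("2020-04", "Soft tissue swelling without fracture."),
]

print(monthly_counts(reports))  # {'2020-03': 1, '2020-04': 1}
```

Keyword rules like these break down on complex phrasing, which is one reason statistical and deep learning NLP models are used for this task in practice.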

There is a concern about health information privacy since large amounts of patient data must be processed.¹⁹ Quality of data must be scrutinized as well. Across medical practices, the quality and standards by which patient data is collected can vary greatly, especially due to limited time and resources for complete data collection.²⁵ Data that lack completeness and consistency can introduce inaccuracies into the models trained on them. Hence, models must rely on complete data sets. Furthermore, institutions must ensure documentation meets quality standards. Due to the need for further validation of these models, providers should also be critical of the conclusions made.¹⁹ As highlighted in a review by Rodriguez-Merchan EC, there is a particular need for improved methodology to strengthen external validity, enable AI algorithms to predict outcomes more accurately, and expand the range of orthopedic procedures to which they apply.³⁰

Due to the cost of developing NLP algorithms in-house, many institutions rely on open-access NLP algorithms. While this would make using NLP more accessible, the security of open-access algorithms further raises privacy concerns.¹⁰ Other concerns include cost of implementation, training, regulatory obstacles, biases in documentation causing inaccuracies, and applicability of an NLP system between data sets.^9,30

Clinical Adjunct

Within medical practice, AI has gained popularity because of its potential to improve the speed and quality of medical care. Numerous studies have explored how these tools can augment physicians’ performance, especially those early in their careers. Patients have also shown support for incorporating AI into clinical workflow, as shown in a study by Roberts et al. This study surveyed parents of pediatric patients with fractures regarding their opinions on integrating AI into patient care. In the survey, 56% believed more research is required to reduce wait times, 76% preferred a nurse or doctor to review their child’s radiographs, 65% were happy with an AI program diagnosing their child’s fracture, and 82% favored using AI as an adjunct.³¹ With growing coverage of AI’s development, patients are open to integrating the technology into their care.

AI-based clinical adjuncts have clear utility for improving diagnostic accuracy and speed of radiographic image interpretation. This can be important in cases where time-sensitive, accurate readings are crucial for quality medical care. A 2024 study by Zech et al. evaluated how implementing an AI adjunct to highlight suspected areas of pathology in upper extremity radiology images affected performance in MSK fellowship-trained radiology attendings, radiology residents, and pediatric residents. The study demonstrated increased average speed of interpreting images across all three groups (38.9 s with AI vs 52.1 s without, p = 0.030) and increased area under the receiver operating characteristic curve (AUC) for identifying fractures in radiology residents (0.768 without AI to 0.876 with AI, p < 0.001) and pediatric residents (0.706 without AI to 0.844 with AI, p = 0.093).³² In the face of increased use of imaging in clinical settings, AI-augmented analysis can help manage physician workload and stress.³³ This is especially useful in managing stress and improving residents’ performance, as their interpretation time and AUC improved with an AI adjunct.³² Such improvements will positively contribute to physician wellness, performance, and patient outcomes. As discussed by Rodriguez-Merchan EC, tools such as BoneView have exemplified the utility of these AI adjuncts, as they have decreased undetected fractures and reduced radiograph reading time.³⁰ These tools are especially useful because they can provide guidance to less experienced physicians while improving efficiency and patient safety.³⁰

Further studies have demonstrated utility with orthopaedic residents and early career attendings. A study by Liu et al. utilized a CNN to create AI software trained to detect pediatric hip and periarticular infection on MRI images. This software was trained on MRI images annotated by radiologists and orthopedic surgeons and was able to outperform surgeons in diagnosing abscesses and osteomyelitis, with accuracies of 0.957 and 0.976, respectively (p < 0.05). Furthermore, it was able to classify infections and provide risk probabilities based on imaging, which can be useful for planning and discussing treatment options with patients.³⁴ A similar framework was utilized in a study by Hasei et al., where a CNN-based AI program was trained to detect, annotate, and classify osteosarcoma using X-ray imaging. This tool had a sensitivity of 95.52%, specificity of 96.21%, and AUC of 0.989. Notably, it had a higher sensitivity than older AI-based radiology tools, which is crucial for conditions like osteosarcoma, where time-sensitive diagnosis is important for treatment and prognosis.³⁵ This can be especially useful for less experienced surgeons or settings with reduced resources to diagnose and treat rarer orthopedic conditions. These tools can go beyond analyzing imaging, as seen in a study by Ghandour et al., where a trained CNN was utilized to accurately analyze subtle photographic details captured by a smartphone camera to diagnose pes planus (sensitivity of 87% and specificity of 84%) and pes cavus (sensitivity of 70% and specificity of 97%).³⁶ Assessment with this model moderately correlated with radiographic Meary’s angle measurements (p < 0.05), emphasizing the utility of such accessible tools in pre-screening patients and helping guide conservative treatment for downstream pathology.³⁶
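For readers less familiar with the metrics reported in these studies, the following sketch shows how sensitivity, specificity, and AUC can be computed from a set of model outputs. The labels and scores are synthetic and do not come from any study cited here. The 0.5 decision threshold and the function names are assumptions made for illustration.

```python
def sensitivity_specificity(labels, predictions):
    """Return (sensitivity, specificity) for binary labels and predictions."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)  # true positives
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)  # false negatives
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)  # true negatives
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)  # false positives
    return tp / (tp + fn), tn / (tn + fp)


def auc(labels, scores):
    """AUC as the probability that a randomly chosen positive case scores
    higher than a randomly chosen negative case (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Synthetic example: 1 = disease present, 0 = disease absent.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
predictions = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

sens, spec = sensitivity_specificity(labels, predictions)
print(sens, spec)           # 0.75 0.75
print(auc(labels, scores))  # 0.8125
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC summarizes performance across all thresholds, which is why studies often report all three.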

The shortcomings of these tools are apparent when looking at the characteristics of their training and performance. Liu et al. highlight that while their tool allowed for accurate annotation of infection on MRI compared to a surgeon, inflammatory manifestations at the time MRI images were captured could impact model training.³⁴ In addition, there can be a large variance in the appearance of abscesses and infection on MRI based on staging and soft tissue manifestations, limiting the types of lesions that the tool can detect.³⁴ Underrepresentation of different demographic groups, such as certain age groups, can limit the utility of these tools, illustrating the need to expand training data.³⁶ While these models could guide less experienced physicians in diagnosing and treating rarer conditions such as osteosarcoma, the rarity of the condition can result in overfitting, since the AI model may be trained on multiple images from the same patient.³⁵

There are also important discrepancies within studies of AI tools meant to be diagnostic adjuncts, as highlighted in a review by Husarek et al. Industry-funded studies of AI tools had higher sensitivity but lower specificity than non-industry-funded studies. It is also important to note that performance is not uniformly strong across anatomical regions.³³ Compared to other anatomical regions, stand-alone AI tools underperformed when assessing rib fractures (pooled sensitivity of 0.66) and spine fractures (pooled specificity of 0.63), further illustrating how performance is better when AI serves as an adjunct instead of a stand-alone tool.³³ Ultimately, while there are promising results of AI tools aiding clinicians, especially less experienced ones, further research is necessary to improve their applicability and validity.

CONCLUSION

AI creates new opportunities to streamline orthopedic surgery medical education. Implementing adaptive learning into medical didactics and surgical simulation has developed more efficient ways to assess and address gaps in knowledge. The analytical capability of AI has continued to improve opportunities to predict outcomes of patients, evaluate current surgical performance for different patient groups, and expand the knowledge of orthopedic surgery. Since many of these technologies are in their early stages, further testing and refinement are needed to improve security, accuracy, and validity. With the increased demands in education and growing time constraints, improving the quality and efficiency of medical education with AI will help alleviate such loads while improving patient safety and physician wellness.

Declaration of Conflict of Interest

The authors do NOT have any potential conflicts of interest related to the content presented in this manuscript.

Declaration of Funding

The authors received NO financial support for the preparation, research, authorship, and publication of this manuscript.

Declaration of Ethical Approval

Institutional Review Board approval was not required to produce this manuscript.

There is no information (names, initials, hospital identification numbers, or photographs/images) in the submitted manuscript that can be used to identify any patients.

Submitted: February 07, 2025 EDT

Accepted: May 03, 2025 EDT

References

Lu ST, Mc Colgan R, Nguyen J, Kelly BT, Fufa DT. Worsening Burnout in Orthopedic Surgeons Since 2019 and Key Areas of Work life Drivers. HSS Journal. Published online April 2024. doi:10.1177/15563316241242129