Introduction
Artificial intelligence (AI) language models such as the Chat Generative Pre-trained Transformer (ChatGPT) have experienced a recent rise in popularity. ChatGPT, launched in fall 2022, is a conversational tool that can compose, analyze, and present information to its users.1 Its versatility has allowed it to be utilized in various industries, including medicine. Within clinical practice, ChatGPT has been found to assist in generating accurate differential diagnoses, answering questions regarding medical conditions, and creating patient letters, reports, and discharge summaries.2
Several surgical specialties have found the use of ChatGPT and other forms of artificial intelligence to be useful yet limited in clinical practice. A study of the application of ChatGPT in clinical neurosurgical care reported that the tool was able to present information to patients regarding certain diagnoses; however, it could not perform the more “human” aspects of care, such as assessing emotional state and providing emotional support, nor could it use a physical examination to narrow a diagnosis.3 This does, however, suggest its potential viability for providing patient information sheets about a specific diagnosis. Within plastic and reconstructive surgery, ChatGPT’s limitations were similar to those found in the neurosurgical studies. However, these studies found that the tool did increase productivity for plastic surgeons with regard to research and manuscript preparation, healthcare communication with patients, and drafting teaching sessions.4
There is little literature on the ability of ChatGPT to generate and present clinical information that patients can utilize regarding their medical conditions. However, within orthopaedics alone, the use of other forms of artificial intelligence has increased by a factor of 10, suggesting that a tool like ChatGPT could be viable in clinical practice.5 Studies have shown that artificial intelligence tools can assist in evaluating diagnostic images, assessing injury risk, and other tasks.5 Nevertheless, research on ChatGPT in orthopaedics has shown that on the current in-training exam for orthopaedic residents, it answered 190 out of 360 questions correctly, corresponding to a post-graduate year one (PGY-1) level of knowledge.6 Thus, this tool may be better suited to providing patient information than to assisting clinicians in decision-making.
The objective of this study is to assess whether ChatGPT can generate patient information sheets on hand pathologies that are readable, understandable, and well presented for the average patient in the United States. The American Medical Association (AMA) recommends presenting medical information to patients at the sixth-grade reading level.7 We aim to compare the sheets generated by ChatGPT with widely accepted patient information sheets from the American Academy of Orthopaedic Surgeons (AAOS), the American Association for Hand Surgery (AAHS), and the American Society for Surgery of the Hand (ASSH). We hypothesize that this tool will present patient information more concisely and readably than the sheets written by each surgical society.
Methods
Patient information sheets related to common hand pathologies were identified through the AAOS, AAHS, and ASSH websites in June 2023.8–10 All entries used were pathologies limited to the hand and wrist. ChatGPT version 3.5 was queried to generate patient information sheets on the same hand pathologies at the sixth-grade reading level. The following prompt was used to develop patient information sheets: “Create a patient information sheet about [hand pathology] for the 6th-grade reading level for the average United States (US) patient.” The articles from AAOS, AAHS, and ASSH, along with the text generated by ChatGPT, were copied as plain text and analyzed with a free-to-use readability website, WebFX.11 Readability was calculated using the following readability scoring tests: Flesch Reading Ease, Automated Readability Index, Flesch-Kincaid Grade Level, Gunning Fog Score, Simple Measure of Gobbledygook (SMOG) Index, and Coleman-Liau Index.
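As an illustration of how this scoring step can be reproduced programmatically (the study itself used the WebFX web tool rather than code), the sketch below computes the same six readability scores with the open-source Python package textstat; the example sheet text is hypothetical.

```python
# Illustrative sketch only: the study used the WebFX readability website.
# This reproduces the same six scores with the open-source "textstat" package.
import textstat

def readability_profile(text: str) -> dict:
    """Return the six readability scores used in this study for a plain-text sheet."""
    return {
        "Flesch Reading Ease": textstat.flesch_reading_ease(text),
        "Flesch-Kincaid Grade Level": textstat.flesch_kincaid_grade(text),
        "Gunning Fog Score": textstat.gunning_fog(text),
        "SMOG Index": textstat.smog_index(text),
        "Coleman-Liau Index": textstat.coleman_liau_index(text),
        "Automated Readability Index": textstat.automated_readability_index(text),
    }

# Hypothetical excerpt from a patient information sheet.
sheet = "Carpal tunnel syndrome happens when a nerve in your wrist is squeezed. " \
        "This can make your fingers feel numb or tingly."
print(readability_profile(sheet))
```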
The Flesch Reading Ease is scored from 0 to 100, with higher scores indicating easier readability.12 Scores of 0 to 30 and 30 to 50 correspond with readable works that are scientific or academic in quality, respectively.13 A score of 80 to 90 corresponds with a 6th-grade reading level. The Automated Readability Index is scored in the opposite direction, with higher scores indicating lower readability.14 The remaining four scores estimate the grade level needed to understand the text, so higher scores likewise indicate lower readability.15–19
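For reference, the standard published formulas for the two Flesch measures combine average sentence length with average syllables per word; the WebFX tool is assumed to implement these standard definitions.

```latex
\begin{align*}
\text{Flesch Reading Ease} &= 206.835 - 1.015\left(\frac{\text{total words}}{\text{total sentences}}\right) - 84.6\left(\frac{\text{total syllables}}{\text{total words}}\right)\\[4pt]
\text{Flesch-Kincaid Grade Level} &= 0.39\left(\frac{\text{total words}}{\text{total sentences}}\right) + 11.8\left(\frac{\text{total syllables}}{\text{total words}}\right) - 15.59
\end{align*}
```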
After excluding articles written in another language and pathologies not specific to the hand and wrist, we analyzed 28 articles from AAOS, 11 from AAHS, and 47 from ASSH. We asked ChatGPT to generate patient information sheets at the 6th-grade reading level in June 2023 for the same pathologies covered by each article. The six readability scores were calculated for each society article and for each ChatGPT-generated sheet. Data on the number of sentences, words, and complex words, the percentage of complex words, the average words per sentence, and the average syllables per word were also collected.
The readability of each society article was compared with that of the corresponding ChatGPT-generated sheet using a paired two-tailed t-test. Statistical significance was defined as P < 0.05 for all tests.
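A minimal sketch of this comparison, assuming the pairing is by pathology and using hypothetical Flesch-Kincaid Grade Level values (the study reports only the test used, not its implementation), is shown below with SciPy's paired t-test.

```python
# Illustrative sketch of the paired two-tailed t-test described above.
# Scores are hypothetical; each pair corresponds to one hand pathology.
from scipy import stats

society_scores = [7.1, 6.8, 7.5, 8.0, 7.3]   # e.g., society article grade levels
chatgpt_scores = [8.9, 8.4, 9.1, 8.7, 9.3]   # e.g., ChatGPT sheet grade levels

t_stat, p_value = stats.ttest_rel(society_scores, chatgpt_scores)  # two-tailed by default
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")  # significant if p < 0.05
```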
Results
Statistical analysis demonstrated that AAOS patient information sheets were associated with significantly higher Flesch Reading Ease scores and greater numbers of sentences, words, and complex words. ChatGPT was associated with significantly higher Flesch-Kincaid Grade Level, Gunning Fog Score, SMOG Index, Coleman-Liau Index, and Automated Readability Index scores, as well as a higher percentage of complex words and more average words per sentence [Table 1].
AAHS patient information sheets had significantly higher Flesch Reading Ease scores and more average words per sentence. In contrast, ChatGPT had significantly higher Coleman-Liau Index scores, a higher percentage of complex words, and more average syllables per word [Table 2].
ASSH sheets had significantly higher Flesch Reading Ease scores and greater numbers of sentences and words. ChatGPT was associated with significantly higher Flesch-Kincaid Grade Level, Gunning Fog Score, SMOG Index, Coleman-Liau Index, and Automated Readability Index scores, as well as a higher percentage of complex words, more average words per sentence, and more average syllables per word [Table 3].
Discussion
ChatGPT is a novel AI system that has rapidly gained widespread use and can influence a patient’s perception of their pathology. Patients are more likely to be familiar with ChatGPT than with patient information sheets from AAOS, AAHS, or ASSH. It is therefore imperative to understand how easy the patient information sheets found on these academic sites are to read and comprehend. Based on our results, patient information sheets from AAOS and ASSH are significantly easier to read on most metrics. Those from AAHS are no different from those generated by ChatGPT, apart from greater readability as measured by the Flesch Reading Ease and Coleman-Liau Index. It should be noted that information sheets generated by ChatGPT contain significantly fewer words and complex words than those from AAOS and ASSH, suggesting that ChatGPT presents information more concisely.
Previous studies have shown that orthopaedic patient material from the AAOS and ASSH has been difficult to read, with Flesch-Kincaid Grade Level scores ranging from the 8th- to 10th-grade reading levels.15,20 Our study shows that this material has been made easier to read over time, with grade level scores of around the 7th grade. Online educational material from institutions with both plastic and orthopaedic surgery training programs is presented at the 11th-grade reading level, which is more difficult to read than the material generated by ChatGPT or published by the AAOS and ASSH.21
In terms of limitations and future directions, ChatGPT was instructed to create patient education material at the 6th-grade reading level; however, the Flesch-Kincaid Grade Level of its output averaged around an 8th- to 9th-grade reading level. Additionally, the Flesch Reading Ease scores of ChatGPT-generated material were in the 50s, indicating text only slightly easier to read than academic writing, rather than the 80 to 90 range that corresponds with a 6th-grade reading level. One study has been able to train ChatGPT to convert orthopaedic educational material to a lower grade level.22
A metric of medical validity should also be applied to both groups to compare the overall quality of the information. Other studies have used the Ensuring Quality Information for Patients tool; however, it has not been validated for AI chatbot use.23 This study should also be expanded to other orthopaedic subspecialties to investigate potential variation across pathologies.24,25 Furthermore, as ChatGPT continues to evolve, the study should be repeated in the coming years to assess how much its output has changed since its inception.
Conclusion
While ChatGPT demonstrates the potential to generate concise patient education materials, its current outputs do not meet the sixth-grade reading level recommended by the AMA. Compared with established hand surgery societies such as the AAOS and ASSH, ChatGPT-generated materials tend to be less readable across several standardized metrics, despite containing fewer words and complex terms. These findings highlight both the promise and the limitations of AI-generated content in patient education, particularly within hand surgery. As the technology continues to evolve, future efforts should focus on refining prompts, improving language model training, and incorporating validated tools to ensure both readability and medical accuracy. Expanding this research to other orthopaedic subspecialties and repeating the study as AI tools improve will provide further insight into the utility and progression of ChatGPT in clinical patient communication.
Declaration of conflict of interest
Teren Yedikian (No), Amanda Azer (No), Hershil Patel (No), Ariana Shaari (No), Brian Molokwu (No), Kush Modi (No), Irfan Ahmed (No), Michael Vosbikian (Honorarium for being the Section Head for Hand and Wrist for JBJS Clinical Classroom, Honorarium for being course faculty for the Medartis Pre-Hand Fellowship Primer, Editorial Board for ePlasty journal, and SurgiColl)
Declaration of funding
The authors received NO financial support for the preparation, research, authorship, and publication of this manuscript.
Declaration of ethical approval for study
Exempt from IRB review, as no human data were collected.
Declaration of informed consent
Not applicable
