INTRODUCTION:This study aimed to evaluate the accuracy, reliability, and comprehensibility of information about surgically assisted rapid palatal expansion provided by artificial intelligence (AI)-based language models.
METHODS:A cross-sectional content analysis was conducted on the responses to questions related to surgically assisted rapid palatal expansion generated by ChatGPT-4 (OpenAI LLC, San Francisco, Calif), Gemini (Alphabet Inc, Mountain View, Calif), and Copilot (Microsoft, Redmond, Wash). In total, 115 questions, categorized into 11 domains, were created by 3 orthodontists and 1 oral and maxillofacial surgeon. The accuracy of the answers generated by the AI language models was independently evaluated by the same experts using a 5-point Likert scale. Relationships among categorical variables were tested with the Pearson chi-square test when the sample size assumption was met and with Fisher's exact test when it was not. Analyses were performed in SPSS (version 27; IBM, Armonk, NY).
RESULTS:The responses showed a generally homogeneous distribution, with no statistically significant association between AI type and response type (P >0.05). Although the differences were not significant, ChatGPT-4 had the highest rate of objectively true responses, Gemini produced answers with more balanced accuracy, and Copilot had the highest number of false answers.
CONCLUSIONS:These findings indicate that the accuracy of AI-supported language models in providing medical information may vary according to subject matter.