Assessing ChatGPT's Responses to Otolaryngology Patient Questions.

Basic Information

DOI: 10.1177/00034894241249621
Publication date: 2024
Journal: The Annals of Otology, Rhinology, and Laryngology
Impact factor: --
Corresponding author: Jessica R. Levi
CAS division: --
Document type: --
Authors: Jonathan M. Carnino; William R. Pellegrini; Megan Willis; Michael B. Cohen; Marianella Paz; Elizabeth M. Davis; Gregory A. Grillone; Jessica R. Levi
Research area: --
MeSH terms: --
Keywords: --

Abstract

OBJECTIVE: This study aims to evaluate ChatGPT's performance in addressing real-world otolaryngology patient questions, focusing on accuracy, comprehensiveness, and patient safety, to assess its suitability for integration into healthcare.

METHODS: A cross-sectional study was conducted using patient questions from Reddit's public r/AskDocs forum, where medical advice is sought from healthcare professionals. Patient questions were input into ChatGPT (GPT-3.5), and the responses were reviewed by 5 board-certified otolaryngologists. Evaluation criteria included difficulty, accuracy, comprehensiveness, and bedside manner/empathy. Statistical analysis explored the relationship between patient question characteristics and ChatGPT response scores. Potentially dangerous responses were also identified.

RESULTS: Patient questions averaged 224.93 words, while ChatGPT responses were longer at 414.93 words. ChatGPT responses scored 3.76/5 for accuracy, 3.59/5 for comprehensiveness, and 4.28/5 for bedside manner/empathy. Longer patient questions did not correlate with higher response ratings; however, longer ChatGPT responses scored higher in bedside manner/empathy, and higher question difficulty correlated with lower comprehensiveness. Five responses were flagged as potentially dangerous.

CONCLUSION: While ChatGPT shows promise in addressing otolaryngology patient questions, this study demonstrates its limitations, particularly in accuracy and comprehensiveness. The identification of potentially dangerous responses underscores the need for a cautious approach to using AI for medical advice. Responsible integration of AI into healthcare requires thorough assessment of model performance and ethical consideration of patient safety.
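To illustrate the kind of analysis the methods describe, below is a minimal sketch of a correlation test between question/response length and reviewer scores. It is an assumption-laden illustration: the abstract does not name the statistical test used (Spearman correlation is assumed here), and the data values and column names are hypothetical, not the study's data.

# Minimal sketch of the length-vs-rating analysis described in the abstract.
# Assumptions: Spearman correlation (the abstract does not state the test);
# the DataFrame below is hypothetical illustrative data, not study data.
import pandas as pd
from scipy.stats import spearmanr

# Hypothetical table: one row per patient question, with the mean of the
# five otolaryngologists' ratings per criterion (1-5 scale).
df = pd.DataFrame({
    "question_words":     [180, 240, 310,  95, 420],
    "response_words":     [390, 415, 450, 360, 510],
    "accuracy":           [4.0, 3.5, 3.8, 4.2, 3.2],
    "comprehensiveness":  [3.6, 3.4, 3.9, 4.0, 3.0],
    "empathy":            [4.1, 4.3, 4.5, 4.0, 4.4],
})

# Does question length predict ratings? (The study found it did not.)
for criterion in ["accuracy", "comprehensiveness", "empathy"]:
    rho, p = spearmanr(df["question_words"], df[criterion])
    print(f"question_words vs {criterion}: rho={rho:.2f}, p={p:.3f}")

# Does response length predict empathy? (The study found longer responses
# scored higher on bedside manner/empathy.)
rho, p = spearmanr(df["response_words"], df["empathy"])
print(f"response_words vs empathy: rho={rho:.2f}, p={p:.3f}")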
References (2)
Cited by (0)
ChatGPT vs. web search for patient questions: what does ChatGPT do better?
Comparing Physician and Artificial Intelligence Chatbot Responses to Patient Questions Posted to a Public Social Media Forum
DOI: 10.1001/jamainternmed.2023.1838
Publication date: 2023-04-28
Impact factor: 39
Authors: Ayers, John W.; Poliak, Adam; Smith, Davey M.
Corresponding author: Smith, Davey M.