
    A comprehensive analysis of ethical issues surrounding large language models in healthcare and medicine.

    Abstract

    The rapid integration of large language models (LLMs) into healthcare presents opportunities for enhancing diagnosis, treatment planning, and patient engagement. However, it also surfaces serious ethical challenges, many of which remain inadequately addressed. Following established guidelines, this review analyzed 27 peer-reviewed studies published between 2017 and 2025, retrieved from four major open-access databases under strict eligibility criteria and synthesized with robust methods, to explore the ethical implications of deploying LLMs in clinical settings. We highlight four key aspects: the prominent ethical issues identified, the model architectures most commonly analyzed, the healthcare application domains that attract the most scrutiny, and publication and bibliographic trends within this literature. Our synthesis reveals that bias and fairness (25.9% of studies) were the most frequently discussed ethical concerns, closely followed by safety, reliability, transparency, accountability, and privacy. Notably, the GPT family of models predominated (51.8%) among those analyzed. Although significant attention was given to privacy protection and bias mitigation, no review comprehensively covered the full spectrum of ethical issues associated with LLM deployment in healthcare, and the studies typically focused on specific clinical subdomains and lacked rigorous methodology. This systematic mapping of open-access literature not only identifies prevalent ethical patterns but also pinpoints open challenges, outlines future research directions, and proposes a provisional ethical integration framework to guide stakeholders in responsibly incorporating LLMs into clinical practice.


    Introduction

    Artificial Intelligence (AI) aims to imbue computer systems with cognitive capabilities akin to those found in humans, covering tasks such as perception, reasoning, and decision-making. A subset of AI, deep learning, has made significant inroads with the advent of transformer architectures that utilize self-attention mechanisms for efficient parallel processing, allowing for the analysis of large sets of sequential data.
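    The self-attention mechanism mentioned above can be illustrated with a minimal sketch. The code below is a simplified, single-head version of scaled dot-product attention (no masking, no multi-head splitting, random illustrative weights), not a faithful reproduction of any production model; all names and dimensions are assumptions for illustration.

```python
import numpy as np

def self_attention(x, Wq, Wk, Wv):
    """Minimal single-head scaled dot-product self-attention.

    x: (seq_len, d_model) token embeddings; Wq/Wk/Wv project the
    embeddings to queries, keys, and values. Every token attends to
    every other token in one matrix product, which is what enables
    the parallel processing of sequential data described above.
    """
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    d_k = k.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)                 # pairwise token affinities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ v                              # weighted mix of values

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
x = rng.standard_normal((seq_len, d_model))
Wq, Wk, Wv = (rng.standard_normal((d_model, d_model)) for _ in range(3))
out = self_attention(x, Wq, Wk, Wv)
print(out.shape)  # (4, 8): one contextualized vector per input token
```

    Real transformers stack many such layers, each with multiple heads, but the core computation is the same.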

    The recent rise of large language models (LLMs) capitalizes on transformers and involves pre-training on extensive textual corpora before optimization for specific tasks. Noteworthy examples of these models include:

    • Claude from Anthropic
    • Google’s Bard/Gemini
    • Meta’s LLaMA family
    • OpenAI’s GPT series, which includes GPT-3.5 and GPT-4

    These models demonstrate remarkable capabilities, such as generating coherent text, summarizing documents, and engaging in multilingual conversations. Their emerging applications within healthcare, especially in clinical decision support and patient interactions, signify a paradigm shift in accessing and utilizing medical knowledge. However, leveraging LLMs in clinical contexts also surfaces critical ethical concerns that cannot be overlooked. Issues such as biases in training datasets can foster unfair outcomes, while the “black-box” nature of LLMs complicates the decision-making process, potentially endangering patient safety.

    Motivation for the Review

    This review responds to the mounting use of LLMs in healthcare and acknowledges the inadequacies of existing literature. Previous studies often reveal inconsistency in evaluation methods, underrepresentation of medical-domain LLMs, and limited ethical analysis, particularly concerning non-binary identities and regulatory frameworks. Hence, there is an evident demand for a rigorously conducted synthesis that explores the ethical challenges and governance implications of LLM deployment in diverse healthcare scenarios.

    We conducted a systematic literature review according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, ensuring methodological rigor. Here’s an overview of the phases involved in the review:

    1. Preliminary Study: Defined research questions and identified relevant search terms.
    2. Screening Process: Analyzed 316 records retrieved from major databases, narrowing down to relevant studies.
    3. Eligibility and Quality Assessment: Evaluated the selected studies based on predefined criteria, resulting in 27 primary studies focusing on ethical considerations related to LLMs in healthcare.
    4. Data Extraction and Compilation: Gathered bibliographic information and ethical variables for comparative analysis.
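    The funnel described in the steps above can be sanity-checked in a few lines. Only the endpoints (316 retrieved records, 27 included studies) are reported here; the intermediate counts of the screening stages are not given, so this sketch computes just the overall inclusion rate.

```python
# Endpoints of the screening funnel; intermediate stage counts
# (duplicates removed, title/abstract exclusions) are not reported.
records_retrieved = 316
studies_included = 27

inclusion_rate = studies_included / records_retrieved
print(f"{inclusion_rate:.1%} of retrieved records were included")  # 8.5%
```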

    Objectives of the Study

    The primary objectives stem from our interest in identifying the main ethical issues associated with LLM deployment in healthcare, surveying the prevalent model architectures, mapping the most scrutinized healthcare applications, and examining publication and bibliographic trends within the existing literature.


    Overview of Large Language Models

    Evolution of LLMs

    Large Language Models (LLMs) are advanced neural networks trained on vast amounts of textual data, enabling them to understand and produce human-like language. The period between 2017 and 2025 saw significant advancements in LLM development, starting with the Transformer architecture, followed by models like BERT and GPT.

    • Generative Pre-Trained Transformer (GPT) Models:
      • GPT models like GPT-3.5 and GPT-4 are designed for a variety of applications in healthcare, including clinical note drafting and decision-support tools.
    • Bidirectional Encoder Representations from Transformers (BERT) Models:
      • BERT and its variants, like BioBERT, excel at tasks such as entity extraction but can inherit biases, complicating ethical deployments.
    • LLaMA (Large Language Model Meta AI):
      • These openly released models facilitate community fine-tuning under varied licensing terms, though their open availability raises concerns about accountability.

    Ethical Considerations in Using LLMs in Healthcare

    Numerous ethical concerns resonate in the context of using LLMs in healthcare, which we categorize into several core areas:

    1. Safety and Reliability

    Ensuring that LLM outputs lead to safe and reliable outcomes is paramount, given their potential integration into clinical workflows. Various studies emphasize that reliance on LLM-generated outputs could jeopardize patient safety due to possible “hallucinations”: plausible-sounding yet factually incorrect content.

    2. Privacy and Security

    Data privacy is a significant ethical concern with LLMs. Past studies have emphasized the need for stringent privacy controls during training and deployment, ensuring patient data is protected under frameworks like HIPAA and the GDPR.

    3. Bias and Fairness

    Bias and fairness are often highlighted as major ethical challenges, affecting demographic groups through LLM decisions. Evaluations show that model biases can adversely influence healthcare accessibility and quality, requiring targeted debiasing measures.

    4. Transparency and Explainability

    The opacity of LLM decision-making is an ethical red flag, as it makes it difficult for clinicians to validate AI-supported conclusions. Transparency is vital for fostering trust among patients and healthcare providers.

    5. Accountability and Legal Issues

    Determining accountability in the face of LLM-driven decisions poses challenges. Legal frameworks have yet to establish clear guidelines regarding responsibility for adverse outcomes stemming from AI recommendations.

    6. Misinformation and Trust

    Misinformation generated by LLMs carries significant risks. Studies have shown that unregulated outputs may contribute to an “AI-driven infodemic,” affecting patient safety and trust in healthcare systems.


    Methodology Summary

    This review adheres to the PRISMA and Kitchenham guidelines to maintain methodological rigor. Through a systematic four-phase process that incorporates comprehensive keyword identification and stringent selection criteria, the following key aspects were distilled:

    • Ethical Issues: Major ethical concerns gathered from the literature.
    • Prevalent Model Frameworks: Analysis of the LLM architectures commonly discussed within the studies.
    • Application Domains: Areas of special focus demonstrating the practical impact of LLMs in clinical settings.
    • Publication Patterns: Insights into publication trends and characteristics within the corpus of literature.

    Results and Discussion

    Main Ethical Issues Highlighted

    Research findings identify several recurring ethical themes:

    1. Bias and Fairness: Predominant discussions focus on unfair treatment across demographic groups, underscoring a need for rigorous auditing and mitigation strategies.
    2. Safety and Reliability: Concerns regarding the reliability of LLM outputs point to the necessity of validation processes before clinical integration.
    3. Transparency and Explainability: Several studies reveal challenges in conveying AI logic to clinicians, signaling the need for robust explainability frameworks.
    4. Accountability and Legal Concerns: The legal repercussions of LLM-driven decisions warrant urgent discussions surrounding regulatory compliance and governance.
    5. Privacy and Security: Safeguarding user data throughout model training and application remains a critical focus area.

    Prevalent Models

    Models from the GPT family appear most frequently among the reviewed studies, reflecting a concentrated evaluation of widely adopted generative frameworks, though the corpus also shows broader interest in alternative architectures.
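    The shares reported in the abstract (25.9% for bias and fairness, 51.8% for GPT-family models) map back onto the 27 included studies as whole counts. A quick check, assuming each percentage is a simple truncated fraction of the study pool:

```python
# Convert the reported shares back into study counts out of 27.
total_studies = 27
shares = {"bias and fairness": 0.259, "GPT-family models": 0.518}

counts = {k: round(v * total_studies) for k, v in shares.items()}
print(counts)  # {'bias and fairness': 7, 'GPT-family models': 14}

# Inverse check: 7/27 ≈ 25.93% and 14/27 ≈ 51.85%, consistent with
# the reported figures (truncated to one decimal place).
assert round(0.259 * total_studies) == 7
assert round(0.518 * total_studies) == 14
```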

    Application Domains Scrutinized

    The analysis shows intense scrutiny in clinical decision support, mental health interventions, and patient engagement tools, suggesting a concentrated effort to ensure ethical LLM integration in these key areas.

    Publication Trends

    A surge in publications from 2020 onward indicates growing interest in the ethical implications of LLMs in healthcare, with notable contributions from various publishing platforms.


    In summary, the exploration of ethical considerations surrounding LLMs in healthcare reveals both promising potential applications of AI technologies and significant challenges that require ongoing attention. This review not only illuminates existing gaps but also proposes an ethical integration framework to aid stakeholders in navigating this complex landscape responsibly.
