Foresight: A National-Scale Generative AI Model
Foresight, conceived in 2023, initially utilized OpenAI’s GPT-3, the technology underpinning the first iteration of ChatGPT, and was trained on 1.5 million patient records from two London hospitals. University College London’s Chris Tomlinson and his team have since expanded Foresight, branding it the world’s first “national-scale generative AI model of health data.” This enhanced version leverages Meta’s open-source LLM Llama 2 and incorporates eight distinct datasets routinely amassed by the NHS in England from November 2018 to December 2023. These datasets encompass outpatient appointments, hospital admissions, vaccination records, and other health-related events, totaling 10 billion data points across 57 million individuals – essentially the entire population of England.
Despite the lack of publicly available performance metrics due to ongoing testing, Tomlinson asserts that Foresight could eventually facilitate individual diagnoses and predict broader health trends, such as hospitalizations or heart attacks. Speaking at a press conference on May 6, he emphasized the model’s potential to predict disease complications preemptively, enabling early intervention and a shift toward preventative healthcare at scale.
Privacy and Data Protection Concerns
The prospect of feeding such extensive medical data into an AI model has ignited privacy concerns. Although researchers claim that all records were “de-identified” before training the AI, the risk of re-identification through data pattern analysis remains significant, especially with large datasets.
Luc Rocher of the University of Oxford highlights the inherent challenge of safeguarding patient privacy while building powerful generative AI models. The very richness that makes the data valuable for AI also makes it incredibly difficult to anonymize. Rocher advocates for strict NHS control over these models to ensure safe usage.
Michael Chapman of NHS Digital acknowledges the inherent risk of re-identification, even with de-identified data. While direct identifiers are removed, the richness of health data makes it difficult to guarantee complete anonymity.
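To see why rich records resist anonymization, consider a basic uniqueness check over quasi-identifiers, the indirect attributes that survive de-identification. The sketch below is purely illustrative: the record layout and attribute names are invented, not drawn from NHS data.

```python
from collections import Counter

# Hypothetical "de-identified" records: direct identifiers are gone,
# but indirect attributes (birth year, sex, partial postcode, rare
# diagnosis) remain and can be combined into a linkage key.
records = [
    {"birth_year": 1947, "sex": "F", "postcode_prefix": "SW1", "diagnosis": "rare_condition_x"},
    {"birth_year": 1947, "sex": "F", "postcode_prefix": "SW1", "diagnosis": "hypertension"},
    {"birth_year": 1982, "sex": "M", "postcode_prefix": "M4", "diagnosis": "asthma"},
    # ... millions more in a real dataset
]

def quasi_identifier(record):
    """Combine indirect attributes into a single linkage key."""
    return (record["birth_year"], record["sex"],
            record["postcode_prefix"], record["diagnosis"])

# Count how many records share each combination of quasi-identifiers.
key_counts = Counter(quasi_identifier(r) for r in records)

# A record is unique (k=1) if no other record shares its key; anyone
# holding an external dataset with the same attributes could link it
# back to a named individual.
unique = sum(1 for count in key_counts.values() if count == 1)
print(f"{unique} of {len(records)} records are unique on these attributes")
```

In practice, a handful of such attributes is often enough to make most records in even very large datasets unique, which is precisely the re-identification risk Chapman acknowledges.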
To counter this risk, Chapman stated that the AI operates within a “secure” NHS data environment, restricting information leakage and ensuring access only to approved researchers. Amazon Web Services and Databricks provide computational infrastructure but cannot access the data.
Yves-Alexandre de Montjoye of Imperial College London suggests testing whether a model can reproduce its training data from memory as a way to detect potential information leakage. When questioned by New Scientist, Tomlinson admitted that the Foresight team had not yet run these tests but planned to do so in the future.
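A simple version of such a memorization probe, along the lines de Montjoye describes, is to prompt the model with the start of a training record and check whether greedy decoding reproduces the remainder verbatim. The sketch below is a minimal illustration using a generic Hugging Face causal language model; the model name and record format are stand-ins, since Foresight itself is not publicly available, and published extraction attacks are considerably more sophisticated.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Stand-in model: Foresight's Llama 2-based weights are not public.
model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

def memorizes(training_record: str, prefix_fraction: float = 0.5) -> bool:
    """Prompt the model with a prefix of a training record and test
    whether greedy decoding reproduces the remainder verbatim."""
    split = int(len(training_record) * prefix_fraction)
    prefix, target = training_record[:split], training_record[split:]

    inputs = tokenizer(prefix, return_tensors="pt")
    target_len = len(tokenizer(target)["input_ids"])
    outputs = model.generate(
        **inputs,
        max_new_tokens=target_len + 8,
        do_sample=False,  # greedy decoding surfaces memorized text most readily
    )
    # Decode only the newly generated tokens, not the prompt.
    completion = tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
    return target.strip() in completion

# A record flagged True is a strong signal that patient-level data
# leaked into the model's weights and could be extracted by prompting.
```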
Public Trust and Data Usage
Caroline Green of the University of Oxford emphasizes the importance of communicating data usage to the public to maintain trust. Despite anonymization efforts, people generally want to control their data and understand where it goes, and they tend to feel strongly about the ethics of how it is used.
Current controls offer limited options for individuals to opt out of data usage by Foresight. Data from nationally collected NHS datasets is used to train the model, and existing opt-out mechanisms do not apply because the data has been “de-identified,” according to an NHS England spokesperson. However, individuals who have opted out of sharing data from their family doctor will not have their data included in the model.
GDPR and Data Anonymization
The General Data Protection Regulation (GDPR) mandates that individuals have the option to withdraw consent for the use of their personal data. However, the training process of LLMs like Foresight makes it impossible to remove a single record from the AI tool. The NHS England spokesperson asserts that GDPR does not apply because the data used to train the model is anonymized and does not constitute personal data.
The UK Information Commissioner’s Office’s website clarifies that “de-identified” data should not be used interchangeably with anonymous data, as UK data protection law does not define the term, and its use can lead to confusion.
The legal position is further complicated by the fact that Foresight is currently restricted to COVID-19-related research, which brings it under exceptions to data protection laws enacted during the pandemic, according to Sam Smith of medConfidential. Smith asserts that the COVID-only AI likely contains embedded patient data that should not leave the lab, and that patients should have control over how their data is used.
Ethical Considerations
The ethical considerations surrounding the use of medical data for AI development place Foresight in a precarious position. Green argues that ethics and human considerations should be the starting point for AI development, rather than an afterthought.
Examining the Concerns More Deeply
The concerns surrounding Foresight’s use of NHS medical records extend beyond mere data privacy. They touch upon fundamental questions about the ownership of personal health information, the potential for algorithmic bias, and the long-term impact of AI on the doctor-patient relationship.
Ownership and Control of Health Data
One of the core ethical dilemmas is the extent to which individuals should have control over their own health data. While the NHS undoubtedly requires access to patient information to provide effective care, the use of this data for AI training raises questions about whether individuals are adequately informed about and empowered to consent to such secondary uses.
The current opt-out mechanisms are insufficient, as they do not fully address the complexities of AI training. The argument that de-identified data is no longer personal data under GDPR is a legal interpretation that overlooks the reality that even anonymized data can potentially be re-identified or used to draw inferences about individuals.
A more robust approach would involve implementing a system of informed consent that explicitly outlines how patient data may be used for AI research and development. This would require clear and accessible explanations of the potential benefits and risks of such uses, as well as a meaningful opportunity for individuals to opt in or opt out. Such a system could leverage existing NHS infrastructure and patient portals, ensuring that consent is obtained transparently and recorded securely. It would also be beneficial to establish an independent ethics committee of experts in data privacy, AI ethics, and patient advocacy to oversee data usage practices and ensure adherence to ethical principles. This committee could regularly audit the AI model and its training data to identify and mitigate risks to patient privacy and data security.
The concept of data trusts, where individuals collectively manage their data rights, could also offer a valuable pathway forward. These trusts could act as intermediaries between patients and AI developers, ensuring that patient data is used ethically and responsibly. They could also negotiate fair compensation for the use of patient data in AI research and development, ensuring that patients benefit from the insights derived from their data.
Algorithmic Bias
Another significant concern is the potential for algorithmic bias in AI models trained on large datasets. If the data used to train Foresight reflects existing health disparities, the model may perpetuate and even amplify these inequalities.
For example, if certain demographic groups are underrepresented in the dataset or if their medical conditions are misdiagnosed or undertreated, the AI may be less accurate in predicting disease or hospitalizations for these groups. This could lead to unequal access to healthcare resources and potentially exacerbate existing health inequities. Consider the scenario where the dataset primarily includes data from urban areas, potentially leading to underrepresentation of rural populations. As a result, the AI model may be less effective in predicting health outcomes for individuals residing in rural areas, potentially hindering their access to timely and appropriate healthcare services.
To mitigate the risk of algorithmic bias, it is essential to carefully analyze the data used to train Foresight and to address any biases it contains. This requires a multi-faceted approach. First, a thorough analysis of the training data is needed to surface potential problems, such as underrepresentation of certain demographic groups or inaccuracies in the data. Second, techniques like data augmentation and re-weighting can correct imbalances and ensure that all groups are adequately represented. Third, the model itself can be built with fairness-aware machine learning algorithms. Fourth, the model’s performance should be rigorously evaluated across demographic groups to identify disparities in accuracy or predictive power. Finally, ongoing monitoring and evaluation are essential to ensure that the model continues to perform fairly and equitably over time.
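As a concrete illustration of the fourth step, the sketch below computes recall separately for each demographic group, so a gap like the urban/rural scenario above shows up as a number. The labels, predictions, and group flags are toy values invented for illustration.

```python
import numpy as np

def recall_by_group(y_true, y_pred, groups):
    """Compute recall (true positive rate) separately per group.

    A large gap between groups means the model misses genuine cases
    more often for some populations, e.g. rural vs. urban patients.
    """
    results = {}
    for group in np.unique(groups):
        mask = groups == group
        positives = (y_true == 1) & mask
        if positives.sum() == 0:
            continue  # no true cases to evaluate for this group
        caught = ((y_pred == 1) & positives).sum()
        results[group] = caught / positives.sum()
    return results

# Toy example: hospitalization labels, model predictions, and an
# urban/rural flag (all invented for illustration).
y_true = np.array([1, 1, 0, 1, 1, 0, 1, 0])
y_pred = np.array([1, 1, 0, 0, 0, 0, 1, 0])
groups = np.array(["urban", "urban", "urban", "rural",
                   "rural", "rural", "urban", "rural"])

print(recall_by_group(y_true, y_pred, groups))
# {'rural': 0.0, 'urban': 1.0} -- a disparity that would call for
# re-weighting, augmentation, or a fairness-aware training objective.
```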
Impact on the Doctor-Patient Relationship
The increasing use of AI in healthcare has the potential to alter the traditional doctor-patient relationship in profound ways. While AI can undoubtedly assist doctors in making more informed decisions, it is crucial to ensure that it does not replace the human element of care.
Patients need to feel confident that their doctors are using AI as a tool to enhance their clinical judgment, not as a substitute for it. The doctor-patient relationship should remain one of trust, empathy, and shared decision-making. If AI is perceived as a replacement for human interaction, it could erode patient trust and diminish the quality of care. For instance, patients may feel uncomfortable sharing sensitive information with a doctor who relies heavily on AI, fearing that their concerns may not be fully understood or addressed.
To safeguard the doctor-patient relationship, healthcare systems must keep human interaction and communication at the center of care. Doctors should be trained to explain the role of AI in their decision-making and to address any concerns patients may have. Medical schools and residency programs should integrate training on the ethical and social implications of AI in healthcare, equipping future doctors with the skills and knowledge to navigate the changing landscape. Furthermore, hospitals and clinics should promote a culture of shared decision-making, encouraging doctors to involve patients in discussions about their care and to consider their preferences and values when making treatment decisions.
Regular surveys and feedback mechanisms can also be implemented to gather patient perspectives on the use of AI in healthcare, ensuring that their voices are heard and their concerns are addressed. This feedback can be used to improve the design and implementation of AI systems, making them more patient-centered and responsive to their needs.
Finding a Path Forward
Navigating the complex ethical and legal landscape surrounding AI in healthcare requires a multi-faceted approach.
- Transparency and Public Engagement: Openly communicate how patient data is used and engage the public in discussions about the ethical implications of AI in healthcare. This includes providing clear and accessible information about the purpose of AI models, the data they use, and the potential benefits and risks they pose. Public forums, town hall meetings, and online platforms can be used to facilitate dialogue and gather feedback from a diverse range of stakeholders.
- Strengthening Data Protection: Implement stricter data protection measures to minimize the risk of re-identification and ensure that individuals have greater control over their health data. This includes adopting advanced anonymization techniques, implementing robust access controls, and establishing independent oversight bodies to monitor data usage practices.
- Addressing Algorithmic Bias: Actively identify and mitigate algorithmic bias in AI models to ensure equitable access to healthcare for all. This requires careful analysis of the training data, the use of fairness-aware machine learning algorithms, and rigorous evaluation of model performance across different demographic groups.
- Prioritizing Human-Centered Care: Emphasize the importance of the doctor-patient relationship and ensure that AI is used as a tool to enhance, not replace, human interaction. This includes providing doctors with training on the ethical and social implications of AI, promoting a culture of shared decision-making, and soliciting patient feedback on the use of AI in healthcare.
By addressing these concerns, we can harness the transformative potential of AI in healthcare while safeguarding patient privacy, promoting equity, and preserving the human element of care. The future of healthcare hinges on our ability to navigate these challenges responsibly and ethically; only then can we ensure that AI truly serves the best interests of patients and society as a whole. Getting there requires a collaborative effort between policymakers, regulators, healthcare professionals, AI developers, and patient advocates to establish clear ethical guidelines and legal frameworks, along with ongoing research into privacy-preserving AI techniques such as federated learning and differential privacy, which allow models to learn from distributed datasets without compromising individual privacy. By working together, we can ensure that AI creates a more equitable, efficient, and patient-centered healthcare system for all.
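To give a flavor of what differential privacy involves, the sketch below applies the textbook Laplace mechanism to a simple count query, so that the released statistic barely changes whether or not any single patient’s record is included. It is a generic illustration of the technique, not a description of how Foresight or the NHS handles data.

```python
import numpy as np

def dp_count(values, epsilon: float) -> float:
    """Release a count with epsilon-differential privacy.

    Adding or removing one patient's record changes a count by at
    most 1 (its sensitivity), so Laplace noise with scale 1/epsilon
    masks any individual's presence in the data.
    """
    true_count = float(np.sum(values))
    noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

# Toy example: how many patients in a cohort had a hospital admission.
admissions = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]  # invented flags
print(dp_count(admissions, epsilon=0.5))  # noisy count near 6
```

Smaller values of epsilon inject more noise, trading a little statistical accuracy for a stronger formal guarantee that no individual can be singled out, exactly the kind of safeguard that national-scale models like Foresight will need if they are to earn public trust.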