Generative AI for Healthcare Privacy

Mandar Karhade, MD, PhD
Published in Towards AI · Dec 9, 2022

Credits: Generative AI by DALL-E 2

What is Generative AI

Generative AI is a type of artificial intelligence that involves the use of machine learning algorithms to generate new content based on a set of input data. This can include generating text, images, or other types of media.

Generative AI typically involves training a model on a large dataset of existing content, such as a corpus of text or a collection of images. The model then uses this training to generate new content that is similar to the input data but is not an exact copy.

For example, a generative AI model trained on a dataset of images of faces could be used to generate new, previously unseen images of faces. These generated images would share the general characteristics of the training images, such as the shape of the face and the placement of features like the eyes and nose, but they would not be exact copies of any image in the dataset. Here are some popular websites where you can try out generative AI for art: NightCafe, DALL-E 2, Deep Dream Generator, Artbreeder, and DeepAI.
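To make the idea of "learning patterns from a corpus and sampling new, similar content" concrete, here is a deliberately tiny sketch: a character-level Markov chain. This is nothing like the large neural models behind DALL-E or ChatGPT, but it illustrates the same loop of training on existing data and then generating new content that resembles, without copying, the input. All function names here are my own illustration, not from any library.

```python
import random
from collections import defaultdict

def train(corpus, order=2):
    """Map each length-`order` context to the characters observed after it."""
    model = defaultdict(list)
    for i in range(len(corpus) - order):
        model[corpus[i:i + order]].append(corpus[i + order])
    return model

def generate(model, seed, order, length=40):
    """Sample new text that mimics the corpus's local statistics."""
    out = seed
    for _ in range(length):
        choices = model.get(out[-order:])
        if not choices:  # context never seen in training: stop
            break
        out += random.choice(choices)
    return out

corpus = "the patient reports mild pain. the patient denies fever. "
model = train(corpus, order=3)
print(generate(model, "the", order=3, length=50))
```

The output reads like the training text but is stitched together probabilistically; scaling this intuition up to billions of parameters is, loosely speaking, what large generative models do.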

One of the key advantages of generative AI is its ability to create new content without requiring a human to author each item. This can be useful in a variety of applications, such as creating content for marketing or advertising, generating data for machine learning training, or even creating new works of art. Generative AI is an important and rapidly developing area of artificial intelligence that has the potential to revolutionize many fields.

Privacy regulation in Healthcare (HIPAA)

The Health Insurance Portability and Accountability Act of 1996 (HIPAA) is a federal law that requires the creation of national standards to protect sensitive patient health information from being disclosed without the patient’s consent or knowledge. The US Department of Health and Human Services (HHS) issued the HIPAA Privacy Rule to implement the requirements of HIPAA. The HIPAA Security Rule protects a subset of information covered by the Privacy Rule.

HIPAA also provides guidance on protecting patient privacy by defining two methods of patient de-identification. Under Expert Determination, an expert attests that the statistical probability of identifying a patient is "very small". Under Safe Harbor, 18 types of identifiers are removed from patient records, including analytically important information such as dates and geographic localities.
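A toy sketch of the Safe Harbor idea: scan free text for identifier patterns and replace them with placeholders. This covers only two of the 18 identifier types with simplistic regexes; a compliant pipeline needs far broader coverage (names, geography, medical record numbers, and so on) and clinical validation. The pattern set and function names are illustrative assumptions, not a real de-identification tool.

```python
import re

# Toy patterns for two of the 18 Safe Harbor identifier types.
PATTERNS = {
    "DATE":  re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def redact(note: str) -> str:
    """Replace matched identifiers with bracketed placeholders."""
    for label, pattern in PATTERNS.items():
        note = pattern.sub(f"[{label}]", note)
    return note

note = "Seen on 03/14/2021; call back at 555-867-5309."
print(redact(note))
# Seen on [DATE]; call back at [PHONE].
```

Even this simple example shows the core trade-off the article raises next: the redacted note loses the date, which may itself be analytically valuable.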

Source: HHS.gov

What is the problem with privacy?

The healthcare field stores data in both structured and unstructured forms. The patient note remains the primary place where healthcare information is stored as free text. These notes contain all kinds of information that falls under the HIPAA Privacy Rule as potentially identifiable. Such information is difficult to remove from text reliably, which limits the utility of healthcare data in research. Suppose we trained a large language model on healthcare data. GitHub Copilot has been shown to reproduce verbatim code from various developers, and GitHub is already facing a lawsuit over replicating code and intellectual property from its training data. Now, imagine if a large language model trained on healthcare data started completing prompts with real patient names and symptoms! Yikes.

Solution: Generative models to the rescue

The ideal solution for enabling the use of large-scale healthcare data is to keep the information about the disease intact (needed for research) while discarding the information about the person (identifiable). What if we could replace identifiable information with fake information using a generative model? It is the same concept as websites like thispersondoesnotexist.com, applied to replacing real identifiable information. Ideally, it would keep all the information necessary for analyses while obfuscating anything that could identify a unique patient. This allows sharing of information without worrying about breaching patients' privacy.
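A minimal sketch of the "replace, don't redact" idea: map each real name to a consistent fake one, so a patient who appears in many notes gets the same surrogate everywhere and cross-note linkage survives for analysis. This is simple pseudonymization, far weaker than the privacy guarantees discussed below, and the name pool, hashing scheme, and function names are my own illustrative assumptions; a real system would generate surrogates with a model rather than draw from a fixed list.

```python
import hashlib

# Tiny pool of surrogate names; purely illustrative.
FAKE_NAMES = ["Alex Rivera", "Sam Chen", "Jordan Patel", "Casey Kim"]

def surrogate(real_name: str) -> str:
    """Deterministically map a real name to a fake one, so the same
    patient gets the same surrogate in every note (preserves linkage)."""
    digest = hashlib.sha256(real_name.encode()).hexdigest()
    return FAKE_NAMES[int(digest, 16) % len(FAKE_NAMES)]

def pseudonymize(note: str, names: list[str]) -> str:
    """Swap each known real name for its surrogate throughout the note."""
    for name in names:
        note = note.replace(name, surrogate(name))
    return note

note = "John Smith reports chest pain. John Smith denies fever."
print(pseudonymize(note, ["John Smith"]))
```

The clinical content (chest pain, fever) is untouched, which is exactly the property that redaction-style de-identification struggles to preserve.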

Syntegra was founded more than three years ago with this same vision. According to Syntegra, healthcare in particular is a vertical where generative AI can reduce the friction of data access, reduce physician burnout, and help automate manual, time-intensive tasks. Syntegra is developing an AI that can allow sharing of individual-level medical data in a way that maintains its statistical patterns and utility while guaranteeing patients' privacy.

A frictionless, rapid, low-burden access to healthcare data opens up huge opportunities for researchers, life science companies, insurance providers, and digital health companies. If successful, it is likely to drive innovation in precision medicine, analytics, and clinical decision support and ultimately accelerate advances in patient care.

Photo by Ryoji Iwata on Unsplash

Syntegra releases Medical Mind 2.0

On Nov 30th, 2022, Syntegra announced the release of Medical Mind 2.0, trained on a dataset of more than 20 million patient records. This proprietary AI generates realistic (but not real), high-fidelity, privacy-guaranteed synthetic healthcare data.

Closing thoughts

Given the investments in AI and the breakthroughs this year with ChatGPT, DALL-E, and other large AI models, it is plausible that the healthcare field will get a break from the AI winter! As a clinical scientist, I am certainly looking forward to it.

For more reading:

Kasthurirathne SN, Dexter G, Grannis SJ. Generative Adversarial Networks for Creating Synthetic Free-Text Medical Data: A Proposal for Collaborative Research and Re-use of Machine Learning Models. AMIA Jt Summits Transl Sci Proc. 2021 May 17;2021:335–344. PMID: 34457148; PMCID: PMC8378601.
