Generative AI for Healthcare Privacy

Mandar Karhade, MD, PhD
Published in Towards AI · Dec 9, 2022

Credits: Generative AI by DALL-E 2

What is Generative AI

Generative AI is a type of artificial intelligence that involves the use of machine learning algorithms to generate new content based on a set of input data. This can include generating text, images, or other types of media.

Generative AI typically involves training a model on a large dataset of existing content, such as a corpus of text or a collection of images. The model then uses this training to generate new content that is similar to the input data but is not an exact copy.

For example, a generative AI model trained on a dataset of images of faces could be used to generate new, previously unseen images of faces. These generated images would share the general characteristics of the training images, such as the shape of the face and the placement of features like the eyes and nose, but they would not be exact copies of any image in the dataset. Here are some popular websites where you can try out generative AI for art: NightCafe, DALL-E 2, Deep Dream Generator, Artbreeder, and DeepAI.
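To make the idea of "learning patterns from a corpus and sampling new, similar content" concrete, here is a deliberately tiny sketch: a character-level Markov chain. This is nothing like the large neural models behind DALL-E or ChatGPT, but it illustrates the same loop of training on existing data and then generating new content that resembles, without copying, the input. All function names here are my own illustration, not from any library.

```python
import random
from collections import defaultdict

def train(corpus, order=2):
    """Map each length-`order` context to the characters observed after it."""
    model = defaultdict(list)
    for i in range(len(corpus) - order):
        model[corpus[i:i + order]].append(corpus[i + order])
    return model

def generate(model, seed, order, length=40):
    """Sample new text that mimics the corpus's local statistics."""
    out = seed
    for _ in range(length):
        choices = model.get(out[-order:])
        if not choices:  # context never seen in training: stop
            break
        out += random.choice(choices)
    return out

corpus = "the patient reports mild pain. the patient denies fever. "
model = train(corpus, order=3)
print(generate(model, "the", order=3, length=50))
```

The output reads like the training text but is stitched together probabilistically; scaling this intuition up to billions of parameters is, loosely speaking, what large generative models do.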

One of the key advantages of generative AI is its ability to create new content without requiring a human to author each item. This can be useful in a variety of applications, such as creating content for marketing or advertising, generating data for machine learning training, or even creating new works of art. Generative AI is an important and rapidly developing area of artificial intelligence that has the potential to revolutionize many fields.

Privacy regulation in Healthcare (HIPAA)

The Health Insurance Portability and Accountability Act of 1996 (HIPAA) is a federal law that requires the creation of national standards to protect sensitive patient health information from being disclosed without the patient’s consent or knowledge. The US Department of Health and Human Services (HHS) issued the HIPAA Privacy Rule to implement the requirements of HIPAA. The HIPAA Security Rule protects a subset of information covered by the Privacy Rule.

HIPAA also provides guidance on protecting patient privacy by defining two methods of patient de-identification. Under Expert Determination, an expert attests that the statistical probability of identifying a patient is "very small". Under Safe Harbor, 18 types of identifiers are removed from patient records, including analytically important information such as dates and geographic localities.
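A toy sketch of the Safe Harbor idea: scan free text for identifier patterns and replace them with placeholders. This covers only two of the 18 identifier types with simplistic regexes; a compliant pipeline needs far broader coverage (names, geography, medical record numbers, and so on) and clinical validation. The pattern set and function names are illustrative assumptions, not a real de-identification tool.

```python
import re

# Toy patterns for two of the 18 Safe Harbor identifier types.
PATTERNS = {
    "DATE":  re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def redact(note: str) -> str:
    """Replace matched identifiers with bracketed placeholders."""
    for label, pattern in PATTERNS.items():
        note = pattern.sub(f"[{label}]", note)
    return note

note = "Seen on 03/14/2021; call back at 555-867-5309."
print(redact(note))
# Seen on [DATE]; call back at [PHONE].
```

Even this simple example shows the core trade-off the article raises next: the redacted note loses the date, which may itself be analytically valuable.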

Source: HHS.gov

What is the problem with privacy?

The healthcare field stores data in both structured and unstructured forms. The patient note remains the primary place where healthcare information is stored as free text. These notes contain all kinds of information that falls under the HIPAA Privacy Rule as potentially identifiable. Such information is difficult to remove from text reliably, which limits the utility of healthcare data in research. Suppose we trained a large language model on healthcare data. GitHub Copilot has been shown to reproduce verbatim code from various developers, and GitHub is already facing a lawsuit over replicating code and intellectual property from its training data. Now, imagine if a large language model trained on healthcare data started completing prompts with real patient names and symptoms! Yikes.

Solution: Generative models to the rescue

The ideal solution for enabling the use of large-scale healthcare data is to keep the information about the disease intact (needed for research) while discarding the information about the person (identifiable). What if we could replace identifiable information with fake information using a generative model? It is the same concept as websites like thispersondoesnotexist.com, applied to replacing real identifiable information. Ideally, it would keep all the information necessary for analyses while obfuscating anything that could identify a unique patient. This allows sharing of information without worrying about breaching patients' privacy.
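A minimal sketch of the "replace, don't redact" idea: map each real name to a consistent fake one, so a patient who appears in many notes gets the same surrogate everywhere and cross-note linkage survives for analysis. This is simple pseudonymization, far weaker than the privacy guarantees discussed below, and the name pool, hashing scheme, and function names are my own illustrative assumptions; a real system would generate surrogates with a model rather than draw from a fixed list.

```python
import hashlib

# Tiny pool of surrogate names; purely illustrative.
FAKE_NAMES = ["Alex Rivera", "Sam Chen", "Jordan Patel", "Casey Kim"]

def surrogate(real_name: str) -> str:
    """Deterministically map a real name to a fake one, so the same
    patient gets the same surrogate in every note (preserves linkage)."""
    digest = hashlib.sha256(real_name.encode()).hexdigest()
    return FAKE_NAMES[int(digest, 16) % len(FAKE_NAMES)]

def pseudonymize(note: str, names: list[str]) -> str:
    """Swap each known real name for its surrogate throughout the note."""
    for name in names:
        note = note.replace(name, surrogate(name))
    return note

note = "John Smith reports chest pain. John Smith denies fever."
print(pseudonymize(note, ["John Smith"]))
```

The clinical content (chest pain, fever) is untouched, which is exactly the property that redaction-style de-identification struggles to preserve.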

Syntegra was founded more than three years ago with this same vision. According to Syntegra, healthcare in particular is a vertical where generative AI can reduce the friction of data access, reduce physician burnout, and help automate manual, time-intensive tasks. Syntegra is developing an AI that can allow sharing of individual-level medical data in a way that maintains its statistical patterns and utility while guaranteeing patients' privacy.

A frictionless, rapid, low-burden access to healthcare data opens up huge opportunities for researchers, life science companies, insurance providers, and digital health companies. If successful, it is likely to drive innovation in precision medicine, analytics, and clinical decision support and ultimately accelerate advances in patient care.

Photo by Ryoji Iwata on Unsplash

Syntegra releases Medical Mind 2.0

On Nov 30th, 2022, Syntegra announced the release of Medical Mind 2.0, trained on a dataset of more than 20 million patient records. This proprietary AI generates realistic (but not real), high-fidelity, privacy-guaranteed synthetic healthcare data.

Closing thoughts

Given the investments in AI and the breakthroughs this year with ChatGPT, DALL-E, and other large AI models, it is plausible that the healthcare field will get a break from the AI winter! As a clinical scientist, I am certainly looking forward to it.

For more reading:

Kasthurirathne SN, Dexter G, Grannis SJ. Generative Adversarial Networks for Creating Synthetic Free-Text Medical Data: A Proposal for Collaborative Research and Re-use of Machine Learning Models. AMIA Jt Summits Transl Sci Proc. 2021 May 17;2021:335–344. PMID: 34457148; PMCID: PMC8378601.
