Our Solution

Synthetic electronic health record generation

Our synthetic data solution builds on our research in machine learning for healthcare.

We produce high-quality, high-dimensional longitudinal electronic health records based on your patient records.
We create a customized machine learning generator for your patient database.
We enable targeted patient generation for specific conditions of interest.
We verify the synthetic dataset through comprehensive fidelity and privacy tests.

Clinical predictive modeling

Our predictive modeling solution is based on comprehensive research in machine learning.

We build clinical predictive models for our clients using their real patient data.
We augment model training with synthetic patient data to boost performance.
We deploy clinical predictive models into practice.

Multi-modal data generation

MediSyn provides multi-modal synthetic patient data to support clinical predictive modeling.

Details can be found in our presentation and demo video.

Fidelity of our synthetic data

Data statistics similar to real patient data

Real electronic health records (EHRs) are high-dimensional: they include diagnoses (ICD codes), procedures (CPT codes), and medications. Altogether, over 20K dimensions need to be modeled and synthesized. Most existing synthetic data generators cannot produce such high-dimensional data. Instead, they often require users to pick a handful of variables of interest from the vast number of features in the real data, and they generate only those few variables (usually on the order of tens). In comparison, MediSyn can produce high-dimensional EHRs at their original resolution with high fidelity.

Each dot corresponds to a single medical code (ICD or CPT code). High R^2 indicates high fidelity.
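A comparison of this kind can be sketched as follows. This is a minimal illustration, not MediSyn's evaluation code. It assumes each patient is a list of visits and each visit is a set of code strings, that prevalence means the fraction of patients with the code, and that R^2 is measured against the identity line y = x. The function names `code_prevalence` and `r_squared` are hypothetical.

```python
from collections import Counter


def code_prevalence(patients):
    """Fraction of patients whose record contains each code at least once.

    `patients` is a list of patients; each patient is a list of visits;
    each visit is a set of code strings (an assumed layout).
    """
    if not patients:
        raise ValueError("need at least one patient")
    counts = Counter()
    for visits in patients:
        seen = set()
        for visit in visits:
            seen.update(visit)
        counts.update(seen)  # count each code once per patient
    return {code: c / len(patients) for code, c in counts.items()}


def r_squared(real_prev, synth_prev):
    """R^2 of synthetic prevalence against the identity line y = x.

    Assumption: the page does not say which R^2 definition it uses.
    """
    codes = sorted(set(real_prev) | set(synth_prev))
    x = [real_prev.get(c, 0.0) for c in codes]
    y = [synth_prev.get(c, 0.0) for c in codes]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - xi) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all prevalences are equal")
    return 1.0 - ss_res / ss_tot
```

Each (x, y) pair built inside `r_squared` corresponds to one dot in a scatter plot of this type. A value near 1 means the dots lie close to the diagonal.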

High correlation within a visit

MediSyn captures the co-occurrence patterns of medical codes within a visit. The prevalence of medical code pairs in synthetic data correlates very strongly with their prevalence in real data, even though over 5 million code pairs have to be modeled.
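One way to measure within-visit co-occurrence is sketched below. It is an illustration under stated assumptions, not MediSyn's method: pair prevalence is taken to be the fraction of visits in which both codes appear, and agreement is summarized with a Pearson correlation. The data layout (a list of patients, each a list of visits, each visit a set of codes) and the function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def within_visit_pair_prevalence(patients):
    """Fraction of visits in which each unordered code pair co-occurs."""
    counts = Counter()
    n_visits = 0
    for visits in patients:
        for visit in visits:
            n_visits += 1
            # sorted() gives every pair one canonical orientation
            counts.update(combinations(sorted(set(visit)), 2))
    if n_visits == 0:
        raise ValueError("need at least one visit")
    return {pair: c / n_visits for pair, c in counts.items()}


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)


def pair_correlation(real_patients, synth_patients):
    """Correlate real and synthetic pair prevalence over the union of pairs."""
    real = within_visit_pair_prevalence(real_patients)
    synth = within_visit_pair_prevalence(synth_patients)
    pairs = sorted(set(real) | set(synth))
    return pearson([real.get(p, 0.0) for p in pairs],
                   [synth.get(p, 0.0) for p in pairs])
```

Pairs that occur in only one of the two datasets are given a prevalence of 0 in the other, so a generator that invents or drops co-occurrences is penalized.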

High correlation across visits

MediSyn generates realistic longitudinal patient records that span multiple visits over time, and it accurately captures the temporal correlation of medical codes. Each dot in the plot represents a pair of medical codes that occur in consecutive visits. The x-axis is the prevalence of the pair in real data, and the y-axis is its prevalence in synthetic data.
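The points of such a plot could be computed along these lines. This is a sketch, assuming a pair is ordered (a code in one visit, a code in the next visit) and that prevalence is the fraction of consecutive-visit transitions containing the pair. The page does not state the denominator, so that choice and the function names are assumptions.

```python
from collections import Counter
from itertools import product


def consecutive_pair_prevalence(patients):
    """Fraction of visit-to-visit transitions containing each ordered pair.

    A pair (a, b) means code a appears in one visit and code b in the
    immediately following visit of the same patient.
    """
    counts = Counter()
    n_transitions = 0
    for visits in patients:
        for prev, nxt in zip(visits, visits[1:]):
            n_transitions += 1
            counts.update(set(product(set(prev), set(nxt))))
    if n_transitions == 0:
        return {}
    return {pair: c / n_transitions for pair, c in counts.items()}


def scatter_points(real_patients, synth_patients):
    """Return one (real prevalence, synthetic prevalence) point per pair."""
    real = consecutive_pair_prevalence(real_patients)
    synth = consecutive_pair_prevalence(synth_patients)
    pairs = sorted(set(real) | set(synth))
    return [(real.get(p, 0.0), synth.get(p, 0.0)) for p in pairs]
```

Patients with a single visit contribute no transitions, so they do not affect these statistics.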

Support machine learning modeling

Our synthetic data can support machine learning modeling:

ML models trained on our synthetic data perform almost as well as models trained on real data.
Other synthetic data struggle to support ML models.
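A common way to check this kind of claim is to train one model on synthetic data and another on real data, then score both on the same held-out real data. The sketch below illustrates that protocol only. The Bernoulli naive Bayes classifier, the accuracy metric, the (set of codes, label) example layout, and all function names are stand-in assumptions; the page does not say which models or metrics MediSyn uses.

```python
import math


def train_nb(examples, vocab):
    """Fit a Bernoulli naive Bayes model with Laplace smoothing.

    `examples` is a list of (set_of_codes, label) with label 0 or 1.
    """
    model = {}
    for label in (0, 1):
        rows = [codes for codes, y in examples if y == label]
        prior = (len(rows) + 1) / (len(examples) + 2)
        probs = {c: (sum(c in r for r in rows) + 1) / (len(rows) + 2)
                 for c in vocab}
        model[label] = (math.log(prior), probs)
    return model


def predict_nb(model, codes, vocab):
    """Return the label with the highest log-posterior score."""
    scores = {}
    for label, (log_prior, probs) in model.items():
        score = log_prior
        for c in vocab:
            p = probs[c]
            score += math.log(p if c in codes else 1.0 - p)
        scores[label] = score
    return max(scores, key=scores.get)


def accuracy(model, examples, vocab):
    """Fraction of examples whose label is predicted correctly."""
    hits = sum(predict_nb(model, codes, vocab) == y for codes, y in examples)
    return hits / len(examples)


def utility_comparison(real_train, synth_train, real_test, vocab):
    """Score a real-trained and a synthetic-trained model on real test data."""
    return {
        "train_real": accuracy(train_nb(real_train, vocab), real_test, vocab),
        "train_synthetic": accuracy(train_nb(synth_train, vocab), real_test, vocab),
    }
```

A small gap between the two scores suggests the synthetic data preserves the signal a model needs. In both cases the test set must consist of real records that neither the model nor the generator has seen.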

Privacy Preservation of real patients

MediSyn protects patient privacy

Our synthetic patient data are not mapped to any specific real patient. Furthermore, we thoroughly test all synthetic data against privacy attacks to ensure the privacy of real patients is preserved.

Membership attacks used in our validation

A membership attack attempts to discover which real patients were in the training data. We use two versions of the membership attack.

Dataset attack: Attackers model the synthetic data directly.
Model attack: Attackers have direct access to the MediSyn synthetic data generator.

Attackers failed to recover patient identity

Our experimental results show that attackers are unable to identify the real patients in the training data. Their attack success probability is close to that of random guessing (about 0.5) in all settings.
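To make the 0.5 baseline concrete, here is a minimal sketch of a distance-based membership attack evaluated on a balanced set of candidates. It is not the attack used in MediSyn's validation, which the page does not describe in detail. The Jaccard distance, the threshold rule, the flattening of each patient into one set of codes, and all function names are assumptions chosen for illustration.

```python
def jaccard_distance(a, b):
    """1 minus the Jaccard similarity of two code sets."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def guess_member(candidate, synthetic, threshold):
    """Attacker's guess: the candidate was in the training data if some
    synthetic record lies within `threshold` of it."""
    nearest = min(jaccard_distance(candidate, s) for s in synthetic)
    return nearest <= threshold


def attack_success_rate(members, non_members, synthetic, threshold):
    """Fraction of correct guesses over known members and non-members.

    With equally many members and non-members, random guessing scores
    about 0.5, so values near 0.5 mean the attack learned nothing useful.
    """
    correct = sum(guess_member(r, synthetic, threshold) for r in members)
    correct += sum(not guess_member(r, synthetic, threshold)
                   for r in non_members)
    return correct / (len(members) + len(non_members))
```

In this setup `members` are real records used to train the generator and `non_members` are real records held out from training. A generator that memorizes its training data would push the success rate toward 1.0.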


MediSyn Capabilities

Product capability

Research capability

Find out more about our solution
