Algorithm to generate synthetic tabular data such as baseline clinical trial data.

mdsol/Simulants

Simulants

To address the privacy concerns around patient data and to make it possible to disclose clinical trial data to other organizations, we have built a system that synthesizes patient data and cross-validates the synthetic data against the real data using standard statistical techniques and machine learning algorithms. The code consists of a set of libraries for loading sample data from the UCI repository, preprocessing it, and using it to synthesize a new set of patients.

A sample dataset is downloaded from the UCI Machine Learning Repository at: https://archive.ics.uci.edu/ml/datasets/Heart+Disease
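
As an illustration of what checking synthetic data against real data with a standard statistical technique can look like, here is a minimal sketch that computes a two-sample Kolmogorov-Smirnov statistic per numeric column. It is not the code used in this repository: the function names `ks_statistic` and `compare_columns`, the column names, and the randomly generated values are all hypothetical and chosen only for the example.

```python
import random
from bisect import bisect_right


def ks_statistic(real, synthetic):
    """Two-sample Kolmogorov-Smirnov statistic.

    Returns the largest absolute gap between the two empirical CDFs
    (0.0 = identical distributions, 1.0 = no overlap at all).
    """
    real_sorted = sorted(real)
    synth_sorted = sorted(synthetic)
    n, m = len(real_sorted), len(synth_sorted)
    if n == 0 or m == 0:
        raise ValueError("both samples must be non-empty")

    max_gap = 0.0
    # Both empirical CDFs are step functions, so the largest gap
    # is always reached at one of the observed values.
    for x in real_sorted + synth_sorted:
        cdf_real = bisect_right(real_sorted, x) / n
        cdf_synth = bisect_right(synth_sorted, x) / m
        max_gap = max(max_gap, abs(cdf_real - cdf_synth))
    return max_gap


def compare_columns(real_table, synthetic_table):
    """Compute the KS statistic for every column present in both tables.

    Tables are plain dicts mapping a column name to a list of numbers
    (a hypothetical layout, used here to avoid extra dependencies).
    """
    return {
        name: ks_statistic(real_table[name], synthetic_table[name])
        for name in real_table
        if name in synthetic_table
    }


if __name__ == "__main__":
    rng = random.Random(0)
    # Made-up stand-ins for a real and a synthetic cohort.
    real = {"age": [rng.gauss(54, 9) for _ in range(300)]}
    synthetic = {"age": [rng.gauss(54, 9) for _ in range(300)]}
    for column, gap in compare_columns(real, synthetic).items():
        print(f"{column}: KS statistic = {gap:.3f}")
```

A small statistic suggests that the synthetic column follows a distribution similar to the real one. A per-column check like this says nothing about relationships between columns, which is one reason to also compare models trained on each dataset.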

Prerequisites

Use Python 3.8 or later.

All the required packages are specified in requirements.txt.

pip install -r requirements.txt

Usage

  1. Modify uci_config.py, or use it as is to run on the sample UCI Heart Disease dataset.

  2. Run python uci_demo.py

  3. The outputs, including the synthesized data and the cross-validation results, will be written to output_uci/
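
After the demo finishes, a short helper like the one below can list whatever was written to the output directory. This is only a sketch: `list_outputs` is a hypothetical name, and the code makes no assumption about which files the demo produces or how they are named.

```python
from pathlib import Path


def list_outputs(output_dir="output_uci"):
    """Return the relative paths of all files under output_dir, sorted.

    Returns an empty list if the directory does not exist yet,
    for example when the demo has not been run.
    """
    root = Path(output_dir)
    if not root.is_dir():
        return []
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )


if __name__ == "__main__":
    for name in list_outputs():
        print(name)
```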

Contributing

See CONTRIBUTING.

Contributors

Jacob Aptekar (Medidata Solutions)

Mandis Beigi (Medidata Solutions)

Pierre-Louis Bourlon (Medidata Solutions)

Jason Mezey (Cornell University)

Afrah Shafquat (Medidata Solutions)

Contact

See the factbook.

Mandis Beigi at AcornAI (Medidata Solutions Inc., a Dassault Systemes Company)

mandis.beigi@3ds.com
