Latent diffUsion for Multiplexed Images of Cells (LUMIC) is a diffusion model pipeline developed to generate high-content fluorescent microscopy images of interactions between different cell types and chemical compounds. LUMIC combines diffusion models with DINO (self-Distillation with NO labels), a vision-transformer-based, self-supervised method that can be trained on images to learn feature embeddings, and with HGraph2Graph, a hierarchical graph encoder-decoder used to represent chemicals.
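To make the conditioning idea concrete, the sketch below shows one way a denoising network could take both a DINO image embedding and an HGraph2Graph chemical embedding as inputs. This is a minimal, hypothetical illustration, not LUMIC's actual architecture: the module, the 5-channel images, and the embedding sizes (384 for a DINO ViT-S, 250 for the chemical latent) are all assumptions.

```python
import torch
import torch.nn as nn

class ConditionedDenoiser(nn.Module):
    """Hypothetical denoiser conditioned on image + chemical embeddings."""
    def __init__(self, img_channels=5, dino_dim=384, chem_dim=250, cond_dim=256):
        super().__init__()
        # Fuse the cell-image (DINO) and chemical (HGraph2Graph) embeddings
        # into a single conditioning vector.
        self.cond_proj = nn.Sequential(
            nn.Linear(dino_dim + chem_dim, cond_dim),
            nn.SiLU(),
            nn.Linear(cond_dim, cond_dim),
        )
        # Stand-in for a real U-Net: predicts noise from the noisy image
        # concatenated with the broadcast conditioning vector.
        self.net = nn.Conv2d(img_channels + cond_dim, img_channels, kernel_size=3, padding=1)

    def forward(self, noisy_image, dino_emb, chem_emb):
        cond = self.cond_proj(torch.cat([dino_emb, chem_emb], dim=-1))
        b, _, h, w = noisy_image.shape
        cond_map = cond[:, :, None, None].expand(b, cond.shape[1], h, w)
        return self.net(torch.cat([noisy_image, cond_map], dim=1))

# Example: a batch of 4 five-channel 64x64 images with per-sample embeddings.
x = torch.randn(4, 5, 64, 64)
pred_noise = ConditionedDenoiser()(x, torch.randn(4, 384), torch.randn(4, 250))
print(pred_noise.shape)  # torch.Size([4, 5, 64, 64])
```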
The code is based on lucidrains' imagen and ddpm implementations.
Clone the repo and create the environment using
conda env create -f environment.yml
LUMIC Files
trainer.py: contains the training functions to train LUMIC
dataset.py: contains the dataloader to output the necessary images/embeddings
unet.py: contains the Unet classes and building blocks for low-res and high-res diffusion models
unet_1d.py: contains the Unet classes and building blocks for the 1D diffusion model
gaussian_diffusion_superes.py: contains necessary DDPM functions for high-res diffusion models
gaussian_diffusion_1d.py: contains necessary DDPM functions for 1D diffusion models
gaussian_diffusion.py: contains necessary DDPM functions for low-res diffusion models (see the sketch below)
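The gaussian_diffusion*.py files above implement the DDPM machinery. As a reference point, here is the standard DDPM forward noising step (q_sample) under a linear beta schedule; the schedule values are common defaults and have not been checked against this repo's configuration.

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)               # linear beta schedule (assumed defaults)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)  # cumulative product of alphas

def q_sample(x0, t, noise=None):
    """Draw x_t ~ q(x_t | x_0) = N(sqrt(abar_t) * x_0, (1 - abar_t) * I)."""
    if noise is None:
        noise = torch.randn_like(x0)
    abar = alphas_cumprod[t].view(-1, *([1] * (x0.dim() - 1)))
    return abar.sqrt() * x0 + (1.0 - abar).sqrt() * noise

x0 = torch.randn(4, 5, 64, 64)   # e.g. a batch of clean image tensors
t = torch.randint(0, T, (4,))    # one timestep per sample
xt = q_sample(x0, t)
```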
DINO Files
vision_transformer.py: contains necessary functions for DINO (see the example below)
utils.py: contains helper functions for DINO
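For orientation, the snippet below shows how DINO feature embeddings are typically obtained through the public torch.hub entry point from facebookresearch/dino, using the released ImageNet-pretrained ViT-S/16. LUMIC trains DINO on its own microscopy images, so this only illustrates the embedding interface; the file name cell_crop.png and the preprocessing values are assumptions.

```python
import torch
from torchvision import transforms
from PIL import Image

# Load the released ImageNet-pretrained DINO ViT-S/16 via torch.hub.
model = torch.hub.load('facebookresearch/dino:main', 'dino_vits16')
model.eval()

# Standard ImageNet preprocessing (an assumption; LUMIC's own pipeline may differ).
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

img = Image.open('cell_crop.png').convert('RGB')     # hypothetical example image
with torch.no_grad():
    embedding = model(preprocess(img).unsqueeze(0))  # (1, 384) CLS-token embedding
```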
HGraph2Graph Files
config/hgraph2graph_config.yaml: config used for HGraph2Graph training and inference
hgraph/: contains necessary functions for hgraph
hgraph2graph/zinc_lincs_sciplex_smiles.txt: SMILES (306 JUMP, 61 style transfer, and ~250k ZINC) used for training HGraph2Graph (wrong file); a loading example follows this list
hgraph2graph/all_vocab_zinc.txt: vocabulary needed by HGraph2Graph (the result of breaking the SMILES down into a vocabulary)
hgraph2graph.py: contains the functions necessary to use/sample from HGraph2Graph
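As a small illustration (not HGraph2Graph's own preprocessing), the snippet below reads the SMILES list referenced above and checks that each string parses into a molecule. RDKit is used here only for the check and is an extra assumption, not necessarily a dependency of this repo.

```python
from rdkit import Chem

# Read the SMILES strings from the training file listed above (one per line is assumed).
with open('hgraph2graph/zinc_lincs_sciplex_smiles.txt') as f:
    smiles = [line.strip() for line in f if line.strip()]

# Count how many parse into valid molecules.
valid = [s for s in smiles if Chem.MolFromSmiles(s) is not None]
print(f'{len(valid)}/{len(smiles)} SMILES strings parse cleanly')
```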
Model checkpoints are available on Hugging Face here.
Datasets (both JUMP and Style Transfer) are pre-processed and available on Hugging Face here.
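Both can also be fetched programmatically with the huggingface_hub client; the repo ids below are placeholders, not the actual repositories linked above.

```python
# Hedged sketch: downloading the pre-processed datasets and model checkpoints
# from the Hugging Face Hub. "LAB/lumic-datasets" and "LAB/lumic-checkpoints"
# are placeholder repo ids; use the links above for the real ones.
from huggingface_hub import snapshot_download

data_dir = snapshot_download(repo_id="LAB/lumic-datasets", repo_type="dataset")
ckpt_dir = snapshot_download(repo_id="LAB/lumic-checkpoints")  # repo_type defaults to "model"
print(data_dir, ckpt_dir)
```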