This neural network predicts the effector/memory T cell (T lymphocyte) subset of single cells from flow cytometry data. The model was trained to classify each flow cytometry event as one of the following: CD4+ Tcm, CD4+ Tem, CD4+ Temra, CD4+ Th0, CD8+ Tcm, CD8+ Tem, CD8+ Temra, CD8+ Th0, a living cell that is neither CD4+ nor CD8+, or cell debris. The model reached 99.22% accuracy on the test dataset: of the 1,361,486 events in the test dataset, 10,606 were misclassified.
(Tcm: central memory T cell; Tem: effector memory T cell; Th0: naive T cell; Temra: effector memory T cell re-expressing CD45RA)
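The reported accuracy follows directly from the event counts above. The short Python sketch below recomputes it; the `CLASSES` list only restates the ten output classes named in the text, and its order is an assumption rather than the model's actual label encoding.

```python
# The ten output classes named above (the order here is an assumption,
# not necessarily the label encoding used by the model).
CLASSES = [
    "CD4+ Tcm", "CD4+ Tem", "CD4+ Temra", "CD4+ Th0",
    "CD8+ Tcm", "CD8+ Tem", "CD8+ Temra", "CD8+ Th0",
    "non-CD4+/CD8+ live cell", "debris",
]


def accuracy(n_total: int, n_wrong: int) -> float:
    """Fraction of events whose predicted class matches the manual label."""
    return (n_total - n_wrong) / n_total


acc = accuracy(n_total=1_361_486, n_wrong=10_606)
print(f"{len(CLASSES)} classes, test accuracy = {acc:.2%}")  # prints "10 classes, test accuracy = 99.22%"
```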
Flow cytometry is a technology that measures physical characteristics of single cells using one or more lasers. For each individual cell, visible-light scattering and fluorescence parameters are detected. Forward scatter (FSC) and side scatter (SSC) correlate with the size and granularity of a cell, respectively. In addition, cells can be stained with fluorochrome-conjugated antibodies that bind markers specific to a cell type. Staining cell populations with several such marker antibodies, combined with flow cytometry, makes it possible to distinguish cell types and subsets. By analysing the resulting flow cytometry data, these cell types and subsets can be quantified, for example in a blood sample that contains many different cell types. Analysing large flow cytometry datasets by manually gating cell populations can be very time-consuming. This neural network was created to automate this analysis, which could save a significant amount of time when analysing large and complex datasets.
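To make manual gating concrete, the NumPy sketch below hard-codes a simplified gating tree: an FSC gate against debris, a viability gate, CD4 versus CD8, and a CD197 (CCR7) versus CD45RA quadrant, using the common convention that naive cells are CCR7+CD45RA+, Tcm CCR7+CD45RA-, Tem CCR7-CD45RA- and Temra CCR7-CD45RA+. The cut-off values in `CUTS` and the function name `gate_events` are hypothetical, and lumping dead cells together with debris is a simplification. In real manual gating each gate is drawn by eye for every sample, which is the work the neural network is meant to replace.

```python
import numpy as np

# Hypothetical cut-offs; in manual gating each gate is drawn by hand per sample.
CUTS = {"FSC_A": 1e4, "Live/Dead": 1e3, "CD4": 500.0, "CD8": 500.0,
        "CD197": 300.0, "CD45RA": 300.0}


def gate_events(ev: dict) -> np.ndarray:
    """Label events with a simplified rectangle/quadrant gating tree."""
    n = len(ev["CD4"])
    # Events failing the FSC or viability gate are lumped together as "debris"
    # (a simplification: dead cells are not a separate class here).
    out = np.full(n, "debris", dtype="<U32")
    live = (ev["FSC_A"] > CUTS["FSC_A"]) & (ev["Live/Dead"] < CUTS["Live/Dead"])
    out[live] = "non-CD4+/CD8+ live cell"  # CD4+CD8+ double positives also stay here
    cd4 = live & (ev["CD4"] > CUTS["CD4"]) & (ev["CD8"] <= CUTS["CD8"])
    cd8 = live & (ev["CD8"] > CUTS["CD8"]) & (ev["CD4"] <= CUTS["CD4"])

    # CD197 (CCR7) vs CD45RA quadrant: index = 2 * CCR7+ + CD45RA+.
    quadrant = np.array(["Tem", "Temra", "Tcm", "Th0"])
    idx = (2 * (ev["CD197"] > CUTS["CD197"]).astype(int)
           + (ev["CD45RA"] > CUTS["CD45RA"]).astype(int))
    subset = quadrant[idx]
    out[cd4] = np.char.add("CD4+ ", subset[cd4])
    out[cd8] = np.char.add("CD8+ ", subset[cd8])
    return out
```

Each entry of `ev` is a NumPy array with one value per event, keyed by parameter name.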
The dataset used to train and test this model was generated by manually gating flow cytometry data from the public FlowRepository database. From the manually analysed data, CSV files were exported and each event was labeled with its corresponding cell subset (see the flowcyto_data_preperation Jupyter notebook). The datasets for the different cell types were then concatenated, and train and test datasets were generated from the combined data (see the traintest Jupyter notebook).
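The concatenate-then-split step can be sketched as follows. This is not the code from the traintest notebook; it assumes each subset's events have already been loaded (for example from the exported CSV files) into a NumPy array with one row per event, and the function name `concat_and_split` is illustrative. The default test fraction of 20% is chosen because the event counts in this README fit it exactly: 5,445,944 training plus 1,361,486 test events gives 6,807,430 events, and 0.2 × 6,807,430 = 1,361,486.

```python
import numpy as np


def concat_and_split(subsets: dict, test_frac: float = 0.2, seed: int = 0):
    """Concatenate per-subset event arrays, attach integer labels, shuffle and split.

    `subsets` maps a subset name to an (n_events, n_features) array.
    """
    names = sorted(subsets)  # fixed label order: label i <-> names[i]
    X = np.concatenate([subsets[name] for name in names], axis=0)
    y = np.concatenate([np.full(len(subsets[name]), i) for i, name in enumerate(names)])

    # Shuffle so that the test set is not simply the last subset(s).
    order = np.random.default_rng(seed).permutation(len(X))
    X, y = X[order], y[order]

    n_test = int(round(test_frac * len(X)))
    return (X[n_test:], y[n_test:]), (X[:n_test], y[:n_test]), names
```

A CSV with one header row could be loaded with `np.loadtxt(path, delimiter=",", skiprows=1)` before being passed in. The split here is not stratified by subset; with millions of events a random split keeps class proportions close to those of the full dataset anyway.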
The neural network was trained on 12 parameters: FSC_A, FSC_H, FSC_W, SSC_A, SSC_H, SSC_W, CD4, CD5, CD8, CD197, CD45RA and Live/Dead. Each row in the dataset corresponds to one event (one cell, or debris/doublet). The model was trained on 5,445,944 such events. Several hyperparameters were tuned or tested before arriving at the current model, including the learning rate, the batch size, the dropout layers and the number of units in the dense layers.
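The exact architecture is not given here, so the following is only a NumPy sketch of the kind of model the paragraph implies: the 12 listed parameters as input, dense (fully connected) ReLU layers with dropout, and a softmax output over the 10 classes. The layer sizes (64 and 32 units), the dropout rate of 0.2 and the He initialisation are assumptions, and only the forward pass is shown; training with a chosen learning rate and batch size would normally be done in a deep learning framework.

```python
import numpy as np

# The 12 input parameters listed above, one column per parameter.
FEATURES = ["FSC_A", "FSC_H", "FSC_W", "SSC_A", "SSC_H", "SSC_W",
            "CD4", "CD5", "CD8", "CD197", "CD45RA", "Live/Dead"]
N_CLASSES = 10  # 8 T cell subsets + other live cells + debris


def init_params(sizes, seed=0):
    """He-initialised (weight, bias) pairs for a stack of dense layers."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def forward(params, X, dropout=0.2, training=False, rng=None):
    """Dense ReLU layers with (inverted) dropout, followed by a softmax output."""
    if rng is None:
        rng = np.random.default_rng()
    h = X
    for W, b in params[:-1]:
        h = np.maximum(h @ W + b, 0.0)  # dense layer + ReLU
        if training and dropout > 0:
            # Zero a fraction `dropout` of units and rescale the rest.
            keep = rng.random(h.shape) >= dropout
            h = h * keep / (1.0 - dropout)
    W, b = params[-1]
    logits = h @ W + b
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)


# Hypothetical layer sizes: 12 inputs -> 64 -> 32 -> 10 classes.
params = init_params([len(FEATURES), 64, 32, N_CLASSES])
events = np.random.default_rng(1).normal(size=(5, len(FEATURES)))  # stand-in for scaled events
probs = forward(params, events)
predicted_class = probs.argmax(axis=1)
```

At inference time dropout is disabled (`training=False`), and the predicted class of each event is the index with the highest probability.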
Although this neural network does not predict the cell type of every single cell correctly, it illustrates how neural networks can be used to speed up flow cytometry analysis. Because manual analysis of flow cytometry data is not perfect either, I think one of the main obstacles to further improving this model lies in the training dataset, whose labels were generated by manual gating. A next step would therefore be to generate a training dataset based on automated cell clustering and to train this model on that dataset.
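As one possible illustration of automated cell clustering, the sketch below implements plain k-means (Lloyd's algorithm) in NumPy. K-means is a generic choice made here for brevity, not a method proposed in this README; flow-cytometry-specific tools such as FlowSOM also exist. The resulting cluster indices are arbitrary, so each cluster would still have to be mapped to a cell subset, for example by inspecting the marker values of its centroid, before the clusters could serve as training labels.

```python
import numpy as np


def _assign(X, centroids):
    """Index of the nearest centroid (squared Euclidean distance) for each event."""
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)


def kmeans(X, k, n_iter=100, seed=0):
    """Plain k-means (Lloyd's algorithm); returns cluster labels and centroids."""
    rng = np.random.default_rng(seed)
    # Initialise centroids with k distinct, randomly chosen events.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = _assign(X, centroids)
        # Move each centroid to the mean of its events (keep it if the cluster is empty).
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return _assign(X, centroids), centroids


# Toy example: two well-separated groups of events in a 2-parameter space.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, size=(50, 2)), rng.normal(10.0, 0.5, size=(50, 2))])
labels, centroids = kmeans(X, k=2)
```

This version computes all event-to-centroid distances at once, so it is only suitable for a subsample; millions of events would call for a mini-batch variant or a dedicated clustering library.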