Visualization #2

Open
alfaroA opened this issue Sep 6, 2022 · 14 comments
Assignees
Labels
enhancement New feature or request

Comments

@alfaroA

alfaroA commented Sep 6, 2022

Hi,

I would like your help with the visualization.
I’m currently trying your package on the WNV dataset that you provided, and I would like to reproduce the kind of figure shown in your paper (Figure 5B).
I’m talking about the DrawNetworkPlot output based on the meta-cluster-to-phenotype mapping.

I have the data.clust table and the mapping table with the column PopName inside them.

I tried the code below:
DrawNetworkPlot(dat = data.clust,
timepoint.col = "timepoint",
timepoints = c('D0', 'D1', 'D2', 'D3', 'D4' , 'D5' , 'D6','D7'),
cluster.col = 'TrackSOM_metacluster_lineage_tracking',
mapping = mapping,
population.col = PopName)

But after a few steps it stops with an error:

Calculating edges
Computing node details
Calculating marker's average per node
Error in gmean(PopName) : mean is not meaningful for factors.

Do you have any idea what is going wrong?

Thanks a lot for your help.

@ghar1821
Owner

ghar1821 commented Sep 8, 2022

Hi Alfaro,

Thank you for trying the package out!

At the moment, the function requires the parameter marker.cols to be filled, regardless of whether you want to draw the network plot coloured by the expression of any marker(s) in the data.clust or not. If you just supply a marker name that is in data.clust for marker.cols parameter, it should get rid of the error, and give you the network plots coloured by the cell population mapping and the marker.

I will update the function so that you don't need to supply any marker column names, which should allow you to just draw network plot coloured by the cell populations.

@ghar1821 ghar1821 self-assigned this Sep 8, 2022
@ghar1821 ghar1821 added the enhancement New feature or request label Sep 8, 2022
@ghar1821
Owner

ghar1821 commented Sep 8, 2022

One more thing: make sure you put quotes around "PopName" for the population.col parameter's value. Without quotes, R looks for an object called PopName instead of using the column name.

@alfaroA
Author

alfaroA commented Sep 8, 2022

Hi,
Thanks for the response.
I tried, but I get another error now:

DrawNetworkPlot(dat = data.clust,
                timepoint.col = "timepoint",
                timepoints = c('D0', 'D1', 'D2', 'D3', 'D4', 'D5', 'D6', 'D7'),
                cluster.col = 'TrackSOM_metacluster_lineage_tracking',
                marker.cols = 'FITC.Ly6C.',
                mapping = mapping,
                population.col = "PopName")

Calculating edges
Computing node details
Calculating marker's average per node
Saving node and edge details
Start drawing plots
Error in graph_from_data_frame(edge.df, directed = TRUE, node.dat) :
Duplicate vertex names

Any idea how I can fix it?

Best ,
Alexia

@ghar1821
Owner

ghar1821 commented Sep 8, 2022

Hmm, I've never seen that error before. Could it be that there are duplicated meta-cluster IDs in the mapping data.frame?
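
A quick way to check for that, as a minimal sketch in base R. The column name `meta_cluster` and the toy rows are hypothetical, so substitute whichever column of your mapping table holds the meta-cluster IDs:

```r
# Toy mapping table; `meta_cluster` is a hypothetical column name.
mapping <- data.frame(
  meta_cluster = c("A", "B", "C", "B"),
  PopName = c("Neutrophils", "Monocytes", "B cells", "Monocytes"),
  stringsAsFactors = FALSE
)

# IDs that occur more than once in the mapping table.
dup_ids <- unique(mapping$meta_cluster[duplicated(mapping$meta_cluster)])
print(dup_ids)

# Keep only the first row for each meta-cluster ID.
mapping_unique <- mapping[!duplicated(mapping$meta_cluster), ]
```

If `dup_ids` is empty, duplicated IDs in the mapping are probably not the cause and the problem lies elsewhere.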

@ACAIDI

ACAIDI commented Nov 17, 2022

Hello,
Thank you for this beautiful work.
I'm testing TrackSOM on spectral data and I have a few questions:
1 In the WNV example that you used, how did you get the population clusters?

2 When I plot the networks with DrawNetworkPlot, I don't understand why there is always an NA cluster.

3 For the TimeseriesHeatmap, I don't understand why it doesn't put the timepoints in the right order.

Thanks in advance for your response.

@ghar1821
Owner

Hi there,

Thank you for your interest in TrackSOM!

1 In the WNV example that you used, how did you get the population clusters?

I first worked out which cluster represents which population, then created a csv file which maps the clusters to populations, something like this:
https://github.com/ghar1821/TrackSOM-evaluations/blob/main/tracksom_paper/wnv_cns_dataset/meta_pop_mapping.csv
Then I mapped them into the dataset, using something like merge.data.table if using data.table objects/functions.
Afterwards, I simply supplied the column denoting the population name in my dataset to any visualisation function through the population.col parameter.
Have a look at the run script for the paper here: https://github.com/ghar1821/TrackSOM-evaluations/blob/main/tracksom_paper/wnv_bm_dataset/draw_Figure_5.R
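
As a rough illustration of the mapping step, here is a sketch in base R with toy data. The mapping column names are assumptions (check the csv linked above for the real ones), and the cluster column name is the one used earlier in this thread:

```r
# Toy stand-ins for the clustered data and the mapping csv;
# the mapping column names are assumptions.
data.clust <- data.frame(
  timepoint = c("D0", "D0", "D1", "D1"),
  TrackSOM_metacluster_lineage_tracking = c("A", "B", "A", "C"),
  stringsAsFactors = FALSE
)
mapping <- data.frame(
  TrackSOM_metacluster_lineage_tracking = c("A", "B"),
  PopName = c("Neutrophils", "Monocytes"),
  stringsAsFactors = FALSE
)

# Left join: keep every cell, attach PopName where a mapping row exists.
# merge() dispatches to merge.data.table when given data.table objects.
data.clust <- merge(
  data.clust, mapping,
  by = "TrackSOM_metacluster_lineage_tracking",
  all.x = TRUE
)

# Meta-clusters without a mapping row (here "C") get NA as PopName.
print(data.clust)
```

Note that `merge()` on plain data.frames sorts the rows by the key column by default, so do not rely on the original row order afterwards.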

2 When I plot the networks with DrawNetworkPlot, I don't understand why there is always an NA cluster.

NA clusters are clusters to which I did not assign any cell type annotation.

3 For the TimeseriesHeatmap, I don't understand why it doesn't put the timepoints in the right order.

Ah yes, for this you will need to order the vector you pass as the timepoints parameter so that it follows the order of your time-points.
Sorry, I should've made this clear in the vignette.
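
For example, if the labels look like "D0", "D1", ... (an assumption based on the labels used earlier in this thread), a plain alphabetical sort would put "D10" before "D2". A small sketch that orders them by the numeric part instead:

```r
# Time-point labels in arbitrary order; the "D<number>" format is an assumption.
timepoints <- c("D10", "D2", "D0", "D1")

# Strip the leading "D" and order by the remaining number.
day_number <- as.integer(sub("^D", "", timepoints))
timepoints_ordered <- timepoints[order(day_number)]

print(timepoints_ordered)
```

Then pass `timepoints_ordered` as the timepoints parameter.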

@ACAIDI

ACAIDI commented Jan 25, 2023

Hello,
Thanks for your response to my first post.
My team is very interested in using TrackSOM for our spectral data, and we have some questions about the package:

  1. In the TrackSOM tutorial, it is noted that we can use .fcs files for analysis, but unfortunately it does not work with this file type. I had to convert the .fcs files to csv files for the analysis. I would like to use fcs files directly to avoid any errors that may be related to the conversion. Could you enlighten me on this point?

  2. I compared the metaclusters from TrackSOM with the clusters obtained using Phenograph and FlowSOM, and they are very different, while the Phenograph and FlowSOM clusters are very similar to each other and closer to biological reality. I was surprised by this result, because TrackSOM uses the FlowSOM algorithm for clustering. I even tried changing several parameters (nclu, maxMeta), but the clusters remain very different from those of FlowSOM. For FlowSOM, I used version 1.14.1. Could you enlighten me on this as well?

Thanks in advance

@ghar1821
Owner

Hi there,

1. In the TrackSOM tutorial, it is noted that we can use .fcs files for analysis, but unfortunately it does not work with this file type. I had to convert the .fcs files to csv files for the analysis. I would like to use fcs files directly to avoid any errors that may be related to the conversion. Could you enlighten me on this point?

Can you share the error message you get when you try to use fcs files? There have been some major changes to FlowSOM recently, and that might have caused a problem with how TrackSOM parses the fcs files.

2. I compared the metaclusters from TrackSOM with the clusters obtained using Phenograph and FlowSOM, and they are very different, while the Phenograph and FlowSOM clusters are very similar to each other and closer to biological reality. I was surprised by this result, because TrackSOM uses the FlowSOM algorithm for clustering. I even tried changing several parameters (nclu, maxMeta), but the clusters remain very different from those of FlowSOM. For FlowSOM, I used version 1.14.1. Could you enlighten me on this as well?

May I ask how you ran FlowSOM/Phenograph? Did you cluster all the timepoints in one go, or did you cluster each timepoint independently of the others? While TrackSOM does use FlowSOM under the bonnet, it operates quite differently.

With TrackSOM, the SOM grid is carried over from one time-point to the next, so that can affect how the subsequent clustering step looks.

Did you run TrackSOM with merging of metaclusters allowed or disallowed? If the latter, when a metacluster contains SOM nodes which were clustered separately in the previous timepoint but are now clustered together, TrackSOM will force them to be separated.

@ACAIDI

ACAIDI commented Jan 25, 2023

Hi,

Thank you very much for your quick response.

> Can you share what the error message you get when you try to use fcs files?
This is the error message:
dat <- lapply(data.files, function(f) fread(f))
Error in fread(f) :
null character in the middle of the string : '/\x99E\x99>\xfbD\x9a.\021F\022\xfc\xeaF\xf7\xe7CET\xdd\aF\xd0)\034Ĭ\xa5\xfbCCS\xe9D\035\xa5\xc0E\030ukE\b\u07b5\xc2r\xfa\006F\xccm\xfdF \x8d\004C!t\x83E\0\xa0\xb2D\xbav`I \a\037I\xfa\xc50H\006\x8e\xffG\x80Z\xabG\0\xff\xf5GgO=DDYsE~\x95{E\xf0q\xceC\x97"4D\xfbd\xd3C\x928/C\x9a\x8c*D\xf3R\xc0D\x9dv\xaf\xc3I\036\x8f\xc0\xe5V\xb6D\xd1\xc5XF~\a\x83D܊\0Cĝ\025ßΏ\u0088\xf6^E\021;\x9eD?a\xd6C/\xe9\xb8E\xbd\023\016FK\x9c\xb6FN\xb5uÈd\020E\0\x80\xddD\xafg\x9bI \xfazI|\xa1\x84H\021\xcfXH\x80\xf7&H@\xc3HH\xbeܫD@\xb9\xeaDX\xbc\xfdC\x80\xb5\xa3EE\024\021\xc4\xe7]\026BZ\xecbDކ\xedD\xb2\037QE\xe3\x8d>CA)\0D\004Y\003E\x8c\003\xa4F\xe9֫DY\x9b\xc7DXb\x81\xc3B\xb0U\x87Ej\xc9\035E\xf8N\bEt\xb6\006\xc4\xf1c$Fv0\xa9F\x87b\xa6\xc4\xe6zYE\0P'
In addition: Warning message:
In fread(f) :
Detected 1 column names but the data has 4 columns (i.e. invalid file). Added 3 extra default column names at the end.

> May I ask how did you run FlowSOM/Phenograph? Did you cluster all the timepoints in one go? or did you cluster one timepoint independent of each other?

For FlowSOM and Phenograph clustering, I clustered all the timepoints in one go.
I forgot to specify that for each timepoint I have 4 or 5 biological replicates.

> Did you run TrackSOM with merging of metaclusters allowed or disallowed?
I tried both options (disabling and enabling merging), but the results don't change much.

@ghar1821
Owner

Thanks for sharing.

On the fcs file error, the code:

dat <- lapply(data.files, function(f) fread(f))

runs data.table's fread command on the fcs files, which is not right: fread reads delimited text files such as csv, whereas fcs is a binary format. In the tutorial, this command was used to emulate having the data stored as data.table objects.

For fcs files, you simply need to store the locations of the fcs files in a vector and feed it to TrackSOM. So something like:

data.files.fullpath.fcs <- c("timepoint1.fcs", "timepoint2.fcs", "timepoint3.fcs")

Then feed the vector to TrackSOM.
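
If there are many files, the vector can also be built from a directory listing. A minimal sketch in base R, where the directory and file names are made up for the demo (empty placeholder files are created in a temporary directory so that it runs anywhere):

```r
# Demo directory with empty placeholder files; names are hypothetical.
fcs_dir <- file.path(tempdir(), "fcs_demo")
dir.create(fcs_dir, showWarnings = FALSE)
file.create(file.path(fcs_dir, c("timepoint1.fcs", "timepoint2.fcs", "timepoint3.fcs")))

# Collect the full paths of all .fcs files in the directory.
data.files.fullpath.fcs <- list.files(fcs_dir, pattern = "\\.fcs$", full.names = TRUE)

# Sanity check before handing the vector to TrackSOM.
stopifnot(all(file.exists(data.files.fullpath.fcs)))
print(basename(data.files.fullpath.fcs))
```

list.files returns the paths in alphabetical order, so "timepoint10.fcs" would come before "timepoint2.fcs". Re-order the vector by hand if your file names do not sort in time order.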

On the clustering result, I'm going to have to think it through a bit more. TrackSOM operates differently in that it runs the hierarchical clustering one time-point at a time, on a SOM grid whose initial shape is based on the shape of the SOM grid obtained from the preceding time-point. One can think of it as building the clusters iteratively, one time-point at a time. In contrast, the way you ran FlowSOM/Phenograph, all the time-points were processed together in one go. From this alone, I can see how the clustering results would differ.

@ACAIDI

ACAIDI commented Jan 25, 2023

> For fcs files, you need to simply just store the location of the fcs files in a vector and feed it to TrackSOM. So something like

It works, thanks, but the problem is that we need dat.clust to plot the networks and time series. We can obtain the clustering details using the function ExportClusteringDetailsOnly, but we don't have the other columns of dat.clust. Is there another function to get dat.clust directly from the TrackSOM object?

Thanks a lot

@ghar1821
Owner

Oh I see what you mean. One way to do this is to convert the fcs files into one big data.table object, attach the clustering details emitted by ExportClusteringDetailsOnly, then feed it into any of the plotting functions.

To convert fcs files to data.table, you can use Spectre package's read.files and do.merge.files functions:

# install spectre
remotes::install_github("immunedynamics/spectre")

# specify the location of your fcs files
fcs_file_loc <- "/somewhere"

dt_list <- Spectre::read.files(file.loc=fcs_file_loc, file.type=".fcs", do.embed.file.names=TRUE)
dt <- Spectre::do.merge.files(dt_list)

# check that the ordering of dt follows the time-points, i.e. cells from time-point 1 come before cells from time-point 2. If not, re-order them!

# assuming the tracksom object is tsom
tracksom_clust <- ExportClusteringDetailsOnly(tsom)

# attach it to the data.table
dt <- cbind(dt, tracksom_clust)

# then supply dt to one of the plotting function.
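
The re-ordering and the sanity check before cbind could look roughly like the sketch below. It uses toy data.frames in place of the real objects; the `FileName` column (which I'd expect do.embed.file.names to add) and the `cluster_id` column are assumptions, so adjust them to what your tables actually contain:

```r
# Toy stand-in for the merged table; the `FileName` column is an assumption.
dt <- data.frame(
  FileName = c("timepoint2", "timepoint1", "timepoint2", "timepoint1"),
  marker = c(0.4, 0.1, 0.3, 0.2),
  stringsAsFactors = FALSE
)

# Desired time order of the files (hypothetical names).
file_order <- c("timepoint1", "timepoint2")

# Re-order rows so that cells follow the time-point order; ties keep their original order.
dt <- dt[order(match(dt$FileName, file_order)), ]

# Toy stand-in for the output of ExportClusteringDetailsOnly(tsom);
# the column name is hypothetical.
tracksom_clust <- data.frame(cluster_id = c("A", "A", "B", "B"), stringsAsFactors = FALSE)

# cbind pairs rows by position, so the row counts must match.
stopifnot(nrow(dt) == nrow(tracksom_clust))
dt <- cbind(dt, tracksom_clust)
```

Because cbind matches rows purely by position, the row order of dt has to be the same order in which the cells were fed to TrackSOM.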

@ACAIDI

ACAIDI commented Jan 26, 2023

Many thanks for your help @ghar1821
Another question, about biological replicates: should I concatenate each timepoint's fcs files, since TrackSOM treats each input as a timepoint?

@ghar1821
Owner

I generally would concatenate the biological replicates for a time-point into one fcs file, unless there is a reason why it is not ok to mix the cells from different replicates into one metacluster/SOM node, e.g., if there is a significant difference between them driven by batch effect.

One important note, if there are differences in your samples driven by batch effect, I strongly recommend running batch effect removal tools first to remove/minimise the differences across batches.

3 participants