Compare to clustermatch correlation coefficient #7

rmflight · 2022-06-17T14:43:26Z

The Greene lab published a paper on a new correlation coefficient, the clustermatch correlation coefficient (CCC), that uses a different measure and looks very interesting.

Would be nice to see how well we compare in terms of speed and relationships detected.

rmflight · 2022-06-22T20:00:33Z

At least in terms of speed on a single comparison, for a random sample with 5000 entries, I got

icikt (using Python all around so comparisons are valid): 0.06 s, -0.003 kendall-tau
ccc: 0.14 s, 0.0008

So by these timings we are still roughly 2x faster on a single comparison (0.06 s vs. 0.14 s), and in this random case both coefficients are close to 0.
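A minimal sketch of how a single-comparison timing like this could be reproduced. It uses `scipy.stats.kendalltau` as a stand-in, since the actual icikt and ccc calls are not shown in this thread; the function name `time_single_comparison`, the seed, and the normal random sample are assumptions for illustration only.

```python
import time

import numpy as np
from scipy.stats import kendalltau


def time_single_comparison(n=5000, seed=0):
    """Time one Kendall-tau computation on two independent random vectors.

    Hypothetical helper: scipy's kendalltau stands in for the icikt / ccc calls.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    y = rng.normal(size=n)

    start = time.perf_counter()
    tau, _p_value = kendalltau(x, y)
    elapsed = time.perf_counter() - start
    return tau, elapsed


tau, elapsed = time_single_comparison()
print(f"kendall-tau = {tau:.4f}, elapsed = {elapsed:.3f} s")
```

A single run is noisy, so repeating the call several times and taking the median would give a fairer speed comparison.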

Would be nice to run all of the GTEx tissues and see if the ICI-Kt tracks with CCC.

rmflight · 2022-06-24T01:58:38Z

So, the paper claims that Spearman only picks up linear relationships, and they give examples of actual Spearman correlation coefficients that appear to miss some relationships that their CCC picks up on. And from reading around the web, Kendall-tau supposedly gives values similar to Spearman.

So if we wanted to go further with this, we would have to investigate how well Kendall-tau matches CCC, or at least tracks with it, especially for non-linear examples. Otherwise, they've definitely made something that seems superior.
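As a rough illustration of the kind of non-linear case worth checking, here is a sketch using a hypothetical symmetric quadratic pattern (made up for this example, not taken from the paper). It only uses SciPy's Pearson, Spearman and Kendall-tau; CCC is deliberately left out because its API is not shown in this thread.

```python
import numpy as np
from scipy.stats import kendalltau, pearsonr, spearmanr

# Hypothetical non-monotonic example: y depends on x, but rises and falls symmetrically.
x = np.arange(-50, 51)
y = x ** 2

pearson_r, _ = pearsonr(x, y)
spearman_r, _ = spearmanr(x, y)
kendall_tau, _ = kendalltau(x, y)

# All three should be essentially zero, even though y is fully determined by x.
print(f"pearson={pearson_r:.3f} spearman={spearman_r:.3f} kendall={kendall_tau:.3f}")
```

If CCC reports a clearly non-zero value on data like this, that is where Kendall-tau would fail to track it.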

hunter-moseley · 2022-06-24T02:12:29Z

Both Spearman and Kendall tau pick up monotonic relationships. A linear relationship is monotonic, but a monotonic relationship is not necessarily linear.
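A small sketch of that distinction, assuming an exponential curve as the example of a monotonic but non-linear relationship (the data are made up for illustration):

```python
import numpy as np
from scipy.stats import kendalltau, pearsonr, spearmanr

# Strictly increasing, but far from a straight line.
x = np.linspace(0, 10, 200)
y = np.exp(x)

pearson_r, _ = pearsonr(x, y)
spearman_r, _ = spearmanr(x, y)
kendall_tau, _ = kendalltau(x, y)

# The rank-based coefficients see a perfect monotonic relationship;
# Pearson is noticeably below 1 because the relationship is not linear.
print(f"pearson={pearson_r:.3f} spearman={spearman_r:.3f} kendall={kendall_tau:.3f}")
```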



rmflight · 2022-06-24T13:45:11Z

Right, that definitely makes sense. And there are two aspects to this:

1. Picking up relationships that are interesting and that the other coefficients don't capture.
2. Returning coefficients closer to zero in cases where the other coefficients stray from zero.

In Figure 1, they are showing CCC picking up on relationships that Pearson and Spearman do not. However, in Figure 2, CCC actually provides more gene-gene coefficients closer to 0 across the whole blood expression than the other two coefficients. I'm guessing it holds for the other tissues as well.

Figure 1: Different types of relationships in data. Each panel contains a set of simulated data points described by two generic variables: x and y. The first row shows Anscombe’s quartet with four different datasets (from Anscombe I to IV) and 11 data points each. The second row contains a set of general patterns with 100 data points each. Each panel shows the correlation value using Pearson (p), Spearman (s) and CCC (c). Vertical and horizontal red lines show how CCC clustered data points using x and y.

Figure 2: Distribution of coefficient values on gene expression (GTEx v8, whole blood). a) Histogram of coefficient values. b) Corresponding cumulative histogram. The dotted line maps the coefficient value that accumulates 70% of gene pairs. c) 2D histogram plot with hexagonal bins between all coefficients, where a logarithmic scale was used to color each hexagon.
