On similarity and distance metrics for processes that go towards a target or away #118

salotz · 2023-11-08T20:54:59Z

salotz
Nov 8, 2023
Maintainer

In my initial work I used fairly clear-cut distance metrics, such as the one for unbinding, which reward moving something far away from a reference. For applications like binding, however, you want to reward matching some target pattern, and that is awkward to express with a distance metric without the right mathematical framework.

Previously we have done things like using the inverse of the distances, e.g. in the RebindingDistance:

https://github.com/ADicksonLab/wepy/blob/master/src/wepy/resampling/distances/receptor.py#L208

 d = abs(1.0 / state_a_rmsd - 1.0 / state_b_rmsd)

This has tended to work, but it was never quite clear on what grounds.

I found some theoretical basis that both confirms this is a step in the right direction and suggests ways to improve the transformation.

A good summary is here: https://stackoverflow.com/a/62300777

That is, with f(x) you can make similarity = f(distance) or distance = f(similarity). It works in both directions. Such function works, because the relation between similarity and distance is that one decreases when the other increases.

The main point is that any strictly monotonically decreasing function should work, as long as you watch out for things like division by zero, which the inverse-RMSD expression above does not handle (it breaks down when either RMSD is exactly zero).
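To make the division-by-zero concern concrete, here is a minimal sketch in plain Python. The function names are hypothetical and not part of wepy. The `1 / (1 + d)` form is just one example of a strictly decreasing transform that stays finite at zero, not something the linked answer or wepy prescribes.

```python
def inverse_rmsd_distance(rmsd_a, rmsd_b):
    # Same form as the inverse-RMSD expression quoted above.
    # With plain Python floats this raises ZeroDivisionError when an RMSD is 0.0;
    # with NumPy scalars it would instead give inf plus a runtime warning.
    return abs(1.0 / rmsd_a - 1.0 / rmsd_b)


def shifted_inverse_similarity(distance):
    # Hypothetical alternative: strictly decreasing for distance >= 0,
    # equal to 1.0 at distance == 0 and approaching 0.0 at large distances.
    return 1.0 / (1.0 + distance)


def shifted_inverse_distance(rmsd_a, rmsd_b):
    # Distance between two walkers measured in similarity space.
    return abs(shifted_inverse_similarity(rmsd_a) - shifted_inverse_similarity(rmsd_b))
```

With these definitions, `shifted_inverse_distance(0.0, 1.0)` is well defined, whereas `inverse_rmsd_distance(0.0, 1.0)` fails.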

The key idea is the transformation between similarity and distance metrics. A good summary in the context of ML kernels is here: https://scikit-learn.org/stable/modules/metrics.html#metrics

Recommended functions are:

S = np.exp(-D * gamma), where one heuristic for choosing gamma is 1 / num_features
S = 1. / (D / np.max(D))
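As a quick illustration, both transforms can be applied to a small array of distances. The values in `D` and the choice `num_features = 3` are made up for the example. Note that the second form still divides by `D`, so it assumes no entry of `D` is zero.

```python
import numpy as np

# Made-up example distances (all strictly positive).
D = np.array([0.5, 1.0, 2.0, 4.0])

# Assumption for illustration: three features, so gamma = 1 / num_features.
num_features = 3
gamma = 1.0 / num_features

# Exponential transform: values lie in (0, 1], finite even if D contained 0.
S_exp = np.exp(-D * gamma)

# Normalized inverse: equals 1 at the maximum distance and grows as D shrinks.
S_inv = 1.0 / (D / np.max(D))

print(S_exp)
print(S_inv)
```

Both outputs decrease as the distance increases, which is the property the transformation relies on. Only the exponential form is bounded above.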

The other aspect is that the transformation should include some notion of normalization, which is currently not used. For the functions above, that means either a scale factor such as the number of features or a maximum bound on the distance.

Which one to use depends on whether the distance has a natural maximum. Where one is available it is fairly straightforward to use, but finding it usually takes quite a few more heuristics (e.g. taking box sizes into account).

The exponential function seems interesting since it only requires the number of features.
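A rough sketch of what an exponential variant of the rebinding distance could look like follows. This is not the current wepy implementation. The function names are hypothetical, and the default `gamma=1.0` is only a placeholder, since how to pick it (for instance as `1 / num_features`) is exactly the open question here.

```python
import math


def exp_similarity(rmsd, gamma):
    # Strictly decreasing in rmsd; equals 1.0 at rmsd == 0 and tends to 0.0.
    return math.exp(-gamma * rmsd)


def exp_rebinding_distance(rmsd_a, rmsd_b, gamma=1.0):
    # Hypothetical replacement for abs(1/rmsd_a - 1/rmsd_b):
    # compare two walkers by their similarity to the target instead.
    return abs(exp_similarity(rmsd_a, gamma) - exp_similarity(rmsd_b, gamma))


# The result is bounded in [0, 1] and finite even when an RMSD is exactly zero.
print(exp_rebinding_distance(0.0, 2.0, gamma=0.5))
```

One consequence of this form is that differences between two walkers that are both far from the target are compressed toward zero, while differences near the target are emphasized. How strongly depends on `gamma`.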

Hope this is helpful. It has been for me: it makes this decision easier to justify and interpret, and hopefully makes the metrics more robust and accurate.

salotz · 2023-11-08T20:56:23Z

salotz
Nov 8, 2023
Maintainer Author

Pending some discussion here, I guess this could lead to changing the RebindingDistance in wepy itself.
