A library implementing a range of string similarity and distance measures behind an easy-to-use API. A dozen algorithms (including Levenshtein edit distance and its siblings, Jaro-Winkler, Longest Common Subsequence, cosine similarity, etc.) are currently implemented. It is used in the Cognitive Service Platform cmd.csp for the NLP and classifier components.
There are no prerequisites.
Included dependencies:
<dependency>
<groupId>net.jcip</groupId>
<artifactId>jcip-annotations</artifactId>
<version>1.0</version>
</dependency>
To use, merge the following into your Maven POM (or the equivalent into your Gradle build script):
<repository>
<id>github</id>
<name>GitHub swelcker Apache Maven Packages</name>
<url>https://maven.pkg.github.com/swelcker</url>
</repository>
<dependency>
<groupId>cmd.csp</groupId>
<artifactId>cspsimilarity</artifactId>
<version>1.0.0</version>
</dependency>
Then add `import cspsimilarity.*;` to your application:
// Example
import cspsimilarity.*;
...
// One engine instance per measure
private NormalizedLevenshtein engineNL = new NormalizedLevenshtein();
private JaroWinkler engineJW = new JaroWinkler();
private MetricLCS engineMLCS = new MetricLCS();
private NGram engineNGRAM = new NGram(3);                  // n-gram length 3
private Cosine engineCOSINE = new Cosine(9);               // shingle length 9
private Jaccard engineJACARD = new Jaccard(9);             // shingle length 9
private SorensenDice engineSOREDICE = new SorensenDice(9); // shingle length 9
...
String source = sourceText;
String search = toSearch;
double sS = 0d;
// Measures queried through similarity(): higher means more alike
sS = engineNL.similarity(source, search);
sS = engineJW.similarity(source, search);
// Measures queried through distance(): 1 - distance gives a similarity
sS = 1d - engineMLCS.distance(source, search);
sS = 1d - engineNGRAM.distance(source, search);
sS = engineCOSINE.similarity(source, search);
sS = engineJACARD.similarity(source, search);
sS = engineSOREDICE.similarity(source, search);
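To illustrate how a normalized distance relates to a similarity score (the `1d - distance` pattern used above), here is a minimal, self-contained sketch of a normalized Levenshtein measure. It does not use this library and is not its implementation: the class `SimilaritySketch` and its method names are hypothetical, and normalizing by the length of the longer string is an assumption made for the illustration.

```java
public class SimilaritySketch {

    // Classic Levenshtein edit distance, computed with two rolling rows.
    public static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // cost of building b's prefix from an empty string
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // cost of deleting a's first i characters
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            int[] tmp = prev; // swap rows for the next iteration
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumption: normalize by the longer string's length, giving a value in [0, 1].
    public static double normalizedDistance(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        if (maxLen == 0) {
            return 0.0; // two empty strings are identical
        }
        return (double) levenshtein(a, b) / maxLen;
    }

    // Similarity is the complement of the normalized distance.
    public static double similarity(String a, String b) {
        return 1.0 - normalizedDistance(a, b);
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("kitten", "sitting")); // prints 3
        System.out.println(similarity("kitten", "sitting"));
    }
}
```

Because the normalized distance stays between 0 and 1, subtracting it from 1 yields a score on the same scale as the measures that expose `similarity` directly, which makes the scores easier to compare side by side.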
- Maven - Dependency Management
Please read CONTRIBUTING.md for details on our code of conduct, and the process for submitting pull requests to us.
We use SemVer for versioning. For the versions available, see the tags on this repository.
- Stefan Welcker - Modifications based on tdebatty/java-string-similarity
See also the list of contributors who participated in this project.
This project is licensed under the MIT License; see the LICENSE.md file for details.