Questions about wink-distance #31
-
Hello Sanjaya, I've just stumbled upon your wink.js and am very intrigued. But please note that I am a relative noob in the ML/NLP world; trying to learn. I have a few questions about the use of wink-distance to detect similarities between raw, unstructured texts:
Thanks very much for your help. Cordially, Paul
-
Hello @paul-bell

Yes, wink-distance has APIs to determine similarity between texts. Cosine distance is one of the prominent ways to determine similarity; others are Jaccard and Tversky. The cosine distance accepts bags-of-words (`bow`) as input, whereas Jaccard and Tversky accept sets (`set`). Note that since these functions compute a "distance", you will have to subtract the result from 1 to obtain the similarity!

You could use wink-nlp to create the bag-of-words (bow). Here is the code to do it:

```javascript
// Require packages & models.
const distance = require( 'wink-distance' );
const winkNLP = require( 'wink-nlp' );
const its = require( 'wink-nlp/src/its.js' );
const as = require( 'wink-nlp/src/as.js' );
const model = require( 'wink-eng-lite-web-model' );
// Instantiate wink nlp using the model.
const nlp = winkNLP( model );
// Texts to be tested for similarity.
const text1 = `the dog chased the cat`;
const text2 = `the cat chased the mouse`;
// Compute bow using wink nlp.
const bow1 = nlp.readDoc( text1 ).tokens().out( its.normal, as.bow );
const bow2 = nlp.readDoc( text2 ).tokens().out( its.normal, as.bow );
// Use bag-of-words to compute cosine similarity (1 - cosine distance).
const cosineSimilarity = 1 - distance.bow.cosine( bow1, bow2 );
// Print the result.
console.log( cosineSimilarity );
```

The above code is available as a RunKit Notebook; you can play with it there.

The Best,
-

Hi Sanjaya, Most kind of you; thanks very much! I can't wait to try this out. With any luck, I'll get to it before end of day. If not, then first thing in the AM. Again, thank you. Cordially, Paul
-

Hello Sanjaya, Here we are one month later and I am finally able to start playing with the BOW/cosine similarity code you've generously given me (thank you again). Because the code I am working on is strictly backend, I chose to use wink-eng-lite-model rather than its 'web' variant. Do you have such models in languages other than English? Thank you. Cordially, Paul
-

Hi Sanjaya, Please forgive my ignorance, but do Similarity and BM25 Vectorizer include non-English models? To your last question, I am open to the possibility of assisting in such an effort but would probably depend on some initial guidance from you as to what's required. Off the top of my head: French and German, and at least one non-Western language. Cordially, Paul
-

Thank you. I will give them a spin. I forgot to mention that Spanish is probably the most important of the Western-alphabet languages. When you have the time, can you tell me what I'd need to do to create such a model? -Paul
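To make the measures described above concrete (cosine on a bag-of-words; Jaccard and Tversky on sets), here is a minimal, dependency-free sketch in plain JavaScript. It is not wink-distance's implementation: the helpers `toBow`, `cosineSimilarity`, `jaccardSimilarity` and `tverskySimilarity` are hypothetical, a bag-of-words is assumed to be a plain object mapping each token to its count, and the Tversky defaults of 0.5 are an assumption rather than the library's documented values. It computes similarities directly; subtracting each from 1 gives the corresponding distance.

```javascript
// Dependency-free sketch of the similarity measures discussed in this thread.
// The helper names below are hypothetical; this is NOT wink-distance's code.

// Build a bag-of-words (token -> count). A null-prototype object avoids
// clashes with inherited keys such as "constructor".
function toBow(tokens) {
  const bow = Object.create(null);
  for (const t of tokens) bow[t] = (bow[t] || 0) + 1;
  return bow;
}

const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

// Cosine similarity: dot(A, B) / (|A| * |B|) over token counts.
function cosineSimilarity(bowA, bowB) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (const term of Object.keys(bowA)) {
    normA += bowA[term] * bowA[term];
    if (has(bowB, term)) dot += bowA[term] * bowB[term];
  }
  for (const term of Object.keys(bowB)) normB += bowB[term] * bowB[term];
  // Treat an empty bag as having no similarity to anything.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Number of elements present in both sets.
function intersectionSize(setA, setB) {
  let n = 0;
  for (const x of setA) if (setB.has(x)) n++;
  return n;
}

// Jaccard similarity: |A ∩ B| / |A ∪ B|. Two empty sets are treated as identical.
function jaccardSimilarity(setA, setB) {
  const common = intersectionSize(setA, setB);
  const union = setA.size + setB.size - common;
  return union === 0 ? 1 : common / union;
}

// Tversky similarity: |A ∩ B| / (|A ∩ B| + alpha*|A - B| + beta*|B - A|).
// alpha = beta = 1 reduces to Jaccard; alpha = beta = 0.5 gives Dice.
// The 0.5 defaults are an assumption, not wink-distance's documented defaults.
function tverskySimilarity(setA, setB, alpha = 0.5, beta = 0.5) {
  if (setA.size === 0 && setB.size === 0) return 1;
  const common = intersectionSize(setA, setB);
  const denom = common + alpha * (setA.size - common) + beta * (setB.size - common);
  return denom === 0 ? 0 : common / denom;
}

// Same texts as in the wink-nlp example; naive whitespace tokenization.
const tokens1 = 'the dog chased the cat'.split(' ');
const tokens2 = 'the cat chased the mouse'.split(' ');
const bow1 = toBow(tokens1);
const bow2 = toBow(tokens2);
const set1 = new Set(tokens1);
const set2 = new Set(tokens2);

const cos = cosineSimilarity(bow1, bow2);
console.log('cosine similarity:', cos.toFixed(4));                  // 0.8571
console.log('cosine distance:', (1 - cos).toFixed(4));              // 0.1429
console.log('jaccard:', jaccardSimilarity(set1, set2).toFixed(4));  // 0.6000
console.log('tversky:', tverskySimilarity(set1, set2).toFixed(4));  // 0.7500
```

The whitespace split is only for illustration; in the wink-nlp example, the tokenizer and `its.normal` take care of tokenization and normalization. Setting `alpha = beta = 1` in the Tversky sketch reproduces the Jaccard value, which is a quick way to see how the two measures relate.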