I implemented a classic word2vec model by hand in pure Python 3: a skip-gram model trained with negative sampling, using the TED-Talks-Dataset as the training corpus. To test the quality of the final embedding vectors, I measured their accuracy on the TOEFL Synonym Questions dataset.
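As a rough illustration of how such an evaluation can work (the question loader and all names below are hypothetical, not the actual code of this repo), each TOEFL item gives a probe word and several candidate answers, and the embedding is scored by picking the candidate whose vector is most cosine-similar to the probe:

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def answer_question(embeddings, probe, candidates):
    # Choose the candidate most similar to the probe; skip out-of-vocabulary words.
    if probe not in embeddings:
        return None
    scored = [(cosine(embeddings[probe], embeddings[c]), c)
              for c in candidates if c in embeddings]
    return max(scored)[1] if scored else None

def toefl_accuracy(embeddings, questions):
    # `questions` is assumed to be a list of (probe, candidates, correct_answer) tuples;
    # how they are parsed depends on the release of the TOEFL dataset you use.
    hits = sum(1 for probe, cands, correct in questions
               if answer_question(embeddings, probe, cands) == correct)
    return hits / len(questions)
```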
The skip-gram model with negative sampling is built to achieve the following:
Given a specific word in the middle of a sentence (the input word), look at the words nearby and pick one at random. The network then tells us, for every word in our vocabulary, the probability of it being the "nearby word" that we chose.
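The sketch below shows what one stochastic-gradient step of skip-gram with negative sampling looks like for a single (center, context) pair, following the objective described in "Distributed Representations of Words and Phrases and their Compositionality" (Paper 01 below). It is a minimal pure-Python illustration under assumed names (`W_in`, `W_out`, `lr`, `k`), not the exact code of this repo, and it draws negatives uniformly rather than from the unigram^0.75 distribution used in the paper:

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sgns_update(W_in, W_out, center, context, vocab, k=5, lr=0.025):
    """One SGNS step: push the context word's output vector toward the center
    word's input vector, and k randomly sampled 'negative' words away from it.

    W_in, W_out: dicts mapping word -> list[float] (input/output embeddings).
    vocab: list of words to draw negative samples from (a real implementation
           would sample from the unigram^0.75 distribution and avoid drawing
           the true context word)."""
    v_c = W_in[center]
    dim = len(v_c)
    grad_center = [0.0] * dim

    # Label 1 for the true context word, 0 for each negative sample.
    targets = [(context, 1.0)] + [(random.choice(vocab), 0.0) for _ in range(k)]
    for word, label in targets:
        u_o = W_out[word]
        score = sigmoid(sum(a * b for a, b in zip(v_c, u_o)))
        g = lr * (label - score)              # gradient of the log-sigmoid loss
        for i in range(dim):
            grad_center[i] += g * u_o[i]      # accumulate the update for v_c
            u_o[i] += g * v_c[i]              # update the output vector in place

    for i in range(dim):
        v_c[i] += grad_center[i]
```

In a full training loop, this update would be applied to every (center, context) pair produced by sliding a context window over the TED-Talks corpus.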
Blog:
01.Word2Vec Tutorial - The Skip-Gram Model
02.Word2Vec Tutorial - Negative Sampling
03.Deep Learning in Practice: word2vec (Deep Learning实战之word2vec)
04.Word2Vec and FastText Word Embedding with Gensim
05.A Gentle Introduction to the Bag-of-Words Model
06.Python implementation of Word2Vec
Paper:
01.Distributed Representations of Words and Phrases and their Compositionality
02.Efficient Estimation of Word Representations in Vector Space
03.Word2vec Parameter Learning Explained
04.Linguistic Regularities in Continuous Space Word Representations
05.Evaluation methods for unsupervised word embeddings
06.Word and Phrase Translation with word2vec
07.word2vec Explained: Deriving Mikolov et al.’s Negative-Sampling Word-Embedding Method
08.How to Generate a Good Word Embedding?
Video:
01.Negative Sampling-Coursera Deeplearning
Code:
01.word2vec_commented_in_C
02.word2vec code in Python
Dataset:
01.TED-Talks-Dataset
02.TOEFL Synonym Questions
Other datasets:
WordSim353, SNLI, NER, SQuAD, Coref, SRL, SST-5, Parsing