Skip-gram word embedding model in C++
In header nlp.h:
#include "word_lib.h"
#include "bp_network.h"
In header word_lib.h:
#include <list>
#include <string>

class wordlib
{
private:
    std::list<std::string> dictionary;    // all distinct words from the training set
public:
    wordlib();
    ~wordlib();
    int get_place(std::string);           // index of a word in the dictionary
    int lib_size();                       // number of words stored
    bool search(std::string);             // whether a word is already stored
    void add_word(std::string);           // add a single word
    void add_word_from_file(std::string); // add every word from a text file
    void print_lib();                     // print the whole dictionary
    void print_word(int);                 // print the word at a given index
    std::string get_word(int);            // fetch the word at a given index
};
This class records every distinct word that appears in the training set.
In header bp_network.h:
struct neuron;
double tanh(double x);        // note: this global declaration may clash with std::tanh from <cmath>
double difftanh(double x);    // derivative of tanh
double sigmoid(double x);
double diffsigmoid(double x); // derivative of sigmoid
class word2vec;
The word2vec class computes an embedding vector for each word and writes the results both to the screen and to an output file.
The default input file is "trainingset.txt"; you can change it in word_embedding.cpp.
To change the length of the embedding vectors, open bp_network.h and find the function void word2vec::initializing(), then change HNUM at its beginning; this parameter sets the vector length.