word-embedding-skip-gram

A skip-gram word embedding model implemented in C++.

In the header nlp.h:

#include "word_lib.h"
#include "bp_network.h"

In the header word_lib.h:

#include <list>
#include <string>

class wordlib
{
	private:
		std::list<std::string> dictionary;	// every distinct word seen so far
	public:
		wordlib();
		~wordlib();
		int get_place(std::string);		// index of a word in the dictionary
		int lib_size();				// number of words recorded
		bool search(std::string);		// is the word already in the dictionary?
		void add_word(std::string);		// insert one word
		void add_word_from_file(std::string);	// read and insert words from a file
		void print_lib();			// print the whole dictionary
		void print_word(int);			// print the word at a given index
		std::string get_word(int);		// return the word at a given index
};

This class records every word that appears in the training set.
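
A minimal usage sketch (the method semantics are inferred from the declarations above; "trainingset.txt" is the project's default input file):

#include <iostream>
#include "word_lib.h"

int main()
{
	wordlib lib;
	lib.add_word_from_file("trainingset.txt");	// fill the dictionary from the training set
	std::cout << lib.lib_size() << " distinct words\n";
	if (lib.search("hello"))			// check membership before asking for an index
		std::cout << "hello is word #" << lib.get_place("hello") << "\n";
	return 0;
}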

In the header bp_network.h:

struct neuron;			// one unit of the backpropagation network
double tanh(double x);		// hyperbolic tangent activation
double difftanh(double x);	// derivative of tanh
double sigmoid(double x);	// logistic (sigmoid) activation
double diffsigmoid(double x);	// derivative of sigmoid
class word2vec;			// the skip-gram embedding network
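
The derivative helpers presumably implement the standard formulas tanh'(x) = 1 - tanh(x)^2 and sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x)). A minimal standalone sketch of plausible definitions (these bodies are assumptions; the actual implementations live in bp_network.h):

#include <cmath>

double sigmoid(double x)
{
	return 1.0 / (1.0 + std::exp(-x));	// logistic function
}

double diffsigmoid(double x)
{
	double s = sigmoid(x);
	return s * (1.0 - s);			// sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x))
}

double difftanh(double x)
{
	double t = std::tanh(x);
	return 1.0 - t * t;			// tanh'(x) = 1 - tanh(x)^2
}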

The word2vec class computes each word's embedding vector and outputs the results both to the screen and to a file.

The default input file name is "trainingset.txt"; you can change it in word_embedding.cpp.

The most important thing

To change the length of the embedding vectors, open bp_network.h and find the function void word2vec::initializing(). Change HNUM at the beginning of that function; this parameter determines the vector length.
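
For illustration only, the edit looks roughly like this; the body below is a guess (only the names HNUM and word2vec::initializing() come from the project):

void word2vec::initializing()
{
	int HNUM = 50;	// <- change this value; it sets the embedding (hidden layer) dimension
	// ... the rest of the initialization (not shown) sizes the network using HNUM ...
}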
