This repository contains demo implementations of several word2vec models.
Note: This project is based on 动手学深度学习 (Dive into Deep Learning).
Datasets:
dataset1
: text8
dataset2
: ptb
Models:
model1 (DONE)
: Continuous-Bag-Of-Words with Hierarchical-Softmax
model2 (DONE)
: Continuous-Bag-Of-Words with Negative-Sampling
model3 (DONE)
: Skip-Gram with Hierarchical-Softmax
model4 (DONE)
: Skip-Gram with Negative-Sampling
model5 (TODO)
: FastText
model6 (TODO)
: GloVe
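To make the difference between the CBOW and Skip-Gram models listed above concrete, here is a minimal sketch of how training examples might be built from a token sequence. The function names `cbow_examples` and `skipgram_examples` and the `window` parameter are illustrative assumptions, not names taken from this repository's loaders.

```python
from typing import List, Tuple


def cbow_examples(tokens: List[str], window: int = 2) -> List[Tuple[List[str], str]]:
    """CBOW: predict the center word from its surrounding context words."""
    examples = []
    for i, center in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        context = left + right
        if context:  # skip positions that have no neighbours at all
            examples.append((context, center))
    return examples


def skipgram_examples(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Skip-Gram: predict each context word from the center word."""
    pairs = []
    for context, center in cbow_examples(tokens, window):
        pairs.extend((center, ctx) for ctx in context)
    return pairs


tokens = ["the", "quick", "brown", "fox"]
print(cbow_examples(tokens, window=1))
print(skipgram_examples(tokens, window=1))
```

CBOW produces one example per position (many context words, one target), while Skip-Gram produces one example per (center, context) pair. Hierarchical softmax and negative sampling only change how the output probability is approximated, not how these examples are formed.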
# download dataset text8
PYTHONPATH=. python dataprocess/process.py --dataset_name text8
# download dataset ptb
PYTHONPATH=. python dataprocess/process.py --dataset_name ptb
- for loader
# CBOW_HS_Loader
PYTHONPATH=. python loaders/CBOW_HS_Loader.py
# CBOW_HS_Loader: load data from cache
PYTHONPATH=. python loaders/CBOW_HS_Loader.py --use_cache
# CBOW_NS_Loader
PYTHONPATH=. python loaders/CBOW_NS_Loader.py
# CBOW_NS_Loader: load data from cache
PYTHONPATH=. python loaders/CBOW_NS_Loader.py --use_cache
# SG_HS_Loader
PYTHONPATH=. python loaders/SG_HS_Loader.py
# SG_HS_Loader: load data from cache
PYTHONPATH=. python loaders/SG_HS_Loader.py --use_cache
# SG_NS_Loader
PYTHONPATH=. python loaders/SG_NS_Loader.py
# SG_NS_Loader: load data from cache
PYTHONPATH=. python loaders/SG_NS_Loader.py --use_cache
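The `--use_cache` flag above suggests that the loaders can reuse previously processed data instead of rebuilding it. The following is only a sketch of that general pattern, assuming a pickle file as the cache; `build_vocab`, `load_or_build_vocab`, and the cache path are hypothetical and may not match what the loaders in this repository actually do.

```python
import pickle
from collections import Counter
from pathlib import Path
from typing import Dict, List


def build_vocab(tokens: List[str], min_count: int = 1) -> Dict[str, int]:
    """Map each sufficiently frequent word to an integer id (hypothetical helper)."""
    counts = Counter(tokens)
    kept = sorted(w for w, c in counts.items() if c >= min_count)
    return {w: i for i, w in enumerate(kept)}


def load_or_build_vocab(tokens: List[str], cache_path: Path,
                        use_cache: bool = False) -> Dict[str, int]:
    """Return the cached vocabulary if requested and present, else rebuild and save it."""
    if use_cache and cache_path.exists():
        with cache_path.open("rb") as f:
            return pickle.load(f)
    vocab = build_vocab(tokens)
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    with cache_path.open("wb") as f:
        pickle.dump(vocab, f)
    return vocab
```

With this pattern, the first run without the flag pays the preprocessing cost and writes the cache, and later runs with the flag skip straight to loading it.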
- for module
# CBOW_HS_Module
PYTHONPATH=. python modules/CBOW_HS_Module.py
# CBOW_NS_Module
PYTHONPATH=. python modules/CBOW_NS_Module.py
# SG_HS_Module
PYTHONPATH=. python modules/SG_HS_Module.py
# SG_NS_Module
PYTHONPATH=. python modules/SG_NS_Module.py
- for train
PYTHONPATH=. python main.py --mode train
- for predict
PYTHONPATH=. python main.py --mode predict
You can change the configuration either on the command line or in utils/parser.py.
Here are examples for each module type:
# CBOW_HS model
PYTHONPATH=. python main.py --module_type CBOW_HS --dataset_name text8
PYTHONPATH=. python main.py --module_type CBOW_HS --dataset_name ptb
# CBOW_NS model
PYTHONPATH=. python main.py --module_type CBOW_NS --dataset_name text8
PYTHONPATH=. python main.py --module_type CBOW_NS --dataset_name ptb
# SG_HS model
PYTHONPATH=. python main.py --module_type SG_HS --dataset_name text8
PYTHONPATH=. python main.py --module_type SG_HS --dataset_name ptb
# SG_NS model
PYTHONPATH=. python main.py --module_type SG_NS --dataset_name text8
PYTHONPATH=. python main.py --module_type SG_NS --dataset_name ptb
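For orientation, the flags used in the commands above could be declared with `argparse` roughly as follows. This is a sketch only: the flag names and values come from the commands in this README, but the defaults, the `build_parser` name, and the overall layout are assumptions and may differ from the real utils/parser.py.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser covering the flags shown in this README."""
    parser = argparse.ArgumentParser(description="word2vec demo (illustrative sketch)")
    parser.add_argument("--mode", choices=["train", "predict"],
                        default="train")  # default is an assumption
    parser.add_argument("--module_type",
                        choices=["CBOW_HS", "CBOW_NS", "SG_HS", "SG_NS"],
                        default="CBOW_HS")  # default is an assumption
    parser.add_argument("--dataset_name", choices=["text8", "ptb"],
                        default="text8")  # default is an assumption
    parser.add_argument("--use_cache", action="store_true",
                        help="load preprocessed data from cache")
    return parser


# Equivalent to: python main.py --module_type SG_NS --dataset_name ptb
args = build_parser().parse_args(["--module_type", "SG_NS", "--dataset_name", "ptb"])
print(args.mode, args.module_type, args.dataset_name, args.use_cache)
```

Changing a default in the parser file affects every run, while passing a flag on the command line overrides it for that run only.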