Crawler.synom
PT-BR words only
I created this project when my team lead needed a large set of Portuguese (PT-BR) synonyms to use inside our MSSQL database, to enable some text markup for our users. I solved that problem with the website www.sinonimo.com.br, which contains a lot of synonyms. With this project you can collect all the words and their synonyms from that site, and afterwards generate a thesaurus.xml file to import (if you're on the Microsoft ecosystem).
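As a rough illustration of what the generated thesaurus.xml could look like, here is a minimal sketch that turns collected words into expansion sets in the layout SQL Server full-text thesaurus files use. The function names and the `{ word, synonyms }` entry shape are hypothetical, not taken from this project, and the real builder may produce a different layout.

```javascript
// Escape the characters that are special inside XML text nodes.
function escapeXml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Build one <expansion> set: every term in the set is treated as equivalent.
function buildExpansion(word, synonyms) {
  const subs = [word, ...synonyms]
    .map((term) => `      <sub>${escapeXml(term)}</sub>`)
    .join("\n");
  return `    <expansion>\n${subs}\n    </expansion>`;
}

// entries: hypothetical shape, e.g. [{ word: "casa", synonyms: ["lar", "moradia"] }]
function buildThesaurusXml(entries) {
  const body = entries
    .map((entry) => buildExpansion(entry.word, entry.synonyms))
    .join("\n");
  return [
    '<XML ID="Microsoft Search Thesaurus">',
    '  <thesaurus xmlns="x-schema:tsSchema.xml">',
    "    <diacritics_sensitive>0</diacritics_sensitive>",
    body,
    "  </thesaurus>",
    "</XML>",
  ].join("\n");
}

console.log(buildThesaurusXml([{ word: "casa", synonyms: ["lar", "moradia"] }]));
```

Each word and its synonyms end up in one expansion set, so a full-text query for any of them can match the others.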
Follow the steps below to run the application.
To run the project correctly, make sure you have the following dependencies installed on your machine.
- npm
- mongodb
You also need the pm2 package, which keeps the crawler running under management and restarts it if needed.
npm install pm2 -g
Or
yarn global add pm2
- Clone the repo
git clone https://github.com/ronniery/crawler.synom
- Go into the project folder
cd crawler.synom
- Now open the file .env in the root of the project and set the variables DB_HOST, DB_USER and DB_PASSWORD.
- Install NPM packages
npm install
Or
yarn install
- Run in bash
pm2 start ecosystem.config.js
The PM2 package will handle the crawler execution for you.
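To show how the DB_HOST, DB_USER and DB_PASSWORD values from the .env step might be used, here is a minimal sketch that builds a MongoDB connection string from them. The helper name `buildMongoUri` is hypothetical, and it assumes DB_HOST may already include a port. The project itself may load and combine these variables differently.

```javascript
// Hypothetical helper: builds a MongoDB connection string from the .env variables.
function buildMongoUri(env) {
  const { DB_HOST, DB_USER, DB_PASSWORD } = env;
  if (!DB_HOST) {
    throw new Error("DB_HOST is not set");
  }
  // Credentials are optional; percent-encode them so characters
  // such as "@" or ":" do not break the URI.
  const auth = DB_USER
    ? `${encodeURIComponent(DB_USER)}:${encodeURIComponent(DB_PASSWORD || "")}@`
    : "";
  return `mongodb://${auth}${DB_HOST}`;
}

// Example values only; in practice you would pass process.env.
console.log(
  buildMongoUri({ DB_HOST: "localhost:27017", DB_USER: "crawler", DB_PASSWORD: "p@ss" })
);
// prints "mongodb://crawler:p%40ss@localhost:27017"
```

Failing early on a missing DB_HOST makes a misconfigured .env visible immediately instead of as a connection timeout later.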
There are 2 command line arguments that you can start the application with:
--run-crawler or --run-crawler=true: starts the application to run the crawler only.
--run-xml-builder or --run-xml-builder=true: starts the application to generate the Thesaurus XML file.
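The flag handling described above could be sketched as follows. This is only an illustration: `hasFlag` and `selectMode` are hypothetical names, it assumes both flags accept the bare form and the `=true` form, and the choice that the crawler wins when both flags are present is an assumption, not the project's documented behavior.

```javascript
// Returns true when argv contains "--name" or "--name=true".
function hasFlag(argv, name) {
  return argv.some((arg) => arg === `--${name}` || arg === `--${name}=true`);
}

// Decide which part of the application to start.
// Assumption: if both flags are given, the crawler takes precedence.
function selectMode(argv) {
  if (hasFlag(argv, "run-crawler")) return "crawler";
  if (hasFlag(argv, "run-xml-builder")) return "xml-builder";
  return "none";
}

// process.argv.slice(2) drops the node binary and script path.
console.log(selectMode(process.argv.slice(2)));
```

Saved to a file such as `mode.js` (a made-up name) and run as `node mode.js --run-xml-builder`, this selects the XML builder mode.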
Distributed under the MIT License