A crawler for the sinonimo.com.br website that saves the words into a MongoDB database.

ronniery/crawler.synom


Crawler.synom
PT-BR words only

About The Project

I created this project when my team lead needed a large set of Brazilian Portuguese (PT-BR) synonyms to use inside our MSSQL database, to enable some text markup features for our users. I solved that with the website www.sinonimo.com.br, which contains a lot of synonyms. With this project you collect every word and its synonyms from that site; after that you can generate a thesaurus.xml file to import (if you're in the Microsoft ecosystem).
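To make the last step more concrete, here is a minimal sketch of how collected words could be turned into a thesaurus file. It follows the SQL Server full-text thesaurus layout, where each `<expansion>` element groups terms that are treated as equivalent. The function names, the input shape and the choice of `diacritics_sensitive` are assumptions for illustration; this is not the builder shipped in this repository.

```javascript
// Hypothetical helper: escapes characters that are special in XML text nodes.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Hypothetical input shape: [{ word: "casa", synonyms: ["lar", "moradia"] }]
function buildThesaurusXml(entries) {
  const expansions = entries
    // Words without synonyms would produce useless one-term groups, so skip them.
    .filter((entry) => Array.isArray(entry.synonyms) && entry.synonyms.length > 0)
    .map((entry) => {
      const terms = [entry.word, ...entry.synonyms]
        .map((term) => `      <sub>${escapeXml(term)}</sub>`)
        .join("\n");
      return `    <expansion>\n${terms}\n    </expansion>`;
    });

  return [
    '<XML ID="Microsoft Search Thesaurus">',
    '  <thesaurus xmlns="x-schema:tsSchema.xml">',
    // Assumption: 0 makes matching ignore accents; use 1 to respect them.
    "    <diacritics_sensitive>0</diacritics_sensitive>",
    ...expansions,
    "  </thesaurus>",
    "</XML>",
  ].join("\n");
}
```

Calling `buildThesaurusXml([{ word: "casa", synonyms: ["lar", "moradia"] }])` returns a string that can be written to thesaurus.xml with `fs.writeFileSync`.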

Getting Started

Follow the steps below to run the application.

Prerequisites

To run the project correctly, make sure the following dependencies are installed on your machine.

  • npm
  • mongodb

You also need the pm2 package, so that the crawler runs as a managed process and is restarted if needed.

npm install pm2 -g

Or

yarn global add pm2

Installation

  1. Clone the repo
git clone https://github.com/ronniery/crawler.synom
  2. Go into the project folder
cd crawler.synom
  3. Open the .env file in the root of the project and set the variables DB_HOST, DB_USER and DB_PASSWORD.
  4. Install the NPM packages
npm install

Or

yarn install
  5. Start the crawler from bash
pm2 start ecosystem.config.js
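
The three variables set in the .env file have to end up in a MongoDB connection string somewhere. The sketch below shows one way that could look, assuming DB_HOST holds a host name with an optional port and that the values are already available in `process.env` (for example through a loader such as dotenv). The helper name `buildMongoUri` is hypothetical and the project's own code may combine the values differently.

```javascript
// Hypothetical helper: builds a MongoDB connection string from the .env values.
function buildMongoUri(env) {
  const { DB_HOST, DB_USER, DB_PASSWORD } = env;

  // Fail early with a clear message instead of a confusing connection error.
  if (!DB_HOST || !DB_USER || !DB_PASSWORD) {
    throw new Error("DB_HOST, DB_USER and DB_PASSWORD must be set in .env");
  }

  // Credentials may contain characters such as "@" or ":" that must be
  // percent-encoded inside a connection string.
  const user = encodeURIComponent(DB_USER);
  const password = encodeURIComponent(DB_PASSWORD);

  return `mongodb://${user}:${password}@${DB_HOST}`;
}

// Usage (assumes the variables were loaded into process.env beforehand):
// const uri = buildMongoUri(process.env);
```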

The PM2 package will handle the crawler execution for you.

Flags

There are 2 command-line arguments that you can start the application with:

  • --run-crawler or --run-crawler=true: With this flag the application runs the crawler only.
  • --run-xml-builder or --run-xml-builder=true: With this flag the application generates the Thesaurus XML file.
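
As a rough illustration of how the two flags could be read, here is a minimal parser sketch. The function name `parseFlags` and the returned object shape are hypothetical, and it assumes both flags accept either the bare form or the `=true` form; the application's real argument handling may differ.

```javascript
// Hypothetical parser for the two documented flags.
function parseFlags(argv) {
  const flags = { runCrawler: false, runXmlBuilder: false };

  for (const arg of argv) {
    // "--run-crawler" has no value, so it defaults to "true";
    // "--run-crawler=true" carries the value explicitly.
    const [name, value = "true"] = arg.split("=");
    const enabled = value === "true";

    if (name === "--run-crawler") flags.runCrawler = enabled;
    if (name === "--run-xml-builder") flags.runXmlBuilder = enabled;
  }

  return flags;
}

// Usage: skip the first two entries of process.argv (node binary and script path).
// const selected = parseFlags(process.argv.slice(2));
```

Unknown arguments are ignored in this sketch, so extra options passed by PM2 or the shell do not cause an error.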

License

Distributed under the MIT License