The New York Times Docker Pipeline

Running on Telegram @NYTtopic

This code maintains a simple Telegram bot which collects fresh updates from the Twitter account of The New York Times and allows the user to look for recent articles on topics of their choice.
Hosted on Amazon EC2, the NYTtopic Bot consists of a pipeline of Docker containers:

➤ the first container runs a Python module that uses Tweepy to access The New York Times's profile via the Twitter API, creating a stream of tweets and storing them in a MongoDB database (second container);

➤ the third container carries out ETL tasks: it uses spaCy to perform named-entity recognition (NER) on the text of each tweet extracted from MongoDB. The recognized entities are then formatted as #hashtags, and all the data are finally stored in a PostgreSQL database (fourth container);

➤ the fifth container feeds the data into the Telegram bot, which is controlled and kept online using the python-telegram-bot library;

➤ the sixth and last container runs once a week, removing records older than one year from both databases to keep them from growing too large.
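Two of the steps above, turning named entities into #hashtags and deciding which records the weekly cleanup should drop, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the repository's code: the function names `entities_to_hashtags` and `is_expired` are hypothetical, the CamelCase hashtag style is an assumption, and "older than one year" is approximated as 365 days. In the real pipeline the entity strings would come from spaCy, e.g. `[ent.text for ent in nlp(tweet_text).ents]`.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import List, Optional

# Assumption: "older than one year" is approximated as 365 days.
RETENTION = timedelta(days=365)


def entities_to_hashtags(entities: List[str]) -> List[str]:
    """Turn named-entity strings (hypothetically taken from spaCy's doc.ents)
    into de-duplicated CamelCase #hashtags, preserving first-seen order."""
    hashtags: List[str] = []
    for entity in entities:
        # Keep runs of Unicode letters/digits; spaces and punctuation act as separators.
        words = re.findall(r"[^\W_]+", entity)
        if not words:
            continue
        tag = "#" + "".join(w[0].upper() + w[1:] for w in words)
        if tag not in hashtags:
            hashtags.append(tag)
    return hashtags


def is_expired(created_at: datetime, now: Optional[datetime] = None) -> bool:
    """Return True if a timezone-aware record timestamp falls outside the retention window."""
    now = now or datetime.now(timezone.utc)
    return now - created_at > RETENTION


if __name__ == "__main__":
    print(entities_to_hashtags(["New York Times", "the White House", "New York Times"]))
    # prints ['#NewYorkTimes', '#TheWhiteHouse']
```

A filter like `is_expired` only expresses the cutoff rule; the actual cleanup container would apply an equivalent condition as a delete query against MongoDB and PostgreSQL.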

I hope this bot will be useful whenever you are looking for high-quality information.

Used Technology

Guest Star

Instructions For Using This Code Locally

📌 STEP 1: Obtain credentials for the Twitter API and the Telegram Bot API

Open profiles on Twitter and Telegram if you do not already have them.
Four authentication keys are needed to access Twitter's Streaming API: API Key, API Secret, Access Token and Access Token Secret:
- You can obtain them by registering an application on apps.twitter.com.
- Once you have the keys, store them locally as environment variables with the following names: API_KEY, API_SECRET, ACCESS_TOKEN, SECRET_ACCESS_TOKEN.
Authentication to the Telegram Bot API is comparatively easier, as you only need one access token:
- To generate it, chat with BotFather on Telegram (no kidding!) and follow a few simple steps (to avoid name clashes, please do not choose NYTtopic as your bot's name 🙏🏻).
- Once again, store the token as an environment variable. Call it TOKEN_TELEGRAM.
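Before starting the containers, it can help to check that all five variables from this step are actually set. The sketch below is a hedged illustration: the variable names come from this README, but the helper `load_credentials` is hypothetical, and it assumes the compose file forwards these host variables into the containers.

```python
import os
from typing import Dict, Mapping, Optional

# Variable names as listed in STEP 1 of this README.
TWITTER_VARS = ("API_KEY", "API_SECRET", "ACCESS_TOKEN", "SECRET_ACCESS_TOKEN")
TELEGRAM_VARS = ("TOKEN_TELEGRAM",)
ALL_VARS = TWITTER_VARS + TELEGRAM_VARS


def load_credentials(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Read the Twitter and Telegram credentials, failing early if any is missing or empty."""
    env = os.environ if env is None else env
    missing = [name for name in ALL_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in ALL_VARS}


if __name__ == "__main__":
    creds = load_credentials()
    print("Found credentials for:", ", ".join(sorted(creds)))
```

Failing at startup with the names of the missing variables is usually easier to debug than an authentication error from the Twitter or Telegram API later on.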

📌 STEP 2: Run the pipeline with Docker

Clone this repository and install Docker if needed.
Go into the folder NYTopic_twitter_to_telegram:
- run docker-compose build and wait for Docker to set up everything for you;
- run docker-compose up. The bot should start responding within a few seconds.
Open a Telegram chat with your new bot and start browsing The New York Times!

To Do

~~Add a container for removing old records from Mongo and Postgres~~.
Provide the user with links to similar content in other newspapers.
Make hashtag-based queries possible, so as to return all the available articles on a specific topic in a single message.

fra-mari/NYTimes_Docker_Pipeline
