Test out the API:
curl https://language-identifier-app.herokuapp.com/identify -d "data=Le commerce n'est pas un monstre et la publicité" -X GET
See the website for a tutorial.
This repo is an example of how to build a machine learning application from scratch! This is a simple web application that uses the Naive Bayes algorithm to classify a string of text as belonging to one of several languages.
This application has a front-end built in Flask, a model server created with flask_restful, and a database in SQLite. Data is downloaded from Wikipedia.
Install dependencies with conda:
conda create --name language_classifier python=3.8
conda activate language_classifier
pip install -r requirements.txt
Run the web application:
export FLASK_APP=app.py
flask run
Or:
python app.py
View the application:
http://127.0.0.1:5000
Run the model API:
python api.py
Test out the API:
curl http://127.0.0.1:5001//identify -d "data=Le commerce n'est pas un monstre et la publicité" -X GET
Download some data from Wikipedia:
cd scraper && python get_data.py
Generate a SQLite database:
create_database.py
Explore the data:
$ sqlite3 language_data.db
> SELECT language, title FROM wiki_data LIMIT 25;
Make and test model (run all cells in this notebook):
cd ../modeling
jupyter lab create_model.ipynb
Deploy the model by editing the following lines in classify_language.py
:
MODEL_NAME = "NB_classif"
MODEL_VERSION = "1"
You should deploy the web application app.py
to Heroku or something. Then you should deploy api.py
to GCP or AWS. You'll need to fiddle with the requirements for each of these apps as well as the DNS so the servers can talk to each other.
- This application was designed with teaching in mind, so there are some simplifications.
- This application has three distinct severs: a web app, a model API, and a SQL database. When run locally, these distinctions don't matter, but when you deploy an app like this, these pieces will all be distinct servers.
- All the dependencies here are global - if this was deployed, not all dependencies would need to be shared between servers.