This repo aims to build an image-to-image and text-to-image search engine for fashion products, using Jina as the neural search framework.
The fashion images are retrieved from Kaggle.
Jina includes:
- DocumentArray: concurrent processing of Documents, plus pushing/pulling them between machines. This is useful for creating embeddings on a remote machine with a GPU and then indexing and querying locally.
- Jina Hub Executors, which integrate deep learning models
- Jina Client, which formats the REST requests
- PQLite, which lets us pre-filter results by price, rating, etc.
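To illustrate what "pre-filtering" means here, the following is a minimal pure-Python sketch: metadata conditions are applied first, and only the surviving products are ranked by cosine similarity to the query embedding. It does not use PQLite's API. The `Product` class, `prefiltered_search` function and the `price`/`rating` keys are hypothetical stand-ins, and the brute-force ranking replaces PQLite's approximate index.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class Product:
    """Hypothetical record: an image URI, its embedding, and metadata."""
    uri: str
    embedding: list[float]
    metadata: dict = field(default_factory=dict)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def prefiltered_search(query, products, max_price=None, min_rating=None, top_k=3):
    """Filter on metadata first, then rank only the survivors by similarity."""
    candidates = [
        p for p in products
        if (max_price is None or p.metadata.get("price", math.inf) <= max_price)
        and (min_rating is None or p.metadata.get("rating", 0) >= min_rating)
    ]
    # Brute-force ranking; a real index (e.g. PQLite) would do this approximately.
    candidates.sort(key=lambda p: cosine(query, p.embedding), reverse=True)
    return [p.uri for p in candidates[:top_k]]


if __name__ == "__main__":
    catalog = [
        Product("shoe.jpg", [1.0, 0.0], {"price": 50, "rating": 4.5}),
        Product("boot.jpg", [0.9, 0.1], {"price": 200, "rating": 4.8}),
        Product("hat.jpg", [0.0, 1.0], {"price": 20, "rating": 3.9}),
    ]
    print(prefiltered_search([1.0, 0.0], catalog, max_price=100))
```

Filtering before ranking guarantees every returned result satisfies the user's constraints, rather than filtering a fixed top-k afterwards and possibly ending up with too few hits.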
The front-end is built in Streamlit.
pip install -r requirements.txt
You'll want to create your own get_data.py, since processing logic varies from dataset to dataset.
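As a starting point, here is a hypothetical sketch of the kind of logic a get_data.py might contain: reading a product CSV and turning each row into an image URI plus metadata. The `id` column, the `<id>.jpg` file naming, the `load_records` function and the sample rows are all assumptions; adapt them to your dataset's actual layout.

```python
import csv
import io
import os


def load_records(csv_file, image_dir, limit=None):
    """Turn rows of a product CSV into dicts with an image URI plus metadata.

    Hypothetically assumes an `id` column whose value names the image file
    `<id>.jpg` inside `image_dir`; every other column is kept as metadata.
    """
    records = []
    for row in csv.DictReader(csv_file):
        if limit is not None and len(records) >= limit:
            break
        product_id = row.pop("id")
        records.append({
            "id": product_id,
            "uri": os.path.join(image_dir, f"{product_id}.jpg"),
            "metadata": row,  # remaining columns, e.g. name and colour
        })
    return records


if __name__ == "__main__":
    # Made-up sample rows standing in for a real CSV file.
    sample = io.StringIO(
        "id,productDisplayName,baseColour\n"
        "1,Navy Shirt,Navy\n"
        "2,Blue Jeans,Blue\n"
    )
    for record in load_records(sample, "data/images"):
        print(record["uri"], record["metadata"]["baseColour"])
```

Keeping metadata alongside each URI is what later allows filtering by fields such as price or colour.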
To index the data, run the commands below. This creates embeddings for all images using CLIPImageEncoder and then stores them on disk (with metadata) using PQLiteIndexer.
cd indexer
python app.py <number_of_docs_to_index>
By default, the number of docs to index is set to 1,000,000.
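The README does not show how indexer/app.py reads its argument. One way to handle an optional positional count with the stated 1,000,000 default is sketched below; `parse_num_docs` and the positive-integer check are hypothetical.

```python
import sys

DEFAULT_NUM_DOCS = 1_000_000  # default stated in this README


def parse_num_docs(argv):
    """Return the number of docs to index from argv[1], else the default."""
    if len(argv) < 2:
        return DEFAULT_NUM_DOCS
    num_docs = int(argv[1])  # raises ValueError for non-numeric input
    if num_docs <= 0:
        raise ValueError("number_of_docs_to_index must be a positive integer")
    return num_docs


if __name__ == "__main__":
    print(f"Indexing up to {parse_num_docs(sys.argv):,} docs")
```

For example, `python app.py 1000` would index at most 1,000 docs, which is handy for a quick trial run before indexing the full dataset.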
After indexing, you'll have a file called columns.json in your indexer directory. Copy it to the backend- directories you want to work with. This lets the user filter by fields such as price, rating and color (depending on which options you present in your front-end). Note that this overwrites the existing columns.json file(s), which come from the fashion search.
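If you work with several backend directories, the copy step can be scripted. The sketch below is hypothetical (`copy_columns` is not part of the repo). Pass it your own backend directory paths. Like a manual copy, it overwrites any existing columns.json in each destination.

```python
import shutil
from pathlib import Path


def copy_columns(indexer_dir, backend_dirs):
    """Copy columns.json from the indexer directory into each backend directory.

    Any existing columns.json in a destination is overwritten.
    Returns the list of destination paths written.
    """
    src = Path(indexer_dir) / "columns.json"
    if not src.is_file():
        raise FileNotFoundError(f"{src} not found; run the indexer first")
    copied = []
    for backend in backend_dirs:
        dest = Path(backend) / "columns.json"
        shutil.copyfile(src, dest)  # replaces dest if it already exists
        copied.append(dest)
    return copied
```

Failing early when the source file is missing avoids silently keeping the old fashion-search columns.json in place.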
From the repo's root directory, run the following to start the search server(s):
cd searcher
python app.py -t <task>
<task> can be one of:
- search: open up a RESTful interface for searching. Defaults to port 12345.
- test_text: submit a sample text query and return the URIs of the results.
- test_image: submit a sample image query and return the URIs of the results.
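Once `search` is running, any HTTP client can talk to it. The sketch below uses only the standard library. It assumes a `/search` path and a Jina-style `{"data": [{"text": ...}]}` JSON body, and it assumes that each returned doc carries a `matches` list with `uri` fields. All of these are assumptions about the server's request and response format, not something this README confirms. Only the port 12345 comes from the text above.

```python
import json
import urllib.request

# Port 12345 is from this README; the /search path is an assumption.
SEARCH_URL = "http://localhost:12345/search"


def build_search_request(text, url=SEARCH_URL):
    """Build (but do not send) a POST request carrying one text query."""
    payload = {"data": [{"text": text}]}  # assumed Jina-style request body
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_match_uris(response_json):
    """Collect match URIs, assuming each returned doc has a `matches` list."""
    return [
        match.get("uri")
        for doc in response_json.get("data", [])
        for match in doc.get("matches", [])
    ]
```

With the server running, `json.load(urllib.request.urlopen(build_search_request("red summer dress")))` would send the query, and `extract_match_uris` would pull out the result URIs, mirroring what `test_text` reports.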
- Open a new terminal window/tab, return to the repo's root directory and run:
cd frontend
streamlit run frontend.py
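The front-end's filter options (price, rating, color, etc.) ultimately have to become a filter the indexer can apply. The hypothetical helper below combines the user's selections into a single dictionary. The MongoDB-style operators (`$lte`, `$gte`, `$eq`, `$and`) are an assumption about the filter format PQLiteIndexer expects, and the field names are placeholders that should match your columns.json.

```python
def build_filter(max_price=None, min_rating=None, color=None):
    """Combine the user's front-end selections into one filter dict.

    The operator syntax is an assumption about PQLite's filter format;
    field names are hypothetical and should match your columns.json.
    """
    conditions = []
    if max_price is not None:
        conditions.append({"price": {"$lte": max_price}})
    if min_rating is not None:
        conditions.append({"rating": {"$gte": min_rating}})
    if color is not None:
        conditions.append({"color": {"$eq": color}})
    if not conditions:
        return {}  # no filter: search the whole index
    if len(conditions) == 1:
        return conditions[0]
    return {"$and": conditions}


if __name__ == "__main__":
    print(build_filter(max_price=100, color="Blue"))
```

In a Streamlit page, the arguments could come from widgets such as `st.slider` (price, rating) and `st.selectbox` (color).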
- First, index the data as described above.
- Then, in the repo's root directory, run:
docker-compose up