NLQ Product Retriever

A question-answering system that spans several product categories. Users submit a query for a product or listing in one of the six categories. The program finds matches for the query and returns them to the user, using exact and partial matching techniques to make the results as relevant as possible.
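To illustrate the difference between exact and partial matching, here is a minimal sketch. It is not the repository's implementation: the product records, the field names, and the `retrieve` and `match_score` helpers are all hypothetical, and a match is assumed to mean simple attribute equality.

```python
# Hypothetical product records; the field names are illustrative only.
PRODUCTS = [
    {"make": "honda", "color": "red", "year": 2015},
    {"make": "honda", "color": "blue", "year": 2015},
    {"make": "toyota", "color": "red", "year": 2010},
]


def match_score(product, spec):
    """Count how many requested attributes the product satisfies."""
    return sum(1 for key, value in spec.items() if product.get(key) == value)


def retrieve(products, spec):
    """Split products into exact matches (every attribute satisfied)
    and partial matches (at least one attribute satisfied)."""
    exact, partial = [], []
    for product in products:
        score = match_score(product, spec)
        if score == len(spec):
            exact.append(product)
        elif score > 0:
            partial.append((score, product))
    # Partial matches that satisfy more attributes come first.
    partial.sort(key=lambda pair: pair[0], reverse=True)
    return exact, [product for _, product in partial]


# Assumed specification extracted from a query such as "red honda".
exact, partial = retrieve(PRODUCTS, {"make": "honda", "color": "red"})
```

Here the red Honda satisfies both attributes and is an exact match, while the other two records satisfy one attribute each and are returned as partial matches.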

Datasets

The data files are stored in the "Datasets" directory. Six categories are currently available: Cars, Data Science Jobs, Furniture, Housing, Jewelry, and Motorcycles.

The data files in this repository are in the public domain.

Set up

The main.py program in the src directory runs the product question-answering program for a given query. Before it can be used, however, the SQLite database must be created. The script load_db.py is provided for this purpose.
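For readers unfamiliar with this kind of setup step, the following sketch shows one way tabular data can be loaded into a SQLite table using only the Python standard library. It does not reproduce load_db.py: the `load_table` helper, the table name, the sample columns, and the in-memory database are assumptions made for illustration.

```python
import csv
import io
import sqlite3

# Stand-in for one CSV data file; the columns and rows are made up.
SAMPLE_CSV = "make,color,price\nhonda,red,4500\ntoyota,blue,7200\n"


def load_table(conn, table, csv_file):
    """Create a table named after the category and insert every CSV row."""
    reader = csv.reader(csv_file)
    header = next(reader)  # first row holds the column names
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    conn.commit()


# An in-memory database keeps the sketch free of side effects.
conn = sqlite3.connect(":memory:")
load_table(conn, "cars", io.StringIO(SAMPLE_CSV))
rows = conn.execute('SELECT make, price FROM "cars" ORDER BY make').fetchall()
```

Because the columns are declared without types, the values are stored as the text read from the CSV; a real loader would likely convert numeric fields before inserting them.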

Dependencies

The main module requires NLTK, pandas, spaCy, and scikit-learn. Install them with pip:

python -m pip install nltk
python -m pip install pandas
python -m pip install spacy
python -m pip install scikit-learn

Once NLTK is installed, download the required NLTK data packages by launching Python and entering:

import nltk
nltk.download('stopwords')
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')

Run

Run the main.py module with the user query as its first argument. The query is classified into one of the six categories, spell-corrected, and analyzed for relevant product information. Matching or closely related products are printed in the retrieved list.

Flags

Name	Effect
-v	Print verbose output for the process (defaults to non-verbose).
-l x	Limit the number of products returned to no more than x (defaults to 25 for partial, no limit for exact).
-e	Returns only exact matches to the specifications extracted from the user query, skipping any partial match recommendations. This is the inverse of -p. By default, partial and exact matches are included in results.
-p	Returns only partial matches to the specifications extracted from the user query. This is the inverse of -e. By default, exact and partial matches are included in results.
-s	Disables spelling correction on the given query (spelling corrected by default).
-r x	Specifies a ranking method. Available options for x are 'main', 'vsm', 'tfidf', 'query_tuple', and 'random' (defaults to 'main').
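The flags in the table can be modeled with `argparse`, as in the sketch below. This is an illustration rather than the parser main.py actually uses: the `build_parser` function and the destination names are hypothetical, and treating -e and -p as mutually exclusive is an assumption based on the table describing them as inverses.

```python
import argparse

RANKING_METHODS = ["main", "vsm", "tfidf", "query_tuple", "random"]


def build_parser():
    """Build a parser that mirrors the flags table (hypothetical names)."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("query", help="user query, given as the first argument")
    parser.add_argument("-v", dest="verbose", action="store_true",
                        help="print verbose output")
    # None stands for the documented defaults: 25 for partial, no limit for exact.
    parser.add_argument("-l", dest="limit", type=int, default=None, metavar="x",
                        help="return no more than x products")
    # Assumption: -e and -p cannot be combined.
    mode = parser.add_mutually_exclusive_group()
    mode.add_argument("-e", dest="exact_only", action="store_true",
                      help="return only exact matches")
    mode.add_argument("-p", dest="partial_only", action="store_true",
                      help="return only partial matches")
    # Spelling correction is on by default; -s switches it off.
    parser.add_argument("-s", dest="spell_correct", action="store_false",
                        help="disable spelling correction")
    parser.add_argument("-r", dest="ranking", choices=RANKING_METHODS,
                        default="main", metavar="x", help="ranking method")
    return parser


# Equivalent to: python main.py "red honda civic" -l 10 -r tfidf
args = build_parser().parse_args(["red honda civic", "-l", "10", "-r", "tfidf"])
```

With this command line, `args.limit` is 10, `args.ranking` is 'tfidf', and spelling correction stays enabled because -s was not given.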

