Tatilsepeti Chatbot is built on the Telegram API via python-telegram-bot. Its main purpose is to answer users' questions and collect feedback from the dialog. To achieve this, the bot needs to understand the semantics of each question as well as possible.
To understand the questions, the bot processes them with Natural Language Processing (NLP) methods.
python -m pip install -r requirements.txt
At the start of the conversation, the bot greets you with an introductory sentence, then waits for your questions. The conversation proceeds through a series of states, shown in the following state diagram:
An example conversation flow:
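The state-based progression described above can be illustrated with a small, library-free sketch. The state names (`GREETING`, `AWAITING_QUESTION`, `AWAITING_FEEDBACK`, `DONE`), the reply texts, and the `step` and `run_dialog` helpers are all hypothetical and chosen only for illustration; the bot's actual states are the ones in its state diagram. python-telegram-bot offers `ConversationHandler` for this kind of flow, but this sketch does not depend on it.

```python
from enum import Enum, auto
from typing import List, Tuple


class State(Enum):
    # Hypothetical states; the real bot's states may differ.
    GREETING = auto()
    AWAITING_QUESTION = auto()
    AWAITING_FEEDBACK = auto()
    DONE = auto()


def step(state: State, user_text: str) -> Tuple[State, str]:
    """Return the next state and the bot's reply for one user message."""
    if state is State.GREETING:
        return State.AWAITING_QUESTION, "Hello! What would you like to ask?"
    if state is State.AWAITING_QUESTION:
        # A real bot would look up an answer here.
        return State.AWAITING_FEEDBACK, f"(answer to: {user_text}) Was this helpful?"
    if state is State.AWAITING_FEEDBACK:
        return State.DONE, "Thanks for your feedback!"
    return State.DONE, ""


def run_dialog(messages: List[str]) -> Tuple[State, List[str]]:
    """Feed a list of user messages through the state machine."""
    state = State.GREETING
    replies = []
    for text in messages:
        state, reply = step(state, text)
        replies.append(reply)
    return state, replies


final_state, replies = run_dialog(["/start", "How do I cancel?", "yes"])
```

Keeping the transition logic in one pure function makes each state change easy to test without a running Telegram client.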
The bot normalizes the text sent by users with VNLP: Turkish NLP Tools, developed by VNGRS.
Normalization consists of:
- Removing punctuation
- Removing accent marks
- Decapitalization
- DeASCIIfication
- Typo corrections
# Assumed setup (not shown in the original): a VNLP Normalizer instance.
from vnlp import Normalizer
normalizer = Normalizer()

def normalize_message(text: str, decapitalize=True, punctuation=True,
                      accent_marks=True, deasciify=False, correct_typos=True) -> str:
    """By default: remove punctuation and accent marks,
    decapitalize the text, and correct typos."""
    if decapitalize:
        text = text.lower()
    if punctuation:
        text = normalizer.remove_punctuations(text)
    if accent_marks:
        text = normalizer.remove_accent_marks(text)
    if deasciify:
        # deasciify and correct_typos operate on token lists, not raw strings
        text = " ".join(normalizer.deasciify(text.split()))
    if correct_typos:
        text = " ".join(normalizer.correct_typos(text.split()))
    return text
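To make the first three steps concrete without installing VNLP, here is a rough standard-library approximation. It is not VNLP's implementation: `simple_normalize`, `ACCENT_MAP`, and `TURKISH_LOWER` are hypothetical names, the accent map covers only circumflexed vowels, and only ASCII punctuation is removed. It also handles one detail worth knowing: Python's built-in `str.lower()` maps `I` to `i`, whereas Turkish lowercases `I` to `ı`, so the sketch translates the dotted and dotless capitals before lowercasing.

```python
import string

# Circumflexed vowels mapped to plain vowels (an assumed, partial list).
ACCENT_MAP = str.maketrans("âîû", "aiu")
# Turkish capitals whose lowercase form differs from what str.lower() yields.
TURKISH_LOWER = str.maketrans("Iİ", "ıi")
# Deletes ASCII punctuation characters only.
PUNCTUATION = str.maketrans("", "", string.punctuation)


def simple_normalize(text: str) -> str:
    """Lowercase (Turkish-aware), strip punctuation and accents, squeeze spaces."""
    text = text.translate(TURKISH_LOWER).lower()
    text = text.translate(PUNCTUATION)
    text = text.translate(ACCENT_MAP)
    return " ".join(text.split())


normalized = simple_normalize("Rezervasyon İPTAL edilir mi?!")
```

Deasciification and typo correction need a language model or dictionary, so they are left to VNLP and are not approximated here.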
The bot compares each incoming question with the predetermined questions in the database and ranks them by similarity. Similarity is computed by encoding the sentences into embedding vectors and taking their inner product. The multilingual version of Google's Universal Sentence Encoder is used to obtain the sentence embeddings.
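The matching step can be sketched in plain Python as follows. The three-dimensional vectors are made-up toy values standing in for real sentence embeddings, and `inner_product`, `best_match`, and the sample questions are hypothetical names used only for illustration. Universal Sentence Encoder embeddings are approximately unit-length, so the inner product behaves much like cosine similarity.

```python
from typing import Dict, List, Tuple


def inner_product(a: List[float], b: List[float]) -> float:
    """Sum of element-wise products of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def best_match(query_vec: List[float],
               known: Dict[str, List[float]]) -> Tuple[str, float]:
    """Return the known question with the highest inner product, and its score."""
    scored = [(question, inner_product(query_vec, vec))
              for question, vec in known.items()]
    return max(scored, key=lambda pair: pair[1])


# Toy vectors, NOT real encoder output.
known_questions = {
    "How can I cancel my reservation?": [0.9, 0.1, 0.4],
    "What payment methods do you accept?": [0.1, 0.95, 0.3],
}
query_embedding = [0.85, 0.2, 0.45]  # pretend embedding of the user's question

match, score = best_match(query_embedding, known_questions)
```

In practice a minimum score could be required before an answer is returned, but whether the bot applies such a cutoff is not stated here.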
Example embeddings:
Semantic Textual Similarity by inner products:
The bot performs sentiment analysis with the Bert-base Turkish Sentiment Model, which is based on BERTurk, a BERT model for the Turkish language.
Usage of the model:
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline
model = AutoModelForSequenceClassification.from_pretrained("savasy/bert-base-turkish-sentiment-cased")
tokenizer = AutoTokenizer.from_pretrained("savasy/bert-base-turkish-sentiment-cased")
sentiment = pipeline("sentiment-analysis", tokenizer=tokenizer, model=model)
Although the bot does not use the sentiment results when answering questions, they are useful data for evaluating customer satisfaction.
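One simple way to turn per-message sentiment into a satisfaction signal is to compute the share of positive messages. The sketch below assumes results shaped like the list of `{"label": ..., "score": ...}` dictionaries that a Transformers text-classification pipeline returns. The `positive_share` helper, the sample data, and the `"positive"` label string are assumptions for illustration; check the label names the model actually emits and pass them via `positive_label`.

```python
from typing import Dict, List, Union

Result = Dict[str, Union[str, float]]


def positive_share(results: List[Result],
                   positive_label: str = "positive",
                   min_score: float = 0.0) -> float:
    """Fraction of results labeled positive with at least `min_score` confidence."""
    if not results:
        return 0.0
    positives = sum(
        1 for r in results
        if r["label"] == positive_label and r["score"] >= min_score
    )
    return positives / len(results)


# Hand-written sample data, not real model output.
sample = [
    {"label": "positive", "score": 0.98},
    {"label": "negative", "score": 0.91},
    {"label": "positive", "score": 0.75},
    {"label": "positive", "score": 0.60},
]

share = positive_share(sample)
```

Raising `min_score` counts only confident predictions as positive, which trades coverage for reliability.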