Tatilsepeti Turkish Chatbot

Tatilsepeti Chatbot is built on the Telegram Bot API using python-telegram-bot. Its main purpose is to answer users' questions and collect feedback from the dialog. To do this, the bot needs to understand the semantics of each question as well as possible.
To understand the questions, the bot processes them with Natural Language Processing methods.

Conversation with the Bot
Natural Language Processing

Requirements

python -m pip install -r requirements.txt

Conversation with the Bot

At the start of the conversation, the bot greets you with an introductory sentence. It then waits for your input and answers your questions. The conversation proceeds through a set of states, shown in the following state diagram:

An example conversation flow:
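The state diagram itself is not reproduced here. As a hedged sketch, the pure-Python transition table below shows one way to model such a flow. The state names (`GREETING`, `AWAITING_QUESTION`, `AWAITING_FEEDBACK`, `ENDED`) and event names are hypothetical, loosely based on the greeting, question-answering and feedback steps described above; the bot's real states may differ. In python-telegram-bot, the corresponding mechanism is `ConversationHandler`, whose callbacks return the next state.

```python
from enum import Enum, auto
from typing import Iterable


class State(Enum):
    """Hypothetical conversation states; the real bot's states may differ."""
    GREETING = auto()
    AWAITING_QUESTION = auto()
    AWAITING_FEEDBACK = auto()
    ENDED = auto()


# Hypothetical transition table: (current state, event) -> next state.
TRANSITIONS = {
    (State.GREETING, "start"): State.AWAITING_QUESTION,
    (State.AWAITING_QUESTION, "question"): State.AWAITING_QUESTION,  # answer, keep listening
    (State.AWAITING_QUESTION, "done"): State.AWAITING_FEEDBACK,
    (State.AWAITING_FEEDBACK, "feedback"): State.ENDED,
}


def next_state(state: State, event: str) -> State:
    """Return the next state; events the current state does not handle leave it unchanged."""
    return TRANSITIONS.get((state, event), state)


def run(events: Iterable[str]) -> State:
    """Replay a sequence of events starting from the greeting state."""
    state = State.GREETING
    for event in events:
        state = next_state(state, event)
    return state
```

For example, `run(["start", "question", "done", "feedback"])` ends in `State.ENDED`. Keeping the transitions in a table rather than in nested `if` statements makes the flow easy to compare against a diagram.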

Natural Language Processing

Normalization

The bot normalizes user messages with VNLP (Turkish NLP Tools, developed by VNGRS).
Normalization consists of:

Removing punctuation
Removing accent marks
Decapitalization
Deasciification (disabled by default)
Typo correction

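from vnlp import Normalizer

normalizer = Normalizer()  # VNLP normalizer used by normalize_message

def normalize_message(text: str, decapitalize: bool = True, punctuation: bool = True,
                      accent_marks: bool = True, deasciify: bool = False,
                      correct_typos: bool = True) -> str:
    '''By default: decapitalize the text, remove punctuation and accent marks,
    and correct typos. Deasciification is disabled by default.'''
    if decapitalize:
        text = text.lower()  # not Turkish-aware: "I" becomes "i", not "ı"
    if punctuation:
        text = normalizer.remove_punctuations(text)
    if accent_marks:
        text = normalizer.remove_accent_marks(text)
    # deasciify() and correct_typos() work on token lists, so split and rejoin.
    if deasciify:
        text = " ".join(normalizer.deasciify(text.split()))
    if correct_typos:
        text = " ".join(normalizer.correct_typos(text.split()))
    return text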
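VNLP's `Normalizer` does the actual work in the bot. For readers without VNLP installed, the stdlib-only sketch below approximates the first three steps; it is not VNLP's implementation, and the accent mapping (`â`, `î`, `û`) is an assumption. It also illustrates why plain `str.lower()` is risky for Turkish: Python maps `I` to `i` instead of dotless `ı`, and `İ` to `i` followed by a combining dot (U+0307).

```python
import string

# Turkish-specific case pairs that str.lower() gets wrong.
_TURKISH_LOWER = str.maketrans({"I": "ı", "İ": "i"})

# Assumed accent-mark mapping: circumflex vowels common in Turkish loanwords.
_ACCENTS = str.maketrans({"â": "a", "î": "i", "û": "u", "Â": "A", "Î": "I", "Û": "U"})

# Deletes ASCII punctuation only; typographic quotes and dashes are kept.
_PUNCTUATION = str.maketrans("", "", string.punctuation)


def turkish_lower(text: str) -> str:
    """Lowercase text using Turkish dotted/dotless i rules."""
    return text.translate(_TURKISH_LOWER).lower()


def remove_punctuation(text: str) -> str:
    """Remove ASCII punctuation characters."""
    return text.translate(_PUNCTUATION)


def remove_accent_marks(text: str) -> str:
    """Replace circumflexed vowels with their plain forms."""
    return text.translate(_ACCENTS)


def basic_normalize(text: str) -> str:
    """Lowercase, strip punctuation and accents, and collapse whitespace."""
    text = turkish_lower(text)
    text = remove_punctuation(text)
    text = remove_accent_marks(text)
    return " ".join(text.split())
```

For example, `basic_normalize("Merhaba, Kâğıt NASIL?")` gives `"merhaba kağıt nasıl"`. Note that removing ASCII punctuation also drops apostrophes, so a suffixed proper noun such as `İstanbul'a` becomes `istanbula`.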

Semantic Similarity

The bot compares the user's question with the predefined questions in the database and ranks them by similarity. Similarity is computed by encoding each sentence into an embedding vector and taking the inner product of the two vectors. Sentence embeddings are obtained from the multilingual version of Google's Universal Sentence Encoder.

Example embeddings:

Semantic Textual Similarity by inner products:
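The matching step can be sketched in plain Python. In the bot, the vectors would come from the Universal Sentence Encoder; here they are toy 2-D vectors, and the `threshold` for rejecting weak matches is a hypothetical parameter not described in the original. Universal Sentence Encoder embeddings are approximately normalized, so the inner product is close to cosine similarity already; the sketch normalizes explicitly so the same holds for arbitrary vectors.

```python
import math
from typing import Dict, List, Optional, Tuple

Vector = List[float]


def inner(u: Vector, v: Vector) -> float:
    """Inner (dot) product of two vectors of equal dimension."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    return sum(a * b for a, b in zip(u, v))


def normalize(v: Vector) -> Vector:
    """Scale v to unit length, so inner products become cosine similarities."""
    norm = math.sqrt(inner(v, v))
    return [x / norm for x in v] if norm else v


def best_match(query: Vector, questions: Dict[str, Vector],
               threshold: float = 0.5) -> Tuple[Optional[str], float]:
    """Return (stored question, score) with the highest inner product to the query.

    Returns (None, score) when even the best score is below the hypothetical
    threshold, i.e. no stored question is considered a good match.
    """
    q = normalize(query)
    best_question: Optional[str] = None
    best_score = -math.inf
    for question, embedding in questions.items():
        score = inner(q, normalize(embedding))
        if score > best_score:
            best_question, best_score = question, score
    if best_score < threshold:
        return None, best_score
    return best_question, best_score
```

With `questions = {"rezervasyon iptali": [1.0, 0.0], "odeme secenekleri": [0.0, 1.0]}`, the query `[0.9, 0.1]` matches `"rezervasyon iptali"` with a score of about 0.994.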

Sentiment Analysis

The bot performs sentiment analysis with the BERT-base Turkish Sentiment Model, which is built on BERTurk, a BERT model for the Turkish language.

Usage of the model:

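from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

# Download the fine-tuned Turkish sentiment model and its tokenizer from the Hugging Face Hub.
model = AutoModelForSequenceClassification.from_pretrained("savasy/bert-base-turkish-sentiment-cased")
tokenizer = AutoTokenizer.from_pretrained("savasy/bert-base-turkish-sentiment-cased")
# Wrap both in a pipeline that returns a {"label": ..., "score": ...} dict per input text.
sentiment = pipeline("sentiment-analysis", tokenizer=tokenizer, model=model)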

Although the bot does not use the sentiment labels when answering questions, they are valuable data for evaluating customer satisfaction.
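Because the labels are collected to evaluate customer satisfaction, a natural next step is to aggregate them. The sketch below is an assumption about how that could be done, not part of the bot: it counts labels in a batch of pipeline results, each a dict with `label` and `score` keys (the shape the `sentiment` pipeline above returns). The original does not state this model's label names, so `positive_label` is a hypothetical parameter; check `model.config.id2label` for the actual values.

```python
from collections import Counter
from typing import Dict, List


def satisfaction_report(results: List[Dict], positive_label: str = "positive") -> Dict:
    """Summarize sentiment-pipeline outputs for a batch of user messages.

    Each item is expected to look like {"label": "...", "score": 0.97}.
    `positive_label` is an assumption; adjust it to the model's label names.
    """
    counts = Counter(r["label"] for r in results)
    total = sum(counts.values())
    positive_share = counts[positive_label] / total if total else 0.0
    return {"total": total, "counts": dict(counts), "positive_share": positive_share}
```

For example, `satisfaction_report(sentiment(messages))` would summarize a list of collected user messages, since the pipeline returns one result dict per input string.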

HarunErgen/tatilsepeti-turkish-chatbot

Folders and files

Latest commit

History

Repository files navigation

Tatilsepeti Turkish Chatbot

Requirements

Conversation with the Bot

Natural Language Processing

Normalization

Semantic Similarity

Sentimental Analysis

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages