
Telegram Chatbot, NLP, Semantic Similarity, Sentiment Analysis, Normalization


HarunErgen/tatilsepeti-turkish-chatbot


Tatilsepeti Turkish Chatbot

Tatilsepeti Chatbot is built on the Telegram Bot API using the python-telegram-bot library. Its main purpose is to answer users' questions and collect feedback from the dialogue. To do this, the bot needs to understand the meaning of each question as accurately as possible.
To understand the questions, the bot processes them with Natural Language Processing (NLP) methods.

Requirements

python -m pip install -r requirements.txt

Conversation with the Bot

At the start of the conversation, the bot greets the user with an introductory message and then waits for a question. The conversation proceeds through a set of states, shown in the state diagram below:

[Figure: conversation state diagram]

An example conversation flow:
[Figure: example conversation flow]
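Because the state diagram is only available as an image, the following is a minimal sketch of how a state-driven conversation like this can be modelled, not the bot's actual implementation: the state names, the event strings, and the transition table are all hypothetical. In python-telegram-bot the same idea is usually expressed with a `ConversationHandler`, whose `states` argument maps each state to the handlers active in it; this plain-Python version only makes the transitions explicit and testable.

```python
from enum import Enum, auto
from typing import Iterable, List


class State(Enum):
    """Hypothetical conversation states (the real ones are in the state diagram)."""
    GREETING = auto()
    AWAITING_QUESTION = auto()
    AWAITING_FEEDBACK = auto()
    ENDED = auto()


# Hypothetical transition table: (current state, event) -> next state
TRANSITIONS = {
    (State.GREETING, "start"): State.AWAITING_QUESTION,
    (State.AWAITING_QUESTION, "question"): State.AWAITING_FEEDBACK,
    (State.AWAITING_FEEDBACK, "feedback"): State.AWAITING_QUESTION,
    (State.AWAITING_QUESTION, "end"): State.ENDED,
    (State.AWAITING_FEEDBACK, "end"): State.ENDED,
}


def next_state(state: State, event: str) -> State:
    """Return the next state; events not valid in the current state are ignored."""
    return TRANSITIONS.get((state, event), state)


def run(events: Iterable[str]) -> List[State]:
    """Replay a sequence of events from the greeting and record every state visited."""
    state = State.GREETING
    history = [state]
    for event in events:
        state = next_state(state, event)
        history.append(state)
    return history


print([s.name for s in run(["start", "question", "feedback", "end"])])
```

Keeping the transitions in a single table makes it easy to compare the code against a diagram and to reject unexpected input without extra branching.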

Natural Language Processing

Normalization

The bot normalizes user messages with VNLP, the Turkish NLP toolkit developed by VNGRS.
Normalization consists of:

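  • Removing punctuation
  • Removing accent marks (e.g. â → a)
  • Lowercasing (decapitalization)
  • Deasciification (restoring Turkish characters such as ç, ğ, ı, ö, ş, ü that were typed as ASCII look-alikes; off by default)
  • Typo correction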
from vnlp import Normalizer

normalizer = Normalizer()  # create once and reuse for every message

def normalize_message(text: str, decapitalize=True, punctuation=True,
                      accent_marks=True, deasciify=False, correct_typos=True) -> str:
    '''By default: remove punctuation and accent marks,
    lowercase the text and correct typos.'''
    if decapitalize:
        text = normalizer.lower_case(text)  # Turkish-aware: "I" -> "ı", "İ" -> "i"
    if punctuation:
        text = normalizer.remove_punctuations(text)
    if accent_marks:
        text = normalizer.remove_accent_marks(text)
    if deasciify:  # deasciify and correct_typos work on token lists
        text = " ".join(normalizer.deasciify(text.split()))
    if correct_typos:
        text = " ".join(normalizer.correct_typos(text.split()))
    return text

[Figure: normalization example]
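To make three of the listed steps concrete (lowercasing, punctuation removal, accent-mark removal), here is a dependency-free sketch. It is an illustration under stated assumptions, not VNLP's implementation: the punctuation set is assumed to be Python's `string.punctuation`, and only the circumflexed vowels â, î, û are treated as accent marks. The helper names are hypothetical. It also shows why Turkish needs special casing: plain `str.lower()` maps "I" to "i", whereas Turkish maps it to dotless "ı". Deasciification and typo correction need a lexicon or model, so they are left to VNLP.

```python
import string

# Turkish has dotted (İ/i) and dotless (I/ı) i; plain str.lower() would map "I" to "i"
_TURKISH_I = str.maketrans({"I": "ı", "İ": "i"})
# Assumed punctuation set: ASCII punctuation only
_PUNCTUATION = str.maketrans("", "", string.punctuation)
# Assumed accent marks: circumflexed vowels as in "hâlâ" or "kâr"
_ACCENTS = str.maketrans({"â": "a", "î": "i", "û": "u"})


def turkish_lower(text: str) -> str:
    """Apply the Turkish I/İ rules first, then str.lower() for all other letters."""
    return text.translate(_TURKISH_I).lower()


def remove_punctuation(text: str) -> str:
    return text.translate(_PUNCTUATION)


def remove_accent_marks(text: str) -> str:
    return text.translate(_ACCENTS)


def normalize_sketch(text: str) -> str:
    """Lowercase first so the accent table only needs lowercase entries."""
    return remove_accent_marks(remove_punctuation(turkish_lower(text)))


print(normalize_sketch("Merhaba! Hâlâ İSTANBUL'da mısınız?"))  # merhaba hala istanbulda mısınız
```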

Semantic Similarity

The bot compares the user's question with the predefined questions in its database and ranks them by similarity. Similarity is computed by encoding each sentence into an embedding vector and taking the inner product of the two vectors; because the encoder's outputs are approximately unit-length, the inner product behaves like cosine similarity. The sentence embeddings come from the multilingual version of Google's Universal Sentence Encoder.

Example embeddings:
[Figure: example sentence embeddings]

Semantic textual similarity computed by inner products:
[Figure: pairwise similarity scores]
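The matching step can be sketched in plain Python. The sketch assumes the question embeddings have already been computed (for example with the multilingual Universal Sentence Encoder). The three-dimensional vectors, the sample questions, the helper names, and the optional `min_score` cut-off are made up for illustration and are not taken from the bot. For unit vectors u and v, the inner product Σ uᵢvᵢ equals the cosine of the angle between them.

```python
from typing import List, Optional, Sequence, Tuple

Vector = Sequence[float]


def inner_product(u: Vector, v: Vector) -> float:
    """Sum of element-wise products; equals cosine similarity for unit vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    return sum(a * b for a, b in zip(u, v))


def best_match(query: Vector, candidates: List[Vector], questions: List[str],
               min_score: Optional[float] = None) -> Tuple[Optional[str], float]:
    """Return the stored question whose embedding has the highest inner product
    with the query, plus that score. If the hypothetical min_score cut-off is
    given and not reached, return None instead of a question."""
    scores = [inner_product(query, c) for c in candidates]
    best = max(range(len(scores)), key=scores.__getitem__)
    if min_score is not None and scores[best] < min_score:
        return None, scores[best]
    return questions[best], scores[best]


# Toy data: unit vectors standing in for real sentence embeddings
questions = ["Rezervasyonumu nasıl iptal ederim?",
             "Hangi ödeme yöntemleri var?",
             "Otelde otopark var mı?"]
embeddings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
query = [0.8, 0.6, 0.0]  # also unit length: 0.64 + 0.36 = 1

print(best_match(query, embeddings, questions))                 # ('Rezervasyonumu nasıl iptal ederim?', 0.8)
print(best_match(query, embeddings, questions, min_score=0.9))  # (None, 0.8)
```

A cut-off like `min_score` is one way to let a bot fall back to a default reply instead of answering a question it did not really recognize.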

Sentiment Analysis

The bot performs sentiment analysis with the BERT-base Turkish Sentiment Model (savasy/bert-base-turkish-sentiment-cased), which is based on BERTurk.

Usage of the model:

from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

model = AutoModelForSequenceClassification.from_pretrained("savasy/bert-base-turkish-sentiment-cased")
tokenizer = AutoTokenizer.from_pretrained("savasy/bert-base-turkish-sentiment-cased")
sentiment = pipeline("sentiment-analysis", tokenizer=tokenizer, model=model)

[Figures: example positive and negative sentiment predictions]


Although the bot does not use sentiment when answering questions, the sentiment labels are useful data for evaluating customer satisfaction.
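One simple way to use the labels for evaluation is to aggregate them over many messages. The sketch below is an assumption about how such an evaluation could look, not part of the bot: `summarize_sentiment` and its metrics are hypothetical. It relies only on the standard shape of a Hugging Face pipeline result, a list of `{"label": ..., "score": ...}` dicts (as returned by calling `sentiment(messages)` on a list of texts). The exact label strings depend on the model configuration (see `model.config.id2label`), so the positive labels are passed in explicitly, and the sample outputs are made up.

```python
from typing import Dict, Iterable, Set


def summarize_sentiment(results: Iterable[Dict], positive_labels: Set[str]) -> dict:
    """Aggregate sentiment-pipeline outputs into simple satisfaction metrics:
    message count, share of positive messages, and mean model confidence."""
    results = list(results)
    if not results:
        return {"n": 0, "positive_ratio": 0.0, "mean_confidence": 0.0}
    n_positive = sum(1 for r in results if r["label"] in positive_labels)
    return {
        "n": len(results),
        "positive_ratio": n_positive / len(results),
        "mean_confidence": sum(r["score"] for r in results) / len(results),
    }


# Hypothetical outputs; in practice these would come from sentiment(messages)
outputs = [
    {"label": "positive", "score": 0.98},
    {"label": "negative", "score": 0.91},
    {"label": "positive", "score": 0.75},
    {"label": "positive", "score": 0.60},
]
print(summarize_sentiment(outputs, positive_labels={"positive"}))
```

Tracking `positive_ratio` over time (per day or per conversation) gives a rough satisfaction trend without affecting how the bot answers.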
