Processors

Xatkit embeds processors: additional pieces of logic that can be plugged in to tune the intent recognition process. Pre-processors operate on the user input to optimize it before intent extraction (e.g. by adding a question mark at the end of a sentence that is obviously a question). Post-processors run after intent recognition, usually to set additional context parameters (e.g. to perform sentiment analysis, or to flag whether the input is a yes/no question).
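The division of work between pre- and post-processors can be pictured as a small pipeline. The plain-Java sketch below illustrates the idea under stated assumptions. `ProcessorPipelineSketch`, `addQuestionMark` and the `example.isQuestion` key are hypothetical and are not part of the Xatkit API. Pre-processors are modeled as `UnaryOperator<String>`, and post-processors as callbacks that write into a map standing in for the intent's NLP data.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.UnaryOperator;

// Hypothetical sketch: these types are NOT the Xatkit processor interfaces.
class ProcessorPipelineSketch {

    // Hypothetical pre-processor: appends "?" to inputs that start with a question word.
    static String addQuestionMark(String input) {
        String trimmed = input.trim();
        String lower = trimmed.toLowerCase();
        boolean looksLikeQuestion = lower.startsWith("what ") || lower.startsWith("where ")
                || lower.startsWith("how ") || lower.startsWith("why ");
        return looksLikeQuestion && !trimmed.endsWith("?") ? trimmed + "?" : trimmed;
    }

    // Runs every pre-processor in order, then (after intent recognition would have run)
    // lets every post-processor attach data to nlpData. Returns the pre-processed input.
    static String process(String input,
                          List<UnaryOperator<String>> preProcessors,
                          List<BiConsumer<String, Map<String, Object>>> postProcessors,
                          Map<String, Object> nlpData) {
        String current = input;
        for (UnaryOperator<String> pre : preProcessors) {
            current = pre.apply(current);
        }
        // ... intent recognition would be performed on `current` at this point ...
        for (BiConsumer<String, Map<String, Object>> post : postProcessors) {
            post.accept(current, nlpData);
        }
        return current;
    }

    public static void main(String[] args) {
        Map<String, Object> nlpData = new HashMap<>();
        String processed = process("where is the station",
                List.of(ProcessorPipelineSketch::addQuestionMark),
                List.of((text, data) -> data.put("example.isQuestion", text.endsWith("?"))),
                nlpData);
        System.out.println(processed + " " + nlpData);
    }
}
```

Ordering matters in this model. Each pre-processor sees the output of the previous one, and post-processors see the final pre-processed text.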

Add the Xatkit Processors dependency

Xatkit processors are bundled in a dedicated artifact that you need to add to your bot's pom.xml:

<dependency>
    <groupId>com.xatkit</groupId>
    <artifactId>xatkit-processors</artifactId>
    <version>0.0.1-SNAPSHOT</version>
</dependency>

Enable Xatkit Processors

Unless explicitly stated in the documentation, Xatkit processors can be used with any intent recognition engine. The bot properties file accepts the following keys to enable pre/post processors:

xatkit.recognition.preprocessors  = PreProcessor1, PreProcessor2
xatkit.recognition.postprocessors = PostProcessor1, PostProcessor2

The value of each property is a comma-separated list of processor names (see the table below for the list of processors embedded in Xatkit and their names).
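As an illustration of the list format, the sketch below reads such a comma-separated value with `java.util.Properties` and trims the spaces around each name. It is an assumption-labeled example: `ProcessorListSketch` and `processorNames` are hypothetical, and Xatkit's own configuration loading may work differently.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper: shows how a comma-separated processor list can be parsed.
// Xatkit's actual property handling may differ.
class ProcessorListSketch {

    static List<String> processorNames(Properties properties, String key) {
        List<String> names = new ArrayList<>();
        String value = properties.getProperty(key);
        if (value == null) {
            return names; // key absent: no processor enabled for this stage
        }
        for (String part : value.split(",")) {
            String name = part.trim(); // tolerate spaces around the commas
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        Properties properties = new Properties();
        properties.load(new StringReader(
                "xatkit.recognition.preprocessors  = PreProcessor1, PreProcessor2\n"
                + "xatkit.recognition.postprocessors = PostProcessor1, PostProcessor2\n"));
        System.out.println(processorNames(properties, "xatkit.recognition.preprocessors"));
        System.out.println(processorNames(properties, "xatkit.recognition.postprocessors"));
    }
}
```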

📚 You can also enable processors programmatically by setting the corresponding keys in your bot configuration:

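import static com.xatkit.core.recognition.IntentRecognitionProviderFactoryConfiguration.*;
import org.apache.commons.configuration2.BaseConfiguration;
import org.apache.commons.configuration2.Configuration;

[...]
Configuration configuration = new BaseConfiguration();
// Bot-specific configuration (NLP engine, database, etc.)
// Pre-processors are set with the pre-processor key, post-processors with the post-processor key
configuration.addProperty(RECOGNITION_PREPROCESSORS_KEY, "PreProcessor1, PreProcessor2");
configuration.addProperty(RECOGNITION_POSTPROCESSORS_KEY, "PostProcessor1, PostProcessor2");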

Access Processor Data

Data extracted from processors is attached to the current intent and stored in the state context. context.getIntent().getNlpData() is a map containing all the information extracted by Xatkit processors for the current intent.

The code below shows how to access the property nlp.stanford.isYesNo (extracted by the IsEnglishYesNoQuestion post-processor) and use it to tune the bot behavior:

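state("myState")
    .body(context -> {
        // The key is absent when the IsEnglishYesNoQuestion post-processor is not enabled,
        // so compare with Boolean.TRUE instead of unboxing a possibly-null value
        if (Boolean.TRUE.equals(context.getIntent().getNlpData().get("nlp.stanford.isYesNo"))) {
            // Post a reply matching a yes/no question
        } else {
            // Post a generic reply
        }
    })
    .next()
    [...]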
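When a bot reads several processor outputs, a small typed accessor avoids repeating casts and null checks. The sketch below assumes that `getNlpData()` behaves like a `Map<String, Object>`. The `NlpDataSketch.get` helper is hypothetical, and a plain `HashMap` stands in for the real intent data.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: the map stands in for context.getIntent().getNlpData().
class NlpDataSketch {

    // Returns the value stored under `key` if it has the expected type, otherwise `defaultValue`
    // (e.g. when the corresponding processor is not enabled and the key is missing).
    static <T> T get(Map<String, Object> nlpData, String key, Class<T> type, T defaultValue) {
        Object value = nlpData.get(key);
        return type.isInstance(value) ? type.cast(value) : defaultValue;
    }

    public static void main(String[] args) {
        Map<String, Object> nlpData = new HashMap<>();
        nlpData.put("nlp.stanford.isYesNo", true);
        boolean isYesNo = get(nlpData, "nlp.stanford.isYesNo", Boolean.class, false);
        // nlp.stanford.sentiment is missing here, so the default is returned
        String sentiment = get(nlpData, "nlp.stanford.sentiment", String.class, "Neutral");
        System.out.println(isYesNo + " " + sentiment);
    }
}
```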

Pre-Processors

Name	Description	Requirements
`EmojiToText`	Replaces each emoji with its alias (i.e. the emoji name) or removes all emojis (see EmojiToTextPreProcessor usage).
`InternetSlang`	Translates slang terms into their standard-English form (e.g. from "idk who r u" to "I don't know who are you").	Optional: a `json` file containing a dictionary of slang terms (keys) and their respective translations (values); a default dictionary is embedded. See InternetSlangPreProcessor usage.
`SpacePunctuation`	Adds spaces around punctuation when needed (e.g. from "?" to " ?"). This processor is enabled by default when using the `NlpjsIntentRecognitionProvider`.
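To make the `SpacePunctuation` behavior concrete, here is a regex-based approximation. It is not the Xatkit implementation. The class name and the punctuation set `? ! . ,` are assumptions, and unlike a careful implementation this sketch would also split decimal numbers and URLs.

```java
// Hypothetical approximation of a "space punctuation" pre-processor (not Xatkit's code).
class SpacePunctuationSketch {

    static String spacePunctuation(String input) {
        // Put a space on both sides of each punctuation mark (assumed set: ? ! . ,) ...
        String spaced = input.replaceAll("([?!.,])", " $1 ");
        // ... then collapse runs of whitespace (including line breaks) and trim the ends.
        return spaced.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        System.out.println(spacePunctuation("how are you?"));
        System.out.println(spacePunctuation("hi,there!"));
    }
}
```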

Post-Processors

Name	Description	Requirements
`Emoji`	Stores the emoji information in dedicated objects for all emojis found in the intent. Some emojis have sentiment-related information (see EmojiPostProcessor usage).	A file with the sentiment information (available in the resources folder).
`EnglishSentiment`	Sets the parameter `nlp.stanford.sentiment` to a value in `["Very Negative", "Negative", "Neutral", "Positive", "Very Positive"]` corresponding to the sentiment extracted from the user input.	See Stanford CoreNLP configuration
`IsEnglishYesNoQuestion`	Sets the parameter `nlp.stanford.isYesNo` to `true` if the user input is a yes/no question, and `false` otherwise.	See Stanford CoreNLP configuration
`LanguageDetection`	Detects the language of the last user input and stores it in `nlp.opennlp.langdetect.lastInput`. Also stores the language detected for the entire conversation in `context.getSession().get("OPENNLP_LAST_N_INPUTS_SCORE_PARAMETER_KEY")`.	A pre-trained language detection model. See LanguageDetectionPostProcessor usage.
`RemoveEnglishStopWords`	Removes English stop words from the values of recognized intent parameters that have been extracted from `any` entities. This processor helps normalize Dialogflow values when using `any` entities.
`TrimParameterValues`	Removes leading/trailing spaces in extracted parameter values (e.g. from "Barcelona " to "Barcelona"). This processor is enabled by default when using the `NlpjsIntentRecognitionProvider`.
`TrimPunctuation`	Removes punctuation in extracted parameter values (e.g. from "Barcelona!" to "Barcelona"). This processor is enabled by default when using the `NlpjsIntentRecognitionProvider`.
`Toxicity`	Sets `nlp.perspectiveapi`, `nlp.detoxify`, or both (see ToxicityPostProcessor usage) to an object containing scores for different toxicity labels.	PerspectiveAPI requires an API key (see the Perspective API documentation to request one). Detoxify requires a Python server that exposes the model to requests (see ToxicityPostProcessor usage).
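The normalization done by `TrimParameterValues` and `TrimPunctuation` can be sketched on a plain map of parameter values. This is a hypothetical illustration, not Xatkit's code. `TrimParametersSketch` is an invented name, and it assumes that only leading and trailing punctuation should be removed, so inner punctuation such as "St. Louis" is kept.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of parameter-value normalization; parameters are modeled as a plain map.
class TrimParametersSketch {

    static String trimValue(String value) {
        // 1. Strip leading/trailing whitespace (TrimParameterValues-like step).
        String trimmed = value.trim();
        // 2. Strip leading/trailing ASCII punctuation (\p{Punct}), keeping inner punctuation.
        trimmed = trimmed.replaceAll("^\\p{Punct}+|\\p{Punct}+$", "");
        // 3. Trim again in case the punctuation was separated from the word by spaces.
        return trimmed.trim();
    }

    static Map<String, String> trimAll(Map<String, String> parameters) {
        Map<String, String> result = new LinkedHashMap<>();
        parameters.forEach((name, value) -> result.put(name, trimValue(value)));
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> parameters = new LinkedHashMap<>();
        parameters.put("city", "Barcelona !");
        parameters.put("destination", "St. Louis ");
        System.out.println(trimAll(parameters));
    }
}
```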

Stanford CoreNLP Configuration

Stanford CoreNLP is not embedded by default in Xatkit. Add the following dependencies to your bot's pom.xml if you want to use a Stanford CoreNLP processor:

<dependency>
    <groupId>edu.stanford.nlp</groupId>
    <artifactId>stanford-corenlp</artifactId>
    <version>3.9.2</version>
    <exclusions>
        <exclusion>
            <groupId>com.google.protobuf</groupId>
            <artifactId>protobuf-java</artifactId>
        </exclusion>
    </exclusions>
</dependency>

<dependency>
    <groupId>edu.stanford.nlp</groupId>
    <artifactId>stanford-corenlp</artifactId>
    <version>3.9.2</version>
    <classifier>models</classifier>
    <exclusions>
        <exclusion>
            <groupId>com.google.protobuf</groupId>
            <artifactId>protobuf-java</artifactId>
        </exclusion>
    </exclusions>
</dependency>

📚 You can use models for a specific language by adapting the classifier. For example, <classifier>models-chinese</classifier> imports the Chinese models.

ToxicityPostProcessor usage

See the PerspectiveAPI and Detoxify documentation for more information about the underlying language models.

If you want to use Detoxify, set the following properties in your bot configuration, replacing the placeholder values:

botConfiguration.setProperty(USE_DETOXIFY, true);
botConfiguration.setProperty(DetoxifyConfiguration.DETOXIFY_SERVER_URL, "YOUR SERVER URL");

📚 You'll need to wrap your Detoxify model in a REST API. See our prototype implementation for more information.

If you want to use PerspectiveAPI, set the following properties in your bot configuration, replacing the placeholder values:

botConfiguration.setProperty(RECOGNITION_POSTPROCESSORS_KEY,"ToxicityPostProcessor");
botConfiguration.setProperty(USE_PERSPECTIVE_API, true);
botConfiguration.setProperty(PerspectiveApiConfiguration.API_KEY, "YOUR PERSPECTIVEAPI KEY");
botConfiguration.setProperty(PerspectiveApiConfiguration.LANGUAGE, "YOUR LANGUAGE (en/es)");

PerspectiveAPI has other optional parameters that can be added to the bot configuration: doNotStore, clientToken and sessionId. See the official Perspective API documentation for a description of these attributes and methods.

📚 See the Perspective API documentation to request an API key.

You can access the toxicity scores as follows:

DetoxifyScore score = (DetoxifyScore) context.getIntent().getNlpData().get("nlp.detoxify");
Double toxicity = score.getToxicity();

PerspectiveAPI scores are accessed the same way, using the `nlp.perspectiveapi` key. See the PerspectiveAPI and Detoxify code and the toxicity example bot for more information.

LanguageDetectionPostProcessor usage

This processor requires a language detection model, which you can configure with the following property:

botConfiguration.setProperty(LanguageDetectionPostProcessor.OPENNLP_MODEL_PATH_PARAMETER_KEY,"/path/to/model");

📚 You can download the pre-trained language detection model provided by OpenNLP from the OpenNLP website.

This processor stores a list of possible languages for the received user input. By default, this list contains the 3 most probable languages. You can customize this number with the following property:

/*
 * Number of probable languages returned by the processor (default: 3)
 */
botConfiguration.setProperty(LanguageDetectionPostProcessor.MAX_NUM_LANGUAGES_IN_SCORE, 4);

💡 Detected languages are represented as ISO 639-3 strings.
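Keeping only the N most probable languages amounts to sorting by probability and truncating. In the sketch below, a hypothetical `language -> probability` map stands in for the detector output, which in OpenNLP comes as language objects carrying a confidence value. `TopLanguagesSketch` is invented for illustration, and the codes follow the ISO 639-3 convention mentioned above.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch: selects the `maxLanguages` most probable languages (cf. MAX_NUM_LANGUAGES_IN_SCORE).
class TopLanguagesSketch {

    static List<String> topLanguages(Map<String, Double> probabilities, int maxLanguages) {
        return probabilities.entrySet().stream()
                // highest probability first
                .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
                .limit(maxLanguages)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Double> probabilities = Map.of("eng", 0.6, "spa", 0.25, "cat", 0.1, "fra", 0.05);
        System.out.println(topLanguages(probabilities, 3));
    }
}
```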

This processor also stores a global, more accurate prediction in context.getSession().get("OPENNLP_LAST_N_INPUTS_SCORE_PARAMETER_KEY"). This prediction is computed over the last n user inputs. By default the processor considers the last 10 inputs; this value can be customized with the following property:

/*
 * Number of user inputs to consider for the global prediction
 */
botConfiguration.setProperty(LanguageDetectionPostProcessor.MAX_NUM_USER_MESSAGES, 10);
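The idea of a conversation-level prediction over the last n inputs can be sketched with a sliding window. This is an assumption-laden illustration: `ConversationLanguageSketch` is hypothetical, and it aggregates by summing per-input probabilities. That ranks languages the same way averaging would, but the aggregation actually used by `LanguageDetectionPostProcessor` may differ.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a prediction over the last N user inputs (cf. MAX_NUM_USER_MESSAGES).
class ConversationLanguageSketch {

    private final int maxMessages;
    private final Deque<Map<String, Double>> window = new ArrayDeque<>();

    ConversationLanguageSketch(int maxMessages) {
        this.maxMessages = maxMessages;
    }

    // Records the per-language scores of a new input, dropping the oldest input when the window is full.
    void addInput(Map<String, Double> scores) {
        window.addLast(scores);
        if (window.size() > maxMessages) {
            window.removeFirst();
        }
    }

    // Returns the language with the highest summed probability over the window, or null if empty.
    // Summing and averaging give the same ranking because every language is divided by the same count.
    String mostLikelyLanguage() {
        Map<String, Double> totals = new HashMap<>();
        for (Map<String, Double> scores : window) {
            scores.forEach((lang, p) -> totals.merge(lang, p, Double::sum));
        }
        String best = null;
        double bestTotal = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> e : totals.entrySet()) {
            if (e.getValue() > bestTotal) {
                best = e.getKey();
                bestTotal = e.getValue();
            }
        }
        return best;
    }
}
```

A window keeps the prediction responsive: once a user switches language, older inputs eventually fall out of the window.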

EmojiToTextPreProcessor usage

When this processor is enabled, all emojis are removed from the user input by default. To replace them with their alias instead, set the following configuration parameter:

botConfiguration.setProperty(EmojiToTextPreProcessor.REMOVE_EMOJIS, false);

To enable emoji removal explicitly, set this parameter to true.
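The two modes controlled by `REMOVE_EMOJIS` can be illustrated with a tiny alias table. This is a hypothetical sketch. `EmojiToTextSketch` and its two-entry table are invented, and the real processor relies on a complete emoji database whose alias format may differ from the plain names used here.

```java
import java.util.Map;

// Hypothetical sketch of the remove/replace modes; only two emojis are covered.
class EmojiToTextSketch {

    private static final Map<String, String> ALIASES = Map.of(
            "\uD83D\uDE00", "grinning", // grinning face
            "\uD83D\uDE22", "cry");     // crying face

    // removeEmojis mirrors the default behavior (true): emojis are dropped; otherwise they become aliases.
    static String convert(String input, boolean removeEmojis) {
        String result = input;
        for (Map.Entry<String, String> e : ALIASES.entrySet()) {
            // Surround the replacement with spaces so an alias never glues to a neighboring word.
            result = result.replace(e.getKey(), removeEmojis ? " " : " " + e.getValue() + " ");
        }
        return result.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        System.out.println(convert("I am happy \uD83D\uDE00", true));
        System.out.println(convert("I am happy\uD83D\uDE00", false));
    }
}
```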

EmojiPostProcessor usage

The information about an emoji (excluding sentiment-related information) is obtained from the emoji-java library.

The sentiment information is obtained from a file located in the resources folder, which contains a positive, a neutral and a negative score for each emoji. These scores are in the [0,1] interval, where 0 is the lowest possible value and 1 the highest, and the three scores always sum to 1. Each emoji object stores the three scores separately (instead of a single generic score) so that bot designers can finely tune how sentiment from emojis is handled by their bot.

The sentiment file also provides an attribute, frequencyInSentimentRanking, which is the number of occurrences of the emoji in the data used to compute the sentiment scores. It can be used, for instance, to ignore the sentiment values of emojis with a low frequencyInSentimentRanking (e.g. < 5), since those values may not be reliable enough.
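A bot can combine the three scores with the frequency filter described above. The sketch is hypothetical: `EmojiSentimentSketch` mirrors the attributes named in this section but is not the Xatkit class. It assumes that "positive minus negative" is a suitable net polarity; because the three scores sum to 1, that value lies in [-1, 1].

```java
// Hypothetical sketch: not the Xatkit emoji data class.
class EmojiSentimentSketch {

    final double positive;
    final double neutral;
    final double negative;
    final int frequencyInSentimentRanking;

    EmojiSentimentSketch(double positive, double neutral, double negative, int frequencyInSentimentRanking) {
        this.positive = positive;
        this.neutral = neutral;
        this.negative = negative;
        this.frequencyInSentimentRanking = frequencyInSentimentRanking;
    }

    // Net polarity in [-1, 1]; returns 0 (treated as neutral) when the emoji is too rare to be trusted.
    double netSentiment(int minFrequency) {
        if (frequencyInSentimentRanking < minFrequency) {
            return 0.0;
        }
        return positive - negative;
    }

    public static void main(String[] args) {
        EmojiSentimentSketch frequent = new EmojiSentimentSketch(0.7, 0.2, 0.1, 50);
        EmojiSentimentSketch rare = new EmojiSentimentSketch(0.9, 0.1, 0.0, 3);
        // 5 is the example threshold used in this section
        System.out.println(frequent.netSentiment(5) + " " + rare.netSentiment(5));
    }
}
```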

InternetSlangPreProcessor usage

As this is a pre-processor, it translates the slang terms found in the slang dictionary before intent recognition is performed.

By default, this pre-processor uses the dictionary embedded in xatkit-runtime, which has been obtained from the NoSlang online dictionary. To use a custom dictionary, set the following property of the bot configuration to the absolute path of your json file:

botConfiguration.setProperty(InternetSlangPreProcessor.SLANG_DICTIONARY_SOURCE, "<Absolute path to your file>");
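Dictionary-based translation can be sketched as a word-by-word lookup. This is an assumption-labeled illustration. `InternetSlangSketch` is hypothetical, and its three dictionary entries are an invented excerpt rather than the embedded dictionary. Tokens are matched case-insensitively, and tokens attached to punctuation (e.g. "u?") are not matched by this simple version.

```java
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical sketch: the real dictionary is loaded from a json file (slang term -> translation).
class InternetSlangSketch {

    static String translate(String input, Map<String, String> dictionary) {
        StringJoiner out = new StringJoiner(" ");
        for (String token : input.trim().split("\\s+")) {
            // Look the term up case-insensitively; unknown tokens are kept unchanged.
            out.add(dictionary.getOrDefault(token.toLowerCase(), token));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> dictionary = Map.of("idk", "I don't know", "r", "are", "u", "you");
        System.out.println(translate("idk who r u", dictionary));
    }
}
```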
