Processors
Xatkit embeds processors: additional pieces of logic that can be plugged in to tune the intent recognition process. Pre-processors operate on the user input to optimize it before intent extraction (e.g. by adding a question mark at the end of a sentence that is obviously a question). Post-processors operate after the intent recognition process, usually to set additional context parameters (e.g. perform some sentiment analysis, set whether the input is a yes/no question, etc.).
Xatkit processors are bundled in a dedicated project that you need to add to your pom.xml:
<dependency>
<groupId>com.xatkit</groupId>
<artifactId>xatkit-processors</artifactId>
<version>0.0.1-SNAPSHOT</version>
</dependency>
Unless explicitly stated in the documentation, Xatkit processors can be used with any intent recognition engine. The bot properties file accepts the following keys to enable pre/post processors:
xatkit.recognition.preprocessors = PreProcessor1, PreProcessor2
xatkit.recognition.postprocessors = PostProcessor1, PostProcessor2
The value of each property is a comma-separated list of processor names (see the table below for the list of processors embedded in Xatkit and their names).
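To make the expected value format concrete, here is a minimal sketch that reads the two keys from properties-file text and splits each value into processor names. It only uses the JDK; the class and method names (ProcessorNames, parse, fromProperties) are hypothetical, and this is not how Xatkit itself parses the configuration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper, for illustration only (not part of Xatkit).
class ProcessorNames {

    /** Splits a comma-separated property value into trimmed, non-empty names. */
    static List<String> parse(String propertyValue) {
        List<String> names = new ArrayList<>();
        if (propertyValue == null) {
            return names; // key not set: no processor configured
        }
        for (String token : propertyValue.split(",")) {
            String name = token.trim();
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names;
    }

    /** Loads properties-file text and parses the value of the given key. */
    static List<String> fromProperties(String propertiesText, String key) {
        Properties properties = new Properties();
        try {
            properties.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return parse(properties.getProperty(key));
    }

    public static void main(String[] args) {
        String text = "xatkit.recognition.preprocessors = PreProcessor1, PreProcessor2\n"
                + "xatkit.recognition.postprocessors = PostProcessor1, PostProcessor2\n";
        // prints [PreProcessor1, PreProcessor2]
        System.out.println(fromProperties(text, "xatkit.recognition.preprocessors"));
    }
}
```

Spaces around the commas are ignored in this sketch, so "A,B" and "A, B" yield the same list.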
📚 You can also directly use the processor class to configure your bot programmatically:
import static com.xatkit.core.recognition.IntentRecognitionProviderFactoryConfiguration.*;
[...]
Configuration configuration = new BaseConfiguration();
// Bot-specific configuration (NLP engine, database, etc.)
configuration.addProperty(RECOGNITION_PREPROCESSORS_KEY, "PreProcessor1, PreProcessor2");
configuration.addProperty(RECOGNITION_POSTPROCESSORS_KEY, "PostProcessor1, PostProcessor2");
Data extracted by processors is attached to the current intent and stored in the state context. context.getIntent().getNlpData() is a map containing all the information extracted by Xatkit processors for the current intent.
The code below shows how to access the property nlp.stanford.isYesNo (extracted by the IsEnglishYesNoQuestion post-processor) and use it to tune the bot behavior:
state("myState")
    .body(context -> {
        if((Boolean) context.getIntent().getNlpData().get("nlp.stanford.isYesNo")) {
            // Post a reply matching a yes/no question
        } else {
            // Post a generic reply
        }
    })
    .next()
    [...]
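Casting the map entry directly to Boolean and using it in an if condition unboxes the value, which throws a NullPointerException when the key is absent (for instance if the post-processor is not enabled). The sketch below shows a null-safe read, assuming the NLP data is exposed as a Map<String, Object>; the NlpDataAccess helper is hypothetical and not part of Xatkit.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, for illustration only (not part of Xatkit).
class NlpDataAccess {

    static final String IS_YES_NO_KEY = "nlp.stanford.isYesNo";

    /** Returns true only when the entry exists and holds Boolean.TRUE. */
    static boolean isYesNoQuestion(Map<String, Object> nlpData) {
        return nlpData != null && Boolean.TRUE.equals(nlpData.get(IS_YES_NO_KEY));
    }

    public static void main(String[] args) {
        // Stand-in for context.getIntent().getNlpData()
        Map<String, Object> nlpData = new HashMap<>();
        System.out.println(isYesNoQuestion(nlpData)); // prints false (key absent)
        nlpData.put(IS_YES_NO_KEY, true);
        System.out.println(isYesNoQuestion(nlpData)); // prints true
    }
}
```

In a state body, the condition would then read isYesNoQuestion(context.getIntent().getNlpData()), which falls back to the generic reply when the processor did not run.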
Pre-processors
Name | Description | Requirements |
---|---|---|
EmojiToText | Replaces each emoji with its alias (i.e. the emoji name) or removes all emojis. | |
InternetSlang | Translates slang terms into their standard English form (e.g. from "idk who r u" to "I don't know who are you"). | A JSON file containing a dictionary of slang terms (keys) and their respective translations (values). See InternetSlangPreProcessor usage. |
SpacePunctuation | Adds spaces around punctuation when needed (e.g. from "?" to " ?"). This processor is enabled by default when using the NlpjsIntentRecognitionProvider. | |
Post-processors
Name | Description | Requirements |
---|---|---|
Emoji | Stores the emoji information in dedicated objects for all emojis found in the intent. Some emojis have sentiment-related information. | A file with the sentiment information (available in the resources folder). |
EnglishSentiment | Sets the parameter nlp.stanford.sentiment to a value in ["Very Negative", "Negative", "Neutral", "Positive", "Very Positive"] corresponding to the sentiment extracted from the user input. | See Stanford CoreNLP configuration. |
IsEnglishYesNoQuestion | Sets the parameter nlp.stanford.isYesNo to true if the user input is a yes/no question, and false otherwise. | See Stanford CoreNLP configuration. |
LanguageDetection | Detects the language of the last user input and sets it in nlp.opennlp.langdetect.lastInput. Also stores the detected language for the entire conversation in context.getSession().get("OPENNLP_LAST_N_INPUTS_SCORE_PARAMETER_KEY"). | A pre-trained language detection model. See LanguageDetectionPostProcessor usage. |
RemoveEnglishStopWords | Removes English stop words from the recognized intent's parameter values that have been extracted from any entities. This processor helps normalize DialogFlow values when using any entities. | |
TrimParameterValues | Removes leading/trailing spaces in extracted parameter values (e.g. from "Barcelona " to "Barcelona"). This processor is enabled by default when using the NlpjsIntentRecognitionProvider. | |
TrimPunctuation | Removes punctuation in extracted parameter values (e.g. from "Barcelona!" to "Barcelona"). This processor is enabled by default when using the NlpjsIntentRecognitionProvider. | |
Toxicity | Sets nlp.perspectiveapi, nlp.detoxify or both (see ToxicityPostProcessor usage) to an object that contains scores for different toxicity labels. | For PerspectiveAPI you need an API key. For Detoxify you need to deploy a Python server to make requests to the model (see the example server). |
Stanford CoreNLP is not embedded by default in Xatkit. Add the following dependencies to your bot's pom.xml if you want to use a Stanford CoreNLP processor:
<dependency>
<groupId>edu.stanford.nlp</groupId>
<artifactId>stanford-corenlp</artifactId>
<version>3.9.2</version>
<exclusions>
<exclusion>
<groupId>com.google.protobuf</groupId>
<artifactId>protobuf-java</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>edu.stanford.nlp</groupId>
<artifactId>stanford-corenlp</artifactId>
<version>3.9.2</version>
<classifier>models</classifier>
<exclusions>
<exclusion>
<groupId>com.google.protobuf</groupId>
<artifactId>protobuf-java</artifactId>
</exclusion>
</exclusions>
</dependency>
📚 You can use models for a specific language by adapting the classifier. For example, <classifier>models-chinese</classifier> imports the Chinese models.
The Toxicity post-processor can use PerspectiveAPI, Detoxify, or both. See the PerspectiveAPI and Detoxify documentation for more information about the language models.
If you want to use Detoxify, add these properties to your bot with the proper parameters:
botConfiguration.setProperty(USE_DETOXIFY, true);
botConfiguration.setProperty(DetoxifyConfiguration.DETOXIFY_SERVER_URL, "YOUR SERVER URL");
📚 You'll need to wrap your Detoxify model in a REST API. See our prototype implementation for more information.
If you want to use PerspectiveAPI, add these properties to your bot with the proper parameters:
botConfiguration.setProperty(RECOGNITION_POSTPROCESSORS_KEY,"ToxicityPostProcessor");
botConfiguration.setProperty(USE_PERSPECTIVE_API, true);
botConfiguration.setProperty(PerspectiveApiConfiguration.API_KEY, "YOUR PERSPECTIVEAPI KEY");
botConfiguration.setProperty(PerspectiveApiConfiguration.LANGUAGE, "YOUR LANGUAGE (en/es)");
PerspectiveAPI has other optional parameters that can be added to the bot: doNotStore, clientToken and sessionId. See the official description of the attributes and methods for more information.
📚 See the Perspective API documentation to request an API key.
You can access the toxicity scores this way:
DetoxifyScore score = (DetoxifyScore) context.getIntent().getNlpData().get("nlp.detoxify");
Double toxicity = score.getToxicity();
The PerspectiveAPI scores are accessed the same way. See the PerspectiveAPI and Detoxify code and the toxicity example bot for more information.
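Once a toxicity score is available, a bot typically compares it with a threshold to decide how to react. The sketch below is one possible policy, assuming scores are probabilities in [0, 1]; the ToxicityPolicy class, its Action values and both thresholds are hypothetical and should be tuned for your bot.

```java
// Hypothetical policy, for illustration only (not part of Xatkit).
class ToxicityPolicy {

    enum Action { ALLOW, WARN, BLOCK }

    // Assumed thresholds; pick values that fit your use case.
    static final double WARN_THRESHOLD = 0.5;
    static final double BLOCK_THRESHOLD = 0.8;

    /** Maps a toxicity score (expected in [0, 1]) to an action. */
    static Action decide(Double toxicity) {
        if (toxicity == null) {
            // No score available (e.g. the toxicity service did not answer)
            return Action.ALLOW;
        }
        if (toxicity >= BLOCK_THRESHOLD) {
            return Action.BLOCK;
        }
        if (toxicity >= WARN_THRESHOLD) {
            return Action.WARN;
        }
        return Action.ALLOW;
    }

    public static void main(String[] args) {
        System.out.println(decide(0.92)); // prints BLOCK
        System.out.println(decide(0.55)); // prints WARN
        System.out.println(decide(0.10)); // prints ALLOW
    }
}
```

The value passed to decide would be the result of score.getToxicity() from the snippet above. Treating a missing score as ALLOW is a design choice of this sketch; a stricter bot might prefer to warn instead.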
The LanguageDetection post-processor requires a language detection model. You can configure it with the following property:
botConfiguration.setProperty(LanguageDetectionPostProcessor.OPENNLP_MODEL_PATH_PARAMETER_KEY,"/path/to/model");
📚 OpenNLP provides a pre-trained language detection model that you can download.
This processor stores a list of possible languages for the received user input. By default this list contains the 3 most probable languages. You can customize this number with the following property:
/*
* Number of probable languages returned by the processor (default: 3)
*/
botConfiguration.setProperty(LanguageDetectionPostProcessor.MAX_NUM_LANGUAGES_IN_SCORE, 4);
💡 Detected languages are represented as ISO 639-3 strings.
This processor also stores a global, more accurate prediction in context.getSession().get("OPENNLP_LAST_N_INPUTS_SCORE_PARAMETER_KEY"). This prediction is computed on the last n user inputs. By default the processor considers the last 10 inputs, but this value can be customized with the following property:
/*
* Number of user inputs to consider for the global prediction
*/
botConfiguration.setProperty(LanguageDetectionPostProcessor.MAX_NUM_USER_MESSAGES, 10);
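To illustrate why a prediction over the last n inputs is more stable than a per-message one, here is a deliberately simplified sketch: it keeps the top language of each of the last n inputs in a sliding window and returns the most frequent one. This is a plain majority vote, not the score-based computation Xatkit performs, and the LastInputsLanguageVote class is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sliding-window majority vote (not Xatkit's algorithm).
class LastInputsLanguageVote {

    private final int maxInputs;
    private final Deque<String> lastLanguages = new ArrayDeque<>();

    LastInputsLanguageVote(int maxInputs) {
        if (maxInputs < 1) {
            throw new IllegalArgumentException("maxInputs must be >= 1");
        }
        this.maxInputs = maxInputs;
    }

    /** Records the top language (ISO 639-3 code) detected for one user input. */
    void record(String iso6393Code) {
        if (lastLanguages.size() == maxInputs) {
            lastLanguages.removeFirst(); // drop the oldest input
        }
        lastLanguages.addLast(iso6393Code);
    }

    /** Returns the most frequent language in the window, or null if it is empty. */
    String mostFrequent() {
        Map<String, Integer> counts = new HashMap<>();
        String best = null;
        int bestCount = 0;
        for (String code : lastLanguages) {
            int count = counts.merge(code, 1, Integer::sum);
            if (count > bestCount) {
                best = code;
                bestCount = count;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        LastInputsLanguageVote vote = new LastInputsLanguageVote(3);
        vote.record("eng");
        vote.record("spa"); // a single short message misdetected as Spanish
        vote.record("eng");
        System.out.println(vote.mostFrequent()); // prints eng
    }
}
```

A short or ambiguous message can flip the per-input detection, while the windowed result only changes once the user has actually switched language for several messages.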
If the EmojiToText pre-processor is used, all emojis are removed from the user input by default. To replace them with their aliases instead, set the following configuration parameter:
botConfiguration.setProperty(EmojiToTextPreProcessor.REMOVE_EMOJIS, false);
To explicitly enable emoji removal, set this parameter to true.
The Emoji post-processor obtains the information about an emoji (excluding sentiment-related information) from the emoji-java library.
The sentiment information is obtained from a file located in the resources folder, which contains a positive, a neutral and a negative score for each emoji. Each score lies in the [0,1] interval, where 0 is the lowest possible value and 1 the highest, and the three scores always sum to 1. Three separate scores are provided (instead of a single generic one) to allow bot designers to finely tune how sentiments from emojis are handled by their bot.
The sentiment file also provides the attribute frequencyInSentimentRanking, which is the number of occurrences of the emoji in the training process of the sentiment scores. It can be used, for instance, to ignore the sentiment values of emojis with a low frequencyInSentimentRanking (e.g. < 5), since those values might not be reliable enough.
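The frequency-based filtering described above can be sketched as follows. The EmojiSentiment holder only mirrors the fields named in this section and is hypothetical (it is not Xatkit's emoji class), and reducing the three scores to positive minus negative is just one possible way to get a single polarity value.

```java
// Hypothetical sketch, for illustration only (not part of Xatkit).
class EmojiSentimentFilter {

    /** Minimal holder for the scores described in this section. */
    static final class EmojiSentiment {
        final double positive;
        final double neutral;
        final double negative;
        final int frequencyInSentimentRanking;

        EmojiSentiment(double positive, double neutral, double negative,
                       int frequencyInSentimentRanking) {
            this.positive = positive;
            this.neutral = neutral;
            this.negative = negative;
            this.frequencyInSentimentRanking = frequencyInSentimentRanking;
        }
    }

    /**
     * Returns a polarity in [-1, 1] (positive minus negative score),
     * or 0 when the emoji was seen too rarely for its scores to be trusted.
     */
    static double polarity(EmojiSentiment sentiment, int minFrequency) {
        if (sentiment.frequencyInSentimentRanking < minFrequency) {
            return 0.0; // treat rarely seen emojis as neutral
        }
        return sentiment.positive - sentiment.negative;
    }

    public static void main(String[] args) {
        EmojiSentiment frequent = new EmojiSentiment(0.7, 0.2, 0.1, 120);
        EmojiSentiment rare = new EmojiSentiment(0.0, 0.0, 1.0, 3);
        System.out.println(polarity(frequent, 5)); // clearly positive
        System.out.println(polarity(rare, 5));     // prints 0.0 (ignored)
    }
}
```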
As InternetSlang is a pre-processor, it translates the slang terms found in the provided slang dictionary before intent recognition is performed.
By default this pre-processor uses the dictionary embedded in xatkit-runtime, which has been obtained from the Noslang online dictionary. To use a custom JSON file containing a dictionary, set the following bot configuration property to the absolute path of the JSON file:
botConfiguration.setProperty(InternetSlangPreProcessor.SLANG_DICTIONARY_SOURCE, "<Absolute path to your file>");
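The dictionary maps slang terms (keys) to their translations (values). As a rough illustration of what such a lookup does, the sketch below replaces each whitespace-separated token found in an in-memory dictionary. The SlangTranslator class and its tokenization are assumptions made for the example; the actual pre-processor may tokenize and match differently.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical token-by-token lookup, for illustration only (not Xatkit's code).
class SlangTranslator {

    /** Replaces every token that has an entry in the dictionary (case-insensitive). */
    static String translate(String input, Map<String, String> dictionary) {
        StringJoiner result = new StringJoiner(" ");
        for (String token : input.trim().split("\\s+")) {
            String replacement = dictionary.get(token.toLowerCase(Locale.ROOT));
            result.add(replacement != null ? replacement : token);
        }
        return result.toString();
    }

    public static void main(String[] args) {
        // Same shape as the JSON dictionary: slang term -> translation
        Map<String, String> dictionary = new HashMap<>();
        dictionary.put("idk", "I don't know");
        dictionary.put("r", "are");
        dictionary.put("u", "you");
        // prints I don't know who are you
        System.out.println(translate("idk who r u", dictionary));
    }
}
```

Tokens that are not in the dictionary ("who" here) are left unchanged, which is why only known slang terms need to be listed in a custom file.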