This page gives general recommendations for preparing the input data, examples of using the json2csv package, and examples of the CSV files it produces.
This is the input data the parser expects in order to work correctly.
The input folder can contain any number of *.json files, from one to as many as you need; the parser processes all of them. Each file can have any structure, as long as it is a list of objects and every object contains a field with the content (message) that needs to be filtered and qualified.
For example, ./json_files/123-abc.json:
```json
[
  {
    "user": "client",
    "created_at": "2022-09-08T08:30:43.944982+00:00",
    "type": "botrequest",
    "message": "Hello, my name is Viktor."
  },
  {
    "user": "operator",
    "created_at": "2022-09-08T11:04:12.682817+00:00",
    "type": "botstate",
    "message": "What would you be interested in?"
  },
  {
    "user": "user",
    "created_at": "2022-09-08T11:24:12.817682+00:00",
    "type": "botrequest",
    "message": "And what training programs do you have?"
  },
  // ...
]
```
Let's create a new JSON file named intents-file.json with the following key-value structure:
```json
{
  "greetings": [
    "hi", "hello", "hey"
  ],
  "questions": [
    "what", "why", "what for", "where", "when"
  ],
  "farewells": [
    "bye", "goodbye", "see you later"
  ]
}
```
In this JSON structure:
- each key is the name of an intent (as it appears in the output CSV file);
- each value is a list of words that qualify a message for that intent (that is, assign the message to it).
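To illustrate how such a word list might qualify a message, here is a hedged Go sketch. The matching rule is an assumption for illustration and may differ from what json2csv actually does: matching is case-insensitive on whole words or phrases, and when several intents match, the first one in alphabetical order wins. qualify and normalize are hypothetical helpers.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strings"
	"unicode"
)

// normalize lowercases s, turns everything except letters and digits into
// spaces and pads the result, so phrases can be matched on word boundaries.
func normalize(s string) string {
	mapped := strings.Map(func(r rune) rune {
		if unicode.IsLetter(r) || unicode.IsDigit(r) {
			return unicode.ToLower(r)
		}
		return ' '
	}, s)
	return " " + strings.Join(strings.Fields(mapped), " ") + " "
}

// qualify returns the first intent (in alphabetical order) whose word list
// matches the message, or "" if no intent matches.
func qualify(message string, intents map[string][]string) string {
	names := make([]string, 0, len(intents))
	for name := range intents {
		names = append(names, name)
	}
	sort.Strings(names) // map order is random; sort for deterministic results
	text := normalize(message)
	for _, name := range names {
		for _, word := range intents[name] {
			kw := normalize(word)
			if strings.TrimSpace(kw) == "" {
				continue // ignore empty entries
			}
			if strings.Contains(text, kw) {
				return name
			}
		}
	}
	return ""
}

func main() {
	// Same structure as intents-file.json.
	raw := `{
		"greetings": ["hi", "hello", "hey"],
		"questions": ["what", "why", "what for", "where", "when"],
		"farewells": ["bye", "goodbye", "see you later"]
	}`
	var intents map[string][]string
	if err := json.Unmarshal([]byte(raw), &intents); err != nil {
		panic(err)
	}
	for _, m := range []string{
		"Hello, my name is Viktor.",
		"What would you be interested in?",
		"And what training programs do you have?",
	} {
		fmt.Printf("%s -> %s\n", m, qualify(m, intents))
	}
}
```

Matching on word boundaries keeps short entries such as "hi" from firing inside longer words like "this". With the intents above, the three sample messages are assigned greetings, questions and questions, the same intents shown in the output table on this page.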
Next, create a new JSON file named filter-file.json with exactly this key-value structure:
```json
{
  "skip_prefixes": [
    "+", "-"
  ],
  "skip_suffixes": [
    "!", ")"
  ],
  "skip_words": [
    "hate", "death", "shut up"
  ]
}
```
⚠️ Please note: these keys must not be changed, because the json2csv package uses them internally to filter the input.
In this JSON structure:
- each key is the name of a list used to filter input strings:
  - skip_prefixes is a list of prefixes;
  - skip_suffixes is a list of suffixes;
  - skip_words is a list of words;
- each value is a list of words (or prefixes, suffixes) that must be skipped.
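A minimal Go sketch of how these three lists could be applied to a message follows. Only the JSON keys come from this page; the Filter type and shouldSkip method are hypothetical, and the rules are assumptions that may differ from the real json2csv behavior: prefixes and suffixes are checked against the trimmed message, and skip words are matched case-insensitively as whole words or phrases.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
	"unicode"
)

// Filter mirrors filter-file.json; the struct tags must match its keys.
type Filter struct {
	SkipPrefixes []string `json:"skip_prefixes"`
	SkipSuffixes []string `json:"skip_suffixes"`
	SkipWords    []string `json:"skip_words"`
}

// normalize lowercases s and pads it, replacing non-alphanumerics with
// spaces, so words and phrases can be matched on word boundaries.
func normalize(s string) string {
	mapped := strings.Map(func(r rune) rune {
		if unicode.IsLetter(r) || unicode.IsDigit(r) {
			return unicode.ToLower(r)
		}
		return ' '
	}, s)
	return " " + strings.Join(strings.Fields(mapped), " ") + " "
}

// shouldSkip reports whether a message starts with a skipped prefix, ends
// with a skipped suffix, or contains a skipped word or phrase.
func (f Filter) shouldSkip(message string) bool {
	msg := strings.TrimSpace(message)
	for _, p := range f.SkipPrefixes {
		if p != "" && strings.HasPrefix(msg, p) {
			return true
		}
	}
	for _, s := range f.SkipSuffixes {
		if s != "" && strings.HasSuffix(msg, s) {
			return true
		}
	}
	text := normalize(msg)
	for _, w := range f.SkipWords {
		kw := normalize(w)
		if strings.TrimSpace(kw) != "" && strings.Contains(text, kw) {
			return true
		}
	}
	return false
}

func main() {
	raw := `{
		"skip_prefixes": ["+", "-"],
		"skip_suffixes": ["!", ")"],
		"skip_words": ["hate", "death", "shut up"]
	}`
	var f Filter
	if err := json.Unmarshal([]byte(raw), &f); err != nil {
		panic(err)
	}
	for _, m := range []string{"Hello, my name is Viktor.", "+1 for that", "Thanks!", "Please shut up"} {
		fmt.Printf("%q skipped: %v\n", m, f.shouldSkip(m))
	}
}
```

Running it reports the first message as kept and the other three as skipped. The word-boundary check means "whatever" is not rejected just because it contains "hate".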
This is the output data you get after the parser runs.
After a successful run, the folder specified in the -output parameter will contain CSV files with the following content:
message | intent |
---|---|
Hello, my name is Viktor. | greetings |
What would you be interested in? | questions |
And what training programs do you have? | questions |
In this CSV structure:
- message is a column with the original content from the JSON file(s) (its name is taken from the -content-field option);
- intent is a column with the qualified intent for the original content.
Questions and problems do come up sometimes, it's true… but don't get upset! 🙂
This wiki is constantly growing, so send us your question and we'll try to answer it. If a question is interesting or asked often, we'll add it to the FAQ section (with a reference to the author's issue).
The best way to ask a question is to create a new issue or discussion in the GitHub repository.
Follow this checklist so that we can answer your question as quickly as possible:
- Search the issues section for similar questions first.
- If your question is about Go or another general topic, try searching on StackOverflow first.
- Please follow our issue template in full when creating a new issue.
- If English is not your native language, please use an online translator in advance (for example, DeepL or Google Translate).
- Be understanding toward the authors: this is an open-source, not-for-profit product, and their support is unpaid.
- Be nice to the other members of our community.