Vic Shóstak edited this page May 9, 2023 · 5 revisions

Welcome to the json2csv wiki!

This page gives general recommendations for preparing the input data, examples of using the json2csv package, and the variants of output CSV files.

Input data

This is the data the parser expects as input in order to work successfully.

Folder with *.json files

The folder can contain any number of files, from one upward. Each *.json file can have any structure, as long as it is a list of objects and contains a content field (message) that needs to be filtered and qualified.

For example, ./json_files/123-abc.json:

[
  {
    "user": "client",
    "created_at": "2022-09-08T08:30:43.944982+00:00",
    "type": "botrequest",
    "message": "Hello, my name is Viktor."
  },
  {
    "user": "operator",
    "created_at": "2022-09-08T11:04:12.682817+00:00",
    "type": "botstate",
    "message": "What would you be interested in?"
  },
  {
    "user": "user",
    "created_at": "2022-09-08T11:24:12.817682+00:00",
    "type": "botrequest",
    "message": "And what training programs do you have?"
  },

  // ...
]
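The requirements above can be checked before running the parser. The following Python sketch is a hypothetical pre-flight helper and not part of json2csv: the name validate_records, the default field name message, and the rule that every object must carry the content field are assumptions made for illustration.

```python
import json


def validate_records(text, content_field="message"):
    """Return the parsed records, or raise ValueError if the shape is wrong.

    Hypothetical pre-flight check; not part of the json2csv package.
    """
    data = json.loads(text)  # invalid JSON raises a ValueError subclass
    if not isinstance(data, list):
        raise ValueError("top-level JSON value must be a list of objects")
    for index, item in enumerate(data):
        if not isinstance(item, dict):
            raise ValueError(f"item {index} is not an object")
        if content_field not in item:
            raise ValueError(f"item {index} has no '{content_field}' field")
    return data


sample = """
[
  {"user": "client", "type": "botrequest", "message": "Hello, my name is Viktor."},
  {"user": "operator", "type": "botstate", "message": "What would you be interested in?"}
]
"""
records = validate_records(sample)
print(len(records))
```

Note that a strict JSON parser rejects comments and trailing commas, so the `// ...` placeholder in the example above has to be removed from real input files.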

Intents

Create a new JSON file named intents-file.json with the following key-value structure:

{
  "greetings": [
    "hi", "hello", "hey"
  ],
  "questions": [
    "what", "why", "what for", "where", "when"
  ],
  "farewells": [
    "bye", "goodbye", "see you later"
  ]
}

In this JSON structure:

  • each key is the name of an intent (as it appears in the output CSV file);
  • each value is a list of words that qualify a message for that intent (assign it to the intent).
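To illustrate how such a file could be used to qualify a message, here is a minimal Python sketch. The matching rule (case-insensitive, whole-word, first matching intent wins) and the function name classify are assumptions for illustration; this page does not describe the exact rules json2csv applies.

```python
import re


def classify(message, intents):
    """Return the first intent whose word or phrase occurs in the message.

    Assumed rule: case-insensitive, whole-word match. Hypothetical helper,
    not the json2csv implementation.
    """
    text = message.lower()
    for intent, phrases in intents.items():
        for phrase in phrases:
            pattern = r"\b" + re.escape(phrase.lower()) + r"\b"
            if re.search(pattern, text):
                return intent
    return None  # no intent matched


intents = {
    "greetings": ["hi", "hello", "hey"],
    "questions": ["what", "why", "what for", "where", "when"],
    "farewells": ["bye", "goodbye", "see you later"],
}

print(classify("Hello, my name is Viktor.", intents))
print(classify("And what training programs do you have?", intents))
```

Whole-word matching is used in this sketch so that a short word such as "hi" does not match inside a longer word such as "this".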

Filter

Create a new JSON file named filter-file.json with exactly the following key-value structure:

{
  "skip_prefixes": [
    "+", "-"
  ],
  "skip_suffixes": [
    "!", ")"
  ],
  "skip_words": [
    "hate", "death", "shut up"
  ]
}

⚠️ Please note: the key names must not be changed, because the json2csv package uses them internally to filter the input.

In this JSON structure:

  • each key is the name of a list used to filter input strings:
    • skip_prefixes is the list of prefixes;
    • skip_suffixes is the list of suffixes;
    • skip_words is the list of words;
  • each value is the list of words (or prefixes, suffixes) that must be skipped.
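The three lists can be read as three independent skip rules. The Python sketch below shows one possible interpretation: a message is skipped if it starts with a listed prefix, ends with a listed suffix, or contains a listed word. The function name should_skip and the case-insensitive, whole-word matching are assumptions for illustration, not the documented behaviour of json2csv.

```python
import re


def should_skip(message, rules):
    """Return True if the message matches any of the assumed skip rules.

    Hypothetical helper; the real json2csv filter may behave differently.
    """
    text = message.strip().lower()
    if any(text.startswith(p.lower()) for p in rules.get("skip_prefixes", [])):
        return True
    if any(text.endswith(s.lower()) for s in rules.get("skip_suffixes", [])):
        return True
    # Whole-word match, so that "hate" does not match inside "whatever".
    return any(
        re.search(r"\b" + re.escape(w.lower()) + r"\b", text)
        for w in rules.get("skip_words", [])
    )


rules = {
    "skip_prefixes": ["+", "-"],
    "skip_suffixes": ["!", ")"],
    "skip_words": ["hate", "death", "shut up"],
}

for line in ["+1 agreed", "Great, thanks!", "Whatever works for you"]:
    print(line, should_skip(line, rules))
```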

Output data

This is the data the parser produces once it has finished.

Folder with output CSV files

After the parser finishes successfully, the folder specified in the -output parameter will contain CSV files with the following content:

message                                  intent
Hello, my name is Viktor.                greetings
What would you be interested in?         questions
And what training programs do you have?  questions

In this CSV structure:

  • message is the column with the original content from the JSON file(s) (the column name is taken from the -content-field option);
  • intent is the column with the intent qualified for that content.
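As an illustration of this two-column layout, the following Python sketch writes the rows above with the standard csv module. The comma delimiter and minimal quoting are assumptions; this page does not state which delimiter or quoting style json2csv itself uses.

```python
import csv
import io

# Rows taken from the example above: (original content, qualified intent).
rows = [
    ("Hello, my name is Viktor.", "greetings"),
    ("What would you be interested in?", "questions"),
    ("And what training programs do you have?", "questions"),
]

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["message", "intent"])  # header row
writer.writerows(rows)

output = buffer.getvalue()
print(output)
```

With these assumptions, a message that contains a comma (such as the first row) is wrapped in double quotes so that it still reads back as a single field.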