Skip to content
/ parseit Public

Parseit - Parseit is command line tool to parse data using EBNF or ABNF using the excellent Instaparse library, and serializing the result into JSON, EDN, YAML or Transit format

License

Notifications You must be signed in to change notification settings

theasp/parseit

Repository files navigation

Parseit

Introduction

Parseit is command line tool to parse data using EBNF or ABNF using the excellent Instaparse library, and serializing the result into JSON, EDN, YAML or Transit format.

Usage

Usage: parseit [options] <grammar> [input]
       parseit [options] --preset <preset> [input]
       parseit --help

Options
  -p, --preset PRESET           none         Preset grammar to use
  -f, --format FORMAT           json-pretty  Select the output format
  -S, --start RULE                           Start processing at this rule
  -t, --transform RULE:TYPE[+]               Transform a rule into a type, and keep wrapped with + suffix
  -T, --no-standard-tx                       Do use the standard transformations
  -a, --all                                  Return all parses rather than the best match
  -s, --split REGEX                          Process input as a stream, parsing each chunk seperated by REGEX
  -l, --split-lines                          Split on newlines, same as --split '(?<=\r?\n)'
  -e, --encoding ENCODING       utf8         Use the specified encoding when reading the input, or raw
  -X, --style TYPE              hiccup       Build the parsed tree in the style of hiccup or enlive
  -h, --help                                 Help

Output Formats
  edn              Extensible Data Format
  edn-pretty       Extensible Data Format (pretty)
  json             JavaScript Object Notation
  json-pretty      JavaScript Object Notation (pretty)
  transit          Transit JSON
  transit-verbose  Transit JSON Verbose
  yaml             YAML Ain't Markup Language

Transformation Types
  array, list, unwrap, vec        Create list from children
  decimal, double, float, number  Convert to floating point
  dict, map, object               Create map from children
  first                           No conversion, just the first item
  int, integer                    Convert to integer
  keyword                         Create a keyword (only useful for EDN or Transit)
  map-kv                          Transform a list of key value pairs into a map
  merge                           Merge multiple maps into a single map
  nil, null                       Convert to nil
  str, string, text               Convert to string

Presets
  csv     Comma Seperated Value
  group   NSS group(5), i.e. /etc/group
  hosts   NSS hosts(5), i.e. /etc/hosts
  hpl     High Performance Linpack benchmark results
  passwd  NSS passwd(5), i.e. /etc/passwd

Example

Parsing passwd

You can parse passwd entries using the following as passwd.ebnf:

passwd = (user <EOL>)*
user = name <SEP> pw <SEP> uid <SEP> gid <SEP> gecos <SEP> home <SEP> shell

name = STRING
pw = STRING
uid = INTEGER
gid = INTEGER
gecos = STRING?
home = STRING
shell = STRING
STRING = #'[^:\r\n]+'
INTEGER = #'[0-9]+'
SEP = ':'
EOL = #'(?:\r\n|\r|\n)'

We will use getent passwd root to get the entry for the root user:

$ getent passwd root
root:x:0:0:root:/root:/bin/bash

We can then pipe that into Parseit and see the result:

$ getent passwd root | parseit passwd.ebnf 
["passwd",["user",["name","root"],["pw","x"],["uid",0],["gid",0],["gecos","root"],["home","/root"],["shell","/bin/bash"]]]

Not a bad start! You will notice that the UID and GID values were converted into integers. There is a library of standard transformations and the INTEGER rule will be transformed into an integer, and the STRING rule will be turned into a string. You can avoid this by using --no-standard-tx:

$ getent passwd root | node target/parseit.js --no-standard-tx passwd.ebnf 
["passwd",["user",["name",["STRING","root"]],["pw",["STRING","x"]],["uid",["INTEGER","0"]],["gid",["INTEGER","0"]],["gecos",["STRING","root"]],["home",["STRING","/root"]],["shell",["STRING","/bin/bash"]]]]

If you use the options --transform user:map and --transform passwd:list, Parseit will turn the user rule into a map (dictionary) and passwd rule into a list of users:

$ getent passwd root | parseit --transform user:map --transform passwd:list passwd.ebnf 
[{"name":"root","pw":"x","uid":0,"gid":0,"gecos":"root","home":"/root","shell":"/bin/bash"}]

You don’t actually need the EBNF file or any of those options though! There is a passwd preset which does it all for your:

$ getent passwd root | parseit --preset passwd
[{"name":"root","pw":"x","uid":0,"gid":0,"gecos":"root","home":"/root","shell":"/bin/bash"}]

We are parsing a single user, but it’s currently being returned in an array. We can tell Parseit to start parsing at a specific rule using the argument --start RULE:

$ getent passwd root | parseit --preset passwd --start user
Parse error at line 1, column 23:
root:x:0:0:root:/root:/bin/bash
                      ^
Expected:
#"[^:\r\n]+" (followed by end-of-string)

Hey, that didn’t work! The problem is that the input ends with a newline and the user rule does not allow that. We can use the --split REGEX argument to split the input on newlines.

$ getent passwd root | parseit --preset passwd --start user --split '\n'
{"name":"root","pw":"x","uid":0,"gid":0,"gecos":"root","home":"/root","shell":"/bin/bash"}

With --split, each chunk will be processed as they are read. You can see this by doing:

$ (getent passwd root; sleep 10; getent passwd bin) | parseit --preset passwd --start user --split '\n'
{"name":"root","pw":"x","uid":0,"gid":0,"gecos":"root","home":"/root","shell":"/bin/bash"}
{"name":"bin","pw":"x","uid":2,"gid":2,"gecos":"bin","home":"/bin","shell":"/usr/sbin/nologin"}

Maybe you don’t like reading JSON? You can use the YAML output format to make it more readable:

$ getent passwd root | parseit --preset passwd --format yaml
---
- name: root
  pw: x
  uid: 0
  gid: 0
  gecos: root
  home: /root
  shell: /bin/bash

Grammar

Parseit uses Instaparse, so the notation section of the tutorial has a good description of the grammar syntax. Keep in mind that you will not need to escape strings as you would in Clojure as the grammar will be read out of a text file.

Building

This will install Shadow CLJS and then build the JavaScript as target/parseit.js and a native executable (using nexe) as parseit:

$ npm install -g shadow-cljs
$ npm install --save-dev shadow-cljs
$ shadow-cljs release cli

About

Parseit - Parseit is command line tool to parse data using EBNF or ABNF using the excellent Instaparse library, and serializing the result into JSON, EDN, YAML or Transit format

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published