Parseit is a command-line tool that parses data with an EBNF or ABNF grammar, using the excellent Instaparse library, and serializes the result as JSON, EDN, YAML or Transit.
Usage:
  parseit [options] <grammar> [input]
  parseit [options] --preset <preset> [input]
  parseit --help

Options
  -p, --preset PRESET           none         Preset grammar to use
  -f, --format FORMAT           json-pretty  Select the output format
  -S, --start RULE                           Start processing at this rule
  -t, --transform RULE:TYPE[+]               Transform a rule into a type, and keep wrapped with + suffix
  -T, --no-standard-tx                       Do not use the standard transformations
  -a, --all                                  Return all parses rather than the best match
  -s, --split REGEX                          Process input as a stream, parsing each chunk separated by REGEX
  -l, --split-lines                          Split on newlines, same as --split '(?<=\r?\n)'
  -e, --encoding ENCODING       utf8         Use the specified encoding when reading the input, or raw
  -X, --style TYPE              hiccup       Build the parsed tree in the style of hiccup or enlive
  -h, --help                                 Help

Output Formats
  edn              Extensible Data Format
  edn-pretty       Extensible Data Format (pretty)
  json             JavaScript Object Notation
  json-pretty      JavaScript Object Notation (pretty)
  transit          Transit JSON
  transit-verbose  Transit JSON Verbose
  yaml             YAML Ain't Markup Language

Transformation Types
  array, list, unwrap, vec        Create list from children
  decimal, double, float, number  Convert to floating point
  dict, map, object               Create map from children
  first                           No conversion, just the first item
  int, integer                    Convert to integer
  keyword                         Create a keyword (only useful for EDN or Transit)
  map-kv                          Transform a list of key value pairs into a map
  merge                           Merge multiple maps into a single map
  nil, null                       Convert to nil
  str, string, text               Convert to string

Presets
  csv     Comma Separated Values
  group   NSS group(5), i.e. /etc/group
  hosts   NSS hosts(5), i.e. /etc/hosts
  hpl     High Performance Linpack benchmark results
  passwd  NSS passwd(5), i.e. /etc/passwd
You can parse passwd entries using the following grammar, saved as passwd.ebnf:
passwd = (user <EOL>)*
user = name <SEP> pw <SEP> uid <SEP> gid <SEP> gecos <SEP> home <SEP> shell
name = STRING
pw = STRING
uid = INTEGER
gid = INTEGER
gecos = STRING?
home = STRING
shell = STRING
STRING = #'[^:\r\n]+'
INTEGER = #'[0-9]+'
SEP = ':'
EOL = #'(?:\r\n|\r|\n)'
We will use getent passwd root to get the entry for the root user:
$ getent passwd root
root:x:0:0:root:/root:/bin/bash
We can then pipe that into Parseit and see the result:
$ getent passwd root | parseit passwd.ebnf
["passwd",["user",["name","root"],["pw","x"],["uid",0],["gid",0],["gecos","root"],["home","/root"],["shell","/bin/bash"]]]
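To make the shape of that tree concrete, here is a rough Python sketch that mimics the grammar above with a single regular expression and builds the same nested `[tag, child, ...]` structure. It is only an illustration, not how Parseit or Instaparse work internally; `parse_passwd` and `USER` are names made up for this example, and the handling of an empty gecos field is an assumption.

```python
import json
import re

# One named group per field of the `user` rule; gecos may be empty (STRING?).
USER = re.compile(
    r"(?P<name>[^:\r\n]+):(?P<pw>[^:\r\n]+):(?P<uid>[0-9]+):(?P<gid>[0-9]+):"
    r"(?P<gecos>[^:\r\n]*):(?P<home>[^:\r\n]+):(?P<shell>[^:\r\n]+)"
)
EOL = re.compile(r"\r\n|\r|\n")
INTEGER_FIELDS = {"uid", "gid"}


def parse_passwd(text):
    """Return a hiccup-style tree for `passwd = (user <EOL>)*`."""
    lines = EOL.split(text)
    if lines[-1] != "":
        raise ValueError("every user entry must be followed by an EOL")
    tree = ["passwd"]
    for line in lines[:-1]:
        match = USER.fullmatch(line)
        if match is None:
            raise ValueError(f"cannot parse line: {line!r}")
        user = ["user"]
        for field, value in match.groupdict().items():
            if value == "":
                user.append([field])  # assumption: an empty rule keeps only its tag
            elif field in INTEGER_FIELDS:
                user.append([field, int(value)])
            else:
                user.append([field, value])
        tree.append(user)
    return tree


tree = parse_passwd("root:x:0:0:root:/root:/bin/bash\n")
print(json.dumps(tree, separators=(",", ":")))
```

The hidden `<SEP>` and `<EOL>` rules correspond to the parts of the input that the sketch matches but never copies into the tree.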
Not a bad start! Notice that the UID and GID values were converted into integers. Parseit applies a library of standard transformations: the INTEGER rule is transformed into an integer, and the STRING rule into a string. You can disable this with --no-standard-tx:
$ getent passwd root | node target/parseit.js --no-standard-tx passwd.ebnf
["passwd",["user",["name",["STRING","root"]],["pw",["STRING","x"]],["uid",["INTEGER","0"]],["gid",["INTEGER","0"]],["gecos",["STRING","root"]],["home",["STRING","/root"]],["shell",["STRING","/bin/bash"]]]]
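The difference between the two outputs can be sketched in Python as a bottom-up walk that replaces each `INTEGER` or `STRING` node with a plain value. This is a hedged illustration of the idea only; `STANDARD_TX` and `apply_standard_tx` are hypothetical names, and the real set of standard transformations may cover more rules than these two.

```python
import json

# Hypothetical table: rule name -> function applied to the node's joined text.
STANDARD_TX = {"INTEGER": int, "STRING": str}


def apply_standard_tx(node):
    """Replace [RULE, text] nodes with converted values, leaving other nodes alone."""
    if not isinstance(node, list):
        return node  # already a leaf value
    tag, *children = node
    children = [apply_standard_tx(child) for child in children]
    if tag in STANDARD_TX:
        return STANDARD_TX[tag]("".join(children))
    return [tag, *children]


raw = ["passwd", ["user",
       ["name", ["STRING", "root"]], ["pw", ["STRING", "x"]],
       ["uid", ["INTEGER", "0"]], ["gid", ["INTEGER", "0"]],
       ["gecos", ["STRING", "root"]], ["home", ["STRING", "/root"]],
       ["shell", ["STRING", "/bin/bash"]]]]

converted = apply_standard_tx(raw)
print(json.dumps(converted, separators=(",", ":")))
```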
If you use the options --transform user:map and --transform passwd:list, Parseit will turn the user rule into a map (dictionary) and the passwd rule into a list of users:
$ getent passwd root | parseit --transform user:map --transform passwd:list passwd.ebnf
[{"name":"root","pw":"x","uid":0,"gid":0,"gecos":"root","home":"/root","shell":"/bin/bash"}]
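As a rough mental model of what `map` and `list` do to the tree, the Python sketch below applies a rule-to-type table bottom-up. It is an approximation written for this README, not Parseit's code: `transform` and its `rules` argument are hypothetical, only the two types used above are covered, and mapping a child with no value to `None` is an assumption.

```python
import json


def transform(node, rules):
    """Apply `rules` (rule name -> "map" or "list") to a hiccup-style tree."""
    if not isinstance(node, list):
        return node
    tag, *children = node
    children = [transform(child, rules) for child in children]
    kind = rules.get(tag)
    if kind == "map":
        # Each child is [key, value]; assumption: a bare [key] maps to None.
        return {child[0]: (child[1] if len(child) > 1 else None) for child in children}
    if kind == "list":
        return children  # drop the tag, keep the children
    return [tag, *children]


tree = ["passwd", ["user", ["name", "root"], ["pw", "x"], ["uid", 0], ["gid", 0],
                   ["gecos", "root"], ["home", "/root"], ["shell", "/bin/bash"]]]

users = transform(tree, {"user": "map", "passwd": "list"})
print(json.dumps(users, separators=(",", ":")))
```

Because the walk is bottom-up, each `user` node has already become a map by the time `passwd` is turned into a list, which is why the result is a list of maps.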
You don’t actually need the EBNF file or any of those options though! There is a passwd preset which does it all for you:
$ getent passwd root | parseit --preset passwd
[{"name":"root","pw":"x","uid":0,"gid":0,"gecos":"root","home":"/root","shell":"/bin/bash"}]
We are parsing a single user, but it’s currently being returned in an array. We can tell Parseit to start parsing at a specific rule using the --start RULE option:
$ getent passwd root | parseit --preset passwd --start user
Parse error at line 1, column 23:
root:x:0:0:root:/root:/bin/bash
                      ^
Expected:
#"[^:\r\n]+" (followed by end-of-string)
Hey, that didn’t work! The problem is that the input ends with a newline and the user rule does not allow that. We can use the --split REGEX option to split the input on newlines:
$ getent passwd root | parseit --preset passwd --start user --split '\n'
{"name":"root","pw":"x","uid":0,"gid":0,"gecos":"root","home":"/root","shell":"/bin/bash"}
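The failure and the fix can be reproduced in miniature with Python's `re` module. The sketch below uses a simplified stand-in for the user rule (seven colon-separated fields) and shows that a whole-input match fails because of the trailing newline, while each chunk matches once the input is split on `\n`. Dropping the empty chunk left after the final newline is an assumption about how Parseit behaves, based only on the single result shown above.

```python
import re

# Simplified stand-in for the `user` rule: seven colon-separated fields.
USER = re.compile(r"[^:\r\n]+(?::[^:\r\n]*){6}")

raw = "root:x:0:0:root:/root:/bin/bash\n"

# Like --start user without --split: the rule must consume the whole input,
# and nothing in it matches the trailing newline.
whole_input_ok = USER.fullmatch(raw) is not None

# Like --split '\n': parse each chunk on its own.
# Assumption: the empty chunk after the last separator is skipped.
chunks = [chunk for chunk in re.split(r"\n", raw) if chunk]
chunk_results = [USER.fullmatch(chunk) is not None for chunk in chunks]

print(whole_input_ok, chunks, chunk_results)
```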
With --split, each chunk is processed as soon as it is read. You can see this by running:
$ (getent passwd root; sleep 10; getent passwd bin) | parseit --preset passwd --start user --split '\n'
{"name":"root","pw":"x","uid":0,"gid":0,"gecos":"root","home":"/root","shell":"/bin/bash"}
{"name":"bin","pw":"x","uid":2,"gid":2,"gecos":"bin","home":"/bin","shell":"/usr/sbin/nologin"}
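The streaming behaviour can be pictured as a buffer that is cut every time the separator shows up. The Python generator below is a minimal sketch of that idea under stated assumptions, not Parseit's implementation: `split_stream` is a hypothetical helper, it takes any iterable of text pieces, it skips empty chunks, and it does not handle zero-width separators such as a lookbehind.

```python
import re


def split_stream(pieces, separator):
    """Yield each chunk as soon as its terminating separator has arrived."""
    pattern = re.compile(separator)
    buffer = ""
    for piece in pieces:
        buffer += piece
        while True:
            match = pattern.search(buffer)
            if match is None or match.end() == match.start():
                break  # no complete chunk yet (zero-width matches are ignored)
            chunk, buffer = buffer[:match.start()], buffer[match.end():]
            if chunk:  # assumption: empty chunks are skipped
                yield chunk
    if buffer:
        yield buffer  # trailing data without a final separator


# The pieces arrive at arbitrary boundaries, as they might from a pipe.
pieces = ["root:x:0:0:root:/root:/bin/bash\nbin:x", ":2:2:bin:/bin:", "/usr/sbin/nologin\n"]
chunks = list(split_stream(pieces, r"\n"))
print(chunks)
```

Because it is a generator, a consumer that parses each chunk as it is yielded would emit the first result before the second piece of input has even arrived, which matches what the `sleep 10` demonstration shows.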
Maybe you don’t like reading JSON? You can use the YAML output format to make it more readable:
$ getent passwd root | parseit --preset passwd --format yaml
---
- name: root
  pw: x
  uid: 0
  gid: 0
  gecos: root
  home: /root
  shell: /bin/bash
Parseit uses Instaparse, so the notation section of the Instaparse tutorial is a good description of the grammar syntax. Keep in mind that you do not need to escape strings as you would in Clojure, because the grammar is read from a text file rather than from a Clojure string literal.
To build Parseit, install Shadow CLJS and then build the JavaScript as target/parseit.js and a native executable (using nexe) as parseit:
$ npm install -g shadow-cljs
$ npm install --save-dev shadow-cljs
$ shadow-cljs release cli