$ npm install @sinclair/parsebox
ParseBox provides combinators for parsing in Runtime and Static environments.
import { Runtime } from '@sinclair/parsebox'
const T = Runtime.Tuple([
Runtime.Const('X'),
Runtime.Const('Y'),
Runtime.Const('Z')
])
const R = Runtime.Parse(T, 'X Y Z W') // const R = [['X', 'Y', 'Z'], ' W']
import { Static } from '@sinclair/parsebox'
type T = Static.Tuple<[
Static.Const<'X'>,
Static.Const<'Y'>,
Static.Const<'Z'>
]>
type R = Static.Parse<T, 'X Y Z W'> // type R = [['X', 'Y', 'Z'], ' W']
ParseBox is a parsing library designed to embed domain-specific languages (DSLs) within the TypeScript type system. It provides a set of runtime and type-level combinators that enable EBNF notation to be encoded as TypeScript types. These combinators can then be used to parse content at runtime or interactively in editor via static type inference.
This project was developed as a generalized parsing solution for the TypeBox project, where it is currently used to parse TypeScript syntax into runtime types. This project seeks to provide a robust foundation for parsing a variety of domain-specific languages, with information encoded in each language able to be reconciled with TypeScript's type system.
License: MIT
ParseBox offers combinators for runtime and static environments, with each combinator based on EBNF constructs. These combinators produce schema fragments that define parse operations, which ParseBox interprets during parsing. As schematics, the fragments can also be reflected as EBNF or remapped to other tools. The following section examines the Runtime combinators and their relation to EBNF.
The Const combinator parses the next occurrence of a specified string, ignoring whitespace and newline characters unless explicitly specified as parameters.
BNF
<T> ::= "X"
TypeScript
const T = Runtime.Const('X') // const T = {
// type: 'Const',
// value: 'X'
// }
const R = Runtime.Parse(T, 'X Y Z') // const R = ['X', ' Y Z']
The Tuple parser matches a sequence of parsers, with an empty tuple representing Epsilon (the empty production).
BNF
<T> ::= "X" "Y" "Z"
TypeScript
const T = Runtime.Tuple([ // const T = {
Runtime.Const('X'), // type: 'Tuple',
Runtime.Const('Y'), // parsers: [
Runtime.Const('Z'), // { type: 'Const', value: 'X' },
]) // { type: 'Const', value: 'Y' },
// { type: 'Const', value: 'Z' },
// ]
// }
const R = Runtime.Parse(T, 'X Y Z W') // const R = [['X', 'Y', 'Z'], ' W']
The Union combinator tries each interior parser in sequence until one matches
BNF
<T> ::= "X" | "Y" | "Z"
TypeScript
const T = Runtime.Union([ // const T = {
Runtime.Const('X'), // type: 'Union',
Runtime.Const('Y'), // parsers: [
Runtime.Const('Z') // { type: 'Const', value: 'X' },
]) // { type: 'Const', value: 'Y' },
// { type: 'Const', value: 'Z' }
// ]
// }
const R1 = Runtime.Parse(T, 'X E') // const R1 = ['X', ' E']
const R2 = Runtime.Parse(T, 'Y E') // const R2 = ['Y', ' E']
const R3 = Runtime.Parse(T, 'Z E') // const R3 = ['Z', ' E']
The Array combinator parses zero or more occurrences of the interior parser, returning an empty array if there are no matches.
EBNF
<T> ::= { "X" }
TypeScript
const T = Runtime.Array( // const T = {
Runtime.Const('X') // type: 'Array',
) // parser: { type: 'Const', value: 'X' }
// }
const R1 = Runtime.Parse(T, 'X Y Z') // const R1 = [['X'], ' Y Z']
const R2 = Runtime.Parse(T, 'X X X Y Z') // const R2 = [['X', 'X', 'X'], ' Y Z']
const R3 = Runtime.Parse(T, 'Y Z') // const R3 = [[], 'Y Z']
The Optional combinator parses zero or one occurrence of the interior parser, returning a tuple with one element or an empty tuple if there is no match.
EBNF
<T> ::= [ "X" ]
TypeScript
const T = Runtime.Optional( // const T = {
Runtime.Const('X') // type: 'Optional',
) // parser: { type: 'Const', value: 'X' }
// }
const R1 = Runtime.Parse(T, 'X Y Z') // const R1 = [['X'], ' Y Z']
const R2 = Runtime.Parse(T, 'Y Z') // const R2 = [[], 'Y Z']
ParseBox does not have a dedicated combinator for Epsilon; instead, it can be represented using an empty Tuple combinator. Epsilon is typically used as a fall-through case in sequence matching.
EBNF
<T> ::= "X" "Y" | ε
TypeScript
const T = Runtime.Union([
Runtime.Tuple([Runtime.Const('X'), Runtime.Const('Y')]),
Runtime.Tuple([]) // ε - fall-through case
])
const R1 = Runtime.Parse(T, 'X Y Z') // const R1 = [['X', 'Y'], ' Z']
const R2 = Runtime.Parse(T, 'Y Z') // const R2 = [[], 'Y Z']
ParseBox provides combinators for parsing common lexical tokens, such as numbers, identifiers, and strings, enabling static, optimized parsing of typical JavaScript constructs.
Parses numeric literals, including integers, decimals, and floating-point numbers. Invalid formats, like leading zeroes, are not matched.
const T = Runtime.Number()
// ...
const R1 = Runtime.Parse(T, '1') // const R1 = ['1', '']
const R2 = Runtime.Parse(T, '3.14') // const R2 = ['3.14', '']
const R3 = Runtime.Parse(T, '.1') // const R3 = ['.1', '']
const E = Runtime.Parse(T, '01') // const E = []
The String combinator parses quoted string literals, accepting an array of permissible quote characters. The result is the interior string.
const T = Runtime.String(['"'])
// ...
const R = Runtime.Parse(T, '"hello"') // const R = ['hello', '']
Parses valid JavaScript identifiers, typically used to extract variable or function names. The following example demonstrates parsing a let
statement.
<let> ::= "let" <ident> "=" <number>
const Expression = Runtime.Number() // const Expression = { type: 'Number' }
const Let = Runtime.Tuple([ // const Let = {
Runtime.Const('let'), // type: 'Tuple',
Runtime.Ident(), // parsers: [
Runtime.Const('='), // { type: 'Const', value: 'let' },
Expression // { type: 'Ident' },
]) // { type: 'Const', value: '=' },
// { type: 'Number' },
// ]
// }
const R = Runtime.Parse(Let, 'let n = 10') // const R = [[ 'let', 'n', '=', '10'], '' ]
ParseBox supports semantic actions (i.e., mappings) for both static and runtime parsing, enabling parsed content to be transformed into complex structures like abstract syntax trees (ASTs). Below is an explanation of how mapping works in both environments.
Runtime combinators can accept an optional callback as their last argument, which receives the parsed elements and maps them to arbitrary return values. The following example shows how a let statement is parsed and mapped into a syntax node.
const LetMapping = (_0: 'let', _1: string, _2: '=', _3: string) => {
return {
type: 'Let',
ident: _1,
value: parseFloat(_3)
}
}
const Let = Runtime.Tuple([
Runtime.Const('let'), // _0
Runtime.Ident(), // _1
Runtime.Const('='), // _2
Runtime.Number() // _3
], values => LetMapping(...values))
const R = Runtime.Parse(Let, 'let n = 10') // const R = [{
// type: 'Let',
// ident: 'n',
// value: 10
// }, '' ]
Static combinators accept an optional higher-kinded type, IMapping, as the last generic argument. Static mapping uses the this['input']
property to read input values, assigning the mapping to the output
property. The following example demonstrates implementing the Let parser using static actions.
type ParseFloat<Value extends string> = (
Value extends `${infer Value extends number}` ? Value : never
)
interface LetMapping extends Static.IMapping {
output: this['input'] extends ['let', infer Ident, '=', infer Value extends string] ? {
type: 'Let',
ident: Ident
value: ParseFloat<Value>
} : never
}
type Let = Static.Tuple<[
Static.Const<'let'>,
Static.Ident,
Static.Const<'='>,
Static.Number
], LetMapping>
type R = Static.Parse<Let, 'let n = 10'> // type R = [{
// type: 'Let',
// ident: 'n',
// value: 10
// }, '' ]
ParseBox allows exterior values to be passed into and referenced within semantic actions. A context is passed as the last argument to the Static and Runtime parse types/functions, and is propagated into each action. The following demonstrates its usage.
The Runtime Parse function accepts a context as the last argument, which is received as the second argument to the OptionMapping function.
import { Runtime } from '@sinclair/parsebox'
// Context Received as Second Argument
const OptionMapping = (input: 'A' | 'B' | 'C', context: Record<ProeprtyKey, string>) => {
return (
input in context
? context[input]
: undefined
)
}
const Option = Runtime.Union([
Runtime.Const('A'),
Runtime.Const('B'),
Runtime.Const('C')
], OptionMapping)
const R = Runtime.Parse(Option, 'A', { // const R = ['Matched Foo', '']
A: 'Matched Foo',
B: 'Matched Bar',
C: 'Matched Baz',
})
The Static Parse type accepts a context as the last generic argument, which is received via the this['context']
property on the OptionMapping type.
import { Static } from '@sinclair/parsebox'
// Context Received on Context Property
interface OptionMapping extends Static.IMapping {
output: (
this['input'] extends keyof this['context']
? this['context'][this['input']]
: undefined
)
}
type Option = Static.Union<[
Static.Const<'A'>,
Static.Const<'B'>,
Static.Const<'C'>
], OptionMapping>
type R = Static.Parse<Option, 'A', { // type R = ['Matched Foo', '']
A: 'Matched Foo',
B: 'Matched Bar',
C: 'Matched Baz',
}>
ParseBox modules act as containers for Runtime parsers, enabling recursion and mutual recursion by allowing parsers to reference each other via string keys. They are only for Runtime parsers, as Static parsers don’t have ordering issues due to TypeScript’s non-order-dependent types.
In this example, we define a List parser that recursively parses sequences of Item elements. The List parser is either a tuple of a Value followed by another List (recursive) or an empty tuple (base case). Recursion is achieved by referencing both Item and List parsers within the same module.
import { Runtime } from '@sinclair/parsebox'
// Item ::= "X" "Y" "Z"
const Item = Runtime.Union([
Runtime.Const('X'),
Runtime.Const('Y'),
Runtime.Const('Z'),
])
// List ::= Item List | ε
const List = Runtime.Union([
Runtime.Tuple([Runtime.Ref('Item'), Runtime.Ref('List')]), // Recursive Self
Runtime.Tuple([]) // Epsilon
], values => values.flat())
// Embed inside Module
const Module = new Runtime.Module({
Item,
List
})
// Use Module.Parse
const R = Module.Parse('List', 'X Y Z Y X E') // const R = [['X', 'Y', 'Z', 'Y', 'X'], ' E']
The following example demonstrates using ParseBox to parse a mathematical expression with LL(1) parsing techniques, avoiding left recursion and respecting operator precedence rules.
import { Static } from '@sinclair/parsebox'
// Static Mapping Actions to remap Productions
type BinaryReduce<Left extends unknown, Right extends unknown[]> = (
Right extends [infer Operator, infer Right, infer Rest extends unknown[]]
? { left: Left, operator: Operator, right: BinaryReduce<Right, Rest> }
: Left
)
interface BinaryMapping extends Static.IMapping {
output: this['input'] extends [infer Left, infer Right extends unknown[]]
? BinaryReduce<Left, Right>
: never
}
interface FactorMapping extends Static.IMapping {
output: (
this['input'] extends ['(', infer Expr, ')'] ? Expr :
this['input'] extends [infer Operand] ? Operand :
never
)
}
// Expression Grammar
type Operand = Static.Ident
type Factor = Static.Union<[
Static.Tuple<[Static.Const<'('>, Expr, Static.Const<')'>]>,
Static.Tuple<[Operand]>,
], FactorMapping>
type TermTail = Static.Union<[
Static.Tuple<[Static.Const<'*'>, Factor, TermTail]>,
Static.Tuple<[Static.Const<'/'>, Factor, TermTail]>,
Static.Tuple<[]>,
]>
type ExprTail = Static.Union<[
Static.Tuple<[Static.Const<'+'>, Term, ExprTail]>,
Static.Tuple<[Static.Const<'-'>, Term, ExprTail]>,
Static.Tuple<[]>,
]>
type Term = Static.Tuple<[Factor, TermTail], BinaryMapping>
type Expr = Static.Tuple<[Term, ExprTail], BinaryMapping>
// Parse!
type Result = Static.Parse<Expr, 'x * (y + z)'> // type R = [{
// left: "x";
// operator: "*";
// right: {
// left: "y";
// operator: "+";
// right: "z";
// };
// }, ""]
ParseBox is open to community contribution. Please ensure you submit an open issue before submitting your pull request. The ParseBox project prefers open community discussion before accepting new features.