readExpr :: FilePath -> Text -> Either Text Syntax
Syntax
is an s-expression tree. We distinguish between square brackets, which are intended to be used whenever the arity is fixed, from parentheses, which are intended to be used whenever the arity is variable. So if you are experimenting with an unfamiliar piece of syntax, you can try to duplicate an element found between parentheses, but you probably shouldn't try to duplicate an element found between square brackets.
At this stage, the identifiers at the leaves are simply Text
. Other leaf values include booleans, strings, and "signals". A signal is simply a number, but we call it a signal because we intend to use those numbers as labels for the wait-signal
and send-signal
operations, we don't intend to use them in arithmetic operations.
Each node in the syntax tree is annotated with its source location and with a "scope set" indicating which variables are in scope. At this point, we have only parsed the s-expression, we have not attributed meaning such as binding-sites vs use sites to the various identifiers, and so at this point all the scope sets are empty.
Scope sets are used during expansion to determine which binding an identifier refers to. See the section on macro expansion for a quick description of their use, or "Binding as Sets of Scopes" (Flatt, POPL '16) for a complete description.
TODO: Presumably, we use those scope sets later on. Give an example which explains why we need Syntax
values to have scope sets.
Macro expansion is the process of converting user-written syntax into the core language. Unlike traditional Lisps, macro expansion does not necessarily proceed from the outside of the expression towards the inside, from left to right. Instead, macro expansion is controlled by a task queue and a split expression. Split expressions are trees in the core language in which some subtrees are not yet known (please see the section that describes them for details). Some tasks in the queue have dependencies; these will be skipped and requeued until their dependencies are satisfied.
The entry point to expanding expressions is
expandExpr :: Syntax -> Expand SplitCore
It initializes the expander state with an empty split expression and a single task: to expand the input.
To expand an expression, the first step is to determine which procedure can expand it (this procedure is called the transformer). Then, the syntax itself is passed to the transformer, which mutates the current expander state to make progress, arranging for any necessary further tasks to be enqueued.
Transformers are determined as follows:
-
If the expression is an identifer, it is resolved (see the relevant subsection). If the identifier resolves to a variable, the current core expression is filled in with a reference to the variable. If the identifer resolves to anything else, the corresponding transformer is invoked on the syntax being expanded.
-
If the expression is a list or vector, there are two possibilities:
-
The expression's head is an identifier. In this case, the identifier is resolved. If it resolves to a variable, then
#%app
is inserted at the beginning of the list or vector, and it is re-expanded. If it resolves to any other transformer, then that transformer is invoked on the syntax being expanded. -
The expression's head is not an identifier. In this case,
#%app
is inserted at the beginning and the syntax is re-expanded.
-
-
If the expression is a signal literal, then the current expression is replaced by the signal. This should perhaps be replaced by a
#%signal
builtin, similar to#%app
. -
If the expression is anything else, then expansion fails.
TODO: explain the the expansion process, and the difference between the various Unique
s we haven't explained yet: Binding
, ModulePtr
, DeclPtr
, ModBodyPtr
, and TaskID
.
The most important operation in hygienic macro expansion is being able to reliably determine the referent of an identifier. Each identifier has a scope set, and the expander maintains a binding table that relates names as written by users to a scope set and a binding. The range of the binding table represents the bindings that could in theory be available for an identifier with the appropriate name.
To resolve an identifier x, the first step is to find all candidate bindings. Candidate bindings are those that are named x whose scope sets are subsets of x's scopes. Once the candidate bindings are found, best candidate is returned. If there is no unique best candidate, then expansion fails.
The best candidate binding is the one whose scope set is the largest. This allows inner bindings to shadow outer bindings.
Quoted syntax expands to itself. In other words, quotation leaves intact the scope sets that prior expansion has added. This means that quotations maintain lexical scoping with respect to their original position in the program.
In traditional Lisp systems, function application is unnamed, and is indicated by providing an expression with a non-macro car
. In Racket, on the other hand, function application is a named operator (#%app
) that is implicitly inserted in positions that a traditional Lisp would unconditionally consider to be an application. Importantly, #%app
is inserted with the same scope set as the list that contains its arguments. This allows languages to provide their own notion of application, and even to locally override it using a macro.
In this language, the built-in #%app
is a binary operator, taking only one function and argument. It can be overridden with a macro to create Curried application.
When expanding a binding form such as lambda
, a fresh scope is created to represent the fact of the binding. This scope serves two roles at once:
-
Applied to the binding occurrence, the scope ensures that only those identifiers that have been provided with the scope can refer to the binding. This is applied at all phases, because scopes on binders serve to limit scope, so using all phases makes the binding occurrence as restrictive as possible.
-
Applied to the expression within which the bound name is accessible, the scope provides access to the binding occurrence. The use-site addition of the scope is performed only at the phase in which the binding should be accessible, because scopes on potential use sites serve to provide access to previously-limited scopes.
Understood this way, scopes can be seen as a generalization of de Bruijn indices, in which a set is augmented with a scope instead of shifting a binding. Scope sets, however, work for multiple dimensions of binding, as introduced by macro expansion.
TODO (but see Flatt, POPL '16)
eval :: Core -> Eval Value
Eventually the Syntax
gets simplified into a core language which we can evaluate. Most of the complexity happens between Syntax
and Core
, so we'll keep that part for last.
Core
expressions consist of function applications, pattern-matching, and constructors for the language's primitive types. Those primitive types include functions, booleans, signals, reified Syntax
objects, and "macro actions". A macro action is a monadic block performing zero or more side-effects. These side-effects do not occur during evaluation, but rather during expansion, so we'll cover macro actions in more details in the Expanding
section.
-- The following definitions are simplified for ease of exposition.
data Ident = Ident ScopeSet SrcLoc Text
newtype Binding = Binding Unique
newtype Var = Var Unique
newtype MacroVar = MacroVar Unique
newtype BindingTable = BindingTable (Map Ident Binding)
newtype ExpansionEnv = ExpansionEnv (Map Binding EValue)
data EValue
= ...
| ELambdaMacro
| EVarMacro Var
| EDefineMacrosMacro
| EUserMacro MacroVar
data Core
= ...
| CoreLam Var Core
| CoreVar Var
Ident
is the representation of identifiers inside a Syntax
object.
Depending on the code into which it is spliced, an Ident
may be unbound, ambiguous, or it may resolve to a Binding
and a corresponding EValue
. Simpler languages can afford to have identifiers resolve to values directly, but in Klister, free-identifier=?
can determine whether two Ident
s resolve to the same binding site, so it is important to be able to distinguish two binding sites which happen to bind their Ident
to the same EValue
.
The E
in EValue
stands for "expansion-time", not "evaluation-time". This means that when expanding an identifier-headed expression, the expander resolves that identifier to an EValue
, and that tells the expander which macro to call. For example, the kernel's lambda
resolves to ELambdaMacro
.
In a core expression, variables don't have names, they use unique Var
s to avoid accidental capture. Thus, ELambdaMacro
expands to the partial core expression CoreLam <var>
, with a fresh Var
.
This core expression is partial because the body is missing. The body may contain more macro calls, and thus needs to be expanded. It must be expanded in an extended ExpansionEnv
in which the variable bound by the lambda resolves to (a fresh Binding
and a corresponding) EVarMacro <var>
, which in turn expands to CoreVar <var>
.
Finally, define-macros
is similar to lambda
in that it binds a variable which may be used later on. The variable bound by define-macros
resolves to (a fresh Binding
and a corresponding) EUserMacro <macro-var>
, with a fresh MacroVar
.
Now is the time to talk about the complexity which happens between Syntax
and Core
. Part of that complexity is the fact that there are two more intermediate formats in-between: Syntax
becomes SplitCore
, then PartialCore
, then Core
.
unsplit :: SplitCore -> PartialCore
split :: PartialCore -> IO SplitCore
nonPartial :: Core -> PartialCore
runPartialCore :: PartialCore -> Maybe Core
A PartialCore
expression is a tree with the same shape as a Core
expression, except that sub-trees may be missing. The reason those sub-trees are missing is that we haven't computed them from the Syntax
yet. SplitCore
is a more useful form of PartialCore
in which the missing sub-trees are labelled with a SplitCorePtr
, so that we may graft sub-trees back in once they are computed. Every node in a SplitCore
expression is labelled by a unique SplitCorePtr
, not just the missing sub-trees.
newtype SplitCorePtr = SplitCorePtr Unique
TODO: explain how the sub-trees get computed.
Some transformers are user-written macros. These macros have the type Syntax -> Macro Syntax
. In other words, they receive a syntax object and then compute a new syntax object using actions in the macro monad.
The macro monad includes operations to compare the binding information of identifiers, to send a signal (see next section), to block until receipt of a signal, and to raise a syntax error. When expanding a user macro, the macro is applied to the syntax object being expanded, and the resulting macro monad action is interpreted in the metalanguage expansion monad. This can have three results:
-
Expansion fails, in which case the error is thrown as an expander error
-
Expansion succeeds, in which case the resulting syntax object is queued up for the same expression slot
-
Expansion gets stuck, in which case the interpreter indicates the necessary conditions for it to resume and returns a continuation to be invoked when these conditions are satisfied. The stuck expansion and its conditions are re-added to the job queue.
-
bound-identifier=? : Syntax -> Syntax -> Macro Bool
determines whether its arguments are identifiers such that each would bind the other at the current phase. In practice, this means that their scope sets are identical. If either argument is not an identifier, fails. -
free-identifier=? : Syntax -> Syntax -> Macro Bool
determines whether its arguments are identifiers that refer to the same binding in the current context. If either argument is not an identifier, fails. -
syntax-error : ∀α. Syntax -> Syntax * -> Macro α
fails, using its first argument as the reason for failure and the remaining arguments as source locations involved in the failure. This should probably be rewritten to take a list instead of being variable arity, once we have lists in the language. -
send-signal : Signal -> Macro Unit
sends a signal. -
wait-signal : Signal -> Macro Unit
blocks execution of the macro until a particular signal has been sent. -
pure : ∀α. α -> Macro α
is the unit of the macro monad. -
>>= : ∀α β. Macro α -> (α -> Macro β) -> Macro β
is the bind operator of the macro monad.
Many tasks have dependencies that must be satisfied before the expander can carry them out. For instance, the scope in which a macro is bound cannot be expanded before the macro itself has been expanded and evaluated. The "job queue" approach to macro expansion means that the order of expansion is not necessarily predictable from the source code itself, so operators cannot be arranged to take care of these concerns on their own.
In addition to the natural dependencies that the primitive macro operations impose on their tasks, users can impose additional dependencies. While we eventually plan to use this to allow macro expansion to block until a type variable has been solved by unification, we presently have a simpler system that captures the essence of the problem: the signal system. The expander maintains a set of signals that have been sent, and an action in the macro monad can add a new signal to this set. A further action can cause macro monad evaluation to block until a signal has been received; in this case, a continuation is captured and added to the task queue with an indication of the signal that is being waited for. If the signal has been sent, the captured continuation will be reinvoked in the next pass through the queue.
The module system is based on that described by Matthew Flatt in "Composable and Compilable Macros: You Want it When?" (ICFP 2002). Source modules are written in a language that is intended to provide all the initial bindings available in the file. The kernel
language is primitive and contains all the built-in operators.
There is a cache of the expanded and evaluated forms of each module in the system. At the start of expansion, this cache contains only kernel
.
There are two parts of the process of bringing a module into scope: loading it, and visiting it. Loading refers to the process of parsing the module's source code and expanding it to the core language, while visiting is the process of evaluating the contents of a module at a particular phase and producing a collection of bindings exported from the module. Loading occurs at phase 0, while visiting a module can be shifted relative to some base phase.
Loading an uncached module consists of the following steps:
-
Visit the module's language, unless that language is
kernel
. -
Allocate a scope to serve as the root for the module (the "outside-edge scope" described in Flatt '16), and bind all the language's exports using this scope and the current-phase scope that's appropriate for the export.
-
Expand the contents of the module.
-
Add the expanded module to the cache.
Visiting an uncached module relative to phase p consists of the following steps:
-
Load the module.
-
Shift the expanded module p phases.
-
Evaluate the module's contents and insert definitions into the environment.
-
Construct the set of exports, and return it. Add them to the cache.