Library Authoring Guide¶
A Capy library is the entire grammar of one source language, plus the recipe for generating output from it. This doc is the reference walkthrough.
Libraries are written in .capy — Capy's native syntax. Multi-line
write blocks read natively, same indentation and string rules as the
source files the library will parse. Every example below is .capy.
File shape¶
A complete Capy library has these top-level sections (all optional
except at least one function):
extension py # informational; suggests target extension
output_file "" # if set, capy writes here instead of stdout
comments # OPTIONAL — opt user scripts into a
line "#" # comment syntax. With no `comments`
end # block, user scripts have NO comments
# (the engine ships zero defaults).
context # initial schema for the accumulated context
imports []
classes []
end
type Email # library-defined argument types
pattern "^[^@]+@[^@]+\\.[^@]+$"
end
function greet # one DSL statement shape
arg literal "greet"
arg capture name string
write `Hello, ${name}!
`
end
file_template # final-output assembler
write body
end
Comments — opt-in, not automatic¶
Capy ships zero predefined grammar, and that includes comment
syntax. Out of the box, a user script that starts a line with #
or // will fail to lex with unexpected character '#'. To allow
comments in scripts written against your library, declare them:
Each line "MARKER" entry adds one line-comment prefix. The marker
matches at the start of a line (after any indent) and at any point
on a line; everything from the marker to the end of the line is
discarded by the lexer.
This declaration only affects user scripts. The library
manifest itself, and inner-DSL run:/command bodies, always use
# for comments — that's Capy's own config syntax.
Why opt-in: a library targeting a host language that uses # as
a meaningful character (e.g. CSS selectors, Markdown headings,
Mermaid flowcharts) might want to forbid # comments to avoid
ambiguity. The grammar belongs to the library, end-to-end.
Functions¶
Each function NAME … end block defines one DSL statement shape.
NAME is the function's reference name (used for the auto-name-prepend
rule and to name a block's closer); it's not necessarily what appears
in source.
The function body is a sequence of inner-DSL statements. Two common forms:
- **
write \...`** appends to the output body. Backtick literals can span multiple lines and accept${EXPR}` interpolation. set/append/prepend/merge/deletemutate the accumulated context.
function greet
arg capture name Email # built-in OR library-declared type
write `Hello, ${name}!
`
append context.greetings name
priority 0 # higher wins; default 0
end
The body is the only place statements go. Renders (write) and
state mutations (set / append / …) interleave freely; the engine
walks the AST twice per function call — once to produce output, once
to update context.
arg — the match shape¶
Each arg line takes one of two forms:
| Form | Meaning |
|---|---|
arg literal "TEXT" |
A literal token to match exactly in source. |
arg capture NAME TYPE |
Bind a captured value of type TYPE to NAME. |
Both forms accept an optional trailing description string for auto-generated docs:
Optional args with defaults¶
A trailing capture can declare a default, making it omittable at
the call site. The full arg capture shape is:
When the call site ends before reaching the optional arg, the engine binds the default instead of consuming a token. This collapses families of near-duplicate components into one function:
function button
arg literal "button"
arg capture label string
arg capture variant string default "primary"
arg capture kind string default "button"
template
<button type="${decoded kind}" class="btn btn-${decoded variant}">${escapeHtml (decoded label)}</button>
end
end
button "Save" → btn-primary, type=button
button "Delete" "danger" → btn-danger, type=button
button "Submit" "primary" "submit" → btn-primary, type=submit
Optional args must be trailing — a required capture (or a
literal) after an optional one is a load-time error, since the
matcher couldn't tell which slot a supplied value fills. A string
default is delivered in the same quoted form a real string capture
carries, so ${x} and ${decoded x} behave identically whether the
arg was supplied or defaulted.
Auto-name-prepend¶
If a function declares zero arg literal lines, the engine
prepends a literal of the function's name. So:
…matches greet <any> — the function name becomes the leading token.
As soon as you add any arg literal, you own the entire shape. This
is how you build operator-style functions:
function assign
arg capture var ident
arg literal "="
arg capture value any
write `${var} = ${value}
`
end
…matches <ident> = <any>. No leading assign token in source.
bare — opt out of auto-name-prepend¶
If you want a function whose args are all captures AND you DON'T want
a leading keyword, declare it bare. The function then matches
purely by shape — useful for grammars whose data lines have no
syntactic anchor:
function row
bare
arg capture a string
arg capture b string
arg capture c string
write `<button>${unquote a}</button><button>${unquote b}</button><button>${unquote c}</button>
`
end
Source: "1" "2" "3" parses as one call to row with three
captures — no row keyword needed.
Use sparingly: a function declared bare will try to match every
statement that starts with a capture-compatible token, so prioritise
specific shapes ahead of it if there's ambiguity.
Built-in types¶
any, ident, raw, string, int, float, bool.
| Type | What it captures |
|---|---|
any |
Any single value expression (number, string, ident, list, object, bool, null, dotted ident, parenthesized sub-call). |
ident |
A single identifier token; bound as a string. |
raw |
One identifier OR string token; bound as a string. |
tail |
Every remaining token on the statement, reconstructed with original column-position spacing. Use for free-form trailing values (e.g. CSS 20px, 1px solid red) that don't fit any's single-value-expression grammar. |
string |
A quoted string literal — OR a bare identifier (transpile-mode permissive). |
int/float/bool |
The respective literal — OR a bare identifier. |
Bare identifiers always pass primitive type checks because at the target language's runtime they could refer to a value of any type.
See types.md for library-defined types with
pattern/options.
write — what goes into the body¶
write EXPR appends EXPR to the function's output. EXPR is most
commonly a backtick literal with ${EXPR} interpolations:
Inside a backtick string you can use:
${name}— a capture by name.${body}— the inner block's rendered output (block functions only).${context.X}— the read-only accumulated context.${func arg arg}— call a template helper inline. The same helpers that work intext/template(indent,pascalCase,toQuoted,toJSON,lower,upper,join,split,unescape, …) are all available — see templates.md.
The bytes inside the backticks are emitted verbatim. There is no whitespace-trimming sigil — if you don't want a trailing newline, don't put one in.
template … end — write-literal sugar¶
For functions whose output is a multi-line HTML / text block, the
backtick bookkeeping (write \ …, closing \ on its own line, escape
hatches for embedded backticks) crowds out the actual content. The
template … end sugar drops the ceremony:
function card
arg capture title string
block_closer end
template
<div class="card">
<h3>${escapeHtml (decoded title)}</h3>
<div>${indent 2 body}</div>
</div>
end
end
Semantically identical to the hand-authored equivalent:
function card
arg capture title string
block_closer end
write `<div class="card">
<h3>${escapeHtml (decoded title)}</h3>
<div>${indent 2 body}</div>
</div>
`
end
Properties:
- Where it's valid: anywhere a
writestatement is — function bodies,file_template, andfile "X" … endblocks. Mixes freely with other statements (set/append/if/for). - Opener: a bare
templateline, no trailing args. - Closer: a bare
endat the same indent astemplate. Lines whoseendis at a deeper indent are body content. Nestedtemplate … endat the same indent are balanced via depth. - Body: captured verbatim, then auto-dedented to flush-left (the smallest non-blank leading indent is stripped). Relative indentation between body lines is preserved.
${…}interpolation is active — same parser as in a backtickwrite. This is the only thing that distinguishes the sugar fromblock_verbatim, which has interpolation OFF.- Empty body is allowed (
template\n…blanks…\nendproduces empty output, no error). - Backticks and backslashes inside the body are escaped
automatically for the synthesised backtick literal — paste HTML
with embedded
`characters worry-free.
Implementation note (for the curious): the sugar is rewritten into
a synthesised write … BEFORE the inner-DSL parser runs.
There's no new AST node and the renderer doesn't need to know the
sugar exists. So everything that works with a hand-authored
multi-line write works with template … end.
State mutation — set, append, prepend, merge, delete¶
A small inner DSL. Does not execute user source. It only mutates
the context map.
Statements sit directly in the function body alongside write calls.
The renderer ignores state-mutation statements during the render pass;
the run pass ignores write. Both walk the same AST so control flow
(if / for) that contains both kinds of statement behaves
consistently.
Operations available:
| Form | Effect |
|---|---|
set <path> <value> |
Set a field at a dotted/indexed path. |
append <list-path> <value> |
Append to a list. |
prepend <list-path> <value> |
Prepend to a list. |
merge <map-path> <map-value> |
Shallow-merge into a map. |
delete <path> |
Remove a field/key. |
if <expr> ... end |
Library-side conditional update. |
loop <var> in <expr> ... end |
Library-side iteration. |
(regex_match value pattern) |
Boolean expression value. |
(env "X"), (arg N), (read_file "P") |
Host capabilities. |
(os), (arch) |
Branch on host platform. |
error "<message>" |
Abort transpilation with a message. |
See inner-dsl.md for full details and examples.
Block functions¶
A function that opens a body block declares it in one of three modes —
exactly one of block_closer, block_open/close,
block_dedent, or block_verbatim:
| Directive | Body delimited by | Body parsed? | Use for |
|---|---|---|---|
block_closer NAME |
An indented body that ends at a NAME keyword |
Yes — as nested statements | Most cases — if … end, for … end, match … end. |
block_open "X" close "Y" |
A X-prefixed, Y-suffixed delimited body on the same line as the opener (newlines inside OK) |
Yes | Brace-style nesting, JSX-ish DSLs. |
block_dedent |
An indented body that ends at the first DEDENT — no closer keyword | Yes | CSS-style selector blocks, YAML-style sections. |
block_verbatim NAME |
An indented body that ends at a NAME keyword |
No — captured as raw source bytes | Code blocks, embedded HTML / SVG, anywhere the body is data not grammar. |
Named-closer example:
function if
arg literal "if"
arg capture cond any
block_closer end
write `if ${cond}:
${indent 4 body}
`
end
function end
end
Or with explicit delimiters:
function for
arg literal "for"
arg capture v ident
arg literal "in"
arg capture i any
block_open "{"
block_close "}"
write `for ${v} in ${i} {
${indent 2 body}
}
`
end
Dedent-only block (no keyword needed to close):
function selector
arg capture name ident
arg literal ":"
block_dedent
write `${name} {
${indent 2 body}}
`
end
Source:
Each selector's body is whatever is indented under it, up to the next peer line or the block's enclosing dedent. The parser is also lenient about stray indents inside a body — content nested deeper than its block's anchor (purely for visual styling) is treated as cosmetic, so hand-written DSL sources don't have to be pedantic about consistent indentation.
Verbatim block — captures the body as raw source bytes (NO nested
parsing). Use for code blocks, embedded HTML, or anywhere the body
is data not grammar. Pair with ${escapeHtml body} (see
templates.md) for XSS-safe emission:
function pre
arg capture lang ident
block_verbatim end
write `<pre><code class="language-${lang}">${escapeHtml body}</code></pre>
`
end
Source:
…renders the Go code with its quotes and angle-brackets HTML-escaped
inside a <pre><code> block, indentation preserved verbatim. Every
line between pre go and end is captured byte-for-byte; the user
can paste arbitrary code without it being re-parsed as Capy.
The capture is the literal source byte range, so blank lines and
comment-marker (#) lines are preserved exactly — a markdown …
end block with ## headings and blank paragraphs round-trips
untouched.
Inline syntax via group_open / group_close¶
For Markdown-style or LaTeX-style inline syntax ([label](url),
**bold**, ~~strike~~), declare a type that opens and closes on
specific delimiters:
type Bracketed
group_open "["
group_close "]"
end
type Parens
group_open "("
group_close ")"
end
function link
arg literal "link"
arg capture text Bracketed
arg capture url Parens
write `<a href="${escapeHtml url}">${escapeHtml text}</a>
`
end
Source link [Al the Alien](https://alien.com/1) matches link with
text = "Al the Alien" and url = "https://alien.com/1". See
types.md for the full reference including
nested groups and multi-char delimiters.
See block-functions.md for nesting and edge cases.
context — initial schema¶
Whatever fields your function bodies will manipulate (via set/
append/etc.). Lists default to
[], maps to {}. The context is rendered into the file template as
.context.
file_template — final assembly¶
Receives body (concatenation of all top-level statements' written
output) and context (final accumulated state). Common patterns:
# Python-style: imports at top, then body.
file_template
for imp in context.imports
write `import ${imp}
`
end
write body
end
priority¶
When two functions could match a prefix, the higher priority wins;
ties fall back to "more leading literals wins". You rarely need to set
this explicitly — the default (0) plus the literal-leading bias
handles most cases.
Validation¶
capy check lib.capy parses + validates a library without running any
source. It reports load-time errors with the offending function and
arg index.
Suggested authoring loop¶
capy init my-lib
cd my-lib
# edit lib.capy + script.capy in your editor
capy run lib.capy script.capy # see output
capy check lib.capy # validate
capy docs lib.capy > REFERENCE.md # regenerate reference docs
When the library stabilises, write a few sample scripts under
examples/ so behaviour stays pinned as you iterate.