Language Reference
This page describes the complete surface grammar that Capy ships with. Everything user-facing is defined by libraries; this document covers only the fixed lexical and syntactic structure that all libraries share.
Tokens
The lexer produces the following token kinds. The engine reserves no keywords: words like `if`, `loop`, `end`, `true`, `false`, and `null` are ordinary identifier tokens, and their meaning depends on whether a library function or the value parser treats them specially.
| Token | Description |
|---|---|
| `IDENT` | A word: ASCII letters, digits, underscores, and any non-ASCII rune (accented Latin, CJK, emoji, em-dash, smart quotes). Must not start with a digit. |
| `NUMBER` | An integer or float literal. May be negative. |
| `STRING` | `"..."` or `'...'`. Both support `${expr}` interpolation at eval time. |
| `TEMPLATE` | `` `...` ``. Multi-line. Same interpolation rules as `STRING`. |
| `PUNCT` | A run of `= < > ! + - * / % & \| ^ ~ ? : , . ; @ $ # \`. Lexed greedily. |
| `LPAREN`/`RPAREN`/`LBRACE`/`RBRACE`/`LBRACK`/`RBRACK` | `(` `)` `{` `}` `[` `]` |
| `NEWLINE` | End of a logical line. |
| `INDENT`/`DEDENT` | Indent-level change at the start of a line (4 spaces or 1 tab per level). |
| `EOF` | End of input. |
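The IDENT row above is the rule most likely to surprise, because emoji and smart quotes count as word characters. Here is a minimal Go sketch of that rule as stated. The names `isIdentStart`, `isIdentPart`, and `scanIdent` are hypothetical helpers, not part of Capy's API.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// isIdentStart reports whether r may begin an IDENT token: an ASCII letter,
// an underscore, or any non-ASCII rune (accented Latin, CJK, emoji, ...).
func isIdentStart(r rune) bool {
	return r == '_' ||
		(r >= 'a' && r <= 'z') ||
		(r >= 'A' && r <= 'Z') ||
		r >= utf8.RuneSelf
}

// isIdentPart reports whether r may continue an IDENT token: anything
// isIdentStart accepts, plus ASCII digits.
func isIdentPart(r rune) bool {
	return isIdentStart(r) || (r >= '0' && r <= '9')
}

// scanIdent returns the longest IDENT prefix of s, or "" when s does not
// start with an identifier character (for example, when it starts with a digit).
func scanIdent(s string) string {
	for i, r := range s {
		if i == 0 && !isIdentStart(r) {
			return ""
		}
		if !isIdentPart(r) {
			return s[:i]
		}
	}
	return s
}

func main() {
	for _, in := range []string{"café = 1", "名前 x", "2fast", "x_1+y"} {
		fmt.Printf("%q -> %q\n", in, scanIdent(in))
	}
}
```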
Multi-character operators emerge naturally from the greedy PUNCT lexer: `==`, `!=`, `<=`, `>=`, `:=`, `->`, `=>`, `|>`, and so on are each a single PUNCT token. A library pattern matches the token's full text, so write `{ kind: literal, value: ":=" }` to match `:=`.
Comments
`#` starts a comment that runs to the end of the line. Comment-only lines do not produce INDENT/DEDENT changes.
Strings
- `"..."` and `'...'` are single-line. A backslash escapes the next character (`\n`, `\t`, `\"`, `\\`, `\xNN`, `\uNNNN`).
- `` `...` `` (backtick / template) is multi-line, both inside library `write` blocks and inside user scripts. Newlines between the opening and closing backtick become literal newlines in the captured value. Combine with `${decoded text}` (see templates.md) to recover the user-intended form.
- Inside any string, `${expr}` is interpolated at eval time. `expr` may be a dotted identifier path; future versions will support full expressions.
- For type-checking purposes, the capture text is the source representation including its quotes: `set_email "alice@example.com"` produces a capture text of `"alice@example.com"` (with quotes), and validation strips the quotes before applying patterns.
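A minimal Go sketch of `${expr}` interpolation for the dotted-path form follows. It assumes values live in nested `map[string]any` maps. `interpolate` and `lookup` are hypothetical helpers. Leaving an unresolved placeholder untouched is also an assumption; the engine may report an error instead.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// placeholder matches ${path} where path is a dotted identifier path such as
// user.address.city. Only ASCII identifiers are handled in this sketch.
var placeholder = regexp.MustCompile(`\$\{([A-Za-z_][A-Za-z0-9_]*(?:\.[A-Za-z_][A-Za-z0-9_]*)*)\}`)

// lookup walks a dotted path through nested maps.
func lookup(env map[string]any, path string) (any, bool) {
	var cur any = env
	for _, part := range strings.Split(path, ".") {
		m, ok := cur.(map[string]any)
		if !ok {
			return nil, false
		}
		if cur, ok = m[part]; !ok {
			return nil, false
		}
	}
	return cur, true
}

// interpolate replaces each ${path} in s with the value found in env.
// Unresolved paths are left as written (an assumption of this sketch).
func interpolate(s string, env map[string]any) string {
	return placeholder.ReplaceAllStringFunc(s, func(m string) string {
		path := placeholder.FindStringSubmatch(m)[1]
		if v, ok := lookup(env, path); ok {
			return fmt.Sprint(v)
		}
		return m
	})
}

func main() {
	env := map[string]any{
		"name": "Alice",
		"user": map[string]any{"address": map[string]any{"city": "Oslo"}},
	}
	fmt.Println(interpolate("Hi ${name} from ${user.address.city}! ${missing}", env))
}
```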
Indentation
- 4 spaces or 1 tab per level.
- Indentation is checked only at the start of a logical line and only when the bracket level is 0. Lines inside `( )`, `[ ]`, and `{ }` do not produce INDENT/DEDENT tokens, but they DO produce NEWLINE tokens (value parsers skip them).
Statements
A statement is a sequence of tokens terminated by NEWLINE (or by end-of-file, or by `}` if the statement is inside a delimiter-mode block body).
At each statement boundary, the parser tries each library function's compiled Elements in priority order. The first complete match wins, where "complete" means every element has been consumed and the next token is NEWLINE, EOF, or `}`.
Block bodies
A function may declare itself a block opener via `block:`:
- `block: { closer: <function-name> }`: the body is delimited by INDENT/DEDENT; after the DEDENT, the named closer function must match.
- `block: { open: "{", close: "}" }`: the body is delimited by the named tokens. The `}` (or whichever close token is configured) ends the body; no closer function is involved.
Block bodies nest: a block body can itself contain further blocks.
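For delimiter-mode bodies, nesting means the parser must find the matching close token rather than the first one. Here is a minimal Go sketch, assuming nested blocks reuse the same open/close pair. `bodyEnd` is a hypothetical helper that returns the index of the close token ending the body. Closer-mode (INDENT/DEDENT) bodies are not covered.

```go
package main

import "fmt"

// bodyEnd returns the index of the close token that ends the block body whose
// open token sits at toks[start]. Nested blocks are skipped by tracking depth,
// assuming they use the same open/close pair.
func bodyEnd(toks []string, start int, openTok, closeTok string) (int, error) {
	if start < 0 || start >= len(toks) || toks[start] != openTok {
		return -1, fmt.Errorf("expected %q at token %d", openTok, start)
	}
	depth := 0
	for i := start; i < len(toks); i++ {
		switch toks[i] {
		case openTok:
			depth++
		case closeTok:
			depth--
			if depth == 0 {
				return i, nil
			}
		}
	}
	return -1, fmt.Errorf("unterminated block: missing %q", closeTok)
}

func main() {
	toks := []string{"each", "x", "{", "if", "x", "{", "say", "x", "}", "}", "done"}
	end, err := bodyEnd(toks, 2, "{", "}")
	if err != nil {
		panic(err)
	}
	fmt.Println(toks[3:end]) // the outer body, including the nested block
}
```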
Values (within an any-typed capture)
When a function captures `<x:any>`, the parser consumes one of:
| Literal kind | Examples |
|---|---|
| number | `42`, `-3.14` |
| string | `"foo"`, `'bar'`, `` `tpl` `` |
| bool | `true`, `false` |
| null | `null` |
| ident path | `x`, `user.address.city` |
| paren sub-call | `(str "hi" name)` |
| list literal | `[1, 2, 3]` |
| object literal | `{"k": "v", name: "Alice"}` (keys may be strings or identifiers) |
| comparison / negation | `a == b`, `a < b`, `not flag` |
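Here is a Go sketch that classifies the single-token rows of the table above. `literalKind` is hypothetical. The identifier pattern is ASCII-only for brevity, although real IDENTs also accept non-ASCII runes. Multi-token values such as paren sub-calls, lists, objects, and comparisons are out of scope. The identifier check runs before `strconv.ParseFloat`, because `ParseFloat` also accepts words like `Inf` and `NaN`.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// identPath matches x or user.address.city (ASCII-only in this sketch).
var identPath = regexp.MustCompile(`^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)*$`)

// literalKind classifies a single-token value into one of the table's kinds.
func literalKind(tok string) string {
	switch {
	case tok == "true" || tok == "false":
		return "bool"
	case tok == "null":
		return "null"
	case len(tok) >= 2 && (tok[0] == '"' || tok[0] == '\'' || tok[0] == '`') && tok[len(tok)-1] == tok[0]:
		return "string"
	case identPath.MatchString(tok):
		// Checked before ParseFloat, which would also accept "Inf" or "NaN".
		return "ident path"
	}
	if _, err := strconv.ParseFloat(tok, 64); err == nil {
		return "number"
	}
	return "unknown"
}

func main() {
	for _, tok := range []string{"42", "-3.14", `'bar'`, "true", "null", "user.address.city", "NaN", "[1,"} {
		fmt.Printf("%-18s %s\n", tok, literalKind(tok))
	}
}
```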
Multi-token arithmetic expressions like `4 + 5` are NOT parsed as a single expression. They are parsed as two separate primitives, with the operator matched as a literal token in the pattern. A library that wants `x = 4 + 5` defines:
```yaml
assign_add:
  args:
    - { kind: capture, name: var, type: ident }
    - { kind: literal, value: "=" }
    - { kind: capture, name: a, type: any }
    - { kind: literal, value: "+" }
    - { kind: capture, name: b, type: any }
```
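With `assign_add` defined, the first-complete-match rule from the Statements section can be sketched in Go. `Element`, `Function`, `matchStatement`, and the plain `assign` pattern are hypothetical simplifications. Tokens are plain strings, slice order stands in for priority order, and an `any` capture consumes exactly one token. For `x = 4 + 5`, `assign` matches `x = 4`, but it is rejected because the next token is `+` rather than a terminator, so `assign_add` wins.

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf8"
)

// Element is a simplified pattern element: a literal (Capture == "") matched
// against the token text, or a named capture of type "ident" or "any".
type Element struct {
	Literal string
	Capture string
	Type    string
}

// Function is a library function reduced to its name and compiled elements.
type Function struct {
	Name string
	Args []Element
}

func isTerminator(tok string) bool {
	return tok == "NEWLINE" || tok == "EOF" || tok == "}"
}

func isIdentToken(tok string) bool {
	r, _ := utf8.DecodeRuneInString(tok)
	return r == '_' || unicode.IsLetter(r) || r >= utf8.RuneSelf
}

// matchStatement returns the first function (in slice order, standing in for
// priority order) whose elements all match and which is followed by a terminator.
func matchStatement(fns []Function, toks []string) (string, map[string]string, bool) {
	for _, fn := range fns {
		caps := map[string]string{}
		i, ok := 0, true
		for _, el := range fn.Args {
			if i >= len(toks) {
				ok = false
				break
			}
			tok := toks[i]
			switch {
			case el.Capture == "":
				ok = tok == el.Literal
			case isTerminator(tok), el.Type == "ident" && !isIdentToken(tok):
				ok = false
			default:
				caps[el.Capture] = tok // an "any" capture takes exactly one token here
			}
			if !ok {
				break
			}
			i++
		}
		if ok && i < len(toks) && isTerminator(toks[i]) {
			return fn.Name, caps, true
		}
	}
	return "", nil, false
}

// assignFns models a plain assignment followed by the assign_add pattern.
func assignFns() []Function {
	v := Element{Capture: "var", Type: "ident"}
	eq := Element{Literal: "="}
	return []Function{
		{Name: "assign", Args: []Element{v, eq, {Capture: "a", Type: "any"}}},
		{Name: "assign_add", Args: []Element{v, eq, {Capture: "a", Type: "any"}, {Literal: "+"}, {Capture: "b", Type: "any"}}},
	}
}

func main() {
	name, caps, ok := matchStatement(assignFns(), []string{"x", "=", "4", "+", "5", "NEWLINE"})
	fmt.Println(name, caps, ok)
}
```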
Object literals
Keys may be either quoted strings (`"name"`) or bare identifiers (`name`), and the two forms can be mixed in one literal, as in `{"k": "v", name: "Alice"}`.
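Here is a Go sketch of key normalization. It assumes that a quoted key and the same bare identifier name the same entry. `objectKey` is hypothetical. Double-quoted keys are decoded with `strconv.Unquote`, whose escape rules only approximate Capy's.

```go
package main

import (
	"fmt"
	"strconv"
)

// objectKey turns an object-literal key token into the map key it names.
// Double-quoted keys are unquoted with strconv.Unquote (Go's escape rules,
// which only approximate Capy's); single-quoted keys just lose their quotes;
// bare identifiers are used as written.
func objectKey(tok string) (string, error) {
	n := len(tok)
	switch {
	case n >= 2 && tok[0] == '"':
		return strconv.Unquote(tok)
	case n >= 2 && tok[0] == '\'' && tok[n-1] == '\'':
		return tok[1 : n-1], nil
	default:
		return tok, nil
	}
}

func main() {
	// {"k": "v", name: "Alice"} after its values have been evaluated.
	pairs := [][2]string{{`"k"`, "v"}, {"name", "Alice"}}
	obj := map[string]string{}
	for _, p := range pairs {
		k, err := objectKey(p[0])
		if err != nil {
			panic(err)
		}
		obj[k] = p[1]
	}
	fmt.Println(obj)
}
```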
Captures: source-text vs evaluated values
Every capture has two faces:
- In `write` literals, captures resolve to source text. `if x > 0` exposes `cond` as the literal `x > 0`, so a Python emitter can write `if ${cond}:`.
- In state-mutation statements, captures resolve to evaluated values. `say "hello"` exposes `msg` as the Go string `hello` (no quotes), so `append context.greetings msg` stores the raw value.
This dual model lets one capture serve both render-by-text (templates) and structured accumulation (context) without needing to convert.
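Here is a Go sketch of the two faces. It uses a hypothetical `Capture` struct that holds both the source text and the evaluated value; the field names are assumptions, not the engine's. For brevity, only double-quoted string captures are built, via `strconv.Unquote`. A capture like `cond` in `if x > 0` would carry `x > 0` as its source text in the same way.

```go
package main

import (
	"fmt"
	"strconv"
)

// Capture keeps both faces of a capture: the exact source text and the
// evaluated value. Field names are hypothetical.
type Capture struct {
	Source string // text as written, e.g. `"hello"` or `x > 0`
	Value  any    // evaluated value, e.g. the Go string hello
}

// captureString builds a Capture for a double-quoted string token.
func captureString(tok string) (Capture, error) {
	v, err := strconv.Unquote(tok)
	if err != nil {
		return Capture{}, err
	}
	return Capture{Source: tok, Value: v}, nil
}

func main() {
	msg, err := captureString(`"hello"`)
	if err != nil {
		panic(err)
	}
	// A write literal renders the source text, quotes included...
	fmt.Printf("print(%s)\n", msg.Source)
	// ...while a state mutation stores the evaluated value.
	greetings := []any{}
	greetings = append(greetings, msg.Value)
	fmt.Println(greetings)
}
```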
Error format
Engine errors carry a line and column, which the CLI uses when it renders them.
When you embed Capy as a library, errors are `*domain.CapyError` values with `Line`, `Col`, and `Msg` fields; use `domain.FormatWithSource(err, source)` to get the rendered form.