70 changes: 59 additions & 11 deletions _plans/ffm-better-strings.md
@@ -2,13 +2,23 @@

## Overview

-Double-quoted strings (e.g. `"hello"`) are compiler sugar for a **quote that pushes character codes**. They are first-class values — a single pointer on the stack, like any other quote.
**Double-quoted** strings (e.g. `"hello"`) are compiler sugar for a **quote that contains a single single-quoted string literal**: the same characters appear inside `[ ... ]` as one `'...'` token.

**Single-quoted** strings (e.g. `'hi'`) are themselves sugar for **pushing each character as its own literal** (character codes / integers) in sequence.

Chaining those rules, all of the following are equivalent (same quote body after full desugar):

```
-"hi" ≡ [ 'h' 'i' ] ≡ [ 104 105 ]
"hi" ≡ [ 'hi' ] ≡ [ 'h' 'i' ] ≡ [ 104 105 ]
```

-The empty string is `0` — the same `0` that already means NOP/nil throughout the language. This gives strings a natural null terminator with no new machinery.
The **first** step for double quotes is still syntactic: `"hi"` becomes `[ 'hi' ]` — one string token inside the brackets, not two tokens `'h'` `'i'` at that stage. The equivalence to `[ 'h' 'i' ]` and `[ 104 105 ]` follows from how **single**-quoted strings desugar.

**Escaping** rules are unchanged between single- and double-quoted forms: whatever escapes apply inside `'...'` apply inside the text of `"..."` as well (the lexer/parser treats the payload the same; only the outer delimiters differ).

**No `0` on the double-quoted path:** desugaring `"..."` to `[ '...' ]` (and then to per-character pushes) does **not** add a `0` prefix or suffix. That is distinct from manually building a **nil-terminated cons chain** with `0 ... swons`, where `0` is the tail of the list — see Internal Representation.

The empty string: `""` ≡ `[ '' ]`, which desugars to an empty quote body `[]` (no pushes). Separately, **`0`** remains the language’s NOP/nil and the **terminator** of cons-chain string values when constructed with `cons`/`swons`; it is not inserted by the double-quote sugar itself.

---

@@ -20,15 +30,17 @@ Strings are built from **cons cells**, directly analogous to Lisp. Each cons cel
x y cons → ptr, body: [ PUSH x, CALL y ]
```

-A string is a linked chain of cons cells terminating at `0`:
A **cons-chain string value** (what you hold on the stack after `0 ... swons`) is a linked list of cons cells terminating at `0`:

```
-"hi" → ptrH, body: [ PUSH 104, CALL ptrI ]
-ptrI, body: [ PUSH 105, CALL 0 ]
ptrH, body: [ PUSH 104, CALL ptrI ]
ptrI, body: [ PUSH 105, CALL 0 ]
```

Eval'ing the head pointer walks the chain, pushing each character code in order.

That shape is **not** the same token sequence as `"hi"` or `[ 'hi' ]` / `[ 104 105 ]`. The latter are **quotes** whose body (after desugar) is a flat sequence of pushes. Building a cons chain is still done manually (or by library words) with `0` and `cons`/`swons`. Double-quoted sugar never prepends or appends `0` to the quote body.

---

## Primitives (opcodes)
@@ -63,16 +75,52 @@ The plan originally called this `concat`, but the implementation uses `compose`

## Compiler Sugar

### Double quotes → one single-quoted string inside a quote

```
"hello" ≡ [ 'hello' ]
```

No `0` is added before or after the content when applying this rule.

### Single quotes → one push per character

A single-quoted string literal desugars to the same sequence of single-character literals (and thus to integer pushes) inside a quote:

```
-"hello" → desugared at compile time to 0 'o' swons 'l' swons 'l' swons 'e' swons 'h' swons
'hello' ≡ 'h' 'e' 'l' 'l' 'o' /* inside a quote body */
[ 'hello' ] ≡ [ 'h' 'e' 'l' 'l' 'o' ] ≡ [ 104 101 108 108 111 ] /* example codes */
```

-So eval'ing the result pushes characters left to order: `h`, `e`, `l`, `l`, `o`.
**Escaping:** the escape grammar for `'...'` and `"..."` stays the same; only the delimiters differ at the first double-quote step.

### Combined equivalence (example)

```
"hi" ≡ [ 'hi' ] ≡ [ 'h' 'i' ] ≡ [ 104 105 ]
```

The compiler may fuse steps internally (e.g. emit numeric pushes directly) as long as the result matches the above.

### Cons-chain construction (unchanged, not the same as `"..."` desugar)

Manual nil-terminated string **values** are still built with `0` and `swons`, for example:

```
0 'o' swons 'l' swons 'l' swons 'e' swons 'h' swons
```

That produces a **single** cons-chain pointer on the stack, not the quote `[ 'hello' ]`. Library words such as `sprint` expect evaluable quote bodies or cons chains per existing conventions.

**Status: ✅ Implemented** — **TypeScript core** (Node/Bun/Deno/web), **Go** (`go/src/compiler/compiler.go`), **Racket** (lexer/parser/compiler/runner/expander + `racket/private/unescape.rkt`), **Python** (`python/execute.py`: `run()` prepends `[`, integer char codes, `]` when it sees a `"..."` token — tokenizer leaves the token intact), **Ruby** (`ruby/execute.rb`: same in `run`), **Dart** (`dart/bin/dart.dart`: `ev()` prepends `[`, UTF-16 code unit strings, `]` when dequeuing a `"..."` token; `tokenize()` splits on whitespace only). In TS/Go/Racket the IR is BRA + pushes + KET; Python/Ruby/Dart feed the VM the equivalent queue sequence (`[` … codes … `]`).

Equivalent spellings and manual cons-chain construction:

-**Status: ❌ NOT IMPLEMENTED** - No implementation currently desugars `"..."` syntax.
-Users must manually construct strings using `0` and `cons`/`swons`:
```
-0 'i' swons 'h' swons /* creates "hi" */
[ 'hi' ]
[ 'h' 'i' ]
[ 104 105 ]
0 'i' swons 'h' swons /* cons-chain "hi", includes terminating 0 */
```

---
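The plan distinguishes two shapes that evaluate to the same character codes. The Python sketch below models both under simplifying assumptions: escapes are ignored (the real lexers unescape first), tokens are plain Python values, and `desugar_double`, `desugar_single`, `full_desugar`, `swons` and `walk` are hypothetical helpers, not code from any implementation in this PR. It shows `"hi"` desugaring to a flat quote body with no `0`, while `0 'i' swons 'h' swons` builds nested cells ending in `0`.

```python
from typing import List, Union

Token = Union[str, int]

def desugar_double(tok: str) -> List[Token]:
    """Step 1: "..." becomes [ '...' ]; no 0 is added before or after."""
    if len(tok) < 2 or tok[0] != '"' or tok[-1] != '"':
        raise ValueError(f"not a double-quoted token: {tok!r}")
    return ["[", "'" + tok[1:-1] + "'", "]"]

def desugar_single(tok: Token) -> List[Token]:
    """Step 2: '...' becomes one integer push per character; other tokens pass through."""
    if isinstance(tok, str) and len(tok) >= 2 and tok[0] == tok[-1] == "'":
        return [ord(c) for c in tok[1:-1]]
    return [tok]

def full_desugar(tok: str) -> List[Token]:
    """Chain both steps: "hi" -> [ 'hi' ] -> [ 104 105 ]."""
    out: List[Token] = []
    for t in desugar_double(tok):
        out.extend(desugar_single(t))
    return out

# Hypothetical model of a cons cell: (head, tail); the integer 0 ends a chain.
def swons(tail, head):
    """Models `tail head swons` (swap, then cons): body is [ PUSH head, CALL tail ]."""
    return (head, tail)

def walk(chain) -> List[int]:
    """Models eval'ing the head pointer: push each head until reaching 0."""
    out: List[int] = []
    while chain != 0:
        head, chain = chain
        out.append(head)
    return out

quote = full_desugar('"hi"')                  # flat quote body, no terminator
chain = swons(swons(0, ord("i")), ord("h"))   # 0 'i' swons 'h' swons
print(quote)        # ['[', 104, 105, ']']
print(chain)        # (104, (105, 0))
print(walk(chain))  # [104, 105]
```

Both shapes yield 104 then 105 when evaluated; only the cons chain carries the `0` terminator, which is why the plan keeps `0 ... swons` separate from the double-quote sugar.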
19 changes: 16 additions & 3 deletions dart/bin/dart.dart
@@ -383,10 +383,12 @@ void callOp(BigInt code) {
}
}

/// Whitespace-only split (no string or comment awareness).
List<String> tokenize(String s) {
-return s.split(RegExp(r"\s+"))
-.where((ss) => ss.trim() != '')
-.toList();
return s
.split(RegExp(r'\s+'))
.where((ss) => ss.trim().isNotEmpty)
.toList();
}

void ev() {
@@ -408,6 +410,17 @@ void ev() {
var chars = unescapeQuotedString(text.substring(1, end)).split('');
var asc = chars.map((c) => BigInt.from(c.codeUnitAt(0))).toList();
stack.addAll(asc);
} else if (text.length > 1 &&
text.startsWith('"') &&
text.endsWith('"')) {
// Sugar: "..." is [ '...' ] — prepend '[', char code tokens, ']'
final inner = text.substring(1, text.length - 1);
final expanded = <Object>[
'[',
...unescapeQuotedString(inner).codeUnits.map((u) => u.toString()),
']',
];
pushFrontQueueAll(expanded);
} else if (symbols[text.toLowerCase()] != null) {
var code = getSymbol(text);
callOp(code);
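The Dart `tokenize` above (like `tokenize` in `python/execute.py`) splits on whitespace only, so a double-quoted literal survives as one token only if it contains no literal whitespace. As a rough Python model: `str.split()` with no arguments also splits on runs of whitespace and drops empty pieces, though its whitespace set may differ from Dart's `\s` at the edges of Unicode. `is_dq_sugar` is a hypothetical helper mirroring the `startsWith('"') && endsWith('"')` check in `ev()`.

```python
from typing import List

def tokenize(text: str) -> List[str]:
    # Whitespace-only split with no quote or comment awareness; empty pieces are dropped.
    return text.split()

def is_dq_sugar(tok: str) -> bool:
    # Mirrors the Dart ev() / Python run() check: longer than 1, starts and ends with '"'.
    return len(tok) > 1 and tok.startswith('"') and tok.endswith('"')

print(tokenize('a \t "x" \n b'))        # ['a', '"x"', 'b']
split = tokenize('"a b" eval')
print(split)                            # ['"a', 'b"', 'eval']
print([is_dq_sugar(t) for t in split])  # [False, False, False]
```

By reading the diff, Go's compiler branch checks only `strings.HasPrefix(element, "\"")`, so if Go's tokenizer also splits on whitespace, a stray `"a` token would be treated as sugar there but not by the Dart or Python runners. That divergence is inferred from the code, not verified here.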
41 changes: 41 additions & 0 deletions dart/tool/check_double_quote.dart
@@ -0,0 +1,41 @@
// Run: cd dart && dart run tool/check_double_quote.dart
import 'dart:convert';
import 'dart:io';

import '../bin/dart.dart' as ff;

void main() async {
// Tokenizer: whitespace-only split; "..." stays one token
final hiTok = ff.tokenize('"hi"');
if (hiTok.length != 1 || hiTok[0] != '"hi"') {
stderr.writeln('tokenize expected one token for "\\"hi\\"", got $hiTok');
exit(1);
}
final multi = ff.tokenize('a \t "x" \n b');
if (multi.length != 3 || multi[0] != 'a' || multi[1] != '"x"' || multi[2] != 'b') {
stderr.writeln('whitespace tokenize broken, got $multi');
exit(1);
}

final pkgRoot = File(Platform.script.toFilePath()).parent.parent.path;
final p = await Process.start(
'dart',
['run', 'bin/dart.dart'],
workingDirectory: pkgRoot,
);
p.stdin.add(utf8.encode('"hi" eval dup putn swap putn\n'));
await p.stdin.close();
final out = (await p.stdout.transform(utf8.decoder).join()).trim();
final err = (await p.stderr.transform(utf8.decoder).join()).trim();
final code = await p.exitCode;
if (code != 0) {
stderr.writeln('dart run failed ($code): $err');
exit(1);
}
if (out != '105104') {
stderr.writeln('expected stdout 105104, got ${jsonEncode(out)}');
exit(1);
}

stdout.writeln('ok');
}
48 changes: 48 additions & 0 deletions deno/src/double_quote_strings_test.ts
@@ -0,0 +1,48 @@
import { assertEquals } from "std/assert/mod.ts";

import { Compiler } from "./compiler.ts";
import { IROp } from "../../typescript/core/src/ir.ts";
import { OpCodes } from "../../typescript/core/src/opcodes.ts";

function irSnapshot(source: string) {
const ir = new Compiler().compileToIR(Compiler.tokenize(source), "test.ff");
return ir.map((i) => ({
op: i.op,
value: i.value,
}));
}

Deno.test('double-quoted string compiles like bracketed single-quoted', () => {
assertEquals(
irSnapshot('"hi"'),
irSnapshot("[ 'hi' ]"),
);
});

Deno.test("double-quoted string IR: BRA, character pushes, KET", () => {
const ir = new Compiler().compileToIR(Compiler.tokenize('"hi"'), "test.ff");
assertEquals(ir.length, 4);
assertEquals(ir[0]?.op, IROp.call);
assertEquals(ir[0]?.value, BigInt(OpCodes.BRA));
assertEquals(ir[1]?.op, IROp.push);
assertEquals(ir[1]?.value, 104n);
assertEquals(ir[2]?.op, IROp.push);
assertEquals(ir[2]?.value, 105n);
assertEquals(ir[3]?.op, IROp.call);
assertEquals(ir[3]?.value, BigInt(OpCodes.KET));
});

Deno.test('double-quoted empty string is empty quotation', () => {
const ir = new Compiler().compileToIR(Compiler.tokenize('""'), "test.ff");
assertEquals(ir.length, 2);
assertEquals(ir[0]?.op, IROp.call);
assertEquals(ir[0]?.value, BigInt(OpCodes.BRA));
assertEquals(ir[1]?.op, IROp.call);
assertEquals(ir[1]?.value, BigInt(OpCodes.KET));
});

Deno.test("double-quoted escapes match [ '...' ] form", () => {
assertEquals(irSnapshot('"\\n"'), irSnapshot("[ '\\n' ]"));
// \" in double quotes is ASCII 34; '\'' in single quotes is ASCII 39 — same rules, different payload.
assertEquals(irSnapshot('"\\""'), irSnapshot("[ 34 ]"));
});
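The Deno and Go tests assert that `"hi"` and `[ 'hi' ]` compile to identical IR (BRA, push 104, push 105, KET) and that `""` compiles to just BRA, KET. The toy compiler below reproduces that contract in Python. `compile_to_ir`, the `(op, value)` tuple shape, and the `OP_BRA`/`OP_KET` values are assumptions for illustration, not the real `IROp`/`OpCodes` definitions, and escapes and named words are not modelled.

```python
from typing import List, Tuple

# Placeholder opcode values for illustration only; the real BRA/KET opcodes differ.
OP_BRA, OP_KET = -1, -2

def compile_to_ir(tokens: List[str]) -> List[Tuple[str, int]]:
    """Toy compiler: brackets become calls, string literals become pushes."""
    ir: List[Tuple[str, int]] = []
    for tok in tokens:
        if tok == "[":
            ir.append(("call", OP_BRA))
        elif tok == "]":
            ir.append(("call", OP_KET))
        elif len(tok) > 1 and tok[0] == tok[-1] == '"':
            # "..." is sugar for [ '...' ]: BRA, one push per character, KET.
            ir.append(("call", OP_BRA))
            ir.extend(("push", ord(c)) for c in tok[1:-1])
            ir.append(("call", OP_KET))
        elif len(tok) > 1 and tok[0] == tok[-1] == "'":
            # '...' pushes each character code in order.
            ir.extend(("push", ord(c)) for c in tok[1:-1])
        else:
            # Only integer literals are modelled; named words are out of scope.
            ir.append(("push", int(tok)))
    return ir

dq = compile_to_ir('"hi"'.split())
bq = compile_to_ir("[ 'hi' ]".split())
print(dq == bq)               # True
print(dq)                     # [('call', -1), ('push', 104), ('push', 105), ('call', -2)]
print(compile_to_ir(['""']))  # [('call', -1), ('call', -2)]
```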
15 changes: 15 additions & 0 deletions ff/lib/string/__tests__/double-quoted-strings.test.ffp
@@ -0,0 +1,15 @@
.import ../string.ffp
.import ../../seq/seq.ffp
.import ../../tap.ffp

TAP-VERSION

'\0Double-quoted\sstrings' SUBTEST
"hi" [ 'hi' ] seq= OK
"hi" eval 'i' = swap 'h' = and OK
"" [ ] seq= OK
'\n' "\n" eval = OK
'\"' "\"" eval = OK
5 PLAN OK

1 PLAN
22 changes: 22 additions & 0 deletions go/src/compiler/compiler.go
@@ -231,6 +231,28 @@ func compileToIR(
}
} else if strings.HasPrefix(element, "[") && strings.HasSuffix(element, "]") {
push(getSymbol(element[1:len(element)-1]), element)
} else if element == "[" {
call(NewInt(OP_BRA), "[")
} else if element == "]" {
call(NewInt(OP_KET), "]")
} else if strings.HasPrefix(element, "\"") {
// Double-quoted: sugar for [ '...' ] — BRA, per-char pushes (same escapes as
// single-quoted), KET. No implicit 0.
l := 0
if strings.HasSuffix(element, "\"") && len(element) > 1 {
l++
}
s := convertEsc2Char(element[1 : len(element)-l])
call(NewInt(OP_BRA), "[")
for i := 0; i < len(s); i++ {
v := NewInt(int64(s[i]))
if i == 0 {
push(v, s)
} else {
push(v, "")
}
}
call(NewInt(OP_KET), "]")
} else if strings.HasPrefix(element, "'") {
l := 0
if strings.HasSuffix(element, "'") {
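The implementations turn characters into integers differently. Going by the code in this diff: Go indexes `s[i]`, which yields UTF-8 bytes (assuming `convertEsc2Char` returns a Go `string`); Dart uses `codeUnits`, which yields UTF-16 code units; Python's `ord(c)` and Racket's `char->integer` yield Unicode code points. For ASCII all three agree, but non-ASCII payloads would compile to different pushes. The helper names below are hypothetical; the sketch only reproduces the three encodings in Python.

```python
from typing import List

def utf8_bytes(s: str) -> List[int]:
    # What indexing a Go string with s[i] yields: one value per UTF-8 byte.
    return list(s.encode("utf-8"))

def utf16_code_units(s: str) -> List[int]:
    # What Dart's String.codeUnits yields: UTF-16 units (surrogate pairs above U+FFFF).
    data = s.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]

def code_points(s: str) -> List[int]:
    # What Python ord(c) per character and Racket char->integer yield.
    return [ord(c) for c in s]

samples = {"ascii": "hi", "latin": "\u00e9", "emoji": "\U0001F600"}
for name, s in samples.items():
    print(name, utf8_bytes(s), utf16_code_units(s), code_points(s))
# ascii [104, 105] [104, 105] [104, 105]
# latin [195, 169] [233] [233]
# emoji [240, 159, 152, 128] [55357, 56832] [128512]
```

If cross-implementation parity matters for non-ASCII text, an IR comparison test with a non-ASCII payload would surface the difference.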
31 changes: 31 additions & 0 deletions go/src/compiler/compiler_test.go
@@ -13,6 +13,37 @@ func resetCompilerStateForTest() {
code = -1
}

func TestCompileDoubleQuotedStringAsQuotation(t *testing.T) {
resetCompilerStateForTest()
Setup()

irDQ := CompileToIR(Tokenize(`"hi"`), "")
irBQ := CompileToIR(Tokenize("[ 'hi' ]"), "")
if len(irDQ) != len(irBQ) {
t.Fatalf("expected same IR length, got %d vs %d", len(irDQ), len(irBQ))
}
for i := range irDQ {
if irDQ[i].op != irBQ[i].op || irDQ[i].value.Cmp(irBQ[i].value) != 0 {
t.Fatalf("IR mismatch at %d: %+v vs %+v", i, irDQ[i], irBQ[i])
}
}
if len(irDQ) != 4 || irDQ[0].op != "call" || irDQ[0].value.Cmp(big.NewInt(OP_BRA)) != 0 ||
irDQ[3].op != "call" || irDQ[3].value.Cmp(big.NewInt(OP_KET)) != 0 {
t.Fatalf("expected BRA push push KET, got %+v", irDQ)
}
}

func TestCompileEmptyDoubleQuotedString(t *testing.T) {
resetCompilerStateForTest()
Setup()

ir := CompileToIR(Tokenize(`""`), "")
if len(ir) != 2 || ir[0].op != "call" || ir[0].value.Cmp(big.NewInt(OP_BRA)) != 0 ||
ir[1].op != "call" || ir[1].value.Cmp(big.NewInt(OP_KET)) != 0 {
t.Fatalf("expected empty quote BRA KET, got %+v", ir)
}
}

func TestCompileConsAsSystemWord(t *testing.T) {
resetCompilerStateForTest()
Setup()
7 changes: 5 additions & 2 deletions python/execute.py
@@ -310,8 +310,7 @@ def token(s):
return s

def tokenize(text):
-a = text.split()
-return list(map(token, a))
return list(map(token, text.split()))

def run():
global queue
@@ -321,6 +320,10 @@ def run():

if type(s) == int:
stack.append(s)
elif isinstance(s, str) and len(s) > 1 and s.startswith('"') and s.endswith('"'):
# Sugar: "..." is [ '...' ] — prepend '[', char codes, ']' to the queue.
inner = unescape(s[1:-1])
queue = ['['] + [ord(c) for c in inner] + [']'] + queue
elif s.startswith('.') and len(s) > 1:
continue
elif s.startswith('[') and s.endswith(']'):
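Unlike the TS/Go/Racket compilers, the Python and Dart runners expand `"..."` at run time by putting `[`, the character codes, and `]` back on the front of the work queue. The sketch below models that approach with a hypothetical mini runner: quotes are Python lists, and only `eval`, `dup`, `swap` and `putn` are implemented, with semantics (including `putn` printing without a newline) assumed from the tests in this PR. It also shows why those tests expect `105104`: after `eval` the stack is 104 105, `dup putn` prints 105, and `swap putn` prints 104.

```python
import io
from contextlib import redirect_stdout

def tokenize(text):
    # Whitespace-only split; integer-looking tokens become ints.
    return [int(t) if t.lstrip("-").isdigit() else t for t in text.split()]

def run(queue):
    """Toy runner that returns the final stack; quotes are modelled as Python lists."""
    queue = list(queue)
    stack, pending = [], []  # pending holds quotes still being collected
    while queue:
        tok = queue.pop(0)
        if isinstance(tok, str) and len(tok) > 1 and tok[0] == tok[-1] == '"':
            # Sugar: put '[', the character codes, and ']' back on the queue front.
            queue = ["["] + [ord(c) for c in tok[1:-1]] + ["]"] + queue
        elif tok == "[":
            pending.append([])
        elif tok == "]":
            quote = pending.pop()
            (pending[-1] if pending else stack).append(quote)
        elif pending:
            pending[-1].append(tok)  # inside [ ... ]: collect, do not execute
        elif isinstance(tok, (int, list)):
            stack.append(tok)        # literals and quotes push themselves
        elif tok == "eval":
            queue = list(stack.pop()) + queue
        elif tok == "dup":
            stack.append(stack[-1])
        elif tok == "swap":
            stack[-2], stack[-1] = stack[-1], stack[-2]
        elif tok == "putn":
            print(stack.pop(), end="")
        else:
            raise ValueError(f"unknown word: {tok!r}")
    return stack

buf = io.StringIO()
with redirect_stdout(buf):
    final = run(tokenize('"hi" eval dup putn swap putn'))
print(buf.getvalue())  # 105104
print(final)           # [105]
```

Rewriting the queue keeps the tokenizer trivial at the cost of doing the expansion every time the token is reached; the compile-time approach in TS/Go/Racket does it once.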
27 changes: 27 additions & 0 deletions python/test_double_quote_tokenize.py
@@ -0,0 +1,27 @@
#!/usr/bin/env python3
"""Double-quoted sugar: tokenizer keeps \"...\" as one token; run() expands to [ '...' ]."""
import importlib.util
import io
import sys
from pathlib import Path

root = Path(__file__).resolve().parent
spec = importlib.util.spec_from_file_location("ff_execute", root / "execute.py")
mod = importlib.util.module_from_spec(spec)
assert spec.loader is not None
spec.loader.exec_module(mod)

# Tokenizer must not expand double quotes
assert mod.tokenize('"hi"') == ['"hi"'], mod.tokenize('"hi"')
assert mod.tokenize('a "x" b') == ['a', '"x"', 'b']

# Runner expands and executes like [ 'hi' ] eval ...
old_stdout = sys.stdout
sys.stdout = buf = io.StringIO()
try:
mod.queue = mod.tokenize('"hi" eval dup putn swap putn')
mod.run()
finally:
sys.stdout = old_stdout
assert buf.getvalue() == "105104", repr(buf.getvalue())
print("ok")
14 changes: 13 additions & 1 deletion racket/private/compiler.rkt
@@ -28,4 +28,16 @@
(define cmds (flatten (map (lambda (x) (list x 0)) chars)))
#`(list #,@cmds))

-(provide ff-program ff-marker ff-push ff-call ff-string)
;; Double-quoted: sugar for [ '...' ] — BRA, per-char pushes, KET (same escapes as STR).
(define-macro (ff-string-dq STR)
(define chars (map char->integer (string->list (syntax->datum #'STR))))
(define cmds (flatten (map (lambda (x) (list x 0)) chars)))
#`(list op_bra 1 #,@cmds op_ket 1))

(define-macro (ff-bra . _)
#'(list op_bra 1))

(define-macro (ff-ket . _)
#'(list op_ket 1))

(provide ff-program ff-marker ff-push ff-call ff-string ff-string-dq ff-bra ff-ket)
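Reading `ff-string` and `ff-string-dq`, the Racket compiler appears to emit a flat list of value/tag pairs: tag `0` follows each pushed character code, and tag `1` follows `op_bra`/`op_ket`. That the tags mean push and call is an inference from these macros, not something the diff states. A Python sketch of that encoding and of decoding it back into push/call pairs, using placeholder opcode values:

```python
from typing import List, Tuple

OP_BRA, OP_KET = 1001, 1002  # placeholder values; the real op_bra/op_ket differ

def string_dq(s: str) -> List[int]:
    # Mirrors ff-string-dq: op_bra 1, then (code 0) per character, then op_ket 1.
    cmds = [x for c in s for x in (ord(c), 0)]
    return [OP_BRA, 1, *cmds, OP_KET, 1]

def decode(flat: List[int]) -> List[Tuple[str, int]]:
    # Pair up (value, tag); tag 0 = push, tag 1 = call (inferred, see above).
    if len(flat) % 2:
        raise ValueError("flat command list must have even length")
    kinds = {0: "push", 1: "call"}
    return [(kinds[tag], value) for value, tag in zip(flat[0::2], flat[1::2])]

print(string_dq("hi"))          # [1001, 1, 104, 0, 105, 0, 1002, 1]
print(decode(string_dq("hi")))  # [('call', 1001), ('push', 104), ('push', 105), ('call', 1002)]
```

The expander's `ff-string-dq` expresses the same sequence directly as `(call op_bra) (push INTS) ... (call op_ket)`.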
13 changes: 12 additions & 1 deletion racket/private/expander.rkt
@@ -57,4 +57,15 @@
(with-pattern ([(INTS ...) ints])
#`(begin (push INTS) ...)))

-(provide ff-program ff-marker ff-push ff-call ff-string)
(define-macro (ff-string-dq STR)
(define ints (map char->integer (string->list (syntax->datum #'STR))))
(with-pattern ([(INTS ...) ints])
#`(begin (call op_bra) (push INTS) ... (call op_ket))))

(define-macro (ff-bra . _)
#'(call op_bra))

(define-macro (ff-ket . _)
#'(call op_ket))

(provide ff-program ff-marker ff-push ff-call ff-string ff-string-dq ff-bra ff-ket)