Benchmarks
Real-World Benchmarks — Apple Silicon (ms, lower is better)
Measured on macOS Apple Silicon (ARM64). C, Rust, and Go compile to native ARM64. Jda compiles to x86-64 Mach-O and runs under Rosetta 2, so every Jda number includes binary-translation overhead that the native builds do not pay.
| Benchmark | C | Rust | Go | Jda | Ruby | Python |
|---|---|---|---|---|---|---|
| Sudoku — 500 puzzles | 62 | 62 | 66 | 41 | 3,854 | 1,753 |
| LZ77 — 1 MB compress | 1,830 | 2,185 | 2,721 | 277 | 222,424 | — |
| Regex — 8 pats × 100K | 98 | 221 | 813 | 186 | 7,940 | 7,406 |
| B-Tree — 1M ops | 282 | 297 | 318 | 586 | 11,529 | 10,955 |
| Raytracer — 800×600 | 19 | 21 | 35 | 331 | 3,301 | 4,080 |
Key Results
- Sudoku: Jda is 1.5× faster than C and Rust, using constraint propagation with bitmasking, dead-code elimination (DCE), and loop register promotion
- LZ77: Jda is 6.6× faster than C, 7.9× faster than Rust, and 9.8× faster than Go. MOD→AND strength reduction eliminates every IDIV in the hash-chain inner loop, and LICM hoists the first-byte filter out of the match scan
- Regex: Jda beats Rust (221 ms) and Go (813 ms). It uses Thompson NFA + DFA subset construction with a 64-bit bitmask state that fits in L1 cache
- B-Tree: Jda is ~2× slower than C, attributed to Rosetta 2 x86-64 translation overhead on pointer-chasing; the gap is expected to narrow with linear-scan register allocation (planned)
- Raytracer: Jda is 17× slower than C. C, Rust, and Go use ARM SIMD (NEON), while Jda emits scalar x86-64 SSE2 that runs through Rosetta 2, so the gap reflects the SIMD difference rather than algorithm quality
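The MOD→AND rewrite credited for the LZ77 result rests on a simple identity: for a non-negative integer `h` and a power-of-two table size `n`, `h % n` equals `h & (n - 1)`. The Python sketch below checks that identity on a toy hash chain index. `TABLE_SIZE`, `hash3`, and the input buffer are illustrative assumptions and are not taken from the Jda benchmark source.

```python
# Illustrative only: TABLE_SIZE and hash3 are hypothetical, not the benchmark's code.
TABLE_SIZE = 1 << 12          # assumed power-of-two hash table size
MASK = TABLE_SIZE - 1         # low 12 bits set


def hash3(data: bytes, i: int) -> int:
    """Toy hash of the three bytes starting at position i."""
    return (data[i] << 10) ^ (data[i + 1] << 5) ^ data[i + 2]


def bucket_mod(h: int) -> int:
    # Division-based form: a compiler would normally lower this to a divide.
    return h % TABLE_SIZE


def bucket_and(h: int) -> int:
    # Strength-reduced form: valid because TABLE_SIZE is a power of two and h >= 0.
    return h & MASK


data = bytes(range(256)) * 4
positions = range(len(data) - 2)
buckets_mod = [bucket_mod(hash3(data, i)) for i in positions]
buckets_and = [bucket_and(hash3(data, i)) for i in positions]
same = buckets_mod == buckets_and
```

The identity does not hold for table sizes that are not powers of two, so a compiler can apply this rewrite only when it can prove the divisor has that form.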
Full analysis with per-benchmark breakdowns and source code →
Environment
| Language | Version | Flags |
|---|---|---|
| C | Apple clang 21 | -O2 -lm |
| Rust | rustc 1.94 | -O |
| Go | go 1.26 | default |
| Jda | jda1 v1.1.0 | build --macos |
| Ruby | 4.0.2 | interpreted |
| Python | 3.11 | interpreted |
Jda runs x86-64 via Rosetta 2 — native ARM64 backend is in progress.
Compile Time (ms, lower is better)
| | gcc -O2 | rustc -O | go build | Jda |
|---|---|---|---|---|
| Average | 479 | 1,497 | 712 | 43 |
| vs Jda | ~11× slower | ~35× slower | ~17× slower | — |
Jda’s single-pass compiler produces native Mach-O binaries directly, with no linker step and no intermediate object files, for an average compile time of 43 ms.
Reproduce
```sh
# Clone the repo
git clone https://github.com/jdalang/jda-lang.git && cd jda-lang

# Build jda1 (needs Docker)
docker run --rm --platform linux/amd64 --ulimit stack=524288000:524288000 \
  -v "$(pwd)":/jda -w /jda/bootstrap/stage0 jda-build make stage1

# Compile Jda benchmarks for macOS
for b in sudoku lz77 btree regex raytracer; do
  docker run --rm --platform linux/amd64 --ulimit stack=524288000:524288000 \
    -v "$(pwd)":/jda -w /jda/bootstrap/stage0 jda-build \
    ./jda1 build --macos "/jda/benchmarks/complex/$b/$b.jda" -o "${b}_mac"
  codesign -s - "${b}_mac"
done
```

All source code is in benchmarks/complex/: six implementations of each problem, side by side.
Complex Benchmarks: Jda vs The World
Real-world benchmarks comparing Jda against C, Rust, Go, Python, and Ruby on five non-trivial algorithms: a constraint-propagation sudoku solver, LZ77 compression, Thompson NFA regex, a B-tree, and a ray tracer. Each has full source code in all six languages.
ML Benchmark: Jda vs Python
A head-to-head neural network training benchmark comparing Jda and Python. Both implementations use identical algorithms – same loop structure, same SGD optimizer, same MSE loss function. No NumPy on the Python side. The only difference is the runtime: Jda compiles to native x86-64 machine code, Python interprets through CPython.
Performance Results (x86-64 Linux, best of 3)
| Task | Jda | Python (no NumPy) | Speedup |
|---|---|---|---|
| XOR training (5K epochs) | 21 ms | 778 ms | ~37x |
| Sine training (10K epochs) | 439 ms | 15,347 ms | ~35x |
| 64x64 matmul (per iter) | 3 ms | 75 ms | ~25x |
Tasks
Task 1: XOR Classification
- Architecture: 2->8->1 MLP (multi-layer perceptron)
- Dataset: 4 XOR samples
- Training: 5,000 epochs, learning rate 0.1
- Result: Jda completes in 21 ms vs Python’s 778 ms – a 37x speedup
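To make the workload concrete, here is a minimal pure-Python sketch of a 2->8->1 MLP trained on XOR with SGD and a squared-error loss, in the spirit of the setup described above. It is not the benchmark's source. The tanh hidden activation, linear output, weight initialisation range, per-sample updates, fixed seed, and the use of the half-squared-error gradient are all assumptions made for illustration.

```python
import math
import random


def train_xor(epochs=5000, lr=0.1, hidden=8, seed=0):
    """Train a 2->hidden->1 MLP on XOR with per-sample SGD.

    Assumptions (not stated in the benchmark description): tanh hidden
    activation, linear output, uniform(-0.5, 0.5) initial weights.
    Returns (initial_mse, final_mse).
    """
    rng = random.Random(seed)
    data = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0),
            ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0

    def forward(x):
        h = [math.tanh(w1[j][0] * x[0] + w1[j][1] * x[1] + b1[j])
             for j in range(hidden)]
        y = sum(w2[j] * h[j] for j in range(hidden)) + b2
        return h, y

    def mse():
        return sum((forward(x)[1] - t) ** 2 for x, t in data) / len(data)

    initial = mse()
    for _ in range(epochs):
        for x, t in data:
            h, y = forward(x)
            dy = y - t  # gradient of 0.5 * (y - t)^2 with respect to y
            for j in range(hidden):
                # Backprop through tanh, using w2[j] before it is updated.
                dh = dy * w2[j] * (1.0 - h[j] ** 2)
                w2[j] -= lr * dy * h[j]
                w1[j][0] -= lr * dh * x[0]
                w1[j][1] -= lr * dh * x[1]
                b1[j] -= lr * dh
            b2 -= lr * dy
    return initial, mse()


initial_loss, final_loss = train_xor()
```

Every multiply and add in the inner loops goes through the interpreter, which is the kind of work a natively compiled loop avoids.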
Task 2: Sine Approximation
- Architecture: 1->16->1 MLP
- Dataset: 32 samples of sin(x), x in [0, 2*pi]
- Training: 10,000 epochs, learning rate 0.01
- Result: Jda completes in 439 ms vs Python’s 15,347 ms – a 35x speedup
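As a sketch of the data side of this task, the snippet below builds 32 samples of sin(x) over [0, 2*pi] and defines the MSE loss in plain Python. The even spacing with both endpoints included, and the helper names `make_sine_dataset` and `mse`, are assumptions for illustration; the benchmark may sample differently.

```python
import math


def make_sine_dataset(n=32):
    """Return n (x, sin(x)) pairs with x evenly spaced over [0, 2*pi]."""
    step = 2.0 * math.pi / (n - 1)  # assumption: both endpoints are included
    return [(i * step, math.sin(i * step)) for i in range(n)]


def mse(predict, dataset):
    """Mean squared error of a scalar model `predict` over the dataset."""
    return sum((predict(x) - y) ** 2 for x, y in dataset) / len(dataset)


dataset = make_sine_dataset()
baseline = mse(lambda x: 0.0, dataset)  # error of always predicting zero
```

The zero-prediction baseline gives a reference point: a trained 1->16->1 network should end well below it.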
Task 3: Matrix Multiply
- Size: 64x64 @ 64x64 (524,288 FLOPs per multiply)
- Iterations: 10
- Result: Jda averages 3 ms per multiply vs Python’s 75 ms – a 25x speedup
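The FLOP count quoted for this task follows from the naive algorithm: each of the 64×64 output cells needs 64 multiplies and 64 adds, so 2 × 64³ = 524,288 operations. A minimal pure-Python sketch of that triple loop is shown below; the i-j-k loop order and the list-of-rows layout are assumptions, not details taken from the benchmark source.

```python
def matmul(a, b):
    """Naive triple-loop product of two square matrices given as lists of rows."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = 0.0
            for k in range(n):
                acc += a[i][k] * b[k][j]  # one multiply and one add
            c[i][j] = acc
    return c


def flops(n):
    """Floating-point operations for an n x n product: n^3 multiplies + n^3 adds."""
    return 2 * n ** 3


n = 64
identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
m = [[float(i * n + j) for j in range(n)] for i in range(n)]
product = matmul(m, identity)  # multiplying by the identity returns m
```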
Why the Difference?
Both implementations use identical algorithms. The performance gap comes entirely from the runtime: