Benchmarks

Real-World Benchmarks — Apple Silicon (ms, lower is better)

Measured on macOS Apple Silicon (ARM64). C, Rust, and Go compile to native ARM64. Jda compiles to x86-64 Mach-O running via Rosetta 2 — a deliberate ISA handicap that makes the wins more striking.

Benchmark	C	Rust	Go	Jda	Ruby	Python
Sudoku — 500 puzzles	62	62	66	41	3,854	1,753
LZ77 — 1 MB compress	1,830	2,185	2,721	277	222,424	—
Regex — 8 pats × 100K	98	221	813	186	7,940	7,406
B-Tree — 1M ops	282	297	318	586	11,529	10,955
Raytracer — 800×600	19	21	35	331	3,301	4,080

Key Results

Sudoku: Jda is 1.5× faster than C and Rust — constraint propagation with bitmasking, DCE, and loop register promotion
LZ77: Jda is 6.6× faster than C, 7.9× faster than Rust, 9.8× faster than Go — MOD→AND strength reduction eliminates every IDIV in the hash-chain inner loop; LICM hoists the first-byte filter out of the match scan
Regex: Jda (186 ms) beats Rust (221 ms) and Go (813 ms) but trails C (98 ms) — Thompson NFA + DFA subset construction with 64-bit bitmask state, fits in L1 cache
B-Tree: Jda ~2× slower than C — Rosetta 2 x86-64 emulation overhead on pointer-chasing; gap closes with linear-scan regalloc (planned)
Raytracer: Jda is 17× slower than C — C/Rust/Go use ARM SIMD (NEON); Jda emits scalar x86-64 SSE2 through Rosetta 2. The gap reflects the SIMD difference, not algorithm quality.
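
The MOD→AND strength reduction credited for the LZ77 result rests on a simple identity: for a non-negative x and a power-of-two modulus 2^k, x mod 2^k equals x AND (2^k − 1). The Python sketch below only checks that identity on a toy hash-table index. The table size, the hash multiplier, and the function names are illustrative assumptions, not values taken from the Jda benchmark source.

```python
# Illustrative only: HASH_SIZE, the multiplier, and all function names are
# assumptions, not taken from the Jda LZ77 benchmark.
HASH_SIZE = 1 << 15          # power of two, so the mask trick applies
HASH_MASK = HASH_SIZE - 1


def hash3(data: bytes, i: int) -> int:
    """Combine three consecutive bytes into one non-negative integer key."""
    return ((data[i] << 16) | (data[i + 1] << 8) | data[i + 2]) * 2654435761


def slot_mod(key: int) -> int:
    # Division-based form: a native compiler would need a divide instruction.
    return key % HASH_SIZE


def slot_and(key: int) -> int:
    # Strength-reduced form: a single AND, valid because HASH_SIZE is a
    # power of two and key is non-negative.
    return key & HASH_MASK


data = bytes(range(256)) * 4
assert all(slot_mod(hash3(data, i)) == slot_and(hash3(data, i))
           for i in range(len(data) - 2))
```

In C-like languages the signed remainder truncates toward zero, so a compiler can make this substitution only when it can prove the operand is non-negative (or the type is unsigned).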

Full analysis with per-benchmark breakdowns and source code →

Environment

Language	Version	Flags
C	Apple clang 21	`-O2 -lm`
Rust	rustc 1.94	`-O`
Go	go 1.26	default
Jda	jda1 v1.1.0	`build --macos`
Ruby	4.0.2	interpreted
Python	3.11	interpreted

Jda runs x86-64 via Rosetta 2 — native ARM64 backend is in progress.

Compile Time (ms, lower is better)

	gcc -O2	rustc -O	go build	Jda
Average	479	1,497	712	43
vs Jda	11× slower	35× slower	17× slower	—

Jda’s single-pass compiler produces native Mach-O binaries directly, with no linker step and no intermediate object files, for an average compile time of 43 ms.

Reproduce

# Clone the repo
git clone https://github.com/jdalang/jda-lang.git && cd jda-lang

# Build jda1 (needs Docker)
docker run --rm --platform linux/amd64 --ulimit stack=524288000:524288000 \
  -v "$(pwd)":/jda -w /jda/bootstrap/stage0 jda-build make stage1

# Compile Jda benchmarks for macOS
for b in sudoku lz77 btree regex raytracer; do
  docker run --rm --platform linux/amd64 --ulimit stack=524288000:524288000 \
    -v "$(pwd)":/jda -w /jda/bootstrap/stage0 jda-build \
    ./jda1 build --macos /jda/benchmarks/complex/$b/$b.jda -o ${b}_mac
  codesign -s - ${b}_mac
done

All source code is in benchmarks/complex/, with six implementations of each problem side by side.

Complex Benchmarks: Jda vs The World

Real-world benchmarks comparing Jda against C, Rust, Go, Python, and Ruby on five non-trivial algorithms: a constraint-propagation sudoku solver, LZ77 compression, Thompson NFA regex, a B-tree, and a ray tracer. Each has full source code in all six languages.

ML Benchmark: Jda vs Python

A head-to-head neural network training benchmark comparing Jda and Python. Both implementations use identical algorithms – same loop structure, same SGD optimizer, same MSE loss function. No NumPy on the Python side. The only difference is the runtime: Jda compiles to native x86-64 machine code, Python interprets through CPython.

Performance Results (x86-64 Linux, best of 3)

Task	Jda	Python (no NumPy)	Speedup
XOR training (5K epochs)	21 ms	778 ms	~37x
Sine training (10K epochs)	439 ms	15,347 ms	~35x
64x64 matmul (per iter)	3 ms	75 ms	~25x
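
The speedup column is the Python time divided by the Jda time. The sketch below recomputes it from the timings in the table and shows one way a "best of 3" measurement could be taken. The helper names `best_of` and `speedup` are hypothetical, and the source does not say how the published runs were actually timed.

```python
import time


def best_of(fn, runs=3):
    """Return the fastest wall-clock time of `runs` calls to fn, in ms."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        best = min(best, elapsed_ms)
    return best


def speedup(baseline_ms, candidate_ms):
    """How many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms


# Timings in ms copied from the table: (Jda, Python).
results = {
    "XOR training (5K epochs)": (21, 778),
    "Sine training (10K epochs)": (439, 15347),
    "64x64 matmul (per iter)": (3, 75),
}
for task, (jda_ms, py_ms) in results.items():
    print(f"{task}: ~{speedup(py_ms, jda_ms):.0f}x")
```

Taking the minimum rather than the mean reduces the influence of one-off interference such as cache warm-up or background load.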

Tasks

Task 1: XOR Classification

Architecture: 2->8->1 MLP (multi-layer perceptron)
Dataset: 4 XOR samples
Training: 5,000 epochs, learning rate 0.1
Result: Jda completes in 21 ms vs Python’s 778 ms – a 37x speedup
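
For orientation, a pure-Python 2->8->1 MLP trained with per-sample SGD and MSE loss might look like the sketch below. It is not the benchmark's source. The tanh hidden activation, the sigmoid output, the uniform weight initialization, and the seed are all assumptions, since the text above specifies only the layer sizes, epoch count, and learning rate.

```python
import math
import random


def train_xor(epochs=5000, lr=0.1, hidden=8, seed=0):
    """Train a 2->hidden->1 MLP on XOR with per-sample SGD and MSE loss.

    Returns (initial_loss, final_loss). The activations and the weight
    initialization are assumptions; the benchmark text does not state them.
    """
    rng = random.Random(seed)
    data = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0),
            ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]
    w1 = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-1, 1) for _ in range(hidden)]
    b2 = 0.0

    def forward(x):
        h = [math.tanh(w1[j][0] * x[0] + w1[j][1] * x[1] + b1[j])
             for j in range(hidden)]
        z = sum(w2[j] * h[j] for j in range(hidden)) + b2
        return h, 1.0 / (1.0 + math.exp(-z))

    def mse():
        return sum((forward(x)[1] - t) ** 2 for x, t in data) / len(data)

    initial = mse()
    for _ in range(epochs):
        for x, t in data:
            h, y = forward(x)
            # d(loss)/dz for loss = (y - t)^2 with a sigmoid output.
            dz = 2.0 * (y - t) * y * (1.0 - y)
            for j in range(hidden):
                # Backpropagate through tanh before w2[j] is updated.
                dh = dz * w2[j] * (1.0 - h[j] ** 2)
                w2[j] -= lr * dz * h[j]
                w1[j][0] -= lr * dh * x[0]
                w1[j][1] -= lr * dh * x[1]
                b1[j] -= lr * dh
            b2 -= lr * dz
    return initial, mse()


initial_loss, final_loss = train_xor()
print(f"MSE before: {initial_loss:.4f}, after: {final_loss:.4f}")
```

Every multiply and add here goes through the interpreter as a separate operation on boxed floats, which is the kind of per-operation overhead a natively compiled loop avoids.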

Task 2: Sine Approximation

Architecture: 1->16->1 MLP
Dataset: 32 samples of sin(x), x in [0, 2*pi]
Training: 10,000 epochs, learning rate 0.01
Result: Jda completes in 439 ms vs Python’s 15,347 ms – a 35x speedup
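
The dataset for this task is small enough to build inline. A minimal sketch follows, assuming the 32 inputs are evenly spaced and include both endpoints of [0, 2*pi]; the source does not say how the samples are chosen. The `mse` helper and the zero-prediction baseline are illustrative additions.

```python
import math

N_SAMPLES = 32

# Assumption: evenly spaced inputs covering [0, 2*pi], endpoints included.
xs = [2.0 * math.pi * i / (N_SAMPLES - 1) for i in range(N_SAMPLES)]
ys = [math.sin(x) for x in xs]


def mse(predictions, targets):
    """Mean squared error over paired predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


# Reference point: the loss of a model that always predicts 0.
baseline = mse([0.0] * N_SAMPLES, ys)
print(f"zero-predictor MSE: {baseline:.4f}")
```

A trained 1->16->1 network should end up well below this baseline; if it does not, the learning rate or the initialization is the first thing to check.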

Task 3: Matrix Multiply

Size: 64x64 @ 64x64 (524,288 FLOPs per multiply)
Iterations: 10
Result: Jda averages 3 ms per multiply vs Python’s 75 ms – a 25x speedup
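
The FLOP count follows from the loop structure: an n×n product performs n³ multiply-add steps, so 2·64³ = 524,288 floating-point operations. A naive pure-Python version is sketched below. The i-k-j loop order and the function names are assumptions, not details taken from the benchmark source.

```python
def matmul(a, b):
    """Naive triple-loop product of two square matrices (lists of lists)."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(n):
            aik = a[i][k]
            for j in range(n):
                c[i][j] += aik * b[k][j]
    return c


def flops(n):
    # One multiply and one add per inner-loop step.
    return 2 * n ** 3


N = 64
identity = [[1.0 if i == j else 0.0 for j in range(N)] for i in range(N)]
m = [[float(i * N + j) for j in range(N)] for i in range(N)]
assert matmul(m, identity) == m   # multiplying by the identity changes nothing
print(flops(N))  # 524288
```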

Why the Difference?

Both implementations use identical algorithms. The performance gap comes entirely from the runtime: