ML Benchmark: Jda vs Python

A head-to-head benchmark of neural network training and matrix multiplication in Jda and Python. Both implementations use identical algorithms – same loop structure, same SGD optimizer, same MSE loss function. No NumPy on the Python side. The only difference is the runtime: Jda compiles to native x86-64 machine code, while Python runs as bytecode interpreted by CPython.

Performance Results (x86-64 Linux, best of 3)

Task	Jda	Python (no NumPy)	Speedup
XOR training (5K epochs)	21 ms	778 ms	~37x
Sine training (10K epochs)	439 ms	15,347 ms	~35x
64x64 matmul (per iter)	3 ms	75 ms	~25x
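
For readers who want to check the Speedup column or time their own runs, here is a minimal sketch of a best-of-3 timer and the speedup arithmetic. The helper names `best_of` and `speedup` are illustrative and are not taken from the benchmark scripts; the timings are copied from the table above.

```python
import time

def best_of(fn, repeats=3):
    """Return the fastest wall-clock time in milliseconds over `repeats` runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        best = min(best, elapsed_ms)
    return best

def speedup(python_ms, jda_ms):
    """Speedup is the slower time divided by the faster time."""
    return python_ms / jda_ms

# (task, Jda ms, Python ms) as reported in the results table
RESULTS = [
    ("XOR training", 21, 778),
    ("Sine training", 439, 15347),
    ("64x64 matmul", 3, 75),
]

for task, jda_ms, python_ms in RESULTS:
    print(f"{task}: ~{speedup(python_ms, jda_ms):.0f}x")
# XOR training: ~37x
# Sine training: ~35x
# 64x64 matmul: ~25x
```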

Tasks

Task 1: XOR Classification

Architecture: 2->8->1 MLP (multi-layer perceptron)
Dataset: 4 XOR samples
Training: 5,000 epochs, learning rate 0.1
Result: Jda completes in 21 ms vs Python’s 778 ms – a 37x speedup
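
To make the workload concrete, here is a pure-Python sketch of a 2->8->1 MLP trained on XOR with per-sample SGD and a squared-error loss. It is not the code in `ml-demo-python.py`. The tanh hidden activation, linear output, weight initialization, and the name `train_xor` are all assumptions made for illustration.

```python
import math
import random

# The four XOR samples as (inputs, target)
XOR_DATA = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0),
            ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]

def train_xor(epochs=5000, lr=0.1, hidden=8, seed=0):
    """Train a 2->hidden->1 MLP with per-sample SGD; return (initial, final) MSE."""
    rng = random.Random(seed)
    # Assumed initialization: small uniform weights, zero biases
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0

    def forward(x):
        # Assumed activations: tanh hidden layer, linear output
        h = [math.tanh(w1[j][0] * x[0] + w1[j][1] * x[1] + b1[j])
             for j in range(hidden)]
        y = sum(w2[j] * h[j] for j in range(hidden)) + b2
        return h, y

    def mse():
        return sum((forward(x)[1] - t) ** 2 for x, t in XOR_DATA) / len(XOR_DATA)

    initial = mse()
    for _ in range(epochs):
        for x, t in XOR_DATA:
            h, y = forward(x)
            err = y - t  # gradient of 0.5 * (y - t)^2 with respect to y
            for j in range(hidden):
                # Backpropagate through tanh using w2[j] before it is updated
                grad_h = err * w2[j] * (1.0 - h[j] ** 2)
                w2[j] -= lr * err * h[j]
                w1[j][0] -= lr * grad_h * x[0]
                w1[j][1] -= lr * grad_h * x[1]
                b1[j] -= lr * grad_h
            b2 -= lr * err
    return initial, mse()

if __name__ == "__main__":
    before, after = train_xor()
    print(f"MSE before: {before:.4f}, after: {after:.4f}")
```

Every multiply and add in the inner loops is a separate interpreted operation in CPython, which is the cost this task measures.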

Task 2: Sine Approximation

Architecture: 1->16->1 MLP
Dataset: 32 samples of sin(x), x in [0, 2*pi]
Training: 10,000 epochs, learning rate 0.01
Result: Jda completes in 439 ms vs Python’s 15,347 ms – a 35x speedup
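
The dataset for this task can be sketched in a few lines. The source does not say how the 32 points are placed, so the even spacing with both endpoints included is an assumption, and `sine_dataset` is a hypothetical helper name.

```python
import math

def sine_dataset(n=32):
    """Return n (x, sin(x)) pairs with x evenly spaced over [0, 2*pi]."""
    step = 2.0 * math.pi / (n - 1)  # assumption: both endpoints are included
    return [(i * step, math.sin(i * step)) for i in range(n)]

samples = sine_dataset()
print(len(samples))  # 32
```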

Task 3: Matrix Multiply

Size: 64x64 @ 64x64 (524,288 FLOPs per multiply)
Iterations: 10
Result: Jda averages 3 ms per multiply vs Python’s 75 ms – a 25x speedup
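
The FLOP count follows from the standard triple loop: each of the 64 x 64 output elements needs 64 multiplies and 64 adds, so 2 * 64^3 = 524,288. A minimal pure-Python sketch, not the benchmark's own code (the function names are illustrative):

```python
def matmul(a, b):
    """Naive triple-loop matrix multiply on lists of lists."""
    n, k, m = len(a), len(b), len(b[0])
    out = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = 0.0
            for p in range(k):
                acc += a[i][p] * b[p][j]  # one multiply and one add
            out[i][j] = acc
    return out

def matmul_flops(n):
    """FLOPs for an n x n by n x n multiply: n^2 outputs, 2n operations each."""
    return 2 * n ** 3

print(matmul_flops(64))  # 524288
```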

Why the Difference?

Since the algorithms are identical, the performance gap comes from how each language executes them:

Jda compiles to native x86-64 machine code. Tensor operations are compiler builtins that emit direct floating-point instructions. No interpreter overhead, no garbage collector pauses, no dynamic dispatch.
Python (CPython) interprets bytecode. Every arithmetic operation goes through the interpreter loop, objects are heap-allocated and reference-counted, and attribute lookups are hash table probes.
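
The interpreter overhead on the Python side can be seen with the standard `dis` module. The sketch below lists the bytecode for a single multiply-add; `multiply_add` is an illustrative function, and the exact opcode names vary between CPython versions.

```python
import dis

def multiply_add(a, b, c):
    """One multiply-add, the core operation of a dot product."""
    return a * b + c

# Each opname is one trip through the CPython interpreter loop
ops = [ins.opname for ins in dis.get_instructions(multiply_add)]
print(ops)
```

A native compiler can typically lower the same expression to a couple of floating-point instructions, while CPython executes several bytecodes, each operating on boxed objects.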

This is a fair comparison: no NumPy, no C extensions, no BLAS. Pure algorithmic code on both sides.

Binary Sizes

Implementation	Size	Dependencies
Jda (`jda-ml-demo`)	~1.08 MB	None (static ELF binary)
Python (`ml-demo-python.py`)	363 lines	CPython runtime (~30 MB)

The Jda binary is a single static executable with zero external dependencies. The Python script requires a full CPython installation.

How to Reproduce

# Build the Jda binary
bash apps/build-ml-demo.sh

# Run Jda benchmark (in Docker)
docker run --rm --platform linux/amd64 --ulimit stack=524288000:524288000 \
  -v "$(pwd)":/jda -w /jda jda-build ./apps/jda-ml-demo

# Run Python benchmark
python3 apps/ml-demo-python.py

# Run both side by side with automated comparison
bash apps/run-ml-benchmark.sh

The benchmark script runs both implementations and prints a side-by-side comparison table.