ML Benchmark: Jda vs Python
A head-to-head neural network training benchmark comparing Jda and Python. Both implementations use identical algorithms – same loop structure, same SGD optimizer, same MSE loss function. No NumPy on the Python side. The only difference is the runtime: Jda compiles to native x86-64 machine code, Python interprets through CPython.
Performance Results (x86-64 Linux, best of 3)
| Task | Jda | Python (no NumPy) | Speedup |
|---|---|---|---|
| XOR training (5K epochs) | 21 ms | 778 ms | ~37x |
| Sine training (10K epochs) | 439 ms | 15,347 ms | ~35x |
| 64x64 matmul (per iter) | 3 ms | 75 ms | ~25x |
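The "best of 3" figures and the speedup column can be reproduced with a small timing helper. The sketch below is an illustration only, not the benchmark's actual harness: `best_of` and `speedup` are hypothetical names, and the use of `time.perf_counter` is an assumption about how the timings were taken.

```python
import time


def best_of(fn, runs=3):
    """Return the fastest wall-clock time, in milliseconds, over `runs` calls."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        best = min(best, elapsed_ms)
    return best


def speedup(slow_ms, fast_ms):
    """Ratio of the slower time to the faster time."""
    return slow_ms / fast_ms


if __name__ == "__main__":
    # Timings copied from the results table (Jda ms, Python ms).
    rows = [("XOR", 21, 778), ("Sine", 439, 15347), ("matmul", 3, 75)]
    for task, jda_ms, py_ms in rows:
        print(f"{task}: ~{speedup(py_ms, jda_ms):.0f}x")
```

Taking the minimum of several runs, rather than the mean, reduces the influence of one-off noise such as a cold file cache.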
Tasks
Task 1: XOR Classification
- Architecture: 2->8->1 MLP (multi-layer perceptron)
- Dataset: 4 XOR samples
- Training: 5,000 epochs, learning rate 0.1
- Result: Jda completes in 21 ms vs Python’s 778 ms – a 37x speedup
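For readers who want a feel for what the pure-Python side of this task involves, here is a minimal sketch of a 2->8->1 MLP trained with per-sample SGD on a squared-error loss. It is not the code in `apps/ml-demo-python.py`. The tanh hidden activation, linear output, weight initialisation range, and seed are all assumptions, and the function names are hypothetical.

```python
import math
import random

# The four XOR samples: (inputs, target).
XOR_DATA = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0),
            ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]


def init_mlp(n_in, n_hidden, rng):
    """Parameters for an n_in -> n_hidden -> 1 MLP (init range is assumed)."""
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    return w1, b1, w2, 0.0


def forward(params, x):
    """Return hidden activations (tanh, assumed) and the linear output."""
    w1, b1, w2, b2 = params
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(w1, b1)]
    y = sum(w * hi for w, hi in zip(w2, h)) + b2
    return h, y


def mse(params, data):
    """Mean squared error over the dataset."""
    return sum((forward(params, x)[1] - t) ** 2 for x, t in data) / len(data)


def train(params, data, epochs, lr):
    """Per-sample SGD; the factor 2 from the squared error is folded into lr."""
    w1, b1, w2, b2 = params
    for _ in range(epochs):
        for x, t in data:
            h, y = forward((w1, b1, w2, b2), x)
            dy = y - t
            for j in range(len(w2)):
                # Backpropagate through tanh before w2[j] is updated.
                dh = dy * w2[j] * (1.0 - h[j] * h[j])
                w2[j] -= lr * dy * h[j]
                for i in range(len(x)):
                    w1[j][i] -= lr * dh * x[i]
                b1[j] -= lr * dh
            b2 -= lr * dy
    return w1, b1, w2, b2


if __name__ == "__main__":
    params = init_mlp(2, 8, random.Random(0))
    print("loss before:", mse(params, XOR_DATA))
    params = train(params, XOR_DATA, epochs=5000, lr=0.1)
    print("loss after: ", mse(params, XOR_DATA))
```

Every multiply-add in the inner loops is an individual interpreted operation, which is the kind of work the timings for this task measure.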
Task 2: Sine Approximation
- Architecture: 1->16->1 MLP
- Dataset: 32 samples of sin(x), x in [0, 2*pi]
- Training: 10,000 epochs, learning rate 0.01
- Result: Jda completes in 439 ms vs Python’s 15,347 ms – a 35x speedup
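The dataset for this task is small enough to build in a few lines. The sketch below assumes the 32 samples are evenly spaced and include both endpoints of the interval, which the description above does not specify. `make_sine_dataset` and `mse` are hypothetical helper names.

```python
import math


def make_sine_dataset(n=32):
    """n evenly spaced (x, sin x) pairs on [0, 2*pi], endpoints included (assumed)."""
    step = 2.0 * math.pi / (n - 1)
    return [(i * step, math.sin(i * step)) for i in range(n)]


def mse(predict, data):
    """Mean squared error of a prediction function over the dataset."""
    return sum((predict(x) - y) ** 2 for x, y in data) / len(data)


if __name__ == "__main__":
    data = make_sine_dataset()
    # Baseline: a model that always predicts 0. A trained network should beat this.
    print("constant-zero MSE:", mse(lambda x: 0.0, data))
```

Comparing the trained network's loss against the constant-zero baseline is a quick sanity check that training actually fit the curve.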
Task 3: Matrix Multiply
- Size: 64x64 @ 64x64 (524,288 FLOPs per multiply)
- Iterations: 10
- Result: Jda averages 3 ms per multiply vs Python’s 75 ms – a 25x speedup
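The FLOP figure follows from counting one multiply and one add per inner-loop step: 2 x 64 x 64 x 64 = 524,288. A naive triple-loop multiply in pure Python looks roughly like the sketch below; the function names are hypothetical and the loop order is an assumption, not necessarily the one the benchmark uses.

```python
def matmul(a, b):
    """Naive (n x k) @ (k x m) multiply on lists of lists."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = 0.0
            for p in range(k):
                acc += a[i][p] * b[p][j]  # one multiply + one add
            c[i][j] = acc
    return c


def matmul_flops(n, k, m):
    """Floating-point operations for a naive multiply: 2 per inner step."""
    return 2 * n * k * m


if __name__ == "__main__":
    print(matmul_flops(64, 64, 64))
```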
Why the Difference?
Because both implementations run the same algorithm, the performance gap comes from how each runtime executes it:
- Jda compiles to native x86-64 machine code. Tensor operations are compiler builtins that emit direct floating-point instructions. No interpreter overhead, no garbage collector pauses, no dynamic dispatch.
- Python (CPython) interprets bytecode. Every arithmetic operation goes through the interpreter loop, objects are heap-allocated and reference-counted, and attribute lookups are hash table probes.
The comparison is like-for-like: the Python side uses no NumPy, no C extensions, and no BLAS, so both sides run pure algorithmic code.
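The interpreter overhead on the Python side can be made visible with the standard-library `dis` module, which lists the bytecode CPython executes for a function. The `axpy` function below is a hypothetical stand-in for a single multiply-add from a training loop. The exact opcode names and counts vary between CPython versions, so no particular listing is shown here.

```python
import dis


def axpy(a, x, y):
    """One multiply-add, the basic step in a dot product."""
    return a * x + y


# Each instruction below is dispatched by the interpreter loop at run time.
instructions = list(dis.get_instructions(axpy))

if __name__ == "__main__":
    for ins in instructions:
        print(ins.opname, ins.argrepr)
    print(len(instructions), "bytecode instructions for one multiply-add")
```

A native compiler can typically reduce the same expression to a couple of floating-point instructions, whereas the interpreter dispatches each bytecode separately and boxes every intermediate float.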
Binary Sizes
| Implementation | Size | Dependencies |
|---|---|---|
| Jda (jda-ml-demo) | ~1.08 MB | None (static ELF binary) |
| Python (ml-demo-python.py) | 363 lines | CPython runtime (~30 MB) |
The Jda binary is a single static executable with zero external dependencies. The Python script requires a full CPython installation.
How to Reproduce
# Build the Jda binary
bash apps/build-ml-demo.sh
# Run Jda benchmark (in Docker)
docker run --rm --platform linux/amd64 --ulimit stack=524288000:524288000 \
    -v "$PWD":/jda -w /jda jda-build ./apps/jda-ml-demo
# Run Python benchmark
python3 apps/ml-demo-python.py
# Run both side by side with automated comparison
bash apps/run-ml-benchmark.sh
The benchmark script runs both implementations and prints a side-by-side comparison table.
See Also
- jda-ml-demo full source code – the complete 486-line Jda implementation
- Full benchmark analysis – Jda vs C, Rust, Go, and Python on general-purpose benchmarks