ML Benchmark: Jda vs Python

A head-to-head neural network training benchmark comparing Jda and Python. Both implementations use identical algorithms: the same loop structure, the same SGD optimizer, and the same MSE loss function. The Python side uses no NumPy. The only difference is the runtime: Jda compiles to native x86-64 machine code, while the Python code is interpreted by CPython.
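For reference, here are the textbook forms of the two shared components. The first is the MSE loss over N samples. The second is the per-sample SGD update with learning rate η (0.1 for XOR, 0.01 for sine). Two details are assumptions here rather than something the benchmark states: whether the implementations update per sample or per batch, and how they scale the gradient. Some code drops the factor of 2 by using ½(f − y)² instead.

```latex
\ell_i(\theta) = \bigl(f_\theta(x_i) - y_i\bigr)^2,
\qquad
\mathcal{L}(\theta) = \frac{1}{N} \sum_{i=1}^{N} \ell_i(\theta),
\qquad
\theta \leftarrow \theta - \eta \, \nabla_\theta \ell_i(\theta)
```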

Performance Results (x86-64 Linux, best of 3)

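| Task                         | Jda    | Python (no NumPy) | Speedup |
|------------------------------|--------|-------------------|---------|
| XOR training (5K epochs)     | 21 ms  | 778 ms            | ~37x    |
| Sine training (10K epochs)   | 439 ms | 15,347 ms         | ~35x    |
| 64x64 matmul (per iteration) | 3 ms   | 75 ms             | ~25x    |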
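"Best of 3" means each task runs three times and the fastest wall-clock time is kept. Below is a minimal Python sketch of that methodology and of the speedup arithmetic (Python time divided by Jda time). It assumes timing with time.perf_counter. The names best_of and speedup are illustrative, and apps/run-ml-benchmark.sh may measure differently.

```python
import time
from typing import Callable


def best_of(fn: Callable[[], object], runs: int = 3) -> float:
    """Run fn `runs` times and return the fastest wall-clock time in milliseconds."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        best = min(best, elapsed_ms)
    return best


def speedup(python_ms: float, jda_ms: float) -> float:
    """Speedup is the baseline (Python) time divided by the Jda time."""
    return python_ms / jda_ms


if __name__ == "__main__":
    # Timings (ms) copied from the results table: (task, Jda, Python).
    rows = [("XOR training", 21, 778), ("Sine training", 439, 15347), ("64x64 matmul", 3, 75)]
    for name, jda_ms, py_ms in rows:
        print(f"{name:<15} ~{speedup(py_ms, jda_ms):.0f}x")
    # Example of timing a callable with the harness.
    print(f"sum(range(10**6)): {best_of(lambda: sum(range(10**6))):.1f} ms")
```

Taking the minimum rather than the mean filters out one-off noise such as cold caches or scheduler preemption.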

Tasks

Task 1: XOR Classification

  • Architecture: 2->8->1 MLP (multi-layer perceptron)
  • Dataset: 4 XOR samples
  • Training: 5,000 epochs, learning rate 0.1
  • Result: Jda completes in 21 ms vs Python’s 778 ms – a 37x speedup
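The Task 1 setup can be sketched in pure Python as follows: a 2->8->1 network trained with per-sample SGD and MSE for 5,000 epochs at learning rate 0.1. Several details are assumptions for illustration, not the benchmark's code:

- tanh hidden units
- a linear output
- uniform(-1, 1) weight initialization
- the gradient of ½(y − t)²

The function name train_xor is hypothetical, and ml-demo-python.py may make different choices.

```python
import math
import random


def train_xor(epochs: int = 5000, lr: float = 0.1, hidden: int = 8, seed: int = 0):
    """Train a 2->hidden->1 MLP on XOR with per-sample SGD; return (initial_mse, final_mse).

    Assumed (not stated by the benchmark): tanh hidden units, a linear output,
    uniform(-1, 1) weight init, zero biases.
    """
    rng = random.Random(seed)
    data = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]
    w1 = [[rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-1.0, 1.0) for _ in range(hidden)]
    b2 = 0.0

    def forward(x):
        # Hidden activations h_j = tanh(w1_j . x + b1_j); output y = w2 . h + b2
        h = [math.tanh(w1[j][0] * x[0] + w1[j][1] * x[1] + b1[j]) for j in range(hidden)]
        return h, sum(w2[j] * h[j] for j in range(hidden)) + b2

    def mse():
        # Mean squared error over all four XOR samples
        return sum((forward(x)[1] - t) ** 2 for x, t in data) / len(data)

    initial = mse()
    for _ in range(epochs):
        for x, t in data:
            h, y = forward(x)
            dy = y - t  # gradient of 0.5 * (y - t)^2 with respect to y
            for j in range(hidden):
                # Back through tanh (d tanh(z)/dz = 1 - tanh(z)^2), using the old w2[j]
                dz = dy * w2[j] * (1.0 - h[j] * h[j])
                w2[j] -= lr * dy * h[j]
                w1[j][0] -= lr * dz * x[0]
                w1[j][1] -= lr * dz * x[1]
                b1[j] -= lr * dz
            b2 -= lr * dy
    return initial, mse()


if __name__ == "__main__":
    start_loss, end_loss = train_xor()
    print(f"MSE before: {start_loss:.4f}  after: {end_loss:.4f}")
```

Every multiply, add, and list index in the inner loop is a separate interpreted operation, which is where CPython spends its time in this task.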

Task 2: Sine Approximation

  • Architecture: 1->16->1 MLP
  • Dataset: 32 samples of sin(x), x in [0, 2*pi]
  • Training: 10,000 epochs, learning rate 0.01
  • Result: Jda completes in 439 ms vs Python’s 15,347 ms – a 35x speedup
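The Task 2 dataset is small enough to build by hand. This sketch assumes the 32 x-values are evenly spaced with both endpoints (0 and 2π) included; the benchmark does not state which spacing it uses. The names sine_dataset and mse are illustrative. The zero-predictor baseline gives a reference point that a trained 1->16->1 network is expected to beat.

```python
import math


def sine_dataset(n: int = 32):
    """Return n (x, sin(x)) pairs with x evenly spaced over [0, 2*pi].

    Assumption: both endpoints are included (linspace-style spacing).
    """
    step = 2.0 * math.pi / (n - 1)
    return [(i * step, math.sin(i * step)) for i in range(n)]


def mse(predictions, targets):
    """Mean squared error over paired predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


samples = sine_dataset()
xs = [x for x, _ in samples]
ys = [y for _, y in samples]
# Baseline: always predicting 0 scores the mean of sin(x)^2, which is 15.5 / 32 on this grid.
print(f"zero-predictor MSE: {mse([0.0] * len(ys), ys):.4f}")
```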

Task 3: Matrix Multiply

  • Size: 64x64 @ 64x64 (524,288 FLOPs per multiply)
  • Iterations: 10
  • Result: Jda averages 3 ms per multiply vs Python’s 75 ms – a 25x speedup
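In pure Python, the Task 3 workload is a triple loop over lists of lists. This minimal sketch assumes row-major lists and an i-k-j loop order. The benchmark's actual loop order is not stated; i-k-j is a common choice because it hoists a[i][p] out of the innermost loop. The function names are illustrative. The FLOP count of 2·n³ (one multiply and one add per inner step) gives the 524,288 figure for n = 64.

```python
import random
import time


def matmul(a, b):
    """Naive matrix multiply of lists of lists (no NumPy), i-k-j loop order."""
    n, k, m = len(a), len(b), len(b[0])
    assert all(len(row) == k for row in a), "inner dimensions must match"
    out = [[0.0] * m for _ in range(n)]
    for i in range(n):
        a_row, out_row = a[i], out[i]
        for p in range(k):
            a_ip, b_row = a_row[p], b[p]
            for j in range(m):
                # One multiply and one add per innermost step
                out_row[j] += a_ip * b_row[j]
    return out


def matmul_flops(n: int) -> int:
    """FLOPs for an n x n @ n x n multiply: 2 * n^3."""
    return 2 * n ** 3


if __name__ == "__main__":
    n = 64
    a = [[random.random() for _ in range(n)] for _ in range(n)]
    b = [[random.random() for _ in range(n)] for _ in range(n)]
    start = time.perf_counter()
    matmul(a, b)
    ms = (time.perf_counter() - start) * 1000.0
    print(f"{n}x{n} matmul: {ms:.1f} ms for {matmul_flops(n):,} FLOPs")
```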

Why the Difference?

Both implementations use identical algorithms. The performance gap comes entirely from the runtime:

  • Jda compiles to native x86-64 machine code. Tensor operations are compiler builtins that emit direct floating-point instructions. No interpreter overhead, no garbage collector pauses, no dynamic dispatch.
  • Python (CPython) interprets bytecode. Every arithmetic operation goes through the interpreter loop, objects are heap-allocated and reference-counted, and attribute lookups are hash table probes.
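The per-operation overhead described for Python can be observed with the standard library. This sketch uses dis.get_instructions to list the bytecode CPython runs for a single multiply-add. It then uses sys.getsizeof to show that a float is a boxed object rather than a bare 8-byte double. The function axpy is a hypothetical example. Exact opcode names vary between CPython versions: 3.11 and later use BINARY_OP, while earlier versions use BINARY_MULTIPLY and BINARY_ADD.

```python
import dis
import sys


def axpy(a, x, y):
    """Hypothetical helper: one multiply-add, the core step of a matmul inner loop."""
    return a * x + y


# List the bytecode instructions CPython executes for axpy; the interpreter
# loop dispatches each one separately.
opnames = [ins.opname for ins in dis.get_instructions(axpy)]
print(opnames)

# The two arithmetic operations appear as two distinct BINARY_* instructions.
arith = [name for name in opnames if name.startswith("BINARY")]
print(f"arithmetic instructions: {len(arith)}")

# A Python float is a heap object (object header plus an 8-byte double).
print(f"sys.getsizeof(1.0) = {sys.getsizeof(1.0)} bytes")
```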

This is a fair comparison: no NumPy, no C extensions, no BLAS. Pure algorithmic code on both sides.

Binary Sizes

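| Implementation               | Size                | Dependencies              |
|------------------------------|---------------------|---------------------------|
| Jda (jda-ml-demo)            | ~1.08 MB executable | None (static ELF binary)  |
| Python (ml-demo-python.py)   | 363-line script     | CPython runtime (~30 MB)  |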

The Jda binary is a single static executable with zero external dependencies. The Python script requires a full CPython installation.

How to Reproduce

# Build the Jda binary
bash apps/build-ml-demo.sh

# Run Jda benchmark (in Docker)
docker run --rm --platform linux/amd64 --ulimit stack=524288000:524288000 \
  -v "$(pwd)":/jda -w /jda jda-build ./apps/jda-ml-demo

# Run Python benchmark
python3 apps/ml-demo-python.py

# Run both side by side with automated comparison
bash apps/run-ml-benchmark.sh

The benchmark script runs both implementations and prints a side-by-side comparison table.

See Also