Not a chatbot pretending. Not a lookup table in a trench coat. A proper decoder-only transformer - attention, RMSNorm, feed-forward, residuals, the works. Two layers, four heads, 25,000 int8 parameters. Running in your browser right now with the exact same integer arithmetic the 6510 does on a real Commodore 64. Type something. The border will flash while it thinks!
Grab the disk image for VICE or real hardware. Or clone the full source - train your own soul, build your own floppy. Everything is open.
The same decoder-only architecture behind today's large language models, squeezed into 25 KB of int8 weights and integer-only arithmetic. No floating point. No GPU. Just shifts, adds, and a 128-entry exp lookup table for softmax.
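To make the softmax trick concrete, here is a small Python sketch of one way a 128-entry exp table can stand in for floating-point exp(). It assumes Q8.8 scores (a signed 16-bit integer where 256 means 1.0), a table step of 1/16 per entry, and 16-bit table entries; the names EXP_LUT and softmax_q88 are hypothetical and not taken from the project's source.

```python
import math

# Assumed table layout: entry i approximates exp(-i / 16) in Q8.8,
# so one table step is 16 raw units of a Q8.8 score (256 raw = 1.0).
LUT_SIZE = 128
STEP_SHIFT = 4

# Built offline here; in this sketch's layout it would be 128 precomputed 16-bit words.
EXP_LUT = [round(256 * math.exp(-i / (1 << STEP_SHIFT))) for i in range(LUT_SIZE)]


def softmax_q88(scores):
    """Integer-only softmax: Q8.8 scores in, Q8.8 probabilities out."""
    peak = max(scores)
    exps = []
    for s in scores:
        # Distance below the max, in table steps (always >= 0, so only exp(-d) is needed).
        d = (peak - s) >> STEP_SHIFT
        # Clamp: anything past the last entry reads the last (near-zero) value.
        exps.append(EXP_LUT[min(d, LUT_SIZE - 1)])
    total = sum(exps)  # >= 256, because the max score always maps to EXP_LUT[0]
    # One integer division per element normalises the weights to sum to about 1.0 (256).
    return [(e << 8) // total for e in exps]
```

For example, softmax_q88([256, 0]) (scores 1.0 and 0.0) returns [187, 68], close to the floating-point 0.731 and 0.269 (187.1 and 68.9 in Q8.8); truncation makes the integer results sum to 255 instead of 256. Subtracting the maximum first is what lets a single table of negative exponents cover every input.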
2 layers, 4 heads × 8 dims
32 embedding dimensions
64 FFN hidden units
128-token vocabulary (BPE)
20-token context window
Q8.8 fixed-point activations
int8 weights, per-tensor shift
int16 pre-scaled biases
Integer sqrt + restoring division
Greedy decoding (argmax)
1 MHz 6510 CPU
64 KB RAM (25 KB weights)
~60+ s per forward pass
100% 6510 assembly
Fits on a floppy disk
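The integer building blocks in the list above can be sketched in Python as follows. This is an illustrative model, not the project's 6510 code: the function names are hypothetical, biases are assumed to be added after the per-tensor shift, results are assumed to saturate to int16, and RMSNorm's learned gain is left out. As before, Q8.8 means a signed 16-bit integer where 256 represents 1.0.

```python
def linear_q88(x, w, shift, bias):
    """Integer matrix-vector product: y = ((W . x) >> shift) + bias.

    x: Q8.8 activations, w: rows of int8 weights whose real value is w / 2**shift,
    bias: int16 biases assumed to be pre-scaled to Q8.8 so they add after the shift.
    """
    out = []
    for row, b in zip(w, bias):
        acc = 0
        for wi, xi in zip(row, x):
            acc += wi * xi                      # int8 * Q8.8 into a wide accumulator
        y = (acc >> shift) + b                  # arithmetic shift back to Q8.8
        out.append(max(-32768, min(32767, y)))  # saturate to int16
    return out


def restoring_div(n, d, bits=48):
    """Unsigned restoring division for 0 <= n < 2**bits, d > 0: returns (n // d, n % d)."""
    q, r = 0, 0
    for i in range(bits - 1, -1, -1):
        r = (r << 1) | ((n >> i) & 1)  # bring down the next dividend bit
        r -= d                         # trial subtraction
        if r < 0:
            r += d                     # went negative: restore the partial remainder
        else:
            q |= 1 << i                # subtraction fit: this quotient bit is 1
    return q, r


def isqrt(n):
    """Digit-by-digit integer square root: largest r with r * r <= n."""
    root, bit = 0, 1 << 62
    while bit > n:                     # start at the highest power of four <= n
        bit >>= 2
    while bit:
        if n >= root + bit:
            n -= root + bit
            root = (root >> 1) + bit
        else:
            root >>= 1
        bit >>= 2
    return root


def rmsnorm_q88(x):
    """RMSNorm without the learned gain: x / sqrt(mean(x^2)), Q8.8 in and out."""
    sum_sq = sum(v * v for v in x)              # Q16.16 sum of squares
    mean_sq, _ = restoring_div(sum_sq, len(x))
    rms = max(isqrt(mean_sq), 1)                # sqrt of Q16.16 is Q8.8; floor of 1 avoids / 0
    out = []
    for v in x:
        q, _ = restoring_div(abs(v) << 8, rms)  # |v| / rms, rescaled to Q8.8
        out.append(q if v >= 0 else -q)
    return out


def argmax(logits):
    """Greedy decoding: index of the largest logit (the first one wins ties)."""
    best = 0
    for i in range(1, len(logits)):
        if logits[i] > logits[best]:
            best = i
    return best
```

In this sketch, a layer would normalise with rmsnorm_q88, project with linear_q88 for Q, K, V and the FFN, and the final step would pick the next token with argmax over the 128 logits. Dividing by a power of two, such as averaging over the 32 embedding dimensions, could also be done with a plain shift instead of a full division.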
The CPU the C64 shipped with can run the same architecture OpenAI and Google build their models on. It's just slower. Much, much slower. Proudly slower.
This whole project started as a joke and turned into something I actually mean. The future isn't more muscle. The future is better thinking. A 25k-parameter transformer with a thoughtfully trained tokenizer, sensible quantization, and honest arithmetic can have a broken, tiny, sweet conversation on a computer from 1982.
You can run your own AI chatbot on your own hardware. No excuses.*
(*except for training, gaming and rendering, what have the Romans ever done for us?)