Attention From Scratch

Jan 1, 2025 · 1 min read

This project explores how far a single engineer can go by combining open-source large language models with enterprise-class GPUs rented on demand. I’m documenting design choices, profiling results, and the infrastructure needed to run models such as Olmo 2 with production-grade reliability.