Attention From Scratch
Jan 1, 2025
This project explores how far a single engineer can go by combining open-source large language models with enterprise-class GPUs rented on demand. I’m documenting design choices, profiling results, and the infrastructure needed to run models such as Olmo 2 with production-grade reliability.