Attention From Scratch

Jan 1, 2025 · 1 min read

This project explores how far a single engineer can go by combining open-source large language models with enterprise-class GPUs rented on demand. I’m documenting design choices, profiling results, and the infrastructure needed to run models such as Olmo 2 with production-grade reliability.

Last updated on Jan 1, 2025

Tags: LLM, CUDA, Inference

Author: Adin Mauer, Hardware Design Engineer


© 2026 Me. This work is licensed under CC BY-NC-ND 4.0.
