
Slides for my DeepSeek mHC presentation, covering the key ideas and architecture choices, along with where the approach fits in modern transformer pipelines. Presentation (PDF): mHC.pdf
Jan 15, 2026

FlashAttention is an IO-aware attention algorithm that reduces memory traffic and improves throughput by fusing the attention operations into a single kernel and recomputing intermediates as needed. This post outlines the key ideas, the practical implications for long-sequence workloads, and where the approach fits in modern transformer pipelines. I presented this paper to the UBC CPEN 511 class in February 2025; see the presentation below for the walkthrough, and download the source document for the full technical details. Source document (PDF): Review_FlashAttention_AdinMauer.pdf
Jan 12, 2026