DeepSeek mHC
Slides for my DeepSeek mHC presentation, covering the key ideas, the architecture choices, and where the approach fits in modern transformer pipelines.
Presentation (PDF): mHC.pdf
I’m a hardware design engineer at AMD (Vancouver), building high-throughput datapaths and interconnect logic where performance, correctness, and microarchitectural details all matter. I enjoy problems that sit between architecture and implementation—modeling bottlenecks, validating behavior in RTL, and pushing designs toward clean, measurable wins.
I earned a BASc in Computer Engineering from UBC (2024). My technical interests center on efficient attention, ML-systems performance, and the hardware/software boundary that makes modern accelerators fast in practice.
BASc Computer Engineering
University of British Columbia (2024)
I’m interested in performance-critical systems for modern ML, especially attention and sequence models, where algorithmic choices show up directly in memory traffic, kernel design, and end-to-end latency. I like working at the boundary between architecture and software: profiling bottlenecks, forming clear hypotheses about where the time and data are going, and turning those insights into measurable speedups.
More broadly, I’m drawn to computer architecture (memory systems, interconnects, accelerators, HW/SW tradeoffs) and to the mathematical foundations that make good engineering decisions possible (probability/statistics, linear algebra, signal processing).
I keep the theory grounded through hands-on accelerated computing work. For example, in my final semester at UBC I took CPEN 512 (Parallel and Configurable Computer Architecture), where I implemented dense linear algebra kernels (matmul, LU decomposition) across multiple parallel programming models (OpenMPI, pthreads, CUDA, Bluespec, Vectorblox). For the final project, I implemented a Cooley–Tukey FFT using CUDA and OpenMPI.
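For context on the FFT project, here is a minimal single-process Python sketch of the radix-2 Cooley–Tukey recursion. It assumes a power-of-two input length and deliberately omits the CUDA/OpenMPI parallelization the course project was actually about.

```python
import numpy as np

def fft_cooley_tukey(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return x
    even = fft_cooley_tukey(x[0::2])  # DFT of the even-indexed samples
    odd = fft_cooley_tukey(x[1::2])   # DFT of the odd-indexed samples
    # Twiddle factors combine the two half-size DFTs into the full DFT.
    twiddle = np.exp(-2j * np.pi * np.arange(n // 2) / n)
    return np.concatenate([even + twiddle * odd,
                           even - twiddle * odd])

# Sanity check against NumPy's reference FFT.
x = np.random.rand(8) + 1j * np.random.rand(8)
assert np.allclose(fft_cooley_tukey(x), np.fft.fft(x))
```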
FlashAttention is an IO-aware attention algorithm that reduces memory traffic and improves throughput by tiling the computation, fusing the attention operations into a single kernel, and recomputing intermediates in the backward pass rather than storing the full attention matrix. This post outlines the key ideas, the practical implications for long-sequence workloads, and where the approach fits in modern transformer pipelines. I presented this paper to the UBC CPEN 511 class in February 2025. See the presentation below for the walkthrough, and download the source document for the full technical details.
Source document (PDF): Review_FlashAttention_AdinMauer.pdf
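To make the IO-aware idea concrete, below is a minimal NumPy sketch of the tiling-plus-online-softmax trick at the heart of FlashAttention: a running row maximum and softmax denominator are maintained while streaming over K/V blocks, so the full attention matrix is never materialized. The function name, block size, and sanity check are my own illustration, not code from the paper or the slides; the real kernel does this blockwise computation in on-chip SRAM with fused GPU kernels.

```python
import numpy as np

def flash_attention_sketch(Q, K, V, block=64):
    """Tiled attention with a running (online) softmax, in the spirit of
    FlashAttention. K/V are processed in blocks so the full N x N score
    matrix is never stored. The block size is illustrative."""
    n, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.zeros_like(Q)            # running (unnormalized) output
    row_max = np.full(n, -np.inf)     # running max score per query row
    row_sum = np.zeros(n)             # running softmax denominator per row
    for start in range(0, K.shape[0], block):
        Kb = K[start:start + block]
        Vb = V[start:start + block]
        scores = (Q @ Kb.T) * scale                # scores for this block only
        new_max = np.maximum(row_max, scores.max(axis=1))
        correction = np.exp(row_max - new_max)     # rescale earlier partials
        p = np.exp(scores - new_max[:, None])
        row_sum = row_sum * correction + p.sum(axis=1)
        out = out * correction[:, None] + p @ Vb
        row_max = new_max
    return out / row_sum[:, None]

# Sanity check against naive, fully materialized attention.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((128, 32)) for _ in range(3))
s = (Q @ K.T) / np.sqrt(32)
weights = np.exp(s - s.max(axis=1, keepdims=True))
ref = (weights / weights.sum(axis=1, keepdims=True)) @ V
assert np.allclose(flash_attention_sketch(Q, K, V), ref)
```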