Building an end-to-end inference stack for open-source LLMs and documenting lessons learned.
Jan 1, 2025