About Me

Hi there! I'm Jinyang Su, known online as susun. I'm a systems engineer working on LLM inference and infrastructure.

what i do now

I work as an LLM inference engineer, building production serving infrastructure. My main project is Pegaflow, a distributed KV cache system with RDMA support for LLM inference. The daily work is a mix of storage architecture, networking, and GPU-side optimization: figuring out how to move data as fast as possible so models can serve at scale.

On the side, I'm building pegainfer — a from-scratch LLM inference engine written in Rust with hand-written CUDA kernels.

where i come from

Before LLM inference, I was a database storage engineer. I worked on storage engines, write-ahead logs, compaction strategies, and all the low-level plumbing that makes databases reliable. That background turns out to be surprisingly relevant — distributed caching, eviction algorithms, async I/O, and memory management are just as central to inference serving as they are to databases.

what i think about

I care about controlling complexity. The essence of programming is managing complexity, and I've learned (sometimes the hard way) that understanding must come before delegation — whether to a teammate or an AI coding agent. I use OKRs to keep myself focused on what matters, and I try to remind myself: if everything is equally important, nothing is.

this blog

Writing forces clarity. This blog is where I write about systems engineering, inference optimization, RDMA, storage internals, and lessons from building production systems. I write primarily for my future self — to solidify understanding and document the journey. If others find it useful, that's a bonus.

get in touch

You can find my work on GitHub. Feel free to reach out if you want to chat about systems, inference, or anything in between.