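Fine-Tuning a 235B Model on Eight 4090s: How RoundPipe Breaks Through the Ceiling of Consumer-GPU Training
RoundPipe uses a stateless GPU pool and asymmetric pipeline partitioning to LoRA fine-tune a 235B MoE model on eight 4090s, reaching 76% or more of the throughput of an A800 setup.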
Algorithm Engineer. System Builder. AI Explorer.
I'm an algorithm engineer at a leading digital marketing group, where I design and build real-time bidding systems, ML model serving pipelines, and budget optimization algorithms for programmatic advertising at scale. My day-to-day involves Go and TensorFlow Serving — turning ad auction math into production models that handle millions of bid requests.
On the side, I run an AI infrastructure project: an LLM API gateway aggregating 40+ model providers, a lightweight agent framework, and a service quality monitoring system built on real-token probing. I care about systems that actually work under load — not just demos.
My path: from search-ads-rec system architecture to algorithm research. Currently exploring LLM4Rec and unified sequence modeling for large-scale recommendation — where transformer architectures meet feature interaction in conversion prediction. I believe the best way to understand a system is to build it yourself.
A native macOS voice-to-text app — press Fn, speak, and polished text lands at your cursor in any app.
A production-ready multi-agent platform with sandboxed execution, budget control, and observability.
A Claude Code skill that generates daily AI/tech intelligence reports from Hacker News and HuggingFace Papers.
A Claude Code skill that generates importable Excalidraw architecture diagrams from source code.
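Claw-Eval-Live reveals three failure modes of static agent evaluation and proposes a demand-driven live benchmark design: the task distribution is refreshed quarterly, while results remain reproducible within each version.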