Changyu Lin, Ph.D. · San Francisco Bay Area
Infrastructure engineer building everything from hyperscale telemetry to the physical layer.
I build systems end to end: observability across 200,000+ switches and millions of servers, LLM-driven production troubleshooting, solo-built AI agent products, and hardware projects that start with a 3D scan and ship in the real world.
I'm an infrastructure engineer and tech lead with 8+ years at ByteDance/TikTok and Meta, where I built and owned large-scale observability platforms spanning 200,000+ network switches and millions of servers: distributed metrics, logs, traces, syslog pipelines, topology modeling, time-series anomaly detection, and automated root-cause analysis.
I was an early mover on AI-native, LLM-driven troubleshooting, shipping tool-calling agents for production root-cause analysis before ChatGPT existed. Today I'm a solo founder, building an AI agent product end to end: runtime, tool orchestration, memory, frontend, and the production backend underneath it.
What ties my work together is respect for physical, production-grade systems and a habit of owning things from zero to shipped. That's true at fleet scale, and it is just as true in my garage.
MEMETOPIA
Founder / Solo AI Agent Builder
- Solo-built and shipped an AI companion across iOS and Android: backend services, streaming chat, proactive agent loops, frontend UX, deployment, and launch.
- Engineered the agent runtime and tool-orchestration layer: MCP integrations, hierarchical long-term memory, retrieval routing, evaluation harnesses, and human-in-the-loop control.
- Run the production stack across Postgres, vector stores, Redis, multi-pod Kubernetes, and end-to-end observability.
ByteDance / TikTok
Senior Software Engineer / Tech Lead
- Led data-center and network observability across 200K+ switches and millions of servers, including syslog parsing at 99.9%+ accuracy with auto-onboarding of new syslog patterns.
- Designed and built TopSight, an interactive React platform for live network topology and status that became the Infrastructure org's primary operations surface.
- Built threshold-free, TranAD-style anomaly detection that cut fault-detection time from 3-5 minutes to about 20 seconds.
- Pioneered LLM / tool-calling troubleshooting agents before ChatGPT, translating natural language into metrics, alerts, topology, and health checks for automated RCA.
Facebook / Meta
Senior Production Engineer
- Founding engineer of a network troubleshooting framework for large-scale production infrastructure.
- Built human-in-the-loop automation platforms combining scripted execution, engineer approval, and workflow state for fast, safe production changes.
Infinera
Senior System Engineer
- Built ML-based product-evaluation pipelines and designed multi-terabit long-haul optical transmission systems across modeling, simulation, experiments, and production engineering.
Research Internships
SubCom, Bell Labs, and Mitsubishi Electric Research Labs: LDPC codes, nonlinear optical transmission experiments, long-haul system modeling, and DSP simulation.
01
TopSight
Built the Infrastructure org's go-to operations surface: an interactive React platform for live network topology and status, adopted org-wide by dozens of engineers.
02
~20-second fault detection
Replaced 3-5 minute detection windows with threshold-free, TranAD-style time-series anomaly detection at fleet scale.
03
AIOps before it was a category
Shipped LLM / tool-calling RCA agents before ChatGPT, mapping natural language to metrics, alerts, topology, and health checks.
04
Hyperscale observability
Owned syslog and telemetry pipelines across 200K+ switches and millions of servers, with auto-onboarding for new patterns.
Observability & Telemetry
Large-scale metrics / logs / traces, syslog pipelines, threshold-free anomaly detection, topology modeling, SLOs/SLIs, alerting, automated RCA, Prometheus/Grafana-class stacks.
Distributed Systems & Platform
High-throughput event pipelines, control planes, fleet-scale API services, Go and Python on Kubernetes, ClickHouse, pgvector, Qdrant, Redis.
AI for Operations & Agents
LLM tool-calling agents, programmable knowledge bases, MCP orchestration, RAG with retrieval routing, evaluation loops, human-in-the-loop control.
Full-Stack Production
React / TypeScript frontends, design systems, REST / JSON APIs in Python and Go, containerized delivery on Kubernetes with CI/CD.
Beyond the data center
Same instinct, different tools.
The instinct that makes me good at infrastructure, owning a system end to end and staying close to the metal, also drives what I do for fun. I build real things with my hands and run them like production.
BUILD 01
Campervan, built from a 3D scan up
I built a campervan from zero: suspension, plumbing, windows, sealed penetrations, interior layout, cabinetry, surfaces, finish, and Home Assistant integration for lighting, power, climate, and monitoring.
BUILD 02
Home lab
Proxmox across multiple physical machines, GPU passthrough, UniFi networking, Nginx reverse proxy, Victron backup power, and an operations surface across the setup.
BUILD 03
Home automation & fabrication
Home Assistant runs lighting, climate, power, sensors, and automation across the house. I design and 3D-print brackets, mounts, enclosures, and parts that do not exist off the shelf.
Ph.D., Electrical & Computer Engineering
University of Arizona, 2016 · 10+ first-author papers on LDPC codes, modeling, and quantum information theory.
B.S.
University of Electronic Science and Technology of China, 2011 · ranked in the top 5% by academic performance and GPA.