I am a Senior Site Reliability Engineer with hands-on experience in platform reliability, observability, disaster recovery, cloud operations, and infrastructure automation. My current trajectory is focused on applying that production engineering background to MLOps platforms, where reliability, telemetry, repeatability, and operational discipline matter as much as model performance.

| Launch Path |
|---|
| 🌍 Earth ── Launch ── Orbit ── Transfer ── Approach ── 🔴 Mars |

| Flight Deck | |
|---|---|
| Objective | Build reliable systems that can move from idea to production with the same confidence as a launch checklist. |

| Core Areas |
|---|

| Area | What I bring |
|---|---|
| Current role | Senior Site Reliability Engineer |
| Growth track | Transitioning into MLOps and production ML platforms |
| Core strengths | System monitoring, disaster recovery, automation, platform reliability |
| Cloud background | AWS, Azure, Google Cloud |
| Delivery style | Infrastructure as code, measurable operations, scalable systems |
| Long-term focus | Build ML systems that are observable, reproducible, and reliable in production |

- Built and improved observability platforms using Prometheus, Grafana, and modern telemetry practices.
- Automated infrastructure and recovery workflows to reduce operational risk and increase repeatability.
- Improved system throughput and platform scalability through performance tuning and orchestration work.
- Led cloud-native reliability efforts with Kubernetes, Terraform, CI/CD pipelines, and GitHub Actions.
- Applied DevOps discipline to create the foundation for scalable MLOps workflows.

| Focus Vector |
|---|
| Designing dependable infrastructure that supports modern ML workloads. |
| Turning SRE discipline into production-grade MLOps practices. |
| Building systems that are observable by default and recoverable under pressure. |

If you are building cloud infrastructure, platform engineering, observability, or MLOps systems, I am open to connecting and collaborating.




