CMS Workflow Management DevOps
Developing, operating, and extending WMCore + Unified so that CMS Monte Carlo and reconstruction workloads stay healthy across WLCG sites.
CMS Workflow Operations
I maintain the workflow management stack (WMCore + Unified) for the CMS experiment. This system handles scheduling for Monte Carlo production and data reconstruction across the Worldwide LHC Computing Grid (WLCG).
Key Contributions
- Operational Console: Built a React and FastAPI interface for operators. It replaces manual shell scripts, making it easier to triage requests, find bottlenecks, and update policies.
- Modern Deployment: Migrated critical services to Kubernetes, with deployments managed through ArgoCD. This makes updates safer and rollbacks quick.
- Monitoring: Created dashboards in OpenSearch and Grafana to track telemetry from WMCore and our databases (Oracle, MongoDB, MySQL). This gives us visibility into dataset delays before they become a problem.
- Automation: Automated the application of priority and site policies, reducing the need for manual intervention.
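As an illustration only, the kind of rule that priority and site-policy automation encodes can be sketched in a few lines of Python. This is a hypothetical sketch, not WMCore or Unified code: the `Request` fields, the campaign priority floor, the staleness threshold, the priority bump, and the site names are all assumptions chosen for the example.

```python
"""Hypothetical sketch of rule-based priority and site-policy automation.

None of these names come from WMCore or Unified; the request fields,
thresholds, and site names are illustrative assumptions only.
"""
from dataclasses import dataclass, field, replace
from datetime import datetime, timedelta, timezone


@dataclass
class Request:
    # Hypothetical minimal view of a workflow request.
    name: str
    priority: int
    campaign: str
    submitted: datetime
    sites: list[str] = field(default_factory=list)


# Assumed policy tables; a real system would load these from a config store.
CAMPAIGN_FLOOR = {"urgent-physics": 200000}  # minimum priority per campaign
STALE_AFTER = timedelta(days=7)              # age that triggers a priority bump
AGE_BUMP = 10000                             # increment applied to stale requests
SITE_BLACKLIST = {"T2_XX_Example"}           # sites temporarily excluded


def apply_policies(req: Request, now: datetime) -> Request:
    """Return a new Request with priority and site policies applied."""
    # Raise the priority to the campaign floor, if one is defined.
    priority = max(req.priority, CAMPAIGN_FLOOR.get(req.campaign, 0))
    # Bump requests that have waited longer than the staleness threshold.
    if now - req.submitted > STALE_AFTER:
        priority += AGE_BUMP
    # Drop any site that is currently excluded by policy.
    sites = [s for s in req.sites if s not in SITE_BLACKLIST]
    # dataclasses.replace builds a copy, leaving the input unchanged.
    return replace(req, priority=priority, sites=sites)


if __name__ == "__main__":
    now = datetime(2024, 6, 1, tzinfo=timezone.utc)
    req = Request("example_wf", 100000, "urgent-physics",
                  now - timedelta(days=10), ["T1_XX_A", "T2_XX_Example"])
    out = apply_policies(req, now)
    print(out.priority, out.sites)  # 210000 ['T1_XX_A']
```

Keeping each policy as a small, pure function of the request and the current time makes the rules easy to unit-test and to review before they are allowed to change priorities on live workflows.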
Outcomes
- Faster Turnarounds: Reduced dataset turnaround time by automating routine operational tasks.
- Recognition: Received the 2024 CMS Award for modernizing the software base and improving tooling for the production teams.
- Stability: Established clear operational policies that align physicists and engineers, keeping the pipeline predictable.

