Staff DevOps Engineer
Company: Zoom
Location: San Jose
Posted on: April 1, 2026
Job Description:
Immigration sponsorship is not available for this position.

What you can expect

We are hiring a Staff DevOps/Site Reliability
Engineer to ensure reliability, scalability, and operational
excellence for our real-time communications platform. This platform
supports audio/video conferencing, recording, and live-streaming
functionalities. The position requires expertise in infrastructure
engineering, global team collaboration, and cross-functional
partnerships.

About the Team

This team manages essential meeting
service operations at Zoom. They handle global, large-scale
distributed systems and advance communication technology to connect
individuals across physical distances.

Responsibilities

Ensuring reliability engineering and operations by owning the SLO/SLI framework for real-time services, defining, tracking, and improving latency, availability, jitter, and packet loss.
Leading incident response for critical outages across the real-time platform, coordinating across time zones and engineering disciplines.
Promoting a blameless postmortem culture and ensuring action items lead to measurable reliability enhancements.
Implementing chaos engineering and game day exercises to proactively identify failure modes before user impact occurs.
Building and evolving observability tools (dashboards, alerting systems, and distributed tracing) tailored for real-time media infrastructure challenges.
Serving as the architectural authority on deployment patterns, infrastructure design, and operational readiness for real-time services.
Reviewing and contributing to system design proposals, providing feedback on scalability, fault tolerance, and operational complexity.
Driving capacity planning, traffic modeling, and cost optimization strategies across globally distributed infrastructure.
Evaluating and recommending infrastructure tools, platforms, and vendors, including media servers, CDN providers, cloud-native services, and edge networking.
Ensuring consistent standards for CI/CD pipelines, deployment safety, and progressive rollout strategies across teams.
Acting as the primary SRE partner for multiple engineering teams building real-time features, attending planning sessions, and providing operational readiness guidance.
Collaborating closely with network engineering, security, product, and data teams to align on platform-wide reliability requirements.
Translating infrastructure constraints and reliability trade-offs into actionable recommendations for product leaders and engineering teams.
Establishing and advocating DevOps best practices (infrastructure-as-code, GitOps, automated testing, and deployment automation) across partner teams.
Guiding senior engineers on SRE principles, reliability patterns, and operational discipline.
Serving as a technical liaison between US-based and China/India-based engineering teams, bridging communication gaps and providing technical context.
Conducting architecture reviews, incident retrospectives, and planning sessions in English and Mandarin as appropriate.
Maintaining a flexible schedule to ensure meaningful overlap with teams in Beijing, Shanghai, Bangalore, and Hyderabad.
Building collaborative relationships across cultural and geographic boundaries, adapting communication styles to foster trust and alignment.
Ensuring engineering documentation, runbooks, and architectural decision records are accessible and
understandable for global team members.

What we’re looking for

10 years in DevOps, SRE, or infrastructure engineering roles, with at least 3 years at staff- or principal-level scope.
Have a proven track record of owning reliability for large-scale, distributed, latency-sensitive systems in production.
Have experience supporting real-time or media-heavy platforms (video conferencing, live streaming, gaming, trading systems, or similar).
Demonstrate ability to lead cross-functional technical initiatives without direct authority, driving alignment across engineering, product, and operations.
Have conceptual and architectural understanding of real-time communication protocols: WebRTC, RTP/RTCP, TURN/STUN, SDP, and SFU/MCU topologies.
Have solid expertise in cloud infrastructure (AWS, GCP, or Azure) and container orchestration (Kubernetes, Helm, ArgoCD).
Demonstrate proficiency with infrastructure-as-code tooling: Terraform, Pulumi, or equivalent.
Have experience with observability stacks: Prometheus, Grafana, Datadog, Jaeger, OpenTelemetry, or equivalent.
Have an understanding of networking fundamentals: BGP, anycast routing, DNS, load balancing, and CDN architecture.
Have experience using CI/CD tools such as GitHub Actions, Jenkins, and Spinnaker to streamline workflows and improve deployment processes.
Have experience implementing deployment safety practices such as canary releases, feature flags, and blue/green strategies to ensure reliable software delivery.
Demonstrate proficiency in Python, Bash, or Go for automation, tooling, and incident response without requiring advanced software development
expertise.

Salary Range or On Target Earnings:

Minimum: $124,000.00
Maximum: $271,200.00

In addition to the base salary and/or OTE listed, Zoom has a Total
Direct Compensation philosophy that takes into consideration base
salary, bonus, and equity value.

Note: Starting pay will be based on a number of factors and
commensurate with qualifications & experience. We also have a
location-based compensation structure; there may be a different
range for candidates in this and other locations.

At Zoom, we offer a window
of at least 5 days for you to apply because we believe in giving
you every opportunity. Below is the potential closing date, just in
case you want to mark it on your calendar. We look forward to
receiving your application!

Anticipated Position Close Date: 04/30/26

Ways of Working

Our structured hybrid approach is centered
around our offices and remote work environments. The work style of
each role (Hybrid, Remote, or In-Person) is indicated in the job
description/posting.

Benefits

As part of our award-winning
workplace culture and commitment to delivering happiness, our
benefits program offers a variety of perks, benefits, and options
to help employees maintain their physical, mental, emotional, and
financial health; support work-life balance; and contribute to
their community in meaningful ways.

About Us

Zoomies help people stay connected so they
can get more done together. We set out to build the best
collaboration platform for the enterprise, and today help people
communicate better with products like Zoom Contact Center, Zoom
Phone, Zoom Events, Zoom Apps, Zoom Rooms, and Zoom Webinars. We’re
problem-solvers, working at a fast pace to design solutions with
our customers and users in mind. Find room to grow with
opportunities to stretch your skills and advance your career in a
collaborative, growth-focused environment.

Our Commitment

At Zoom,
we believe great work happens when people feel supported and
empowered. We’re committed to fair hiring practices that ensure
every candidate is evaluated based on skills, experience, and
potential. If you require an accommodation during the hiring
process, let us know—we’re here to support you at every step. If
you need assistance navigating the interview process due to a
medical disability, please submit an Accommodations Request Form
and someone from our team will reach out soon. This form is solely
for applicants who require an accommodation due to a qualifying
medical disability. Non-accommodation-related requests, such as
application follow-ups or technical issues, will not be
addressed.
Keywords: Zoom, Ceres, Staff DevOps Engineer, IT / Software / Systems, San Jose, California