Sr. Site Reliability Engineer (SRE)
Company: Siemens Mobility
Location: Charlotte
Posted on: January 16, 2025
Job Description:
In the ever-changing SaaS landscape, a few persistent practices
contribute to developing quality solutions with speed: ensuring
operational activities are treated as software development
enhancements, remediating manual tasks through automation, and
reducing risk through compartmentalization of services/code and
consumption of readily available provider services.
Product/development teams require an accountable partner to advance
on these topics. The SRE (Site Reliability Engineering) team will
be this partner.

The SRE team will support the Siemens Xcelerator
platform and will be responsible for identifying, managing,
improving, and reporting on availability, resiliency, reliability,
and stability efficiencies. This includes providing technical
guidance and leadership to drive solutions, create & enhance
processes that deliver excellence. A strong relationship with the
various product teams of the Xcelerator platform is necessary to
support core objectives. This role's success will be defined by
product teams meeting their SLOs with healthy product adoption and
operational excellence.

This position will be responsible for supporting
technology and culture through an enterprise ecosystem to ensure
developers and products exceed product SLOs (Service Level
Objectives) and clearly, without dispute, benefit from every
interaction with the SRE team.

Responsibilities
- Incident Management, Game Day coordination.
- Create and drive metrics/observability solutions and
reviews.
- Support production readiness reviews.
- Act as a cross-division role model to advance the SRE practice
within Siemens.
- Maintain complete technological command of automation methods,
codification of operational activities, microservice architecture,
and platform engineering to ensure changes, updates, or technical
advancements are in place for a product.
- Ensure the team can provide the design, deployment, automation,
and scripting solutions to drive new capabilities, visibility, and
efficiency.
- Simplify highly complex ideas, architectures and concepts to
encourage achievable adoption.
- Collaborate with other technical platforms and partners to
engineer automated and integrated solutions between tools,
services, and teams that increase availability, reliability, and
performance.
- Own the internal and external SLAs and ensure they meet or exceed
expectations.
- Be part of maintaining a 24x7, global, highly available SaaS
environment.
- Participate in an on-call rotation that supports our production
infrastructure.
- Troubleshoot production availability incidents that often span
multiple teams and services.
- Ensure the SRE team can coordinate production incident
post-mortems and contribute to solutions that prevent problem
recurrence, with the goal of automated response to all
non-exceptional service conditions.
- Communicate with business and technical partners about incidents
as they occur when they impact system performance or availability
at a critical level.

Required Knowledge/Skills, Education, and Experience
- Bachelor's Degree or equivalent experience.
- Proven experience as a Site Reliability Engineer or equivalent
role.
- Experience working in a large organization through an SRE
transformation where existing applications were adapted to
contemporary targets.
- Proven experience with automation via scripting & API
development.
- Experience with software development in the cloud.
- Experience with monitoring tools (Datadog, CloudWatch,
CloudTrail, Cloudability, or equivalent tools).
- Proven experience with containerization, specifically
Kubernetes.
- Experience with Amazon Web Services (AWS) services and
Terraform, CloudFormation, Ansible, or equivalent tools.

Preferred Knowledge/Skills, Education, and Experience
- Desired certifications include Datadog, Kubernetes, Security, and
AWS.
- Understanding of ITIL.
- Deep understanding of SRE and incident management
strategies.
- Experience with issue/incident tracking tools (ServiceNow,
ServiceDesk, Jira, or equivalent tools) and open source tools
(Linux, Python, Git, Ansible).
- Experience in an enterprise IT setting with distributed
environments.
- Knowledge of networking concepts, including firewalls, VPN,
routing, load balancers, security, and DNS.
- Senior-level system administration experience, including
troubleshooting, support, mentorship/training, and oversight.

Why us?
Working at Siemens Software means flexibility: choosing between
working from home and working at the office is the norm here. We
offer great benefits and rewards, as you'd expect from a world
leader in industrial software.

We are a collection of over 377,000 minds building the future, one
day at a time, in over 200 countries. We're
dedicated to equality, and we welcome applications that reflect the
diversity of the communities we work in. All employment decisions
at Siemens are based on qualifications, merit, and business need.
Bring your curiosity and creativity and help us shape
tomorrow!

Siemens Software. Transform the Everyday

The salary range
for this position is $105,100 to $189,200 and this role is eligible
to earn incentive compensation. The actual compensation offered is
based on the successful candidate's work location as well as
additional factors, including job-related skills, experience, and
relevant education/training. Siemens offers a variety of health and
wellness benefits to employees. Details regarding our benefits can
be found . In addition, this position is eligible for time off in
accordance with Company policies, including paid sick leave, paid
parental leave, PTO (for non-exempt employees) or non-accrued
flexible vacation (for exempt employees).