Site Reliability Engineer Lead
Company: Bank of America
Location: Charlotte
Posted on: April 1, 2026
Job Description:
At Bank of America, we are guided by a common
purpose to help make financial lives better through the power of
every connection. We do this by driving Responsible Growth and
delivering for our clients, teammates, communities and shareholders
every day. Being a Great Place to Work is core to how we drive
Responsible Growth. This includes our commitment to being an
inclusive workplace, attracting and developing exceptional talent,
supporting our teammates’ physical, emotional, and financial
wellness, recognizing and rewarding performance, and how we make an
impact in the communities we serve. Bank of America is committed to
an in-office culture with specific requirements for office-based
attendance, while allowing an appropriate level of flexibility
for our teammates and businesses based on role-specific
considerations. At Bank of America, you can build a successful
career with opportunities to learn, grow, and make an impact. Join
us!
This job is responsible for partnering with
engineering and technology teams to implement measures prescribed
by the Site Reliability Engineer teams it leads. Key
responsibilities include ensuring appropriate instrumentation,
tooling, ticketing, alerting, and on-call routines are in place for
key services, demonstrating technical expertise within domains, and
decomposing objectives into work units. Job expectations include
advancing efficient solution delivery practices and promoting
exceptional design, engineering, and organizational practices.
Role Overview:
The individual in this role is accountable for
establishing and maintaining partnerships with Application
Development and Production Support teams to implement the measures
prescribed through the collaboration of the Senior Site Reliability
Engineer (SRE) and the SRE team(s) they are leading. This
includes ensuring the appropriate instrumentation,
tooling, ticketing, alerting, and on-call routines are in place for
key services. This role demonstrates a high level of technical
expertise within one or more technical domains. This role
demonstrates the ability to decompose issues or objectives into
units of work that can be assigned to other team members. This
individual will advocate and advance more efficient solution
delivery practices and evangelize great design, engineering, and
organizational practices.
Responsibilities:
- Collaborates with Development and Infrastructure teams to
  understand technical solutions and implement monitoring
  capabilities outlined in the application and system monitoring
  designs put forward by the Senior Site Reliability Engineer (SRE)
- Develops and maintains reliability scripts, tools, and libraries
  and leverages them for common instrumentation, automation, and
  operational needs, and when mentoring SRE resources on reliability
  practices and established tools/capabilities
- Partners to implement code changes to make use of common
  reliability libraries and tools and helps Application Production
  Services and Application Development teammates understand how to
  use them
- Participates regularly in architecture community of practice
  meetings and communication via other channels
- Identifies vulnerabilities and opportunities for reliability
  improvement, such as investigating low-level error rates and
  'noise' in monitoring, and defines solutions to reduce manual
  support effort and/or improve system reliability
- Engages as a subject matter expert in major incident triage
  efforts and failure scenario modelling, and works with the Problem
  Manager to diagnose root causes for major incident/problem
  management investigations
- Define and
  maintain a multi-year stability roadmap aligned with business
  objectives and technology strategy
- Identify critical dependencies, risks, and mitigation strategies
  across infrastructure, applications, and services
- Work with the architects to develop and adhere to the enterprise
  architectural patterns and frameworks that enhance system
  reliability and fault tolerance
- Ensure designs adhere to best practices for high availability,
  disaster recovery, and performance optimization
- Establish stability metrics, KPIs, and compliance standards for
  technology teams
- Drive adoption of reliability engineering principles across
  development and operations
- Partner with engineering, operations, and product teams to embed
  stability into the software development lifecycle
- Act as a trusted advisor to senior leadership on stability-related
  initiatives and investments
- Monitor emerging technologies and industry trends to enhance
  stability strategies
- Lead post-incident reviews and ensure lessons learned are
  incorporated into future designs
- Collaborate with Development and Infrastructure teams to
  understand technical solutions and to implement the monitoring
  capabilities outlined in the application and system monitoring
  designs put forward by the Senior SRE
- Develop and maintain a catalog of extensible reliability scripts,
  tools, and libraries that can be leveraged for common
  instrumentation, automation, and operational needs
- Partner to implement code changes to make use of common
  reliability libraries and tools and help the Application
  Production Services (APS) and Application Development teammates
  understand how to use them
- Partner with infrastructure engineers and application teams to
  implement the necessary code changes to make use of common
  reliability libraries and tools and help the APS and Application
  Development teammates understand how to use them
- Engage as a subject matter expert (SME) in major incident triage
  efforts and failure scenario modelling, and work with the Problem
  Manager to diagnose root causes for major incident / problem
  management investigations
- Identify vulnerabilities and opportunities for reliability
  improvement, such as investigating low-level error rates and
  'noise' in monitoring, and help define solutions to reduce manual
  support effort and/or improve system reliability
Required Qualifications:
- 8 years in technology
  architecture, reliability engineering, or infrastructure strategy
  roles
- Proven track record of delivering stability-focused initiatives in
  large-scale environments
- Strong knowledge of distributed systems, cloud architecture (AWS,
  Azure, GCP), and microservices
- Experience with reliability engineering, chaos testing, and
  observability tools
- Ability to influence cross-functional teams and communicate
  complex concepts to non-technical stakeholders
Desired Qualifications:
- SRE Certification
Skills:
- Automation
- Collaboration
- Influence
- Production Support
- Result Orientation
- Analytical Thinking
- Application Development
- Architecture
- Solution Design
- Stakeholder Management
- Adaptability
- DevOps Practices
- Project Management
- Risk Management
- Solution Delivery Process
Shift: 1st shift (United States of
America)
Hours Per Week: 40
Pay Transparency details
US - NJ - Pennington - 1300 American Blvd - Hopewell Bldg 3 (NJ2130)
Pay and benefits information
Pay range
$125,300.00 - $167,900.00 annualized salary, offers to be determined
based on experience, education and skill set.
Discretionary incentive eligible
This role is eligible to participate in the annual discretionary
plan. Employees are eligible for an annual discretionary award based
on their overall individual performance results and behaviors, the
performance and contributions of their line of business and/or
group, and the overall success of the Company.
Benefits
This role is currently benefits eligible. We provide
industry-leading benefits, access to paid time off, resources and
support to our employees so they can make a genuine impact and
contribute to the sustainable growth of our business and the
communities we serve.
Keywords: Bank of America, Charlotte, Site Reliability Engineer Lead, IT / Software / Systems, Charlotte, North Carolina