Site Reliability Engineer 3

PhonePe
Location: Bangalore
Job Type: Full-time
Posted: March 12, 2026

Job Description

About PhonePe Limited:

PhonePe Limited is headquartered in India. Its flagship product, the PhonePe digital payments app, was launched in August 2016. As of April 2025, PhonePe has over 60 crore (600 million) registered users and a digital payments acceptance network spread across over 4 crore (40+ million) merchants. PhonePe also processes over 33 crore (330+ million) transactions daily, with an Annualized Total Payment Value (TPV) of over INR 150 lakh crore.

PhonePe’s portfolio of businesses includes the distribution of financial products (Insurance, Lending, and Wealth) as well as new consumer tech businesses in India (Pincode, a hyperlocal e-commerce platform, and Indus AppStore, a localized app store for the Android ecosystem). These are aligned with the company’s vision to offer every Indian an equal opportunity to accelerate their progress by unlocking the flow of money and access to services.

Culture:

At PhonePe, we go the extra mile to make sure you can bring your best self to work, every day! That starts with creating the right environment for you. We empower people and trust them to do the right thing. Here, you own your work from start to finish, right from day one. PhonePe-rs solve complex problems and execute quickly, often building frameworks from scratch. If you’re excited by the idea of building platforms that touch millions, ideating with some of the best minds in the country, and executing on your dreams with purpose and speed, join us!

Summary

We are seeking a highly motivated and experienced Site Reliability Engineer (SRE) with 7 to 12 years of experience to manage, scale, and ensure the high availability of our core infrastructure. This role involves deep expertise in cloud services, automation, monitoring, and complex networking to support a high-volume, mission-critical environment.

Key Responsibilities

  • Cloud & Infrastructure: Configure, maintain, and manage services and packages on Ubuntu Virtual Machines in Azure. Design and manage Azure components for log storage, management, alerting, and monitoring.
  • Networking & Connectivity: Configure and maintain complex network components including Azure Firewall, Route Tables, Virtual Network Gateways, and ExpressRoute. Establish and manage IPsec and ExpressRoute connectivity with external environments. Manage routing, troubleshoot connectivity issues, and support network component migrations with minimal downtime.
  • Automation & IaC: Drive automation for all BAU tasks using Terraform, SaltStack, Ansible, and scripting languages. Write new Terraform code for infrastructure components.
  • Database & Data Management: Set up and manage high-availability services like MySQL and Aerospike. Implement database replication across regions, manage migrations, and ensure data sync. Handle backups of databases, logs, and configurations.
  • Monitoring & Observability: Implement and manage monitoring (e.g., Prometheus, VictoriaMetrics, Riemann) and centralized logging (Loki) solutions, with visualization on Grafana. Troubleshoot performance and system issues at the OS, platform, or application level.
  • Security & Compliance: Manage firewalls and integrate platform and VM-level services with the SOC. Collaborate with Infosec teams to evaluate and fix security vulnerabilities.
  • Capacity & Performance: Conduct proactive capacity planning. Manage critical infrastructure components like Nginx, HAProxy, Docker, and RMQ.
  • Incident Management & DR: Participate in an on-call rotation. Structure and lead incident response, Root Cause Analysis (RCA), and post-mortem creation. Set up and support planning and execution of DR sites and failovers.

Required Technical Expertise

  • Cloud Platform (Microsoft Azure):
    • Core Services: Deep, hands-on experience with Microsoft Azure components, including Virtual Machines (Ubuntu/Linux), Azure Storage Accounts, Cosmos DB, and Azure Data Explorer (ADX).
    • Networking: Expert-level knowledge in configuring and managing complex Azure networking components: Azure Firewall, Azure Route Tables, Virtual Network Gateways, Azure ExpressRoute, and Azure Private DNS. Must be proficient in setting up and troubleshooting routing using protocols like BGP with on-prem DCs and managing network component migrations with minimal downtime.
    • Security/Compliance: Experience integrating platform and VM-level services with the Security Operations Center (SOC) and collaborating with Infosec teams on vulnerability evaluation and remediation.
  • Operating Systems & Scripting:
    • OS: Expert proficiency in Linux environments, specifically Ubuntu, for system administration, service configuration, and performance troubleshooting at the OS level.
    • High-Level Language: Deep expertise in at least one high-level language (Python, Go, or Java) for writing automation, services, and tooling.
    • Shell Scripting: Mastery of shell scripting (Bash) is essential for day-to-day operational tasks and automation.
  • Monitoring, Observability & Logging:
    • Monitoring: Extensive experience implementing and maintaining modern monitoring systems such as Prometheus, VictoriaMetrics, and Riemann.
    • Logging: Proficiency with centralized log management using Loki for log ingestion, enrichment, lifecycle management, and providing a search/view platform.
    • Visualization: Expertise in creating and managing dashboards for visualization and alerting using Grafana.
  • Configuration Management & IaC (Infrastructure as Code):
    • IaC: Mastery of Terraform for writing new component configurations and building automation for BAU (Business As Usual) tasks.
    • Configuration Management: Strong experience with configuration management tools like SaltStack (or Ansible) for automated deployment and configuration of services on VMs.
  • Databases & Data Stores:
    • High-Availability Data Stores: Hands-on experience setting up, managing, and scaling high-availability databases like MySQL and Aerospike.
    • Time-Series/Search: Familiarity with Elasticsearch and time-series databases like InfluxDB.
    • Replication/DR: Expertise in database replication between different regions, managing database migrations, setting up circular replication, and ensuring data sync during system and network issues.
  • Core Infrastructure Services:
    • Web/Proxy: Expert management of critical infrastructure components like Nginx and HAProxy, including proxy management, endpoint addition, header configuration, and writing rewrite rules.
    • Messaging/Container: Experience with messaging queues like RMQ (RabbitMQ) and containerization technology like Docker.
    • Networking Services: Deep knowledge of DNS and other core network protocols.

Essential Soft Skills & Qualifications

  • Ownership and Accountability: A proactive approach to identifying and solving infrastructure challenges before they impact service availability.
  • Communication: Excellent written and verbal skills for documenting procedures, creating runbooks, and communicating with technical and non-technical stakeholders.
  • Mentorship: (For senior roles) Ability to mentor junior engineers and promote SRE best practices across the organization.
  • SLO/SLA Management: Experience defining, monitoring, and meeting Service Level Objectives (SLOs) and Service Level Indicators (SLIs) for critical services.
  • Toil Reduction: A commitment to measuring and actively reducing operational toil through automation (e.g., using SRE's Toil Reduction framework).
  • Cost Optimization: Experience identifying and implementing cloud resource optimization and cost-saving measures within the Azure environment.



PhonePe Full Time Employee Benefits (Not applicable for Intern or Contract Roles)

  • Insurance Benefits - Medical Insurance, Critical Illness Insurance, Accidental Insurance, Life Insurance
  • Wellness Program - Employee Assistance Program, Onsite Medical Center, Emergency Support System
  • Parental Support - Maternity Benefit, Paternity Benefit Program, Adoption Assistance Program, Day-care Support Program
  • Mobility Benefits - Relocation Benefits, Transfer Support Policy, Travel Policy
  • Retirement Benefits - Employee PF Contribution, Flexible PF Contribution, Gratuity, NPS, Leave Encashment
  • Other Benefits - Higher Education Assistance, Car Lease, Salary Advance Policy

Our inclusive culture promotes individual expression, creativity, innovation, and achievement, and in turn helps us better understand and serve our customers. We see ourselves as a place for intellectual curiosity, ideas, and debates, where diverse perspectives lead to deeper understanding and better quality results. PhonePe is an equal opportunity employer and is committed to treating all its employees and job applicants equally, regardless of gender, sexual preference, religion, race, color, or disability. If you have a disability or special need that requires assistance or reasonable accommodation during the application and hiring process, including support for the interview or onboarding process, please fill out this form.

Read more about PhonePe on our blog.

Job Information

Source: greenhouse
Remote Type: onsite
Allowed Locations: Bangalore
Skills & Tags: Site Reliability
