Overview
Team: Infrastructure, Platform & Site Reliability Engineering
Employment Type: Full-Time, Permanent
Work Style: Hybrid
Job Overview
At Datemil Inc., our platforms depend on stable, scalable, and secure systems to support a growing user base across multiple products. As a Senior Reliability Engineer – Platform & Infrastructure, you will help design, build, and operate the infrastructure that keeps our services reliable, performant, and available at all times.
This role focuses on system reliability, production stability, automation, and scalability across cloud and hybrid environments. You will work closely with Engineering, Data, Security, and Product teams to ensure our systems are built to handle growth while maintaining strong uptime and performance.
We are looking for an experienced engineer who enjoys solving complex operational problems, improving system reliability, and building tools that make production environments safer and easier to manage.
Reliability & Production Stability
Maintain high availability of production systems.
Monitor system health, uptime, and performance.
Respond to incidents and outages.
Perform root cause analysis and implement long-term fixes.
Improve reliability across services and infrastructure.
Infrastructure & Platform Engineering
Design and maintain cloud and hybrid infrastructure.
Support backend services, APIs, and data systems.
Improve system scalability and performance.
Help define platform architecture standards.
Automation & Infrastructure as Code
Automate operational tasks and deployments.
Maintain infrastructure-as-code environments.
Improve CI/CD pipelines.
Reduce manual work through scripting and tooling.
Cloud & Distributed Systems
Support cloud environments (AWS, GCP, Azure, or similar).
Manage networking, compute, storage, and load balancing.
Ensure secure and efficient service communication.
Optimize resource usage and cost.
Monitoring, Logging & Observability
Build and maintain monitoring and alerting systems.
Improve logging and tracing.
Ensure visibility across services.
Help teams detect problems early.
Performance & Capacity Planning
Analyze system performance.
Identify bottlenecks.
Plan for growth and traffic increases.
Ensure systems scale safely.
Security & Reliability Practices
Work with Security teams to maintain safe environments.
Implement access controls and protections.
Support secure deployment processes.
Ensure production environments follow security and reliability best practices.
Cross-Functional Collaboration
Work with Engineering, Product, Security, and Data teams.
Support new feature launches.
Provide operational guidance during development.
Help teams design reliable systems.
Incident Response & Operational Excellence
Participate in incident response.
Improve runbooks and procedures.
Support on-call rotations when needed.
Drive continuous improvement in reliability practices.
Documentation & Standards
Maintain technical documentation.
Define operational guidelines.
Support change management processes.
Promote reliability best practices.
Typically requires 5–8 years of experience in Site Reliability, Infrastructure, DevOps, or Platform Engineering.
Strong experience with Linux systems.
Experience with cloud platforms (AWS, GCP, Azure, or similar).
Experience supporting production environments.
Strong scripting or programming skills (Python, Bash, Go, or similar).
Experience with monitoring and logging tools.
Experience with CI/CD pipelines.
Strong troubleshooting skills.
Comfortable working in fast-moving environments.
Strong attention to detail.
Experience with Kubernetes or container platforms.
Experience with Infrastructure as Code (Terraform, CloudFormation, etc.).
Experience with distributed systems.
Experience with high-traffic platforms.
Experience with networking fundamentals.
Experience with security best practices.
Salary Range: $130,000 – $155,000 annually, depending on experience and qualifications.
Datemil Inc. is the parent company of Datemil Date, V.I.Pursuit, Concierge Matchmaking, Plink Bestie, Networking, and Coaching. Our platforms rely on strong infrastructure, reliable systems, and thoughtful engineering to support safe and meaningful connections.
We encourage responsible use of automation and AI tools to improve system reliability and operational efficiency while maintaining strong human oversight for production environments.
Candidates may use AI tools during the application process but should not use them to misrepresent experience.
Datemil Inc. is committed to building an inclusive workplace where everyone is respected and supported. We welcome applicants from all backgrounds and provide reasonable accommodations throughout the hiring process.
We may use AI-assisted tools during recruitment, such as transcription or resume matching. These tools improve efficiency, but all hiring decisions are made by people.
Participation in AI-supported interviews is optional.
Work on large-scale production systems
Build reliable infrastructure
Collaborate with engineering and security teams
Support growing platforms
Help design systems built for scale
Ready to apply for this role?
If this position matches your background and interest, continue to the application contact flow for this specific role.