
Sorry, this position is no longer available. You are still welcome to submit your application, as we may have other positions that would be a good fit for you. Alternatively, you may want to apply to one of the following related jobs:

Site Reliability Engineering (SRE)

Bogotá, DC 11023

Posted: 02/04/2026
Industry: IT/Software Development
Job Number: 26-16211
Pay Rate: 14-15 USD/Hour

Job Description

  • Design, deploy, and configure customer-facing infrastructure, applications, and services
  • Design and manage cloud infrastructure and services that meet enterprise-grade SLA standards
  • Resolve customer escalations and help prevent recurrence of those incidents by creating processes, procedures, and automations
  • Monitor, diagnose, and resolve urgent production issues, including during periods outside normal business hours
  • Create and deploy scalable monitoring systems for a rapidly growing global infrastructure
  • Write, augment, and maintain production documentation

Technical Skills
  • Incident Response & Production Support: Skilled in triaging live outages, assessing blast radius, and driving rapid mitigation.
  • Observability & Monitoring (Datadog): Experienced with metrics, logs, traces, alert tuning, and noise reduction.
  • Kubernetes Operations: Troubleshooting pods, deployments, restarts, resource constraints, and service health.
  • AWS Fundamentals: Proficient in EC2, IAM, networking basics, and queues/events.
  • Incident Management Platforms: Working knowledge of ServiceNow, Jira, and incident.io for incident lifecycle tracking.
  • Root Cause Investigation: Systems thinking across distributed services, dependencies, and failure modes.
  • Automation Mindset: Ability to script or build lightweight solutions to reduce repetitive operational toil.
  • AI-First Tool Adoption: Leveraging Bedrock/Claude to accelerate analysis, documentation, and operational workflows.

Professional Skills
  • Bridge Call Leadership: Confident in running live incident calls with structure, calmness, and urgency.
  • Clear Written Communication: Strong in incident updates, stakeholder messaging, and postmortem documentation.
  • Autonomy in Ambiguity: Effective in dynamic, remote, multi-timezone environments.
  • Cross-Team Collaboration: Skilled at engaging partner engineering teams and driving alignment during response.
  • Operational Ownership: Accountable beyond mitigation, ensuring fixes, learnings, and improvements are implemented.
  • Service Desk Discipline: Familiar with high-volume ticket workflows and structured incident processes.
