Job Description
ABOUT US
We are building a large-scale, real-time sports data platform that serves millions of users daily and requires ultra-low latency and high reliability.
Our entire backend infrastructure runs on Google Cloud Platform (GCP) and Google Kubernetes Engine (GKE).
As we transition toward a full GitOps model for infrastructure and application deployment, we are looking for a DevOps Engineer with strong expertise in Kubernetes, GCP, and GitOps (ArgoCD) to lead this automation transformation.
JOB OBJECTIVES
Implement and manage GitOps workflows: Use ArgoCD to synchronize Kubernetes application states with declarative configurations stored in Git.
Ensure consistency and traceability: Manage all infrastructure and application configurations as code (IaC/CaC) to enable auditing, versioning, and easy rollback.
Build a complete CI/CD system: Integrate CI pipelines (build, test, push image) with ArgoCD for secure and efficient continuous deployment.
Optimize and monitor infrastructure: Operate and tune GCP and GKE environments for performance, cost efficiency, and high availability, especially for real-time workloads.
KEY RESPONSIBILITIES
GitOps & Deployment Management
Set up auto-sync and alerting for out-of-sync changes between Git and live clusters.
Design Git repository structures for manifests (Helm, Kustomize, YAML).
Install, configure, and manage ArgoCD for end-to-end application lifecycle management on GKE.
CI/CD Automation
Integrate CI workflows to automatically update image tags in Git repositories, triggering ArgoCD deployments.
Build CI pipelines using GitLab CI or GitHub Actions to automatically build Docker images and push them to Google Artifact Registry.
Infrastructure as Code (IaC)
Manage GCP resources (GKE, VPC, IAM) using Terraform.
Build and maintain secure, scalable Kubernetes clusters.
Monitoring, Optimization & Operations
Deploy monitoring and logging systems using Prometheus, Grafana, and Google Cloud Operations Suite.
Design effective auto-scaling strategies to handle real-time traffic spikes.
Ensure high availability (HA) and implement disaster recovery (DR) strategies.
Conduct FinOps optimization to manage and reduce cloud costs.
Collaborate with the Backend Team to operate infrastructure for data pipelines and message queues (e.g., Google Pub/Sub, Kafka).