Job Requirements
What You'll Do (Key Responsibilities)
- Contribute to disaster recovery planning, backup strategies, and business continuity initiatives.
- Deploy, configure, and operate critical data infrastructure including PostgreSQL, MongoDB, OpenSearch, service buses, message queues, and serverless functions (Lambda).
- Architect and implement containerized solutions using Docker and orchestrate complex deployments on Kubernetes (K8s) clusters.
- Support the deployment and scaling of AI/ML workloads, including model inference pipelines and data processing workflows.
- Build and maintain CI/CD pipelines for automated deployment and infrastructure-as-code practices using tools like Terraform, CloudFormation, or ARM templates.
- Implement monitoring, logging, and alerting solutions to ensure high availability and proactive incident response.
- Champion cloud security best practices, including network security, identity and access management, and compliance requirements.
- Design, deploy, and maintain robust, scalable, and secure cloud infrastructure across multiple cloud providers (AWS, Azure, GCP) and on-premises environments.
- Collaborate closely with development teams to optimize application performance, scalability, and reliability in cloud-native and hybrid environments.
What You'll Bring (Qualifications)
Must-Haves:
- Strong problem-solving skills and ability to work in a fast-paced startup environment.
- Production experience with Kubernetes (K8s) including deployment, scaling, and troubleshooting.
- Hands-on experience deploying and operating:
+ Database systems (PostgreSQL, MongoDB).
+ Message queues and service buses (RabbitMQ, Apache Kafka, AWS SQS, Azure Service Bus).
+ Serverless functions (AWS Lambda, Azure Functions, Google Cloud Functions).
+ Search platforms (OpenSearch/Elasticsearch).
- Proficiency with at least one major cloud platform (AWS, Azure, Google Cloud, or Oracle Cloud).
- Solid understanding of microservices architecture and distributed systems.
- Experience with on-premises infrastructure deployment and hybrid cloud architectures.
- Minimum of 3 years of professional experience in cloud engineering, DevOps, or infrastructure roles.
- Strong expertise in Docker containerization and Linux system administration.
Huge Plus / Preferred Qualifications:
- Experience with both AWS and Azure cloud platforms.
- Hands-on experience with AI Operations (AI Ops) including:
+ Large Language Model (LLM) deployment and scaling.
+ Automatic Speech Recognition (ASR) systems.
+ Text-to-Speech (TTS) services.
+ ML model serving and inference optimization.
- Excellent communication skills and collaborative mindset for working with cross-functional teams.
- Experience with GitOps workflows and automated deployment pipelines.
- Understanding of cost optimization strategies for cloud resources.
- Experience with monitoring and observability tools (Prometheus, Grafana, ELK stack).
- Knowledge of network architecture, VPNs, and security protocols.
- Infrastructure-as-Code experience with Terraform, CloudFormation, or similar tools.