Job Description
Introduction
With over a decade of experience in IT and fintech, Blue Belt has become a leading software development company, delivering innovative technology solutions to a diverse global clientele. We specialize in developing web, mobile, payment, and blockchain applications that offer seamless user experiences. Headquartered in Tokyo, Japan, with a state-of-the-art Technology Hub in Hanoi, Vietnam, Blue Belt operates in more than ten countries, including Japan, Thailand, Indonesia, the Philippines, Malaysia, Taiwan, and Brazil. Our team of over 200 professionals brings a wealth of expertise to drive our global operations.
Job Description
We are looking for a highly skilled DevOps Engineer to join our team, with a focus on deploying, scaling, and maintaining infrastructure for conversational AI and chatbot systems. You will work closely with AI engineers, software developers, and product teams to automate workflows, ensure high availability, and optimize performance for AI-driven applications.
Infrastructure Automation & CI/CD
Design, implement, and maintain CI/CD pipelines for chatbot and AI services.
Automate environment provisioning using tools like Terraform, Ansible, or Pulumi.
Integrate testing and deployment workflows to support agile delivery cycles.
Cloud Infrastructure Management
Build and manage infrastructure on AWS, tailored for AI workloads.
Implement secure and scalable architectures for real-time chatbot interactions.
Monitoring, Logging & Incident Management
Define and enforce SLOs/SLAs for chatbot uptime and response time.
Set up monitoring tools (Prometheus, Grafana, ELK, or Datadog) for proactive alerting.
Set up centralized logging using the EFK stack (Elasticsearch, Fluent Bit, Kibana).
Lead incident response and root cause analysis for system failures.
Security & Compliance
Perform ad hoc DevOps tasks as required, including emergency patches, incident support, or rapid deployment of security updates.
Ensure best practices in infrastructure security (IAM, VPC, secrets management).
Support compliance efforts for data protection (GDPR, SOC2) in chatbot data pipelines.
AI Model Deployment
Collaborate with teams to containerize and deploy NLP models (e.g., with Docker, Kubernetes).
Manage GPU/TPU workloads, including dynamic scaling and resource optimization.
Monitor model inference performance and latency across staging and production environments.
Optimize cost, compute, and storage strategies for high-volume inference and training.