Job Requirements
Technical Skills
Ability to define, architect, and implement modern data architecture patterns (Medallion, data mesh, data product approach).
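As a hedged illustration of the Medallion pattern named above (bronze → silver → gold layers), here is a minimal pure-Python sketch; the layer functions, field names, and toy records are assumptions for demonstration only, not part of the role:

```python
# Toy Medallion layering: bronze (raw as-landed), silver (cleaned and
# standardized), gold (business-level aggregate). Records are hypothetical.
bronze = [
    {"order_id": "1", "amount": "10.5", "country": "vn"},
    {"order_id": "2", "amount": "bad", "country": "VN"},   # malformed row
    {"order_id": "3", "amount": "4.0", "country": "us"},
]

def to_silver(rows):
    """Clean and standardize: drop unparseable rows, normalize country codes."""
    out = []
    for r in rows:
        try:
            out.append({"order_id": r["order_id"],
                        "amount": float(r["amount"]),
                        "country": r["country"].upper()})
        except ValueError:
            continue  # a real pipeline would quarantine, not silently drop
    return out

def to_gold(rows):
    """Aggregate to a business metric: revenue per country."""
    totals = {}
    for r in rows:
        totals[r["country"]] = totals.get(r["country"], 0.0) + r["amount"]
    return totals

silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)  # {'VN': 10.5, 'US': 4.0}
```

The same shape scales up in Spark or dbt: each layer is a materialized table, and bad records are routed to a quarantine table rather than discarded.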
Expert-level Databricks skills (Spark SQL, PySpark, Spark DataFrames) and deep knowledge of open table formats (Delta Lake, Apache Iceberg).
Extensive experience designing and implementing highly scalable streaming and batch data ingestion frameworks (Kafka, Auto Loader, APIs, SFTP) across common data/file formats (CSV, JSON, YAML).
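One design concern such ingestion frameworks share is idempotent incremental loading. A hedged stdlib-only sketch of a watermark-based incremental batch ingest follows; the offset field, file layout, and function names are hypothetical:

```python
import json
import tempfile
from pathlib import Path

def load_watermark(path: Path) -> int:
    """Read the last successfully ingested offset; 0 on a first run."""
    if path.exists():
        return json.loads(path.read_text())["offset"]
    return 0

def ingest_batch(records, watermark_path: Path):
    """Idempotent incremental ingest: only records past the stored offset
    are processed, and the watermark advances only after success, so a
    replayed batch is a safe no-op."""
    offset = load_watermark(watermark_path)
    new = [r for r in records if r["offset"] > offset]
    # ... write `new` to the bronze layer here ...
    if new:
        watermark_path.write_text(
            json.dumps({"offset": max(r["offset"] for r in new)}))
    return new

wm = Path(tempfile.mkdtemp()) / "watermark.json"
batch = [{"offset": 1}, {"offset": 2}]
first = ingest_batch(batch, wm)    # both records ingested
second = ingest_batch(batch, wm)   # replay: nothing new
```

Databricks Auto Loader and Kafka consumer groups implement the same idea with checkpoints and committed offsets respectively.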
Deep expertise in columnar storage formats (Parquet, ORC) and in advanced performance tuning and optimization strategies (Z-Ordering, clustering).
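To illustrate why columnar formats matter for analytical scans, here is a toy row-store versus column-store comparison in plain Python; the data is hypothetical and real formats like Parquet add compression and predicate pushdown on top:

```python
# Row layout: each record stored together; an aggregate over one field
# still touches every field of every row.
rows = [{"id": i, "price": float(i), "note": "x" * 50} for i in range(1000)]

# Columnar layout: each field stored contiguously; an aggregate reads
# only the column it needs (the core win of Parquet/ORC).
columns = {
    "id": [r["id"] for r in rows],
    "price": [r["price"] for r in rows],
    "note": [r["note"] for r in rows],
}

row_total = sum(r["price"] for r in rows)   # scans whole records
col_total = sum(columns["price"])           # scans one column
assert row_total == col_total
```

Z-Ordering and clustering extend the same principle: co-locating related values so queries skip data they do not need.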
Mastery of dbt (Core/Cloud) and advanced SQL for complex analytical transformations, including performance optimization. Expertise in establishing and enforcing data quality, testing, and governance frameworks (Great Expectations, dbt tests, data contracts).
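The checks such frameworks encode are conceptually simple. The sketch below hand-rolls pure-Python equivalents of three common dbt tests (not_null, unique, accepted_values) for illustration; it does not use the actual dbt or Great Expectations APIs, and the sample data is hypothetical:

```python
def check_not_null(rows, column):
    """Return rows where the column is missing or None (cf. dbt's not_null)."""
    return [r for r in rows if r.get(column) is None]

def check_unique(rows, column):
    """Return rows whose column value repeats (cf. dbt's unique)."""
    seen, failures = set(), []
    for r in rows:
        v = r.get(column)
        if v in seen:
            failures.append(r)
        seen.add(v)
    return failures

def check_accepted_values(rows, column, allowed):
    """Return rows whose value falls outside the contract's allowed set."""
    return [r for r in rows if r.get(column) not in allowed]

data = [
    {"id": 1, "status": "open"},
    {"id": 1, "status": "closed"},   # duplicate id
    {"id": 2, "status": "weird"},    # value outside the contract
]
assert len(check_unique(data, "id")) == 1
assert len(check_accepted_values(data, "status", {"open", "closed"})) == 1
assert check_not_null(data, "id") == []
```

In dbt the same checks are declared in YAML against a model; a data contract makes the allowed set part of the producer-consumer agreement.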
Leadership in defining and implementing DevOps & infrastructure-as-code strategies (GitLab/GitHub CI/CD, Terraform). Proven ability to design and implement comprehensive observability & monitoring solutions (logging, alerting, pipeline performance tracking).
Experience with event-driven architectures (AWS EventBridge, GCP Pub/Sub, Azure Event Grid).
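Managed buses such as EventBridge, Pub/Sub, and Event Grid all follow the publish-subscribe pattern: producers emit typed events without knowing who consumes them. A minimal in-memory sketch (class and event names are hypothetical):

```python
from collections import defaultdict

class EventBus:
    """Toy publish-subscribe bus: producers emit events by type and
    subscribers react, with no direct coupling between the two."""
    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        for handler in self._handlers[event_type]:
            handler(payload)

bus = EventBus()
received = []
bus.subscribe("file.landed", received.append)
bus.publish("file.landed", {"path": "raw/orders.csv"})
```

A managed service adds durability, retries, filtering rules, and fan-out across accounts, but the coupling model is the same.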
Expert Python engineering skills, leading best practices in software engineering (version control, modularity, testing).
Architect-level cloud platform expertise (AWS, GCP, or Azure) with deep experience across multiple warehouses (BigQuery, Redshift, Synapse). Experience implementing security and compliance in cloud data environments (RBAC, data masking, encryption, GDPR/CCPA) and cost optimization strategies for cloud data platforms.
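One common masking technique this bullet alludes to is deterministic pseudonymization: hashing identifiers so joins still work but raw PII is never stored. A stdlib sketch, with the column names and salt handling as labeled assumptions:

```python
import hashlib

def mask(value: str, salt: str = "pipeline-secret") -> str:
    """Deterministically pseudonymize a value: equal inputs map to equal
    tokens (so joins and group-bys still work) while the raw identifier
    is not persisted. The salt is hypothetical here; in a real platform
    it would come from a secrets manager, never source code."""
    return hashlib.sha256((salt + value).encode()).hexdigest()[:16]

row = {"email": "user@example.com", "amount": 42}
masked = {**row, "email": mask(row["email"])}
assert masked["email"] != row["email"]          # PII removed
assert mask(row["email"]) == masked["email"]    # deterministic, joinable
```

Warehouses typically offer this natively (dynamic data masking, column-level policies); the sketch only shows the underlying idea.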
Professional Skills
Hands-on experience with real-time analytics and low-latency serving layers (e.g., Apache Flink, Materialize, Rockset).
Practical experience with vector databases (Pinecone, Weaviate, ChromaDB) or semantic search in AI workflows.
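At their core, vector databases rank items by embedding similarity. A brute-force cosine-similarity search in plain Python illustrates the retrieval step; the corpus vectors are made up, and production systems replace the linear scan with approximate-nearest-neighbour indexes:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Hypothetical 3-dimensional embeddings; real ones have hundreds of dims.
corpus = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.9, 0.1, 0.0],
    "doc_c": [0.0, 1.0, 0.0],
}

def search(query_vec, k=2):
    """Brute-force top-k retrieval; vector DBs use ANN indexes instead."""
    ranked = sorted(corpus, key=lambda d: cosine(query_vec, corpus[d]),
                    reverse=True)
    return ranked[:k]

print(search([1.0, 0.0, 0.0]))  # ['doc_a', 'doc_b']
```

In an AI workflow the query vector comes from an embedding model, and the retrieved documents feed a downstream prompt or ranker.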
Solid experience with machine learning pipelines and MLOps (MLflow, Vertex AI, SageMaker, Azure ML).
Demonstrated experience in leading large data teams, driving collaboration with business, analysts, and data scientists, and influencing technical direction.
Proven ability in data product design and domain-driven design in data platforms.