YOUR MISSION:
Raise the bar on engineering and science quality: reproducibility, documentation, code reviews, and fit-for-purpose methods.
Architect and own the end-to-end R&D data platform (from collection to decisions), with clear SLAs, data contracts, and auditability.
Lead the analytics & modeling roadmap: from robust biostatistical analysis to forecasting, anomaly detection, and “what-if” optimization.
Turn messy factory/R&D streams into trusted, versioned datasets that power operations, science, and ML-driven planning.
KEY ACTIVITIES:
ML, forecasting & optimization: Ship baselines → robust models (e.g., xgboost/prophet/statsmodels) with backtests, SHAP/explainability, drift monitoring, and simple “what-if” tools tied to operational constraints.
Pipelines at scale: Design resilient ETL/ELT from lab/factory systems and files to curated tables (batch + near-real-time); implement testing, monitoring, and rollback.
Decision-grade dashboards: Publish KPI and investigation views (Power BI/Grafana/Metabase) with freshness, QC flags, interval bands, and drill-downs to raw evidence.
Platform & governance: Define schemas/IDs, enforce data contracts and validation at ingest, set freshness/quality SLAs, and implement lineage, change logs, and WORM archives.
Biostatistics & inference: Specify study designs (power, randomization, blocking), run GLM/GLMM/ANCOVA/repeated-measures with diagnostics, and deliver effect-size, CI, and “so-what” narratives.
Operating model & mentoring: Establish coding standards, reviews, and reproducible notebooks; coach technicians/analysts; partner with Ops/QA/Scientists to turn questions into experiments and products.
WHAT WE NEED FROM YOU:
Experience: 4+ years working with data in R&D or factory/production environments, delivering real tools used by non-data teams.
Skills (must-have):
- SQL & data modeling (advanced): design tables/IDs, write clean queries, build validation checks.
- Communication (strong): plain-English summaries, simple SOPs, clear recommendations.
- Python or R (advanced wrangling; solid modeling): pandas/tidyverse; basic ML (scikit-learn/xgboost or similar); good coding hygiene (version control).
- Dashboards/BI (proficient): Power BI/Grafana/Metabase; clear visuals with freshness/quality flags.
Strong plus (biostatistics): experiment design (control vs test, power, blocking), GLM/GLMM/ANOVA/ANCOVA, repeated measures; basic diagnostics.
Proof you’ve done it: examples of (1) a database or pipeline you owned, (2) a dashboard used by stakeholders, (3) an analysis that informed a decision, and (4) ideally a small forecast/anomaly model with a backtest.
Mindset: practical, improvement-oriented, comfortable collaborating with Ops, QA, and Scientists; you make complex things simple and usable.