Our client is a leading insights company, using market data to drive better marketing decisions for global companies.
Job Responsibilities
SQL Databases
Design and refine schemas for OLTP and OLAP workloads (Azure SQL, Synapse, Delta Lake), incorporating partitioning, indexing, and row-level security for multi-tenant isolation.
Define and manage data contracts and versioning, oversee schema evolution, and implement CDC and SCD patterns.
Optimise performance through query tuning, resource configuration, caching strategies, and cost controls.
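The SCD patterns mentioned above can be sketched as a minimal SCD Type 2 merge in pandas (table and column names here are illustrative, not the client's actual schema):

```python
from datetime import date

import pandas as pd

# Current dimension table: one open row per business key (end_date is NaT).
dim = pd.DataFrame({
    "customer_id": [1, 2],
    "segment": ["retail", "wholesale"],
    "start_date": [date(2023, 1, 1), date(2023, 1, 1)],
    "end_date": [pd.NaT, pd.NaT],
})

# Incoming CDC batch (changed rows only): customer 2 moved segment, customer 3 is new.
changes = pd.DataFrame({
    "customer_id": [2, 3],
    "segment": ["retail", "retail"],
})

today = date(2024, 6, 1)

# Close the currently open rows for every key present in the change batch.
to_close = dim["customer_id"].isin(changes["customer_id"]) & dim["end_date"].isna()
dim.loc[to_close, "end_date"] = today

# Append a new open row for every key in the batch, preserving history.
new_rows = changes.assign(start_date=today, end_date=pd.NaT)
dim = pd.concat([dim, new_rows], ignore_index=True)
```

In production the same merge would typically be expressed as a `MERGE` statement or a Delta Lake merge rather than in-memory pandas, but the close-then-append shape is the same.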
Data Pipelines
Architect and build ELT/ETL workflows across batch and streaming using Azure Data Factory/Synapse/Databricks, Event Hubs/Service Bus, Functions, and containerised workloads (Container Apps/AKS).
Deliver reliable, observable pipelines (idempotent, retryable, lineage-aware) with clear SLAs/SLOs and operational runbooks.
Implement CI/CD for data workloads (dbt/SQL projects, PySpark jobs, automated tests) using GitHub Actions and infrastructure-as-code (Terraform/Bicep).
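An idempotent, retryable pipeline step of the kind described above can be sketched in a few lines; the `state` store and `load` callable are hypothetical stand-ins for a real watermark table and ingestion job:

```python
import time


def run_step(step_id, load, state, max_retries=3):
    """Run a pipeline step at most once, retrying on transient failure.

    `state` is any dict-like store of completed step ids; re-running a
    completed step is a no-op, which makes the whole pipeline safe to replay.
    """
    if state.get(step_id) == "done":  # idempotence: skip completed work
        return "skipped"
    for attempt in range(1, max_retries + 1):
        try:
            load()  # the actual (side-effecting) work
            state[step_id] = "done"
            return "ok"
        except Exception:
            if attempt == max_retries:
                raise
            time.sleep(0)  # real backoff/jitter elided for brevity
```

The same pattern maps onto Data Factory activity reruns or Databricks job retries: the key is that completion state lives outside the step, so replays never duplicate work.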
Data Enrichment
Define and manage enrichment layers such as UPC/GS1, OCR/EXIF metadata, taxonomies, embeddings, and third-party data integrations.
Curate gold/semantic data models for analytics and product APIs, including ownership of feature/metric definitions and documentation.
Collaborate with Data Science/ML teams to productionise feature stores, model outputs, drift monitoring, and evaluation datasets.
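As one small, self-contained piece of the UPC/GS1 enrichment layer above, a GS1 check-digit validator looks like this (the standard modulo-10 algorithm with alternating 3/1 weights from the right):

```python
def gs1_check_digit(body: str) -> int:
    """Compute the GS1 check digit for a GTIN body (all digits except the last)."""
    total = sum(int(d) * (3 if i % 2 == 0 else 1)
                for i, d in enumerate(reversed(body)))
    return (10 - total % 10) % 10


def is_valid_gtin(code: str) -> bool:
    """Validate a GTIN-8/12/13/14 (covers UPC-A and EAN-13) by its check digit."""
    return (code.isdigit()
            and len(code) in (8, 12, 13, 14)
            and gs1_check_digit(code[:-1]) == int(code[-1]))
```

A check like this typically runs at ingestion time so that malformed codes are quarantined before they reach downstream taxonomies and joins.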
Azure Architecture & Governance
Own the reference data architecture across ADLS Gen2, Synapse/Databricks, Azure SQL/SQL Server, Cosmos DB (including vector), Azure AI Search, Key Vault, and Purview.
Embed security and compliance by default: encryption, secret management, RBAC/ABAC, retention policies, and GDPR/SOC 2 aligned controls.
Drive observability using OpenTelemetry and Azure Monitor/App Insights, plus data quality checks, freshness SLAs, and lineage via Purview.
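A freshness-SLA check of the kind mentioned above reduces to comparing a dataset's last load time against its contract; the function and field names here are illustrative:

```python
from datetime import datetime, timedelta, timezone


def freshness_status(last_loaded_at, sla, now=None):
    """Return 'fresh' or 'stale' for a dataset given its freshness SLA.

    `last_loaded_at` and `now` are timezone-aware datetimes; `sla` is the
    maximum tolerated age as a timedelta.
    """
    now = now or datetime.now(timezone.utc)
    return "fresh" if now - last_loaded_at <= sla else "stale"
```

In practice the result would be emitted as a metric to Azure Monitor and alerted on, rather than returned as a string.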
Examples of What You'll Build
A robust image ingestion and enrichment pipeline that validates assets, extracts OCR/UPC, computes embeddings, tracks lineage, and publishes search-ready views.
A hybrid retrieval layer (vector + filters) across Cosmos DB and Azure AI Search to power similarity search and recommendations.
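The hybrid retrieval idea above (metadata pre-filter, then vector ranking) can be shown in pure Python with cosine similarity; a real deployment would push both stages into Cosmos DB or Azure AI Search rather than scanning in memory:

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0


def hybrid_search(query_vec, filters, docs, top_k=3):
    """Pre-filter docs on exact metadata matches, then rank by similarity."""
    candidates = [d for d in docs
                  if all(d["meta"].get(k) == v for k, v in filters.items())]
    candidates.sort(key=lambda d: cosine(query_vec, d["vec"]), reverse=True)
    return [d["id"] for d in candidates[:top_k]]
```

Filtering before the vector stage keeps the similarity search cheap and enforces tenant isolation at the same point.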
Minimum Qualifications
Very strong Python and SQL skills (comfortable analysing complex query plans and working with both PySpark and pandas).
7+ years' experience in data engineering/architecture with end-to-end ownership of production SQL databases and pipelines.
Deep hands-on experience with Azure data services: ADLS Gen2, Data Factory/Synapse/Databricks, Azure SQL/SQL Server, Functions, Event Hubs/Service Bus, Key Vault.
Solid background in data modeling (star/snowflake, Data Vault/Lakehouse), CDC/SCD patterns, and semantic modeling (dbt or equivalent).
Proven track record implementing data quality frameworks, lineage, and performance/cost guardrails at scale.
Strong understanding of multi-tenant SaaS architectures, security, and privacy (including core GDPR concepts).
Nice to Have
Experience with Cosmos DB (including vector capabilities) and Azure AI Search; exposure to embedding pipelines for images/text.
Background in feature stores, MLflow or similar model registries, and real-time inference pipelines.
Knowledge of SQL Server internals, PolyBase/Serverless SQL; familiarity with Postgres.
Experience rolling out Purview, governance frameworks, and data product operating models.
Our Client's Technology Stack
Cloud & Data: Azure (ADLS Gen2, Data Factory, Synapse, Databricks, Functions, Event Hubs, Key Vault, Monitor)
Storage & Compute: Delta/Parquet, Azure SQL/SQL Server, Cosmos DB (vector), Azure AI Search
Languages & Tools: Python (pandas, PySpark, FastAPI for data services), dbt (or equivalent), GitHub Actions, Terraform/Bicep
Observability: OpenTelemetry, Azure Monitor/App Insights, Sentry/Datadog (as applicable)