Our client is the leading insights company that helps organizations make smarter commercial and product decisions through large-scale, high-quality data platforms.
Job Responsibilities
SQL Databases
Design and evolve schemas for OLTP and OLAP workloads (Azure SQL, Synapse, Delta Lake), including partitioning, indexing, and row-level security in multi-tenant environments.
Define and manage data contracts, versioning, and schema evolution; implement CDC and SCD patterns.
Optimize performance through query tuning, resource configuration, caching strategies, and cost controls.
Data Pipelines
Architect and build ELT/ETL workflows across batch and streaming using Azure Data Factory/Synapse/Databricks, Event Hubs/Service Bus, Functions, and containerized workloads (Container Apps/AKS).
Deliver reliable and observable pipelines (idempotent, retryable, lineage-aware) with defined SLAs/SLOs and operational runbooks.
Implement CI/CD for data workloads (SQL/dbt projects, PySpark jobs, automated testing) using GitHub Actions and infrastructure-as-code (Terraform/Bicep).
Data Enrichment & Modeling
Design and manage enrichment layers such as standardized identifiers, metadata extraction, taxonomies, embeddings, and third-party data integrations.
Curate gold/semantic data models for analytics and internal/external data services, owning feature and metric definitions with clear documentation.
Collaborate with Data Science and ML teams to productionize feature stores, model outputs, drift monitoring, and evaluation datasets.
Azure Architecture & Governance
Own reference data architecture across ADLS Gen2, Synapse/Databricks, Azure SQL/SQL Server, Cosmos DB, Azure AI Search, Key Vault, and Purview.
Embed security, privacy, and compliance by default, including encryption, secret management, RBAC/ABAC, retention policies, and GDPR/SOC-aligned controls.
Drive observability using OpenTelemetry and Azure Monitor/Application Insights, along with data quality checks, freshness SLAs, and lineage tracking.
Examples of What You'll Build
A scalable media ingestion and enrichment pipeline that validates assets, extracts metadata, generates embeddings, tracks lineage, and publishes analytics-ready and search-ready views.
A hybrid retrieval layer (vector search plus structured filters) using Cosmos DB and Azure AI Search to support similarity search and recommendation use cases.
Qualifications
Advanced Python and SQL skills, with experience analyzing complex query plans and working across PySpark and pandas.
7+ years of experience in data engineering or data architecture with end-to-end ownership of production databases and pipelines.
Strong hands-on experience with Azure data services, including ADLS Gen2, Data Factory/Synapse/Databricks, Azure SQL/SQL Server, Functions, Event Hubs/Service Bus, and Key Vault.
Solid foundation in data modeling (star/snowflake, Data Vault, Lakehouse), CDC/SCD patterns, and semantic modeling (dbt or equivalent).
Proven experience implementing data quality, lineage, and performance/cost guardrails at scale.
Strong understanding of multi-tenant SaaS architectures, data security, and privacy principles (including core GDPR concepts).
Nice to Have
Experience with Cosmos DB (including vector capabilities) and Azure AI Search; exposure to image and text embedding pipelines.
Background in feature stores, MLflow or similar model registries, and real-time inference pipelines.
Knowledge of SQL Server internals, PolyBase, or serverless SQL; familiarity with PostgreSQL.
Experience implementing data governance frameworks, Purview, or data product operating models.
Our Client's Technology Stack
Cloud & Data: Azure (ADLS Gen2, Data Factory, Synapse, Databricks, Functions, Event Hubs, Key Vault, Monitor)
Storage & Compute: Delta/Parquet, Azure SQL/SQL Server, Cosmos DB (vector), Azure AI Search
Languages & Tools: Python (pandas, PySpark, FastAPI for data services), dbt (or equivalent), GitHub Actions, Terraform/Bicep
Observability: OpenTelemetry, Azure Monitor/App Insights, Sentry/Datadog (as applicable)