Our client is the leading insights company that helps organizations make smarter commercial and product decisions through large-scale, high-quality data platforms.
Job Responsibilities
SQL Databases
- Design and evolve schemas for OLTP and OLAP workloads (Azure SQL, Synapse, Delta Lake), including partitioning, indexing, and row-level security in multi-tenant environments.
- Define and manage data contracts, versioning, and schema evolution; implement CDC and SCD patterns.
- Optimize performance through query tuning, resource configuration, caching strategies, and cost controls.
Data Pipelines
- Architect and build ELT/ETL workflows across batch and streaming using Azure Data Factory/Synapse/Databricks, Event Hubs/Service Bus, Functions, and containerized workloads (Container Apps/AKS)
- Deliver reliable and observable pipelines (idempotent, retryable, lineage-aware) with defined SLAs/SLOs and operational runbooks.
- Implement CI/CD for data workloads (SQL/dbt projects, PySpark jobs, automated testing) using GitHub Actions and infrastructure-as-code (Terraform/Bicep).
Data Enrichment & Modeling
- Design and manage enrichment layers such as standardized identifiers, metadata extraction, taxonomies, embeddings, and third-party data integrations.
- Curate gold/semantic data models for analytics and internal/external data services, owning feature and metric definitions with clear documentation.
- Collaborate with Data Science and ML teams to productionize feature stores, model outputs, drift monitoring, and evaluation datasets.
Azure Architecture & Governance
- Own reference data architecture across ADLS Gen2, Synapse/Databricks, Azure SQL/SQL Server, Cosmos DB, Azure AI Search, Key Vault, and Purview.
- Embed security, privacy, and compliance by default, including encryption, secret management, RBAC/ABAC, retention policies, and GDPR/SOC-aligned controls.
- Drive observability using OpenTelemetry and Azure Monitor/Application Insights, along with data quality checks, freshness SLAs, and lineage tracking.
Examples of What Youll Build
- A scalable media ingestion and enrichment pipeline that validates assets, extracts metadata, generates embeddings, tracks lineage, and publishes analytics-ready and search-ready views.
- A hybrid retrieval layer (vector search plus structured filters) using Cosmos DB and Azure AI Search to support similarity search and recommendation use cases.
Qualifications
- Advanced Python and SQL skills, with experience analyzing complex query plans and working across PySpark and pandas.
- 7+ years of experience in data engineering or data architecture with end-to-end ownership of production databases and pipelines.
- Strong hands-on experience with Azure data services, including ADLS Gen2, Data Factory/Synapse/Databricks, Azure SQL/SQL Server, Functions, Event Hubs/Service Bus, and Key Vault.
- Solid foundation in data modeling (star/snowflake, Data Vault, Lakehouse), CDC/SCD patterns, and semantic modeling (dbt or equivalent).
- Proven experience implementing data quality, lineage, and performance/cost guardrails at scale.
- Strong understanding of multi-tenant SaaS architectures, data security, and privacy principles (including core GDPR concepts).
Nice to Have
- Experience with Cosmos DB (including vector capabilities) and Azure AI Search; exposure to image and text embedding pipelines.
- Background in feature stores, MLflow or similar model registries, and real-time inference pipelines.
- Knowledge of SQL Server internals, PolyBase, or serverless SQL; familiarity with PostgreSQL.
- Experience implementing data governance frameworks, Purview, or data product operating models.
Our Clients Technology Stack
- Cloud & Data: Azure (ADLS Gen2, Data Factory, Synapse, Databricks, Functions, Event Hubs, Key Vault, Monitor)
- Storage & Compute: Delta/Parquet, Azure SQL/SQL Server, Cosmos DB (vector), Azure AI Search
- Languages & Tools: Python (pandas, PySpark, FastAPI for data services), dbt (or equivalent), GitHub Actions, Terraform/Bicep
- Observability:
OpenTelemetry, Azure Monitor/App Insights, Sentry/Datadog (as applicable)