Senior Data Engineer

Full Time

Remote

Our client is a leading insights company that helps organizations make smarter commercial and product decisions through large-scale, high-quality data platforms.

Job Responsibilities

SQL Databases

  • Design and evolve schemas for OLTP and OLAP workloads (Azure SQL, Synapse, Delta Lake), including partitioning, indexing, and row-level security in multi-tenant environments.
  • Define and manage data contracts, versioning, and schema evolution; implement change data capture (CDC) and slowly changing dimension (SCD) patterns.
  • Optimize performance through query tuning, resource configuration, caching strategies, and cost controls.

Data Pipelines

  • Architect and build ELT/ETL workflows across batch and streaming workloads using Azure Data Factory/Synapse/Databricks, Event Hubs/Service Bus, Functions, and containerized workloads (Container Apps/AKS).
  • Deliver reliable and observable pipelines (idempotent, retryable, lineage-aware) with defined SLAs/SLOs and operational runbooks.
  • Implement CI/CD for data workloads (SQL/dbt projects, PySpark jobs, automated testing) using GitHub Actions and infrastructure-as-code (Terraform/Bicep).

Data Enrichment & Modeling

  • Design and manage enrichment layers such as standardized identifiers, metadata extraction, taxonomies, embeddings, and third-party data integrations.
  • Curate gold/semantic data models for analytics and internal/external data services, owning feature and metric definitions with clear documentation.
  • Collaborate with Data Science and ML teams to productionize feature stores, model outputs, drift monitoring, and evaluation datasets.

Azure Architecture & Governance

  • Own reference data architecture across ADLS Gen2, Synapse/Databricks, Azure SQL/SQL Server, Cosmos DB, Azure AI Search, Key Vault, and Purview.
  • Embed security, privacy, and compliance by default, including encryption, secret management, RBAC/ABAC, retention policies, and GDPR/SOC-aligned controls.
  • Drive observability using OpenTelemetry and Azure Monitor/Application Insights, along with data quality checks, freshness SLAs, and lineage tracking.

Examples of What You'll Build

  • A scalable media ingestion and enrichment pipeline that validates assets, extracts metadata, generates embeddings, tracks lineage, and publishes analytics-ready and search-ready views.
  • A hybrid retrieval layer (vector search plus structured filters) using Cosmos DB and Azure AI Search to support similarity search and recommendation use cases.

Qualifications

  • Advanced Python and SQL skills, with experience analyzing complex query plans and working across PySpark and pandas.
  • 7+ years of experience in data engineering or data architecture with end-to-end ownership of production databases and pipelines.
  • Strong hands-on experience with Azure data services, including ADLS Gen2, Data Factory/Synapse/Databricks, Azure SQL/SQL Server, Functions, Event Hubs/Service Bus, and Key Vault.
  • Solid foundation in data modeling (star/snowflake, Data Vault, Lakehouse), CDC/SCD patterns, and semantic modeling (dbt or equivalent).
  • Proven experience implementing data quality, lineage, and performance/cost guardrails at scale.
  • Strong understanding of multi-tenant SaaS architectures, data security, and privacy principles (including core GDPR concepts).

Nice to Have

  • Experience with Cosmos DB (including vector capabilities) and Azure AI Search; exposure to image and text embedding pipelines.
  • Background in feature stores, MLflow or similar model registries, and real-time inference pipelines.
  • Knowledge of SQL Server internals, PolyBase, or serverless SQL; familiarity with PostgreSQL.
  • Experience implementing data governance frameworks, Purview, or data product operating models.

Our Client's Technology Stack

  • Cloud & Data: Azure (ADLS Gen2, Data Factory, Synapse, Databricks, Functions, Event Hubs, Key Vault, Monitor)
  • Storage & Compute: Delta/Parquet, Azure SQL/SQL Server, Cosmos DB (vector), Azure AI Search
  • Languages & Tools: Python (pandas, PySpark, FastAPI for data services), dbt (or equivalent), GitHub Actions, Terraform/Bicep
  • Observability: OpenTelemetry, Azure Monitor/App Insights, Sentry/Datadog (as applicable)