RIVM DATA PLATFORM

Centralised Azure & Databricks data platform with a self-service data onboarding UI for the Dutch National Institute for Public Health.

Year: 2025

Medium: Data Platform Engineering

Status: Archived

Client: Rijksinstituut voor Volksgezondheid en Milieu (RIVM)

Overview

Acting as Senior Lead Cloud Data Platform Engineer at the Rijksinstituut voor Volksgezondheid en Milieu (RIVM), NexOps owned the complete solution design and development of a centralised Azure-based data platform, the backbone for data-driven decision-making across the Dutch national public health institute. The platform combines Azure Databricks, Infrastructure as Code, CI/CD automation, and AI/LLM capabilities to enable secure, scalable data processing and analytics.

A key deliverable was a self-service data onboarding application that lets business users configure and deploy data ingestion pipelines through a visual UI, backed by automated validation, deployment gate approvals, Terraform provisioning, and Databricks Asset Bundle deployment. The entire solution was designed and built by NexOps with security by design: VNet integration, private endpoints, Azure Key Vault, and managed identity throughout. Completed December 2025.

Approach

METHODOLOGY

  • 01

    Pioneered a multidisciplinary requirements discovery phase, engaging epidemiologists, public health advisors, data stewards, lab scientists, and policy makers across RIVM departments to translate complex domain-specific needs into platform requirements; the findings shaped the self-service model, the security classifications, and the GDPR compliance approach

  • 02

    Owned the complete solution design and development, from architecture and security reviews to production deployment, with no inherited codebase or external dependencies

  • 03

    Architected end-to-end Azure data infrastructure using Terraform and Bicep with a multi-environment strategy (dev, test, production), including environment isolation, security boundaries, VNet integration, and private endpoints for all Azure services (see the Terraform sketch after this list)

  • 04

    Deployed and configured Databricks Unity Catalog for centralised data governance, fine-grained access control, and data lineage tracking (see the grants sketch after this list)

  • 05

    Built comprehensive Azure DevOps CI/CD pipelines with GitOps workflows, automated validation, infrastructure drift detection, and deployment gate approvals requiring data owner sign-off before provisioning (see the pipeline sketch after this list)

  • 06

    Designed and built a self-service web application that lets business users visually configure data ingestion pipelines, with live database/fileshare/HTTP source introspection, AI-powered column documentation (Azure OpenAI), strict schema validation, and YAML export (see the validation sketch after this list)

  • 07

    Implemented automated end-to-end deployment: the UI triggers an Azure DevOps pipeline → data owner approval gates → Terraform provisions Unity Catalog schemas and permissions → Databricks Asset Bundles deploy ingestion workflow jobs with schedules and configuration (see the trigger sketch after this list)

  • 08

    Enabled scalable ETL/ELT workflows using Databricks Asset Bundles and Workflows with PySpark, Delta Lake, and medallion architecture patterns (see the medallion sketch after this list)

  • 09

    Ensured security by design throughout: Azure VNet integration, private endpoints for all Azure services, network-integrated fileshare access, government-standard data sensitivity classification, PII flagging, GDPR compliance fields, and AD group-based access control (see the Key Vault sketch after this list)

  • 10

    Containerised the platform with Docker (multi-stage build), deployed via Azure DevOps to Azure Container Registry and Azure App Service with a managed identity; Databricks Apps served as an alternative deployment target (see the Dockerfile sketch after this list)

  • 11

    Developed LLM operations frameworks, AI-powered solutions, and multi-agent AI systems as proofs of concept using LangChain, PydanticAI, Azure AI Foundry, and MLflow (see the agent sketch after this list)
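
The sketches below illustrate individual steps from the methodology; all names, hosts, schemas, and IDs in them are illustrative stand-ins rather than the production configuration.

For step 03, a minimal Terraform sketch of the per-environment layout: every environment keeps its own state, pulls in a shared network module, and reaches storage only through a private endpoint. Module outputs and resource names are hypothetical.

```hcl
# environments/dev/main.tf (trimmed) -- one state file per environment
terraform {
  backend "azurerm" {
    resource_group_name  = "rg-tfstate"
    storage_account_name = "sttfstatedev"
    container_name       = "tfstate"
    key                  = "dataplatform-dev.tfstate"
  }
}

module "network" {
  source        = "../../modules/network"   # hypothetical shared module
  environment   = "dev"
  address_space = ["10.10.0.0/16"]
}

resource "azurerm_storage_account" "datalake" {
  name                          = "stdatalakedev"
  resource_group_name           = module.network.resource_group_name
  location                      = module.network.location
  account_tier                  = "Standard"
  account_replication_type      = "LRS"
  is_hns_enabled                = true   # ADLS Gen2
  public_network_access_enabled = false  # reachable only via private endpoint
}

# Private endpoint keeps data lake traffic inside the VNet
resource "azurerm_private_endpoint" "datalake" {
  name                = "pe-datalake-dev"
  location            = module.network.location
  resource_group_name = module.network.resource_group_name
  subnet_id           = module.network.private_endpoint_subnet_id

  private_service_connection {
    name                           = "psc-datalake-dev"
    private_connection_resource_id = azurerm_storage_account.datalake.id
    subresource_names              = ["dfs"]
    is_manual_connection           = false
  }
}
```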
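
For step 04, governance can be expressed as Unity Catalog grants. A minimal sketch using Spark SQL from a Databricks notebook; the catalog, schema, and AD group names are made up.

```python
# Unity Catalog governance as code (illustrative names throughout).
# Runs in a Databricks notebook, where `spark` is provided by the runtime.

catalog, schema, group = "rivm_prod", "infectious_diseases", "grp-epi-analysts"

spark.sql(f"CREATE SCHEMA IF NOT EXISTS {catalog}.{schema}")

# Fine-grained access: the analyst group may read this schema, nothing more.
spark.sql(f"GRANT USE CATALOG ON CATALOG {catalog} TO `{group}`")
spark.sql(f"GRANT USE SCHEMA ON SCHEMA {catalog}.{schema} TO `{group}`")
spark.sql(f"GRANT SELECT ON SCHEMA {catalog}.{schema} TO `{group}`")

# Lineage needs no extra code: Unity Catalog records it automatically for
# queries and jobs that read from or write to tables in this schema.
```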
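
For step 05, a trimmed sketch of the pipeline shape. In Azure DevOps the approval check itself is configured on the environment (here a hypothetical data-platform-prod), so the YAML only references it; the run pauses at that stage until the data owner signs off.

```yaml
# azure-pipelines.yml (trimmed): validate first, deploy behind an approval gate
trigger:
  branches:
    include: [main]

stages:
  - stage: Validate
    jobs:
      - job: Plan
        steps:
          - script: terraform init -backend-config=env/prod.backend
            displayName: terraform init
          - script: terraform plan -var-file=env/prod.tfvars
            displayName: terraform plan
          # A separate scheduled pipeline ran `terraform plan -detailed-exitcode`
          # so that drift surfaced as a distinct exit status.

  - stage: Deploy
    dependsOn: Validate
    jobs:
      - deployment: Apply
        # Approvals live on this environment; data owner sign-off happens here.
        environment: data-platform-prod
        strategy:
          runOnce:
            deploy:
              steps:
                - script: terraform apply -auto-approve -var-file=env/prod.tfvars
                  displayName: terraform apply
```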
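
For step 06, the shape of the strict validation behind the UI: a Pydantic model rejects malformed configurations before anything is deployed, and the validated result is exported as YAML for the pipeline. Field names and allowed values are illustrative, not the production schema.

```python
# Strict validation of a UI-submitted ingestion config, then YAML export.
from enum import Enum

import yaml  # pyyaml
from pydantic import BaseModel, Field

class Sensitivity(str, Enum):
    public = "public"
    internal = "internal"
    confidential = "confidential"

class ColumnSpec(BaseModel):
    name: str = Field(pattern=r"^[a-z][a-z0-9_]*$")
    dtype: str
    description: str  # may be pre-filled by the LLM, then reviewed by the user
    pii: bool = False

class IngestionConfig(BaseModel):
    source_type: str           # e.g. "database" | "fileshare" | "http"
    schema_name: str
    table_name: str
    sensitivity: Sensitivity
    schedule_cron: str         # validated further before deployment
    columns: list[ColumnSpec]

def to_yaml(raw: dict) -> str:
    """Reject bad configs early; export the validated config as YAML."""
    cfg = IngestionConfig.model_validate(raw)  # raises ValidationError on bad input
    return yaml.safe_dump(cfg.model_dump(mode="json"), sort_keys=False)
```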
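
For step 07, the hand-off from the UI to CI/CD: a sketch of queuing an Azure DevOps pipeline run through the REST API. Organisation, project, pipeline id, and the parameter name are hypothetical, and PAT authentication is shown only for brevity.

```python
# Queue an Azure DevOps pipeline run from the onboarding app.
import base64
import os

import requests

ORG, PROJECT, PIPELINE_ID = "rivm", "data-platform", 42  # hypothetical
url = (f"https://dev.azure.com/{ORG}/{PROJECT}/_apis/pipelines/"
       f"{PIPELINE_ID}/runs?api-version=7.1")

# PAT auth for brevity; the deployed app authenticated via managed identity.
token = base64.b64encode(f":{os.environ['AZDO_PAT']}".encode()).decode()

resp = requests.post(
    url,
    headers={"Authorization": f"Basic {token}"},
    json={"templateParameters": {"configPath": "configs/source_x.yml"}},
)
resp.raise_for_status()
print("queued run", resp.json()["id"])  # approval gates take over from here
```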
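
For step 08, the medallion pattern as it appears inside a deployed ingestion job: bronze keeps raw data with an audit column, silver is typed and deduplicated. Paths, table, and column names are illustrative.

```python
# Medallion layers in a Databricks job task (Delta tables via Unity Catalog).
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Bronze: ingest raw files from the landing zone as-is, plus an audit column.
bronze = (spark.read.option("header", True)
          .csv("/Volumes/rivm_prod/landing/source_x")
          .withColumn("_ingested_at", F.current_timestamp()))
bronze.write.mode("append").saveAsTable("rivm_prod.bronze.source_x")

# Silver: typed and deduplicated, ready for analytics.
silver = (spark.table("rivm_prod.bronze.source_x")
          .dropDuplicates(["record_id"])
          .withColumn("sample_date", F.to_date("sample_date")))
silver.write.mode("overwrite").saveAsTable("rivm_prod.silver.source_x")
```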
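
For step 09, one security-by-design detail in code: the application never holds credentials itself, but resolves them at runtime from Key Vault through its managed identity. The vault URL and secret name are made up.

```python
# No secrets in code or config: resolve them through the managed identity.
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

# On Azure App Service this picks up the managed identity automatically;
# locally it falls back to developer credentials (e.g. az login).
credential = DefaultAzureCredential()
client = SecretClient(
    vault_url="https://kv-dataplatform.vault.azure.net",  # hypothetical vault
    credential=credential,
)
db_password = client.get_secret("source-db-password").value
```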
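
For step 10, a trimmed multi-stage Dockerfile of the kind used: dependencies are resolved in a build stage so the runtime image stays small and free of build tooling. Base images, paths, and the gunicorn entrypoint are assumptions.

```dockerfile
# Multi-stage build (trimmed): heavy dependency resolution stays out of runtime.
FROM python:3.12-slim AS build
WORKDIR /app
COPY requirements.txt .
RUN pip install --prefix=/install --no-cache-dir -r requirements.txt

FROM python:3.12-slim
WORKDIR /app
COPY --from=build /install /usr/local
COPY src/ ./src
# No secrets baked into the image; they come from Key Vault at runtime.
EXPOSE 8000
CMD ["gunicorn", "-b", "0.0.0.0:8000", "src.app:app"]
```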
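
For step 11, the proof-of-concept shape of the AI-assisted column documentation: a typed PydanticAI agent returns a validated object instead of free text, so the result can feed straight into the onboarding schema. The model id and prompt are illustrative, and the snippet assumes a recent pydantic-ai release.

```python
# Typed LLM output: the agent must return a ColumnDoc, not arbitrary prose.
from pydantic import BaseModel
from pydantic_ai import Agent

class ColumnDoc(BaseModel):
    description: str
    likely_pii: bool

agent = Agent(
    "openai:gpt-4o",  # routed to Azure OpenAI in the actual setup
    output_type=ColumnDoc,
    system_prompt="Document database columns for a public health data catalog.",
)

result = agent.run_sync("Column 'bsn' (string) in table 'patient_intake'")
print(result.output)  # e.g. ColumnDoc(description='...', likely_pii=True)
```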

Results

OUTCOMES

  • Established a centralised data platform: a single source of truth for organisational data with proper governance

  • Reduced data source onboarding time from days to hours through the self-service configuration platform; business users onboard new sources independently, without data engineer involvement

  • Automated the full provisioning chain: configuration → approval → Terraform (schemas, permissions, storage) → DAB deployment (ingestion jobs, schedules), with zero manual steps after data owner approval

  • Reduced infrastructure provisioning time from weeks to hours through Terraform and Bicep automation

  • Achieved 99.9% platform uptime with automated monitoring and remediation, and a 30% cost reduction through auto-scaling and resource tagging

  • Achieved compliance with national security standards and audit requirements through security by design: VNet integration, private endpoints, Key Vault, managed identity, data sensitivity classification, and PII controls

Next Step

NEED SIMILAR RESULTS?

We deliver production-grade data platforms and AI solutions for enterprise clients. Tell us about your challenge.

Currently accepting projects for Q3 2026