Azure Databricks
Platform Engineering
End-to-end platform engineering across Azure Databricks, Azure DevOps CI/CD, and Entra ID governance — built for the scale, security, and compliance demands of a top-tier financial institution.
Casino Gaming App CI/CD
Multi-stage YAML pipeline that builds, tests, and deploys the NeonDeck casino web application — slot machines, poker tables, and live dealer interfaces.

    # NeonDeck Casino Gaming App: Azure DevOps Deployment Pipeline
    # Build React front-end, run Playwright tests, deploy to Azure App Service
    trigger:
      branches:
        include: [main, release/*]
      paths:
        include: [src/**, public/**, playwright/**]

    variables:
      - group: neondeck-casino-prod  # Key Vault-linked variable group
      - name: appServiceName
        value: app-neondeck-casino-eastus2

    stages:
      - stage: Build
        displayName: Build & Unit Test
        jobs:
          - job: BuildApp
            pool: { vmImage: ubuntu-latest }
            steps:
              - task: NodeTool@0
                inputs: { versionSpec: '20.x' }
              - script: npm ci && npm run build
                displayName: Install & Build
              - script: npm run test:unit -- --coverage
                displayName: Jest Unit Tests
              - task: PublishBuildArtifacts@1
                inputs: { PathtoPublish: dist/, ArtifactName: casino-app }

      - stage: E2E
        displayName: Playwright E2E Tests
        dependsOn: Build
        jobs:
          - job: PlaywrightTests
            steps:
              - script: npx playwright install --with-deps
              - script: npx playwright test --project=chromium
                displayName: Casino UI E2E Suite

      - stage: DeployStaging
        displayName: Deploy → Staging Slot
        dependsOn: E2E
        jobs:
          - deployment: DeploySlot
            environment: staging
            strategy:
              runOnce:
                deploy:
                  steps:
                    - task: AzureWebApp@1
                      inputs:
                        azureSubscription: neondeck-svc-conn
                        appName: $(appServiceName)
                        deployToSlotOrASE: true
                        slotName: staging
                        package: $(Pipeline.Workspace)/casino-app/**

      - stage: Production
        displayName: Swap → Production
        dependsOn: DeployStaging
        condition: and(succeeded(), eq(variables['Build.SourceBranch'], 'refs/heads/main'))
        jobs:
          - deployment: SwapSlots
            environment: production  # requires manual approval gate
            strategy:
              runOnce:
                deploy:
                  steps:
                    - task: AzureAppServiceManage@0
                      inputs:
                        Action: Swap Slots
                        SourceSlot: staging

Pipeline Simulator
Click Run to simulate deploying the NeonDeck casino gaming front-end — from PR merge through build, test, and slot swap to production.
| Run | Commit | Tests | Coverage | Slot | Status | Time |
|---|---|---|---|---|---|---|
Interview: Azure DevOps Pipeline Administration
"At NeonDeck, I administered 200+ Azure DevOps pipelines across the casino gaming and content delivery platforms. I designed multi-stage YAML pipelines with staging slot deployments and approval gates requiring lead engineer sign-off before production swap. Build artifacts were published to Azure Artifacts, and every PR required passing Jest unit tests plus a Playwright E2E suite covering the slot machine, poker table, and live dealer UIs. I configured Key Vault-linked variable groups so zero secrets existed in pipeline definitions, and built service connections using Entra-registered service principals scoped to least-privilege resource groups."
Banking Analytics Platform
Medallion architecture on Azure Databricks processing daily ACH, wire transfer, and mortgage origination data — powering AML/BSA compliance, credit risk analytics, and DFAST/CCAR regulatory reporting for Western Alliance Bank.

    # Delta Live Tables: Western Alliance Bank Analytics Platform
    # Medallion architecture: Bronze → Silver → Gold
    # Handles ACH, wire transfers, mortgage originations, and deposit activity
    import dlt
    from pyspark.sql.functions import col, count, lit, when
    from pyspark.sql.functions import sum as spark_sum  # avoid shadowing the built-in sum

    # -------- BRONZE: Raw ACH/Wire transactions from ADLS Gen2 --------
    @dlt.table(
        name="bronze_transactions",
        comment="Raw ACH/wire/mortgage transactions from core banking system"
    )
    def bronze_transactions():
        # Auto Loader ("cloudFiles") incrementally picks up new Parquet files
        return (
            spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "parquet")
            .option("cloudFiles.schemaLocation", "dbfs:/schemas/wab_txn")
            .load("abfss://raw@wabdatalake.dfs.core.windows.net/transactions/")
        )

    # -------- SILVER: AML-screened, BSA-compliant, cleaned --------
    @dlt.table(name="silver_transactions")
    @dlt.expect_or_drop("valid_amount", "amount > 0")
    @dlt.expect_or_drop("valid_account", "account_id IS NOT NULL")
    @dlt.expect_or_drop("valid_routing", "LENGTH(routing_number) = 9")
    def silver_transactions():
        return (
            dlt.read_stream("bronze_transactions")
            .dropDuplicates(["transaction_id"])
            # The AML model inference is elided in the source. This null placeholder
            # keeps the code valid until a real scoring expression is supplied;
            # with it, sar_candidate always evaluates to False.
            .withColumn("aml_risk_score", lit(None).cast("double"))
            .withColumn("ctr_flag", when(col("amount") >= 10000, True).otherwise(False))
            .withColumn("sar_candidate", when(col("aml_risk_score") > 0.75, True).otherwise(False))
        )

    # -------- GOLD: Regulatory reporting + credit risk analytics --------
    @dlt.table(name="gold_daily_analytics")
    def gold_daily_analytics():
        return (
            dlt.read("silver_transactions")
            .groupBy("transaction_date", "product_type", "region")
            .agg(
                spark_sum("amount").alias("total_volume"),
                count("*").alias("transaction_count"),
                spark_sum(when(col("ctr_flag") == True, 1).otherwise(0)).alias("ctr_filings"),
                spark_sum(when(col("sar_candidate") == True, 1).otherwise(0)).alias("sar_candidates")
            )
        )

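The Silver-layer rules in the pipeline above can be exercised without a Spark cluster. The following is a minimal pure-Python sketch of the same checks. The thresholds (10,000 and 0.75) and the three validity rules are taken from the pipeline code; the `Transaction` dataclass and the `is_valid` and `flag_transaction` helpers are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Optional

CTR_THRESHOLD = 10_000       # pipeline rule: amount >= 10000
SAR_SCORE_THRESHOLD = 0.75   # pipeline rule: aml_risk_score > 0.75


@dataclass
class Transaction:
    """Hypothetical record shape mirroring the columns used in the Silver table."""
    transaction_id: str
    account_id: Optional[str]
    routing_number: str
    amount: float
    aml_risk_score: float


def is_valid(txn: Transaction) -> bool:
    """Mirror the three expect_or_drop rules: positive amount, account present, 9-char routing."""
    return (
        txn.amount > 0
        and txn.account_id is not None
        and len(txn.routing_number) == 9
    )


def flag_transaction(txn: Transaction) -> dict:
    """Compute the CTR and SAR-candidate flags for one already-validated transaction."""
    return {
        "ctr_flag": txn.amount >= CTR_THRESHOLD,
        "sar_candidate": txn.aml_risk_score > SAR_SCORE_THRESHOLD,
    }
```

Rows that fail `is_valid` correspond to rows the expectations would drop, so `flag_transaction` is only meaningful for rows that pass it.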
Banking Pipeline Simulator
Click Run to simulate processing a daily batch of ACH, wire, and mortgage transactions through the Bronze → Silver → Gold medallion layers — with AML screening and BSA compliance checks at each stage.
Interview: Azure Databricks Platform Engineering — Banking
"In my previous role I owned the Azure Databricks platform supporting the bank's Enterprise Data & Analytics function, including workspace management, cluster policies, Unity Catalog governance, and job orchestration for 50+ data engineering and ML workloads. Our Delta Live Tables pipeline processed 500K+ daily transactions (ACH, wire transfers, and mortgage originations) ingested via Auto Loader from ADLS Gen2. The Silver layer enforced BSA/AML rules inline: CTR flags on transactions ≥$10,000 and SAR candidates scored by an AML ML model, with column-level security on PII fields (SSN, account numbers, routing numbers) enforced via Unity Catalog. Gold tables fed DFAST/CCAR regulatory reporting and the credit risk scorecard used by the commercial real estate team. I integrated Azure Key Vault for secret management across all cluster configurations, implemented Azure Monitor alerts on cluster health and job SLA breaches, and reduced cluster compute costs by 38% by enforcing auto-termination policies and right-sizing instance types through Databricks cluster policies."
Developer RBAC & Identity Governance
Every developer who deploys across the three CI/CD pipelines has role-based access controlled by Microsoft Entra ID (formerly Azure Active Directory), with PIM elevation, conditional access, and just-in-time permissions.
| Role | DevOps Pipelines | Databricks Notebooks | Databricks Clusters | Key Vault Secrets | Production Deploy | Access Method |
|---|---|---|---|---|---|---|
| App Developer | Run | None | None | None | None | Direct Entra Group |
| Data Engineer | View | Edit | Start/Stop | None | None | Direct Entra Group |
| Lead Engineer | Run + Edit | Edit | Manage | PIM 4hr | PIM 4hr | PIM Elevation |
| Platform Admin | Full | Full | Full | PIM 4hr | PIM + Approval | PIM + Manager Approval |
| Compliance | Audit | Audit | None | Read | None | Direct Entra Group |
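One way to make the matrix above machine-checkable is to encode it as data and query it. This is a minimal sketch, not the actual enforcement mechanism (which lives in Entra ID and PIM): the values are copied from the table, while the resource keys and the helper names `access_level`, `requires_pim`, and `is_denied` are hypothetical.

```python
# Column order matches the table: pipelines, notebooks, clusters, secrets, production deploy.
RESOURCES = (
    "devops_pipelines",
    "databricks_notebooks",
    "databricks_clusters",
    "key_vault_secrets",
    "production_deploy",
)

ROLE_MATRIX = {
    "App Developer":  ("Run", "None", "None", "None", "None"),
    "Data Engineer":  ("View", "Edit", "Start/Stop", "None", "None"),
    "Lead Engineer":  ("Run + Edit", "Edit", "Manage", "PIM 4hr", "PIM 4hr"),
    "Platform Admin": ("Full", "Full", "Full", "PIM 4hr", "PIM + Approval"),
    "Compliance":     ("Audit", "Audit", "None", "Read", "None"),
}


def access_level(role: str, resource: str) -> str:
    """Look up the table cell for a role/resource pair."""
    return ROLE_MATRIX[role][RESOURCES.index(resource)]


def requires_pim(role: str, resource: str) -> bool:
    """True when the cell grants access only through a PIM elevation."""
    return access_level(role, resource).startswith("PIM")


def is_denied(role: str, resource: str) -> bool:
    """True when the cell is 'None', meaning the role has no access at all."""
    return access_level(role, resource) == "None"
```

Keeping the matrix as plain data like this makes it straightforward to diff against group assignments during an access review.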
Conditional Access & PIM
Zero-trust policies enforce MFA, device compliance, and time-boxed elevated access across all three deployment targets.
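The three checks named above (MFA, device compliance, and a time box) can be sketched as a single decision function. This is an illustrative model only, not the Entra ID or PIM API: the 4-hour cap comes from the RBAC table, and `ElevationRequest` and `evaluate_elevation` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple

MAX_ELEVATION = timedelta(hours=4)  # the "PIM 4hr" cap from the RBAC table


@dataclass
class ElevationRequest:
    """Hypothetical shape of a just-in-time elevation request."""
    user: str
    role: str
    mfa_passed: bool
    device_compliant: bool
    duration: timedelta


def evaluate_elevation(
    req: ElevationRequest, now: datetime
) -> Tuple[bool, str, Optional[datetime]]:
    """Return (granted, reason, expires_at) for a time-boxed elevation request."""
    if not req.mfa_passed:
        return False, "MFA required", None
    if not req.device_compliant:
        return False, "Device not compliant", None
    if req.duration <= timedelta(0) or req.duration > MAX_ELEVATION:
        return False, "Duration must be within the 4-hour cap", None
    # Access expires automatically once the requested window has elapsed.
    return True, "Elevation granted", now + req.duration
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps the expiry calculation deterministic and easy to test.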
Developer Access Simulator
Click Run to simulate a developer requesting PIM elevation to deploy across all three CI/CD pipelines — watch conditional access checks, MFA verification, and role activation in real time.
Interview: Microsoft Entra ID & Identity Governance
"I managed Microsoft Entra ID for NeonDeck's 3,400-user organization: configuring SAML/OIDC SSO for Databricks and Azure DevOps, deploying Conditional Access policies requiring MFA and Intune-compliant devices, and implementing PIM with approval workflows capping all elevated roles at 4-hour windows. I built the SCIM provisioning integration between Entra groups and Databricks Unity Catalog, cutting access provisioning from a 2-day manual ticket to a 5-minute automated process. Quarterly Entra access reviews feed directly into our SOC 2 Type II audit evidence. Every developer who touches production, whether deploying the casino app via DevOps, running a DLT pipeline in Databricks, or rotating a Key Vault secret, must pass through this identity governance layer."
Design Decisions & Tradeoffs
Every architectural choice has a cost. Here’s why we chose these patterns over the alternatives.
Production Implementation
Real configuration and code behind each of the three pipelines—copy-paste ready for your own projects.
Azure DevOps Pipeline YAML (azure-pipelines.yml)

    trigger:
      branches:
        include: [main]
      paths:
        include: [src/**, Dockerfile]

    variables:
      acrName: crneondeck
      imageName: casino-app
      aksCluster: aks-neondeck-prod
      canaryPct: 5

    stages:
      # ── Build: Docker image with layer caching ──
      - stage: Build
        jobs:
          - job: DockerBuild
            pool: { vmImage: ubuntu-latest }
            steps:
              # Docker@2 ignores `arguments` for buildAndPush, so build and push
              # run as separate steps to keep the --cache-from flag effective.
              - task: Docker@2
                displayName: Build image
                inputs:
                  containerRegistry: $(acrName)
                  repository: $(imageName)
                  command: build
                  Dockerfile: Dockerfile
                  tags: $(Build.BuildId)
                  arguments: --cache-from $(acrName).azurecr.io/$(imageName):latest
              - task: Docker@2
                displayName: Push image
                inputs:
                  containerRegistry: $(acrName)
                  repository: $(imageName)
                  command: push
                  tags: $(Build.BuildId)

      # ── SAST: Semgrep security scan with fail threshold ──
      - stage: SAST
        dependsOn: Build
        jobs:
          - job: SemgrepScan
            steps:
              - script: |
                  pip install semgrep
                  semgrep --config=p/owasp-top-ten --config=p/typescript \
                    --error --severity ERROR \
                    --json --output semgrep-results.json \
                    src/
                displayName: Semgrep OWASP Scan
              - task: PublishBuildArtifacts@1
                inputs: { PathtoPublish: semgrep-results.json, ArtifactName: sast-report }
                condition: always()

      # ── Deploy: Canary rollout with health check ──
      - stage: CanaryDeploy
        dependsOn: SAST
        jobs:
          - deployment: Canary
            environment: production
            strategy:
              canary:
                increments: [5, 25, 50, 100]
                deploy:
                  steps:
                    - script: |
                        kubectl set image deployment/casino-app \
                          casino-app=$(acrName).azurecr.io/$(imageName):$(Build.BuildId)
                        kubectl rollout status deployment/casino-app --timeout=300s
                routeTraffic:
                  steps:
                    - script: |
                        kubectl annotate ingress casino-app \
                          nginx.ingress.kubernetes.io/canary-weight="$(strategy.increment)" --overwrite
                postRouteTraffic:
                  steps:
                    - script: |
                        # Health check: 15-min error-rate window.
                        # -G with --data-urlencode stops curl from globbing the [] and {}
                        # in the PromQL; jq -r strips the quotes Prometheus puts around values.
                        BASELINE_ERR=$(curl -sG 'http://prometheus:9090/api/v1/query' \
                          --data-urlencode 'query=rate(http_errors[15m])' \
                          | jq -r '.data.result[0].value[1]')
                        CANARY_ERR=$(curl -sG 'http://prometheus:9090/api/v1/query' \
                          --data-urlencode 'query=rate(http_errors{canary="true"}[15m])' \
                          | jq -r '.data.result[0].value[1]')
                        if (( $(echo "$CANARY_ERR > $BASELINE_ERR * 2" | bc -l) )); then
                          echo "##vso[task.logissue type=error]Canary error rate 2x baseline; rolling back"
                          exit 1
                        fi
                on:
                  failure:
                    steps:
                      - script: kubectl rollout undo deployment/casino-app
                        displayName: Rollback canary

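The canary gate in the pipeline above boils down to two operations: reading a scalar out of a Prometheus instant-query response and comparing the canary error rate against twice the baseline. A minimal Python sketch of that logic follows, so the rule can be unit-tested outside the pipeline. The function names are hypothetical, and treating an empty result as a zero error rate is an assumption, not something the pipeline specifies.

```python
def parse_prometheus_scalar(payload: dict) -> float:
    """Extract the first sample value from a Prometheus instant-query response.

    Prometheus encodes each sample as [timestamp, "value-as-string"].
    Assumption: an empty result set is treated as a zero error rate.
    """
    result = payload["data"]["result"]
    if not result:
        return 0.0
    return float(result[0]["value"][1])


def should_rollback(canary_err: float, baseline_err: float, factor: float = 2.0) -> bool:
    """Roll back when the canary error rate exceeds `factor` times the baseline."""
    return canary_err > baseline_err * factor
```

Note that with a baseline of exactly zero, any non-zero canary error rate triggers a rollback under this rule, which matches the behavior of the shell comparison.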
Databricks ML Pipeline (Python + MLflow)

    import mlflow
    import mlflow.sklearn
    from mlflow.tracking import MlflowClient
    from pyspark.sql import SparkSession
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.metrics import f1_score, precision_score, recall_score
    from sklearn.model_selection import train_test_split

    spark = SparkSession.builder.appName("wab-credit-risk").getOrCreate()
    client = MlflowClient()

    # ── 1. Load data from Delta Lake feature store ──
    df = spark.read.format("delta").table("wab_catalog.gold.credit_risk_features")
    feature_cols = ["transaction_amount", "acct_days_since_open", "avg_daily_balance",
                    "overdraft_count_30d", "credit_utilization_pct"]
    pdf = df.select(feature_cols + ["is_default"]).toPandas()
    X = pdf[feature_cols]
    y = pdf["is_default"]

    # Hold out a test set so the logged metrics are not measured on training rows.
    # The 80/20 split and the seed are illustrative choices, not from the original.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=42
    )

    # ── 2. Train with MLflow tracking ──
    mlflow.set_experiment("/wab/credit-risk-scorecard")
    with mlflow.start_run(run_name="gbm-v2.4") as run:
        model = GradientBoostingClassifier(
            n_estimators=500, max_depth=6, learning_rate=0.05,
            subsample=0.8, min_samples_leaf=20
        )
        model.fit(X_train, y_train)
        preds = model.predict(X_test)
        new_f1 = f1_score(y_test, preds)

        # Log metrics
        mlflow.log_metric("precision", precision_score(y_test, preds))
        mlflow.log_metric("recall", recall_score(y_test, preds))
        mlflow.log_metric("f1", new_f1)
        mlflow.log_param("n_estimators", 500)
        mlflow.sklearn.log_model(model, "credit-risk-model")

    # ── 3. Model registry: promote staging → production ──
    # Note: registry stages are deprecated in recent MLflow releases in favor of aliases.
    model_name = "wab-credit-risk-scorecard"
    model_uri = f"runs:/{run.info.run_id}/credit-risk-model"
    mv = mlflow.register_model(model_uri, model_name)

    # Compare against current production
    prod_versions = client.get_latest_versions(model_name, stages=["Production"])
    if prod_versions:
        prod_run = client.get_run(prod_versions[0].run_id)
        prod_f1 = prod_run.data.metrics["f1"]
        if new_f1 > prod_f1:
            client.transition_model_version_stage(
                model_name, mv.version, stage="Production",
                archive_existing_versions=True
            )
            print(f"Promoted v{mv.version}: F1 {new_f1:.4f} > {prod_f1:.4f}")
        else:
            client.transition_model_version_stage(
                model_name, mv.version, stage="Staging"
            )
            print(f"Kept in staging: F1 {new_f1:.4f} ≤ {prod_f1:.4f}")
    else:
        client.transition_model_version_stage(
            model_name, mv.version, stage="Production"
        )
        print(f"First production model: v{mv.version}")

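The promotion rule in the registry step above can be isolated into a pure function, which makes it testable without an MLflow tracking server. This is a minimal sketch; `decide_stage` is a hypothetical helper, not part of MLflow.

```python
from typing import Optional


def decide_stage(new_f1: float, prod_f1: Optional[float]) -> str:
    """Pick the target registry stage for a newly registered model version.

    prod_f1 is None when no production version exists yet, in which case
    the new version becomes the first production model.
    """
    if prod_f1 is None:
        return "Production"
    # Promote only on a strict improvement; ties stay in Staging.
    return "Production" if new_f1 > prod_f1 else "Staging"
```

Because ties stay in Staging, a retrained model that merely matches production never displaces it, which mirrors the `new_f1 > prod_f1` comparison in the script.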
SCIM Provisioning Handler (Python)

    # SCIM 2.0 User provisioning endpoint for Entra ID → internal systems
    import uuid
    from datetime import datetime, timezone

    from flask import Flask, jsonify, request
    from pymongo import MongoClient

    app = Flask(__name__)

    # Assumption: the original never defines `db`; its find_one/insert_one calls
    # suggest a pymongo database. The database name here is hypothetical.
    db = MongoClient()["neondeck_identity"]

    SCIM_SCHEMA = "urn:ietf:params:scim:schemas:core:2.0:User"
    SCIM_ERROR_SCHEMA = "urn:ietf:params:scim:api:messages:2.0:Error"

    def map_scim_to_internal(scim_user: dict) -> dict:
        """Map SCIM user attributes to internal schema."""
        name = scim_user.get("name") or {}
        emails = scim_user.get("emails") or []
        # Prefer the email marked primary; otherwise fall back to the first entry
        primary_email = next(
            (e.get("value", "") for e in emails if e.get("primary")),
            emails[0].get("value", "") if emails else ""
        )
        return {
            "external_id": scim_user.get("externalId"),
            "username": scim_user.get("userName"),
            "email": primary_email,
            "first_name": name.get("givenName", ""),
            "last_name": name.get("familyName", ""),
            "active": scim_user.get("active", True),
        }

    @app.route("/scim/v2/Users", methods=["POST"])
    def create_user():
        # silent=True returns None instead of raising on a missing or invalid body
        scim_user = request.get_json(silent=True) or {}

        # Validate SCIM schema
        schemas = scim_user.get("schemas", [])
        if SCIM_SCHEMA not in schemas:
            return jsonify({
                "schemas": [SCIM_ERROR_SCHEMA],
                "detail": "Missing required schema",
                "status": "400"  # RFC 7644 expresses the status as a JSON string
            }), 400

        internal = map_scim_to_internal(scim_user)

        # Conflict detection: check if user already exists
        existing = db.users.find_one({"email": internal["email"]})
        if existing:
            return jsonify({
                "schemas": [SCIM_ERROR_SCHEMA],
                "scimType": "uniqueness",
                "detail": f"User {internal['email']} already exists",
                "status": "409"
            }), 409

        # Provision in downstream systems
        user_id = str(uuid.uuid4())
        internal["id"] = user_id
        internal["created_at"] = datetime.now(timezone.utc).isoformat().replace("+00:00", "Z")
        db.users.insert_one(internal)

        # SCIM-compliant response
        return jsonify({
            "schemas": [SCIM_SCHEMA],
            "id": user_id,
            "externalId": internal["external_id"],
            "userName": internal["username"],
            "name": {"givenName": internal["first_name"],
                     "familyName": internal["last_name"]},
            "emails": [{"value": internal["email"], "primary": True}],
            "active": internal["active"],
            "meta": {
                "resourceType": "User",
                "created": internal["created_at"],
                "location": f"/scim/v2/Users/{user_id}"
            }
        }), 201

Innovation Spotlight
Forward-looking capabilities that move CI/CD from “it works” to “it learns.”