A garden of working notes.
Short, atomic notes on analytics engineering, dbt, BigQuery, marketing data, and AI agents. Topic guides stitch them into starting points — pick one and follow the threads. Filter by domain or topic, or just browse.
GA4 event_params Type Detection
How GA4 auto-detects parameter types across string_value, int_value, and double_value fields — and the defensive COALESCE pattern when the type isn't guaranteed.
LinkedIn Ads OAuth Token Management
LinkedIn's OAuth token expiration model for the Marketing API — 60-day access tokens, 365-day refresh tokens, forced annual re-authentication, and operational strategies for custom pipelines.
RSS Feed Deduplication in n8n
How to prevent duplicate Notion pages when polling RSS feeds in n8n, using a Merge node configured as a left anti-join.
Workload Identity Federation for CI/CD
Replace service account keys in GitHub Actions and other CI systems with keyless OIDC authentication — no credentials to store, rotate, or leak.
Elementary setup troubleshooting
Fixes for the most common Elementary installation failures: empty reports, missing edr command, BigQuery location errors, tables materialized as views, and Databricks permission issues.
Contract-First Development in dbt
Defining the contract before writing the SQL — the API design analogy, the workflow, and how ODCS + Data Contract CLI can generate dbt model YAML.
MCP SDK Selection for Data Engineering
Choosing between the Python and TypeScript MCP SDKs — installation, capabilities, and which one fits your data engineering team.
Salesforce to BigQuery Pipeline
Hub note for the Salesforce-to-BigQuery pipeline — from ingestion tool selection through polymorphic resolution, stage tracking, account hierarchies, and activity timelines.
Build vs. Buy Data Pipelines
A reading path through the shifting economics of managed vs. custom data pipelines — from Fivetran's pricing changes through AI-assisted development with dlt to the hybrid strategy.
Lead Scoring in the Warehouse
Hub note for warehouse-native lead scoring — from rule-based weighted models in dbt to BigQuery ML classification, feature engineering, and reverse ETL back to the CRM.
Signals That Your Cron-Based dbt Setup Has Outgrown Itself
Five concrete indicators that a simple cron-scheduled dbt job has hit its limits — and what each one tells you about the orchestration capability you actually need.
Pipeline Enforcement Layer Strategy
The four-layer model for data contract enforcement across the full pipeline — pre-warehouse, post-load, transformation, and continuous observability — with practical adoption ordering.
dbt MCP Server: Local vs Remote
The two deployment modes for dbt's MCP server — local gives full CLI access and works without dbt Cloud, remote is read-only metadata and requires a Cloud plan.
Unit Testing CASE WHEN Boundary Logic in dbt
Systematic boundary testing for CASE WHEN statements — testing threshold values, just-under values, null handling, and implicit ELSE behavior.
dbt deps and the Package Lock File
How dbt resolves and installs packages — the difference between packages.yml and dependencies.yml, how the lock file works, and the flags worth knowing.
Google Workspace CLI (gws)
The gws CLI gives programmatic access to every Google Workspace API through a single binary — Gmail, Drive, Calendar, Sheets, Docs — filling the gap gcloud has never covered.
BigQuery Materialized Views
How BigQuery materialized views precompute aggregations, refresh incrementally, and transparently rewrite queries for automatic optimization.
OpenClaw Ecosystem and Community
The community and ecosystem around OpenClaw — ClawHub, ClawData, the viral growth story, the naming history, and what the ecosystem state means for adoption decisions.
dbt Package CI/CD
How to set up CI/CD for dbt packages — matrix testing across warehouses and dbt versions with GitHub Actions, credential management, and the integration test workflow.
Custom dbt Materializations
Hub note for custom dbt materializations — anatomy, decision framework, zero-downtime swap, secured table, and debugging patterns.
Meta Ads Actions Array in BigQuery
How to flatten Meta's nested actions JSON array in BigQuery — unnesting patterns, configurable action type pivots, dbt integration, and the action_values companion field.
GA4 dbt Package Ecosystem
An overview of the major open-source dbt packages for GA4 BigQuery exports — what they optimize for, what they miss, and when to build custom.
Claude Code ROI for Analytics Engineers
Realistic time-to-value for Claude Code in a dbt workflow — what setup actually costs, when consistent savings emerge, and the qualitative benefit of tasks that finally get done.
AI Tool Tiers for Data Engineering
The four capability tiers of AI tools for data engineering — autonomous agents, copilots, chat assistants, and platform-embedded AI — and why context determines which tier delivers value.
Alternatives to Default dbt Docs
When to move beyond the default dbt docs frontend — Dagster's Next.js replacement, dbterd for ERDs, data catalogs, and dbt Cloud Catalog.
dbt Testing Decision Framework
A three-question framework and decision tree for choosing the right dbt testing approach — unit tests, generic tests, singular tests, dbt-expectations, Elementary, or dbt-audit-helper.
BigQuery MCP Server Setup
A reading path through connecting BigQuery to AI assistants via MCP — comparing the two official options, authentication, custom queries, and cost control.
GCP IAM Least Privilege for Data Teams
A sequenced guide to auditing and fixing IAM debt on GCP data platforms — from surfacing over-permissioned principals to implementing policy tags and row-level security.
Custom MCP Servers for Data Engineering
A reading path through building custom MCP servers — from decision criteria and SDK selection through tool design, testing, and practical server patterns for data catalogs, pipelines, and quality.
AI Developer Skill Atrophy
How AI coding tools affect developer comprehension — Anthropic's RCT, the delegation vs. inquiry distinction, and why how you use AI matters as much as which tools you pick.
Dagster-dbt Asset Mapping
How dagster-dbt reads your manifest.json to create one Dagster asset per dbt model, with automatic lineage from ref() calls, and how to customize the mapping with DagsterDbtTranslator.
The full_refresh: false Guard in dbt
When and why to set full_refresh: false on dbt incremental models — preventing accidental multi-hour rebuilds while keeping intentional full refreshes possible.
dbt Base Layer Patterns
What belongs in dbt base models — renaming, casting, deduplication, unnesting — and the one exception to the no-joins rule.
dbt-audit-helper Progressive Validation
The broad-to-narrow validation workflow for dbt-audit-helper — start with schema checks, escalate to row-level diffs only when needed.
Event-Grain Sessionization
Why enriching events with session context beats building session-grain tables, and how the pattern enables flexible downstream analysis.
MCP Tool Design Patterns
How to design MCP tools that work well with AI — docstrings as descriptions, Pydantic models for structured output, and input validation with schemas.
MCP Ecosystem Overview
A reading map for the MCP ecosystem — from protocol fundamentals through official servers, clients, data engineering integrations, and building custom servers.
GA4 BigQuery Timezone Handling
Three different timezone contexts coexist in GA4 BigQuery exports — event_timestamp, event_date, and _TABLE_SUFFIX each use different references that silently break date-range queries.
dbt-utils Generic Tests
Full reference for dbt-utils generic tests: YAML syntax, the Fusion arguments: key change, group_by_columns support, and when to use each test.
Layered SQL Review Pipeline for dbt
A four-layer architecture for SQL review in dbt projects — IDE feedback, pre-commit hooks, PR-level AI review, and CI testing — each catching a different class of error.
dbt-to-Dataform Migration Hub
Hub note for migrating from dbt to Dataform — the decision, the concept mapping, the procedural steps, and what you'll lose. For BigQuery teams evaluating the switch.
Hosting dbt Docs Beyond Localhost
Deployment options for dbt docs by complexity — GitHub Pages, Netlify, GCS with IAP, S3 with CloudFront, and Docker with Nginx.
Unit Testing Snapshot Consumers in dbt
Three strategies for testing snapshot-related logic — pre-snapshot base models, SCD2 date range calculations in downstream models, and change detection hashing.
dlt Incremental Loading
How dlt tracks state between pipeline runs using cursor-based incremental loading — the dlt.sources.incremental() helper, declarative REST API config, and why state lives in the destination.
YAML Formatting Options for dbt Descriptions
The four ways to write descriptions in dbt YAML — inline strings, folded scalars, literal scalars, and doc blocks — and when to use each one.
Writing Reusable dbt Macros
A map through the garden notes on designing, naming, documenting, testing, and evolving dbt macros — from when to extract to how to handle breaking changes.
BigLake Metastore and Catalog Strategy
Why catalog infrastructure matters more than format choice on GCP, and how BigLake Metastore and Dataplex Universal Catalog provide unified governance across engines and formats.
Triangulated Marketing Measurement
Why resilient marketing measurement combines three approaches: multi-touch attribution for daily optimization, media mix modeling for strategic allocation, and incrementality testing for causal validation.
dbt MCP Server Setup and Configuration
Step-by-step installation and configuration of the dbt MCP server — uv, environment variables, feature toggles, and client setup for Claude Code and Claude Desktop.
dbt-utils Hub
Navigation hub for dbt-utils v1.3 — the full scope of the package, what moved to dbt-core, and pointers to each section of the reference.
GTM Server-Side on Cloud Run: Deployment and Configuration
How to deploy GTM Server-Side on Google Cloud Run — automatic vs manual provisioning, production configuration settings, custom domain setup, and multi-region architecture for global traffic.
BigQuery HyperLogLog Sketches
How HyperLogLog++ sketches in BigQuery enable composable, approximate distinct counts at a fraction of the cost of exact counting.
Custom Parameterized MCP Queries
Using the MCP Toolbox's tools.yaml to define constrained, parameterized queries that give AI assistants structured access to data without arbitrary SQL.
GA4 BigQuery Query Patterns
Efficient querying of GA4 date-sharded tables — _TABLE_SUFFIX filtering, inline vs FROM clause UNNEST, reusable dbt macros, and cost control practices.
dbt Materialization Anatomy
The six-step structure every dbt materialization follows — setup, pre-hooks, main SQL, post-hooks, cleanup, and return — plus the key objects and adapter methods.
BigQuery Clustering Mechanics
How BigQuery clustering sorts data within storage blocks, why column order matters critically, and how automatic re-clustering works at no cost.
dbt Production Safety Hooks
Using Claude Code PreToolUse hooks to block dangerous dbt commands before they execute — full-refresh on production, unscoped builds, and other high-risk operations.
Meta Ads Insights API Structure
How the Meta Marketing API is organized — the five-level object hierarchy, Insights API as a reporting edge, versioning cadence, authentication models, and rate limit system.
Google Ads BigQuery Data Transfer Service
Hub note for the free Google Ads → BigQuery pipeline — setup, schema quirks, known data gaps, and dbt modeling patterns.
Custom Sessionization Patterns
How to build custom session definitions from raw events using LAG and running sums, with configurable timeouts, campaign-based splits, and session metrics.
Business Cost of Poor Data Quality
The measurable financial and operational impact of data quality failures — industry statistics, high-profile incidents, and why prevention costs a fraction of remediation.
MCP Protocol Architecture
What the Model Context Protocol is, how clients and servers communicate, and why it matters for connecting AI tools to your data infrastructure.
dlt Authentication Patterns
The authentication strategies dlt provides for API pipelines — bearer tokens, API keys, OAuth2 client credentials — and how to extend them for non-standard flows.
dbt profiles.yml with env_var for Multi-Client GCP
Using env_var() interpolation in profiles.yml so dbt reads GCP credentials and project from environment variables — enabling seamless client switching via direnv.
Prompting Claude Code for dbt
What separates dbt prompts that work from ones that produce generic output — specificity, codebase references, constraint encoding, and the session-less memory problem.
What dbt docs generate actually produces
The static site artifacts that dbt docs generate creates — manifest.json, catalog.json, index.html — and the flags that control how they are built.
MCP Apps Protocol Internals
How MCP Apps extend the Model Context Protocol to render interactive HTML interfaces inside AI clients — the ui:// resource mechanism, iframe sandboxing, and bidirectional JSON-RPC communication.
Elementary Slack and Teams integration
How to connect Elementary alerts to Slack (token-based and webhook) and Microsoft Teams, including the tradeoffs between integration methods.
GA4 Events Sessionized Model
The implementation of the wide event-grain intermediate model for GA4 — the CTE structure, window function patterns, and design decisions that make downstream analysis flexible.
GitHub Actions for dbt Scheduling
Using GitHub Actions scheduled workflows as a zero-infrastructure dbt runner — what it covers well, where it falls short, and when to use it over Cloud Run.
Claude Code Skills Activation
How Claude Code skills work under the hood — keyword matching against YAML frontmatter, the ~20% auto-activation rate, and why skills fit background domain knowledge better than repeatable workflows.
dbt Documentation Audience Mismatch
Why most dbt documentation goes unread — the fundamental mismatch between who writes docs (engineers) and who needs them (business users, analysts, and increasingly AI tools).
GTM Server-Side Hosting: Decision Framework
How to choose between Cloud Run, AWS ECS Fargate, Azure App Service, and managed providers for hosting your GTM Server-Side container in production.
BigLake Performance Characteristics
How BigLake external and Iceberg tables perform relative to native BigQuery tables, the role of metadata caching, and where the remaining gaps matter.
LinkedIn Marketing API Access
How to get approved for LinkedIn's Marketing API — the developer app setup, super admin verification, manual review process, rejection handling, and what to include in your application.
Reverse ETL Patterns for CRM Activation
How to push warehouse-computed scores and attributes back into Salesforce or HubSpot using reverse ETL tools — sync architecture, field mapping, sync frequency, and downstream automation.
Data Observability Total Cost of Ownership
The true cost comparison between OSS and managed data observability — accounting for engineering time, warehouse compute, training, and the costs that don't appear on invoices.
Identity Resolution Monitoring
Key metrics and anomaly detection SQL for monitoring a GA4 identity stitching pipeline — stitch rate, consolidation rate, shared device exposure, and week-over-week change alerts.
Late-Arriving Data in dbt — Hub
Hub note connecting all concepts around handling late-arriving data in dbt incremental models: measurement, lookback windows, partition strategies, deduplication, testing, and operational safety.
AI Tooling Cost for Solo Consultants
What a four-layer AI stack actually costs per month for an independent analytics engineering consultant — tool-by-tool breakdown, ROI assessment, and cost visibility gaps.
HubSpot Deal Stage Modeling
Why deal stage transitions live in DEAL_STAGE not DEAL_PROPERTY_HISTORY, how to use the is_closed and label columns correctly, and patterns for time-in-stage and pipeline analysis.
dbt Project Structure: Guide Hub
A hub connecting all notes on structuring a dbt project — layers, naming, materialization, YAML, modern features, and marketing analytics patterns.
Lightdash Joins and Fanout Protection
How to define joins between dbt models in Lightdash YAML, why the relationship property matters for metric accuracy, and how Lightdash warns about fanout risk in one-to-many joins.
Metric Naming Conventions in dbt
How to name MetricFlow metrics so they stay discoverable and consistent as your project scales — patterns by metric type, grouping families, and the name vs label distinction.
Dagster Fundamentals Hub
Hub note connecting all Dagster core concept notes — the asset-centric model, SDAs, resources, components, UI, pricing, GCP deployment, learning curve, and the dbt integration.
LLM Accuracy With Semantic Layers
Research benchmarks showing how semantic layers improve LLM accuracy on enterprise data questions from ~17% to 54-92% — the data.world study, Spider 2.0, and dbt Labs replication.
Salesforce Person Accounts and Multi-Currency in the Warehouse
Two Salesforce data model quirks that break standard warehouse patterns — Person Accounts that merge Account and Contact, and multi-currency orgs that require exchange rate conversion in dbt.
Elementary for dbt: setup guide
A sequenced map of notes covering Elementary installation from scratch: dbt package, materialization override, CLI profile configuration, and troubleshooting.
Cloud Workflows Orchestration
GCP Cloud Workflows as a middle-ground orchestration layer between Cloud Scheduler and Cloud Composer — serverless, cheap, and capable enough for multi-step pipelines.
Dagster Resources
How Dagster resources work as centrally configured, injectable external connections — BigQueryResource, DbtCliResource, and the pattern for swapping environments without changing asset code.
BigQuery Column-Level Security with Policy Tags
Replace view-based column hiding with Data Catalog policy tags — storage-layer security that survives schema changes and doesn't require view maintenance.
dbt Documentation CI Enforcement
Tools and patterns for enforcing dbt documentation completeness in CI — dbt-coverage, dbt-checkpoint, dbt-score, and dbt-bouncer.
dbt-utils Web Macros for URL Parsing
dbt-utils URL extraction macros for marketing analytics: get_url_host, get_url_path, and get_url_parameter. What they do, where they're useful, and what they don't handle.
MCP Discovery Resources
Where to find MCP servers — the official registry, community directories, and how to evaluate what you find before installing.
GA4 Sessionization Hub
Hub note connecting all concepts involved in building session tables from GA4 BigQuery event data.
dbt Mesh Governance Triad
How contracts, access controls, and model versioning combine in dbt Mesh to turn models into data products — and which models actually deserve that treatment.
Data quality KPIs from Elementary
Five data quality KPIs you can build from Elementary's warehouse tables, how to interpret them, and how they map to standard data quality dimensions.
OpenClaw vs Claude Code vs Cursor for Data Work
A clear-eyed comparison of three AI tools data people actually use — what each is for, where each falls short, and why the best practitioners run all three as a layered stack.
IAM Drift Monitoring for GCP
Catch IAM debt before it accumulates — IAM Recommender, INFORMATION_SCHEMA job monitoring, and audit log queries to detect permission drift quarterly.
The Freelance Admin Overhead Problem
Why solo consultants spend 20-30% of their time on non-billable admin, why the standard fixes don't work, and what makes a single agent different from another SaaS subscription.
Consent Mode v2 Hub
Hub note connecting all concepts involved in implementing, debugging, and maintaining Google Consent Mode v2 across web and server-side GTM containers.
dbt Macro Deprecation Pattern
How to change macro behavior without breaking callers — the staged deprecation pattern using exceptions.warn() that dbt-utils demonstrates.
dbt Dispatch Configuration
How to configure dbt's dispatch search order in dbt_project.yml — overriding package macros, adding Databricks support via spark_utils, and namespace resolution.
Metrics as Code
The practice of defining business metrics in version-controlled YAML — reviewed in pull requests, tested in CI/CD, and consumed by BI tools and AI agents.
dbt Core vs Cloud Decision Framework
A structured comparison of dbt Core and dbt Cloud across deployment, interface, features, pricing, and team profile, with decision heuristics for choosing between them.
dbt Docs Markdown Capabilities
What Markdown works in dbt docs and what does not — supported syntax, YAML scalar styles, image embedding, cross-referencing models, and known limitations.
CLOUDSDK_CONFIG for Per-Project gcloud Isolation
How CLOUDSDK_CONFIG isolates all gcloud state per project — credentials, ADC files, active config — and why it's the missing piece for multi-client GCP work.
GA4 Identity Stitching Techniques
The four SQL patterns for resolving GA4 anonymous-to-known user identity — last-touch, first-touch, full backstitch, and session-scoped — with a decision framework for choosing between them.
dbt Test Failure Severity Framework
A four-tier framework for prioritizing dbt test failures by impact — combining test type, model layer, downstream dependents, and historical context into an actionable severity ranking.
Lightdash in Production: Kubernetes Deployment
Moving Lightdash from Docker Compose to Kubernetes with the community Helm chart — production checklist, external dependencies, authentication options, and upgrade strategy.
MCP Resources and Prompts
Beyond tools — using MCP resources for read-only data exposure, prompts for reusable templates, and the Context object for progress reporting in long-running operations.
GA4 Consent Mode Orphaned Events
How Consent Mode creates rows in GA4 BigQuery exports with null user_pseudo_id and session identifiers — what they are, how they affect counts, and same-page backstitching behavior.
GA4 Engagement Event Query Recipes
Production-ready BigQuery SQL for GA4 engagement events — page views, scroll depth, outbound clicks, file downloads, and video engagement funnels.
dbt MCP Server Safety Considerations
The risks of giving an AI assistant dbt CLI access — production data modification, credential scope, Copilot credit consumption, and practical mitigations.
BI Tool Self-Service Models
Three different approaches to self-service BI: governed exploration (Lightdash), visual query builder (Metabase), and LookML-powered Explore (Looker). How to match the model to your users.
Data Contract Ownership Models
Producer-defined vs consumer-defined data contracts — why who writes the contract determines whether the initiative succeeds.
Browser Cookie Restrictions in 2026
How Safari ITP, Firefox Total Cookie Protection, and Chrome handle tracking cookies differently in 2026 — and why the combined effect means client-side tracking misses 20-40% of visitors.
GA4 E-commerce Schema in BigQuery
The ecommerce RECORD and items REPEATED RECORD in GA4's BigQuery export — field reference, nested item_params, and query patterns for purchase analysis.
GA4 Traffic Source Fields
The four traffic source locations in GA4 BigQuery exports — their scopes, use cases, and the July 2024 cutoff that changed session attribution.
dbt Test Alert Routing and Ownership
How to route dbt test failures to the right people, configure tiered alert severity, and apply the Broken Window principle to test suite health.
dbt Unit Tests BigQuery Workarounds
BigQuery-specific gotchas for dbt unit tests — STRUCT completeness, ARRAY comparisons, column_transformations, slot costs, and common error solutions.
HubSpot to BigQuery Pipeline Hub
All the moving parts for a HubSpot-to-BigQuery pipeline with dbt: associations, lifecycle stages, deal stages, property history, ingestion tools, and the dbt_hubspot package.
dbt Scheduling Without an Orchestrator
How to run dbt in production without Airflow, Dagster, or Prefect — the practical options from $0/month GitHub Actions to Cloud Run Jobs, when each fits, and when to move on.
Window Function Patterns for Analytics SQL
Practical window function patterns for analytics SQL — ROW_NUMBER, LEAD/LAG, running totals, session detection, and deduplication.
Headless BI Pattern
The architectural pattern of decoupling the semantic layer from visualization — exposing metrics via APIs so any frontend, AI agent, or application can consume governed data.
dlt Environment Setup
Setting up a dlt project from scratch — Python virtual environment, installation, dlt init, and the project scaffold it creates.
Cloud Composer Cost and Capabilities
Cloud Composer 3's pricing model, committed use discounts, and the specific scenarios where its orchestration capabilities justify the $300-400/month minimum.
Deploying dbt Core on Cloud Functions
A step-by-step guide to deploying dbt Core on Google Cloud Functions — repository structure, service account setup, deployment, and scheduling with Cloud Scheduler.
dbt documentation drift detection
Techniques for detecting when dbt documentation falls out of sync with reality — column-level drift, git-based staleness signals, and schema drift for sources.
Warehouse Attribution Data Sources
The three categories of data required for warehouse-based attribution (website interactions, ad platform spend, and conversions), with platform-specific loading patterns and common data quality traps.
Claude Code Bang Prefix for Shell Commands
Using the ! prefix to run shell commands directly inside Claude Code — how it saves tokens, speeds up authentication, and keeps your flow uninterrupted.
Claude Code Model Selection for Analytics Work
When to use Sonnet vs Opus in Claude Code for analytics engineering — daily work defaults, complex problem escalation, and practical cost-speed tradeoffs.
dbt documentation coverage tracking
Measuring and trending dbt documentation coverage over time with dbt-coverage, dbt-score, and dbt Cloud — moving beyond pass/fail CI checks to spot erosion early.
Looker Studio: Extract vs. Live Connection
When to use Looker Studio's extract mode versus live BigQuery connections, the 100 MB limit that catches teams off guard, and how to combine both in the same report.
dbt-audit-helper Hub
Hub note for dbt-audit-helper — the progressive validation workflow, macro reference, CI/CD integration, and related comparison topics.
Data team on-call strategies
How data teams structure on-call rotations, triage processes, and runbooks differently from software engineering on-call, and which metrics reveal whether the system is working.
dbt Intermediate Layer Patterns
What belongs in dbt intermediate models — joins, business logic, window functions — and the critical rule of never reducing grain.
OpenClaw Skills for Monitoring
How to write OpenClaw skill files for data pipeline monitoring — structuring SKILL.md instructions, categorizing failure types, formatting output for Slack, and adding context that makes alerts actionable.
Dataform-to-dbt Concept Mapping
A reference mapping of Dataform concepts to their dbt equivalents — refs, configs, sources, materializations, testing, and directory structure.
Migrating Incremental Models to Microbatch
How to convert traditional dbt incremental models to the microbatch strategy — step-by-step migration, side-by-side code examples, and first-run considerations.
MCP Data Engineering Servers
The MCP servers that actually matter for data engineering work — Snowflake, BigQuery, ClickHouse, centralmind/gateway, MindsDB, and Confluent.
dbt-project-evaluator for documentation enforcement
How dbt-project-evaluator and dbt_meta_testing enforce documentation completeness in CI — materializing coverage as models and setting folder-level requirements.
dbt-expectations BigQuery Implementation Patterns
Real-world dbt-expectations implementation on BigQuery — complete GA4 and advertising data quality YAML, test placement by DAG layer, and a practical starting checklist.
GA4 user_id Data Quality
Common implementation bugs that corrupt GA4 user_id data — string 'null' values, logout tagging errors, suspicious high-cardinality IDs — and the SQL patterns to detect and filter them.
EL Tool Schema Contract Modes
How dlt, Fivetran, and Airbyte handle schema changes during extraction and loading — from dlt's granular freeze/evolve/discard modes to Fivetran's blunt blocking settings.
dbt Package Integration Testing
The integration_tests sub-project pattern for testing dbt packages — using seeds as mock data, comparing outputs to expected results, and running the full suite.
dbt Macro Naming Conventions
Naming patterns for dbt macros that make them discoverable, communicative, and well-organized — verb prefixes, descriptive names, internal helper conventions, and the one-macro-per-file rule.
MetricFlow Metric Types
The five metric types in dbt MetricFlow — simple, cumulative, derived, ratio, and conversion — with syntax, use cases, and gotchas for each.
Unit Testing Window Functions in dbt
How to design test data that validates window function partitioning, ordering, and framing — with patterns for ROW_NUMBER, FIRST_VALUE, cumulative sums, and deliberate out-of-order inputs.
GCP Authentication for Multi-Client Consulting Work (Hub)
Hub note for GCP credential isolation across multiple client projects — the problem, the four-variable solution, tool-specific agent constraints, and the service account vs impersonation tradeoff.
BigQuery Pricing Policy Changes 2024–2025
Three BigQuery policy changes that affect cost modeling in 2024–2025: the flat-rate deprecation, the 200 TiB daily on-demand quota, and new Cloud Storage fees for external tables.
Why a dbt Documentation Style Guide Matters More Than Effort
The case for writing a documentation style guide for your dbt project — why inconsistency is the root problem, not effort, and how style guides serve both humans and AI tools.
BigQuery Architecture for Analytics Engineers
How BigQuery works under the hood — columnar storage, slots, the separation of compute and storage — and why it matters for your queries and costs.
The Rule of Three for dbt Macros
Why you should wait until the third occurrence of a pattern before extracting a dbt macro — and what goes wrong when you don't.
Semantic Validation in dbt
How to encode business rules as dbt tests — regex pattern validation, cross-column logic, natural language AI validation, and when each approach fits.
OpenClaw Architecture and Design Principles
How OpenClaw is built — the Gateway daemon, model-agnostic BYOK design, HEARTBEAT.md proactive loop, and plain-text-first philosophy that makes it feel natural to data people.
Cascading Agent Pattern
The architecture where an always-on monitoring agent detects issues and triggers a coding agent to investigate and fix them — how OpenClaw and Claude Code hand off work.
BigQuery Editions Testing Without Commitment
How to evaluate BigQuery Editions on real workloads before committing — creating a test reservation, rolling back instantly, opting out of org-level reservations, and using the Slot Estimator.
Time-Decay Attribution Model
Time-decay attribution using exponential decay with a configurable half-life — the formula, choosing half-life by industry, BigQuery SQL implementation, and parameterization.
Claude Code Authentication Options
The two ways to authenticate Claude Code — subscription OAuth and API keys — when to use each, and the precedence rule that trips people up.
GA4 Sharded-to-Partitioned Base Model
How to convert GA4's date-sharded BigQuery export into a properly partitioned incremental dbt model, and why the static lookback pattern is critical for correctness.
How Lightdash Connects to Your dbt Project
The three mechanisms for connecting Lightdash to a dbt project — Git repository integration, CLI deployment, and CI/CD automation — and how Lightdash generates a BI layer from dbt YAML.
dbt Model Description Style Guide
Hub note for the dbt documentation style guide — why consistency beats effort, what to put in model and column descriptions, YAML formatting options, doc blocks, CI enforcement, and rollout strategy
dbt Test Output Parsing for Automated Monitoring
How to extract structured, actionable information from dbt test output — distinguishing failure types, capturing sample rows, and handling partial runs so automated monitoring doesn't miss anything.
Asset-Centric Orchestration
The paradigm shift from task-based orchestration (what to run) to asset-based orchestration (what data should exist) — why it matters for analytics engineers and how it changes debugging, monitoring, and pipeline design.
BigQuery CLI Capabilities Beyond MCP
What the bq command-line tool can do that BigQuery MCP servers cannot — data loading, exports, table management, and the full feature gap with examples.
EU Cookie Consent Legal Framework
The two overlapping EU legal frameworks governing cookie consent — ePrivacy Directive and GDPR — what valid consent actually requires, which cookies are exempt, and where enforcement stands in 2026.
BigQuery Partitioning and Clustering
A structured reading path for understanding BigQuery partitioning and clustering -- mechanics, decision framework, configuration patterns, and anti-patterns.
dlt REST API Source Configuration
How to configure dlt's declarative REST API Source — the client block, resources block, endpoint paths, pagination wiring, and what dlt does automatically with the data.
OpenClaw for Data People — Hub
A reading map for the OpenClaw introductory guide — architecture and design principles, tool comparison, security risks, persistent memory, and the ecosystem around OpenClaw.
BigQuery Data Lake Common Mistakes
Three anti-patterns that cause the most problems in BigQuery data lake implementations: missing metadata caching, skipped partition filters, and over-engineered architectures.
Zero-Downtime Table Materialization in dbt
A custom dbt materialization that builds to a temp name, validates row counts, then swaps via rename — keeping the old table queryable until the new one is confirmed ready.
Ad Platform API Landscape
API characteristics, authentication models, and engineering gotchas for Google Ads, Meta, LinkedIn, Microsoft, TikTok, Pinterest, and Twitter ad platforms
dbt Doc Block Syntax and Reuse Patterns
How dbt doc blocks work — syntax, naming rules, cross-package references, and patterns for writing column and model descriptions once and reusing them across your project
Dagster Components
Dagster's newest major abstraction — YAML-configured objects that generate assets, checks, and schedules with minimal Python, lowering the barrier for SQL-first analytics engineers.
Attribution Channel Grouping Strategy
How to group marketing channels for data-driven attribution -- balancing granularity against data sparsity to produce stable, actionable model results
dbt Model Contract Mechanics
How dbt's native model contracts work — the preflight check, DDL generation, fail-fast behavior, configuration options, and what contracts do and don't validate.
Hybrid ELT Strategy
When to buy managed ELT, when to build with dlt + AI, and the practical migration path — a decision framework for splitting your pipeline portfolio strategically
MCP Apps for Data Engineers
A reading path through MCP Apps — the January 2026 extension to MCP that renders interactive HTML visualizations directly inside AI client conversations.
Google Ads to BigQuery: Loading Approaches
Four ways to load Google Ads data into BigQuery — a map through the decision landscape.
Security Posture for AI Agents
How to scope permissions, isolate environments, and treat always-on AI agents like OpenClaw as untrusted actors — practical security practices for data teams
MCP Ecosystem Governance
How MCP became a vendor-neutral open standard — the Linux Foundation donation, corporate adoption, and what broad industry support means in practice.
OpenClaw Security Risks — Hub
A reading map for the OpenClaw security risks guide — documented incidents, CVEs, regulatory warnings, supply chain attacks, context window safety failures, and what data teams specifically need to know.
GA4 dbt Project Configuration
The dbt_project.yml setup for a GA4 project — variable-driven configuration, folder-level materializations, and the project variables that make the template reusable.
Feature Engineering for ML in dbt
How to structure dbt intermediate models as ML feature tables — including time-windowed aggregations, domain-separated feature sets, and joining them into a labeled training dataset.
BigQuery Slot Usage Monitoring
How to monitor BigQuery slot usage with INFORMATION_SCHEMA, the Slot Estimator, and Cloud Monitoring -- practical queries and tools for capacity planning.
Google Ads Server-Side: Conversion Linker and Enhanced Conversions
How to configure Google Ads conversion tracking server-side — the Conversion Linker tag that manages the FPGCLAW cookie, Enhanced Conversions for hashed user data, and realistic uplift expectations.
Templating Language and Team Skills
How a team's existing skill mix — SQL practitioner, Python engineer, JavaScript developer — should shape the choice between Jinja and JavaScript templating in analytics engineering.
dbt Testing Taxonomy
A taxonomy of dbt test types — generic tests, singular tests, unit tests, contract tests, and data quality packages like dbt_expectations
AI Query Cost Control for BigQuery MCP
Managing the cost and safety risks of AI assistants running BigQuery queries through MCP — cost mitigation, write protection, and practical guardrails.
GA4 Parameter Extraction Macro
A reusable dbt macro for extracting GA4 event parameters without row multiplication, including the numeric variant for int/float/double fields.
OpenClaw Pipeline Monitoring
A reading path through the OpenClaw pipeline monitoring tutorial — cron scheduler mechanics, writing monitoring skills, tiered alerting delivery, BigQuery failure checks, and Snowflake cost monitoring.
Secured Table Materialization in dbt
A custom dbt materialization that automatically reapplies BigQuery row access policies, column descriptions, and data masking tags after every table rebuild.
dbt-expectations Hub
Hub note for dbt-expectations — setup, test reference, conditional filtering, severity tuning, BigQuery implementation patterns, and the unit test vs data test distinction.
on_schema_change in dbt Incremental Models
How dbt handles column additions and removals in incremental models, the four on_schema_change options, and why none of them backfill historical data.
Medallion Lakehouse on GCP
How the bronze-silver-gold medallion architecture maps to BigQuery table types, with BigLake Iceberg for flexibility and native tables for performance.
Markov Chain Attribution
How Markov chains model customer journeys as state transitions to calculate data-driven attribution through transition probabilities and the removal effect
MCP Client Primitives
The three capabilities MCP clients expose to servers — sampling (server-requested LLM completions), elicitation (server-requested user input), and roots (filesystem boundaries) — and when they matter for data engineering.
dbt as the Center of Gravity for BI
Why dbt has become the foundation layer that BI tools read from — not a parallel concern — and how the Fivetran merger accelerates this shift
Build vs. Buy Data Pipeline Economics
The three converging shifts that flipped the build-vs-buy calculation for data pipelines — pricing changes, AI-assisted development velocity, and open-source maturity
GA4 CROSS JOIN versus LEFT JOIN UNNEST
Why the comma syntax in FROM table, UNNEST(array) silently drops rows — and when to use LEFT JOIN UNNEST to preserve events without array data.
Documentation Quality Determines AI Usefulness
Why the quality of your dbt documentation directly determines how useful AI tools can be — the Roche chatbot failure, the docs-to-AI feedback loop, and case studies in enforcement
Attribution Touchpoint Table Design
How to design and build the touchpoint table that all attribution models consume -- field requirements, identity considerations, and the intermediate dbt model that maps raw events to attribution-ready rows
Incremental Strategy Decision Framework
A decision framework for choosing the right dbt incremental materialization strategy — merge, delete+insert, insert_overwrite, append, and microbatch
Building dlt Pipelines: From First Run to Incremental Loading
A reading path through the concepts in the hands-on dlt tutorial — environment setup, REST API Source config, dependent resources, and incremental loading.
dbt Slot Management on BigQuery
How dbt's execution model interacts with BigQuery slots -- why dbt is compute-heavy, the multi-project workaround, and best practices for sizing slots for dbt workflows.
Customer 360 Modeling
Hub note connecting the concepts involved in building a unified Customer 360 model from CRM and GA4 data — identity resolution, DAG architecture, conflict resolution, and privacy constraints.
FastMCP Server Skeleton
Minimal MCP server examples in Python (FastMCP) and TypeScript (McpServer) — the starting point for any custom server build.
Your First Hour with Claude Code (Analytics Engineer)
A sequenced reading path for getting started with Claude Code as an analytics engineer — from installation through your first useful output
BigQuery ML for Lead Scoring
Train a logistic regression or boosted tree model to predict lead conversion directly in BigQuery SQL — including the TRANSFORM clause, class imbalance, and how to evaluate model quality.
Google Ads BigQuery Data Transfer Service Setup
How the Google Ads BigQuery Data Transfer Service works — what it gives you, how the schema is organized, MCC vs per-account setup, and the defaults that will hurt you.
Custom MCP Server Decision Criteria
When to build a custom MCP server versus using an existing one — the build-vs-browse decision framework for data engineering teams.
Self-healing risk tiering
A framework for deciding which pipeline failures can self-heal automatically, which need human approval, and which should never be auto-remediated.
Debugging Custom dbt Materializations
Common errors in custom dbt materializations, what causes them, and how to test materializations systematically before deploying to production.
Elementary custom BI dashboards
How to build custom data quality dashboards in any BI tool by querying Elementary's warehouse tables directly, with example SQL for the most useful metrics.
dbt Testing Strategy
Hub note for building a complete dbt testing strategy — taxonomy, layer placement, unit test selection, alert routing, and package ecosystem.
AI Agent Regulatory Exposure for Data Teams
Why running AI agents against client data creates contractual and regulatory exposure for data teams — GDPR, data processing agreements, the open-source liability argument, and what the Dutch DPA warning actually means.
dbt Three-Layer Architecture
How the base, intermediate, and mart layers organize a dbt project, what belongs in each, and how data flows between them.
Schema Registry for Contract Enforcement
How schema registries enforce data contracts on event streams before data reaches the warehouse — compatibility modes, CEL validation rules, and production practices.
Cloud Storage Tiering for BigQuery
How to use Cloud Storage tiers and lifecycle policies alongside BigQuery for cost-effective data lake storage, including Autoclass and physical billing.
dbt Single Responsibility Macros
Why dbt macros should do one thing, how to recognize when they've outgrown their scope, and the composition pattern for building complex transformations from focused pieces.
dbt Unit Test File Organization
Where to put dbt unit test files, how to name tests consistently, and the co-location pattern with _unit_tests.yml.
GCP Application Default Credentials
The difference between gcloud auth login and Application Default Credentials — why they exist, how they work, and why ADC is what MCP servers and SDKs actually use.
dbt Schema Validation and Data Products Hub
Hub connecting notes on dbt's three validation mechanisms, source schema gaps, the Mesh governance triad, and contract-first development.
Data Contract Definition
What a data contract is, how it differs from schema tests and data quality checks, and why the 'non-consensual API' framing matters.
Cloud Run Jobs Deployment Script Pattern
An end-to-end deployment script for dbt on Cloud Run Jobs — service accounts, IAM bindings, Artifact Registry, job creation, and scheduling in a single reproducible script.
AI Personal CRM Pattern
Using an AI agent to auto-scan email and calendar for contact relationship tracking — how the pattern works, what SQLite with vector embeddings enables, and why this is the highest-risk integration to configure carefully.
BigQuery Fine-Grained Access Control
Column-level security with policy tags, row-level security with Row Access Policies, and dynamic data masking — the three layers of fine-grained access control in BigQuery beyond basic IAM roles.
GA4 session_start Event Unreliability
Why counting session_start events produces wrong session counts in GA4 BigQuery data, and the correct approach using distinct session IDs.
dbt Doc Block File Organization
How to organize doc block files in a dbt project — per-directory, per-model, centralized, and hybrid approaches with practical tradeoffs
BigQuery Storage Billing Strategies
Physical vs logical storage billing in BigQuery, long-term storage discounts, table expiration policies, and how to evaluate which billing mode saves money.
Ad Data Extraction Tools
Managed ELT, open-source, and native integration options for getting advertising data into your warehouse — Fivetran, Airbyte, dlt, Meltano, and BigQuery Data Transfer Service
dbt Constraint Enforcement Across Warehouses
How dbt constraint types behave across Postgres, Snowflake, BigQuery, Redshift, and Databricks — which constraints actually reject bad data and which are metadata only.
Dataform-to-dbt Migration
Migration paths between Dataform and dbt — tooling, realistic timelines by project size, and why macro conversion is where migrations get painful
Consent Mode Common Implementation Failures
The ten most frequent Consent Mode implementation mistakes, ordered by prevalence and damage — from missing defaults to untested consent states.
OpenClaw Security Risks — What's Documented
A factual catalogue of the specific, documented security incidents, CVEs, regulatory warnings, and threat patterns that analytics engineers need to know before running OpenClaw near client data.
Claude Code Strengths and Limitations for Data Work
Where Claude Code delivers real value in data engineering — boilerplate, multi-file changes, pattern replication — and where it struggles with novel logic, ambiguity, and over-engineering.
dbt Hub Publishing
How to publish a dbt package to the dbt Hub — requirements, the registration process, hubcap automation, and best practices for version management.
dbt-expectations Test Reference
A categorized reference of the highest-value dbt-expectations tests — table-level, pattern, range, multi-column, and completeness — with BigQuery-ready YAML examples.
Terminal Safety for Beginners
Which terminal commands are safe, which are dangerous, how to read error messages, and the keyboard shortcuts that save you when something goes wrong
generate_schema_name: Environment-Aware Schema Naming in dbt
How to override dbt's generate_schema_name macro so dev environments get prefixed schema names while prod uses clean custom schema names directly.
Organizing dbt Unit Tests at Scale
Tag strategies, CI pipeline tiers, and selection patterns for managing hundreds of dbt unit tests across a growing project.
dbt Unit Test Patterns
Hub note connecting all unit test patterns for dbt — incremental models, snapshots, window functions, business logic, marketing analytics, and edge cases.
GTM Server-Side Hosting on Azure
How to host the GTM Server-Side tagging container on Azure using App Service or Container Apps, with pricing tiers and SSL configuration notes.
Unit Testing GA4 Sessionization
How to unit test GA4 sessionization logic in dbt — session boundary detection, cross-midnight sessions, microsecond timestamps, and single-event sessions.
Microbatch Automatic Upstream Filtering
How dbt's microbatch strategy automatically filters upstream models by event_time, reducing full table scans — and when to opt out with .render().
HubSpot Lifecycle Stages in the Warehouse
How HubSpot's lifecycle stage model maps to warehouse columns, why forward-only transitions make funnel analysis straightforward, and how to handle merged contact artifacts.
dbt Incremental Strategy Configuration Patterns
Complete, runnable dbt config blocks for each incremental strategy — merge with predicates, delete+insert on Snowflake, insert_overwrite with static partitions, and replace_where on Databricks.
Try-Heal-Retry pattern
How to add AI-powered remediation to data pipelines using structured LLM output, Pydantic schemas, and circuit breakers, with production examples using Claude.
dbt Incremental Strategy Warehouse Behaviors
How dbt incremental strategies behave differently on BigQuery, Snowflake, and Databricks — the platform-specific quirks, gotchas, and limitations that the documentation doesn't emphasize enough.
OpenClaw for Freelance Consultants
A reading path through the OpenClaw admin automation use cases for solo consultants — morning briefings, expense capture, personal CRM, and meeting prep.
dbt Microbatch Strategy Tradeoffs
The practical limitations and design tradeoffs of dbt's microbatch incremental strategy — UTC assumptions, no sub-hourly batches, and sequential execution.
dbt Package Installation Types
The three ways to install dbt packages — Hub, Git, and local — and how to choose between them. Includes version conflict patterns and best practices for your root packages.yml.
Elementary materialization override for dbt 1.8+
Why Elementary requires a materialization override macro in dbt 1.8+ projects, what happens without it, and how to write it correctly for BigQuery and Snowflake.
Idempotent Incremental Models in dbt
How to build dbt incremental models that produce identical results regardless of how many times they run, using pre-deduplication and proper unique_key design.
Fivetran MAR Pricing Shift
How Fivetran's March 2025 shift to per-connector MAR pricing broke the economics of managed ELT — bulk discount elimination, 4-8x cost increases, and the marketing data problem
Dagster vs dbt Cloud Orchestration
When Dagster's dagster-dbt integration is worth the setup cost over dbt Cloud's built-in scheduler -- cost comparison, capability gaps, and the vendor independence argument after the Fivetran merger.
dbt persist_docs for Warehouse Comments
How persist_docs pushes dbt descriptions directly to your data warehouse as table and column comments, making documentation available where analysts already work
GA4 Event Data Structure
How GA4 structures event data in BigQuery — the event model, nested parameters, and the patterns you need to query it effectively.
Stale documentation is worse than missing documentation
Why outdated documentation that looks complete causes more damage than obvious gaps — the false confidence problem in data teams
BigQuery Reservation Hierarchy
The three layers of BigQuery's capacity model -- commitments, reservations, and assignments -- and how they work together to manage slot allocation.
BigQuery On-Demand Billing Mechanics
How BigQuery on-demand pricing actually charges you — columnar billing, the LIMIT clause trap, 10 MB minimums, caching, the free tier, and cross-cloud pricing.
Google Sheets as Analytics Data Source
How Google Sheets functions as a shadow data source in GCP analytics stacks — the integration patterns, the automation gap gws fills, and the convergence of data and productivity tooling.
Data Contract Adoption Friction
Reducing the friction that kills data contract adoption: SDK-based onboarding, audience-specific messaging, post-mortem data as leverage, and the Data Product Manager role.
GTM Server-Side: Ten Implementation Failures and How to Avoid Them
The ten most common GTM Server-Side implementation mistakes — from missing custom domains and silent trigger failures to Cloud Logging cost surprises and Safari IP mismatch — with diagnostic guidance for each.
MetricFlow semantic model components
The three building blocks of a MetricFlow semantic model: entities (join keys), dimensions (group-by columns), and measures (numeric aggregations that feed metrics).
MCP Pipeline Monitoring Server Pattern
A practical MCP server pattern for pipeline monitoring — checking job status, listing failures, and triggering reruns across orchestrators like Airflow and Dagster.
Essential Terminal Commands
The core terminal commands for navigation, file operations, viewing content, and finding things — the foundation of terminal literacy
Data Observability Build vs. Buy
A reading path through the data observability decision — from the tool landscape through scaling thresholds, ML vs statistical detection, TCO, and the minimum viable stack.
GA4 BigQuery Export Table Types
The four table types in a GA4 BigQuery export dataset — daily, intraday, and the two user tables — their timing, limitations, costs, and when to use each.
CRM Data Architecture Hub
Hub note connecting all garden notes on modeling Salesforce and HubSpot data in a modern warehouse with dbt and BigQuery.
dbt Migration Validation Patterns
How to validate a dbt migration — parallel execution, comparison queries, ML regression testing, and the practical approach to proving equivalence.
dbt Service Account Setup for Multi-Project GCP Architectures
How to create and configure a dbt service account when your source data, transformation output, and compute infrastructure live in separate GCP projects.
Attribution Lookback Windows
How to set attribution lookback windows by industry and purchase cycle -- benchmarks, consequences of wrong windows, and implementation in SQL
GA4 Schema Evolution Monitoring
GA4's BigQuery schema changes without announcement, and new fields are never retroactive. How to detect additions before they break production queries.
GTM Server-Side Hosting Costs: Self-Hosted vs Managed
The real cost of running GTM Server-Side — Cloud Run pricing by traffic tier, the Cloud Logging cost trap, and a comparison of managed alternatives (Stape, Addingwell, Cloudflare Zaraz).
HubSpot Associations as Bridge Tables
HubSpot's many-to-many association model requires bridge tables at every layer. How to model them correctly, handle fan-out, and resolve the primary company problem.
Google OAuth CLI Setup Gotchas
The specific mistakes that cause OAuth setup to fail silently for Google Workspace CLI tools — wrong application type, missing test users, and the scope limit trap.
Customer 360 dbt DAG Architecture
How to structure a dbt project for Customer 360 models — the identity resolution layer between base and mart, the wide customer table, and materialization choices.
Building MCP Apps Visualization Server
How to build a custom MCP Apps visualization server in TypeScript — registering app tools with UI metadata, serving HTML resources, and implementing the client SDK for bidirectional communication.
OpenClaw for dbt Monitoring
Using OpenClaw as an always-on monitoring layer for dbt projects — cron-based testing, Slack alerting, mobile access, and practical use cases for solo consultants
GTM Server-Side: Map of Content
Index of garden notes on GTM Server-Side — architecture, Cloud Run deployment, GA4 configuration, Meta CAPI, Google Ads, hosting costs, and common failures.
BigQuery Partitioning vs Clustering Decision Framework
A practical decision framework for choosing between BigQuery partitioning, clustering, or both based on table size, query patterns, and operational needs.
Proactive vs. Reactive AI Agents
The distinction between AI tools that respond to prompts and AI agents that act on schedules — why this shift matters for automation use cases, and where each model fits.
MCP Data Catalog Server Pattern
A practical MCP server pattern for exposing internal data catalogs — table search, metadata retrieval, and lineage tracing as AI-accessible tools.
Removal Effect in Attribution
The removal effect measures how much conversion probability drops when a channel is removed -- the mathematical foundation of both Markov chain and Shapley value attribution
dbt BigQuery Configuration
How to configure dbt for BigQuery — profiles.yml setup, authentication methods, generate_schema_name, job labels for cost attribution, and cost control settings.
Organizing Lightdash Metrics at Scale
How to keep a large Lightdash implementation navigable — groups, group_details, the Metrics Catalog with Spotlight categories, and reusable parameters for values that change across deployments.
Cloud Functions as a dbt Execution Environment
When and why to use Google Cloud Functions to run dbt Core — how it compares to Cloud Run Jobs, what it's good at, and where it falls short.
Agent Dashboard Scraping: The Fragility Problem
How browser automation works for dashboards without APIs, the five-step scraping loop, session management patterns, and why silent failure is the central limitation that makes this a fallback of last resort.
Unit Testing Incremental Models in dbt
The dual-mode testing pattern for incremental models — overriding is_incremental, mocking 'this', and understanding that expect blocks show inserts, not final state.
GA4 User-Provided Data BigQuery Trap
Enabling User-provided data in GA4 admin permanently disables user_id export to BigQuery with no reversal option — what this means and how to protect your pipelines.
MetricFlow installation and setup
Installing MetricFlow for dbt Core with adapter-specific packages, the dbt Cloud alternative, and the initial project configuration steps needed before defining semantic models.
AI SQL Review Tools
A reference of tools that apply AI to SQL and dbt code review — Altimate AI, Greptile, CodeRabbit, and MotherDuck FixIt — with benchmarks and differentiators
dbt-utils SQL Generators
Reference for dbt-utils SQL generation macros: date_spine, deduplicate, star, union_relations, pivot, unpivot, and the smaller helpers. What each does, how to call it, and the gotchas.
Fivetran dbt Packages Architecture
How Fivetran structures its 60+ dbt packages — the unified source-plus-transform model, cross-platform reporting bundles, and the installation pattern that avoids version conflicts.
Ad Pipeline Engineering Challenges
The operational challenges of maintaining advertising data pipelines — API rate limits, schema changes, attribution window normalization, currency handling, and privacy compliance
Terminal Cross-Platform Setup
How to set up and use the terminal on macOS, Linux, and Windows — including WSL, Git Bash, and PowerShell options with a command equivalence table
Floating-Point Precision in Data Comparison
Why exact equality fails for floating-point values in data comparison, and practical strategies for handling precision mismatches.
BigQuery Cost Optimization
A structured guide to BigQuery cost optimization covering the cost model, query patterns, dbt configurations, pricing models, storage billing, and governance.
GA4 dbt Unnesting Layer Architecture
How to structure a dbt project for GA4 unnesting — base layer for parameter extraction, intermediate for event-specific models, mart for analytics-ready aggregations.
Looker Studio Limits and Upgrade Path
The hard technical limits of Looker Studio that optimization can't fix, what Looker Studio Pro actually adds, and when to evaluate enterprise Looker or alternative BI tools.
MCP Server Project Setup
Step-by-step project initialization for a custom MCP server — directory structure, dependencies, client installation, and the typical project layout.
dlt: Python-Native Data Loading
A reading path through dlt's core mechanics — from building blocks through BigQuery-specific loading to incremental state tracking.
dbt Package Ecosystem Hub
Navigation hub for the dbt package ecosystem — how installation works, what's available, version compatibility, and how to evaluate packages for production use.
BigQuery Regional Architecture
How BigQuery's region model works — multi-region vs. single region, the cross-region join constraint, and how to choose a region you'll live with permanently.
Unified Ad Model Downstream Patterns
What becomes practical once you have a unified cross-platform ad model — blended ROAS, budget pacing, and Marketing Mix Modeling data preparation.
When to Write Custom dbt Materializations
Decision framework for when custom dbt materializations are worth the maintenance burden versus post-hooks, macros, or built-in incremental strategies.
dlt Pipeline Testing
Testing dlt pipelines locally with DuckDB before hitting production — unit tests with resource limits, integration tests for schema validation, and common debugging patterns.
Data Contract Tooling Ecosystem
The landscape of data contract tools in 2026 — dedicated contract tools, quality frameworks with contract support, and governance platforms.
dbt Unit Test Mocking Dependencies
How to mock refs, sources, macros, variables, and the 'this' keyword in dbt unit tests — with patterns for multi-join models and incremental overrides.
MCP Apps vs Traditional BI
When to use MCP Apps for data visualization versus a dedicated BI tool — the honest comparison, what each does well, and the hybrid architecture that makes sense for most teams.
dbt Macro Testing Patterns
Two approaches to testing dbt macros — integration test models and dbt 1.8 unit tests — plus the compile-and-inspect workflow for debugging.
BigQuery Editions
The three BigQuery Editions tiers -- Standard, Enterprise, and Enterprise Plus -- what each offers, their limits, and how they compare to on-demand pricing.
dbt Ad Reporting Patterns
How to model advertising data in dbt — the dbt_ad_reporting package, cross-platform UNION patterns, platform-specific normalization, and reconciliation testing
Claude Code Status Line Configuration
How to set up Claude Code's status line to display git branch, active model, and context usage — practical setup for analytics engineers
Orchestrator Pricing for dbt Teams
Managed orchestration costs compared — Dagster+, Prefect Cloud, Astronomer, Cloud Composer, and dbt Cloud — with entry-tier pricing, scaling models, and the hidden costs that shift the math.
RAG for dbt Documentation
How retrieval-augmented generation bridges the business context gap in AI-generated dbt documentation — from full RAG pipelines to the simpler CLAUDE.md workaround
Open Data Contract Standard
ODCS v3.1.0 under the Linux Foundation's Bitol project — what it covers, how it compares to the Data Contract Specification, and where harmonization stands.
dbt Validation Mechanisms Compared
How dbt contracts, data tests, and dbt-expectations differ in when they fire, what they cover, and what they cost — and why you need all three.
BigQuery Editions Migration Anti-Patterns
Five mistakes teams make when migrating from BigQuery on-demand to Editions — and how to avoid them.
MCP Context Window Overhead
The concrete token cost of MCP tool definitions in an LLM's context window — measurements from Anthropic and practitioners, and why it matters for long sessions.
When to Write dbt Unit Tests
Specific decision criteria for where native dbt unit tests pay off — complex logic scenarios, the incremental model override pattern, and what to skip.
Markdown-to-Notion Blocks Parser
How to convert markdown to Notion's block API format in JavaScript, including handling rich_text objects, the 2000-character limit, and the 100-block request cap.
BI Tool Self-Hosting and Licensing
How MIT, AGPL, and proprietary licensing affect what you can do with self-hosted BI tools — feature gates, copyleft obligations, and what 'free' actually means for Lightdash, Metabase, and Looker.
dlt RESTClient Mechanics
How dlt's RESTClient works — instantiation, the paginate() method, key parameters, and built-in error handling with retry and backoff.
dbt Contract Rollout Strategy
How to adopt dbt model contracts in an existing project — identifying candidates, scaffolding YAML, phased enablement, and CI/CD integration for governance-only checks.
Orchestrator Architectural Philosophies
The three competing mental models in data orchestration — process-oriented (Airflow), data-oriented (Dagster), and function-oriented (Prefect) — and why the abstraction matters more than the feature list.
Dataform for BigQuery
A structured guide to evaluating Dataform as a BigQuery transformation tool — what it is, how it compares to dbt, and when it makes sense
Cross-Platform Ad Metric Comparability
Why only five metrics can be meaningfully compared across ad platforms, how to handle platform-specific metrics, and conversion configuration details that determine what your 'conversions' column actually means.
dlt Core Concepts
The four building blocks of dlt pipelines — sources, resources, pipelines, and schemas — and the three write dispositions that control how data lands.
ELT Connector Quality and Coverage Comparison
How Fivetran, Airbyte, and dlt differ in connector count, quality tiers, and their approaches to handling sources that don't have pre-built connectors.
Jinja Templating for SQL Practitioners
Why Jinja feels natural to SQL-first analytics engineers — the double-brace model, macros as SQL helpers, and the separation of concerns that keeps transformation files focused.
CRM Data Extraction Challenges
Why CRM data is harder to warehouse than most sources — mutability, API-based extraction, soft deletes, formula field blind spots, and rate limits.
Building Custom API Pipelines with dlt
A map of the concepts and patterns involved in building production API pipelines with dlt — from choosing an approach through deployment.
Identity Resolution for Customer 360
How to link CRM contact records to GA4 cookie identifiers in BigQuery — the three join key strategies, deterministic vs probabilistic matching, and open-source tooling.
dbt Testing Anti-Patterns
Four common testing mistakes in dbt projects -- over-testing, happy-path-only coverage, drifting thresholds, and testing warehouse functions -- and what to do instead.
BigQuery Idle Slot Sharing
How idle slot sharing works in BigQuery Enterprise editions -- requirements, configuration, preemption behavior, and when to disable it.
dbt-audit-helper CI/CD Integration
How to integrate dbt-audit-helper into CI/CD pipelines — dbt Cloud PR jobs, GitHub Actions with --defer, and automated regression detection.
Privacy Constraints for Linked Analytics Data
GDPR and CNIL implications when linking GA4 cookie identifiers to CRM contact records — consent exemption loss, right to deletion cascades, and the architectural requirements for compliant Customer 360 models.
LLM as Content Cleaner
Using a cheap LLM like GPT-4o-mini to strip navigation, CTAs, and HTML noise from scraped markdown — a reliable pattern for web content pipelines.
Lead Scoring Signal Dimensions
The four categories of signals that drive lead scoring — demographic fit, firmographic fit, behavioral engagement, and recency — and why the warehouse sees all of them when the CRM can't.
Eventarc Event-Driven dbt Triggers
Using Eventarc to trigger dbt runs when upstream data arrives — Cloud Storage object creation, BigQuery audit log events, and combining event-driven with scheduled execution.
Meta Ads to BigQuery Pipeline — Hub
Map of content for building and maintaining a Meta Ads to BigQuery pipeline — API structure, actions array flattening, attribution windows, iOS signal loss, and operational maintenance.
Data Contract Rollout Change Management
The organizational change management strategy for data contracts: start with two datasets, create urgency through visible cost, and measure conversations rather than coverage.
Salesforce vs HubSpot Data Models
How Salesforce and HubSpot structure CRM data differently — metadata-driven relational models vs many-to-many associations — and what that means for warehouse modeling.
Measuring Data Latency Before Choosing an Incremental Strategy
How to profile the gap between event time and load time in your source tables, and use that distribution to size lookback windows and choose the right incremental strategy.
Preparing for the dbt Analytics Engineering Certification
What the dbt Analytics Engineering certification actually tests, where people get tripped up, and why hands-on project experience matters more than studying.
Dagster Freshness Policies and Scheduling
How Dagster tracks asset freshness rather than just execution timestamps, and how to schedule dbt runs using cron schedules, sensors, and automation conditions.
Fivetran dbt Packages for CRM
What dbt_salesforce and dbt_hubspot provide out of the box — model coverage, configuration, pass-through columns, history mode support, and naming convention tradeoffs.
Kestra Declarative Orchestration
Kestra's YAML-first orchestration model — how it differs from Python-decorator tools, its rapid growth, enterprise adoption, and why production evidence at small-to-mid scale is still thin.
dbt Doc Block Jinja Limitations
What you cannot do inside dbt doc blocks — restricted Jinja context, the README parsing gotcha, and the missing column description inheritance feature
LinkedIn Ads dbt Modeling
How to model LinkedIn Ads data in dbt — the campaign hierarchy rename, metric normalization, cross-platform integration via dbt_ad_reporting, and the incremental strategy for 90-day attribution windows.
Fivetran-dbt Merger and Orchestration Independence
Why the October 2025 Fivetran-dbt merger makes external orchestration more strategically important — vendor optionality, platform lock-in risk, and the case for controlling your orchestration layer.
TDD with Claude Code for dbt
How test-driven development works with Claude Code for dbt models — write tests first, let the agent iterate to pass them, then refactor with confidence
BI Tool Selection Framework
A decision framework for choosing a BI tool in 2026 — four key questions, a comparison of Lightdash vs Looker vs Metabase, and the market landscape from dbt-native to enterprise tools
MCP Setup Troubleshooting
Common failure modes when setting up MCP servers — macOS PATH problems, silent JSON config failures, tool count limits, and where to find debug logs.
Managed ELT Tool Architectures: Fivetran, Airbyte, and dlt
How the three dominant data ingestion tools approach the same problem differently — fully managed connectors, self-hosted open source, and Python-native libraries.
Shapley Value Attribution
How cooperative game theory's Shapley values produce provably fair attribution by calculating each channel's average marginal contribution across all possible channel coalitions
Dataform-to-dbt Migration Hub
Hub note connecting all garden notes related to migrating from Dataform to dbt — decision criteria, concept mapping, templating differences, and validation.
Per-Workload Service Account Naming Conventions
One service account per workload with a compute-platform prefix — so logs, cost attribution, and incident response all point to the right place immediately.
MetricFlow CLI querying
How to query MetricFlow metrics from the CLI in dbt Core (mf) and dbt Cloud (dbt sl): group-by, filters with Jinja dimension syntax, multi-metric queries, and the semantic manifest.
dbt Testing Strategy by Layer
What to test at each layer of the dbt DAG — sources, base, intermediate, and mart — and why testing intensity should increase toward the edges.
Dagster + dbt Integration Hub
Hub note for the dagster-dbt integration — how the mapping works, quality checks, freshness monitoring, CI/CD workflows, and the case for choosing Dagster over dbt Cloud.
Salesforce Ingestion Tool Selection
Choosing between Fivetran, Airbyte, dlt, Hevo, and custom Python for Salesforce extraction — connector mechanics, cost realities, and the AppExchange dispute.
Baseline vs. Autoscaling Slots in BigQuery
How baseline and autoscaling slots work in BigQuery Editions -- guaranteed capacity vs. elastic scaling, the 60-second autoscale window, and slot usage priority.
Data Contract Anti-Patterns
Where data contract initiatives go wrong: misplaced enforcement, paper-only contracts, one-size-fits-all implementations, and unfunded ownership.
Attribution Dashboard Design
How to design attribution dashboards for multiple audiences — essential metrics, audience-tiered hierarchy, Looker Studio implementation patterns, and working around BI tool limitations
Position-Based Attribution Models
U-shaped and W-shaped attribution models that weight credit by journey position — formulas, edge cases, industry weight variations, and BigQuery SQL implementation
dlt Dependent Resources
How dlt lets one resource use another's output to configure its endpoint — the path template syntax for multi-step API traversal.
Elementary edr monitor alerting
How edr monitor works, how it differs from edr report, and how to configure alert metadata in model YAML to control who gets notified and when.
dbt Materialization Cost Impact on BigQuery
How dbt materialization choices affect BigQuery costs -- table vs view vs ephemeral trade-offs, the view chain anti-pattern, and why defaulting to tables usually wins.
Salesforce Account Hierarchy with Recursive CTEs
How to resolve Salesforce's self-referential ParentAccountId into a flattened hierarchy using recursive CTEs in BigQuery — the SQL pattern, ultimate parent resolution, and revenue rollup.
Terminal Fundamentals
What the terminal actually is, how it differs from a shell, and the working directory mental model that makes navigation intuitive
The AI Production Gap in Data Engineering
Why AI gets you to 80% fast but the remaining 20% — security, compliance, temporal consistency, governance — is where most of the real work lives.
BigQuery Cost Governance Guardrails
Query-level limits, project-level quotas, authorized views, and access patterns that prevent expensive BigQuery mistakes before they happen.
Modern BI Landscape
Hub note for understanding BI in 2026 — the semantic layer, metrics-as-code, headless BI, dbt centrality, and how to choose a tool
dbt-utils Introspective Macros
How dbt-utils compile-time introspection macros work — get_column_values, get_relations_by_pattern, get_query_results_as_dict, and get_single_value — and when they cause problems.
dbt Docs Customization and Deployment
A reading path through customizing and deploying dbt docs beyond localhost — from understanding the build artifacts to choosing a hosting platform, automating deployment, and knowing when to replace the default frontend
Dataform Decision Framework
When Dataform is the right choice and when dbt wins — a decision framework based on platform commitment, budget, team preferences, and use case complexity
GA4-Specific dbt Testing Patterns
Data quality tests for GA4 dbt projects that catch tracking failures standard schema tests miss — missing session_start events, orphaned transactions, suspicious session metrics.
OpenClaw Persistent Memory Model
How OpenClaw's Markdown-based persistent memory differs from session-based tools, what it enables for long-running data monitoring, and how memory files work in practice.
Attribution Model Disagreement as Signal
Why running multiple attribution models in parallel reveals more than any single model, and how to use the disagreement between them to communicate uncertainty and drive better decisions
dbt Documentation People Actually Read
A reading path through writing dbt documentation that gets used — from diagnosing why docs go unread to writing patterns, delivery mechanisms, and the AI quality feedback loop
Lightdash Metric Types and Definition Syntax
The three categories of Lightdash metrics — aggregate, non-aggregate, and post-calculation — plus column-level vs model-level placement, filters, and display configuration.
Meta Ads Attribution Windows
How Meta's attribution windows work, the June 2025 on-Meta/off-Meta split, which windows survived the January 2026 deprecation, and what this means for warehouse data.
MCP Transport Configuration
Practical configuration for MCP's two transport modes — stdio for local development and streamable HTTP for production deployment.
Debugging dbt with Claude Code
How to use Claude Code for dbt debugging — letting the agent face errors directly, tracing data issues through upstream models, and using subagents for complex investigations
Pipeline Alerting Delivery Patterns
How to structure pipeline monitoring alerts — tiered severity routing, Slack vs. Telegram tradeoffs, delivery modes (channel, DM, webhook, silent), and designing alert systems that don't become noise.
Elementary report sections
What each section of the Elementary HTML report shows and when to use each one during a data quality review.
Self-Hosting Lightdash with Docker Compose
How to run Lightdash with Docker Compose — required services, environment variables, known gotchas, and what to expect in small-team production deployments.
Self-healing pipeline maturity spectrum
Five levels of self-healing capability in data pipelines, from basic retries to fully agentic systems, and where production value actually concentrates.
Markov Attribution SQL Implementation
SQL patterns for extracting journey paths and calculating transition probabilities in BigQuery, the data preparation layer for Markov chain attribution
Dagster Full-Stack Pipeline Architecture
How Dagster unifies ingestion, transformation, Python processing, and downstream triggers in a single asset graph — the pattern that justifies Dagster over simpler orchestration approaches.
dbt Unit Test CLI Commands
How to run, filter, debug, and exclude dbt unit tests from the command line — including output interpretation and production exclusion patterns.
Claude Code for dbt Development
A reading path through the core workflows for using Claude Code in a dbt project — base models, tests, documentation, debugging, refactoring, and prompting.
dbt Source Schema Validation
How to validate source schemas in dbt where contracts can't reach — using dbt-expectations on sources to catch column drift before transformation runs.
dbt-expectations row_condition Pattern
How the row_condition parameter in dbt-expectations enables conditional test filtering — applying tests to specific segments without custom SQL.
Microbatch Backfill and Full Refresh Protection
How to use dbt's built-in microbatch backfill commands, retry failed batches, and protect large incremental tables from accidental full refreshes.
dbt Documentation with Claude Code
A systematic approach to dbt documentation using Claude Code — the codegen-plus-AI pattern, docs blocks for consistency, lineage diagrams, and slash commands for automation
dbt Identity Resolution Pipeline
Production dbt DAG structure for GA4 identity resolution — the incremental identity mapping model, stitched events model, schema tests, and the 3-day lookback window for late-arriving data.
BigQuery Slots
What BigQuery slots are, how queries use them, what happens during slot contention, and the two ways to get slots.
Looker Studio + BigQuery Performance — Hub
Map of garden notes on optimizing Looker Studio dashboards backed by BigQuery: BI Engine, extract mode, blending pitfalls, caching, credentials, and upgrade decisions.
Choosing Between BigQuery MCP Options
Decision framework for BigQuery MCP access — Remote Server vs Toolbox vs bq CLI, matched to your client, team setup, and use case.
Elementary alerting hub
A reading path through Elementary's alerting system -- from the edr monitor command through Slack/Teams setup, filter-based routing, alert fatigue reduction, and on-call strategy.
HubSpot Property History Mechanics
How HubSpot's property history tables work, their retention limits, why CALCULATED properties inflate sync costs, and how to model history data without surprises.
Server-Side Cookies and Safari ITP Bypass
How setting cookies via HTTP Set-Cookie header from a same-domain server bypasses Safari's 7-day JavaScript cookie cap — the FPID mechanism, the IP mismatch problem, and the three approaches that solve it.
dbt-utils generate_surrogate_key
How generate_surrogate_key works, why null handling matters, and why migrating from the old surrogate_key() macro can silently break incremental models and snapshots.
Dataform Ecosystem and Tooling Gaps
Where Dataform falls short beyond testing — CI/CD automation, IDE tooling, package ecosystem, and platform lock-in compared to dbt
Agentic Workflow Shift in Data Engineering
How agentic AI tools change the data engineering workflow from manual template adaptation to describe-and-review — and why the real shift is from syntax to modeling decisions.
Google Ads Performance Max Data Gaps
Why Performance Max campaign data is incomplete in BigQuery DTS, what's actually missing, and how to get the data you need.
Claude Code Behind the Scenes
What commands Claude Code actually runs when it explores code, searches for patterns, edits files, and manages git — understanding the mechanics builds confidence and helps you learn
Late-Arriving Data and the Lookback Window Pattern
How to handle late-arriving data in dbt incremental models using lookback windows, including window sizing trade-offs and the limits of any lookback approach.
Consent Mode Server-Side GTM Propagation
How consent signals travel from the web container to server-side GTM via gcs and gcd parameters, and why non-Google vendor tags require manual consent enforcement.
GA4 Session Key Construction
Why ga_session_id alone fails as a session identifier, how to build the correct composite key, and the edge cases that produce null sessions.
Code Generation over Tool Calling Pattern
The emerging pattern of having LLMs write code against APIs rather than generate tool calls — Cloudflare's Code Mode, Anthropic's code execution, and what it means for MCP's future.
BigQuery IAM Patterns
Least-privilege IAM for BigQuery — predefined roles, the data vs. compute permission split, service account strategy, and common anti-patterns.
Google Ads ClickType Impression Trap
Why Google Ads DTS stats tables silently inflate impression counts 3-6x, and the exact SQL filter that fixes it without breaking click counts.
MCP Client Landscape
The major MCP clients — desktop apps, code editors, and CLI tools — and how to choose between them based on your workflow.
CLAUDE.md as Project Memory
How CLAUDE.md gives Claude Code persistent project context — what to include, what to leave out, and why reactive additions beat proactive documentation
dlt Secrets Management
How dlt's configuration hierarchy keeps credentials out of code — the priority order, secrets.toml for local development, environment variables for CI/CD, and vault integrations.
GA4 User Backstitching
How to retroactively apply GA4 user_id to anonymous sessions in the warehouse — the SQL pattern, shared device handling, and when backstitching is worth the complexity.
Codebase Refactoring with Claude Code
How Claude Code enables project-wide dbt refactoring — column renames, naming convention migrations, and ref() updates across dozens of files without the manual search-and-miss problem.
Google Ads Developer Token
What the Google Ads developer token is, how access levels work, why approval takes months, and which loading tools require one.
dbt Repository Structure for Cloud Function Deployment
How to restructure a dbt project repository for Cloud Function deployment — the subdirectory pattern, main.py, requirements.txt, and profiles.yml with oauth.
Incremental Predicates for dbt Merge
How incremental_predicates limit destination table scans during dbt merge operations, turning full table scans into partition-pruned reads.
Identity Resolution for Ad Measurement
How Enhanced Conversions, Unified ID 2.0, and data clean rooms recover attribution signal after cookies fail — what each approach does, what it requires, and realistic uplift estimates.
Elementary alert routing with filters
How to run multiple edr monitor commands with different filters to route alerts by tag, owner, status, or resource type to different channels and incident management tools.
Elementary for dbt
How Elementary extends dbt with data observability — anomaly detection, automated freshness monitoring, test result history, and Slack alerting
BigQuery BI Engine
How BigQuery BI Engine provides in-memory acceleration for dashboard queries, what it supports, what it silently skips, and how to verify it's actually working.
Layered AI Stack for Analytics Engineering
The mental model of thinking about AI tools in layers — IDE, coding agent, orchestration, review — rather than choosing a single tool for everything
JavaScript vs Jinja in Analytics Engineering
The philosophical and practical differences between Dataform's JavaScript templating and dbt's Jinja2 — where they diverge, what each excels at, and how to convert between them.
dbt Unit Test YAML Syntax
Complete reference for dbt unit test YAML structure — required elements, input formats (dict, csv, sql), optional configuration, and version-specific features.
Agent Skill Supply Chain Attacks
How malicious skills in agent ecosystems like ClawHub bypass traditional antivirus detection, why natural-language malware is a fundamentally different threat class, and how to evaluate skills before installing them.
Salesforce Opportunity Stage Duration Analysis
How to calculate time spent in each pipeline stage using OpportunityFieldHistory and LEAD window functions — the SQL pattern, downstream analysis, and win rate metrics.
GA4 BigQuery Number Discrepancies
Why your BigQuery session and user counts won't match the GA4 interface, and the practical approach to handling the 1-5% variance.
dlt Google Ads Pipeline
Building a Google Ads to BigQuery pipeline with dlt — the verified source, GAQL query patterns, incremental loading, and deployment options.
ML Anomaly Detection vs Statistical Methods
When ML-powered anomaly detection earns its cost over simpler Z-score approaches — and why the answer depends on data complexity, not marketing materials.
Dagster GCP Deployment
How to deploy Dagster on GCP — Serverless vs Hybrid modes, GKE with Helm, Workload Identity authentication, Cloud SQL for storage, and the community Cloud Run option.
GA4 User Mart Pattern
Building a user-grain mart from GA4 session data — first/last touch attribution, lifetime value aggregation, and identity stitching with user_pseudo_id and user_id.
GA4 dbt Project Template
Hub connecting all concepts in building a production-ready dbt project for GA4 BigQuery exports — from base model to marts, with testing and documentation.
Analytics Engineer Skills in the Agent Era
Seven skills worth investing in now that agents handle execution — AI orchestration, specification engineering, critical code review, domain expertise, governance, systems thinking, and tool fluency.
Ad Platform Metric Divergence
Why impressions, clicks, and conversions mean different things on Google, Meta, and LinkedIn — and why pretending they're equivalent produces misleading cross-platform reports.
BigQuery Cost Attribution with INFORMATION_SCHEMA
Using INFORMATION_SCHEMA queries to find expensive queries, attribute costs by user and dataset, identify unoptimized tables, and build a weekly cost review practice.
Consent Mode Basic vs Advanced
How Basic and Advanced Consent Mode differ in tag behavior, cookieless pings, and conversion modeling — and the traffic thresholds that determine whether Advanced mode actually helps.
dbt Cloud Managed Platform
What dbt Cloud provides beyond Core -- web IDE, job scheduling, collaboration tools, managed infrastructure, and the pricing model that shapes adoption decisions.
Salesforce Unified Activity Timeline
Combining Salesforce Tasks and Events into a single activity timeline with consistent column naming and polymorphic entity resolution.
dbt Weighted Attribution Models
Implementing position-based and time-decay attribution in dbt with configurable weights via dbt variables — model SQL, project configuration, and revenue integrity testing
Airbyte Pricing and Self-Hosting Costs
Airbyte's February 2025 capacity-based pricing model and the hidden infrastructure costs of self-hosting — NAT Gateway, Kubernetes overhead, and what 'free' actually costs.
dlt RESTClient vs REST API Source
The two approaches dlt offers for building custom API pipelines — imperative RESTClient and declarative REST API Source — and how to choose between them.
Looker Studio Data Blending Pitfalls
Why Looker Studio data blending silently creates cartesian products, how to identify it, and why pre-joining in BigQuery is almost always the right fix.
Base Model Generation with Claude Code
How to use Claude Code to generate dbt base models — the pattern-replication workflow, prompting constraints, and CLAUDE.md defaults that eliminate inconsistency.
GA4 Acquisition Performance Mart
A daily x source/medium grain mart for GA4 acquisition reporting — aggregating sessionized events into dashboard-ready metrics with conversion rates and revenue.
dbt-Fivetran Merger and the 2026 Transformation Landscape
How the October 2025 dbt-Fivetran merger reshaped the analytics engineering landscape — unified platform strategy, Core/Cloud divergence, and what it means for tool choice.
Cloud Scheduler OIDC Authentication for HTTP Triggers
How Cloud Scheduler authenticates to secure HTTP endpoints using OIDC tokens — the service account requirements, the gcloud setup, and the pattern for Cloud Functions and Cloud Run.
GA4 Ecommerce Checkout Funnel Pattern
Session-based checkout funnel analysis from GA4 BigQuery data — counting distinct sessions at each funnel stage from view_item through purchase.
dbt Docker Containerization
Patterns for containerizing dbt Core for production — multi-stage Dockerfiles, version pinning, Artifact Registry, and the two-repository strategy that separates transformation logic from infrastructure.
direnv for Multi-Client GCP Credential Management
Automate per-project GCP credential loading with direnv — .envrc configuration, the four-variable pattern, and a five-minute setup for each new client.
Ad Platform Attribution Bias
Why every ad platform overcounts conversions, how walled-garden incentives create measurement gaps, and what only becomes visible when ad data lives in the warehouse
Dagster UI for Analytics Engineers
A walkthrough of Dagster's web UI — the Asset Catalog, Global Asset Lineage, Run Details, health indicators, and the Dagster+ Pro features that matter most for analytics engineers on dbt + BigQuery.
Data Observability Minimum Viable Stack
The four non-negotiable observability capabilities every data team needs regardless of tooling — primary key tests, freshness monitoring, volume anomaly detection, and actionable alerting.
dbt Materialization Default: Tables Everywhere
Why materializing every dbt model as a table by default — not views, not ephemeral — produces more debuggable, stable, and maintainable projects.
Probabilistic Matching Limitations in GA4
Why probabilistic identity matching fails with GA4's BigQuery export — the signals GA4 intentionally excludes, what coarse data remains, and the compounding cost of false positives.
Analytics Engineer as Director of AI
The role identity shift as agents take over execution — from producing analytical work to directing it. What stays human, what moves to agents, and how to think about your own value in the transition.
Privacy Sandbox Collapse
How Google's Privacy Sandbox went from the industry's best hope for a cookie replacement to a quiet retirement — the timeline, what survived, and why it sealed the case for server-side infrastructure.
BigQuery Multi-Environment Patterns
Three patterns for separating dev, staging, and production in BigQuery — separate projects, dataset prefixes, and central data lake with department marts.
BigQuery Dynamic Data Masking
Show sensitive column structure without exposing values — SHA256 hashing, nullification, and default masking for analysts who need to write queries but not read PII.
dbt Packageable Model Patterns
Three patterns that make dbt models installable by anyone — configurable sources with var(), enable/disable flags, and namespaced model names.
Orchestrator Learning Curves
An honest assessment of ramp-up time and friction points for Dagster, Airflow, and Prefect — what trips up analytics engineers and what helps.
Dataform vs dbt Cost Comparison
The real cost equation between Dataform and dbt — licensing savings vs ecosystem gaps, migration costs, and hidden engineering overhead
MCP JSON-RPC Wire Format
The actual message format MCP uses under the hood — initialization handshake, capability negotiation, tool discovery, and tool invocation — with examples for debugging.
BigQuery Autoscaling Cost Overhead
Why theoretical slot-hour costs rarely match your actual BigQuery bill — the 1.5x autoscaling multiplier, 60-second billing window, and how workload shape changes everything.
Elementary HTML report generation
How the edr report command works, which flags matter in practice, and patterns for generating targeted reports for different audiences.
GA4 BigQuery Schema Hub
Hub note connecting all concepts needed to understand and query the GA4 BigQuery export schema — table types, nested structures, gotchas, and query patterns.
CLAUDE.md BigQuery Specifics
What to put in CLAUDE.md when your dbt project runs on BigQuery — GoogleSQL dialect enforcement, partition filter requirements, and incremental model config templates.
GTM Server-Side Managed Hosting Providers
Comparison of Stape, Addingwell, TAGGRS, and Cloudflare Zaraz as managed alternatives to self-hosting GTM Server-Side containers on cloud infrastructure.
dbt Integration Depth Across Orchestrators
How dagster-dbt, astronomer-cosmos, and prefect-dbt differ in integration depth — from first-class asset mapping to operational wrappers — and what that means when something breaks.
Orchestrator Comparison for dbt Teams Hub
Hub note for the Dagster vs Airflow vs Prefect comparison — architectural philosophies, dbt integration depth, developer experience, pricing, learning curves, and the decision framework.
Agentic AI Fit for Data Work
Why data engineering is structurally well-suited for agentic AI tools — repetitive patterns, multi-language context-switching, and cross-layer debugging make the case.
SQL Dialect Divergences Across Warehouses
Where SQL syntax breaks across BigQuery, Snowflake, and Databricks — date functions, type casting, and argument ordering differences that matter for portable dbt code.
Multi-Client Agent Reporting Architecture
How to structure per-client isolation for OpenClaw reporting workflows — separate cron jobs, credential management at scale, failure containment, and the security tradeoffs of running multiple clients on a single machine.
Campaign Naming and UTM Standardization
How to standardize campaign names across ad platforms using naming conventions, regex parsing, and seed overrides — plus UTM hygiene rules that make cross-platform attribution possible.
Prompt Injection and the Lethal Trifecta
Simon Willison's lethal trifecta — why combining private data access, untrusted content exposure, and external communication ability creates a uniquely dangerous attack surface for AI agents handling data work.
Google Workspace CLI for AI Agents (Hub)
Hub note for the gws CLI ecosystem — the tool itself, agent-first design principles, OAuth setup, CLI vs MCP tradeoffs, and Sheets as a data source.
Consent Mode v2 Parameter Architecture
The four Consent Mode v2 parameters, how upstream browser controls differ from downstream server instructions, and the legal mandate that forced the change.
Dagster+ Pricing and Credit Model
How Dagster+ pricing works — the credit model (1 credit = 1 asset materialization), plan tiers, overage costs, and how it compares to dbt Cloud and Cloud Composer for analytics engineering teams.
Claude Code Hooks
How hooks give Claude Code deterministic guardrails — shell commands that execute at specific lifecycle points to enforce rules, auto-format code, and block dangerous operations
Unit Testing Conversion Funnels in dbt
How to unit test funnel analysis models in dbt — step-over-step conversion rates, user drop-off tracking, and the step-skipping edge case.
dbt Macros
How dbt macros work — Jinja fundamentals, writing custom macros, using dbt_utils, dispatch patterns, and when macros help vs hurt
BigQuery Editions and Slot-Based Pricing
When to switch from on-demand to slot-based pricing, how autoscaling works, committed use discounts, and a feature comparison across BigQuery editions.
GCP Processing Engine Selection: Dataflow, Dataproc, and BigQuery
When to use Dataflow, Dataproc, Dataproc Serverless, and BigQuery SQL for data transformation on GCP — matched to team expertise and workload type, not arbitrary scale thresholds.
dbt Package Anatomy
What makes a dbt package different from a regular project — the three design principles, standard directory structure, and dbt_project.yml configuration for reusable packages.
Claude Code Slash Commands for dbt
How to create custom slash commands in Claude Code that automate repeatable dbt workflows — test generation, model documentation, and prompt validation
MCP Protocol Fundamentals
Reading map for the foundational MCP concepts — how the protocol works, what messages look like, what primitives exist, and how they fit together for data engineering.
Dagster Software-Defined Assets
The core building block of Dagster — how @dg.asset works, automatic dependency inference, the Definitions object, and how SDAs differ from traditional orchestrator primitives.
Elementary dashboard organization
How to organize Elementary dashboards and reports by domain, criticality, and refresh cadence so they stay useful as your project grows.
Attribution Analysis
A structured guide to marketing attribution — from SQL implementation patterns through multi-model comparison, dashboard design, and incrementality testing
Consent Mode Debugging Network Parameters
How to decode the gcs and gcd parameters in Google Analytics network requests to verify Consent Mode implementation without relying on CMP interfaces.
Data Quality Validation Layers
The three-layer model for data quality — proactive contracts, reactive schema tests, and anomaly detection — and why you need all three.
dbt Cross-Database Array Operations
How array syntax diverges across BigQuery, Snowflake, and Databricks — UNNEST vs LATERAL FLATTEN vs EXPLODE — and dispatch macros to handle it.
dbt Core Open-Source Fundamentals
What dbt Core is, how its CLI-driven workflow operates, the open-source ecosystem that powers it, and the technical profile of teams that choose it.
dbt Docs Site Customization Options
What you can customize in the default dbt docs site — the overview page, DAG node colors, hiding models — and where the customization options end
Consent Mode US Privacy Requirements
Why US-only sites increasingly need Consent Mode — Enhanced Conversions requirements, expanding state privacy laws, and the recommended region-specific configuration.
dbt as AI Knowledge Base
How a well-structured dbt project functions as a shared context layer that improves every AI tool in your stack — models, tests, documentation, and semantic definitions as machine-readable knowledge.
BigQuery Remote MCP Server Setup
Google's managed BigQuery MCP endpoint — enabling the service, configuring Claude Desktop and Claude Code, and why token expiration limits its usefulness.
BigQuery Row Access Policies
Dynamic row-level filtering using CREATE ROW ACCESS POLICY — replace per-segment views with policies that apply automatically based on querying user identity.
MCP Official Reference Servers
The servers maintained by the MCP Steering Group — which are actively developed, which have been handed to vendors, and why the distinction matters.
dbt Agent Skills
dbt Labs' official Markdown skill files that teach AI coding agents how to follow dbt best practices — what they cover, how they work, and what the benchmarks actually show.
Data Contract Adoption Challenges
Why data contract initiatives fail — the execution gap between contract-as-documentation and contract-as-enforcement, and the cultural change that matters more than the YAML.
Incrementality Testing for Attribution
How to validate attribution models with causal experiments — holdout tests, geo tests, and platform lift studies that measure whether a channel actually drives conversions
OpenClaw dbt Data Quality Assistant
A reading path through the building blocks of a 24/7 automated dbt data quality assistant — test execution and parsing, severity assessment, documentation cross-referencing, morning summaries, and an honest maturity assessment.
dbt Docs Performance at Scale
Why the default dbt docs site becomes unusable for large projects — the AngularJS frontend, client-side JSON parsing, and the performance ceiling that drives teams to alternatives
Slack KPI Summary Format for Agent-Delivered Reports
A practical template for agent-generated Slack KPI summaries — directional arrows, week-over-week structure, percentage points vs. percentages, and how to handle the LLM math reliability problem in the output layer.
GA4 First dbt Models Tutorial
Hub note for building your first GA4 dbt models — from understanding the raw event schema through base, intermediate, and mart layers.
GCP Auth Constraints for AI Coding Agents
How Claude Code, Codex, and Cursor each handle GCP authentication — and where each one breaks when tokens expire, contexts conflict, or interactive flows are required.
dbt Package Ecosystem Governance
Who maintains the dbt package ecosystem — dbt Labs, Fivetran, and community contributors — and how to evaluate a package's reliability before committing to it in production.
dlt and BigQuery Integration
How dlt loads data into BigQuery — the two loading strategies (streaming vs. GCS staging), the bigquery_adapter for partitioning and clustering, nested JSON normalization, and the metadata tables dlt creates.
SCD Type 2 with dbt Snapshots
How dbt snapshots implement slowly changing dimension Type 2 — tracking every version of a record over time with timestamp and check strategies, plus Fivetran History Mode as an alternative.
GA4 Reporting Identity Modes
How GA4's three reporting identity modes (Blended, Observed, Device-based) apply user resolution in the interface — and why none of that logic reaches BigQuery.
Multi-Source Conflict Resolution
Three patterns for resolving conflicting data when merging records from multiple source systems — priority-based, recency-based, and source-specific fields.
Elementary alert fatigue reduction
How to configure suppression intervals, alert grouping, and sampling controls in Elementary to keep signal-to-noise ratio high as test suites grow.
Data Comparison Tool Landscape
When to use dbt-audit-helper, Elementary, dbt-expectations, Datafold, or Soda for data comparison and validation.
OpenClaw Cron Scheduler Mechanics
How OpenClaw's built-in cron scheduler works — session modes, job persistence, exponential backoff, and the configuration patterns that make scheduled monitoring reliable.
dbt vs Dataform Templating Hub
Navigation hub for notes comparing Jinja (dbt) and JavaScript (Dataform) templating in analytics engineering — syntax, philosophy, strengths, and team fit.
Dagster Branch Deployments for dbt
How Dagster+ branch deployments create ephemeral preview environments for dbt changes on PR, with state-based selection and partitioned execution for CI/CD workflows.
GA4 Event Ordering with Batch Fields
How to use batch_event_index, batch_ordering_id, and batch_page_id for deterministic event sequencing in GA4 BigQuery exports.
Dataform Dynamic Model Generation
How Dataform's JavaScript enables programmatic DAG construction — generating dozens of models from a single loop — and what dbt teams do instead.
LLM Training Data Asymmetry for Tool Use
Why LLMs write better shell commands than MCP tool calls — the training data distribution that makes CLI fluency outperform structured tool-calling for well-established tools.
dbt-expectations Setup and Configuration
How to install and configure dbt-expectations — packages.yml, timezone variable, platform compatibility, and dependency management.
BigQuery Partitioning Mechanics
How BigQuery partitioning physically divides tables, the three partitioning types, key constraints, and when partition pruning does and doesn't work.
Choosing Between Fivetran, Airbyte, and dlt
A decision framework for picking the right ELT tool based on team skills, budget, connector needs, and tolerance for operational burden — with practitioner sentiment from the field.
BigQuery Partition Pruning Patterns
How to combine partitioning and clustering in BigQuery for maximum scan reduction, including anti-patterns that silently defeat pruning.
dbt Macro Documentation in YAML
Why _macros.yml beats inline SQL comments for documenting dbt macros, and how to write entries that developers actually use.
Cross-Platform Ad Testing Patterns
How to test unified ad reporting models in dbt — source freshness, spend reconciliation, grain testing, and the manual checks that automated tests can't replace.
dbt Package Anti-Patterns
Common mistakes in dbt packages — hardcoded schemas, missing dispatch, tight version constraints, generic model names, table defaults, and missing version bounds.
dbt Packages vs Mesh
When to use dbt packages (code sharing) vs dbt Mesh (data product sharing) — the conceptual distinction, practical differences, and how to choose.
Elementary CLI profile configuration
How to configure the Elementary CLI (edr) profile for BigQuery, Snowflake, and Databricks, including the gotchas that differ from your dbt profile.
MetricFlow time spine
The MetricFlow time spine is a continuous date table used for cumulative metrics and time series gap filling. How to create it, configure it, and understand when it's required.
Metric Anti-Patterns in dbt
Common mistakes when defining MetricFlow metrics — one-off models for metrics, sum-of-ratios errors, hardcoded measure filters, and missing descriptions
Looker Studio Caching Mechanics
How Looker Studio's per-chart cache works, why date range selection affects cache hit rates, the difference between owner and viewer credential caches, and how to pre-warm dashboards.
Dataform Testing Limitations
Dataform's built-in assertions cover three scenarios — uniqueness, null checks, and row conditions. Everything else requires custom implementation.
Data Contracts Hub
Hub note connecting garden notes on data contracts — definitions, specifications, ownership, tooling, validation layers, and adoption challenges.
dbt Private Packages via Git
How to distribute internal dbt packages as Git dependencies — version pinning, authentication options, and trade-offs compared to Hub packages.
Google Ads DTS dbt Integration
How to model Google Ads BigQuery DTS tables in dbt — source configuration, incremental strategy for partition replacement, and conversion lookback windows.
AI Tools for dbt Documentation
A comparison of dbt Copilot, Claude Code with MCP, and Altimate AI for generating dbt model and column documentation — capabilities, limitations, and selection guidance
Salesforce Record Type Partitioning in dbt
How to handle Salesforce RecordTypeId in the warehouse — filtering by record type in base models, splitting objects into separate models, and storing IDs in dbt vars.
dbt Test Severity and Performance Tuning
How to configure dbt test severity levels, optimize expensive tests on BigQuery, and structure test execution for cost-effective data quality.
OpenClaw Reporting Assistant
A reading map for the OpenClaw client KPI reporting guide — GA4 skill integration, dashboard scraping tradeoffs, direct warehouse queries, multi-client architecture, and Slack summary formatting.
GA4 Identity Graph in BigQuery
How to build a production identity graph from GA4 BigQuery data — mapping user_id to all associated devices, detecting shared devices and anomalies, and structuring forward and reverse lookups.
Unit Testing String Extraction in dbt
How to unit test regex and string manipulation logic in dbt — edge case documentation, graceful failure handling, and regression protection for fragile parsing.
CLAUDE.md for Analytics Engineering — Hub
Hub note connecting all CLAUDE.md configuration concepts for dbt and BigQuery analytics engineering — project memory, dbt templates, BigQuery specifics, hooks, and slash commands.
BigQuery Cross-Organization Data Sharing
Patterns for sharing BigQuery data across organizations — agency/client models, Analytics Hub, authorized views, and row/column-level security.
dbt MCP Server Setup
A reading path through connecting dbt to AI assistants via MCP — choosing between local and remote modes, tool capabilities, configuration, and safety.
dbt Operational Slash Commands
Practical Claude Code slash commands for daily dbt operations — building models, generating base models, running modified code, auditing quality, and cleaning up artifacts
dbt Model Description Writing Patterns
Practical patterns for writing dbt model, column, and source descriptions that serve both business users and engineers — the three-question framework and when to use meta instead of description
CI/CD Data Quality Testing in dbt
How to integrate data quality testing into CI/CD pipelines — Slim CI with state:modified+, GitHub Actions workflows, and tools like Datafold and Recce for regression detection.
Incremental Models in dbt
How dbt incremental models work, when to use them, the available strategies, and the trade-offs you need to understand.
Expense Capture as a Habit Layer
Using natural language logging and receipt OCR to close the gap between 'I spent money' and 'that expense is recorded somewhere useful' — why capture is the real problem, not the accounting.
Claude Code Skill Description Engineering
How to write Claude Code skill descriptions that actually trigger activation — explicit keywords, negative boundaries, and the specificity principle
Dataform-to-dbt Migration Decision Criteria
When migrating from Dataform to dbt makes sense, when it doesn't, and the realistic cost-benefit calculation.
Elementary data quality dashboards
Hub for building data quality dashboards with Elementary: generating reports, hosting them for team access, building custom BI dashboards, and designing KPIs.
BigQuery Job Failure Monitoring with INFORMATION_SCHEMA
SQL patterns for monitoring BigQuery job failures and detecting cost anomalies using INFORMATION_SCHEMA.JOBS — with filtering strategies for multi-project setups.
MetricFlow Advanced Patterns
Complex metric patterns in MetricFlow — period-over-period comparisons with offset_window, filtered metrics with Jinja, and handling null gaps in time series
LinkedIn Ads Analytics Endpoint
The engineering quirks of LinkedIn's adAnalytics endpoint — no pagination, 15K element cap, 20-metric limit per request, query tunneling, cursor pagination migration, and monthly API versioning.
Unit Tests vs Data Tests in dbt
The two-checkpoint model for dbt testing — unit tests gate deployments by verifying transformation logic, data tests gate production by verifying data health.
dbt Attribution Packages Landscape
Open-source dbt packages and Python libraries for production-ready attribution models: Snowplow, Tasman, Rittman Analytics, ChannelAttribution, and when to build your own
Dataform as a GCP Service
What Dataform is in 2026 — a fully managed BigQuery transformation service with deep GCP integration, zero licensing cost, and SQLX/JavaScript templating
BigQuery Data Lake Patterns
A reading guide for understanding BigQuery data lake architecture: table types, the medallion lakehouse pattern, catalog strategy, performance, cost optimization, and common mistakes.
dbt Core vs Cloud Hub
Hub note connecting garden notes decomposed from the dbt Core vs dbt Cloud comparison article.
BigQuery MCP Toolbox Setup
Installing and configuring Google's open-source MCP Toolbox for Databases — the self-hosted option for connecting BigQuery to AI assistants with ADC authentication.
Meta Ads Pipeline Maintenance
Operational practices for keeping a Meta Ads pipeline running — token expiry monitoring, spend reconciliation, API version lifecycle management, and circuit breaker patterns.
Google Ads Scripts for BigQuery Export
Using Google Ads Scripts to export performance data directly to BigQuery — how the authentication model works, what the execution limits are, and when this approach beats the alternatives.
AI SQL Review Tradeoffs
The practical costs of AI SQL review — false positive rates, conflicting tool feedback, CI latency, annual spend, and the configuration investment that makes it worthwhile
Semantic Layer Architecture
How semantic layers work in the modern data stack — competing implementations (MetricFlow, Snowflake Semantic Views, Databricks Metric Views), the OSI initiative, and why the semantic layer determines AI accuracy
Metric Organization in dbt Projects
How to organize semantic models and metrics in dbt — co-located vs parallel subfolder structures, the one-primary-entity rule, and scaling patterns for large projects
Semantic Layer Adoption Readiness
When to invest in a semantic layer, what barriers you'll face, and how to start small — a practical readiness assessment based on team size, tooling maturity, and organizational commitment.
dbt Documentation Scaffolding Tools
How dbt-codegen and dbt-osmosis handle the mechanical parts of documentation — generating YAML skeletons and propagating descriptions through your DAG
GA4 Ecommerce Items UNNEST Pattern
How to handle GA4's nested items array in dbt — building a separate item-level grain model with intentional Cartesian UNNEST.
CRM Modeling Patterns in dbt
How to apply the three-layer dbt architecture to Salesforce and HubSpot data — base model conventions, intermediate enrichment, mart design, and incremental strategies.
BigQuery Resource Hierarchy
How BigQuery organizes resources from organization to table level — projects as billing boundaries, datasets as access control units, and naming conventions that scale.
GA4 Flattened Events Materialization
When and how to pre-unnest GA4 events into a flat table — the cost-performance tradeoff, the CREATE TABLE pattern, and why dbt models formalize this approach.
Meta CAPI Server-Side Setup: Deduplication and Event Match Quality
How to configure Meta Conversions API via server-side GTM — event deduplication with shared event_id, user data mapping for EMQ score, and forwarding the _fbp and _fbc cookies.
Soda Data Contract Verification
How Soda's contract engine validates schema, freshness, and quality rules against warehouse tables after loading but before transformation — filling the gap between EL and dbt.
dbt observe-fix remediation pattern
How to embed self-healing logic directly in the dbt DAG by detecting problems in base models and applying fixes in downstream layers.
BigQuery SQL Patterns for Analytics Engineers
A reading guide to essential BigQuery SQL patterns covering query optimization, nested data, window functions, dbt incrementals, and marketing analytics.
dbt Fusion Package Compatibility
How the dbt Fusion engine (v2.0) affects package compatibility — version bounds, manifest format changes, the Fusion badge, and how to prepare your project and packages for migration.
Data Observability Tool Landscape
A reference comparison of data observability tools in 2026 — Elementary, Monte Carlo, Soda, Bigeye, Datafold, and Atlan — covering capabilities, pricing, and positioning.
Looker Studio Credentials and Security
The security risks of owner's credentials in public Looker Studio reports, the LeakyLooker vulnerability, cost attribution, and using service accounts for production dashboards.
Server-Side Tracking Data Quality Evidence
The quantitative case for server-side tracking — the 41% average data quality improvement, case studies from Finobo, Forward Media, and seoplus+, ad platform Conversions API adoption, and the cost-benefit calculation that has flipped.
dbt Testing Pyramid
The layered testing pyramid for dbt projects: broad data test coverage at the base, targeted unit tests in the middle, anomaly detection and data diffs at the top.
The Chatbot → Copilot → Agent Paradigm Shift
How AI's relationship to the developer changed across three distinct eras (chatbot on demand, copilot alongside, agent autonomous) and why each phase is qualitatively different, not incrementally better.
Data Architecture as Human Judgment
Why data architecture — DAG design, ownership models, temporal logic, team boundaries — resists AI automation and remains a fundamentally human discipline.
IAM Debt Audit for GCP Data Platforms
Bash and SQL queries to surface Editor roles, service accounts with keys, and shared credentials — the starting point for any GCP IAM cleanup.
Lightdash's Semantic Layer vs MetricFlow
How Lightdash's native metric layer differs from MetricFlow — simpler syntax, tighter coupling, no cross-platform API — and when the tradeoffs favor each approach.
LinkedIn Ads B2B Data Value
What makes LinkedIn Ads data uniquely valuable for B2B analytics — professional demographic pivots, the negative CTR-to-pipeline correlation, company-level impression attribution, and what metrics actually matter.
Lightdash Open Source & Self-Hosting Hub
Hub note for Lightdash self-hosting — connecting to dbt, Docker Compose setup, Kubernetes deployment, and the open-source vs paid tier tradeoffs.
Agent-First CLI Design Principles
Seven principles for building CLIs that AI agents can consume reliably — from Justin Poehnelt's design of the Google Workspace CLI, with implications for any tool targeting agent consumers.
dbt Built-In Cross-Database Macros
Reference for dbt's built-in cross-database macros in the dbt namespace — dateadd, datediff, safe_cast, concat, type helpers, and the migration path from dbt_utils.
Google DDA Silent Fallback
GA4's Data-Driven Attribution silently falls back to last-click when data thresholds aren't met. How to detect it, and why warehouse-native attribution avoids this trap
dbt Quality Morning Summary Pattern
A two-cycle design for automated dbt quality reporting — daily morning summaries with Slack threading and follow-up capability, plus a weekly digest that surfaces patterns individual days miss.
dbt Features Without a Dataform Equivalent
The dbt capabilities that simply don't exist in Dataform — snapshots, the package ecosystem, microbatch incremental strategy, and Slim CI. These are the blockers that stall dbt-to-Dataform migrations.
Dagster Asset Checks from dbt Tests
How Dagster has automatically converted dbt tests into asset checks since version 1.7: severity mapping, health badges, and what this means for unified data quality monitoring.
Lightdash + dbt YAML: Metrics Reference Hub
Hub note for Lightdash metric configuration in dbt YAML — dimensions, metric types, joins, and scaling organization.
Pipeline retry and catch-up patterns
How to configure retries, exponential backoff, and catch-up mechanisms in data pipelines so that transient failures resolve themselves without human intervention.
Cloud Run Jobs for dbt
Why Cloud Run Jobs is the optimal dbt execution environment for most GCP teams — capabilities, container setup, authentication, monitoring, and cost profile.
dbt Groups and Access Modifiers
How dbt groups and access modifiers (private, protected, public) organize model ownership and enforce boundaries — and why they're worth using even in single projects.
dbt Orchestration Decision Framework for GCP
A decision framework for choosing between Cloud Run Jobs, Cloud Workflows, and Cloud Composer for dbt orchestration on GCP — based on actual requirements, not arbitrary complexity thresholds.
Automating dbt Docs Deployment
Patterns for keeping dbt docs automatically updated — CI/CD workflows, Astronomer Cosmos operators, and tools that push documentation to platforms like Notion
dbt Unit Testing Implementation
Hub note for implementing dbt unit tests — from YAML syntax and mocking patterns to BigQuery workarounds and CI/CD integration.
Claude Code CLI Basics
Installation, essential CLI flags, built-in slash commands, and how to read Claude Code's output — the practical starting point for new users
dbt Project Structure and Naming
How to organize a dbt project — folder structure, model naming conventions, layer responsibilities, and dbt_project.yml configuration patterns
Silent SQL Errors in AI-Generated Code
Why AI-generated SQL that compiles and runs is more dangerous than SQL that fails — the 3% warning rate, temporal filter inconsistencies, and the review practices that catch what linters miss
dbt Model Versioning
How dbt model versions work — breaking vs non-breaking changes, the state:modified selector, version integers, deprecation dates, and the friction points.
BigQuery Table Types
Native BigQuery tables, BigLake external tables, and BigLake Iceberg tables — what each optimizes for, when to use them, and a decision framework for choosing.
Lightdash Dimension Configuration in dbt YAML
How Lightdash turns dbt column definitions into dimensions — types, display properties, time intervals, and computed additional_dimensions.
LinkedIn Ads Pipeline — Hub
Salesforce Polymorphic Relationship Resolution
How to resolve Salesforce's WhoId and WhatId polymorphic foreign keys in the warehouse using ID prefix routing — the pattern, the SQL, and where it recurs.
iOS 14.5 Signal Loss and Meta Measurement
How Apple's App Tracking Transparency changed Meta ad measurement — IDFA collapse, default attribution window changes, Aggregated Event Measurement, and Conversions API as the response.
MetricFlow setup hub
Hub note connecting garden notes extracted from the MetricFlow getting started tutorial: installation, semantic model components, time spine, metric types, CLI querying, and organization.
dbt Documentation Rollout Strategy
A practical week-by-week approach to rolling out dbt documentation standards — starting with model descriptions, adding enforcement incrementally, and using AI tools to close coverage gaps
dbt documentation automation strategy
A graduated approach to automating dbt documentation freshness — from a single pre-commit hook to comprehensive drift detection, coverage tracking, and AI remediation
Context Engineering for Data Pipelines
How the value in data engineering is shifting from writing code to structuring context — the emerging discipline of context engineering, the ETL-to-ECL reframe, and the skills pipeline risk.
dbt-to-Dataform Migration Process
The step-by-step process for migrating a dbt project to Dataform — auditing what you have, running the automated tool, converting macros to JavaScript includes, recreating tests as assertions, and setting up orchestration.
CLI vs MCP for AI Agents
The practical tradeoffs between CLI commands and MCP tool calls for AI agent workflows — benchmark data, token efficiency, and when each approach wins.
dbt Attribution Comparison Pattern
How to structure a dbt project for multi-model attribution — running first-touch, last-touch, linear, position-based, and time-decay models in parallel with a union comparison layer
BigQuery Slots and Reservations
A reading guide to BigQuery's compute model: slots, reservations, editions, autoscaling, fair scheduling, and slot management for dbt workflows.
SQL Attribution Patterns
SQL implementation patterns for marketing attribution — first-touch, last-touch, linear, position-based, time-decay, and algorithmic models
The Context Gap in AI Data Engineering
Why business context — what 'Status' means, whether 'Amount' is net or gross, tacit SAP knowledge — is the core limitation of AI in data engineering.
OpenClaw GA4 Skill Integration
How to use community GA4 skills from ClawHub to pull analytics metrics into OpenClaw — the two main options, what each extracts, and how to feed the output into scheduled reporting.
Rule-Based Lead Scoring in dbt
How to build a configurable weighted lead scoring model in dbt using vars, seed files, and Jinja macros — so marketing can adjust weights without touching SQL.
dbt Unit Test Edge Case Patterns
Three essential edge case patterns for dbt unit tests — null handling, empty tables with format: sql, and date boundary testing.
dlt Pagination Patterns
The built-in paginators dlt provides for common API patterns, and how to extend BasePaginator for APIs that don't follow standard conventions.
n8n RSS-to-Notion Workflow
How to build an automated RSS reader that fetches, cleans, and stores articles in Notion using n8n, Jina AI, and ChatGPT.
dbt Cross-Database Macros
Hub for writing dbt macros that work across BigQuery, Snowflake, and Databricks — dialect differences, dispatch configuration, built-in macros, and array operations.
AI Judgment Failures in dbt Development
The category of mistakes AI makes in dbt projects that aren't syntax errors — wrong joins, rebuilt existing assets, wrong layer sourcing — and why they require business context that no prompt can fully provide.
BigQuery Cost Model
How BigQuery pricing works across on-demand and editions models — bytes billed, slot hours, storage costs, and optimization levers
AI-Powered dbt Documentation
A reading path through automating dbt documentation — from scaffolding tools to AI generation, business context enrichment, and CI enforcement
Snowflake Cost Monitoring with Warehouse History
SQL patterns for Snowflake cost monitoring using QUERY_HISTORY and WAREHOUSE_METERING_HISTORY — daily cost summaries, per-warehouse breakdowns, and translating credits into dollars for non-technical stakeholders.
dbt Documentation Freshness
A reading path through keeping dbt documentation accurate as your project evolves — from the case for automation to drift detection, coverage tracking, and a graduated rollout strategy
BigQuery Fair Scheduling
How BigQuery distributes slots among competing queries: the two-level fair scheduling algorithm, its project-level implications, and why project architecture matters for performance.
dlt for AI-Assisted Pipeline Development
Why dlt's Python-native, declarative design maps well to AI-assisted development — the REST API builder, BigQuery-specific features, LLM-friendly docs, and production results
BI Tool Migration and Portability
Switching costs between BI tools depend on where your metric definitions live. LookML is proprietary and expensive to migrate away from. dbt YAML and Metabase's per-question definitions are more portable.
2-Layer RBAC with Google Groups
Bind IAM roles to Google Groups representing job functions, not individual users — the pattern that makes onboarding, offboarding, and permission audits tractable.
MCP Server Testing and Debugging
Testing MCP servers with the Inspector, the stderr logging gotcha that bites everyone, and a practical three-stage testing workflow.
Advanced Claude Code Workflows for dbt
A reading path through Claude Code configuration, testing, documentation, and debugging workflows for analytics engineers working with dbt on BigQuery
Entity-Centric Naming for dbt Intermediate Models
Why intermediate models should be named for the entity they represent, not the transformation they perform — and the self-documenting join notation that makes it work.
dbt-audit-helper Macro Reference
Reference for every dbt-audit-helper macro — parameters, output format, platform support, and practical usage notes.
dbt Package Development Hub
A hub connecting all notes on building, testing, and publishing dbt packages — from project anatomy to CI/CD to Hub distribution.
GTM Server-Side: Architecture and Four Building Blocks
How GTM Server-Side works as an intermediary layer — the request/response data flow, and the four component types (Clients, Tags, Triggers, Variables/Transformations) that make it up.
Cursor for dbt Development
How Cursor works as the IDE layer for dbt projects: strengths with dbt Power User, limitations for multi-file work, and where it fits alongside Claude Code.
AI-Generated SQL Failure Modes
Why AI-generated SQL is dangerous — it runs without errors but returns wrong results. Research on temporal filter inconsistencies, join failures, and the confidence problem.
Elementary report hosting
How to host Elementary HTML reports on S3, GCS, or Azure Blob Storage so the whole team has access, and how to automate report generation in CI pipelines.
OpenClaw Persistent Memory for dbt Context
How to load dbt project documentation, schema descriptions, and failure history into OpenClaw's persistent memory so that monitoring reports include business context rather than just technical output.
Consent Mode Impact on Identity Resolution
How GA4 Consent Mode v2 changes what identity data reaches BigQuery: cookieless pings without identifiers, the same-page backstitch nuance, and filtering consented data for stitching pipelines.
Claude Code Stop and Session Hooks
How Stop and SessionStart hooks complement per-tool hooks: running quality gates after Claude finishes responding and loading project context at session start.
MCP Data Quality Server Pattern
A practical MCP server pattern for data quality — running validation checks, retrieving quality scores, and surfacing tables that need attention.
Advertising Data in the Warehouse
Hub note for the complete guide to centralizing advertising data: from the measurement problem through extraction, pipeline challenges, and dbt transformation patterns.
Orchestrator Developer Experience Comparison
Local development, testing patterns, and CI/CD workflows across Dagster, Airflow, and Prefect — where the day-to-day friction lives.
Visualization MCP Server Ecosystem
The available MCP servers for generating charts and interactive visualizations — AntV, Vega-Lite, DuckDB-Plotly, and how to pick between them.
dbt-utils v1.0 Migration: What Moved to dbt-core
The complete list of macros that moved from dbt-utils to the dbt namespace at v1.0, what was removed entirely, and how to migrate an existing project.
dbt Unit Test CI/CD Workflow
A production-ready GitHub Actions workflow for running dbt unit tests on BigQuery — unique CI datasets, the --empty flag, cost optimization, and production exclusion.
Dagster Learning Curve for Analytics Engineers
Where the friction shows up when analytics engineers adopt Dagster — Python proficiency, conceptual overhead, manifest management, pricing surprises, and the best onboarding path.
AI Limitations in Data Engineering
A reading path through the five core limitations of AI in data engineering — SQL failure modes, the context gap, architectural judgment, the production gap, and context engineering as the response.
Star Schema vs One Big Table
When to use entity-separated star schema vs wide denormalized tables in your data warehouse — BigQuery performance characteristics, OBT benchmarks, and the practical answer of building both.
dbt MCP Server Tool Reference
Complete reference for the 20+ tools exposed by the dbt MCP server — CLI commands, metadata discovery, Semantic Layer queries, and job management.
GA4 Channel Grouping Macro
A dbt macro that encapsulates Google's default channel grouping logic as reusable SQL, with the regex patterns and edge cases you need to know.
BigQuery Partitioning Configuration Patterns
Domain-specific partitioning and clustering configurations for BigQuery in dbt: event data, marketing, multi-tenant SaaS, and IoT patterns with rationale.
Context Window Compaction and Agent Safety
How LLM context window compaction causes AI agents to lose or deprioritize stop commands during long-running tasks — and why bulk data operations are the highest-risk scenario.
GA4 User Identity
Map of content for GA4 identity resolution in BigQuery — from understanding the two identifier types through stitching techniques, production pipelines, and ongoing monitoring.
dlt Deployment Options
Where and how to run dlt pipelines in production — GitHub Actions, Airflow, Modal serverless, and other platforms — with the dlt deploy command as the starting point.
Unit Testing Attribution Models in dbt
How to unit test first-touch, last-touch, and multi-touch attribution in dbt — multi-session journeys, single-touch conversions, and the no-conversion exclusion pattern.
AI Agent Data Quality: What Works Today vs. What's Aspirational
An honest assessment of which AI agent capabilities for dbt data quality are production-ready, which require significant work but are achievable, and which are still too unreliable to depend on.
Data Observability Scaling Thresholds
Team size and technical complexity thresholds that determine when to move from dbt tests to OSS observability to paid platforms.
Orchestration Market Landscape in 2026
Where each major data orchestrator stands in 2026 — Airflow's scale, Dagster's dbt dominance, Prefect's developer velocity, Kestra's rapid rise, and the tools in decline.
GA4 Window Function Pitfalls
Three window function traps specific to GA4 sessionization: the LAST_VALUE framing trap, IGNORE NULLS for sparse event data, and MAX for session-scoped boolean flags.
CLAUDE.md for dbt Projects
A concrete CLAUDE.md template for dbt projects — what to include, what to leave out, and why the file should be grown reactively from real mistakes rather than written upfront.
GA4 Unnesting Patterns Hub
Hub connecting all concepts for extracting data from GA4's nested BigQuery schema — UNNEST approaches, JOIN types, engagement recipes, e-commerce funnels, and dbt architecture.
Testing Late-Arriving Data Handling in dbt
How to write dbt unit tests that simulate late arrivals, and how to use audit_helper to detect drift between incremental and full-refresh results in production.
Consent Mode Implementation Mechanics
The technical implementation of Consent Mode v2: default state configuration, CMP integration, GTM trigger ordering, and the wait_for_update race condition.
First-Party Data and Compliance Hub
Hub connecting the browser restrictions, server-side infrastructure, EU/US legal frameworks, and identity resolution approaches that together determine how much advertising and analytics signal you can legally collect in 2026.
dbt Mart Layer Patterns
What belongs in dbt mart models — reporting aggregations, activation exports, ML feature tables — and the principle that every mart serves a specific consumer.
Service Account Key Files vs Impersonation Tokens
The practical tradeoff between GCP service account key files and short-lived impersonation tokens — when each is appropriate and what the honest security calculus looks like for consultants.
GTM Server-Side Hosting on AWS
How to host the GTM Server-Side tagging container on AWS using ECS Fargate, why App Runner costs more, and why Lambda is architecturally incompatible.
OpenClaw Morning Briefing Pattern
How to configure an OpenClaw cron job to deliver a daily personal briefing — covering calendar, email priority, pipeline status, and time tracking — to Telegram before your first coffee.
KPI Reporting via Direct Warehouse Queries
Why querying the warehouse directly beats dashboard scraping for scheduled KPI delivery — the BigQuery and Snowflake CLI patterns, how to structure pre-written SQL for agent-driven reporting, and the tradeoffs of the approach.