Building a Layered Testing Strategy for a Production ETL Pipeline
While working on a production Databricks ETL pipeline, we reached a point where structured testing became necessary.
The challenge wasn’t whether to add tests, but how to test a platform-heavy pipeline while keeping feedback fast and changes safe.
Starting with Nutter Tests
The first testing framework we introduced was Nutter.
Initially, Nutter was used almost like unit testing (see the sketch below):
- testing individual functions
- validating small pieces of ETL logic
- running directly on Databricks clusters
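A function-level check in this style looked roughly like the following sketch. It assumes a hypothetical helper called normalize_amounts is available in the notebook (for example via %run or a package import); the data and assertions are illustrative only.

```python
# Minimal Nutter fixture sketch: exercises one small piece of ETL logic
# on a Databricks cluster. `normalize_amounts` is a hypothetical helper.
from runtime.nutterfixture import NutterFixture


class NormalizeAmountsFixture(NutterFixture):
    def run_normalize_amounts(self):
        # Build a tiny input DataFrame and call the function under test
        input_df = spark.createDataFrame(
            [("a", "1,50"), ("b", "2,75")], ["id", "amount"]
        )
        self.result_df = normalize_amounts(input_df)

    def assertion_normalize_amounts(self):
        # Verify the transformed values
        values = sorted(row.amount for row in self.result_df.collect())
        assert values == [1.50, 2.75]


result = NormalizeAmountsFixture().execute_tests()
print(result.to_string())
```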
This worked well functionally, but there was a downside.
Because Nutter runs via notebook submit jobs on Databricks clusters, execution was significantly slower than traditional unit test frameworks. As the test suite grew, feedback loops became longer than we were comfortable with.
Introducing Unit Tests for Speed
To address this, we added traditional unit tests for pure Python logic.
- Much faster execution
- Easy to run locally and in CI
- Enabled use of coverage tooling
This became the default way to test (see the example below):
- transformation logic
- helper functions
- edge cases and regressions
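A typical test at this layer is plain pytest over pure Python logic. The module path and parse_amount helper below are hypothetical stand-ins for that kind of transformation code.

```python
# Plain pytest unit tests for pure Python transformation logic.
# `etl.transforms.parse_amount` is a hypothetical helper used for illustration.
import pytest

from etl.transforms import parse_amount


def test_parse_amount_handles_thousand_separators():
    assert parse_amount("1,234.50") == 1234.50


def test_parse_amount_rejects_empty_values():
    with pytest.raises(ValueError):
        parse_amount("")
```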
However, unit tests have limits in a Databricks environment.
Anything involving:
- catalog reads/writes
- Spark sessions
- dbutils
would require heavy mocking, reducing confidence.
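To make that trade-off concrete, here is a sketch of what the mocking looks like for a hypothetical load_customers(spark, table_name) that reads a catalog table and returns a row count. The test passes, but it never touches real Spark or catalog behavior.

```python
# Sketch of the mocking burden: the Spark session and its return values are
# faked, so the test exercises none of the real platform behavior.
from unittest.mock import MagicMock

from etl.load import load_customers  # hypothetical module and function


def test_load_customers_with_mocked_spark():
    fake_spark = MagicMock()
    fake_spark.read.table.return_value.count.return_value = 3

    assert load_customers(fake_spark, "main.sales.customers") == 3
    fake_spark.read.table.assert_called_once_with("main.sales.customers")
```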
Repositioning Nutter as Integration Testing
Instead of removing Nutter, we reframed its purpose.
Nutter was kept specifically for:
- functions that rely on Databricks-native behavior
- reading from or writing to catalogs
- interactions with Spark and dbutils
In this setup:
- unit tests handled speed and coverage
- Nutter acted as integration tests, validating real platform behavior
This separation made both test types more effective.
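An integration-style Nutter fixture in this setup writes to and reads from a real catalog table instead of mocking it. The schema and table names below are placeholders.

```python
# Nutter fixture used as an integration test: real catalog writes and reads,
# no mocking. `test_schema.customers_it` is a placeholder table name.
from runtime.nutterfixture import NutterFixture


class CustomerTableIntegrationFixture(NutterFixture):
    def run_customer_table_roundtrip(self):
        df = spark.createDataFrame([(1, "alice"), (2, "bob")], ["id", "name"])
        df.write.mode("overwrite").saveAsTable("test_schema.customers_it")

    def assertion_customer_table_roundtrip(self):
        # Read back through the catalog to validate real platform behavior
        assert spark.table("test_schema.customers_it").count() == 2

    def after_customer_table_roundtrip(self):
        # Clean up the test table
        spark.sql("DROP TABLE IF EXISTS test_schema.customers_it")


result = CustomerTableIntegrationFixture().execute_tests()
print(result.to_string())
```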
Testing ETL Steps Individually
As the pipeline evolved, we introduced another layer of Nutter tests focused on ETL steps, not functions.
- Each ETL step was tested independently
- Different use cases and edge conditions were covered
- Failures were easier to isolate
This acted like “unit testing” for the pipeline structure itself, without running the full flow.
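A step-level test can run just one step’s notebook against a small input and assert only on that step’s output. The notebook path, parameters, and expectation below are hypothetical.

```python
# Step-level Nutter test: run a single ETL step notebook in isolation and
# check its output table. Path, parameters, and table names are placeholders.
from runtime.nutterfixture import NutterFixture


class CleanseStepFixture(NutterFixture):
    def run_cleanse_step(self):
        # Execute only the "cleanse" step against a small test input table
        dbutils.notebook.run(
            "/pipeline/steps/cleanse",
            1200,
            {
                "input_table": "test_schema.raw_orders",
                "output_table": "test_schema.cleansed_orders",
            },
        )

    def assertion_cleanse_step(self):
        out = spark.table("test_schema.cleansed_orders")
        # Example expectation: the step drops rows with null order ids
        assert out.filter("order_id IS NULL").count() == 0


result = CleanseStepFixture().execute_tests()
print(result.to_string())
```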
End-to-End Tests and Value Validation
Finally, we added end-to-end (E2E) tests.
These tests:
- created a fresh ETL pipeline daily
- ran using a masked version of real client data
- executed the full flow from ingestion to final outputs
The outputs were compared against a baseline.
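The comparison itself can be as simple as a per-metric diff between the run’s output and a pinned baseline table. The table names and tolerance below are assumptions.

```python
# Sketch of a baseline comparison after an E2E run. Table names are
# placeholders; the tolerance is an assumed threshold for float metrics.
baseline = spark.table("e2e.baseline_metrics")     # pinned expected values
current = spark.table("e2e.latest_run_metrics")    # today's E2E output

drifted = (
    baseline.alias("b")
    .join(current.alias("c"), on="metric_name", how="outer")
    .selectExpr("metric_name", "b.value AS expected", "c.value AS actual")
    .filter("expected IS NULL OR actual IS NULL OR abs(expected - actual) > 1e-6")
    .collect()
)

assert not drifted, f"{len(drifted)} metrics drifted from the baseline: {drifted[:5]}"
```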
This served two purposes:
- sanity-checking pipeline correctness
- detecting unexpected changes in output values
Making Impact Visible
One unexpected benefit of the daily E2E runs was impact awareness.
Instead of just knowing that values changed, we could see:
- which output metrics were affected
- how broadly a change propagated
This made it easier to:
- communicate changes to downstream users
- set expectations around new features
- avoid surprises in reports already in use
In practice, this shifted testing from “did we break something?” to “who will feel this change?”
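One way to produce that view is to rank output metrics by how far they moved from the baseline. The sketch below reuses the placeholder tables from the E2E comparison above.

```python
# Impact summary sketch: rank output metrics by how far they moved from the
# baseline, then share the list with downstream users before release.
from pyspark.sql import functions as F

impact = (
    spark.table("e2e.latest_run_metrics").alias("c")
    .join(spark.table("e2e.baseline_metrics").alias("b"), "metric_name")
    .select(
        "metric_name",
        F.col("b.value").alias("expected"),
        F.col("c.value").alias("actual"),
        (F.col("c.value") - F.col("b.value")).alias("delta"),
    )
    .filter(F.col("delta") != 0)
    .orderBy(F.abs(F.col("delta")).desc())
)

impact.show(truncate=False)
```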
Takeaways
- Different tests exist for different reasons; no single framework fits all
- Speed matters as much as correctness during development
- Integration tests are essential in platform-heavy environments like Databricks
- End-to-end tests are most valuable when they highlight impact, not just failures
- Good testing improves trust, not just stability