Migrating Databricks Pipelines from Hive to Unity Catalog
This migration came up as part of a larger retail ETL platform that was already running in production.
At the time, most tables lived in the Hive metastore, while the actual data files were spread across ADLS paths. Functionally, things worked — but governance and consistency were becoming harder to manage.
Unity Catalog offered a cleaner way to bring tables, storage, and permissions under a single, governed model. The goal wasn’t to fix a broken pipeline, but to put the platform on a more solid footing long term.
Why a Full Migration
A phased approach wasn’t really an option.
UC-enabled clusters don’t work cleanly with the legacy Hive metastore, and Hive-based clusters can’t interact with UC catalogs. Mixing both models quickly becomes messy, especially for shared pipelines.
Because of that, we chose to migrate everything in one go and switch clusters based on configuration. The pipeline itself was already modular, which made this feasible.
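To make the switch configuration-driven, the main thing that differs between the two modes is the cluster definition itself. Below is a hedged sketch, assuming Databricks Jobs/Clusters API field names; `data_security_mode` (the access mode setting) is the key that governs UC access, and the exact values shown should be verified against your workspace and runtime:

```python
# Hypothetical job-cluster specs used to switch between the two modes.
# Field names follow the Databricks Clusters API; treat the exact
# data_security_mode values as an assumption to verify.

LEGACY_HIVE_CLUSTER = {
    "spark_version": "13.3.x-scala2.12",
    "node_type_id": "Standard_D8s_v3",
    "num_workers": 4,
    # No UC access mode: tables resolve through the legacy Hive metastore.
    "data_security_mode": "NONE",
}

UC_CLUSTER = {
    "spark_version": "13.3.x-scala2.12",
    "node_type_id": "Standard_D8s_v3",
    "num_workers": 4,
    # Shared access mode so the cluster can read from UC catalogs.
    "data_security_mode": "USER_ISOLATION",
}

def cluster_for(use_unity_catalog: bool) -> dict:
    """Pick the cluster spec based on pipeline configuration."""
    return UC_CLUSTER if use_unity_catalog else LEGACY_HIVE_CLUSTER
```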
What Actually Changed
Since the pipelines were already stable on Hive, this wasn’t about rewriting logic from scratch.
Most of the work fell into a few clear areas:
- Removing RDD usage wherever it still existed (see the sketch after this list)
- Updating table references to include catalog and schema names
- Routing all ADLS paths through shared constants
- Updating cluster settings to be UC-compatible
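On the RDD point: UC clusters in shared access mode don't expose the RDD API, so any code still dropping down to `df.rdd` had to move to DataFrame operations. A minimal sketch of the kind of change involved; the input DataFrame and the adjustment logic are purely illustrative:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()  # provided automatically in Databricks notebooks

# Illustrative input; in the real pipeline this came from an upstream table.
raw_df = spark.createDataFrame([("sku-1", "10.0"), ("sku-2", "25.5")], ["sku", "amount"])

# Before: an RDD round-trip, which UC shared-access clusters do not allow.
# adjusted = raw_df.rdd.map(lambda row: (row["sku"], float(row["amount"]) * 1.2)) \
#                      .toDF(["sku", "adjusted_amount"])

# After: the same transformation expressed purely with DataFrame functions.
adjusted = raw_df.select(
    F.col("sku"),
    (F.col("amount").cast("double") * 1.2).alias("adjusted_amount"),
)
```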
Because base paths and database names were already configurable, these changes were mostly mechanical. The UC-compatible code also continued to work on Hive, which helped with backward compatibility and client-specific requirements.
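As a rough sketch of what "mostly mechanical" looked like in practice (the config keys, helper names, and paths below are hypothetical, not our actual module), the shared constants resolve to a three-level catalog.schema.table name when a catalog is configured and fall back to the two-level Hive name when it isn't:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # ambient session on Databricks

# Hypothetical shared-constants config; keys and values are illustrative.
PIPELINE_CONFIG = {
    "catalog": "retail_prod",  # set to None when running against the Hive metastore
    "schema": "sales",
    "adls_base_path": "abfss://raw@examplestorage.dfs.core.windows.net/retail",
}

def table_name(table: str) -> str:
    """catalog.schema.table on UC; falls back to schema.table on Hive."""
    catalog = PIPELINE_CONFIG.get("catalog")
    schema = PIPELINE_CONFIG["schema"]
    return f"{catalog}.{schema}.{table}" if catalog else f"{schema}.{table}"

def landing_path(dataset: str) -> str:
    """Every ADLS path is derived from the shared base constant."""
    return f"{PIPELINE_CONFIG['adls_base_path']}/landing/{dataset}"

orders = spark.read.format("delta").load(landing_path("orders"))
orders.write.mode("overwrite").saveAsTable(table_name("orders_bronze"))
```

Keeping the fallback in a single helper is what let the UC-compatible code keep running unchanged on the remaining Hive-based environments.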
Where Things Got Interesting
The real surprises showed up during testing.
Some jobs that ran perfectly fine on Hive suddenly took hours on Unity Catalog — especially parts of the pipeline that involved loops or iterative processing.
Nothing obvious was “wrong” with the code. But execution patterns that were acceptable on Hive didn’t translate cleanly under UC.
This forced us to take a closer look at parts of the pipeline we had previously taken for granted and rethink how certain workloads were structured.
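One illustrative example of the kind of restructuring this led to: collapsing a write-per-iteration loop into a single batched write. The table, data, and column names below are made up, and the exact source of the per-iteration overhead under UC is a working theory rather than a confirmed diagnosis:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
daily_sales = spark.createDataFrame(
    [("north", 120.0), ("south", 80.0), ("east", 60.0)],
    ["region", "amount"],
)  # stand-in for the real input table
regions = ["north", "south", "east"]

# Before: one filtered write per loop iteration, so every pass launched
# its own Spark job and its own write to the target table.
# for region in regions:
#     daily_sales.filter(F.col("region") == region) \
#                .write.mode("append").saveAsTable("sales_by_region")

# After: build the result once and write it in a single operation.
combined = daily_sales.filter(F.col("region").isin(regions))
combined.write.mode("append").saveAsTable("sales_by_region")
```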
Takeaways
A few things stood out after the migration:
- Unity Catalog migrations aren't just metadata changes: performance characteristics can change
- Legacy patterns like RDD usage become much more visible under UC
- Config-driven pipelines make platform-level changes far easier
- Testing on UC needs to be taken seriously, even for unchanged logic
Overall, the migration reinforced a familiar lesson: platform upgrades are rarely just upgrades. Even when the code stays mostly the same, the system around it doesn’t.