Migrating Databricks Pipelines from Hive to Unity Catalog
This migration came up as part of a larger retail ETL platform that was already running in production.
At the time, most tables lived in the Hive metastore, while the actual data files were spread across ADLS paths. Functionally, things worked — but governance and consistency were becoming harder to manage.
Unity Catalog offered a cleaner way to bring tables, storage, and permissions under a single, governed model. The goal wasn’t to fix a broken pipeline, but to put the platform on a more solid footing long term.
Why a Full Migration
A phased approach wasn’t really an option.
UC-enabled clusters don’t work cleanly with the legacy Hive metastore, and Hive-based clusters can’t interact with UC catalogs. Mixing both models quickly becomes messy, especially for shared pipelines.
Because of that, we chose to migrate everything in one go and switch clusters based on configuration. The pipeline itself was already modular, which made this feasible.
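To make the switch configuration-driven, the main thing that differs between the two modes is the cluster definition itself. Below is a hedged sketch, assuming Databricks Jobs/Clusters API field names; `data_security_mode` (the access mode setting) is the key that governs UC access, and the exact values shown should be verified against your workspace and runtime:

```python
# Hypothetical job-cluster specs used to switch between the two modes.
# Field names follow the Databricks Clusters API; treat the exact
# data_security_mode values as an assumption to verify.

LEGACY_HIVE_CLUSTER = {
    "spark_version": "13.3.x-scala2.12",
    "node_type_id": "Standard_D8s_v3",
    "num_workers": 4,
    # No UC access mode: tables resolve through the legacy Hive metastore.
    "data_security_mode": "NONE",
}

UC_CLUSTER = {
    "spark_version": "13.3.x-scala2.12",
    "node_type_id": "Standard_D8s_v3",
    "num_workers": 4,
    # Shared access mode so the cluster can read from UC catalogs.
    "data_security_mode": "USER_ISOLATION",
}

def cluster_for(use_unity_catalog: bool) -> dict:
    """Pick the cluster spec based on pipeline configuration."""
    return UC_CLUSTER if use_unity_catalog else LEGACY_HIVE_CLUSTER
```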
What Actually Changed
Since the pipelines were already stable on Hive, this wasn’t about rewriting logic from scratch.
Most of the work fell into a few clear areas:
- Removing RDD usage wherever it still existed (see the sketch after this list)
- Updating table references to include catalog and schema names
- Routing all ADLS paths through shared constants
- Updating cluster settings to be UC-compatible
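On the RDD point: UC clusters in shared access mode don't expose the RDD API, so any code still dropping down to `df.rdd` had to move to DataFrame operations. A minimal sketch of the kind of change involved; the input DataFrame and the adjustment logic are purely illustrative:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()  # provided automatically in Databricks notebooks

# Illustrative input; in the real pipeline this came from an upstream table.
raw_df = spark.createDataFrame([("sku-1", "10.0"), ("sku-2", "25.5")], ["sku", "amount"])

# Before: an RDD round-trip, which UC shared-access clusters do not allow.
# adjusted = raw_df.rdd.map(lambda row: (row["sku"], float(row["amount"]) * 1.2)) \
#                      .toDF(["sku", "adjusted_amount"])

# After: the same transformation expressed purely with DataFrame functions.
adjusted = raw_df.select(
    F.col("sku"),
    (F.col("amount").cast("double") * 1.2).alias("adjusted_amount"),
)
```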
Because base paths and database names were already configurable, these changes were mostly mechanical. The UC-compatible code also continued to work on Hive, which helped with backward compatibility and client-specific requirements.
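As a rough sketch of what "mostly mechanical" looked like in practice (the config keys, helper names, and paths below are hypothetical, not our actual module), the shared constants resolve to a three-level catalog.schema.table name when a catalog is configured and fall back to the two-level Hive name when it isn't:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # ambient session on Databricks

# Hypothetical shared-constants config; keys and values are illustrative.
PIPELINE_CONFIG = {
    "catalog": "retail_prod",  # set to None when running against the Hive metastore
    "schema": "sales",
    "adls_base_path": "abfss://raw@examplestorage.dfs.core.windows.net/retail",
}

def table_name(table: str) -> str:
    """catalog.schema.table on UC; falls back to schema.table on Hive."""
    catalog = PIPELINE_CONFIG.get("catalog")
    schema = PIPELINE_CONFIG["schema"]
    return f"{catalog}.{schema}.{table}" if catalog else f"{schema}.{table}"

def landing_path(dataset: str) -> str:
    """Every ADLS path is derived from the shared base constant."""
    return f"{PIPELINE_CONFIG['adls_base_path']}/landing/{dataset}"

orders = spark.read.format("delta").load(landing_path("orders"))
orders.write.mode("overwrite").saveAsTable(table_name("orders_bronze"))
```

Keeping the fallback in a single helper is what let the UC-compatible code keep running unchanged on the remaining Hive-based environments.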
Where Things Got Interesting
The real surprises showed up during testing.
Some jobs that ran perfectly fine on Hive suddenly took hours on Unity Catalog — especially parts of the pipeline that involved loops or iterative processing.
Nothing obvious was “wrong” with the code. But execution patterns that were acceptable on Hive didn’t translate cleanly under UC.
This forced us to take a closer look at parts of the pipeline we had previously taken for granted and rethink how certain workloads were structured.
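One illustrative example of the kind of restructuring this led to: collapsing a write-per-iteration loop into a single batched write. The table, data, and column names below are made up, and the exact source of the per-iteration overhead under UC is a working theory rather than a confirmed diagnosis:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
daily_sales = spark.createDataFrame(
    [("north", 120.0), ("south", 80.0), ("east", 60.0)],
    ["region", "amount"],
)  # stand-in for the real input table
regions = ["north", "south", "east"]

# Before: one filtered write per loop iteration, so every pass launched
# its own Spark job and its own write to the target table.
# for region in regions:
#     daily_sales.filter(F.col("region") == region) \
#                .write.mode("append").saveAsTable("sales_by_region")

# After: build the result once and write it in a single operation.
combined = daily_sales.filter(F.col("region").isin(regions))
combined.write.mode("append").saveAsTable("sales_by_region")
```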
Takeaways
A few things stood out after the migration:
- Unity Catalog migrations aren't just metadata changes: performance characteristics can change
- Legacy patterns like RDD usage become much more visible under UC
- Config-driven pipelines make platform-level changes far easier
- Testing on UC needs to be taken seriously, even for unchanged logic
Overall, the migration reinforced a familiar lesson: platform upgrades are rarely just upgrades. Even when the code stays mostly the same, the system around it doesn’t.