Databricks Unity Catalog: Complete Guide to Data Governance
Unity Catalog is Databricks' unified governance solution for data and AI assets across the lakehouse. It centralizes access control, audit, lineage, and discovery for tables, views, files, ML models, and AI assets — across workspaces, regions, and clouds. Originally launched in 2022 and substantially expanded with open-source contributions in 2024–2026, Unity Catalog has become the standard governance plane for organizations running on Databricks at scale.
This guide explains what Unity Catalog is, how its model works, and where it fits alongside the broader enterprise governance estate.
Unity Catalog is Databricks' centralized governance layer. It uses a three-level namespace — catalog.schema.table — to organize all data and AI assets, applies fine-grained access control via standard ANSI SQL GRANT/REVOKE, captures column-level lineage automatically, audits every action, and powers cross-workspace discovery. Since 2024 it is also available as an open-source project, with growing ecosystem support beyond Databricks.
What Is Unity Catalog?
Before Unity Catalog, governance on Databricks meant duplicating ACLs across the legacy Hive metastore in every workspace, managing storage credentials manually, and assembling lineage from logs after the fact. Each workspace was its own island.
Unity Catalog replaces that fragmented model with a single account-level service that:
- Holds metadata for every catalog, schema, table, view, function, model, and volume.
- Enforces a unified permissions model with standard SQL grants.
- Captures lineage automatically as queries run.
- Logs every read, write, schema change, and grant for audit.
- Surfaces all governed assets through a searchable Catalog Explorer.
Critically, Unity Catalog is not a side service bolted onto Databricks — it is the default governance layer for any new Databricks workspace and the only path to features like Delta Sharing, Lakehouse Federation, and Databricks Marketplace.
Core Concepts
Unity Catalog organizes governance around several primitives:
- Metastore — the top-level container, scoped to one cloud region per Databricks account. A metastore aggregates all governed assets in that region. An organization typically operates one or a small number of metastores.
- Catalog — the first level of the namespace. Catalogs typically map to a domain, environment, or business unit (finance, marketing, prod, dev).
- Schema — the second level (formerly called database in Spark SQL terminology). Schemas group related tables and views.
- Table — managed or external Delta tables, plus tables in other formats (Parquet, CSV, Iceberg via UniForm).
- View — saved SQL queries, including dynamic views with row- and column-level filtering.
- Volume — governed storage location for non-tabular files (PDFs, images, audio, ML model artifacts, raw data files).
- Function — registered SQL or Python user-defined functions.
- Model — registered MLflow models, governed alongside the data they consume and produce.
- Storage credential + External location — Unity Catalog–managed authentication to external cloud storage. Storage credentials never need to be embedded in user code.
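As a rough illustration of how these primitives relate, the following Databricks SQL sketch creates one object of several kinds. All names (finance, raw, invoices, invoice_pdfs, finance_landing, finance_cred) and the bucket URL are hypothetical. It assumes a Unity Catalog-enabled workspace, sufficient privileges, managed storage configured for the metastore or catalog, and a storage credential that an administrator has already created.

```sql
-- Catalog: first level of the namespace (hypothetical name).
CREATE CATALOG IF NOT EXISTS finance
  COMMENT 'Finance domain data';

-- Schema: second level, groups related tables, views, and volumes.
CREATE SCHEMA IF NOT EXISTS finance.raw;

-- Managed Delta table: Unity Catalog decides where the files live.
CREATE TABLE IF NOT EXISTS finance.raw.invoices (
  invoice_id  BIGINT,
  customer_id BIGINT,
  amount      DECIMAL(18, 2),
  issued_at   TIMESTAMP
);

-- Managed volume for non-tabular files such as PDFs.
CREATE VOLUME IF NOT EXISTS finance.raw.invoice_pdfs;

-- External location: binds a cloud storage path to an existing
-- storage credential (finance_cred is assumed to exist already).
CREATE EXTERNAL LOCATION IF NOT EXISTS finance_landing
  URL 's3://example-bucket/finance/landing'
  WITH (STORAGE CREDENTIAL finance_cred);
```

Note that the table and the volume sit side by side under the same schema, which is what lets one set of grants cover both.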
Three-Level Namespace
Unity Catalog's signature feature is the three-level namespace: every object is addressed as catalog.schema.object. This is a deliberate departure from the legacy Spark two-level (database.table) model.
The added top level, the catalog, matters because catalogs are administratively meaningful. They map to teams, environments, or domains, and they are the natural unit for top-level governance decisions: who owns this data, what region it is in, and what permission scope applies.
Some example references:
- finance.raw.invoices: raw invoice data in the finance catalog
- marketing.gold.attribution_daily: production attribution table
- main.mlmodels.churn_v3: registered ML model
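A short sketch of how name resolution might look in practice, reusing the example table finance.raw.invoices. The column names are assumptions made for illustration.

```sql
-- Fully qualified: works regardless of the session's current catalog and schema.
SELECT invoice_id, amount
FROM finance.raw.invoices
LIMIT 10;

-- Set session defaults, then use shorter names.
USE CATALOG finance;
USE SCHEMA raw;

SELECT count(*) AS invoice_count
FROM invoices;  -- resolves to finance.raw.invoices

-- Check what unqualified names currently resolve against.
SELECT current_catalog(), current_schema();
```

Fully qualified names are usually the safer choice in jobs and shared notebooks, because they do not depend on session state.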
Access Control
Unity Catalog uses standard ANSI SQL grants. The same syntax works across SQL Editor, notebooks, jobs, and the API:
GRANT SELECT ON finance.gold.revenue_daily TO `data-analysts`;
GRANT MODIFY ON marketing.events.campaigns TO `marketing-engineers`;
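GRANT USE CATALOG ON CATALOG finance TO `finance-domain`;
Access can be granted at several levels: metastore, catalog, schema, and individual objects such as tables, views, volumes, and functions. Column-level restrictions are expressed through column masks or dynamic views rather than direct grants. Privileges propagate down: a grant on a catalog applies to all its schemas and tables. Common privileges include SELECT, MODIFY, USE CATALOG, USE SCHEMA, EXECUTE (on functions), and READ VOLUME / WRITE VOLUME.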
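To make inheritance concrete, here is a hedged sketch using hypothetical catalog, schema, and group names. It assumes Databricks SQL syntax, where the securable type (CATALOG, SCHEMA) follows ON, and the documented behavior that reading a table requires USE CATALOG and USE SCHEMA on its parents in addition to SELECT.

```sql
-- Let the group see into the catalog and the schema.
GRANT USE CATALOG ON CATALOG finance TO `data-analysts`;
GRANT USE SCHEMA  ON SCHEMA finance.gold TO `data-analysts`;

-- SELECT granted on the schema is inherited by the tables and views inside it,
-- so no per-table grant is needed.
GRANT SELECT ON SCHEMA finance.gold TO `data-analysts`;

-- Inspect the privileges recorded on the schema.
SHOW GRANTS ON SCHEMA finance.gold;

-- Withdraw the inherited read access in one statement.
REVOKE SELECT ON SCHEMA finance.gold FROM `data-analysts`;
```

Granting at the schema or catalog level keeps the number of grants small, at the cost of also covering objects created later in that scope.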
Row- and Column-Level Security
Beyond table-level grants, Unity Catalog enforces row filters and column masks via SQL functions. A row filter is a function that returns a boolean; a column mask is a function that transforms a value. Both attach to a table and are evaluated automatically on every query — including queries from BI tools.
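A row filter of the kind described above might be sketched as follows. The function name region_filter, the region column, and the group finance-admins are hypothetical; is_account_group_member is a Databricks built-in that checks account-level group membership.

```sql
-- Returns TRUE for rows the caller may see:
-- admins see everything, everyone else sees only EMEA rows.
CREATE OR REPLACE FUNCTION finance.raw.region_filter(region STRING)
RETURN is_account_group_member('finance-admins') OR region = 'EMEA';

-- Attach the filter; the table's region column is passed as the argument.
ALTER TABLE finance.raw.customers
  SET ROW FILTER finance.raw.region_filter ON (region);

-- Detach it again if needed.
ALTER TABLE finance.raw.customers DROP ROW FILTER;
```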
CREATE FUNCTION mask_pii(email STRING)
RETURN CASE WHEN is_member('analysts') THEN email
ELSE 'REDACTED' END;
ALTER TABLE finance.raw.customers
ALTER COLUMN email SET MASK mask_pii;
Identity Federation
Unity Catalog supports identity federation with Microsoft Entra ID, Okta, and other SAML/SCIM providers. Users and groups defined in the IdP map directly to Unity Catalog principals — no shadow user lists.
Data Lineage
Unity Catalog captures column-level lineage automatically for every query that runs through the platform — Spark SQL, notebooks, jobs, dashboards, ML model training. There is nothing to enable: as long as the table is governed by Unity Catalog, lineage is recorded.
Lineage covers tables, views, dashboards, ML models, and (since 2024) AI workflows. The Catalog Explorer surfaces lineage as an interactive graph; the same lineage is queryable through the system tables (system.access.table_lineage and system.access.column_lineage) for custom analysis. For broader context on how lineage works, see data lineage and column-level lineage.
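As a sketch of the custom analysis that the lineage system tables allow, the queries below look for downstream dependents of one table and one column. The table and column being traced are hypothetical. The system-table column names used here (source_table_full_name, target_table_full_name, source_column_name, target_column_name, entity_type, event_date) follow the published schema at the time of writing and should be checked against your workspace, since system table schemas evolve.

```sql
-- Tables written from finance.raw.invoices in the last 30 days.
SELECT DISTINCT
  target_table_full_name,
  entity_type            -- e.g. notebook, job, pipeline
FROM system.access.table_lineage
WHERE source_table_full_name = 'finance.raw.invoices'
  AND target_table_full_name IS NOT NULL
  AND event_date >= date_sub(current_date(), 30);

-- Columns derived from finance.raw.invoices.amount.
SELECT DISTINCT
  target_table_full_name,
  target_column_name
FROM system.access.column_lineage
WHERE source_table_full_name = 'finance.raw.invoices'
  AND source_column_name = 'amount'
  AND target_column_name IS NOT NULL;
```

Queries like these are a common starting point for impact analysis before changing or dropping a column.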
Unity Catalog lineage stops at the platform boundary. It captures every operation that happens inside Databricks — but not what happens before data arrives (extract jobs, source systems) or after it leaves (downstream BI, operational systems). For end-to-end enterprise lineage, Unity Catalog typically pairs with a broader catalog like Microsoft Purview, Collibra, or Dawiso.
Audit and Discovery
Every action against a Unity Catalog object is logged: queries, schema changes, grants, deletes. Audit logs are surfaced in system.access.audit, which can be queried directly or shipped to a SIEM via cloud-native log delivery (CloudWatch, Azure Monitor, Google Cloud Logging, formerly Stackdriver).
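A minimal sketch of querying the audit table directly. It assumes the system.access schema is enabled and readable by the caller. The service name and the two action names are examples of values that appear in Unity Catalog audit events and should be treated as assumptions to verify against your own logs.

```sql
-- Recent permission changes and table deletions, newest first.
SELECT
  event_time,
  user_identity.email AS actor,
  action_name,
  request_params
FROM system.access.audit
WHERE service_name = 'unityCatalog'
  AND event_date >= date_sub(current_date(), 7)
  AND action_name IN ('updatePermissions', 'deleteTable')
ORDER BY event_time DESC
LIMIT 100;
```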
The Catalog Explorer is the discovery UI: a searchable, filterable browser over every governed asset, with descriptions, tags, lineage, owners, and previews. Tags can be applied at any level and propagate to children, supporting common governance patterns like pii=true or domain=finance classifications.
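The tagging pattern can also be applied and queried in SQL. The sketch below uses a hypothetical table and column and assumes the caller has the privileges required to apply tags.

```sql
-- Classify the table by domain and the column as PII.
ALTER TABLE finance.raw.customers
  SET TAGS ('domain' = 'finance');

ALTER TABLE finance.raw.customers
  ALTER COLUMN email SET TAGS ('pii' = 'true');

-- Find every column tagged as PII across the metastore.
SELECT catalog_name, schema_name, table_name, column_name, tag_value
FROM system.information_schema.column_tags
WHERE tag_name = 'pii';
```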
Volumes for Unstructured Data
Volumes extend Unity Catalog governance to non-tabular files: documents, images, audio, video, ML model artifacts, raw payloads. Volumes are mounted at known paths (/Volumes/<catalog>/<schema>/<volume>/...) and accessed through the same SQL grant system that controls tables.
This matters most for AI workloads. Most enterprise AI requires both structured tables and unstructured documents — and before volumes, governing those two cleanly required two different systems. With volumes, a RAG pipeline can read PDFs, embed them, and write the resulting vectors back to a table, all under one consistent permission model.
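A hedged sketch of what file access through a volume might look like. It assumes a volume named finance.raw.invoice_pdfs already exists, that the hypothetical group rag-pipeline already holds USE CATALOG and USE SCHEMA on its parents, and that the read_files table-valued function is available in your runtime.

```sql
-- Allow the pipeline's group to read files in the volume.
GRANT READ VOLUME ON VOLUME finance.raw.invoice_pdfs TO `rag-pipeline`;

-- Browse the governed path.
LIST '/Volumes/finance/raw/invoice_pdfs/';

-- Load file metadata as rows; selecting the content column as well
-- would return the raw bytes for downstream parsing and embedding.
SELECT path, length, modificationTime
FROM read_files(
  '/Volumes/finance/raw/invoice_pdfs/',
  format => 'binaryFile'
);
```

Because the path is resolved through the volume, the same grant governs access from SQL, notebooks, and jobs.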
Unity Catalog Open Source
In June 2024, Databricks open-sourced Unity Catalog under the Linux Foundation. The OSS project is governance-API-compatible with the managed Databricks service, with a growing ecosystem of integrations including Apache Spark, Trino, Apache Iceberg, DuckDB, and others. The OSS project is not a drop-in replacement for the Databricks-managed service — features like managed lineage and the Catalog Explorer UI ship only with the commercial product — but the open core ensures Unity Catalog metadata can be read and respected outside Databricks.
How Dawiso Complements Unity Catalog
Unity Catalog governs the lakehouse extremely well. It does not, by itself, govern the rest of the enterprise: the operational systems that feed it, the BI tools that consume it, the business glossary that defines what "revenue" means, and the human stewardship workflows that decide who owns what.
Dawiso integrates with Unity Catalog through its metadata API, ingesting catalog structure, lineage, classifications, and tags — and then enriches that picture with business context Unity Catalog does not capture: business glossary definitions, stewardship assignments, data quality rules, and end-to-end lineage that crosses Databricks, source systems, and downstream BI. The result is a single governance picture: technical depth from Unity Catalog, business semantics and human workflow from Dawiso.