
Databricks Unity Catalog: Complete Guide to Data Governance

Unity Catalog is Databricks' unified governance solution for data and AI assets across the lakehouse. It centralizes access control, audit, lineage, and discovery for tables, views, files, ML models, and AI assets — across workspaces, regions, and clouds. Originally launched in 2022 and substantially expanded with open-source contributions in 2024–2026, Unity Catalog has become the standard governance plane for organizations running on Databricks at scale.

This guide explains what Unity Catalog is, how its model works, and where it fits alongside the broader enterprise governance estate.

TL;DR

Unity Catalog is Databricks' centralized governance layer. It uses a three-level namespace (catalog.schema.table) to organize all data and AI assets, applies fine-grained access control via standard ANSI SQL GRANT/REVOKE, captures column-level lineage automatically, audits every action, and powers cross-workspace discovery. Since 2024 it is also available as an open-source project, with growing ecosystem support beyond Databricks.

What Is Unity Catalog?

Before Unity Catalog, governance on Databricks meant duplicating ACLs across the legacy Hive metastore in every workspace, managing storage credentials manually, and assembling lineage from logs after the fact. Each workspace was its own island.

Unity Catalog replaces that fragmented model with a single account-level service that:

  • Holds metadata for every catalog, schema, table, view, function, model, and volume.
  • Enforces a unified permissions model with standard SQL grants.
  • Captures lineage automatically as queries run.
  • Logs every read, write, schema change, and grant for audit.
  • Surfaces all governed assets through a searchable Catalog Explorer.

Critically, Unity Catalog is not a side service bolted onto Databricks — it is the default governance layer for any new Databricks workspace and the only path to features like Delta Sharing, Lakehouse Federation, and Databricks Marketplace.

Core Concepts

Unity Catalog organizes governance around several primitives:

  • Metastore — the top-level container, scoped to one cloud region per Databricks account. A metastore aggregates all governed assets in that region. An organization typically operates one or a small number of metastores.
  • Catalog — the first level of the namespace. Catalogs typically map to a domain, environment, or business unit (finance, marketing, prod, dev).
  • Schema — the second level (formerly called database in Spark SQL terminology). Schemas group related tables and views.
  • Table — managed or external Delta tables, plus tables in other formats (Parquet, CSV, Iceberg via UniForm).
  • View — saved SQL queries, including dynamic views with row- and column-level filtering.
  • Volume — governed storage location for non-tabular files (PDFs, images, audio, ML model artifacts, raw data files).
  • Function — registered SQL or Python user-defined functions.
  • Model — registered MLflow models, governed alongside the data they consume and produce.
  • Storage credential + External location — Unity Catalog–managed authentication to external cloud storage. Storage credentials never need to be embedded in user code.
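The primitives above map onto ordinary DDL statements. The following is a minimal sketch in Databricks SQL, assuming you hold the relevant CREATE privileges. The column names and the body of convert_currency are hypothetical and only there to make the statements complete.

```sql
-- Catalog and schema: the first two levels of the namespace.
-- Note: some metastores require an explicit MANAGED LOCATION clause here.
CREATE CATALOG IF NOT EXISTS finance COMMENT 'Finance domain';
CREATE SCHEMA IF NOT EXISTS finance.raw COMMENT 'Landing zone for source extracts';

-- Managed table (Delta is the default format). Columns are illustrative.
CREATE TABLE IF NOT EXISTS finance.raw.invoices (
  invoice_id BIGINT,
  amount     DECIMAL(18, 2),
  currency   STRING,
  issued_at  TIMESTAMP
);

-- Managed volume for non-tabular files such as scanned receipts.
CREATE VOLUME IF NOT EXISTS finance.raw.receipts;

-- SQL user-defined function registered in the same schema (hypothetical logic).
CREATE OR REPLACE FUNCTION finance.raw.convert_currency(
  amount DECIMAL(18, 2),
  rate   DECIMAL(18, 6)
)
RETURNS DECIMAL(18, 2)
RETURN CAST(amount * rate AS DECIMAL(18, 2));

-- View over the table: third-quarter totals per currency.
CREATE OR REPLACE VIEW finance.raw.q3_kpis AS
SELECT currency, SUM(amount) AS total_amount
FROM finance.raw.invoices
WHERE quarter(issued_at) = 3
GROUP BY currency;
```

Every object created this way is immediately visible to the permission, lineage, and audit features described in the rest of this guide.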

Three-Level Namespace

Unity Catalog's signature feature is the three-level namespace: every object is addressed as catalog.schema.object. This is a deliberate departure from the legacy Spark two-level (database.table) model.

Figure: Unity Catalog three-level namespace (one region-scoped metastore)

  CATALOG: finance
    SCHEMA: finance.raw
      tables: invoices, payments
      volume: receipts/
      view: q3_kpis
      function: convert_currency()
    SCHEMA: finance.gold
      tables: revenue_daily, forecast_q4
      model: ml.forecast_revenue
  CATALOG: marketing
    SCHEMA: marketing.events
      tables: campaigns, leads, web_clicks
      view: mql_funnel
    SCHEMA: marketing.attribution
      table: touchpoints
      model: attribution_ml
      function: weighted_credit()

The added level, the catalog, matters because catalogs are administratively meaningful. They map to teams, environments, or domains, and they are the natural unit for top-level governance decisions: who owns this data, what region is it in, what permission scope applies?

Some example references:

  • finance.raw.invoices — raw invoice data in the finance catalog
  • marketing.gold.attribution_daily — production attribution table
  • main.mlmodels.churn_v3 — registered ML model
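In practice a query can either spell out all three levels or set a default catalog and schema for the session. A short sketch in Databricks SQL, assuming a table finance.raw.invoices exists with the hypothetical columns invoice_id and amount:

```sql
-- Fully qualified reference: works regardless of session defaults.
SELECT invoice_id, amount
FROM finance.raw.invoices
LIMIT 10;

-- Set session defaults, then use the short name.
USE CATALOG finance;
USE SCHEMA raw;

SELECT invoice_id, amount
FROM invoices
LIMIT 10;

-- Check which catalog and schema unqualified names currently resolve against.
SELECT current_catalog(), current_schema();
```

Fully qualified names are usually the safer choice in jobs and shared notebooks, because the result does not depend on whatever defaults an earlier cell or cluster configuration happened to set.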

Access Control

Unity Catalog uses standard ANSI SQL grants. The same syntax works across SQL Editor, notebooks, jobs, and the API:

GRANT SELECT ON TABLE finance.gold.revenue_daily TO `data-analysts`;
GRANT MODIFY ON TABLE marketing.events.campaigns TO `marketing-engineers`;
GRANT USE CATALOG ON CATALOG finance TO `finance-domain`;

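Access can be granted at the metastore, catalog, schema, and object level (table, view, volume, function, model). Column-level restrictions are expressed through column masks or dynamic views rather than per-column grants. Privileges are inherited downward: a grant on a catalog applies to all its current and future schemas and tables. To read a table, a principal also needs USE CATALOG on the parent catalog and USE SCHEMA on the parent schema. Common privileges include SELECT, MODIFY, USE CATALOG, USE SCHEMA, EXECUTE (on functions), and READ VOLUME / WRITE VOLUME.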
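The inheritance rule is easiest to see with a schema-level grant. The sketch below assumes the finance.gold schema and the data-analysts group from the examples above; it grants read access to everything in the schema, inspects the result, and then revokes it.

```sql
-- Usage privileges on the parents are needed before any table can be read.
GRANT USE CATALOG ON CATALOG finance TO `data-analysts`;
GRANT USE SCHEMA ON SCHEMA finance.gold TO `data-analysts`;

-- One grant on the schema covers every current and future table and view in it.
GRANT SELECT ON SCHEMA finance.gold TO `data-analysts`;

-- Inspect what is granted on the schema, and what one principal holds on a table.
SHOW GRANTS ON SCHEMA finance.gold;
SHOW GRANTS `data-analysts` ON TABLE finance.gold.revenue_daily;

-- Revoking at the schema level removes the inherited access again.
REVOKE SELECT ON SCHEMA finance.gold FROM `data-analysts`;
```

Granting at the schema or catalog level keeps the number of grants small, at the cost of also covering objects created later; table-level grants are the narrower alternative.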

Row- and Column-Level Security

Beyond table-level grants, Unity Catalog enforces row filters and column masks via SQL functions. A row filter is a function that returns a boolean; a column mask is a function that transforms a value. Both attach to a table and are evaluated automatically on every query — including queries from BI tools.

CREATE FUNCTION mask_pii(email STRING)
RETURN CASE WHEN is_member('analysts') THEN email
            ELSE 'REDACTED' END;

ALTER TABLE finance.raw.customers
ALTER COLUMN email SET MASK mask_pii;
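A row filter follows the same pattern as a column mask. The sketch below is an assumption-laden illustration: the region column, the eu_only function, and the finance-admins group are hypothetical. It uses is_account_group_member, which checks account-level groups; is_member is the workspace-level counterpart.

```sql
-- Boolean function: rows for which it returns TRUE stay visible to the caller.
CREATE OR REPLACE FUNCTION finance.raw.eu_only(region STRING)
RETURN is_account_group_member('finance-admins') OR region = 'EU';

-- Attach the filter; the table's region column is passed as the argument.
ALTER TABLE finance.raw.customers
SET ROW FILTER finance.raw.eu_only ON (region);

-- Detach the filter when it is no longer needed.
ALTER TABLE finance.raw.customers DROP ROW FILTER;
```

With the filter attached, members of finance-admins see every row and everyone else sees only rows where region is 'EU', without any change to the queries they run.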

Identity Federation

Unity Catalog supports identity federation with Microsoft Entra ID, Okta, and other SAML/SCIM providers. Users and groups defined in the IdP map directly to Unity Catalog principals — no shadow user lists.

Data Lineage

Unity Catalog captures column-level lineage automatically for every query that runs through the platform — Spark SQL, notebooks, jobs, dashboards, ML model training. There is nothing to enable: as long as the table is governed by Unity Catalog, lineage is recorded.

Lineage covers tables, views, dashboards, ML models, and (since 2024) AI workflows. The Catalog Explorer surfaces lineage as an interactive graph; the same lineage is queryable through the system tables (system.access.table_lineage and system.access.column_lineage) for custom analysis. For broader context on how lineage works, see data lineage and column-level lineage.
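As a rough sketch of what querying those system tables can look like, assuming the system.access schema is enabled on the metastore and you can read it. The column names reflect the documented system table schema as best understood here and are worth confirming with DESCRIBE; the revenue column is hypothetical.

```sql
-- Which tables were written from finance.raw.invoices in the last 30 days?
SELECT DISTINCT target_table_full_name, entity_type
FROM system.access.table_lineage
WHERE source_table_full_name = 'finance.raw.invoices'
  AND target_table_full_name IS NOT NULL
  AND event_date >= date_sub(current_date(), 30);

-- Which source columns feed the (hypothetical) revenue column of a gold table?
SELECT DISTINCT source_table_full_name, source_column_name
FROM system.access.column_lineage
WHERE target_table_full_name = 'finance.gold.revenue_daily'
  AND target_column_name = 'revenue'
  AND event_date >= date_sub(current_date(), 30);
```

The first query answers the impact-analysis question (what breaks if this table changes), the second the provenance question (where does this number come from).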

Unity Catalog lineage stops at the platform boundary. It captures every operation that happens inside Databricks — but not what happens before data arrives (extract jobs, source systems) or after it leaves (downstream BI, operational systems). For end-to-end enterprise lineage, Unity Catalog typically pairs with a broader catalog like Microsoft Purview, Collibra, or Dawiso.

Audit and Discovery

Every action against a Unity Catalog object is logged: queries, schema changes, grants, deletes. Audit logs are surfaced in system.access.audit, which can be queried directly or shipped to a SIEM via cloud-native log delivery (for example Amazon CloudWatch, Azure Monitor, or Google Cloud Logging, formerly Stackdriver).
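A minimal sketch of querying the audit table directly, assuming the system.access schema is enabled and readable. The column names follow the audit log system table schema as best understood here; run DESCRIBE on the table to confirm them in your workspace.

```sql
-- Most recent Unity Catalog events from the past 7 days.
SELECT event_time,
       user_identity.email AS actor,
       action_name,
       request_params
FROM system.access.audit
WHERE service_name = 'unityCatalog'
  AND event_date >= date_sub(current_date(), 7)
ORDER BY event_time DESC
LIMIT 100;

-- Which actions dominate the log over the same window?
SELECT action_name, COUNT(*) AS events
FROM system.access.audit
WHERE service_name = 'unityCatalog'
  AND event_date >= date_sub(current_date(), 7)
GROUP BY action_name
ORDER BY events DESC;
```

Filtering on event_date keeps the scan bounded, which matters because the audit table grows continuously.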

The Catalog Explorer is the discovery UI: a searchable, filterable browser over every governed asset, with descriptions, tags, lineage, owners, and previews. Tags can be applied at any level and propagate to children, supporting common governance patterns like pii=true or domain=finance classifications.
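Tags can be set and searched from SQL as well as from the UI. A small sketch, assuming a finance.raw.customers table with an email column and sufficient privileges to tag it:

```sql
-- Classify the table by domain and flag one column as PII.
ALTER TABLE finance.raw.customers SET TAGS ('domain' = 'finance');
ALTER TABLE finance.raw.customers ALTER COLUMN email SET TAGS ('pii' = 'true');

-- Find every column tagged as PII across the metastore.
SELECT catalog_name, schema_name, table_name, column_name
FROM system.information_schema.column_tags
WHERE tag_name = 'pii'
  AND tag_value = 'true';
```

A query like the second one is a common starting point for reviews such as "which PII columns have no mask attached yet".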

Volumes for Unstructured Data

Volumes extend Unity Catalog governance to non-tabular files: documents, images, audio, video, ML model artifacts, raw payloads. Volumes are mounted at known paths (/Volumes/<catalog>/<schema>/<volume>/...) and accessed through the same SQL grant system that controls tables.

This matters most for AI workloads. Most enterprise AI requires both structured tables and unstructured documents — and before volumes, governing those two cleanly required two different systems. With volumes, a RAG pipeline can read PDFs, embed them, and write the resulting vectors back to a table, all under one consistent permission model.
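A short sketch of the volume workflow in Databricks SQL, assuming the finance.raw.receipts volume and the data-analysts group used earlier in this guide. The group also needs USE CATALOG and USE SCHEMA on the parents.

```sql
-- Create a managed volume and let analysts read from it.
CREATE VOLUME IF NOT EXISTS finance.raw.receipts;
GRANT READ VOLUME ON VOLUME finance.raw.receipts TO `data-analysts`;

-- Browse the files through the governed path.
LIST '/Volumes/finance/raw/receipts/';

-- Load file metadata as rows; the binaryFile format also exposes a content column.
SELECT path, length, modificationTime
FROM read_files(
  '/Volumes/finance/raw/receipts/',
  format => 'binaryFile'
);
```

From here a pipeline can parse or embed the file contents and write the results to a regular table, with both the read and the write checked against the same grants.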

Unity Catalog Open Source

In June 2024, Databricks open-sourced Unity Catalog under the Linux Foundation. The OSS project is governance-API-compatible with the managed Databricks service, with a growing ecosystem of integrations including Apache Spark, Trino, Apache Iceberg, DuckDB, and others. The OSS project is not a drop-in replacement for the Databricks-managed service — features like managed lineage and the Catalog Explorer UI ship only with the commercial product — but the open core ensures Unity Catalog metadata can be read and respected outside Databricks.

How Dawiso Complements Unity Catalog

Unity Catalog governs the lakehouse extremely well. It does not, by itself, govern the rest of the enterprise: the operational systems that feed it, the BI tools that consume it, the business glossary that defines what "revenue" means, and the human stewardship workflows that decide who owns what.

Dawiso integrates with Unity Catalog through its metadata API, ingesting catalog structure, lineage, classifications, and tags — and then enriches that picture with business context Unity Catalog does not capture: business glossary definitions, stewardship assignments, data quality rules, and end-to-end lineage that crosses Databricks, source systems, and downstream BI. The result is a single governance picture: technical depth from Unity Catalog, business semantics and human workflow from Dawiso.
