
Medallion Architecture: Complete Guide to Bronze, Silver, and Gold Layers

Medallion architecture is a data design pattern that organises data in a lakehouse into three progressive layers — Bronze, Silver, and Gold — each representing an increasing level of quality, structure, and business readiness. Originally popularised by Databricks as a best practice for Delta Lake, the pattern has since become a widely adopted standard for organising data in modern data platforms built on Apache Iceberg, Apache Hudi, and similar open table formats.

TL;DR

Medallion architecture is a three-layer data lakehouse pattern (Bronze → Silver → Gold) that progressively refines raw ingested data into business-ready datasets. It addresses data quality, governance, and reprocessing challenges by preserving every layer as a persistent, queryable dataset, which makes it a foundational design pattern for modern data governance programmes.

What Is Medallion Architecture?

At its core, medallion architecture is a multi-hop data pipeline pattern. Data flows from its source through a series of transformation stages, each stored as a persistent, queryable layer rather than a transient pipeline stage. This persistence is what distinguishes medallion architecture from traditional ETL pipelines, whose intermediate results are often discarded once the load completes: every layer is a first-class dataset that can be queried, monitored, and governed independently.

The name "medallion" reflects the idea that each layer represents an incrementally higher grade of data, much like bronze, silver, and gold medals in athletics. The metaphor is intuitive and has contributed to the pattern's widespread adoption. Some organisations extend the model with additional layers — a raw landing zone before Bronze, or a Platinum layer for highly aggregated executive reporting — but the three-layer model remains the canonical form.

Medallion architecture is storage-format agnostic in principle, but it works best when the underlying storage layer supports ACID transactions, schema evolution, and time travel. This is why it is most commonly associated with open table formats such as Delta Lake, Apache Iceberg, and Apache Hudi, all of which provide these capabilities on cloud object storage.

The Bronze Layer: Raw Data Preservation

The Bronze layer is the landing zone for all raw data ingested from source systems. It receives data exactly as it arrives, with no schema enforcement, no deduplication, and no transformation. The Bronze layer is append-only by design: records are added but never updated in place, and they are removed only when a defined retention policy expires them. This immutability makes it the authoritative historical record of the raw data the organisation has received.

Typical data in the Bronze layer includes database change data capture (CDC) events, API responses, log files, CSV exports, streaming records from Apache Kafka or Azure Event Hubs, and any other source format. Data is stored with ingestion metadata such as timestamps, source system identifiers, and pipeline run IDs, enabling full auditability.
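To make the append-only contract concrete, here is a minimal Python sketch of a Bronze table that lands payloads unchanged and attaches ingestion metadata. It is an in-memory stand-in, not a Delta Lake, Iceberg or Hudi API; the class and field names (`BronzeTable`, `BronzeRecord`, `pipeline_run_id`) are hypothetical.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Iterable, Optional


@dataclass(frozen=True)
class BronzeRecord:
    """One raw record as landed in Bronze, plus ingestion metadata."""
    payload: dict[str, Any]   # source data exactly as received, no validation
    source_system: str        # identifier of the upstream system
    ingested_at: datetime     # arrival timestamp (also the partitioning key)
    pipeline_run_id: str      # which pipeline run landed this record


class BronzeTable:
    """Hypothetical in-memory stand-in for an append-only Bronze table."""

    def __init__(self) -> None:
        self._records: list[BronzeRecord] = []

    def append(
        self,
        payloads: Iterable[dict[str, Any]],
        source_system: str,
        run_id: Optional[str] = None,
        now: Optional[datetime] = None,
    ) -> str:
        run_id = run_id or str(uuid.uuid4())
        now = now or datetime.now(timezone.utc)
        # Append only: existing records are never updated or removed here.
        self._records.extend(
            BronzeRecord(dict(p), source_system, now, run_id) for p in payloads
        )
        return run_id

    def scan(self) -> tuple[BronzeRecord, ...]:
        # A tuple snapshot: callers cannot append to or reorder history through it.
        return tuple(self._records)
```

Duplicates and malformed payloads are kept on purpose in this sketch; cleaning them is the Silver layer's job.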

Because Bronze data is unvalidated, it may contain nulls, malformed records, duplicate events, schema drift, and other quality issues. This is intentional. The Bronze layer is a safety net: if downstream transformations introduce bugs or business rules change, teams can always reprocess from Bronze without re-ingesting from source systems.

Data should land in Bronze as quickly as possible after generation. Bronze is typically partitioned by ingestion date or arrival timestamp rather than by business keys. Retention policies should align with regulatory requirements; in some regulated industries, raw data must be retained for seven years or more.
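The partitioning and retention guidance can be sketched as two small helpers: one builds a Hive-style path keyed on ingestion date, the other lists partitions that have aged past a retention window. The path layout (`ingest_date=YYYY-MM-DD`), the root URI and the seven-year default are illustrative assumptions, not requirements of any table format.

```python
from datetime import date, datetime, timedelta, timezone
from typing import Iterable


def bronze_partition_path(root: str, source_system: str, ingested_at: datetime) -> str:
    """Partition by arrival date, never by business key (illustrative layout)."""
    if ingested_at.tzinfo is None:
        raise ValueError("ingested_at must be timezone-aware")
    day = ingested_at.astimezone(timezone.utc).date()  # normalise to the UTC day
    return f"{root}/{source_system}/ingest_date={day.isoformat()}"


def partitions_past_retention(
    partition_dates: Iterable[date], today: date, retention_years: int = 7
) -> list[date]:
    """Partition dates older than the retention window, oldest first."""
    # Simplification: a year is treated as 365 days; real policies follow the regulation.
    cutoff = today - timedelta(days=365 * retention_years)
    return sorted(d for d in partition_dates if d < cutoff)
```

Normalising to UTC keeps a record that arrives just after local midnight from landing in a different partition depending on which region ran the pipeline.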

Figure: Medallion Architecture data flow, from sources (databases, APIs, event streams, files and logs, SaaS tools) through Bronze (raw, append-only, ingestion metadata, partitioned by date), Silver (cleansed and conformed: deduplication, schema enforcement, type casting, SCD logic, validated entities) and Gold (business-ready, certified and governed: dashboards and BI, ML feature stores, financial reporting, data products), with data quality increasing at each layer.

The Silver Layer: Cleansed and Conformed Data

The Silver layer transforms raw Bronze data into clean, validated, and conformed datasets. This is where the bulk of data engineering work happens: deduplication, null handling, type casting, standardisation of formats (dates, currencies, identifiers), application of business rules, and light transformations that make data queryable by a broad audience without domain expertise.

Silver tables typically map to business entities: customers, orders, products, transactions, events. They represent a single version of truth for each entity, with duplicates removed and conflicting records resolved according to defined merge logic. Schema enforcement is applied at this layer — records that fail validation are quarantined into a separate error table rather than silently dropped or corrupted.
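A minimal sketch of a Bronze-to-Silver step for a hypothetical `orders` entity: it casts types, standardises dates, deduplicates on the business key by keeping the latest version, and routes records that fail validation to a quarantine list instead of dropping them. The field names and the "latest `updated_at` wins" merge rule are assumptions; production pipelines would express the same logic in Spark, SQL or dbt.

```python
from datetime import date, datetime
from decimal import Decimal, InvalidOperation
from typing import Any, Iterable


def to_silver(bronze_payloads: Iterable[dict[str, Any]]):
    """Return (silver_rows, quarantine_rows) for raw order payloads."""
    silver: dict[str, dict[str, Any]] = {}
    quarantine: list[dict[str, Any]] = []
    for raw in bronze_payloads:
        try:
            order_id = str(raw["order_id"]).strip()
            if not order_id:
                raise ValueError("empty order_id")
            row = {
                "order_id": order_id,
                "amount": Decimal(str(raw["amount"])),                # type casting
                "order_date": date.fromisoformat(raw["order_date"]),  # standard date
                "updated_at": datetime.fromisoformat(raw["updated_at"]),
            }
        except (KeyError, ValueError, TypeError, InvalidOperation) as exc:
            # Quarantine instead of silently dropping the record.
            quarantine.append({"raw": raw, "error": repr(exc)})
            continue
        # Deduplicate on the business key: the most recent update wins.
        current = silver.get(order_id)
        if current is None or row["updated_at"] > current["updated_at"]:
            silver[order_id] = row
    return list(silver.values()), quarantine
```

Because the function is pure, re-running it over the full Bronze history with revised rules rebuilds Silver without touching source systems.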

A key characteristic of the Silver layer is that it is still relatively close to the source domain. Silver data is cleaned and validated but not yet aggregated or shaped for a specific business use case. A Silver customer table contains all known attributes about a customer, not a pre-filtered subset for a particular campaign. This generality is what allows the Silver layer to serve as a foundation for multiple Gold datasets with different business purposes.

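The Silver layer is the appropriate place to implement slowly changing dimension (SCD) logic. When a customer changes their address or a product is recategorised, the Silver layer either overwrites the old value (SCD Type 1) or preserves the history of the change, by adding a new versioned row (SCD Type 2) or by moving prior values into a separate history table (SCD Type 4). With open table formats like Apache Iceberg, this logic can be implemented using MERGE statements that atomically upsert changed records. Time travel complements explicit SCD history but does not replace it, because older snapshots are typically expired under the table's retention settings.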
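The difference between SCD types is easiest to see in code. Below is a minimal in-memory sketch of SCD Type 2 for a hypothetical customer-address dimension: a changed attribute closes the current version and appends a new one, where Type 1 would simply overwrite the value. On a lakehouse table the same logic would normally be written as a MERGE statement; this Python only illustrates the row-level rules, and the field names are assumptions.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class CustomerVersion:
    customer_id: str
    address: str
    valid_from: date
    valid_to: Optional[date] = None  # None while the version is current
    is_current: bool = True


def scd2_merge(
    dimension: list[CustomerVersion], changes: dict[str, str], as_of: date
) -> list[CustomerVersion]:
    """Apply {customer_id: new_address} changes as SCD Type 2 versions."""
    result = list(dimension)
    current = {v.customer_id: i for i, v in enumerate(result) if v.is_current}
    for customer_id, address in changes.items():
        idx = current.get(customer_id)
        if idx is not None:
            if result[idx].address == address:
                continue  # unchanged attribute: no new version
            # Type 2: close out the current row instead of overwriting it.
            result[idx] = replace(result[idx], valid_to=as_of, is_current=False)
        # New and changed customers both get a fresh current version.
        result.append(CustomerVersion(customer_id, address, valid_from=as_of))
    return result
```

Re-applying an unchanged address produces no new row, which keeps the merge idempotent when a batch is replayed.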

The Gold Layer: Business-Ready Data

The Gold layer contains data that is ready for direct consumption by business users, analysts, data scientists, and applications. Gold datasets are purpose-built for specific analytical use cases: executive dashboards, marketing attribution models, financial reporting, machine learning feature stores, and customer-facing data products.

Unlike Bronze and Silver, Gold tables are not general-purpose. A single Silver customer table might power dozens of Gold datasets, each shaped differently for a different audience. The Gold layer applies the final business logic transformations: aggregations, joins across domains, calculation of derived metrics, filtering to relevant time windows, and formatting for consumption tools such as Power BI, Tableau, or Looker.
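As an illustration of Gold-layer shaping, this sketch joins hypothetical Silver `orders` and `customers` rows, filters to a reporting window and aggregates revenue per region and month. The column names and the `UNKNOWN` fallback region are assumptions; in practice this would typically be a SQL or dbt model materialised as a Gold table.

```python
from collections import defaultdict
from datetime import date
from decimal import Decimal
from typing import Any


def gold_monthly_revenue_by_region(
    orders: list[dict[str, Any]],
    customers: list[dict[str, Any]],
    start: date,
    end: date,
) -> list[dict[str, Any]]:
    """Revenue per (region, month) for orders with start <= order_date < end."""
    region_of = {c["customer_id"]: c["region"] for c in customers}
    totals: defaultdict[tuple[str, str], Decimal] = defaultdict(Decimal)
    for order in orders:
        if not start <= order["order_date"] < end:  # reporting window
            continue
        region = region_of.get(order["customer_id"], "UNKNOWN")  # cross-domain join
        month = order["order_date"].strftime("%Y-%m")
        totals[(region, month)] += order["amount"]  # pre-computed aggregate
    return [
        {"region": region, "month": month, "revenue": revenue}
        for (region, month), revenue in sorted(totals.items())
    ]
```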

Gold datasets are typically optimised for query performance: thoughtful partitioning, clustering by frequently joined keys, and pre-computation of expensive aggregations. Because Gold data is derived from Silver, it can be regenerated by re-running the Gold transformations when business logic changes, rather than by patching historical Gold tables with bespoke backfill processes.

The Gold layer is the primary consumption surface for business users, which makes it the focus of consumer-facing governance controls. Access control policies, row-level security, and column masking are typically applied at the Gold layer. Certification and trust signals are attached to Gold datasets to communicate quality and reliability to consumers.
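The logic behind row-level security and column masking can be sketched in a few lines. Here a consumer sees only rows for regions they are entitled to, and e-mail addresses are masked unless they hold a PII permission. The entitlement model and masking format are hypothetical; real platforms enforce such policies in the query engine or catalog rather than in application code.

```python
from typing import Any


def mask_email(email: str) -> str:
    """Column mask: keep the first character and the domain."""
    local, _, domain = email.partition("@")
    return f"{local[:1]}***@{domain}" if domain else "***"


def apply_gold_policies(
    rows: list[dict[str, Any]], user_regions: set[str], can_see_pii: bool
) -> list[dict[str, Any]]:
    """Row-level filter by region, then mask PII for unprivileged users."""
    visible = []
    for row in rows:
        if row["region"] not in user_regions:  # row-level security
            continue
        out = dict(row)  # never mutate the underlying Gold rows
        if not can_see_pii:
            out["email"] = mask_email(out["email"])  # column masking
        visible.append(out)
    return visible
```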

Implementation: Key Decisions

Organisations implementing medallion architecture face several key design decisions. The first is whether to use a single storage account for all three layers or separate storage accounts per layer. Separate accounts provide stronger access isolation but increase operational complexity. Many organisations start with a single account and use folder-level or namespace-level access policies.

The second decision is pipeline orchestration. Popular choices include Apache Airflow, Databricks Workflows, Azure Data Factory, and dbt for transformation logic. dbt is increasingly used for Silver and Gold transformations because its model-based approach aligns naturally with the medallion pattern, and its lineage documentation integrates with data catalogs like Dawiso.

The third decision is data observability. Each layer should have automated quality checks: Bronze monitors for ingestion completeness and file arrival SLAs; Silver monitors for schema conformance, null rates, and referential integrity; Gold monitors for metric consistency, row count expectations, and freshness SLAs.
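The per-layer checks described above might look like the following sketch, with one function per layer: file-arrival completeness for Bronze, null rate and referential integrity for Silver, and row-count and freshness expectations for Gold. Thresholds, column names and function names are hypothetical; metric-consistency checks are omitted for brevity, and dedicated data-quality tooling would normally run checks like these.

```python
from datetime import datetime, timedelta
from typing import Any, Iterable


def check_bronze(expected_files: Iterable[str], arrived_files: Iterable[str]) -> list[str]:
    """Ingestion completeness: every expected file has landed."""
    missing = sorted(set(expected_files) - set(arrived_files))
    return [f"missing file: {name}" for name in missing]


def check_silver(
    rows: list[dict[str, Any]],
    column: str,
    max_null_rate: float,
    fk_column: str,
    valid_keys: set,
) -> list[str]:
    """Null rate on one column plus referential integrity on a foreign key."""
    issues = []
    if rows:
        rate = sum(r.get(column) is None for r in rows) / len(rows)
        if rate > max_null_rate:
            issues.append(f"{column} null rate {rate:.0%} exceeds {max_null_rate:.0%}")
    orphans = sorted(
        {
            r[fk_column]
            for r in rows
            if r.get(fk_column) is not None and r[fk_column] not in valid_keys
        }
    )
    if orphans:
        issues.append(f"orphan {fk_column} values: {orphans}")
    return issues


def check_gold(
    row_count: int,
    min_rows: int,
    max_rows: int,
    last_refreshed: datetime,
    max_age: timedelta,
    now: datetime,
) -> list[str]:
    """Row-count expectation plus freshness SLA."""
    issues = []
    if not min_rows <= row_count <= max_rows:
        issues.append(f"row count {row_count} outside [{min_rows}, {max_rows}]")
    age = now - last_refreshed
    if age > max_age:
        issues.append(f"stale: last refreshed {age} ago, SLA is {max_age}")
    return issues
```

Each function returns a list of human-readable issues, so an empty list means the check passed and the results can be routed to alerting without further parsing.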

Governance Across Layers

One of the most significant benefits of medallion architecture from a governance perspective is that the three-layer structure creates natural checkpoints for data lineage tracking. Every transformation between Bronze and Silver, and between Silver and Gold, is an explicit data movement event that can be captured in a metadata system.

This is where platforms like Dawiso provide critical value. Dawiso's data catalog capabilities allow data teams to document each layer's tables, track column-level lineage across Bronze-to-Silver and Silver-to-Gold transformations, and surface that lineage to business users. When a Gold metric looks wrong, analysts can trace it backward through the lineage graph to the Bronze source or Silver transformation rule that introduced the problem.
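Tracing a Gold metric backward is a graph traversal over lineage edges. The sketch below walks a hypothetical column-level lineage map breadth-first from a Gold column to every upstream Silver and Bronze column. It illustrates the idea only; it is not Dawiso's API, and the table and column names are invented.

```python
from collections import deque

# Hypothetical column-level lineage: each column maps to the columns it is derived from.
LINEAGE: dict[str, list[str]] = {
    "gold.revenue_by_region.revenue": ["silver.orders.amount"],
    "gold.revenue_by_region.region": ["silver.customers.region"],
    "silver.orders.amount": ["bronze.erp_orders.amt_raw"],
    "silver.customers.region": ["bronze.crm_customers.country_code"],
}


def upstream(column: str, lineage: dict[str, list[str]]) -> list[str]:
    """Breadth-first walk from a column back to all of its sources."""
    seen, order = {column}, []
    queue = deque([column])
    while queue:
        for parent in lineage.get(queue.popleft(), []):
            if parent not in seen:  # guard against cycles and shared parents
                seen.add(parent)
                order.append(parent)
                queue.append(parent)
    return order
```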

Dawiso also supports tagging and classification at each layer. Bronze tables can be tagged with their source system and data sensitivity level. Silver tables carry documentation of cleaning rules and ownership. Gold datasets are certified with quality scores and linked to the business glossary terms they represent.

Databricks, Apache Iceberg, and Apache Hudi

Databricks introduced the medallion pattern as a Delta Lake best practice and continues to develop tooling around it, including Delta Live Tables (DLT), which provides a declarative framework for building and monitoring medallion pipelines with built-in data quality constraints and automatic dependency tracking.

Apache Iceberg has emerged as a strong alternative to Delta Lake for medallion implementations, particularly in multi-engine environments where different teams use Spark, Trino, Flink, and Snowflake against the same tables. Iceberg's catalog abstraction and its support for hidden partitioning, schema evolution, and row-level deletes make it well suited to the transformation patterns required at each medallion layer.

Apache Hudi is commonly used for high-frequency upsert workloads at the Bronze and Silver layers. Hudi's Copy-on-Write and Merge-on-Read storage types allow teams to trade off between write latency and read performance depending on the layer's requirements.
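The Copy-on-Write versus Merge-on-Read trade-off can be illustrated with a toy model: Copy-on-Write produces a new version of the data on every upsert, while Merge-on-Read appends changes to a log and merges them at read time until compaction folds them back into the base. This is a conceptual sketch with dicts standing in for files; it does not reflect Hudi's API or on-disk layout.

```python
from typing import Any


class CopyOnWriteTable:
    """Upserts rewrite the data: more work per write, plain reads."""

    def __init__(self, rows: dict[str, Any]) -> None:
        self.base = dict(rows)  # key -> value, standing in for base files
        self.rewrites = 0

    def upsert(self, updates: dict[str, Any]) -> None:
        new_base = dict(self.base)  # write a complete new version
        new_base.update(updates)
        self.base = new_base
        self.rewrites += 1

    def read(self) -> dict[str, Any]:
        return dict(self.base)  # nothing to merge at read time


class MergeOnReadTable:
    """Upserts append to a log: cheap writes, merge cost paid on read."""

    def __init__(self, rows: dict[str, Any]) -> None:
        self.base = dict(rows)
        self.log: list[dict[str, Any]] = []  # standing in for delta log files

    def upsert(self, updates: dict[str, Any]) -> None:
        self.log.append(dict(updates))  # base is left untouched

    def read(self) -> dict[str, Any]:
        view = dict(self.base)
        for batch in self.log:  # apply logged changes in order
            view.update(batch)
        return view

    def compact(self) -> None:
        self.base = self.read()  # fold the log into a new base
        self.log.clear()
```

Both tables return the same logical result; they differ only in where the merge work happens, which is the lever teams pull when choosing a storage type per layer.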

Medallion Architecture Best Practices

Teams that have successfully implemented medallion architecture at scale share a number of consistent practices:

  • Treat each layer as a product with defined consumers, SLAs, and named ownership
  • Invest in automated data quality monitoring from day one, not as an afterthought
  • Document lineage and business logic as part of the pipeline development process
  • Resist the temptation to skip layers — the efficiency gains from bypassing Silver are almost always outweighed by governance and debugging costs
  • Require every Silver and Gold table to have a named owner, a documented purpose, and a freshness SLA before promotion to production
  • Use streaming processing at Bronze and Silver for near-real-time use cases, reserving batch for computationally intensive Gold aggregations
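The ownership requirement in the list above can be enforced as an automated promotion gate. This sketch checks a hypothetical `TableMetadata` record for a named owner, a documented purpose and a freshness SLA before a Silver or Gold table is promoted; the metadata shape is an assumption, and in practice the values would come from a catalog or pipeline configuration.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass
class TableMetadata:
    name: str
    layer: str  # "bronze", "silver" or "gold"
    owner: Optional[str] = None
    purpose: Optional[str] = None
    freshness_sla: Optional[timedelta] = None


def promotion_blockers(meta: TableMetadata) -> list[str]:
    """Reasons a Silver or Gold table may not yet be promoted to production."""
    if meta.layer not in ("silver", "gold"):
        return []  # the gate in this sketch applies only to Silver and Gold
    blockers = []
    if not meta.owner:
        blockers.append("no named owner")
    if not meta.purpose:
        blockers.append("no documented purpose")
    if meta.freshness_sla is None:
        blockers.append("no freshness SLA")
    return blockers
```

Running the gate in CI keeps the rule from depending on reviewers remembering it.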

The medallion pattern is not a rigid specification but a set of principles. Teams should adapt it to their context: the number of layers, the granularity of quality checks, and the tools used should all reflect the organisation's size, maturity, and use case mix. What matters is that the core principles — raw data preservation, incremental quality improvement, and clear consumption surfaces — are maintained consistently across the data platform.

Dawiso
© Dawiso s.r.o. All rights reserved