What Is Snowflake?
Snowflake is a cloud-native data platform built on a multi-cluster shared-data architecture that separates storage, compute, and cloud services (metadata) into independently scalable layers. Founded in 2012, the company went public in 2020 in what was at the time the largest software IPO in history, and it pioneered the now-mainstream pattern of decoupling compute from storage in cloud data warehousing.
This guide explains how Snowflake works, why its architecture matters, what it costs, and where it fits alongside Databricks, BigQuery, and Microsoft Fabric in 2026.
Snowflake is a SaaS cloud data platform with three independent layers: storage (data on cloud object storage in proprietary columnar format), compute (multiple independently sized virtual warehouses), and cloud services (metadata, security, query optimization). Customers pay separately for storage and per-second compute. Snowflake runs on AWS, Azure, and GCP with seamless cross-cloud sharing. The platform now spans warehousing, data engineering, data sharing, applications, and AI through Snowflake Cortex.
Snowflake is delivered as a fully managed service — there are no servers to provision, no clusters to size manually, no software to patch. Customers connect to a Snowflake account through a web UI, SQL clients, drivers, or the REST API. Behind the scenes, Snowflake runs on top of one or more public clouds (AWS, Azure, or GCP) but exposes a consistent SQL interface and operational model regardless of the underlying provider.
The product has expanded well beyond its origins as a SQL data warehouse. Snowflake today includes: a data marketplace, native applications, Snowpark (Python, Java, Scala for in-warehouse processing), Streamlit-based UI hosting, Snowflake Cortex (LLMs and ML), and Iceberg table support — alongside the original SQL warehouse.
Architecture: Multi-Cluster Shared Data
Snowflake's signature design is the three-layer architecture. Each layer scales independently and can fail or be replaced without affecting the others. This separation is what enables features like elastic compute, instant cloning, and time travel.
Why the architecture matters
In traditional MPP warehouses (Teradata, Netezza, on-prem Vertica) compute and storage are tightly coupled. Adding capacity means adding nodes that bring compute and storage together. Workload isolation requires copying data into a second cluster.
Snowflake's separation lets you:
- Spin up a new compute cluster (virtual warehouse) instantly without copying data.
- Pay for compute only while a warehouse is running; warehouses auto-suspend after a configurable idle period, often a few minutes.
- Run multiple workloads (ETL, BI, data science) on the same data with different compute clusters that do not interfere with each other.
- Resize compute up or down with a single SQL command, without redistributing data.
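The points above can be sketched in a few statements. This is a minimal illustration, not a recommended configuration: the warehouse names (etl_wh, bi_wh), the table sales.public.orders, and the sizes and timeouts are all hypothetical.

```sql
-- Two independent compute clusters over the same stored data.
-- All names and settings here are illustrative assumptions.
CREATE WAREHOUSE IF NOT EXISTS etl_wh
  WAREHOUSE_SIZE      = 'LARGE'
  AUTO_SUSPEND        = 60      -- seconds of inactivity before suspending
  AUTO_RESUME         = TRUE    -- wake up on the next query
  INITIALLY_SUSPENDED = TRUE;   -- do not start consuming credits at creation

CREATE WAREHOUSE IF NOT EXISTS bi_wh
  WAREHOUSE_SIZE      = 'SMALL'
  AUTO_SUSPEND        = 300
  AUTO_RESUME         = TRUE
  INITIALLY_SUSPENDED = TRUE;

-- BI queries run on bi_wh and do not compete with ETL running on etl_wh.
USE WAREHOUSE bi_wh;
SELECT COUNT(*) FROM sales.public.orders;

-- Resize with one statement; no data is moved or redistributed.
ALTER WAREHOUSE etl_wh SET WAREHOUSE_SIZE = 'XLARGE';
```

Both warehouses read the same table, so isolating a workload is a matter of choosing which warehouse a session uses rather than copying data.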
Virtual Warehouses Explained
A virtual warehouse is a compute cluster offered in T-shirt sizes: X-Small, Small, Medium, Large, X-Large, 2X-Large, and so on up to 6X-Large. Each step roughly doubles the cluster size and the credit consumption rate.
Virtual warehouses can be:
- Auto-suspended — stop after N minutes of inactivity. Storage costs continue but compute does not.
- Auto-resumed — resume on the next query. Cold-start latency is single-digit seconds.
- Multi-cluster — for high-concurrency workloads, spawn additional clusters of the same size as load increases. Used for BI workloads where many users query simultaneously.
- Scaled in seconds — resize up or down with a single SQL ALTER WAREHOUSE statement. No data redistribution.
Most Snowflake customers run several virtual warehouses tuned to specific workloads — a large one for nightly ETL, a smaller multi-cluster one for BI, a separate one for ad-hoc analyst queries — to prevent workload contention.
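As a rough sketch of the multi-cluster option described above, the statement below defines a BI warehouse that can add clusters under load. The name bi_mc_wh and the cluster counts are hypothetical, and multi-cluster warehouses require Enterprise edition or higher.

```sql
-- Hypothetical multi-cluster warehouse for concurrent BI users.
CREATE WAREHOUSE IF NOT EXISTS bi_mc_wh
  WAREHOUSE_SIZE    = 'SMALL'
  MIN_CLUSTER_COUNT = 1           -- run a single cluster when load is light
  MAX_CLUSTER_COUNT = 4           -- add clusters of the same size as queries queue
  SCALING_POLICY    = 'STANDARD'  -- favor starting clusters over queuing
  AUTO_SUSPEND      = 120         -- seconds idle before suspending
  AUTO_RESUME       = TRUE;

-- Inspect the current state, size, and cluster counts.
SHOW WAREHOUSES LIKE 'bi_mc_wh';
```

Each additional cluster consumes credits at the rate of the configured size, so MAX_CLUSTER_COUNT also acts as a ceiling on hourly spend for this warehouse.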
Storage Layer
Snowflake stores data in micro-partitions: immutable, automatically created chunks that each hold roughly 50–500 MB of data before compression and are stored in compressed columnar form. Snowflake's native format is proprietary, but every micro-partition lives on the underlying cloud object store (S3, ADLS Gen2, GCS), so it inherits the durability and availability characteristics of those services.
Three storage features stand out:
- Time Travel — query data as of any point within a configurable retention window (up to 1 day on Standard edition, up to 90 days on Enterprise and above). SELECT * FROM customers AT(TIMESTAMP => '2026-04-01 10:00:00'::TIMESTAMP_LTZ) returns the table as it existed at that moment.
- Zero-copy cloning — CLONE creates a new table, schema, or database that shares storage with the original until either is modified. Cloning a 100TB database copies only metadata, so it finishes quickly and consumes no additional storage until the copies diverge.
- Continuous data protection — Time Travel plus Fail-safe (a 7-day post-Time-Travel retention period for permanent tables, recoverable through Snowflake Support) means even hard DROPs can be recovered.
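The three storage features can be tried with statements like the following. This is a sketch under assumptions: the customers table comes from the example above, while customers_dev, customers_restored, and the 30-day retention value are hypothetical, and retention above 1 day assumes Enterprise edition or higher.

```sql
-- Time Travel: read the table as it was one hour ago (offset is in seconds).
SELECT * FROM customers AT(OFFSET => -3600);

-- Time Travel by timestamp; the literal is cast explicitly to a timestamp type.
SELECT * FROM customers
  AT(TIMESTAMP => '2026-04-01 10:00:00'::TIMESTAMP_LTZ);

-- Zero-copy clone: a new table that shares micro-partitions with the original.
CREATE TABLE customers_dev CLONE customers;

-- Clone combined with Time Travel: materialize a past state as a new table.
CREATE TABLE customers_restored CLONE customers AT(OFFSET => -3600);

-- Recover from an accidental drop while still inside the retention window.
DROP TABLE customers;
UNDROP TABLE customers;

-- Extend the retention window for one table (hypothetical value).
ALTER TABLE customers SET DATA_RETENTION_TIME_IN_DAYS = 30;
```

Longer retention keeps more historical micro-partitions around, so it increases storage cost for tables that change frequently.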
Since 2024, Snowflake also supports Apache Iceberg as an external table format — letting Snowflake query Iceberg tables in customer-controlled S3 / ADLS / GCS buckets without bringing data into Snowflake's native storage. This is a significant interop concession that brings Snowflake closer to the open-lakehouse pattern dominant on Databricks.
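A minimal sketch of an Iceberg table whose files live in a customer-controlled bucket might look like this. It assumes an external volume named my_ext_vol has already been created by an administrator and uses Snowflake as the Iceberg catalog; the table name, columns, and base location are hypothetical.

```sql
-- Assumes external volume my_ext_vol already exists and points at a
-- customer-controlled bucket. All names here are illustrative.
CREATE ICEBERG TABLE events_iceberg (
    event_id   INT,
    event_time TIMESTAMP_NTZ,
    payload    STRING
)
  CATALOG         = 'SNOWFLAKE'    -- Snowflake manages the Iceberg metadata
  EXTERNAL_VOLUME = 'my_ext_vol'   -- where data and metadata files are written
  BASE_LOCATION   = 'events/';     -- path prefix inside the external volume

-- Queried like any other table.
SELECT COUNT(*) FROM events_iceberg;
```

Tables managed by an external Iceberg catalog are configured differently (through a catalog integration), which this sketch does not cover.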
Cloud Services Layer
The cloud services layer is the brain of Snowflake. It coordinates everything that is not pure compute or storage:
- Metadata management — table schemas, statistics, micro-partition info.
- Query optimization and parsing.
- Authentication, authorization, and SSO integration with Microsoft Entra, Okta, etc.
- Transaction management — Snowflake provides full ACID across the platform.
- Security functions — encryption key management, network policies, dynamic data masking.
- Result caching — Snowflake automatically caches query results for 24 hours; an identical query returns from the cache without re-running, provided the underlying data has not changed.
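Cloud services usage is effectively included in Snowflake pricing for most accounts: it is billed only when daily cloud services consumption exceeds 10% of that day's warehouse compute credits, which typically affects only extremely metadata-intensive workloads.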
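When benchmarking a query, the result cache described above can hide the real compute cost. One way to measure actual execution, sketched here with a hypothetical orders table, is to switch the cache off for the current session.

```sql
-- Disable result-cache reuse for this session only.
ALTER SESSION SET USE_CACHED_RESULT = FALSE;

-- Hypothetical query: now executes on the warehouse every time it is run.
SELECT region, SUM(amount) AS revenue
FROM orders
GROUP BY region;

-- Restore the default so repeated identical queries can be served from cache.
ALTER SESSION SET USE_CACHED_RESULT = TRUE;
```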
Key Features
- Snowpark — write Python, Java, or Scala that runs inside Snowflake. Pandas-like DataFrames execute against the warehouse without data movement.
- Streams & Tasks — change-data-capture and lightweight orchestration native to the platform.
- Dynamic Tables — declarative materialized views that refresh automatically based on freshness targets.
- Snowflake Marketplace — curated data products from third-party providers, available without ETL.
- Secure Data Sharing — share live data with other Snowflake accounts (or via Reader Accounts) with no copying. Often the killer feature for B2B data exchange.
- Native Apps — package Snowpark code, UI (Streamlit), and data into a distributable application that runs in a customer's Snowflake account.
- Snowsight — modern web UI for SQL, Streamlit dashboards, and data exploration.
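To make Streams & Tasks and Dynamic Tables more concrete, here is a small sketch of both approaches. The tables (orders, orders_history), their columns, and the warehouse etl_wh are hypothetical and assumed to exist already.

```sql
-- Stream: records inserts, updates, and deletes on orders since last consumed.
CREATE STREAM orders_stream ON TABLE orders;

-- Task: every 5 minutes, copy new changes into a history table,
-- but only when the stream actually contains data.
CREATE TASK load_orders_task
  WAREHOUSE = etl_wh
  SCHEDULE  = '5 MINUTE'
  WHEN SYSTEM$STREAM_HAS_DATA('ORDERS_STREAM')
AS
  INSERT INTO orders_history (order_id, amount, change_type)
  SELECT order_id, amount, METADATA$ACTION
  FROM orders_stream;

-- Tasks are created suspended; resume to start the schedule.
ALTER TASK load_orders_task RESUME;

-- Dynamic Table: declare the result and a freshness target,
-- and let Snowflake schedule the refreshes.
CREATE DYNAMIC TABLE daily_revenue
  TARGET_LAG = '10 minutes'
  WAREHOUSE  = etl_wh
AS
  SELECT order_date, SUM(amount) AS revenue
  FROM orders
  GROUP BY order_date;
```

The stream and task give explicit control over each step, while the dynamic table trades that control for a declarative definition.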
Snowflake vs Traditional Warehouses
Compared to legacy MPP warehouses (Teradata, Netezza) and earlier cloud warehouses (Redshift original architecture):
- Elasticity — Snowflake scales compute in seconds; traditional warehouses require resizing or migration projects.
- Workload isolation — multiple warehouses on the same data instead of contention or copies.
- No tuning — no indexes, no statistics maintenance, no vacuum. Snowflake's optimizer and micro-partition pruning do the work automatically.
- SaaS operations — no patching, no backup management, no DBA infrastructure work.
- Pricing model — usage-based by the second instead of fixed-capacity license.
The trade-off is reduced control. You cannot pick storage formats, manually optimize physical layout (beyond clustering keys), or run workloads off-platform. For most organizations this is the right trade.
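Clustering keys are the main physical-layout control mentioned above. A minimal sketch, assuming a hypothetical orders table that is usually filtered by order_date:

```sql
-- Ask Snowflake to co-locate rows with similar order_date values
-- so that date filters can skip more micro-partitions.
ALTER TABLE orders CLUSTER BY (order_date);

-- Report how well the table is currently clustered on that column.
SELECT SYSTEM$CLUSTERING_INFORMATION('orders', '(order_date)');

-- A query whose filter matches the clustering key benefits from pruning.
SELECT COUNT(*)
FROM orders
WHERE order_date >= '2026-01-01';
```

Automatic reclustering consumes credits in the background, so clustering keys tend to pay off on large tables with stable filter patterns rather than on small or rarely queried ones.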
Pricing Model
Snowflake separates two cost streams:
- Storage — typically $23–$40 per TB per month depending on cloud, region, and whether it is on-demand or capacity storage.
- Compute — billed in credits. One credit equals one node-hour of an X-Small warehouse. On-demand credit prices start around $2 (Standard edition, AWS US) and rise to roughly $3 for Enterprise and $4 or higher for Business Critical and VPS editions. Each warehouse size doubles the consumption rate: a Medium uses 4 credits/hour, a Large 8, and so on.
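The doubling rule can be worked through as plain arithmetic in SQL. The $2.00 credit price below is an assumption for illustration; substitute the rate from your own contract.

```sql
-- Credits per hour double with each size step, starting at 1 for X-Small.
-- usd_per_hour assumes a hypothetical price of $2.00 per credit.
SELECT
    wh_size,
    credits_per_hour,
    credits_per_hour * 2.00              AS usd_per_hour,
    credits_per_hour * 2.00 * 24 * 30    AS usd_per_30_days_always_on
FROM (VALUES
        ('X-Small',    1),
        ('Small',      2),
        ('Medium',     4),
        ('Large',      8),
        ('X-Large',   16),
        ('2X-Large',  32),
        ('3X-Large',  64),
        ('4X-Large', 128)
     ) AS t (wh_size, credits_per_hour)
ORDER BY credits_per_hour;
```

The last column shows the worst case of a warehouse that never suspends; with auto-suspend the actual bill depends on how many seconds the warehouse is running.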
The combination of per-second billing and auto-suspend means a well-managed Snowflake account often costs far less than the on-paper credit rate suggests — but ungoverned access to large warehouses is one of the fastest ways to a surprise cloud bill in the industry. Resource monitors, query timeouts, and warehouse size limits are essential.
Snowflake cost surprises usually come from one of three places. First, oversized warehouses left on (a 4X-Large running 24/7 burns 128 credits/hour ≈ $256+/hour at $2 per credit). Second, query patterns that miss the result cache and scan large tables repeatedly. Third, unbounded compute for ad-hoc analyst access. All three are preventable with resource monitors and warehouse policies, but only if someone is watching.
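A resource monitor with a query timeout covers most of these cases. The following is a sketch with hypothetical values: the monitor name monthly_cap, the 500-credit quota, and the warehouse adhoc_wh are assumptions, and creating resource monitors requires the ACCOUNTADMIN role.

```sql
-- Cap monthly credit usage and react at increasing thresholds.
CREATE RESOURCE MONITOR monthly_cap
  WITH CREDIT_QUOTA = 500
  FREQUENCY       = MONTHLY
  START_TIMESTAMP = IMMEDIATELY
  TRIGGERS
    ON 80  PERCENT DO NOTIFY             -- warn administrators
    ON 100 PERCENT DO SUSPEND            -- let running queries finish, then stop
    ON 110 PERCENT DO SUSPEND_IMMEDIATE; -- cancel running queries and stop

-- Attach the monitor to the ad-hoc warehouse (assumed to exist).
ALTER WAREHOUSE adhoc_wh SET RESOURCE_MONITOR = monthly_cap;

-- Cancel any single statement that runs longer than 15 minutes.
ALTER WAREHOUSE adhoc_wh SET STATEMENT_TIMEOUT_IN_SECONDS = 900;
```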
Snowflake Cortex AI
Snowflake Cortex is Snowflake's AI suite, integrated directly into the SQL surface. Highlights:
- Cortex LLM Functions — call hosted LLMs (Mistral, Llama, Snowflake Arctic, others) via SQL: SELECT SNOWFLAKE.CORTEX.COMPLETE('llama3-70b', prompt) FROM ....
- Cortex Search — managed retrieval over text columns; powers RAG patterns inside Snowflake.
- Cortex Analyst — natural-language to SQL over a defined semantic model, intended for self-service analytics.
- Cortex Agents — orchestrated multi-step LLM workflows that combine tools, retrieval, and Cortex models.
- Snowflake Arctic — Snowflake's open foundation model family, optimized for enterprise SQL and code generation.
Cortex's distinguishing feature is the data-residency story: your data does not leave the Snowflake security boundary to be processed by a model. For customers in regulated industries this is a meaningful differentiator over patterns that route data to external LLM providers.
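As an illustration of calling a hosted model from SQL, the query below summarizes rows from a hypothetical support_tickets table (columns ticket_id and ticket_text are assumptions). Model availability varies by region, so 'llama3-70b' may need to be swapped for a model enabled in your account.

```sql
-- Build the prompt per row and pass it to the hosted model.
SELECT
    ticket_id,
    SNOWFLAKE.CORTEX.COMPLETE(
        'llama3-70b',
        'Summarize this support ticket in one sentence: ' || ticket_text
    ) AS summary
FROM support_tickets
LIMIT 10;   -- keep the sample small; each call consumes credits
```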
Common Use Cases
- Enterprise data warehousing — the original use case. SQL analytics over consolidated business data.
- Multi-cloud analytics — Snowflake accounts can replicate data across AWS, Azure, and GCP regions; consumers query the closest copy.
- Data sharing and B2B data exchange — Secure Data Sharing eliminates the need to FTP files or stand up extract APIs.
- SaaS application back-end — Native Apps and Snowpark let ISVs ship analytical applications inside customers' Snowflake accounts.
- Marketing analytics and customer 360 — high-cardinality joins across CRM, web analytics, and ad platforms.
- Real-time pipelines — Streams + Tasks + Dynamic Tables handle micro-batch CDC patterns; Snowpipe Streaming supports sub-second ingestion.
- RAG and AI applications — Cortex provides retrieval, models, and orchestration without data movement.
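For the data sharing use case above, the provider side can be sketched as follows. The share name, the database sales, and the account identifiers are hypothetical placeholders.

```sql
-- Provider: create a share and grant read access to specific objects.
CREATE SHARE sales_share;
GRANT USAGE  ON DATABASE sales               TO SHARE sales_share;
GRANT USAGE  ON SCHEMA   sales.public        TO SHARE sales_share;
GRANT SELECT ON TABLE    sales.public.orders TO SHARE sales_share;

-- Make the share visible to a consumer account (placeholder identifier).
ALTER SHARE sales_share ADD ACCOUNTS = consumer_org.consumer_account;

-- Consumer (run in the consumer account): mount the share as a read-only database.
-- CREATE DATABASE shared_sales FROM SHARE provider_org.provider_account.sales_share;
```

No data is copied in either step: the consumer queries the provider's live tables using its own warehouses and pays only for that compute.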
Snowflake's reach has expanded enormously since 2020. It is no longer "just" a warehouse — it is a data platform that competes with Databricks for ML workloads, with Microsoft Fabric for BI integration, and with cloud-native AI vendors for enterprise generative AI. The original architectural decision to separate compute and storage remains the foundation that makes all of it possible. Governance over what lives in Snowflake — ownership, classification, lineage, business meaning — is where products like Dawiso fit alongside Snowflake's native data dictionary.