Building a Modern SQL Data Warehouse with Medallion Architecture

SQL data warehouse project overview

Project Overview

This project implements a SQL-based data warehouse using a Medallion architecture (Bronze → Silver → Gold) to integrate CRM and ERP CSV files into a single analytics-ready environment. The warehouse supports customer and sales analysis, from raw data ingestion all the way to Tableau dashboards built on top of a star-schema sales data mart.

Objectives

- Integrate CRM and ERP data into a centralized warehouse.
- Apply Medallion design with clearly separated Bronze, Silver, and Gold layers.
- Design a star-schema sales data mart for self-service analytics.
- Build a customer-focused analytics view with value and recency metrics.
- Deliver dashboards that surface sales trends and customer segments.

Project Planning in Notion

I used Notion to break the project into tasks such as requirement gathering, data exploration, data modeling, SQL development, quality checks, and dashboard design. Kanban-style boards and checklists helped me track each layer (Bronze, Silver, Gold) and the dependencies between them.

Notion project planning for SQL data warehouse

Data Architecture and Flow

The warehouse ingests CSV files from CRM and ERP systems into a Bronze layer, standardizes and enriches them in the Silver layer, and exposes business-ready data in the Gold layer. The following diagrams show the high-level flow, the layer responsibilities, and the final star-schema design of the sales data mart.

Data Flow Diagram

This diagram illustrates how each source table from CRM and ERP flows into Bronze, is copied into Silver, and then contributes to Gold-layer objects such as fact_sales, dim_customers, and dim_products.

End-to-end data flow from sources to Bronze, Silver, and Gold layers

Warehouse Layers and Responsibilities

The second diagram summarizes how each layer is used: Bronze stores raw tables, Silver stores cleaned and standardized tables, and Gold exposes business-ready views built with star-schema and aggregated tables. It also highlights load strategies such as batch processing and full reloads.

Bronze, Silver, and Gold layer responsibilities

Sales Data Mart (Star Schema)

The sales data mart follows a classic star schema with one fact table and two core dimensions. gold.fact_sales contains one row per order line, linked to gold.dim_customers and gold.dim_products via surrogate keys. This structure makes it easy to slice metrics such as revenue and quantity by customer attributes (country, age, gender) and product attributes (category, subcategory, product line).

Star schema of sales data mart

Bronze Layer – Raw Ingestion

The Bronze layer stores raw copies of CRM and ERP CSV files. Tables mirror the structure of the source systems: CRM sales details, customer information, product details, and ERP customers, locations, and pricing categories. Only light technical cleaning is applied (type casting and obvious null handling), preserving full lineage for auditability.

Silver Layer – Cleaned & Conformed Data

In the Silver layer, I standardized customer identifiers, cleaned text fields, validated dates and numeric values, and ensured that all sales records link consistently to customer and product entities. This layer contains conformed tables such as unified customers, enriched products, and validated sales.

Gold Layer – Business-Ready Marts

The Gold layer contains the star-schema sales data mart and business-friendly views. Key objects include dim_customers, dim_products, dim_date, and fact_sales, as well as a customer analytics view that aggregates customer-level metrics used in dashboards.

Customer Analytics View

I created a dedicated view that aggregates customer performance and behavior using the Gold layer. For each customer, the view reports total orders, total sales, quantity purchased, number of distinct products, recency (days since last order), lifespan (days between first and last order), average order value, and average monthly spend. Customers are then grouped into segments such as VIP, Regular, or Occasional based on revenue thresholds, and can also be analyzed by age_group and geographic attributes.

SQL view for customer analytics

Sales Trends and Cumulative Analysis

To understand long-term performance, I aggregated sales by year and computed cumulative running totals and a moving average over time. This provides a smooth view of revenue growth and helps separate structural trends from short-term fluctuations.

Dashboards and Visualizations

I connected Tableau directly to the Gold schema to build interactive dashboards. The customer overview dashboard includes KPIs, segment distributions, age-group breakdowns, and a recency vs. revenue scatter plot. A separate view visualizes yearly sales with bars for annual totals and lines for cumulative revenue and moving-average trends.

Customer analytics Tableau dashboard

Data Quality and Governance

Throughout the pipeline, I implemented checks for missing or invalid dates, negative quantities or prices, and duplicate customer records. All Silver and Gold tables include load timestamps so data freshness can be monitored. Business rules for segmentation and filtering are documented alongside the SQL scripts, supporting transparency and reproducibility.

Project Impact and Applications

This project demonstrates an end-to-end data warehousing workflow: from planning in Notion and modeling with draw.io to implementing SQL transformations and building dashboards. It shows how structured layers and a star schema make it easier for business users to explore customer value, product performance, and sales trends without needing to understand the complexity of the raw source systems.

Deliverables

- GitHub repository with SQL scripts for Bronze, Silver, and Gold layers.
- Draw.io diagrams of the data flow, warehouse layers, and star schema exported as images and embedded on this page.
- Tableau dashboards for customer analytics and sales trends.
- This written report summarizing design choices, methodology, and key insights.