Generate Reports using DW

In the last section we designed and created a PostgreSQL DW, in this section we will create the reports we specifically designed the DW to be able to answer with ease.

Objectives


I’ll be working this project at times on the cloud and at times on the local machine. Here are the overall tasks we will perform

  • Load data into Data Warehouse
  • Write aggregation queries
  • Create MQTs

Setup


We will import an SQL file that contains the schema for all the tables, here is the content of the file

-- Create the table

---------------------------------------
CREATE TABLE public."DimDate"
(
    dateid integer NOT NULL,
    date date,
    Year integer,
    Quarter integer,
    QuarterName character(50),
    Month integer,
    Monthname character(50),
    Day integer,
    Weekday integer,
    WeekdayName character(50),
    CONSTRAINT "DimDate_pkey" PRIMARY KEY (dateid)
);

-------------------------------------------------------

CREATE TABLE public."DimCategory"
(
    categoryid integer NOT NULL,
    category character(50),
    CONSTRAINT "DimCategory_pkey" PRIMARY KEY (categoryid)
);

-------------------------------------------------------

CREATE TABLE public."DimCountry"
(
    countryid integer NOT NULL,
    country character(50),
    CONSTRAINT "DimCountry_pkey" PRIMARY KEY (countryid)
);

-----------------------------------------------------------

CREATE TABLE public."FactSales"
(
    orderid integer NOT NULL,
    dateid integer,
    countryid integer,
    categoryid integer,
    amount integer,
    CONSTRAINT "FactSales_pkey" PRIMARY KEY (orderid)
);

Create db

  1. Create PostgreSQL Instance
  2. Save passw:
  3. Open pgAdmin UI
  4. Right click Databases
  5. Create a new db: Test1

Import Schema

  • Now that we have a new db staging we’ll import the schema we just created above
  • Choose the db from left tree pane
  • Scroll down to Schemas
  • Right click > Query Tool
  • Click on Folder icon > right side the …
  • Make sure the directory is set to /var/lib/pgadmin > Upload
  • Drag the sql file over
  • X out the prompt
  • Select the file
  • Click on Select
  • Now the script is in the SQL query pane
  • Run
  • Tables are created

Import the sql file above to create the following tables

Import Tables Data


  1. Download all the csv locally
  2. Right click on DimDate table
  3. Import/Export Data
  4. Next window > Import
  5. Folder icon
  6. Click on … dots
  7. Make sure the directory is set to /var/lib/pgadmin/
  8. Drag all the data files at once into the window
  9. X out of the popup window
  10. You end up at the screen above
  11. Go from table to table and choose the appropriate csv file to populate all of them
  12. Make sure to check the HEADER option to yes so the headers are imported accordingly

Populate DimDate

  1. The data for the tables is located in file: DimDate.csv so import it locally
  2. Save the data file in /var/lib/pgadmin/
  3. Load the data into DimDate table
  4. View the first 5 rows
SELECT * FROM public."DimDate"
LIMIT 5

Populate DimCategory

  1. The data for the tables is located in file: DimCategory.csv so import it locally
  2. Save the data file in /var/lib/pgadmin/
  3. Load the data into DimCategory table
  4. View the first 5 rows
SELECT * FROM public."DimCategory"
LIMIT 5

Populate DimCountry

  1. The data for the tables is located in file: DimCountry.csv so import it locally
  2. Save the data file in /var/lib/pgadmin/
  3. Load the data into DimCountry table
  4. View the first 5 rows
SELECT * FROM public."DimCountry"
LIMIT 5

Populate FactSales

  1. The data for the tables is located in file: FactSales.csv so import it locally
  2. Save the data file in /var/lib/pgadmin/
  3. Load the data into FactSales table
  4. View the first 5 rows ( I Ordered them by ascending order as well, not that it would make a diff)
SELECT * FROM public."FactSales"
ORDER BY orderid ASC LIMIT 5

Query the Data


Create a Grouping Sets

  • Create a grouping sets query using the columns country, category, totalsales.
SELECT
        b.country,
        f.categoryid,
        SUM(f.amount) AS totalsales
FROM
        public."FactSales" f
INNER JOIN
        public."DimCountry" b ON f.countryid = b.countryid
GROUP BY GROUPING SETS (
        (b.country, f.categoryid),
        b.country,
        f.categoryid,
        ()
)
ORDER BY
        b.country,
    f.categoryid;

Since the request was to group by category not by categoryid we need to join another table to be able to extract the category itself from the id

SELECT
        b.country,
        c.category,
        SUM(f.amount) AS totalsales
FROM
        public."FactSales" f
INNER JOIN
        public."DimCountry" b ON f.countryid = b.countryid
INNER JOIN
        public."DimCategory" c ON f.categoryid = c.categoryid
GROUP BY GROUPING SETS (
        (b.country, f.categoryid, c.category),
        b.country,
        f.categoryid,
        ()
)
ORDER BY
        b.country,
    f.categoryid;

Another way by using category to group instead of categoryid

SELECT
        b.country,
        c.category,
        SUM(f.amount) AS totalsales
FROM
        public."FactSales" f
INNER JOIN
        public."DimCountry" b ON f.countryid = b.countryid
INNER JOIN
        public."DimCategory" c ON f.categoryid = c.categoryid
GROUP BY GROUPING SETS (
        (b.country,  c.category),
        b.country,
        c.category,
        ()
)
ORDER BY
        b.country,
        c.category;

Create a Rollup Query

  • Create a rollup query using the columns year, country, and totalsales.

Without Rollup

SELECT
        d.year,
        b.country,
        SUM(f.amount) AS totalsales
FROM
        public."FactSales" f
INNER JOIN
        public."DimCountry" b ON f.countryid = b.countryid
INNER JOIN
        public."DimDate" d ON f.dateid = d.dateid
GROUP BY GROUPING SETS (
        (d.year, b.country),
        d.year,
        b.country,
        ()
)
ORDER BY
        d.year,
        b.country;

With Rollup

SELECT
        d.year,
        b.country,
        SUM(f.amount) AS totalsales
FROM
        public."FactSales" f
INNER JOIN
        public."DimCountry" b ON f.countryid = b.countryid
INNER JOIN
        public."DimDate" d ON f.dateid = d.dateid
GROUP BY ROLLUP
        (d.year, b.country)
ORDER BY
        d.year,
        b.country;

Create a Cube Query

  • Create a cube query using the columns year, country, and average sales.
SELECT
        d.year,
        b.country,
        AVG(f.amount) AS AverageSales
FROM
        public."FactSales" f
INNER JOIN
        public."DimCountry" b ON f.countryid = b.countryid
INNER JOIN
        public."DimDate" d ON f.dateid = d.dateid
GROUP BY 
        CUBE (d.year, b.country)
ORDER BY
        d.year,
        b.country;

Create an Materialized Query Table

  • Create a MQT (materialized query View) named total_sales_per_country that has the columns country and total_sales.
  • Open a Query pane
CREATE MATERIALIZED VIEW total_sales_per_country AS
SELECT
        b.country,
        SUM(f.amount) AS total_sales
FROM
        public."FactSales" f
INNER JOIN
        public."DimCountry" b ON f.countryid = b.countryid

GROUP BY
        b.country
ORDER BY
        b.country
WITH DATA;
REFRESH MATERIALIZED VIEW total_sales_per_country;
SELECT * FROM total_sales_per_country;