Complete Project
Background
I’ll start by introducing you to the data platform architecture of an ecommerce company named SoftCart. SoftCart uses a hybrid architecture, with some of its databases on premises and some on cloud.
Tools and Technologies:
- OLTP database - MySQL
- NoSql database - MongoDB
- Production Data warehouse – DB2 on Cloud
- Staging Data warehouse – PostgreSQL
- Big data platform - Hadoop
- Big data analytics platform – Spark
- Business Intelligence Dashboard - IBM Cognos Analytics
- Data Pipelines - Apache Airflow
Process:
- SoftCart’s online presence is primarily through its website, which customers access using a variety of devices like laptops, mobiles and tablets.
- All the catalog data of the products is stored in the MongoDB NoSQL server.
- All the transactional data like inventory and sales are stored in the MySQL database server.
- SoftCart’s webserver is driven entirely by these two databases.
- Data is periodically extracted from these two databases and put into the staging data warehouse running on PostgreSQL.
- The production data warehouse is on the cloud instance of IBM DB2 server.
- BI teams connect to the IBM DB2 for operational dashboard creation. IBM Cognos Analytics is used to create dashboards.
- SoftCart uses Hadoop cluster as its big data platform where all the data is collected for analytics purposes.
- Spark is used to analyse the data on the Hadoop cluster.
- To move data between OLTP, NoSQL and the data warehouse, ETL pipelines are used and these run on Apache Airflow.
Task 1
OLTP database - MySQL
- design the schema for OLTP database.
- load data into OLTP database.
- automate admin tasks.
Task 2
NoSQL - Setup MongoDB
- import data into a MongoDB database.
- query data in a MongoDB database.
- export data from MongoDB.
Task 3
Create DW
We’ll start with these steps to create a DataWarehouse
- Design a Data Warehouse using the pgAdmin ERD design tool.
- Create the schema in the Data Warehouse
As well as being able to create reports like:
- total sales per year per country
- total sales per month per category
- total sales per quarter per country
- total sales per category per country
Task 4
Generate Reports
We’ll start with these steps to create a DataWarehouse
- Load data into Data Warehouse
- Write aggregation queries
- Create MQTs