Complete Project

Background


I’ll start by introducing you to the data platform architecture of an ecommerce company named SoftCart. SoftCart uses a hybrid architecture, with some of its databases on premises and some on cloud.

Tools and Technologies:

  • OLTP database - MySQL
  • NoSql database - MongoDB
  • Production Data warehouse – DB2 on Cloud
  • Staging Data warehouse – PostgreSQL
  • Big data platform - Hadoop
  • Big data analytics platform – Spark
  • Business Intelligence Dashboard - IBM Cognos Analytics
  • Data Pipelines - Apache Airflow

Process:

  • SoftCart’s online presence is primarily through its website, which customers access using a variety of devices like laptops, mobiles and tablets.
  • All the catalog data of the products is stored in the MongoDB NoSQL server.
  • All the transactional data like inventory and sales are stored in the MySQL database server.
  • SoftCart’s webserver is driven entirely by these two databases.
  • Data is periodically extracted from these two databases and put into the staging data warehouse running on PostgreSQL.
  • The production data warehouse is on the cloud instance of IBM DB2 server.
  • BI teams connect to the IBM DB2 for operational dashboard creation. IBM Cognos Analytics is used to create dashboards.
  • SoftCart uses Hadoop cluster as its big data platform where all the data is collected for analytics purposes.
  • Spark is used to analyse the data on the Hadoop cluster.
  • To move data between OLTP, NoSQL and the data warehouse, ETL pipelines are used and these run on Apache Airflow.

Task 1


OLTP database - MySQL

  • design the schema for OLTP database.
  • load data into OLTP database.
  • automate admin tasks.

Task 2


NoSQL - Setup MongoDB

  • import data into a MongoDB database.
  • query data in a MongoDB database.
  • export data from MongoDB.

Task 3


Create DW

We’ll start with these steps to create a DataWarehouse

  • Design a Data Warehouse using the pgAdmin ERD design tool.
  • Create the schema in the Data Warehouse

As well as being able to create reports like:

  • total sales per year per country
  • total sales per month per category
  • total sales per quarter per country
  • total sales per category per country

Task 4


Generate Reports

We’ll start with these steps to create a DataWarehouse

  • Load data into Data Warehouse
  • Write aggregation queries
  • Create MQTs