Apache Airflow

For more information regarding Apache Airflow (AA), please visit the AA section. As a reminder:

Apache Airflow is a platform to programmatically author, schedule and monitor workflows.

This section covers ETL pipeline projects set up in Apache Airflow.

Bash


This section makes use of Bash to set up automated ETL pipelines in Apache Airflow.

Import TXT Server Data for ETL in AA

We’ll Extract server data from an online data source, then Transform and Load the data into a local TXT file, using the BashOperator.
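A minimal sketch of what such a DAG might look like with Airflow 2.x's BashOperator; the URL, file paths, DAG id and schedule below are placeholder assumptions, and the shell commands stand in for the project's actual transform logic:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="server_data_txt_etl",     # hypothetical DAG id
    schedule_interval="@daily",
    start_date=datetime(2023, 1, 1),
    catchup=False,
) as dag:
    # Extract: download the raw server data (URL is a placeholder)
    extract = BashOperator(
        task_id="extract",
        bash_command="curl -s https://example.com/server-data.txt -o /tmp/raw.txt",
    )

    # Transform: e.g. keep the fields of interest and lower-case everything
    transform = BashOperator(
        task_id="transform",
        bash_command="cut -d'#' -f1,4 /tmp/raw.txt | tr '[:upper:]' '[:lower:]' > /tmp/transformed.txt",
    )

    # Load: append the transformed records to the target TXT file
    load = BashOperator(
        task_id="load",
        bash_command="cat /tmp/transformed.txt >> /tmp/server_data.txt",
    )

    extract >> transform >> load
```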

Import Multiple Formats TGZ Data ETL

This is a typical ETL pipeline in which data arrives from different sources in different formats. Here we set up a pipeline to Extract and Transform the data, and monitor the pipeline's runs.
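A sketch of the shape such a DAG could take, again with the BashOperator; the archive name, the per-format input files, the columns picked out and the DAG id are all placeholder assumptions:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="multi_format_tgz_etl",    # hypothetical DAG id
    schedule_interval="@daily",
    start_date=datetime(2023, 1, 1),
    catchup=False,
) as dag:
    # Extract: unpack the archive into a staging directory (paths are placeholders)
    unzip = BashOperator(
        task_id="unzip_data",
        bash_command="tar -xzf /tmp/source.tgz -C /tmp/staging",
    )

    # One extract task per input format
    extract_csv = BashOperator(
        task_id="extract_from_csv",
        bash_command="cut -d',' -f1-3 /tmp/staging/vehicle-data.csv > /tmp/staging/csv_data.csv",
    )
    extract_tsv = BashOperator(
        task_id="extract_from_tsv",
        bash_command="cut -f2,3 /tmp/staging/tollplaza-data.tsv | tr '\\t' ',' > /tmp/staging/tsv_data.csv",
    )

    # Transform: merge the per-format outputs into one consolidated file
    consolidate = BashOperator(
        task_id="consolidate",
        bash_command="paste -d',' /tmp/staging/csv_data.csv /tmp/staging/tsv_data.csv > /tmp/extracted.csv",
    )

    unzip >> [extract_csv, extract_tsv] >> consolidate
```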

Multiple Projects using BashOperator for ETL Pipelines

This is a collection of short ETL pipelines demonstrating the use of the BashOperator in Apache Airflow.

Python


This section makes use of Python to set up automated ETL pipelines in Apache Airflow.

Remake of Import TXT Server Data for ETL in Python

We’ll Extract server data from an online data source, then Transform and Load the data into a local TXT file, using the PythonOperator.
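A minimal sketch of the same pipeline with the PythonOperator, assuming the requests library is available in the Airflow environment; the URL, paths, transform logic and DAG id are placeholders:

```python
from datetime import datetime

import requests  # assumed available in the Airflow environment
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    # Download the raw server data (URL is a placeholder)
    resp = requests.get("https://example.com/server-data.txt", timeout=30)
    with open("/tmp/raw.txt", "w") as f:
        f.write(resp.text)

def transform():
    # Keep only non-empty lines, lower-cased (stand-in transform logic)
    with open("/tmp/raw.txt") as src, open("/tmp/transformed.txt", "w") as dst:
        for line in src:
            if line.strip():
                dst.write(line.lower())

def load():
    # Append the transformed records to the target TXT file
    with open("/tmp/transformed.txt") as src, open("/tmp/server_data.txt", "a") as dst:
        dst.write(src.read())

with DAG(
    dag_id="server_data_txt_etl_python",  # hypothetical DAG id
    schedule_interval="@daily",
    start_date=datetime(2023, 1, 1),
    catchup=False,
) as dag:
    t1 = PythonOperator(task_id="extract", python_callable=extract)
    t2 = PythonOperator(task_id="transform", python_callable=transform)
    t3 = PythonOperator(task_id="load", python_callable=load)

    t1 >> t2 >> t3
```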

Remake of Import Multiple Formats TGZ Data ETL

A Python remake of the multi-format TGZ pipeline: we’ll Extract data arriving in different file formats, then Transform and Load it, using the PythonOperator.
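A sketch of the Python version, assuming the archive is unpacked with the standard-library tarfile module and the inputs are comma- and tab-delimited; all paths, file names, column choices and the DAG id are placeholders:

```python
import csv
import tarfile
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def untar():
    # Extract: unpack the archive into a staging directory (paths are placeholders)
    with tarfile.open("/tmp/source.tgz") as tar:
        tar.extractall("/tmp/staging")

def consolidate():
    # Transform: read the comma- and tab-delimited inputs, write one merged CSV
    with open("/tmp/staging/vehicle-data.csv") as f:
        csv_rows = [row[:3] for row in csv.reader(f)]
    with open("/tmp/staging/tollplaza-data.tsv") as f:
        tsv_rows = [row[1:3] for row in csv.reader(f, delimiter="\t")]
    with open("/tmp/extracted.csv", "w", newline="") as f:
        writer = csv.writer(f)
        for a, b in zip(csv_rows, tsv_rows):
            writer.writerow(a + b)

with DAG(
    dag_id="multi_format_tgz_etl_python",  # hypothetical DAG id
    schedule_interval="@daily",
    start_date=datetime(2023, 1, 1),
    catchup=False,
) as dag:
    t1 = PythonOperator(task_id="untar", python_callable=untar)
    t2 = PythonOperator(task_id="consolidate", python_callable=consolidate)

    t1 >> t2
```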

Import Customer Data Transform and Load to CSV file

This ETL pipeline imports customer data, Transforms the information, then Loads the results into a CSV file that is handed off to the Data Analysts.
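A minimal sketch of such a pipeline using only the standard-library csv module; the input path, column names, cleanup rules and DAG id are placeholder assumptions standing in for the project's actual logic:

```python
import csv
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def etl_customers():
    # Extract: read the raw customer records (path and columns are placeholders)
    with open("/tmp/customers_raw.csv") as f:
        rows = list(csv.DictReader(f))

    # Transform: normalise names and drop rows without an email address
    cleaned = [
        {"name": r["name"].strip().title(), "email": r["email"].lower()}
        for r in rows
        if r.get("email")
    ]

    # Load: write the cleaned records to the CSV handed to the analysts
    with open("/tmp/customers_clean.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["name", "email"])
        writer.writeheader()
        writer.writerows(cleaned)

with DAG(
    dag_id="customer_data_etl",       # hypothetical DAG id
    schedule_interval="@weekly",
    start_date=datetime(2023, 1, 1),
    catchup=False,
) as dag:
    PythonOperator(task_id="etl_customers", python_callable=etl_customers)
```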