ETL - Howto

A collection of ETL processes

Functions


Movies Table - BS into DB

Extract an online movies table with requests.get, parse it with BeautifulSoup into a DataFrame, save the DataFrame to CSV, then create a SQLite3 database and load the data into it.

REQUESTS.GET

BeautifulSoup

FIND_ALL tables

FIND_ALL rows

LOOP rows into df

SAVE df to CSV

CREATE SQLITE3 DB

SAVE df to db
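The steps above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual function: the names `fetch_html`, `parse_movies`, and `load`, the `columns` argument, and the assumption that the movies are in the first table on the page are all hypothetical.

```python
import sqlite3

import pandas as pd
from bs4 import BeautifulSoup


def fetch_html(url):
    """Download a page with requests.get and return its HTML text."""
    import requests  # imported here so the parsing helpers also work offline

    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return response.text


def parse_movies(html, columns):
    """Loop over the rows of the first table and collect them in a DataFrame.

    Assumption: the first <table> holds the movies and its cells appear in
    the same order as `columns`.
    """
    soup = BeautifulSoup(html, "html.parser")
    tables = soup.find_all("table")
    records = []
    for row in tables[0].find_all("tr"):
        cells = row.find_all("td")
        if len(cells) < len(columns):
            continue  # header rows use <th>, so they have no <td> cells
        values = [cell.get_text(strip=True) for cell in cells]
        records.append(dict(zip(columns, values)))
    return pd.DataFrame(records, columns=columns)


def load(df, csv_path, db_path, table_name):
    """Save the DataFrame to CSV, then to a SQLite3 table."""
    df.to_csv(csv_path, index=False)
    conn = sqlite3.connect(db_path)  # creates the file if it does not exist
    try:
        df.to_sql(table_name, conn, if_exists="replace", index=False)
        conn.commit()
    finally:
        conn.close()
```

A run would chain the three calls: `load(parse_movies(fetch_html(url), columns), csv_path, db_path, table_name)`, with the URL, column names, and paths supplied by the caller.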

Scripts


This section holds complete scripts. Each file contains a full ETL that runs when the file is executed. The scripts are not automated yet; automation will be covered in another section.

Employee CSV - SQLite3 - SQL DB

Read a CSV file with open(), create a SQLite3 database, load the data into it, query and transform the data, then save the database.

WITH OPEN

Read and write the file

READ_CSV

create SQLite DB

TO_SQL to load data to DB

READ_SQL to query DB

save DB

CLOSE connection
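A minimal sketch of this flow, under stated assumptions: the function name `csv_to_sqlite` and its parameters are hypothetical, and the query is a simple row count standing in for whatever the real script asks of the employee data.

```python
import sqlite3

import pandas as pd


def csv_to_sqlite(csv_path, db_path, table_name):
    """Load a CSV file into a SQLite3 table and return the stored row count."""
    # Peek at the header line with open() before handing the file to pandas.
    with open(csv_path, "r", newline="") as handle:
        header = handle.readline().strip()
    print(f"Columns found: {header}")

    df = pd.read_csv(csv_path)

    conn = sqlite3.connect(db_path)  # creates the database file if missing
    try:
        # to_sql loads the DataFrame into the table, replacing any old copy.
        df.to_sql(table_name, conn, if_exists="replace", index=False)
        # read_sql runs a query and returns the result as a DataFrame.
        result = pd.read_sql(f"SELECT COUNT(*) AS n FROM {table_name}", conn)
        conn.commit()  # save the changes
    finally:
        conn.close()  # always close the connection
    return int(result.loc[0, "n"])
```

The `try`/`finally` block is one way to make sure the connection is closed even if the load or the query fails.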

Multiple Sources - json csv xml - Pandas - GLOB - Log Processes

Import data from multiple sources (JSON, CSV, XML). Collect the files with GLOB, extract and transform the data, load it to CSV, and log the entire ETL process.

WGET zip - shell

UNZIP - shell

REQUESTS.GET

RESPONSE.CONTENT

EXTRACTALL

CSV EXTRACT

JSON EXTRACT

XML EXTRACT

ElementTree

GLOB

Loop over the GLOB list to extract each file

Transform data

Load to CSV

LOG process
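The list above could be wired together along these lines. It is a sketch under assumptions, not the script itself: all function names are hypothetical, each JSON file is assumed to hold a list of records, each XML file is assumed to have one child element per record under the root, and the transform step (rounding numeric columns to two decimals) is only a placeholder for the real transformation.

```python
import glob
import io
import logging
import os
import xml.etree.ElementTree as ET
import zipfile

import pandas as pd


def download_and_unzip(url, dest_dir):
    """Fetch a zip archive with requests.get and unpack it with extractall."""
    import requests  # imported here so the rest of the sketch works offline

    response = requests.get(url, timeout=60)
    response.raise_for_status()
    with zipfile.ZipFile(io.BytesIO(response.content)) as archive:
        archive.extractall(dest_dir)


def extract_xml(path):
    """Read one XML file where each child of the root is a record."""
    root = ET.parse(path).getroot()
    rows = [{field.tag: field.text for field in record} for record in root]
    return pd.DataFrame(rows)


def extract(source_dir):
    """Loop over the glob list for each file type and stack the results."""
    readers = {"csv": pd.read_csv, "json": pd.read_json, "xml": extract_xml}
    frames = []
    for extension, reader in readers.items():
        pattern = os.path.join(source_dir, f"*.{extension}")
        for path in sorted(glob.glob(pattern)):
            frames.append(reader(path))
    return pd.concat(frames, ignore_index=True)


def transform(df, numeric_columns):
    """Coerce the named columns to numbers and round them to two decimals."""
    df = df.copy()
    for column in numeric_columns:
        df[column] = pd.to_numeric(df[column]).round(2)
    return df


def make_logger(log_path):
    """Return a logger that appends timestamped lines to log_path."""
    logger = logging.getLogger("etl_sketch")
    logger.setLevel(logging.INFO)
    if not logger.handlers:
        handler = logging.FileHandler(log_path)
        handler.setFormatter(logging.Formatter("%(asctime)s,%(message)s"))
        logger.addHandler(handler)
    return logger


def run_etl(source_dir, target_csv, log_path, numeric_columns):
    """Extract, transform, and load, logging the end of each phase."""
    log = make_logger(log_path)
    log.info("ETL job started")
    data = extract(source_dir)
    log.info("Extract phase ended")
    data = transform(data, numeric_columns)
    log.info("Transform phase ended")
    data.to_csv(target_csv, index=False)
    log.info("Load phase ended")
    return data
```

Writing the target CSV outside `source_dir` keeps it from being picked up by the `*.csv` glob on the next run.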

GDP - Pandas - SQLite3 - ETL w Log

Scrape an online GDP table with Pandas, save it to CSV and to a SQLite3 database, query the database, and log the ETL process.

SQLite3

READ_HTML

df.TO_CSV

df.TO_SQL

READ_SQL

LOG file
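As a rough illustration of these steps, the sketch below takes HTML text as input so it can run offline. The names `log_progress`, `extract_table`, and `run_etl`, the log line format, and the choice of the first table on the page are assumptions. Note that `pandas.read_html` needs an HTML parser such as lxml installed.

```python
import io
import sqlite3
from datetime import datetime

import pandas as pd


def log_progress(message, log_path):
    """Append one timestamped line to the log file."""
    timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    with open(log_path, "a") as log_file:
        log_file.write(f"{timestamp} : {message}\n")


def extract_table(html, table_index=0):
    """read_html returns one DataFrame per <table>; pick one by position."""
    return pd.read_html(io.StringIO(html))[table_index]


def run_etl(html, csv_path, db_path, table_name, query, log_path):
    """Scrape the table, save it to CSV and SQLite3, then run a query."""
    log_progress("Extract started", log_path)
    df = extract_table(html)

    log_progress("Extract finished, saving CSV", log_path)
    df.to_csv(csv_path, index=False)

    conn = sqlite3.connect(db_path)
    try:
        df.to_sql(table_name, conn, if_exists="replace", index=False)
        log_progress("Table loaded, running query", log_path)
        result = pd.read_sql(query, conn)
    finally:
        conn.close()

    log_progress("ETL finished", log_path)
    return result
```

For a live page, `read_html` also accepts a URL directly, so the HTML argument could be replaced by the address of the GDP table.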

GDP - BS - SQLite3 - ETL w Log

Scrape an online GDP table, parse it with BeautifulSoup, save it to CSV and to a SQLite3 database, query the database, and log the ETL process.
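One possible shape for this variant, using BeautifulSoup with the standard `csv` and `sqlite3` modules instead of Pandas. Everything here is an assumption made for illustration: the function names, the table name `gdp`, and a page layout where the first table has the country in column one and the GDP figure in column two.

```python
import csv
import sqlite3
from datetime import datetime

from bs4 import BeautifulSoup


def log_progress(message, log_path):
    """Append one timestamped line to the log file."""
    timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    with open(log_path, "a") as log_file:
        log_file.write(f"{timestamp} : {message}\n")


def parse_gdp(html):
    """Return (country, gdp) tuples from the first table in the page."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        return []
    rows = []
    for tr in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if len(cells) < 2:
            continue  # header rows use <th>, so they have no <td> cells
        try:
            gdp = float(cells[1].replace(",", ""))
        except ValueError:
            continue  # skip rows whose value is a placeholder, not a number
        rows.append((cells[0], gdp))
    return rows


def run_etl(html, csv_path, db_path, log_path):
    """Parse, save to CSV and SQLite3, then query the largest economy."""
    log_progress("Parsing started", log_path)
    rows = parse_gdp(html)

    with open(csv_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["country", "gdp"])
        writer.writerows(rows)
    log_progress("CSV saved", log_path)

    conn = sqlite3.connect(db_path)
    try:
        conn.execute("DROP TABLE IF EXISTS gdp")
        conn.execute("CREATE TABLE gdp (country TEXT, gdp REAL)")
        conn.executemany("INSERT INTO gdp VALUES (?, ?)", rows)
        conn.commit()
        top = conn.execute(
            "SELECT country, gdp FROM gdp ORDER BY gdp DESC LIMIT 1"
        ).fetchone()
    finally:
        conn.close()

    log_progress("ETL finished", log_path)
    return top
```

Parsing by hand gives control over messy cells, such as thousands separators or placeholder values, at the cost of more code than the Pandas version.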