wget https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMSkillsNetwork-PY0221EN-Coursera/labs/v2/INSTRUCTOR.csv
# now the file is stored locally
python3.11 -m pip install pandas
Import CSV - Create db
Objective
- Create a database using Python
- Load the data from a CSV file as a table to the database
- Run basic “queries” on the database to access the information
Scenario
- Consider a dataset of employee records that is available with an HR team in a CSV file.
- You are required to create the database called STAFF and load the contents of the CSV file as a table called INSTRUCTORS.
- The headers of the available data are:
| Header | Description |
|---|---|
| ID | Employee ID |
| FNAME | First Name |
| LNAME | Last Name |
| CITY | City of residence |
| CCODE | Country code (2 letters) |
Import Data
Using the shell
- From cmd
- wget the csv file
- install pandas
- Import libraries
import pandas as pd
import sqlite3
- Data is found here:
url = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMSkillsNetwork-PY0221EN-Coursera/labs/v2/INSTRUCTOR.csv"
destination_file = "D:/Data/INSTRUCTOR.csv"
- Skip this part if you already used the shell to import the data file
- There are many ways to read the csv file with pandas:
df = pd.read_csv(url)
- But I will download the file with requests and use with open() to copy it locally, as shown below
library(reticulate)  # R setup for running the Python chunks below
import requests
import sqlite3
import pandas as pd
With open
# open() cannot read an https URL directly, so download it with requests first
with requests.get(url) as response:
    with open(destination_file, 'w') as local_file:
        local_file.write(response.text)
Create SQLite3 db
To create a table in the database, you first need the attributes of the required table. Attributes are the columns of the table. Along with their names, knowledge of their data types is also required.
- We first open the connection conn to the database
- Name the table
- Assign the column names
conn = sqlite3.connect('D:/data/STAFF.db')
file_path = 'D:/data/INSTRUCTOR.csv'
table_name = 'INSTRUCTOR'
attribute_list = ['ID', 'FNAME', 'LNAME', 'CITY', 'CCODE']
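For reference only (not part of the lab code), the same attributes could also be declared explicitly with a CREATE TABLE statement; the column types below are my assumption (INTEGER for ID, TEXT for the rest), since to_sql() will otherwise infer them:

# Hedged sketch: explicit table creation with assumed column types.
# Reuses conn from above; the lab itself lets to_sql() create the table.
cursor = conn.cursor()
cursor.execute("""
    CREATE TABLE IF NOT EXISTS INSTRUCTOR (
        ID    INTEGER,
        FNAME TEXT,
        LNAME TEXT,
        CITY  TEXT,
        CCODE TEXT
    )
""")
conn.commit()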
Read csv File
- Now, to read the CSV using Pandas, you use the read_csv() function.
- Since this CSV does not contain headers, use attribute_list to assign column names to the data with the names= argument.
df = pd.read_csv(file_path, names = attribute_list)
Load Data to Table
The pandas library provides easy loading of its dataframes directly to the database. For this, you may use the to_sql() method of the dataframe object.
- A table with the same name may already exist in the database. If so, and it isn't required anymore, it should be replaced with the one you are loading here.
- You may also need to append some information to an existing table. For this purpose, to_sql() uses the argument if_exists. The possible usage of if_exists is tabulated below.
| Argument usage | Description |
|---|---|
| if_exists = 'fail' | Default. The command doesn’t work if a table with the same name exists in the database. |
| if_exists = 'replace' | The command replaces the existing table in the database with the same name. |
| if_exists = 'append' | The command appends the new data to the existing table with the same name. |
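As a hedged illustration of the default behaviour (not part of the lab code), trying to load into a table that already exists with if_exists = 'fail' raises a ValueError, which you could catch like this:

# Hedged sketch: assumes df, conn and table_name from the code above,
# and that the INSTRUCTOR table already exists in the database.
try:
    df.to_sql(table_name, conn, if_exists='fail', index=False)
except ValueError as err:
    print(err)  # pandas reports that the table already exists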
- As you need to create a fresh table upon execution, add the following commands to the code.
- The print command is optional, but it helps confirm that the code has run successfully up to this point.
df.to_sql(table_name, conn, if_exists = 'replace', index = False)
14
print('Table is ready')
Table is ready
Run Queries
Now that the data is uploaded to the table in the database, anyone with access to the database can retrieve this data by executing SQL queries.
- Some basic SQL queries to test this data are SELECT queries for viewing data and a COUNT query to count the number of entries.
- SQL queries can be executed on the data using the read_sql function in pandas.
- Now, run the following tasks for data retrieval on the created database.
View Table
= f"SELECT * FROM {table_name}"
query_statement = pd.read_sql(query_statement, conn)
query_output print(query_statement)
SELECT * FROM INSTRUCTOR
print(query_output)
ID FNAME LNAME CITY CCODE
0 1 Rav Ahuja TORONTO CA
1 2 Raul Chong Markham CA
2 3 Hima Vasudevan Chicago US
3 4 John Thomas Illinois US
4 5 Alice James Illinois US
5 6 Steve Wells Illinois US
6 7 Santosh Kumar Illinois US
7 8 Ahmed Hussain Illinois US
8 9 Nancy Allen Illinois US
9 10 Mary Thomas Illinois US
10 11 Bharath Gupta Illinois US
11 12 Andrea Jones Illinois US
12 13 Ann Jacob Illinois US
13 14 Amit Kumar NewDelhi IN
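Any other SELECT follows the same read_sql pattern. For example (a hedged sketch, reusing table_name and conn from above and the CA country code visible in the output), filtering by country:

# Hedged sketch: view only the Canadian instructors
query_statement = f"SELECT FNAME, LNAME FROM {table_name} WHERE CCODE = 'CA'"
print(pd.read_sql(query_statement, conn))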
View 1 column
= f"SELECT FNAME FROM {table_name}"
query_statement = pd.read_sql(query_statement, conn)
query_output print(query_statement)
SELECT FNAME FROM INSTRUCTOR
print(query_output)
FNAME
0 Rav
1 Raul
2 Hima
3 John
4 Alice
5 Steve
6 Santosh
7 Ahmed
8 Nancy
9 Mary
10 Bharath
11 Andrea
12 Ann
13 Amit
Count Rows
= f"SELECT COUNT(*) FROM {table_name}"
query_statement = pd.read_sql(query_statement, conn)
query_output print(query_statement)
SELECT COUNT(*) FROM INSTRUCTOR
print(query_output)
COUNT(*)
0 14
Append to Table
Now try appending some data to the table. Consider the following.
a. Assume the ID is 100.
b. Assume the first name, FNAME, is John.
c. Assume the last name, LNAME, is Doe.
d. Assume the city of residence, CITY, is Paris.
e. Assume the country code, CCODE, is FR.
Create a new df to append
data_dict = {'ID' : [100],
             'FNAME' : ['John'],
             'LNAME' : ['Doe'],
             'CITY' : ['Paris'],
             'CCODE' : ['FR']}
data_append = pd.DataFrame(data_dict)
Append df to db
- Now use the following statement to append the data to the INSTRUCTOR table.
data_append.to_sql(table_name, conn, if_exists = 'append', index = False)
1
print('Data appended successfully')
Data appended successfully
Count Again
- To verify the new information, recount the rows
= f"SELECT COUNT(*) FROM {table_name}"
query_statement = pd.read_sql(query_statement, conn)
query_output print(query_statement)
SELECT COUNT(*) FROM INSTRUCTOR
print(query_output)
COUNT(*)
0 15
Close Connection
conn.close()
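As an optional alternative sketch (my suggestion, not part of the lab), contextlib.closing makes sure the connection is closed even if a query raises an error:

import sqlite3
import pandas as pd
from contextlib import closing

# Hedged sketch: the database path and table name match the ones used above
with closing(sqlite3.connect('D:/data/STAFF.db')) as conn:
    print(pd.read_sql("SELECT COUNT(*) FROM INSTRUCTOR", conn))
# the connection is closed automatically here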
Run in Shell
- If you are using the shell to run the Python file, go to the directory where the script file is located and run:
python3.11 db_code.py
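For reference, a minimal sketch of what a db_code.py script could look like, assembled from the steps above (the D:/data/ paths are the ones assumed throughout these notes):

# db_code.py -- hedged sketch assembled from the steps above
import sqlite3
import pandas as pd

file_path = 'D:/data/INSTRUCTOR.csv'   # CSV downloaded earlier with wget
table_name = 'INSTRUCTOR'
attribute_list = ['ID', 'FNAME', 'LNAME', 'CITY', 'CCODE']

conn = sqlite3.connect('D:/data/STAFF.db')

# Read the headerless CSV and load it into the INSTRUCTOR table
df = pd.read_csv(file_path, names=attribute_list)
df.to_sql(table_name, conn, if_exists='replace', index=False)
print('Table is ready')

# Quick sanity check
print(pd.read_sql(f"SELECT COUNT(*) FROM {table_name}", conn))

conn.close()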