wget https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMSkillsNetwork-PY0221EN-Coursera/labs/v2/INSTRUCTOR.csv
# now the file is stored locally
python3.11 -m pip install pandas
Import CSV - Create db
Objective
- Create a database using Python
- Load the data from a CSV file as a table to the database
- Run basic “queries” on the database to access the information
Scenario
- Consider a dataset of employee records that is available with an HR team in a CSV file.
- You are required to create the database called STAFF and load the contents of the CSV file as a table called INSTRUCTORS.
- The headers of the available data are:
| Header | Description |
|---|---|
| ID | Employee ID |
| FNAME | First Name |
| LNAME | Last Name |
| CITY | City of residence |
| CCODE | Country code (2 letters) |
Import Data
Using the shell
- From cmd
- wget the csv file
- install pandas
- Import libraries
import pandas as pd
import sqlite3
- Data is found here:
url = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMSkillsNetwork-PY0221EN-Coursera/labs/v2/INSTRUCTOR.csv"
destination_file = "D:/Data/INSTRUCTOR.csv"
- Skip this part if you already used the shell to import the data file
- There are many ways to read the csv file with pandas:
df = pd.read_csv(url)
- But I will download the file with requests and use with open() to copy it locally, as shown below
library(reticulate)  # R setup for running the Python chunks below
import requests
import sqlite3
import pandas as pd
With open
# open() cannot read an https URL directly, so download it with requests first
with requests.get(url) as response:
    with open(destination_file, 'w') as local_file:
        local_file.write(response.text)
Create SQLite3 db
To create a table in the database, you first need the attributes of the required table. Attributes are the columns of the table. Along with their names, knowledge of their data types is also required.
- We first open the connection conn to the database
- Name the table
- Assign the column names
conn = sqlite3.connect('D:/data/STAFF.db')
file_path = 'D:/data/INSTRUCTOR.csv'
table_name = 'INSTRUCTOR'
attribute_list = ['ID', 'FNAME', 'LNAME', 'CITY', 'CCODE']
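For reference only (not part of the lab code), the same attributes could also be declared explicitly with a CREATE TABLE statement; the column types below are my assumption (INTEGER for ID, TEXT for the rest), since to_sql() will otherwise infer them:

# Hedged sketch: explicit table creation with assumed column types.
# Reuses conn from above; the lab itself lets to_sql() create the table.
cursor = conn.cursor()
cursor.execute("""
    CREATE TABLE IF NOT EXISTS INSTRUCTOR (
        ID    INTEGER,
        FNAME TEXT,
        LNAME TEXT,
        CITY  TEXT,
        CCODE TEXT
    )
""")
conn.commit()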
Read csv File
- Now, to read the CSV using Pandas, you use the read_csv() function.
- Since this CSV does not contain headers, use attribute_list to assign column names to the data with the names= argument.
df = pd.read_csv(file_path, names = attribute_list)
Load Data to Table
The pandas library provides easy loading of its dataframes directly to the database. For this, you may use the to_sql() method of the dataframe object.
- A table with the same name may already exist in the database. If so, and it isn't required anymore, it should be replaced with the one you are loading here.
- You may also need to append some information to an existing table. For this purpose, to_sql() uses the argument if_exists. The possible usage of if_exists is tabulated below.
| Argument usage | Description |
|---|---|
| if_exists = 'fail' | Default. The command doesn’t work if a table with the same name exists in the database. |
| if_exists = 'replace' | The command replaces the existing table in the database with the same name. |
| if_exists = 'append' | The command appends the new data to the existing table with the same name. |
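As a hedged illustration of the default behaviour (not part of the lab code), trying to load into a table that already exists with if_exists = 'fail' raises a ValueError, which you could catch like this:

# Hedged sketch: assumes df, conn and table_name from the code above,
# and that the INSTRUCTOR table already exists in the database.
try:
    df.to_sql(table_name, conn, if_exists='fail', index=False)
except ValueError as err:
    print(err)  # pandas reports that the table already exists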
- As you need to create a fresh table upon execution, add the following commands to the code.
- The print command is optional, but it helps confirm that the code has run successfully up to this point.
df.to_sql(table_name, conn, if_exists = 'replace', index = False)
14
print('Table is ready')
Table is ready
Run Queries
Now that the data is uploaded to the table in the database, anyone with access to the database can retrieve this data by executing SQL queries.
- Some basic SQL queries to test this data are SELECT queries for viewing data and a COUNT query to count the number of entries.
- SQL queries can be executed on the data using the read_sql function in pandas.
- Now, run the following tasks for data retrieval on the created database.
View Table
= f"SELECT * FROM {table_name}"
query_statement = pd.read_sql(query_statement, conn)
query_output print(query_statement)
SELECT * FROM INSTRUCTOR
print(query_output)
ID FNAME LNAME CITY CCODE
0 1 Rav Ahuja TORONTO CA
1 2 Raul Chong Markham CA
2 3 Hima Vasudevan Chicago US
3 4 John Thomas Illinois US
4 5 Alice James Illinois US
5 6 Steve Wells Illinois US
6 7 Santosh Kumar Illinois US
7 8 Ahmed Hussain Illinois US
8 9 Nancy Allen Illinois US
9 10 Mary Thomas Illinois US
10 11 Bharath Gupta Illinois US
11 12 Andrea Jones Illinois US
12 13 Ann Jacob Illinois US
13 14 Amit Kumar NewDelhi IN
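Any other SELECT follows the same read_sql pattern. For example (a hedged sketch, reusing table_name and conn from above and the CA country code visible in the output), filtering by country:

# Hedged sketch: view only the Canadian instructors
query_statement = f"SELECT FNAME, LNAME FROM {table_name} WHERE CCODE = 'CA'"
print(pd.read_sql(query_statement, conn))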
View 1 column
= f"SELECT FNAME FROM {table_name}"
query_statement = pd.read_sql(query_statement, conn)
query_output print(query_statement)
SELECT FNAME FROM INSTRUCTOR
print(query_output)
FNAME
0 Rav
1 Raul
2 Hima
3 John
4 Alice
5 Steve
6 Santosh
7 Ahmed
8 Nancy
9 Mary
10 Bharath
11 Andrea
12 Ann
13 Amit
Count Rows
= f"SELECT COUNT(*) FROM {table_name}"
query_statement = pd.read_sql(query_statement, conn)
query_output print(query_statement)
SELECT COUNT(*) FROM INSTRUCTOR
print(query_output)
COUNT(*)
0 14
Append to Table
Now try appending some data to the table. Consider the following.
a. Assume the ID is 100.
b. Assume the first name, FNAME, is John.
c. Assume the last name, LNAME, is Doe.
d. Assume the city of residence, CITY, is Paris.
e. Assume the country code, CCODE, is FR.
Create a new df to append
data_dict = {'ID' : [100],
             'FNAME' : ['John'],
             'LNAME' : ['Doe'],
             'CITY' : ['Paris'],
             'CCODE' : ['FR']}
data_append = pd.DataFrame(data_dict)
Append df to db
- Now use the following statement to append the data to the INSTRUCTOR table.
data_append.to_sql(table_name, conn, if_exists = 'append', index = False)
1
print('Data appended successfully')
Data appended successfully
Count Again
- To verify the new information, recount the rows
= f"SELECT COUNT(*) FROM {table_name}"
query_statement = pd.read_sql(query_statement, conn)
query_output print(query_statement)
SELECT COUNT(*) FROM INSTRUCTOR
print(query_output)
COUNT(*)
0 15
Close Connection
conn.close()
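As an optional alternative sketch (my suggestion, not part of the lab), contextlib.closing makes sure the connection is closed even if a query raises an error:

import sqlite3
import pandas as pd
from contextlib import closing

# Hedged sketch: the database path and table name match the ones used above
with closing(sqlite3.connect('D:/data/STAFF.db')) as conn:
    print(pd.read_sql("SELECT COUNT(*) FROM INSTRUCTOR", conn))
# the connection is closed automatically here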
Run in Shell
- If you are using the shell to run the Python file, go to the directory where the script file is located and run:
python3.11 db_code.py
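For reference, a minimal sketch of what a db_code.py script could look like, assembled from the steps above (the D:/data/ paths are the ones assumed throughout these notes):

# db_code.py -- hedged sketch assembled from the steps above
import sqlite3
import pandas as pd

file_path = 'D:/data/INSTRUCTOR.csv'   # CSV downloaded earlier with wget
table_name = 'INSTRUCTOR'
attribute_list = ['ID', 'FNAME', 'LNAME', 'CITY', 'CCODE']

conn = sqlite3.connect('D:/data/STAFF.db')

# Read the headerless CSV and load it into the INSTRUCTOR table
df = pd.read_csv(file_path, names=attribute_list)
df.to_sql(table_name, conn, if_exists='replace', index=False)
print('Table is ready')

# Quick sanity check
print(pd.read_sql(f"SELECT COUNT(*) FROM {table_name}", conn))

conn.close()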