-d":" -f1,3,6 /etc/passwd cut
Account Info File to PostgreSQL
In this project we’ll
- Extract data from a delimited file containing: password, userid …
- Transform text data.
- Load data into a PostgreSQL
Extract
- Extract the fields
- field 1 (user name)
- field 2 (user id)
- field 6 (home directory path) from /etc/passwd
Transform
- Transform : to ,
- save data in transformed-data.csv
-data.txt | tr ":" "," > transformed-data.csv extracted
Start PostgreSQL
- From the SkillsNetwork tools, under Databases choose
PostgresSQL
Database server and clickStart
to start the server. This will take a few mins. - Click
PostgresSQL CLI
on the screen to start interacting wit the PostgresSQL server. - This will start the interactive
psql
client which connects to the PostgreSQL server withpostgres=#
prompt as shown below.
@theiadocker-emhrcf:/home/project$ export PGPASSWORD=M8fYujLI4OSDHqtXUU2VVJ4N;
theia--host 172.21.64.88 -p 5432 -U postgres
psql psql (14.12 (Ubuntu 14.12-0ubuntu0.22.04.1), server 13.2)
"help" for help.
Type
=# postgres
Create Table
We’ll create a table called users
in the PostgreSQL database using PostgresSQL CLI. This table will hold the user account information.
The table users
will have the following columns:
- username
- uid
- home
- You will connect to
template1
database which is already available by default. To connect to this database, run the following command at the ‘postgres=#’ prompt.
\c template1
- You will get the following message.
"template1" as user "postgres". You are now connected to database
- Also, your prompt will change to ‘template1=#’.
- Run the following statement at the ‘template1=#’ prompt to create the table.
users(username varchar(50),userid int,homedirectory varchar(100)); create table
- If the table is created successfully, you will get the message below.
CREATE TABLE
Load
Data into Table
- Open a new terminal
- In the terminal, run the following command to create a new shell script named
csv2db.sh
.
touch csv2db.sh
- Open the file in the editor.
- Copy and paste the following lines into the newly created file.
# Extracts data from /etc/passwd file into a CSV file.
# The csv data file contains the user name, user id and
# home directory of each user account defined in /etc/passwd
# Transforms the text delimiter from ":" to ",".
# Loads the data from the CSV file into a table in PostgreSQL database.
Save the file by presseing
Ctrl+s
or by using the File->Save menu option.You need to add lines of code to the script that will xtract user name (field 1), user id (field 3), and home directory path (field 6) from /etc/passwd file using the
cut
command.Copy the following lines and paste them to the end of the script and save the file.
# Extract phase
"Extracting data"
echo
# Extract the columns 1 (user name), 2 (user id) and
# 6 (home directory path) from /etc/passwd
-d":" -f1,3,6 /etc/passwd cut
- Run the script
bash csv2db.sh
Verify that the output contains the three fields, that you extracted.
Change the script to redirect the extracted data into a file named
extracted-data.txt
Replace the cut command at end of the script with the following command.
-d":" -f1,3,6 /etc/passwd > extracted-data.txt cut
- Run the script
bash csv2db.sh
- Run this command to verify the file
extracted-data.txt
is created, and has the content
-data.txt cat extracted
- The output shows the extracted columns are separated by the original “:” delimiter. You need to convert this into a “,” delimited file.
- Add the below lines at the end of the script and save the file.
# Transform phase
"Transforming data"
echo # read the extracted data and replace the colons with commas.
":" "," < extracted-data.txt > transformed-data.csv tr
- Run the script
` bash csv2db.sh
- Run the command below to verify that the file
transformed-data.csv
is created, and has the content.
-data.csv cat transformed
- To load data from a shell script, you will use the
psql
client utility in a non-interactive manner. This is done by sending the database commands through a command pipeline topsql
with the help ofecho
command.
PostgreSQL command to copy data from a CSV file to a table is COPY
.
The basic structure of the command which we will use in our script is,
'filename' DELIMITERS 'delimiter_character' FORMAT; COPY table_name FROM
Now, add the lines below to the end of the script ‘csv2db.sh’ and save the file.
# Load phase
"Loading data"
echo # Set the PostgreSQL password environment variable.
# Replace <yourpassword> with your actual PostgreSQL password. export PGPASSWORD=<yourpassword>;
# Send the instructions to connect to 'template1' and
# copy the file to the table 'users' through command pipeline.
"\c template1;\COPY users FROM '/home/project/transformed-data.csv' DELIMITERS ',' CSV;" | psql --username=postgres --host=postgres echo
- Run the script
bash csv2db.sh
- Now, add the line below to the end of the script ‘csv2db.sh’ and save the file.
"SELECT * FROM users;" | psql --username=postgres --host=postgres template1 echo
Run the script
bash csv2db.sh
Entire script file
# Extracts data from /etc/passwd file into a CSV file.
# The csv data file contains the user name, user id and
# home directory of each user account defined in /etc/passwd
# Transforms the text delimiter from ":" to ",".
# Loads the data from the CSV file into a table in PostgreSQL database.
# Extract phase
"Extracting data"
echo
# Extract the columns 1 (user name), 2 (user id) and
# 6 (home directory path) from /etc/passwd
-d":" -f1,3,6 /etc/passwd
cut # Transform phase
"Transforming data"
echo # read the extracted data and replace the colons with commas.
":" "," < extracted-data.txt > transformed-data.csv
tr # Load phase
"Loading data"
echo # Set the PostgreSQL password environment variable.
# Replace <yourpassword> with your actual PostgreSQL password.
=<yourpassword>;
export PGPASSWORD# Send the instructions to connect to 'template1' and
# copy the file to the table 'users' through command pipeline.
"\c template1;\COPY users FROM '/home/project/transformed-data.csv' DELIMITERS ',' CSV;" | psql --username=postgres --host=postgres
echo "SELECT * FROM users;" | psql --username=postgres --host=postgres template1 echo