CSV files are frequently used to store tabular data, with each row on its own line and the columns separated by commas. Because CSV files are so widely used, users of Linux and other Unix-based operating systems benefit from knowing how to read them from the command line. In this article, I demonstrate the most common cases you may face when reading a CSV file with a Bash script.
Key Takeaways
- Read a CSV file line by line using a Bash script.
- Map the data of a CSV file into an array.
- Read any number of columns, or any category of data, separately using a Bash script.
- Detect missing data in a CSV file.
What Is a CSV File?
CSV, as its name suggests, stands for comma-separated values. A CSV file is a plain text file in which tabular data are arranged in fields, and each field is separated from its neighbour by a comma (,). Every line represents a row of data, and commas separate the values within each row. Usually, the first row contains the column headers, which describe the data in each column.
2 Methods to Read a CSV File Line by Line Using Bash Script
You can read CSV files in two ways: with a plain while read loop that processes every line, or with a loop that also sets IFS (the Internal Field Separator), the variable that controls how read splits and trims each line. In both methods, the lines are stored in an array.
I am going to use the CSV file named employee_data.csv given below to demonstrate both of the methods.
John Doe,12345,Manager
Jane Smith,67890,Assistant
Michael Johnson,24680,Developer
Sarah Thompson,13579,Designer
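To follow along, you can create employee_data.csv directly from the terminal with a here-document. Quoting the 'EOF' delimiter stops the shell from expanding anything inside the text. This is one convenient way to create the file, not the only one.

```bash
#!/bin/bash
# Create the sample file used in Methods 01 and 02 (four rows, no header).
# The quoted 'EOF' delimiter means the lines are written exactly as typed.
cat > employee_data.csv <<'EOF'
John Doe,12345,Manager
Jane Smith,67890,Assistant
Michael Johnson,24680,Developer
Sarah Thompson,13579,Designer
EOF
```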
You can read our Comparative Analysis of Methods to distinguish between these two methods and pick the best one for your needs.
Method 01: Read CSV File With a Loop
This method is relatively simple. A while loop reads the CSV file one line at a time and stores each line in an array.
➊ First, launch an Ubuntu Terminal. You can use the shortcut keys CTRL+ALT+T to do so.
➋ Then create a script file with the .sh extension and open it in the nano text editor:
nano read_data.sh
➌ After that, copy the following script into your text editor. The script uses the read command to store each line of the file in a variable.
#!/bin/bash
#creating a variable that will contain the CSV file
FILE="employee_data.csv"
ARRAY=() #creating an empty array
#Starting while loop to read through a CSV file line to line
while read -r line; do
ARRAY+=("$line") #append a line to the array
done < "$FILE" #End of the loop and redirecting the file
#For loop to print out the array
for line in "${ARRAY[@]}"; do
echo "$line" #print out the line
done
The code starts with #!/bin/bash, which specifies the Bash interpreter. It then declares a variable named FILE and assigns it the value employee_data.csv. After that, it creates an empty array to store the lines read from the CSV file. The while loop then reads the file one line at a time until the end of the file and appends each line to ARRAY. Finally, a for loop iterates over every element of ARRAY and prints it with the echo command.
➍ To save and exit the text editor, press CTRL+O (then Enter to confirm) and CTRL+X.
➎ Now, you need to make the bash script file executable. Type the following line in the terminal to do so.
chmod +x read_data.sh
➏ Finally, you can simply run the file from your command line by typing:
./read_data.sh
From the above image, you can see that I have read the contents from the CSV file named employee_data.csv.
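One caveat applies to both methods. If the last line of the CSV file does not end with a newline character, read fills the variable but returns a non-zero status. The loop therefore stops and that final row is silently lost. A common fix is to add || [[ -n $line ]] to the loop condition. It is sketched below with a hypothetical helper name, read_lines, and result array, LINES_READ:

```bash
#!/bin/bash
# read returns non-zero at end of file even when it has just filled $line
# with a last, unterminated line; "|| [[ -n $line ]]" keeps that line.
read_lines() {
  LINES_READ=()                       # hypothetical global result array
  local line
  while read -r line || [[ -n $line ]]; do
    LINES_READ+=("$line")
  done < "$1"
}

printf 'John Doe,12345,Manager\nJane Smith,67890,Assistant' > demo.csv   # no final newline
read_lines demo.csv
echo "${#LINES_READ[@]}"              # prints 2 (without the fix it would be 1)
```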
Method 02: Read CSV File With IFS
The Internal Field Separator (IFS) is a special variable in Bash that defines the characters used to split strings into fields. Because the read command relies on IFS, you can also use it to control how a CSV file is read. To do so, use the code below.
You can follow the steps of Method 01 to know about creating and saving shell scripts.
Script (readcsv1.sh) >
#!/bin/bash
#create a variable containing the filename
FILE="employee_data.csv"
ARRAY=() #create an empty array
#starting while loop to read the CSV file line by line (IFS= keeps each line intact, without trimming whitespace)
while IFS= read -r line; do
ARRAY+=("$line")
done < "$FILE"
# Use a for loop to iterate over the array and print each record
for record in "${ARRAY[@]}"; do
echo "$record"
done
This code is quite similar to the previous one. The main difference is the IFS= assignment in front of read in the while loop. In the previous code, IFS was not set, so read used its default value and trimmed leading and trailing whitespace (spaces and tabs) from every line. Setting IFS to an empty value for the read command disables this trimming, so each line is stored exactly as it appears in the file.
After running the Bash script, you can see the contents of the CSV file printed on the screen.
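The effect of IFS= is easy to see on a single line with surrounding spaces. This small comparison uses a here-string (<<<) instead of a file. The sample line is made up for illustration:

```bash
#!/bin/bash
# Compare the default read with IFS= read on a line padded with spaces.
line_in='   John Doe,12345,Manager   '

read -r trimmed <<< "$line_in"        # default IFS: leading/trailing spaces stripped
IFS= read -r exact <<< "$line_in"     # empty IFS: the line is kept as-is

printf '[%s]\n' "$trimmed"            # [John Doe,12345,Manager]
printf '[%s]\n' "$exact"              # [   John Doe,12345,Manager   ]
```

The space inside "John Doe" survives in both cases. Only whitespace at the start and end of the line is affected.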
6 Cases to Read CSV Files Using Bash Scripts
You may have additional requirements when reading a CSV file, such as reading a file containing special characters, reading only specific columns or rows, or mapping the CSV file data into a Bash array. I’ve gone over each of these cases in detail below.
Case 1: Reading CSV Files with Special Characters
CSV files may contain special characters such as quotes and newlines. A plain read loop treats these characters as ordinary text, so they need extra processing before the data can be used.
Here is the CSV file, employee.csv, containing the special characters that I am using in this example.
John "The Boss" Doe,12345,"Manager, Department A"
Jane "The "Assistant"",67890,"Assistant, Department B"
Michael Johnson,24680,"Developer, Department C"
Sarah Thompson,13579,Designer
As you can see, the file contains double quotes. I will use the sed and awk commands to remove them while reading the CSV file.
A. Using sed Command
The sed command, short for stream editor, is a text-processing tool in Unix-based operating systems. It takes input from a file or a pipe, applies the specified operations, and writes the modified output. I have used this command below to process the CSV file with special characters.
You can follow the steps of Method 01 to know about creating and saving shell scripts.
Script (read_csv1.sh) >
#!/bin/bash
FILE="employee.csv"
ARRAY=()
# reading the file line by line and process each line
while read -r line; do
line=$(echo "$line" | sed 's/"//g') # removing double quotes from the line using sed
ARRAY+=("$line") # adding the modified line to the array
done < "$FILE"
# iterating over the array and print each line
for line in "${ARRAY[@]}"; do
echo "$line"
done
This code is similar to the loop used in Method 01. The one difference is that it uses the sed command to remove the special characters. Inside the while loop, the sed expression 's/"//g' replaces every double quote in the line with an empty string, which removes it. The rest of the code works as described in Method 01.
After executing the script, you can see the CSV data printed on the screen without the double quotation marks that were present in the CSV file. This shows that sed removed them from each line before the line was stored. The file itself is not modified.
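Calling sed once per line starts a new process for every row, and that becomes slow on large files. As an alternative, Bash parameter expansion can delete the quotes without any external command. The helper name strip_quotes below is hypothetical:

```bash
#!/bin/bash
# ${line//\"/} replaces every double quote in $line with nothing,
# entirely inside Bash (no sed process is started).
strip_quotes() {
  local line=$1
  line=${line//\"/}                   # delete every double quote
  printf '%s\n' "$line"
}

strip_quotes 'Michael Johnson,24680,"Developer, Department C"'
# Michael Johnson,24680,Developer, Department C
```

Inside the loop of read_csv1.sh, the same expansion could replace the sed line: line=${line//\"/}.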
B. Using Awk Command
Awk is also a text-processing tool like sed. It processes and manipulates text according to the patterns and actions you specify, and it is more powerful than sed for complex operations and field manipulation. I have used the awk command in the code below to remove the special characters and read the CSV file.
You can follow the steps of Method 01 to know about creating and saving shell scripts.
Script (special_char.sh) >
#!/bin/bash
#setting variable to store modified file
FILE="corrected_employee.csv"
ARRAY=() #empty array to store each line of CSV file
awk '{ gsub(/"/, ""); print }' employee.csv > "$FILE" # awk removes all double quotes and writes the result to $FILE
while read -r line; do
ARRAY+=("$line")
done < "$FILE"
for line in "${ARRAY[@]}"; do
echo "$line"
done
In this code, the FILE variable holds the name of the file where the modified content will be stored, and an empty ARRAY is initialised to store each line of the CSV file. To remove the double quotes (") from employee.csv, the awk gsub function replaces each of them with an empty string, and the output is redirected into the file named by FILE. The while loop then reads each line of that modified file and appends it to ARRAY. Finally, the for loop goes through each element of ARRAY and prints it to the console.
In the above image, you can see that no double quotes are printed from the CSV file.
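Removing every double quote, as both scripts above do, also discards the information that "Manager, Department A" is a single field. Once the quotes are gone, splitting on commas yields four values instead of three. If you need the individual fields, one option is a character-by-character parser that tracks whether it is inside quotes. This is a swapped-in technique rather than the article's method. The sketch assumes RFC 4180-style quoting, and the names parse_csv_line and FIELDS are hypothetical:

```bash
#!/bin/bash
# Quote-aware CSV field splitter (sketch). A double-quoted field may contain
# commas, and "" inside quotes stands for one literal quote.
# Line breaks inside quoted fields are NOT handled here.
parse_csv_line() {
  local line=$1 field="" char in_quotes=false i
  FIELDS=()                               # results go into the global array FIELDS
  for (( i = 0; i < ${#line}; i++ )); do
    char=${line:i:1}
    if $in_quotes; then
      if [[ $char == '"' && ${line:i+1:1} == '"' ]]; then
        field+='"'                        # doubled quote -> one literal quote
        i=$((i + 1))                      # skip the second quote
      elif [[ $char == '"' ]]; then
        in_quotes=false                   # closing quote
      else
        field+=$char                      # commas are kept inside quotes
      fi
    else
      case $char in
        '"') in_quotes=true ;;            # opening quote
        ',') FIELDS+=("$field"); field="" ;;  # field boundary
        *)   field+=$char ;;
      esac
    fi
  done
  FIELDS+=("$field")                      # the last field has no trailing comma
}

parse_csv_line 'John "The Boss" Doe,12345,"Manager, Department A"'
printf '<%s>\n' "${FIELDS[@]}"
# <John The Boss Doe>
# <12345>
# <Manager, Department A>
```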
Case 2: Read the First Two Columns Excluding the Header
Suppose you want to skip the header row and read only the first two columns of a CSV file. You can do this with a Bash script. For demonstration, I am using a CSV file named employee_header.csv, which contains the data below.
Name,ID,Position
John Doe,12345,Manager
Jane Smith,67890,Assistant
Michael Johnson,24680,Developer
Sarah Thompson,13579,Designer
Therefore, I have used the following code to read the employee_header.csv file line by line, excluding the header.
You can follow the steps of Method 01 to know about creating and saving shell scripts.
Script (first_few_columns.sh) >
#!/bin/bash
#reading each line from employee_header.csv, skipping header
while IFS="," read -r rec_column1 rec_column2 rec_remaining
do
echo "Employee Name-$rec_column1" #print employee name
echo "ID: $rec_column2" #print employee ID
echo "" #print a blank line
done < <(tail -n +2 employee_header.csv) #redirecting the input of the while loop to employee_header.csv, excluding the header
The script uses a while loop to read each line of the CSV file. IFS is set to a comma so that each line is split into separate variables at the commas. The first column is assigned to rec_column1, the second to rec_column2, and everything that remains to rec_remaining. Inside the loop, the script prints the employee name, the ID, and an empty line. In the last line, the tail command excludes the header. Its -n +2 option makes tail start output at the second line, and process substitution passes the resulting lines to the loop as input.
The output shows only the first two columns of information, excluding the header row of the CSV file.
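tail -n +2 runs as a separate process. Another common way to skip the header is to consume it with one extra read before the loop. The braces make both reads share the same input redirection. A sketch with a hypothetical function name:

```bash
#!/bin/bash
# The first read discards the header line; the while loop then reads the
# remaining lines from the same redirected input.
print_name_id() {
  {
    read -r _                                        # discard the header line
    while IFS=, read -r rec_column1 rec_column2 _; do
      echo "Employee Name-$rec_column1"              # print employee name
      echo "ID: $rec_column2"                        # print employee ID
      echo ""                                        # print a blank line
    done
  } < "$1"
}
```

For example, print_name_id employee_header.csv prints the same output as first_few_columns.sh.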
Case 3: Parse Specific Columns from CSV File Using Bash Script
Suppose you want to read only specific columns from a CSV file. You can do this using a Bash script. To demonstrate, I am using the CSV file named employee_header.csv:
Name,ID,Position
John Doe,12345,Manager
Jane Smith,67890,Assistant
Michael Johnson,24680,Developer
Sarah Thompson,13579,Designer
Suppose you want to extract the Name and Position columns from the CSV file. Use the code given below to do so.
You can follow the steps of Method 01 to know about creating and saving shell scripts.
Script (read_csv3.sh) >
#!/bin/bash
#assigns the CSV file to a variable
FILE="employee_header.csv"
#reads each line and splits it into three variables
while IFS=, read -r name id position; do
echo "$name" "$position" #prints name and position
done < "$FILE"
This code uses the second method and sets IFS to a comma so that read splits each line into columns. Within the loop, the values from each line are assigned to the variables name, id, and position, and the echo command prints name and position separated by a space. The loop continues until the last line of the CSV file, and the redirection < "$FILE" supplies its input. Because the header is not skipped, it is printed as well.
In the output, the script prints only the name and position columns of the CSV file.
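The loop above relies on knowing that Name and Position are the 1st and 3rd columns. If the column order might change, one option is to look the columns up by their header names first. The sketch below uses a hypothetical helper, print_columns, and assumes simple CSV data with no commas inside quoted fields:

```bash
#!/bin/bash
# print_columns FILE NAME... : find each requested column by its header
# name, then print those columns for every data row, separated by spaces.
print_columns() {
  local file=$1; shift
  local -a header idx=() row out
  local name i
  IFS=, read -r -a header < "$file"        # the first line holds the column names
  for name in "$@"; do
    for i in "${!header[@]}"; do
      if [[ ${header[i]} == "$name" ]]; then
        idx+=("$i")                        # remember the column position
      fi
    done
  done
  while IFS=, read -r -a row; do
    out=()
    for i in "${idx[@]}"; do
      out+=("${row[i]}")
    done
    echo "${out[*]}"                       # elements joined by spaces
  done < <(tail -n +2 "$file")             # skip the header row
}
```

print_columns employee_header.csv Name Position prints the same name and position pairs as read_csv3.sh, but skips the header row.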
Case 4: Parse CSV Files with Line Breaks and Commas
Suppose I have a CSV file that contains line breaks and commas within its text, and I want to read it with a Bash script. I have used the CSV file address.csv, shown below.
123 Main St,
City Name,
State, ZIP
Here is the code that I have used to read this CSV file.
You can follow the steps of Method 01 to know about creating and saving shell scripts.
Script (address_read.sh) >
#!/bin/bash
# Read the CSV file
while IFS= read -r line
do
IFS=',' read -ra fields <<< "$line" #split the line by comma
for field in "${fields[@]}" #process each field
do
echo "$field"
done
done < address.csv
The while loop reads the CSV file line by line. For each line, IFS=',' read -ra splits the line at the commas and stores the pieces in an array named fields. The for loop then processes each element of fields and echoes it to the terminal. Finally, the redirection after done feeds the contents of address.csv into the while loop.
After executing the script, it prints each comma-separated field of the CSV file on its own line, without the commas.
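The address.csv example works because each physical line is treated as its own record. In a CSV file that follows RFC 4180, however, a line break inside a double-quoted field belongs to that field, so one record can span several lines. The minimal sketch below glues lines back together until the double quotes balance. It assumes well-formed quoting, and read_records and RECORDS are hypothetical names:

```bash
#!/bin/bash
# Rebuild logical CSV records whose quoted fields span several lines.
# A record is treated as complete once it holds an even number of " characters.
read_records() {
  local record="" line quotes
  RECORDS=()
  while IFS= read -r line || [[ -n $line ]]; do
    if [[ -n $record ]]; then
      record+=$'\n'"$line"                # continue an open record
    else
      record=$line
    fi
    quotes=$(printf '%s' "$record" | tr -cd '"')   # keep only the " characters
    if (( ${#quotes} % 2 == 0 )); then
      RECORDS+=("$record")                # quotes balanced: record complete
      record=""
    fi
  done < "$1"
  if [[ -n $record ]]; then
    RECORDS+=("$record")                  # unbalanced leftover at end of file
  fi
}
```

For example, read_records data.csv followed by printf '%s\n---\n' "${RECORDS[@]}" prints each logical record followed by a separator line. Each element can then be split into fields.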
Case 5: Reading Columns of CSV File into Bash Arrays
Suppose you want to map the columns of your CSV file into arrays. To do so, I've used the previous CSV file named employee_header.csv as my input file, and the script skips its header row. Follow the code given below to store the names, IDs, and positions of the employees in three arrays.
You can follow the steps of Method 01 to know about creating and saving shell scripts.
Script (array.sh) >
#!/bin/bash
#create one array per column of employee_header.csv, skipping the header
#mapfile -t stores one line per element, so names with spaces stay intact
mapfile -t arr_record1 < <(tail -n +2 employee_header.csv | cut -d ',' -f1)
mapfile -t arr_record2 < <(tail -n +2 employee_header.csv | cut -d ',' -f2)
mapfile -t arr_record3 < <(tail -n +2 employee_header.csv | cut -d ',' -f3)
#print the contents of each array
echo "array of Name : ${arr_record1[@]}"
echo "array of ID : ${arr_record2[@]}"
echo "array of Position: ${arr_record3[@]}"
The line mapfile -t arr_record1 < <(tail -n +2 employee_header.csv | cut -d ',' -f1) reads the contents of the input file excluding the first line (tail -n +2). It extracts the values of the first column (cut -d ',' -f1) and stores each value as one element of arr_record1. The -t option removes the trailing newline from each element, and because every line becomes one element, a name such as John Doe is not split at the space. mapfile requires Bash 4 or later. I've followed the same process for the 2nd and 3rd columns. The line echo "array of Name : ${arr_record1[@]}" then prints the contents of arr_record1, displaying the names of all the employees. The next two lines do the same for the IDs and positions.
Here you can see the names, IDs and positions of the employees in the form of an array.
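The script above reads the file three times, once per column. The sketch below is a single-pass alternative that also works on Bash versions without mapfile. It assumes a header row followed by three columns, as in employee_header.csv, and load_columns, names, ids, and positions are hypothetical names:

```bash
#!/bin/bash
# Fill all three column arrays while reading the file only once.
load_columns() {
  names=(); ids=(); positions=()
  local name id position
  while IFS=, read -r name id position; do
    names+=("$name")
    ids+=("$id")
    positions+=("$position")
  done < <(tail -n +2 "$1")                # skip the header row
}
```

After load_columns employee_header.csv, ${names[1]} holds Jane Smith and ${positions[0]} holds Manager.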
Case 6: Dealing With Missing Data/Value or Field
Suppose I have a CSV file that contains some missing values. You can detect those missing values using an if or case statement. I am using a CSV file named missing_data.csv, which you can find below.
Name,Age,Gender,Grade,Address
John Doe,17,Male,11th,123 Main St
Jane Smith,,Female,10th,
Mark Johnson,16,,9th,456 Elm St
Here is the Bash code that I have used to detect the missing values in my CSV file.
You can follow the steps of Method 01 to know about creating and saving shell scripts.
Script (missing_line.sh) >
#!/bin/bash
# Variable to track if missing values are found
missing=false
# Loop through each line of the input CSV file
while IFS=, read -r field1 field2 field3 field4 field5
do
if [ "$field1" == "" ]
then
echo "field1 is empty or no value set"
missing=true
elif [ "$field2" == "" ]
then
echo "field2 is empty or no value set"
missing=true
elif [ "$field3" == "" ]
then
echo "field3 is empty or no value set"
missing=true
elif [ "$field4" == "" ]
then
echo "field4 is empty or no value set"
missing=true
elif [ "$field5" == "" ]
then
echo "field5 is empty or no value set"
missing=true
else
echo "$field1, $field2, $field3, $field4, $field5"
fi
done < missing_data.csv
echo "Missing: $missing" #debugging: Print the value of the "missing" variable
if [ "$missing" == true ] #use double quotes and double equals for string comparison
then
echo "WARNING: Missing values in the CSV file. Please use the proper format. Operation failed."
exit 1
else
echo "CSV file read successfully."
fi
The script starts by initializing a variable named missing to false. It tracks whether the CSV file contains any missing values. A while loop then iterates over each line, splits it at the commas, and assigns the column values to the variables field1, field2, and so on. Inside the loop, conditional statements check whether any field is empty. If a field is empty, the script prints a corresponding message and sets missing to true. Because the checks form an if/elif chain, only the first empty field in each row is reported. After processing the entire CSV file, the script prints the value of missing. If missing is true, it prints a warning and exits with status 1. Otherwise, it reports success.
The output displays a message for each row with a missing field (field2 for Jane Smith and field3 for Mark Johnson), followed by Missing: true and the warning message, because my CSV file has missing fields.
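The script above reports only the first empty field per row and depends on a fixed number of variables. A more general sketch reads each row into an array with read -a and compares it against the header, so it reports every empty field by column name and line number. It assumes the first line is a header, and check_missing is a hypothetical name. read -a drops an empty trailing field, which is why a missing array index is also treated as empty:

```bash
#!/bin/bash
# Report every empty field by column name and line number.
# Returns status 1 if anything is missing, 0 otherwise.
check_missing() {
  local file=$1 line lineno=1 missing=0 i
  local -a header fields
  IFS=, read -r -a header < "$file"            # column names from the first line
  while IFS= read -r line; do
    lineno=$((lineno + 1))                     # physical line number in the file
    IFS=, read -r -a fields <<< "$line"
    for i in "${!header[@]}"; do
      # an unset index also counts: read -a drops an empty trailing field
      if [[ -z ${fields[i]} ]]; then
        echo "line $lineno: missing ${header[i]}"
        missing=1
      fi
    done
  done < <(tail -n +2 "$file")                 # data rows only
  return "$missing"
}
```

Run against missing_data.csv, check_missing would print line 3: missing Age, line 3: missing Address, and line 4: missing Gender, and return status 1.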
Comparative Analysis of the Methods
In this article, I’ve discussed two methods of reading a CSV file in Bash. Here you will get a comparative analysis of these two methods and an idea of where to use which method for convenience.
Both methods are useful, depending on what you need. Method 1 is convenient for quick data extraction, such as pulling specific rows out of a file, and you can extend the loop to clean or transform data, for example by removing duplicates or converting formats. Keep in mind that a plain read -r trims leading and trailing whitespace from each line.
Method 2, on the other hand, is more useful when you need the data exactly as stored or when you perform calculations on individual fields, because IFS controls how read splits and trims each line. You can also use IFS for simple transformations on CSV files, such as converting values or formatting dates.
Conclusion
In conclusion, reading CSV files with Bash is useful for automation and data management tasks. You can go through every line of a CSV file with a loop and quickly extract the data you need. I have demonstrated the most common cases you may face when reading a CSV file in Bash, along with possible solutions. I hope you find this article useful. Feel free to comment if you have any questions or suggestions regarding this article.