How to Read CSV Files in Bash? [2 Methods]

CSV files are frequently used to store data in tabular form, separated into rows and columns with commas. Because CSV files are so widely used, it is important for a user of Linux or another Unix-based operating system to know how to read them from the command line. In this article, I am going to demonstrate several common cases that you may face while reading a CSV file using a bash script.

What is a CSV File?

A CSV file is a comma-separated values file. It is a text file in which tabulated data is arranged in fields, each separated from the next by a comma (","). In a CSV file, every line represents a row of data, and commas separate the values within each row. Usually, the first row contains the column headers, which name the data in each column.

2 Methods to Read CSV File Line by Line Using Bash Script

You can read CSV files in two ways: with a plain while loop that processes every line, or by also setting IFS (Internal Field Separator) on the read command, which controls how each line is trimmed or split before it is stored in an array.

I am going to use the CSV file named employee_data.csv, given below, to demonstrate both methods:

John Doe,12345,Manager
Jane Smith,67890,Assistant
Michael Johnson,24680,Developer
Sarah Thompson,13579,Designer

You can read our Comparative Analysis of Methods to distinguish between these two methods and pick the best one for your needs.

1. Read CSV File With a Loop

This method of reading CSV files with a loop is relatively simple; you will use a while loop to loop through each line of the CSV file and then store them in an array. Let’s see the bash script to read the CSV file using while loop:

#!/bin/bash
#creating a variable that will contain the CSV file
FILE="employee_data.csv"
ARRAY=() #creating an empty array

#Starting while loop to read through a CSV file line by line
while read -r line; do
    ARRAY+=("$line") #append a line to the array
done < "$FILE" #End of the loop and redirecting the file

#For loop to print out the array
for line in "${ARRAY[@]}"; do
    echo "$line" #print out the line
done

EXPLANATION

The code starts by declaring a variable named FILE and assigning it the value employee_data.csv. After that, an empty array is created to store the lines read from the CSV file. Then the script enters a while loop that reads each line until the end of the file and appends it to ARRAY. Finally, a for loop iterates over every element in ARRAY, and the echo command prints each line.

When the script runs, it prints the contents of the CSV file named employee_data.csv line by line.
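One detail worth knowing: read returns a non-zero status when the last line of a file has no trailing newline, so a plain while read loop silently skips that line. The sketch below is a minimal illustration of the usual guard; read_lines is a hypothetical helper name, and the sample data is written to a temporary file only for the demo.

```shell
#!/bin/bash
# read_lines FILE: hypothetical helper that prints every line of FILE,
# including a final line that has no trailing newline.
read_lines() {
    local line
    # '|| [ -n "$line" ]' keeps the loop alive for a partial last line
    while IFS= read -r line || [ -n "$line" ]; do
        printf '%s\n' "$line"
    done < "$1"
}

# Demo: the last record deliberately has no trailing newline
tmp=$(mktemp)
printf 'John Doe,12345,Manager\nJane Smith,67890,Assistant' > "$tmp"
read_lines "$tmp"
rm -f "$tmp"
```

Without the extra test after ||, the demo would print only the first record.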

2. Read CSV File With IFS

Internal Field Separator (IFS) is a special variable in Bash that defines the characters used to split a string into fields. The read command consults IFS when it reads a line, so you can also control how a CSV file is read by setting IFS. To do so, use the code below:

#!/bin/bash
#create a variable containing the filename
FILE="employee_data.csv"
ARRAY=() #create an empty array

#starting while loop to read through a CSV file line by line (empty IFS keeps leading and trailing whitespace)
while IFS= read -r line; do
    ARRAY+=("$line")
done < "$FILE"

# Use a for loop to iterate over the array and print each record
for record in "${ARRAY[@]}"; do
    echo "$record"
done

EXPLANATION

This code is quite similar to the previous one. The main difference is the use of IFS within the while loop. In the previous code, IFS was not explicitly set, so the read command trimmed leading and trailing whitespace from every line by default. Setting IFS to an empty value disables that trimming, so each line is stored exactly as it appears in the file.

After running the bash script, you can read the CSV file information on the screen.
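To split a line into its individual fields, IFS can be set to a comma for a single read command. The sketch below is a minimal illustration, assuming the fields contain no quoted commas; split_fields is a hypothetical helper name that does not appear in the scripts above.

```shell
#!/bin/bash
# split_fields LINE: hypothetical helper that splits one CSV line on commas
# and prints each field on its own line. Quoted fields are NOT understood.
split_fields() {
    local -a fields
    # IFS=',' applies only to this read; -a stores the pieces in an array
    IFS=',' read -r -a fields <<< "$1"
    printf '%s\n' "${fields[@]}"
}

split_fields "John Doe,12345,Manager"
```

Because IFS is set on the same line as read, the change does not leak into the rest of the script.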

6 Cases to Read CSV Files Using Bash Scripts

You may have additional requirements when reading a CSV file, such as reading a file containing special characters, reading only specific columns or rows, or mapping the CSV file data into a Bash array. I’ve gone over each of these cases in detail below.

1. Reading CSV Files With Special Characters

CSV files may contain special characters such as quotes and newlines. Such files need extra care, because a naive comma split treats a comma inside a quoted field as a separator.

Here is the CSV file, employee.csv containing a special character that I am using in this example:

'John "The Boss" Doe,12345,"Manager, Department A"'
'Jane "The "Assistant"",67890,"Assistant, Department B"'
'Michael Johnson,24680,"Developer, Department C"'
'Sarah Thompson,13579,Designer'

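As you can see, the double quotes here are special characters. I will use the sed and awk commands to remove them and read the CSV file. Note that this simply deletes the quotes; a comma inside a formerly quoted field still looks like a separator afterwards.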

A. Using “sed” Command

sed, short for stream editor, is a text processing tool in Unix-based operating systems. It can take input from a file, apply specific operations, and produce modified output. I have used this command below to clean up the lines of a CSV file that contains special characters. Here’s the bash script:

#!/bin/bash
FILE="employee.csv"
ARRAY=()

# reading the file line by line and process each line
while read -r line; do
    line=$(echo "$line" | sed 's/"//g') # removing double quotes from the line using sed
    ARRAY+=("$line") # adding the modified line to the array
done < "$FILE"

# iterating over the array and print each line
for line in "${ARRAY[@]}"; do
    echo "$line"
done

EXPLANATION

This code is similar to the loop code used in method 1. The one difference is that it uses the sed command to remove the special character. Inside the while loop, the expression s/"//g replaces every double quote in the line with an empty string, which effectively removes it. The rest of the code works as described in method 1.

After executing the script, you can see the CSV file printed on the screen without the quotation marks that were present in the file. That means the sed command has successfully removed them from each line.
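Calling sed once per line starts a new process for every record. As an alternative, Bash parameter expansion can delete the quotes without any external command. This is a minimal sketch, not a full CSV parser; strip_quotes is a hypothetical helper name, and it reads from standard input.

```shell
#!/bin/bash
# strip_quotes: hypothetical filter that removes every double quote from
# each line of standard input, using only Bash built-ins.
strip_quotes() {
    local line
    while IFS= read -r line || [ -n "$line" ]; do
        # ${line//\"/} replaces all occurrences of " with nothing
        printf '%s\n' "${line//\"/}"
    done
}

printf '%s\n' 'Michael Johnson,24680,"Developer, Department C"' | strip_quotes
```

As with sed, the comma inside the formerly quoted field remains, so a later split on commas would see four fields in this record instead of three.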

B. Using “awk” Command

awk is also a text processing tool like sed. It allows users to process and manipulate text by specifying a pattern or actions. It is more powerful than sed when it comes to complex operations and field manipulation. I have used this awk command in my code below to remove special characters and read the CSV file:

#!/bin/bash
#setting variable to store modified file
FILE="corrected_employee.csv"
ARRAY=() #empty array to store each line of CSV file
awk '{ gsub(/"/, ""); print }' employee.csv > "$FILE" # awk command to remove the double quotes

while read -r line; do
    ARRAY+=("$line")
done < "$FILE"

for line in "${ARRAY[@]}"; do
    echo "$line"
done

EXPLANATION

In this code, the FILE variable holds the name of the file where the modified content will be stored, and an empty ARRAY is initialised to store each line of the CSV file. To remove the double quotes (") from employee.csv, I have used the awk command, where gsub replaces each double quote with an empty string, and redirected the output to the file named in FILE. Then the while loop reads each line of that file and appends it to ARRAY. Finally, the for loop goes through each element of ARRAY and prints it to the console.

In the output, you can see that no double quotes are printed from the CSV file.

2. Read the First Two Columns Excluding the Header

Suppose you want to skip the header row of a CSV file and read only its first two columns. Here is the CSV file, named employee_header.csv, that I have used for demonstration purposes:

Name,ID,Position
John Doe,12345,Manager
Jane Smith,67890,Assistant
Michael Johnson,24680,Developer
Sarah Thompson,13579,Designer

Therefore, I have used the following code to read the employee_header.csv file line by line, excluding the header:

#!/bin/bash
#reading each line from employee_header.csv, skipping header
while IFS="," read -r rec_column1 rec_column2 rec_remaining
do
    echo "Employee Name-$rec_column1" #print employee name
    echo "ID: $rec_column2" #print employee ID
    echo "" #print a blank line
done < <(tail -n +2 employee_header.csv) #redirecting the input of while loop to employee_header.csv excluding header

EXPLANATION

First, the while loop reads each line with IFS set to a comma, so read splits the line at commas. The first field goes to rec_column1, the second to rec_column2, and everything left over to rec_remaining. The next three lines print the employee name, the ID, and a blank line. In the last line, tail -n +2 outputs the file starting from its second line, which skips the header, and process substitution passes those lines to the loop as input.

The output shows only the first two columns (name and ID) for each employee, with the header row excluded.
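The header can also be skipped without tail, by reading it once before the loop starts. The following is a minimal sketch under the same assumption of a simple, unquoted CSV; print_name_id is a hypothetical helper name, and the temporary file exists only for the demo.

```shell
#!/bin/bash
# print_name_id FILE: hypothetical helper that skips the header row and
# prints the first two columns of each remaining row.
print_name_id() {
    local header name id rest
    {
        IFS= read -r header    # consume the header row so the loop never sees it
        while IFS=',' read -r name id rest || [ -n "$name" ]; do
            printf 'Employee Name: %s, ID: %s\n' "$name" "$id"
        done
    } < "$1"
}

# Demo on a throwaway copy of the sample data
tmp=$(mktemp)
printf '%s\n' 'Name,ID,Position' 'John Doe,12345,Manager' 'Jane Smith,67890,Assistant' > "$tmp"
print_name_id "$tmp"
rm -f "$tmp"
```

Grouping both reads in braces lets them share one redirection, so the loop continues from where the header read stopped.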

3. Parse Specific Columns From CSV File Using Bash Script

Suppose you want to read some specific columns from a CSV file. You can do this using bash scripting. Here is the CSV file, named employee_header.csv, that I have used:

Name,ID,Position
John Doe,12345,Manager
Jane Smith,67890,Assistant
Michael Johnson,24680,Developer
Sarah Thompson,13579,Designer

Suppose you want to extract the Name and Position columns from the CSV file. Use the code given below to do so:

#!/bin/bash
#assigns the CSV file to a variable
FILE="employee_header.csv"

#reads each line and splits it into three variables
while IFS=, read -r name id position; do
    echo "$name" "$position" #prints name and position
done < "$FILE"

EXPLANATION

This code uses the second method, setting IFS to a comma so that read splits each line into columns. Within the loop, the values from each line are assigned to the variables name, id, and position, and the echo command prints name and position separated by a space. The loop continues until the last line of the CSV file; its input comes from the redirection < "$FILE". Because the header is not skipped, the first line printed is Name Position.

In the output, the script prints only the name and position columns of the CSV file.
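When the selected columns only need to be printed, awk can pick them without a shell loop. This is a minimal sketch, assuming a simple CSV with no quoted commas; name_and_position is a hypothetical helper name, and unlike the script above it skips the header row with NR > 1.

```shell
#!/bin/bash
# name_and_position FILE: hypothetical helper that prints columns 1 and 3
# of every row except the header.
name_and_position() {
    # -F',' sets the field separator; NR > 1 skips the first (header) row
    awk -F',' 'NR > 1 { print $1, $3 }' "$1"
}

# Demo on a throwaway copy of the sample data
tmp=$(mktemp)
printf '%s\n' 'Name,ID,Position' 'John Doe,12345,Manager' 'Jane Smith,67890,Assistant' > "$tmp"
name_and_position "$tmp"
rm -f "$tmp"
```

The comma in the print statement joins the two fields with a single space, awk's default output separator.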

4. Parse CSV Files With Line Breaks and Commas

Suppose I have a CSV file that contains line breaks and commas within the sentences. Now I want to read it with a bash script. I have used the CSV file address.csv, which you can find below:

123 Main St,
City Name,
State, ZIP

Here is the code that I have used to read this CSV file:

#!/bin/bash
# Read the CSV file
while IFS= read -r line
do
    IFS=',' read -ra fields <<< "$line" #split the line by comma

    for field in "${fields[@]}" #process each field
    do
        echo "$field"
    done
done < address.csv

EXPLANATION

The while loop reads the CSV file line by line. Inside it, IFS=',' read -ra fields splits the current line at commas and stores the pieces in an array named fields. Next, the for loop processes each element of the fields array and echoes it to the terminal. Finally, the redirection at the end of the while loop supplies the contents of address.csv as input. Note that this handles line breaks between records; a line break inside a quoted field would still be read as two separate lines.

After executing the script, it prints the contents of the CSV file one field per line, without the commas.

5. Reading Columns of CSV File Into Bash Arrays

Suppose you want to map the columns of your CSV file into arrays. To do so, I’ve used the employee_header.csv file from the previous cases as my input file, because the script skips the first line as a header. Follow the code given below to store the Name, ID, and Position of the employees in arrays and print them:

#!/bin/bash
#creates arrays holding the first, second, and third column values from employee_header.csv
#mapfile -t stores one line per element, so names containing spaces stay intact
mapfile -t arr_record1 < <(tail -n +2 employee_header.csv | cut -d ',' -f1)
mapfile -t arr_record2 < <(tail -n +2 employee_header.csv | cut -d ',' -f2)
mapfile -t arr_record3 < <(tail -n +2 employee_header.csv | cut -d ',' -f3)

#print the contents of each array
echo "array of Name  : ${arr_record1[@]}"
echo "array of ID  : ${arr_record2[@]}"
echo "array of Position: ${arr_record3[@]}"

EXPLANATION

The line mapfile -t arr_record1 < <(tail -n +2 employee_header.csv | cut -d ',' -f1) reads the contents of the input file, excludes the first line using tail -n +2, extracts the values of the first column with cut -d ',' -f1, and stores one value per element in arr_record1. mapfile is a Bash built-in available in Bash 4 and later. An unquoted arr=( $(...) ) assignment would instead split on every space and turn John Doe into two elements. I’ve followed the same process for the 2nd and 3rd columns. Then the line echo "array of Name  : ${arr_record1[@]}" prints the contents of arr_record1, displaying the names of all the employees in the array. I have followed the same process in the next two lines to print all the IDs and Positions.

In the output, you can see the names, IDs, and positions of the employees, each printed from its own array.
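The three columns can also be collected in a single pass over the file instead of one pipeline per column. The sketch below is one possible way to do that, assuming a header row and no quoted commas; load_columns and the array names names, ids, and positions are hypothetical.

```shell
#!/bin/bash
# load_columns FILE: hypothetical helper that fills three global arrays
# (names, ids, positions) from a three-column CSV file with a header row.
names=() ids=() positions=()
load_columns() {
    local header name id position
    names=() ids=() positions=()
    {
        IFS= read -r header    # skip the header row
        while IFS=',' read -r name id position || [ -n "$name" ]; do
            names+=("$name")
            ids+=("$id")
            positions+=("$position")
        done
    } < "$1"
}

# Demo on a throwaway copy of the sample data
tmp=$(mktemp)
printf '%s\n' 'Name,ID,Position' 'John Doe,12345,Manager' 'Jane Smith,67890,Assistant' > "$tmp"
load_columns "$tmp"
rm -f "$tmp"
echo "Number of employees: ${#names[@]}"
printf 'Name: %s\n' "${names[@]}"
```

Quoting each value when appending keeps a name such as John Doe as one array element.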

6. Dealing With Missing Data/Value or Field

Suppose I have a CSV file that contains some missing values. It is possible to detect those missing values using an if or case statement. I am using a CSV file named missing_data.csv, which you can find below:

Name,Age,Gender,Grade,Address
John Doe,17,Male,11th,123 Main St
Jane Smith,,Female,10th,
Mark Johnson,16,,9th,456 Elm St

Here is the bash code that I have used to detect the missing values in my CSV file:

#!/bin/bash
# Variable to track if missing values are found
missing=false

# Loop through each line of the input CSV file
while IFS=, read -r field1 field2 field3 field4
do
    if [ "$field1" == "" ]
    then
        echo "field1 is empty or no value set"
        missing=true
    elif [ "$field2" == "" ]
    then
        echo "field2 is empty or no value set"
        missing=true
    elif [ "$field3" == "" ]
    then
        echo "field3 is empty or no value set"
        missing=true
    elif [ "$field4" == "" ]
    then
        echo "field4 is empty or no value set"
        missing=true
    else
        echo "$field1, $field2, $field3, $field4"
    fi

done < missing_data.csv
echo "Missing: $missing" #debugging: Print the value of the "missing" variable

if [ "$missing" == true ] #use double quotes and double equals for string comparison
then
    echo "WARNING: Missing values in the CSV file. Please use the proper format. Operation failed."
    exit 1
else
    echo "CSV file read successfully."
fi

EXPLANATION

The script starts by initializing a variable named missing to false, which tracks whether the CSV file contains any missing values. After that, a while loop iterates over each line, splits it at commas, and assigns the values to the variables field1, field2, field3, and field4. Because the sample file has five columns but read is given only four variables, field4 receives the rest of the line (Grade and Address together), so an empty Address on its own is not detected.

To check whether any of the fields are empty, conditional statements are used within the loop. If an empty value is found in a field, a corresponding message is printed and the missing variable is set to true. Because the checks are chained with elif, only the first empty field in each row is reported. After processing the entire CSV file, the script prints the value of the missing variable. If missing is true, the script echoes a warning message and exits with status 1; otherwise, it reports success.

For the sample file, the output reports the empty fields (for example, field3 is empty for Mark Johnson), shows that the missing variable is set to true, and ends with the warning message, because there were missing fields in my CSV file.
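To report every empty field rather than only the first one per row, each line can be split into an array and checked element by element. This is a minimal sketch, assuming no quoted commas; report_missing is a hypothetical helper name, and it counts the header as row 1.

```shell
#!/bin/bash
# report_missing FILE: hypothetical helper that prints the row and column
# number of every empty field and returns 1 if any were found.
report_missing() {
    local line row=0 i status=0
    local -a fields
    while IFS= read -r line || [ -n "$line" ]; do
        row=$((row + 1))
        # Appending a comma stops read from dropping a trailing empty field
        IFS=',' read -r -a fields <<< "$line,"
        for i in "${!fields[@]}"; do
            if [ -z "${fields[i]}" ]; then
                echo "row $row, column $((i + 1)) is empty"
                status=1
            fi
        done
    done < "$1"
    return "$status"
}

# Demo on a throwaway copy of the sample data
tmp=$(mktemp)
printf '%s\n' 'Name,Age,Gender,Grade,Address' 'John Doe,17,Male,11th,123 Main St' \
    'Jane Smith,,Female,10th,' 'Mark Johnson,16,,9th,456 Elm St' > "$tmp"
report_missing "$tmp" || echo "WARNING: Missing values in the CSV file."
rm -f "$tmp"
```

With this approach the empty Address in the Jane Smith row is reported as well, since no field absorbs the remainder of the line.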

Comparative Analysis of the Methods to Read CSV Files in Bash

In this article, I’ve discussed two methods of reading a CSV file in Bash. Here you will get a comparative analysis of these two methods and an idea of where to use which method for convenience.

Method 01 (while loop)
Pros:
  • Using loops allows you to go through each line of the CSV file, giving you flexibility and control over the process.
  • You can use loops to automate repetitive tasks involving the CSV file, such as data transformation, loading, and extraction.
Cons:
  • Reading line by line in a shell loop is slow on large CSV files.
  • Handling errors or inconsistent data needs additional effort.

Method 02 (IFS)
Pros:
  • Using IFS is relatively straightforward and requires few lines of code.
  • Splitting happens inside the read built-in, so no external command is needed to separate the fields.
Cons:
  • Every field is read as a plain string, so numeric or date values need separate validation or conversion.
  • It has no built-in support for CSV quirks such as quoted fields, which makes unconventional or problematic CSV files difficult to handle.

Both methods are useful based on the criteria for which you will use them. Method 1 is useful in cases of data extraction to extract specific columns and rows from a file. You can use the loop to clean or transform data, such as removing duplicates and converting formats.

On the other hand, method 2 is more useful when you are retrieving data or performing calculations on individual fields. Also, you can use IFS to perform simple transformations on CSV files, such as converting values or formatting dates.

Conclusion

In conclusion, reading CSV files using bash is important when it comes to automation and data management tasks. You can go through every line of a CSV file using loops and extract data as per your requirements. I have demonstrated several common cases you may face while reading a CSV file in Bash and discussed possible solutions. I hope you find this article useful. Feel free to comment if you have any questions or suggestions regarding this article.

People Also Ask

How do I read CSV file in Bash?

To read a CSV file in Bash, you can use a while loop, optionally with IFS (Internal Field Separator) set on the read command. Moreover, you can employ the awk or sed command to process CSV files in bash.

How to parse CSV file into Bash array?

To parse a CSV file into a Bash array, first read the file line by line and split each line into fields using IFS. Then append the values to the bash array using the += operator, and finally print the records of the array with a for loop.

Can Linux read CSV files?

Yes, you can read CSV files in Linux. You can read it in Linux by using the command line tools, or you can install open-source spreadsheet applications or use a programming language like Bash.

How do I read a file in Bash terminal?

You can easily read files in the bash terminal with some standard commands. These commands are cat, less, head, and tail. cat prints the full file in the terminal; less shows the file page by page, making it more readable; head and tail show only the first or last 10 lines of a file by default, respectively.

How to open CSV files on Ubuntu?

As CSV files are text files, you can use any text editor such as nano, vim, or vi to open them. You can also install LibreOffice Calc, a popular open-source spreadsheet program used to open and manipulate CSV files. Install it on your system using the following commands:

sudo apt update
sudo apt install libreoffice-calc

How do I open a CSV file in the Ubuntu terminal?

To open a CSV file in the Ubuntu terminal, you can use standard commands such as cat and less. cat displays the entire contents of the CSV file, and less lets you view the file page by page.


Lamisa Musharrat

Hello there. My name is Lamisa Musharat, and I'm a Linux Content Developer Executive at SOFTEKO. I earned a bachelor's degree in Naval Architecture and Marine Engineering from Bangladesh University of Engineering and Technology (BUET). I learned Linux out of curiosity, and now I find it useful because automation is easier with Linux. I take great pleasure in assisting others with Linux-related issues. I really want you to enjoy and benefit from my efforts.
