How to Extract Bash Substring? [5 methods]


In Bash, a substring is a part of a string derived or extracted from that string. Substring extraction is essential for text manipulation and processing tasks such as parsing and reformatting data.

You can use the following methods to extract Bash substring:

  1. Using Bash’s substring expansion: ${input_string:start_index:length}
  2. Using the “cut” command: cut -c N-M <<< input_string
  3. Using the “awk” command: awk '{print substr($0, start_index, length)}' <<< input_string
  4. Using the “expr” command: expr substr input_string start_index length
  5. Using the “grep” command: echo input_string | grep -o 'substring'

There are two types of Bash substring extraction: index-based and pattern-based.

In this article, I’ll explain 4 methods of index-based substring extraction and 3 methods of pattern-based substring extraction in Bash. So let’s get started!

A. Index-Based Substring Extraction

Index-based extraction involves extracting a substring from an original string based on specified character positions. In Bash’s substring expansion, strings are zero-indexed, whereas the “cut”, “awk”, and “expr” commands count characters from 1. You can extract a substring based on the index in various ways: Bash’s substring expansion, the “cut” command, the “awk” command, and the “expr” command.

1. Using Bash’s Substring Expansion

The simplest method to extract a substring from a string is to use the expression ${string:start_index:length} where the string variable holds the main text or string. The start_index denotes the initial position of characters from which the extraction begins, while the length specifies the size of the resulting substring.

You can check the following examples of substring extraction for a clearer understanding of the topic:

i. From the Start of the String

To extract a substring from the start of the string, set the “start_index” value to 0 and specify the “length” you want. For example, to extract a substring of length 11 from the start, you can check the following script:

#!/bin/bash

#Define a string variable
string="Linuxsimply and Linux"

#Print the string variable
printf "The main string:\n%s" "$string"

#Extract a substring from the first character of the string
substring="${string:0:11}"

#Print the substring value
printf "\n\nThe substring:\n%s\n" "$substring"

EXPLANATION
The syntax "${string:0:11}" extracts 11 characters starting at the first character (index 0) of the string variable.

The output shows the extracted substring, “Linuxsimply”, which is the first 11 characters of the main string.

ii. From the Middle of the String

To extract a substring from the middle of the string, set the “start_index” to any index between 0 and the last index of the string, and specify the “length”.

You can check the following example to extract a substring of length 9 from the original string, starting at index 8:

#!/bin/bash

#Define a string variable
string="Extract substring from middle."

#Print the string variable
printf "The main string:\n%s" "$string"

#Extract a substring of length 9 starting at index 8
substring="${string:8:9}"

#Print the substring value
printf "\n\nThe substring:\n%s\n" "$substring"

EXPLANATION
In the script, substring="${string:8:9}" extracts a substring from the string variable. The substring runs from index 8 to index 16: as the substring length is 9, the ending index is 8+9-1=16.

The output shows the text extracted from the middle of the main string, which is “substring”.
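The end-index arithmetic can be checked with a short sketch. The variable names start, length, end, and last_char are chosen here for illustration only. The offset and length of a substring expansion are evaluated as arithmetic expressions, so plain variable names can be used inside the braces.

```bash
#!/bin/bash

string="Extract substring from middle."
start=8
length=9

# Last index covered by the substring: start + length - 1
end=$(( start + length - 1 ))

# Offset and length are arithmetic expressions, so variable names work directly
substring="${string:start:length}"

# The single character sitting at the computed end index
last_char="${string:end:1}"

printf 'End index: %d\n' "$end"
printf 'Substring: %s\n' "$substring"
printf 'Character at end index: %s\n' "$last_char"
```

If the computed end index is correct, the character printed on the last line is the final character of the extracted substring.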

iii. From the Positive Index Position

A positive position refers to a position or index counted from the beginning of the string, starting with 0 for the first character.

To extract a substring from a positive position, provide the “start_index” from which the extraction should begin. Follow the script below:

#!/bin/bash

#Define a string variable
string="Extract substring from positive starting position."

#Print the string variable
printf "The main string:\n%s" "$string"

#Extract a substring from a positive starting position
substring="${string:18}"

#Print the substring value
printf "\n\nThe substring:\n%s\n" "$substring"

EXPLANATION
The code snippet substring="${string:18}" extracts a substring from the variable string, starting at index 18 and extending to the end of the string, as no length is specified.

The result displays the substring obtained from the positive starting position and continuing to the end of the string: “from positive starting position.”

iv. From the Negative Starting Index Position

The negative starting position refers to a character index counted backward from the end of the string, with -1 representing the last character. To extract a substring from a negative starting position, use the syntax substring="${string: -start_index:length}". The space before the minus sign is required.

See the following bash scripts to extract a substring from a negative position:

  1. To simply specify the negative “start_index” use the code below:
    #!/bin/bash
    
    #Define a string variable
    string="Extract substring from the negative starting position."
    
    #Print the string variable
    printf "The main string:\n%s" "$string"
    
    #Extract a substring from the negative starting position
    substring="${string: -27}"
    
    #Print the substring value
    printf "\n\nThe substring:\n%s\n" "$substring"

    EXPLANATION
    Here, substring="${string: -27}" extracts a substring starting at the negative “start_index” -27, that is, the 27th character from the end (the “n” of “negative”), and continues until the end of the string.

    The output shows the substring extracted from the negative starting position of the main string: “negative starting position.”

  2. You can set both “start_index” and “length” too. In that case, follow the script below:

    #!/bin/bash
    
    #Define a string variable
    string="Extract substring from the negative starting position."
    
    #Print the string variable
    printf "The main string:\n%s" "$string"
    
    #Extract a substring from the negative starting position
    substring="${string: -27:8}"
    
    #Print the substring value
    printf "\n\nThe substring:\n%s\n" "$substring"

    The output shows the substring extracted from the negative starting position, which is the 27th character from the end of the string, with a length of 8 characters: “negative”.
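The length can be negative as well. The following is a minimal sketch, assuming Bash 4.2 or later, where a negative length is treated as an offset from the end of the string rather than a character count. The variable names trimmed and middle are illustrative.

```bash
#!/bin/bash

string="Extract substring from the negative starting position."

# A negative length means "stop this many characters before the end".
# Here the last 10 characters (" position.") are dropped.
trimmed="${string:0:-10}"

# Start 27 characters from the end and stop 10 characters before the end
middle="${string: -27:-10}"

printf 'Trimmed: %s\n' "$trimmed"
printf 'Middle:  %s\n' "$middle"
```

On older Bash versions a negative length is rejected as an error, so this form is best avoided in scripts that must run on such systems.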

2. Using the “cut” Command

To extract the Nth through Mth characters of a string, use the “cut” command with the -c option: cut -c N-M <<< input_string. Unlike Bash’s substring expansion, cut counts characters from 1.

Check the following example:

cut -c 9-17 <<< 'Extract Substring'
EXPLANATION

Here, the cut command with the -c option extracts a substring consisting of characters 9 to 17 from the main string Extract Substring.

The output shows the extracted “Substring” from the main string “Extract Substring”.
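The -c option also accepts open-ended ranges and comma-separated lists. The sketch below uses the same sample string; the variable names from_ninth, first_seven, and initials are illustrative.

```bash
#!/bin/bash

string='Extract Substring'

# "9-" means from character 9 to the end of the line
from_ninth=$(cut -c 9- <<< "$string")

# "-7" means from the first character through character 7
first_seven=$(cut -c -7 <<< "$string")

# A comma-separated list selects individual characters
initials=$(cut -c 1,9 <<< "$string")

printf '%s\n' "$from_ninth" "$first_seven" "$initials"
```

Open-ended ranges are handy when the length of the string is not known in advance.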

3. Using the “awk” Command

The awk command is equipped with a built-in substr(s, i, n) function that you can invoke directly to obtain substrings. The “substr(s, i, n)” function has three arguments: the input string (s), the start index (i), which is counted from 1, and the length (n). To operate on the whole input line, pass $0 as the string. The syntax to extract a substring using the “awk” command is as below:

awk '{print substr($0, i, n)}'

You can check the following example:

awk '{print substr($0, 11, 9)}' <<< 'Extract a substring'
EXPLANATION

The awk command extracts a substring from the input string ‘Extract a substring’. It starts at the 11th character (‘s’) and takes 9 characters in total.

The output shows the extracted substring, “substring”, from the input string.
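When the string, start index, and length live in shell variables, one option is to hand them to awk with -v instead of piping the text in. This is a sketch only; the awk variable names s, i, and n are chosen to mirror the substr(s, i, n) notation.

```bash
#!/bin/bash

string='Extract a substring'
start=11
length=9

# Pass shell values into awk variables with -v; BEGIN runs without reading input
substring=$(awk -v s="$string" -v i="$start" -v n="$length" \
    'BEGIN { print substr(s, i, n) }')

printf 'The substring: %s\n' "$substring"
```

One caveat of this approach: awk interprets backslash escape sequences in values assigned with -v, so strings containing backslashes may need different handling.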

4. Using the “expr” Command

The expr command extracts a substring from a string based on a specific starting index and length, with the syntax expr substr input_string start_index length. Here, substr is a string operation provided by GNU expr, so it may not be available in other expr implementations. Positions are counted from 1. Check the following example:

expr substr "Extracting substring using expr" 12 9

EXPLANATION
The expr command extracts a substring from the given string, starting at position 12 and taking 9 characters.

The output displays the extracted substring, “substring”, from the main string.
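Because substr is a GNU extension to expr, a script meant to run on several systems may want a fallback. The sketch below is one possible approach, not a standard recipe: it tries expr first and falls back to Bash’s substring expansion, converting the 1-based position 12 to the 0-based index 11.

```bash
#!/bin/bash

string="Extracting substring using expr"

# Try GNU expr first; other expr implementations report an error here
if result=$(expr substr "$string" 12 9 2>/dev/null); then
    method="expr"
else
    # Fallback: expr position 12 (1-based) is index 11 (0-based) in Bash
    result="${string:11:9}"
    method="substring expansion"
fi

printf 'Extracted "%s" using %s\n' "$result" "$method"
```

Either branch yields the same text, which also illustrates the one-position shift between the two indexing conventions.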

B. Pattern-Based Extraction

Pattern-based substring extraction in Bash involves using patterns, delimiters, or regular expressions to identify and isolate specific substrings within a larger string. This is usually achieved through tools like the ‘grep’, ‘sed’, ‘awk’, or ‘cut’ commands. In this section, 3 ways of pattern-based substring extraction will be discussed.

1. Using “cut” Command

To extract a substring, utilize the “cut” command with the -d option to define a delimiter and the -f option to designate the field number of the desired substring. The syntax is, cut -d '<delimiter>' -f <field_number>.

For a pattern-based substring extraction using the cut command, use the following Bash script:

#!/bin/bash

# Declare a variable
string="Extract Substring"

# Extract the substring
substring1=$(echo "${string}" | cut -d ' ' -f 1)
substring2=$(echo "${string}" | cut -d ' ' -f 2)

#Print the string variable
printf "The main string:\n%s\n\n" "$string"

# Print the substring
echo "First substring: $substring1"
echo "Second substring: $substring2"
EXPLANATION

The cut command along with the echo command extracts the fields (substring) from the original string based on the delimiter (space), which is specified by the option -d. The -f option specifies which field to extract.

The output shows the extracted fields from the main string using the space as a delimiter.
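For a simple two-part split like this, Bash’s own pattern-removal expansions can do the same job without starting an external command. This is an alternative technique to cut, shown as a minimal sketch; the variable names first and second are illustrative.

```bash
#!/bin/bash

string="Extract Substring"

# %% removes the longest suffix matching " *" (a space and everything after it)
first="${string%% *}"

# # removes the shortest prefix matching "* " (everything up to the first space)
second="${string#* }"

echo "First substring: $first"
echo "Second substring: $second"
```

With more than two fields, second would hold everything after the first space, so cut or awk remains the simpler choice for selecting a field by number.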

2. Using “awk” Command

You can utilize the awk command along with the field separator option -F. Follow the script below to extract a pattern-based substring using the awk command:

#!/bin/bash

#Define a string variable
string="Try to extract substring using awk."

#Print the string variable
printf "The main string:\n%s\n\n" "$string"

#Extract a substring
awk -F 'to |using ' '{print $2}' <<< "$string"

EXPLANATION
Here, awk -F 'to |using ' sets the field separator -F to a regular expression that matches either “to ” or “using ”. As a result, the awk command treats the text “to ” and “using ” as separators, so the string is split into three fields. The '{print $2}' instructs awk to print the second field, which is the text between “to ” and “using ”.

The output shows the extracted substring, “extract substring”.
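One detail worth checking is whitespace at the edges of the extracted field. With the separator 'to |using ', the space that precedes “using” stays in the second field. The sketch below captures the field, makes the trailing space visible with brackets, and trims it with a parameter expansion; the variable names field and trimmed are illustrative.

```bash
#!/bin/bash

string="Try to extract substring using awk."

# Capture the second field; command substitution strips the trailing
# newline but keeps the trailing space
field=$(awk -F 'to |using ' '{print $2}' <<< "$string")
printf 'Raw field:     [%s]\n' "$field"

# Remove one trailing space, if present
trimmed="${field% }"
printf 'Trimmed field: [%s]\n' "$trimmed"
```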

3. Using “grep” Command

The grep command with the -o option prints only the part of the input that matches a given pattern. Check the following example to extract a substring using the “grep” command:

echo "Extracting substring using patterns" | grep -o 'substring'
EXPLANATION

The grep command searches for the pattern substring in the input string “Extracting substring using patterns”, and the -o option makes it print only the matched text instead of the whole line.

The output shows the matched substring from the input string.
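The -o option becomes more useful when the pattern is a regular expression rather than a fixed word. The sketch below combines it with -E (extended regular expressions); the sample line and the variable names percent and device are made up for illustration.

```bash
#!/bin/bash

# Hypothetical sample line, used only for this illustration
line="Disk usage: 42% on /dev/sda1"

# One or more digits followed by a percent sign
percent=$(grep -oE '[0-9]+%' <<< "$line")

# A device path made of lowercase letters and digits
device=$(grep -oE '/dev/[a-z0-9]+' <<< "$line")

echo "Usage:  $percent"
echo "Device: $device"
```

If a pattern matches more than once, grep -o prints each match on its own line.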

Common Issues of Bash Substring Extraction

When working with substring and its extraction, users may encounter some common issues like off-by-one errors, handling spaces, and unintended option interpretation. This section will discuss these issues with their corresponding solutions:

1. Off-By-One Errors

As Bash’s substring expansion is zero-indexed, off-by-one errors can occur if you start counting from 1 instead of 0. So be mindful that the initial character of the string is located at position 0. The “cut”, “awk”, and “expr” commands, in contrast, count from 1.

2. Handling Spaces
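A quick way to see the difference in conventions is to extract the same text with several tools. In the sketch below, the zero-based index 8 used with substring expansion corresponds to character 9 in cut and awk; the variable names are illustrative.

```bash
#!/bin/bash

string="Extract substring from middle."

# Bash substring expansion: zero-based start index 8, length 9
from_bash="${string:8:9}"

# cut: one-based character range 9 through 17
from_cut=$(cut -c 9-17 <<< "$string")

# awk: one-based start position 9, length 9
from_awk=$(awk '{print substr($0, 9, 9)}' <<< "$string")

printf 'bash: %s\ncut:  %s\nawk:  %s\n' "$from_bash" "$from_cut" "$from_awk"
```

All three lines should print the same word; if one differs by a character, the index was probably not shifted by one.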

Handling spaces in Bash substring extraction requires careful consideration. Spaces can affect the interpretation of field separators and indices. When handling spaces in Bash:

  1. Use double quotes for variable expansion ("${variable:start:length}").
  2. Quote arguments in commands to preserve spaces (cut -d ' ' -f2).
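The effect of quoting can be seen with a string that contains consecutive spaces. This is a small sketch, assuming the default IFS; the sample string and variable names are illustrative.

```bash
#!/bin/bash

# Sample string with three spaces between the words
string="alpha   beta"

# The first 9 characters: "alpha", three spaces, and "b"
substring="${string:0:9}"

# Unquoted: word splitting collapses the run of spaces into one
unquoted=$(echo $substring)

# Quoted: the spaces are preserved exactly
quoted=$(echo "$substring")

printf 'Unquoted: [%s]\n' "$unquoted"
printf 'Quoted:   [%s]\n' "$quoted"
```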

3. Unintended Option Interpretation

When utilizing negative indices, it’s important to include a space before the ‘-’ sign. Omitting this space leads Bash to interpret ${string:-10} as the ${parameter:-default} expansion, which substitutes the default value 10 only when the variable is unset or empty, rather than as a negative index. For example,

#!/bin/bash

#Define the string
string='Bash substring extraction'

#substring extraction
#with a space before the negative index
substring1=${string: -10}

#without a space before the negative index
substring2=${string:-10}

echo "The substring with space before negative index:"
echo "$substring1"
echo "The substring without space before negative index:"
echo "$substring2"

Here, because there is no space before the negative index, the second expansion does not extract a substring: the variable is set, so ${string:-10} returns the main string unchanged.
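Wrapping the negative index in parentheses is another way to keep Bash from reading “:-” as the default-value operator. The sketch below compares the forms; the variable missing is deliberately left unset to show what the default-value expansion actually does.

```bash
#!/bin/bash

string='Bash substring extraction'

# Two equivalent ways to take the last 10 characters
with_space="${string: -10}"
with_parens="${string:(-10)}"

# Without a space or parentheses this is the default-value expansion:
# it yields $string because the variable is set and non-empty
default_value="${string:-10}"

# For an unset variable (hypothetical name "missing") the default 10 is used
unset missing
unset_default="${missing:-10}"

printf '%s\n' "$with_space" "$with_parens" "$default_value" "$unset_default"
```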

Conclusion

In conclusion, mastering substrings and their manipulation is a valuable skill for any Bash script developer. It’s necessary for parsing log files and manipulating text or data. This article discusses 4 methods of index-based substring extraction and 3 methods of pattern-based substring extraction. It also covers common issues that can cause problems while working with Bash substrings, along with their solutions. I hope this guide clarifies Bash substring extraction and helps with your more advanced scripts.

People Also Ask

What are the applications of substring in Bash?

Bash substring extraction is commonly used in scripting for various purposes, such as:

  1. Data processing and extraction
  2. Text manipulation
  3. Data cleaning
  4. String manipulation in automation
  5. Filename manipulation

How do you replace a substring in Bash?

In Bash, you can replace a substring in a string using the ‘awk’ command. Here’s a simple example:

echo 'Hello, World!' | awk '{gsub(/World/, "Universe"); print}'

Here, the gsub function inside the “awk” command searches for the regular expression /World/ in the input text and replaces all occurrences with the string “Universe”.
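Bash can also perform the replacement itself through parameter expansion, without calling an external command. This is an alternative to the awk approach, shown as a minimal sketch; the sample string and variable names are illustrative.

```bash
#!/bin/bash

string='Hello, World! Goodbye, World!'

# A single slash replaces only the first match
first_only="${string/World/Universe}"

# A double slash replaces every match
all="${string//World/Universe}"

echo "$first_only"
echo "$all"
```

Note that the search text in this expansion is a shell pattern (glob), not a regular expression as in gsub.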

How to find a substring in a string in Bash?

In Bash, you can find a substring using the [[ ]] conditional construct along with the * wildcard for pattern matching. Here’s an example:

#!/bin/bash

string="Hello, World!"
substring="World"

if [[ $string == *"$substring"* ]]; then
    echo "Substring found in the string."
else
    echo "Substring not found in the string."
fi
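If the position of the match is needed as well, one possible approach is to strip everything from the match onward and measure what remains. The sketch below is illustrative only; the names prefix and index, and the use of -1 for “not found”, are conventions chosen here.

```bash
#!/bin/bash

string="Hello, World!"
substring="World"

# Remove the first occurrence of the substring and everything after it
prefix="${string%%"$substring"*}"

if [[ "$prefix" != "$string" ]]; then
    # The length of the remaining prefix is the zero-based index of the match
    index=${#prefix}
    echo "Substring found at index $index: ${string:index:${#substring}}"
else
    index=-1
    echo "Substring not found in the string."
fi
```

The index obtained this way can be fed straight back into ${string:start_index:length}, since both use zero-based counting.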

What is the “substr” function in Unix?

In Unix, ‘substr’ is not a standalone command but a string function offered by text-processing tools. In AWK, the “substr” function extracts a portion of a string based on a specific starting position (counted from 1) and an optional length. The syntax is substr(string, start [, length]).

Auhona Islam

Auhona Islam is a dedicated professional with a background in Electronics and Communication Engineering (ECE) from Khulna University of Engineering & Technology. Graduating in 2023, Auhona is currently excelling in her role as a Linux content developer executive at SOFTEKO to provide a more straightforward route for Linux users. She aims to generate compelling materials for Linux users with her knowledge and skills. She holds her enthusiasm in the realm of Machine Learning (ML), Deep Learning (DL), and Artificial Intelligence (AI). Apart from these, she has a passion for playing instruments and singing.
