How to Extract Bash Substring? [5 methods]

Q: What is the “substr” function in Unix?

The ‘substr’ function extracts a substring from a string based on a starting position and an optional length. The syntax is substr(string, start [, length]). It is a built-in function of AWK, where positions are counted from 1.

In Bash, a substring is a contiguous part of a larger string. Extracting substrings is essential for text manipulation and processing.

You can use the following methods to extract Bash substring:

Using Bash’s substring expansion: ${input_string:start_index:length}
Using the “cut” command: cut -c N-M <<< input_string
Using the “awk” command: awk '{print substr($0, start_index, length)}' <<< input_string
Using the “expr” command: expr substr input_string start_index length
Using the “grep” command: echo input_string | grep -o 'substring'

There are two types of bash substring extraction: index-based and pattern-based.

In this article, I’ll explain 4 methods of index-based substring extraction and 3 methods of pattern-based substring extraction in Bash. So let’s get started!

A. Index-Based Substring Extraction

Index-based extraction pulls a substring out of the original string using character positions. Bash’s substring expansion is zero-indexed, while the “cut”, “awk”, and “expr” commands count positions from 1. You can extract a substring based on the index in various ways: Bash’s substring expansion, the “cut” command, the “awk” command, and the “expr” command.
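As a rough comparison of the indexing conventions, the sketch below extracts the same word with three of the methods covered in this section. The sample string is borrowed from the “cut” example later in the article, and the variable names are only illustrative.

```shell
#!/bin/bash

string="Extract Substring"

# Bash substring expansion counts from 0: "Substring" starts at offset 8
from_bash="${string:8:9}"

# cut and awk count from 1: the same word starts at position 9
from_cut=$(cut -c 9-17 <<< "$string")
from_awk=$(awk '{print substr($0, 9, 9)}' <<< "$string")

# All three lines print: Substring
printf '%s\n' "$from_bash" "$from_cut" "$from_awk"
```

The only difference between the calls is the starting number: 8 for the Bash expansion and 9 for the external commands.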

1. Using Bash’s Substring Expansion

The simplest method to extract a substring from a string is to use the expression ${string:start_index:length} where the string variable holds the main text or string. The start_index denotes the initial position of characters from which the extraction begins, while the length specifies the size of the resulting substring.

You can check the following examples of substring extraction for a clearer understanding of the topic:

i. From the Start of the String

To extract a substring from the start of the string, set the “start_index” value to 0 and set the “length” to the number of characters you need. For example, the following script extracts a substring of length 11 from the start of the string:

#!/bin/bash

#Define a string variable
string="Linuxsimply and Linux"


#Print the string variable
printf 'The main string:\n%s' "$string"

#Extract a substring from the first character of the string
substring="${string:0:11}"

#Print the substring value
printf '\n\nThe substring:\n%s\n' "$substring"

EXPLANATION

The syntax "${string:0:11}" starts at the first character (index 0) of the string variable and takes 11 characters.

The output shows the extracted substring, Linuxsimply, which is the first 11 characters of the main string.

ii. From the Middle of the String

To extract a substring from the middle of the string, set the “start_index” to a position between 0 and the last index of the string, and specify the “length”.

You can check the following example to extract a substring of length 9 from the original string, starting at index 8:

#!/bin/bash

#Define a string variable
string="Extract substring from middle."

#Print the string variable
printf 'The main string:\n%s' "$string"

#Extract a substring from the middle of the string
substring="${string:8:9}"

#Print the substring value
printf '\n\nThe substring:\n%s\n' "$substring"

EXPLANATION

In the script, substring="${string:8:9}" extracts a substring from the string variable. The substring runs from index 8 to index 16. As the substring length is 9, the ending index is 8+9-1=16.

The output shows the extracted substring, substring, taken from the middle of the main string.
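If you know the start and end indices rather than the length, the same arithmetic can be reversed: length = end - start + 1. The sketch below assumes zero-based, inclusive indices; the variable names start, end, and length are illustrative.

```shell
#!/bin/bash

string="Extract substring from middle."

start=8    # zero-based index of the first character to keep
end=16     # zero-based index of the last character to keep (inclusive)

# Convert the inclusive end index into a length
length=$(( end - start + 1 ))

# Offset and length are evaluated as arithmetic, so plain names work here
substring="${string:start:length}"

printf '%s\n' "$substring"   # prints: substring
```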

iii. From the Positive Index Position

Positive position refers to the positions or indices counted from the beginning of the string, starting with 0 for the first character.

To extract a substring from a positive position through the end of the string, provide only the “start_index” from which the extraction should begin. Follow the script below:

#!/bin/bash

#Define a string variable
string="Extract substring from positive starting position."

#Print the string variable
printf 'The main string:\n%s' "$string"

#Extract a substring from a positive starting position
substring="${string:18}"

#Print the substring value
printf '\n\nThe substring:\n%s\n' "$substring"

EXPLANATION

The code snippet substring="${string:18}" extracts a substring from the variable string, starting at index 18 and extending to the end of the string because no length is given.

The result displays the substring from index 18 to the end of the string: from positive starting position.

iv. From the Negative Starting Index Position

A negative starting position counts characters backward from the end of the string, with -1 referring to the last character. To extract a substring from a negative starting position, use the syntax substring="${string: -start_index:length}", keeping the space between the colon and the minus sign.

See the following bash scripts to extract a substring from a negative position:

To simply specify the negative “start_index” use the code below:
```
#!/bin/bash

#Define a string variable
string="Extract substring from the negative starting position."

#Print the string variable
printf 'The main string:\n%s' "$string"

#Extract a substring from the negative starting position
substring="${string: -27}"

#Print the substring value
printf '\n\nThe substring:\n%s\n' "$substring"
```
EXPLANATION
Here, substring="${string: -27}" starts at the 27th character from the end of the string, which is the “n” of “negative”, and continues until the end of the string.

The output shows the extracted substring, negative starting position., taken from the negative starting position of the main string.

You can set both “start_index” and “length” too. In that case, follow the script below:
```
#!/bin/bash

#Define a string variable
string="Extract substring from the negative starting position."

#Print the string variable
printf 'The main string:\n%s' "$string"

#Extract a substring from the negative starting position
substring="${string: -27: 8}"

#Print the substring value
printf '\n\nThe substring:\n%s\n' "$substring"
```
The output shows the extracted substring, negative. It starts at the 27th character from the end of the string and is 8 characters long.
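Bash 4.2 and later also accept a negative length, which is then read as a position counted from the end of the string rather than as a character count. A minimal sketch, assuming such a Bash version and reusing the string from the scripts above:

```shell
#!/bin/bash

string="Extract substring from the negative starting position."

# Start 27 characters from the end and stop 10 characters before the end.
# The negative length (-10) is an end offset, not a count (needs Bash 4.2+).
substring="${string: -27:-10}"

printf '%s\n' "$substring"   # prints: negative starting
```

Here the last 10 characters (the space followed by “position.”) are left out, so 17 of the final 27 characters remain.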

2. Using the “cut” command

To extract the Nth through Mth characters of a string, use the “cut” command with the -c option. The syntax is cut -c N-M <<< input_string, where positions are counted from 1.

Check the following example:

cut -c 9-17 <<< 'Extract Substring'

EXPLANATION

Here, the cut command with the -c option extracts a substring consisting of characters 9 to 17 from the main string Extract Substring.

The output shows the extracted “Substring” from the main string “Extract Substring”.
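The -c option also accepts open-ended ranges: -M means “from the first character to the Mth” and N- means “from the Nth character to the end”. The sketch below shows both forms on the same string; the variable names are illustrative.

```shell
#!/bin/bash

string='Extract Substring'

# Characters 1 through 7
first_word=$(cut -c -7 <<< "$string")

# Character 9 through the end of the line
second_word=$(cut -c 9- <<< "$string")

printf '%s\n' "$first_word"    # prints: Extract
printf '%s\n' "$second_word"   # prints: Substring
```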

3. Using the “awk” Command

The awk command has a built-in substr(s, i, n) function that you can call directly to obtain substrings. The function takes three arguments: the input string (s), the start position (i, counted from 1), and the length (n). The syntax to extract a substring using the “awk” command is as below:

awk '{print substr(s, i, n)}'

You can check the following example:

awk '{print substr($0, 11, 9)}' <<< 'Extract a substring'

EXPLANATION

The awk command extracts a substring from the input string ‘Extract a substring’, which awk exposes as $0. It starts at the 11th character (‘s’) and takes 9 characters.

The output shows the extracted substring, substring, from the input string.
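When the string and the start position live in shell variables, one option is to hand them to awk with -v instead of piping text in. The sketch below also leaves out the length, in which case substr returns everything from the start position to the end of the string. The names s, i, and start are illustrative.

```shell
#!/bin/bash

string='Extract a substring'
start=11   # one-based position, as awk expects

# -v copies shell values into awk variables; BEGIN runs without reading input
substring=$(awk -v s="$string" -v i="$start" 'BEGIN { print substr(s, i) }')

printf '%s\n' "$substring"   # prints: substring
```

Note that awk interprets backslash escapes in values passed with -v, so this approach suits plain text without backslashes.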

4. Using the “expr” command

The expr command extracts a substring from a string based on a starting position (counted from 1) and a length. The syntax is expr substr input_string start_index length. Here, substr is an operator provided by GNU expr; it is not part of POSIX expr, so it may be unavailable on non-GNU systems. Check the following example:

expr substr "Extracting substring using expr" 12 9

EXPLANATION

The expr command extracts a substring from the given string, starting at position 12 and taking 9 characters.

The output displays the extracted substring, substring, from the main string.
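On systems where expr lacks substr, a possible workaround is the standard “:” matching operator of expr, which prints the text captured by \( \) in a basic regular expression anchored at the start of the string. This is a sketch of an alternative technique, not the method described above: it skips 11 characters and captures the next 9.

```shell
#!/bin/sh

string="Extracting substring using expr"

# .\{11\} skips the first 11 characters; \(.\{9\}\) captures the next 9
substring=$(expr "$string" : '.\{11\}\(.\{9\}\)')

printf '%s\n' "$substring"   # prints: substring
```

The skip count is the one-based start position minus one (12 - 1 = 11).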

B. Pattern-Based Extraction

Pattern-based substring extraction in Bash involves using patterns or regular expressions to identify and isolate specific substrings within a larger string. This is usually achieved through tools like the ‘grep’, ‘sed’, or ‘awk’ commands. This section discusses 3 ways of pattern-based substring extraction, using the “cut”, “awk”, and “grep” commands.

1. Using “cut” Command

To extract a substring, utilize the “cut” command with the -d option to define a delimiter and the -f option to designate the field number of the desired substring. The syntax is, cut -d '<delimiter>' -f <field_number>.

For a pattern-based substring extraction using the cut command, use the following Bash script:

#!/bin/bash

# Declare a variable
string="Extract Substring"

# Extract the substring
substring1=$(echo "$string" | cut -d ' ' -f 1)
substring2=$(echo "$string" | cut -d ' ' -f 2)

#Print the string variable
printf 'The main string:\n%s\n\n' "$string"

# Print the substring
echo "First substring: $substring1"
echo "Second substring: $substring2"

EXPLANATION

The cut command along with the echo command extracts the fields (substring) from the original string based on the delimiter (space), which is specified by the option -d. The -f option specifies which field to extract.

The output shows the two fields extracted from the main string using the space as a delimiter: Extract and Substring.
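The same -d and -f options work with any single-character delimiter, and -f accepts a comma-separated list of fields. The sketch below uses a made-up colon-separated record (the record and variable names are hypothetical) and feeds it to cut with a here-string instead of echo.

```shell
#!/bin/bash

# Hypothetical colon-separated record, for illustration only
record='alice:x:1000:/home/alice'

# Single fields
user=$(cut -d ':' -f 1 <<< "$record")
home=$(cut -d ':' -f 4 <<< "$record")

# Several fields at once; cut joins them with the same delimiter
user_and_uid=$(cut -d ':' -f 1,3 <<< "$record")

printf '%s\n' "$user"           # prints: alice
printf '%s\n' "$home"           # prints: /home/alice
printf '%s\n' "$user_and_uid"   # prints: alice:1000
```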

2. Using “awk” Command

You can utilize the awk command along with the field separator option -F. Follow the below script to extract pattern-based substring using the awk command:

#!/bin/bash

#Define a string variable
string="Try to extract substring using awk."

#Print the string variable
printf 'The main string:\n%s\n\n' "$string"

#Extract a substring
awk -F 'to |using ' '{print $2}' <<< "$string"

EXPLANATION

Here, awk -F 'to |using ' sets the field separator to a regular expression that matches either “to ” or “using ”. The awk command splits the string wherever the separator matches, so the string is divided into three fields: “Try ”, “extract substring ”, and “awk.”. The '{print $2}' instructs awk to print the second field, which is the text between “to ” and “using ”.

The output shows the extracted substring, extract substring, followed by a trailing space because the space before “using” belongs to the second field.
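One way to tighten the separator is to include the surrounding spaces in it, so that the fields come back without stray whitespace and the pattern cannot match “to ” at the end of a longer word. This is a sketch of a variation on the script above, not a replacement for it.

```shell
#!/bin/bash

string="Try to extract substring using awk."

# The separator now matches " to " or " using ", spaces included
substring=$(awk -F ' to | using ' '{print $2}' <<< "$string")

# Brackets make the field boundaries visible
printf '[%s]\n' "$substring"   # prints: [extract substring]
```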

3. Using “grep” Command

With the -o option, the grep command prints only the part of a line that matches a pattern instead of the whole line. Check the following example to extract a substring using the “grep” command:

echo "Extracting substring using patterns" | grep -o 'substring'

EXPLANATION

The grep command with the -o option searches for the pattern substring in the input string “Extracting substring using patterns” and prints only the matching text.

The output shows the matched text, substring, from the input string.
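The -o option becomes more useful when the pattern is a regular expression rather than fixed text. The sketch below combines it with -E (extended regular expressions) to pull a date out of a made-up log line; the line and the variable names are hypothetical.

```shell
#!/bin/bash

# Hypothetical input line, for illustration only
line='Order 66 shipped on 2024-05-17'

# Match four digits, a dash, two digits, a dash, two digits
shipped_on=$(grep -oE '[0-9]{4}-[0-9]{2}-[0-9]{2}' <<< "$line")

printf '%s\n' "$shipped_on"   # prints: 2024-05-17
```

If the pattern matches several times on a line, grep -o prints each match on its own line.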

Common Issues of Bash Substring Extraction

When working with substring and its extraction, users may encounter some common issues like off-by-one errors, handling spaces, and unintended option interpretation. This section will discuss these issues with their corresponding solutions:

1. Off-By-One Errors

Bash’s substring expansion is zero-indexed, so off-by-one errors can occur if you start counting from 1 instead of 0. Keep in mind that the first character of the string is at position 0 in ${string:start:length}, whereas “cut -c”, awk’s substr, and “expr substr” count from 1.

2. Handling spaces

Handling spaces in Bash substring extraction requires careful consideration. Spaces can affect the interpretation of field separators and indices. When handling spaces in Bash:

Use double quotes for variable expansion ("${variable:start:length}").
Quote arguments in commands to preserve spaces (cut -d ' ' -f2).
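The effect of quoting can be seen with a string that contains runs of spaces. The sketch below assumes the default IFS; the string and variable names are illustrative.

```shell
#!/bin/bash

# Three spaces between each word
string='Bash   substring   extraction'

# Unquoted: word splitting collapses each run of spaces into one space
unquoted=$(echo $string | cut -d ' ' -f 2)

# Quoted: every space is kept, so field 2 is the empty text between
# the first and second space
quoted=$(echo "$string" | cut -d ' ' -f 2)

printf '[%s]\n' "$unquoted"   # prints: [substring]
printf '[%s]\n' "$quoted"     # prints: []
```

Neither result is wrong in itself; the point is that quoting decides which text cut actually receives, so the field numbers have to match the quoted form.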

3. Unintended Option Interpretation

When using negative indices, it’s important to include a space before the ‘-’. Without the space, Bash reads ${string:-10} as the “use default value” expansion ${parameter:-word}, which returns the value of string if it is set and non-empty and the word 10 otherwise, rather than as a negative index. For example,

#!/bin/bash

#Define the string
string='Bash substring extraction'

#substring extraction
#with a space before the negative index
substring1=${string: -10}

#without a space before the negative index
substring2=${string:-10}

echo "The substring with space before negative index:"
echo "$substring1"
echo "The substring without space before negative index:"
echo "$substring2"

Here, because there is no space before the negative index, the second expansion is treated as a default-value expansion. Since string is set, its whole value is returned, so the main string is shown instead of the extracted substring.
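The default-value behaviour is easier to see with an unset variable, and wrapping the negative index in parentheses is another way to avoid the ambiguity. A minimal sketch, where missing is a hypothetical variable name:

```shell
#!/bin/bash

string='Bash substring extraction'

# With an unset variable, ${var:-10} yields the default word "10"
unset missing
default_value="${missing:-10}"

# With a set variable, the same syntax yields the whole value
whole_string="${string:-10}"

# Parentheses mark -10 as an offset without needing a space
last_ten="${string:(-10)}"

printf '%s\n' "$default_value"   # prints: 10
printf '%s\n' "$whole_string"    # prints: Bash substring extraction
printf '%s\n' "$last_ten"        # prints: extraction
```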

Conclusion

In conclusion, extracting and manipulating substrings is a valuable skill for any Bash script developer, and it is needed for tasks such as parsing log files and processing text or data. This article discussed 4 methods of index-based substring extraction and 3 methods of pattern-based substring extraction. It also covered the common issues that can cause problems while working with Bash substrings, along with their solutions. I hope this guide clarifies how substring extraction works in Bash and helps with your more advanced scripts.

Auhona Islam
