In Bash, a substring is a contiguous part of a larger string. Substring extraction is an essential technique for text manipulation and processing.
You can use the following methods to extract Bash substring:
- Using Bash’s substring expansion:
${input_string:start_index:length}
- Using the “cut” command:
cut -c N-M <<< input_string
- Using the “awk” command:
awk '{print substr($0, start_index, length)}' <<< input_string
- Using the “expr” command:
expr substr input_string start_index length
- Using the “grep” command:
echo input_string | grep -o 'substring'
There are two types of Bash substring extraction: index-based and pattern-based.
In this article, I’ll explain 4 methods of index-based substring extraction and 3 methods of pattern-based substring extraction in Bash. So let’s get started!
A. Index-Based Substring Extraction
Index-based extraction pulls a substring out of an original string using character positions: a starting position plus a length (or, for cut, a start and an end position). Bash’s own substring expansion is zero-indexed, so the first character is at index 0, whereas the cut, awk, and expr commands count positions from 1. You can extract a substring by index using Bash’s substring expansion, the “cut” command, the “awk” command, or the “expr” command.
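To see the zero-based indexing of Bash’s substring expansion, the following sketch prints every index of a short example string next to the character at that position. The string “Bash” is only an illustrative value.

```bash
#!/bin/bash
# Illustrative string; any value works
string="Bash"

# ${#string} is the length; ${string:i:1} takes one character at index i
for (( i = 0; i < ${#string}; i++ )); do
  printf '%d: %s\n' "$i" "${string:i:1}"
done
```

The loop prints 0: B, 1: a, 2: s, and 3: h, so the last index is always one less than the length.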
1. Using Bash’s Substring Expansion
The simplest method to extract a substring from a string is the expansion ${string:start_index:length}, where the string variable holds the main text, start_index is the zero-based position at which extraction begins, and length specifies the number of characters in the resulting substring.
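Both start_index and length are evaluated as arithmetic expressions, so they can be variables or calculations. The sketch below (the variable names start and len are illustrative) takes a slice by variables and then computes an offset from the string length.

```bash
#!/bin/bash
string="substring"            # 9 characters
start=3
len=3

# Names inside the expansion are evaluated arithmetically ($ is optional)
middle="${string:start:len}"          # "str"
# The offset can also be computed from the length: 9/2 = 4 (integer division)
second_half="${string:${#string}/2}"  # "tring"

echo "$middle"
echo "$second_half"
```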
You can check the following examples of substring extraction for a clearer understanding of the topic:
i. From the Start of the String
To extract a substring from the start of the string, set the “start_index” value to 0 and specify the “length” you need. For example, to extract the first 11 characters, you can use the following script:
#!/bin/bash
#Define a string variable
string="Linuxsimply and Linux"
#Print the string variable
printf 'The main string:\n%s' "$string"
#Extract a substring from the first character of the string
substring="${string:0:11}"
#Print the substring value
printf '\n\nThe substring:\n%s\n' "$substring"
"${string:0:11}" extracts 11 characters of the string variable, starting from the first character (index 0). The output shows the extracted substring, Linuxsimply.
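When the start index is 0, Bash also lets you leave it empty, because an empty offset evaluates to 0. This short check reuses the example string from the script above to show that ${string::11} and ${string:0:11} give the same result.

```bash
#!/bin/bash
string="Linuxsimply and Linux"

# An empty offset is treated as 0
a="${string:0:11}"
b="${string::11}"

printf '%s\n%s\n' "$a" "$b"   # both lines print Linuxsimply
```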
ii. From the Middle of the String
To extract a substring from the middle of the string, set the “start_index” to a value greater than 0 and specify the “length”.
You can check the following example to extract a substring of length 9 from the original string, starting at index 8:
#!/bin/bash
#Define a string variable
string="Extract substring from middle."
#Print the string variable
printf 'The main string:\n%s' "$string"
#Extract a substring from the middle of the string
substring="${string:8:9}"
#Print the substring value
printf '\n\nThe substring:\n%s\n' "$substring"
substring="${string:8:9}" extracts 9 characters of the string variable, starting at index 8. Because the substring is 9 characters long, it ends at index 8+9-1=16. The output shows “substring”, taken from the middle of the main string.
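If you know the start and end indices instead of the length, the arithmetic above (length = end - start + 1) can be wrapped in a small function. substr_between is a hypothetical helper name used only for this sketch; both indices are zero-based and inclusive.

```bash
#!/bin/bash
# substr_between STRING START END
# Hypothetical helper: prints the characters from index START to index END
# (both zero-based and inclusive) by converting END into a length.
substr_between() {
  local str=$1 start=$2 end=$3
  local len=$(( end - start + 1 ))   # inclusive range -> length
  printf '%s\n' "${str:start:len}"
}

substr_between "Extract substring from middle." 8 16   # prints: substring
```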
iii. From the Positive Index Position
Positive position refers to the positions or indices counted from the beginning of the string, starting with 0 for the first character.
To extract everything from a given position onward, provide only the “start_index”; when the length is omitted, extraction continues to the end of the string. Follow the script below:
#!/bin/bash
#Define a string variable
string="Extract substring from positive starting position."
#Print the string variable
printf 'The main string:\n%s' "$string"
#Extract a substring from a positive starting position
substring="${string:18}"
#Print the substring value
printf '\n\nThe substring:\n%s\n' "$substring"
substring="${string:18}" extracts a substring of the variable string that starts at index 18 and, because no length is given, continues to the end of the string. The output is from positive starting position.
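The offset-only form is handy for fixed-width data. As an illustration, assuming a log line whose timestamp always occupies the first 19 characters followed by one space, ${line:20} drops the timestamp and keeps the message. The log format here is an assumption made for the example.

```bash
#!/bin/bash
# Illustrative log line: "YYYY-MM-DD HH:MM:SS" is 19 characters, then a space
line="2024-05-01 12:00:00 ERROR disk full"

timestamp="${line:0:19}"   # first 19 characters: date and time
message="${line:20}"       # everything after the timestamp and the space

printf 'Timestamp: %s\nMessage: %s\n' "$timestamp" "$message"
```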
iv. From the Negative Starting Index Position
A negative starting position counts backward from the end of the string, with -1 representing the last character. To extract a substring from a negative starting position, use the syntax substring="${string: -N:length}", where N is the number of characters counted back from the end and length is optional. Keep the space between the colon and the minus sign.
See the following bash scripts to extract a substring from a negative position:
- To specify only the negative “start_index”, use the code below:
#!/bin/bash
#Define a string variable
string="Extract substring from the negative starting position."
#Print the string variable
printf 'The main string:\n%s' "$string"
#Extract a substring from the negative starting position
substring="${string: -27}"
#Print the substring value
printf '\n\nThe substring:\n%s\n' "$substring"
Here, substring="${string: -27}" starts at offset -27, which is the “n” of “negative” (the 27th character from the end), and continues until the end of the string. The output is the extracted substring: negative starting position.
- You can also set both “start_index” and “length”. In that case, follow the script below:
#!/bin/bash
#Define a string variable
string="Extract substring from the negative starting position."
#Print the string variable
printf 'The main string:\n%s' "$string"
#Extract a substring from the negative starting position
substring="${string: -27:8}"
#Print the substring value
printf '\n\nThe substring:\n%s\n' "$substring"
The output shows the extracted substring negative, which starts at the 27th character from the end of the main string and is 8 characters long.
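Since Bash 4.2, the length may also be negative, in which case it is treated as an offset from the end of the string rather than a character count. This sketch uses that to take everything from the 27th character from the end up to, but not including, the final period.

```bash
#!/bin/bash
string="Extract substring from the negative starting position."

# Negative offset: start 27 characters from the end
# Negative length (Bash 4.2+): stop 1 character before the end
trimmed="${string: -27:-1}"

printf '%s\n' "$trimmed"   # negative starting position
```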
2. Using the “cut” command
To extract the Nth to Mth characters of a string, use the “cut” command with the -c option and the syntax cut -c N-M <<< input_string. Unlike Bash’s substring expansion, cut counts characters from 1, and the range N-M includes both ends.
Check the following example:
cut -c 9-17 <<< 'Extract Substring'
Here, the cut command with the -c option extracts characters 9 to 17 from the main string Extract Substring.
The output shows the extracted “Substring” from the main string “Extract Substring”.
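cut also accepts open-ended ranges (N- for “from N to the end”, -M for “from the start to M”) and applies the range to every input line. The id= lines below are illustrative data.

```bash
#!/bin/bash
# Open-ended ranges
cut -c 9- <<< 'Extract Substring'   # Substring
cut -c -7 <<< 'Extract Substring'   # Extract

# The same range is applied to each line of input
printf 'id=001 alpha\nid=002 beta\n' | cut -c 4-6   # 001, then 002
```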
3. Using the “awk” Command
The awk command has a built-in substr(s, i, n) function for obtaining substrings. It takes three arguments: the input string (s), the start position (i), and the length (n). Like cut, awk counts positions from 1. To extract a substring from each input line, where $0 holds the whole line, use the syntax below:
awk '{print substr($0, i, n)}' <<< input_string
You can check the following example:
awk '{print substr($0, 11, 9)}' <<< 'Extract a substring'
The awk command extracts a substring from the input string ‘Extract a substring’, starting at the 11th character (‘s’) and taking 9 characters.
The output is the extracted substring: substring.
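Shell variables are not visible inside a single-quoted awk program, so a common approach is to pass positions in with -v. The sketch below also shows that omitting the length makes substr run to the end of the string. The names start and len are illustrative.

```bash
#!/bin/bash
string='Extract a substring'
start=11
len=9

# -v copies shell values into awk variables s and n
awk -v s="$start" -v n="$len" '{ print substr($0, s, n) }' <<< "$string"   # substring

# Without the length argument, substr returns everything from position s
awk -v s=9 '{ print substr($0, s) }' <<< "$string"                         # a substring
```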
4. Using the “expr” command
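The expr command extracts a substring based on a starting position and a length. Its syntax is expr substr input_string start_index length, where substr is a subcommand of expr and positions are counted from 1. Note that substr is a GNU extension rather than part of POSIX expr, so it may be unavailable on BSD or macOS systems. Check the following example: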
expr substr "Extracting substring using awk" 12 9
The output is substring: the 9 characters starting at position 12 of the main string.
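The four index-based methods use different counting conventions. This sketch extracts the same 9-character word with each one; note that the Bash offset is 8 while cut, awk, and expr start at 9. It assumes GNU expr, since substr is not available in every expr implementation.

```bash
#!/bin/bash
string="Extract substring"

# Bash expansion: zero-based offset plus a length
a="${string:8:9}"
# cut: one-based, inclusive start-end range
b=$(cut -c 9-17 <<< "$string")
# awk: one-based start plus a length
c=$(awk '{ print substr($0, 9, 9) }' <<< "$string")
# GNU expr: one-based start plus a length
d=$(expr substr "$string" 9 9)

printf '%s\n' "$a" "$b" "$c" "$d"   # "substring" four times
```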
B. Pattern-Based Extraction
Pattern-based substring extraction in Bash identifies and isolates substrings by a delimiter, pattern, or regular expression rather than by position. This is usually done with tools such as ‘cut’, ‘awk’, ‘grep’, or ‘sed’. In this section, 3 ways of pattern-based substring extraction will be discussed.
1. Using “cut” Command
To extract a substring by delimiter, use the “cut” command with the -d option to define a delimiter and the -f option to select the field number of the desired substring. The syntax is cut -d '<delimiter>' -f <field_number>.
For a pattern-based substring extraction using the cut command, use the following Bash script:
#!/bin/bash
# Declare a variable
string="Extract Substring"
# Extract the substring
substring1=$(echo "$string" | cut -d ' ' -f 1)
substring2=$(echo "$string" | cut -d ' ' -f 2)
#Print the string variable
printf 'The main string:\n%s\n\n' "$string"
# Print the substring
echo "First substring: $substring1"
echo "Second substring: $substring2"
The cut command, fed by the echo command, extracts the fields (substrings) from the original string based on the delimiter (space) specified by the -d option. The -f option specifies which field to extract.
The output shows the extracted fields from the main string using the space as a delimiter.
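The -f option also accepts lists and ranges of fields, and the delimiter can be any single character. The sketch below uses a passwd-style line (the values are made up) with : as the delimiter. When several fields are selected, cut joins them with the delimiter.

```bash
#!/bin/bash
# Illustrative passwd-style record: name:password:uid:gid:comment:home:shell
record="alice:x:1000:1000::/home/alice:/bin/bash"

user_and_home=$(cut -d ':' -f 1,6 <<< "$record")   # alice:/home/alice
from_uid=$(cut -d ':' -f 3- <<< "$record")         # 1000:1000::/home/alice:/bin/bash

printf '%s\n%s\n' "$user_and_home" "$from_uid"
```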
2. Using “awk” Command
You can use the awk command along with the field separator option -F. Follow the script below to extract a pattern-based substring with the awk command:
#!/bin/bash
#Define a string variable
string="Try to extract substring using awk."
#Print the string variable
printf 'The main string:\n%s\n\n' "$string"
#Extract a substring
awk -F 'to |using ' '{print $2}' <<< "$string"
awk -F 'to |using ' sets the field separator (-F) to a regular expression that matches either “to ” or “using ”. As a result, awk treats “to ” and “using ” as separators and splits the string into three fields: “Try ”, “extract substring ”, and “awk.”. The '{print $2}' instructs awk to print the second field, which is the text between “to ” and “using ”. The output shows the extracted substring, extract substring (including its trailing space).
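Because the separators only consume “to ” and “using ”, the second field keeps the space that preceded “using”. One way to drop it, sketched below, is awk’s sub function applied to the field before printing.

```bash
#!/bin/bash
string="Try to extract substring using awk."

# sub(/ +$/, "", $2) deletes trailing spaces from field 2 in place
awk -F 'to |using ' '{ sub(/ +$/, "", $2); print $2 }' <<< "$string"   # extract substring
```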
3. Using “grep” Command
With the -o option, the grep command prints only the part of the input that matches a pattern rather than the whole line. Check the following example to extract a substring using the “grep” command:
echo "Extracting substring using patterns" | grep -o 'substring'
The grep command with the -o option searches for the pattern substring in the input string “Extracting substring using patterns” and prints only the matched text.
The output shows the matched substring from the input string.
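grep -o becomes more useful with a regular expression, because it then extracts whatever text matches rather than a fixed word. Using -E (extended regular expressions), the sketch below pulls a date out of an illustrative message and then prints each run of digits on its own line.

```bash
#!/bin/bash
message="Order 1234 shipped on 2024-05-01"

# Only the text matching the date pattern is printed
grep -oE '[0-9]{4}-[0-9]{2}-[0-9]{2}' <<< "$message"   # 2024-05-01

# Every match goes on its own line
grep -oE '[0-9]+' <<< "$message"   # 1234, 2024, 05, 01 (one per line)
```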
Common Issues of Bash Substring Extraction
When working with substring and its extraction, users may encounter some common issues like off-by-one errors, handling spaces, and unintended option interpretation. This section will discuss these issues with their corresponding solutions:
1. Off-By-One Errors
Bash’s substring expansion is zero-indexed, while cut, awk, and expr count positions from 1. Off-by-one errors occur when these conventions get mixed up, for example by counting from 1 inside ${string:start:length}. Remember that the first character is at index 0 in Bash’s expansion and at position 1 in cut, awk, and expr.
2. Handling spaces
Handling spaces in Bash substring extraction requires careful consideration. Spaces can affect the interpretation of field separators and indices. When handling spaces in Bash:
- Use double quotes around variable expansions, for example "${variable:start:length}".
- Quote arguments in commands to preserve spaces, for example cut -d ' ' -f2.
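The effect of quoting is easy to see with a value that contains repeated spaces. Unquoted, the expansion undergoes word splitting, so echo receives separate words and joins them with single spaces; quoted, the spaces survive. The string is illustrative.

```bash
#!/bin/bash
string="Bash    substring"   # four spaces in the middle

unquoted=$(echo ${string:0:13})     # word splitting collapses the spaces
quoted=$(echo "${string:0:13}")     # spaces are preserved

printf '[%s]\n[%s]\n' "$unquoted" "$quoted"
```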
3. Unintended Option Interpretation
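When using negative indices, it’s important to include a space between the colon and the minus sign (‘-’). Without the space, Bash reads :- as the “use default value” operator instead of a negative offset: ${string:-10} expands to 10 only if string is unset or empty, and otherwise expands to the whole value of string, so no substring is extracted. For example,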
#!/bin/bash
#Define the string
string='Bash substring extraction'
#substring extraction
#with a space before the negative index
substring1=${string: -10}
#without a space before the negative index
substring2=${string:-10}
echo "The substring with space before negative index:"
echo "$substring1"
echo "The substring without space before negative index:"
echo "$substring2"
Here, because there is no space before the negative index, ${string:-10} is parsed as a default-value expansion rather than a substring expansion. Since string is set and non-empty, the second echo prints the whole main string instead of the last 10 characters.
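Besides adding a space, you can wrap a negative offset in parentheses so that Bash cannot mistake it for the default-value operator. The sketch compares the forms on the same string and then shows when the default value actually appears.

```bash
#!/bin/bash
string='Bash substring extraction'

with_space="${string: -10}"     # extraction
with_parens="${string:(-10)}"   # extraction
default_value="${string:-10}"   # whole string: string is set, so :- keeps its value

printf '%s\n%s\n%s\n' "$with_space" "$with_parens" "$default_value"

unset string
echo "${string:-10}"            # prints 10: the default is used when string is unset
```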
Conclusion
In conclusion, mastering substrings and their manipulation is a valuable skill for any Bash script developer. It’s necessary for tasks such as parsing log files and manipulating text or data. This article discusses 4 methods of index-based substring extraction and 3 methods of pattern-based substring extraction. It also covers common issues that can cause problems when working with Bash substrings, along with their solutions. I hope this guide clarifies Bash substrings and their extraction and helps you with more advanced tasks.
People Also Ask
What are the applications of substring in Bash?
Bash substring extraction is commonly used in scripting for various purposes, such as:
- Data processing and extraction
- Text manipulation
- Data cleaning
- String manipulation in automation
- Filename manipulation
How do you replace a substring in Bash?
In Bash, you can replace a substring in a string using the ‘awk’ command. Here’s a simple example:
echo 'Hello, World!' | awk '{gsub(/World/, "Universe"); print}'
Here, gsub function inside the “awk” command searches for the regular expression /World/ in the input text and replaces all occurrences with the string “Universe”.
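Bash can also replace substrings without an external command, using parameter expansion: ${string/pattern/replacement} replaces the first match and ${string//pattern/replacement} replaces every match. Note that these patterns are glob patterns, not regular expressions.

```bash
#!/bin/bash
string='Hello, World!'

first="${string/World/Universe}"   # Hello, Universe!
all="${string//o/0}"               # Hell0, W0rld!

printf '%s\n%s\n' "$first" "$all"
```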
How to find a substring in a string in Bash?
In Bash, you can find a substring using the [[ ]] operator along with the * wildcard for pattern matching. Here’s an example:
#!/bin/bash
string="Hello, World!"
substring="World"
if [[ $string == *"$substring"* ]]; then
echo "Substring found in the string."
else
echo "Substring not found in the string."
fi
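If you also need to know where the substring occurs, one approach is to strip everything from the first match onward with ${string%%"$substring"*} and measure what remains. In the sketch below, index_of is a hypothetical helper name; it returns a zero-based index, matching Bash’s substring expansion, or -1 when there is no match.

```bash
#!/bin/bash
# index_of STRING SUBSTRING
# Hypothetical helper: prints the zero-based index of the first occurrence
# of SUBSTRING in STRING, or -1 if it does not occur.
index_of() {
  local string=$1 substring=$2
  if [[ $string != *"$substring"* ]]; then
    echo -1
    return
  fi
  # %% removes the longest suffix starting at a match, leaving the prefix
  local prefix="${string%%"$substring"*}"
  echo "${#prefix}"
}

index_of "Hello, World!" "World"   # 7
index_of "Hello, World!" "Mars"    # -1
```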
What is the “substr” function in Unix?
‘substr’ is a built-in string function of awk, the Unix text-processing language. It extracts a substring from a string based on a starting position and an optional length. The syntax is substr(string, start [, length]), where start counts from 1; if length is omitted, the substring runs to the end of the string.