Regular Expressions
With the Regex crate added to our project, we’ll replace the pattern string
slice we’ve been using with a regular expression.
Using Regex
The Regex type defines a new function that takes a regular expression,
attempts to compile it, and on success returns a Regex object. [1]
let pattern = "[Ee]xample";
let re = Regex::new(pattern);
Since compiling a regular expression can fail (e.g., due to an invalid pattern),
new returns a Result. Here is the function signature for new [2]:
fn new(re: &str) -> Result<Regex, Error>
The function signature indicates that the Ok variant holds a Regex, while
the Err variant holds an Error. Since our rustle program can’t continue
with an invalid regular expression, we need to catch that case, display a
helpful error message, and exit the program.
Let’s put all this together:
extern crate regex; // this is needed for the playground
use regex::Regex;
use std::process::exit;

fn main() {
    let pattern = "(missing the closing parenthesis"; // invalid expression
    // compile the regular expression
    match Regex::new(pattern) {
        // the underscore (_) means we are ignoring the value returned by new
        Ok(_) => println!("{pattern} is a valid regular expression!"),
        // e is the error value returned by new
        Err(e) => {
            eprintln!("{e}"); // eprintln! writes to standard error
            exit(1); // exit with error code 1
        }
    };
}
Run the code to see the error. Then, correct it by adding the missing
parenthesis ) and re-run the code.
Updating Rustle
We now have enough context to modify our rustle program to include regular expression support. Below are the changes, with the unrelated parts of the program hidden:
#![allow(unused_imports)]
use std::fs::File;
use std::io::Read;
use std::io::{BufRead, BufReader};
use std::process::exit;
extern crate regex; // this is needed for the playground
use regex::Regex;

fn find_matching_lines(lines: &[String], regex: Regex) -> Vec<usize> {
    lines
        .iter()
        .enumerate()
        .filter_map(|(i, line)| match regex.is_match(line) {
            true => Some(i),
            false => None,
        })
        .collect() // turns anything iterable into a collection
}

fn create_intervals(
    lines: Vec<usize>,
    before_context: usize,
    after_context: usize,
) -> Vec<(usize, usize)> {
    lines
        .iter()
        .map(|line| {
            (
                line.saturating_sub(before_context),
                line.saturating_add(after_context),
            )
        })
        .collect()
}

fn merge_intervals(intervals: &mut Vec<(usize, usize)>) {
    // merge overlapping intervals
    intervals.dedup_by(|next, prev| {
        if prev.1 < next.0 {
            false
        } else {
            prev.1 = next.1;
            true
        }
    })
}

fn print_results(intervals: Vec<(usize, usize)>, lines: Vec<String>) {
    for (start, end) in intervals {
        for (line_no, line) in
            lines.iter().enumerate().take(end + 1).skip(start)
        {
            println!("{}: {}", line_no + 1, line)
        }
    }
}

fn read_file(file: impl Read) -> Vec<String> {
    BufReader::new(file).lines().map_while(Result::ok).collect()
}

fn main() {
    let poem = "I have a little shadow that goes in and out with me,
And what can be the use of him is more than I can see.
He is very, very like me from the heels up to the head;
And I see him jump before me, when I jump into my bed.
The funniest thing about him is the way he likes to grow -
Not at all like proper children, which is always very slow;
For he sometimes shoots up taller like an india-rubber ball,
And he sometimes gets so little that there's none of him at all.";
    let mock_file = std::io::Cursor::new(poem);

    // command line arguments
    let pattern = "(all)|(little)";
    let before_context = 1;
    let after_context = 1;

    // attempt to open the file
    let lines = read_file(mock_file);
    //let lines = match File::open(filename) {
    //    // convert the poem into lines
    //    Ok(file) => read_file(file),
    //    Err(e) => {
    //        eprintln!("Error opening {filename}: {e}");
    //        exit(1);
    //    }
    //};

    // compile the regular expression
    let regex = match Regex::new(pattern) {
        Ok(re) => re, // bind re to regex
        Err(e) => {
            eprintln!("{e}"); // write to standard error
            exit(1);
        }
    };

    // store the 0-based line number for any matched line
    let match_lines = find_matching_lines(&lines, regex);

    // create intervals of the form [a,b] with the before/after context
    let mut intervals =
        create_intervals(match_lines, before_context, after_context);

    // merge overlapping intervals
    merge_intervals(&mut intervals);

    // print the lines
    print_results(intervals, lines);
}
Don’t forget, you can reveal the hidden parts by clicking Show hidden lines.
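To make the interval steps concrete, here is a small sketch that runs only create_intervals and merge_intervals, copied from the program above, on a hard-coded list of matches. It assumes the pattern (all)|(little) matches the poem at 0-based lines 0, 5, 6 and 7 (the "all" in "taller" and "ball" counts); that list is typed in by hand for illustration rather than computed with the regex crate.

```rust
// Same logic as create_intervals in the program above:
// widen each matched line number by the before/after context.
fn create_intervals(
    lines: Vec<usize>,
    before_context: usize,
    after_context: usize,
) -> Vec<(usize, usize)> {
    lines
        .iter()
        .map(|line| {
            (
                line.saturating_sub(before_context),
                line.saturating_add(after_context),
            )
        })
        .collect()
}

// Same logic as merge_intervals in the program above:
// fold each interval into the previous one when the two overlap.
fn merge_intervals(intervals: &mut Vec<(usize, usize)>) {
    intervals.dedup_by(|next, prev| {
        if prev.1 < next.0 {
            false
        } else {
            prev.1 = next.1;
            true
        }
    })
}

fn main() {
    // Assumed 0-based matches for "(all)|(little)" in the poem.
    let match_lines = vec![0, 5, 6, 7];

    let mut intervals = create_intervals(match_lines, 1, 1);
    println!("{intervals:?}"); // [(0, 1), (4, 6), (5, 7), (6, 8)]

    merge_intervals(&mut intervals);
    println!("{intervals:?}"); // [(0, 1), (4, 8)]
}
```

The last interval ends at index 8 even though the poem only has lines 0 through 7. That is harmless here because print_results uses take(end + 1) on the line iterator, which simply stops when the lines run out.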
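The let regex = match Regex::new(pattern) variable binding expression might
seem a bit unusual. This idiom is discussed in the Recoverable Errors with
Result chapter of The Rust Programming Language. To briefly explain: when the
result is Ok, the match extracts the inner re value from the Ok variant and
binds it to the variable regex. When the result is Err, the arm prints the
error and calls exit, which never returns, so any code after the match can
rely on regex holding a compiled Regex.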
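The same binding idiom works for any function that returns a Result. As a minimal sketch that needs only the standard library, the example below applies it to parsing a number; parse_context is a hypothetical helper invented for this illustration and is not part of rustle.

```rust
use std::num::ParseIntError;
use std::process::exit;

// Hypothetical helper: parse a context-line count from a string.
// Like Regex::new, it can fail, so it returns a Result.
fn parse_context(arg: &str) -> Result<usize, ParseIntError> {
    arg.parse::<usize>()
}

fn main() {
    let arg = "2";

    // Ok: unwrap the inner value and bind it to `context`.
    // Err: report the problem and exit, so the arm never yields a value.
    let context = match parse_context(arg) {
        Ok(n) => n,
        Err(e) => {
            eprintln!("invalid context {arg}: {e}");
            exit(1);
        }
    };

    println!("context = {context}");
}
```

Both arms type-check because exit never returns: the compiler only needs the Ok arm to produce a usize.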
Next
Onward to creating our own module!
[1] The Regex crate includes excellent documentation and detailed examples to learn from.