Welcome to Fast Track to Rust
This course is designed to introduce you to the Rust programming language by building an actual program. We'll be developing a grep-like program with a minimal subset of the features found in the GNU 1 grep 2 utility. This means we'll start with what we know and iterate upon the design as we learn more about the language.
The official URL for this online course is: https://freddiehaddad.github.io/fast-track-to-rust
The source code can be found at: https://github.com/freddiehaddad/fast-track-to-rust
The goal is to teach you Rust, so it's assumed you don't know anything about the language. However, it's also assumed that you have experience programming in other languages like C++, and are familiar with multithreading, data structures, and program memory organization (i.e., heap, stack, etc.).
This course moves relatively quickly. Most topics are covered enough to provide a fundamental understanding. As you progress, references to the official Rust documentation are provided for further exploration.
Enough chit chat, let's get started!
GNU is an collection of free software commonly used as an operating system. The family of operating systems, known as Linux, are built from them.
Grep searches one or more input files for lines containing a match to a specified pattern. By default, Grep outputs the matching lines.
Shortcuts
This course, developed with mdBook, features efficient keyboard shortcuts for streamlined navigation.
Keyboard Shortcuts
Arrow-Left
: Navigate to the previous page.Arrow-Right
: Navigate to the next page.s
: Activate the search bar.Ctrl+Enter
: Execute the code in the focused code block.
Installation
Most exercises in this book can be completed in your web browser, with a few exceptions. However, it is recommended to install the Rust development toolchain on your system. Once installed, you'll have everything you need to build and run Rust code locally.
Installation is accomplished using rustup
, the Rust toolchain installer. Once
completed, you'll have rustup
, cargo
, and rustc
.
Follow the instructions at https://rustup.rs/ to complete the installation.
Configuring your IDE
It is recommended (but not required) to configure your editor to work with Rust. With Cargo installed, most editors, such as VS Code, will automatically install rust-analyzer, the LSP implementation for the Rust programming language.
Rust Playground
In addition to running and editing code in the browser, you might find the Rust Playground helpful. It's a web interface for running Rust code.
fn main() { println!("Welcome to Fast Track to Rust!"); }
Exercise
- Hover the mouse over the code block and click the arrow button to run the code.
- Modify the
"Welcome to Fast Track to Rust!"
string and re-run the code. - Revert your changes by hovering your mouse over the code block and clicking the Undo changes button.
- Run the hello world program in the playground.
Using Cargo
In this chapter we'll cover using Cargo to create a package, compile the boilerplate code, and run the program. The package we create will be used for the remainder of the book to build our Grep program.
Package Creation
Let's make sure all the prerequisites are in place. You should have followed the installation instructions to prepare your development environment. After those steps are complete, you should be able to run the following commands:
$ rustc --version
rustc 1.81.0 (eeb90cda1 2024-09-04)
$ cargo --version
cargo 1.81.0 (2dbb1af80 2024-08-20)
The version numbers might be different, but the output should look relatively similar.
If the above commands worked, you're ready to go!
-
Use
cargo new grep
to create a new Rust package namedgrep
for our project:$ cargo new grep ∙ Creating binary (application) `grep` package note: see more `Cargo.toml` keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
-
Navigate to the
grep
directory and usecargo run
to build and run the program:$ cd grep $ cargo run Compiling grep v0.1.0 (S:\projects\git\grep) Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.43s Running `target\debug\grep.exe` Hello, world!
-
Explore some of the other actions you can perform with
cargo
usingcargo --help
.cargo build
- compile the current packagecargo build --release
- build the project in release mode (with optimizations)cargo check
- analyze the current package and report errors (no object files generated)cargo test
- run unit tests- and more...
Summary
In this section, we:
- Created a package.
- Compiled and ran the boilerplate code.
- Learned a bit about Cargo.
Next
Let's see what cargo new
actually did!
Boilerplate
In the previous section, we used cargo new
to create a Rust package, which is
the most common way to create a package. Running this command generated some
boilerplate code. When we used cargo run
, a target
directory was created and
the Rust compiler rustc
was invoked, generating compiler-related files and an
executable binary within the target
directory.
Let's clean everything up with cargo clean
and return to the boilerplate. The
cargo clean
command removes the target directory where all the generated code
resides. After running cargo clean
, we're left with the boilerplate.
$ cargo clean
Removed 23 files, 2.9MiB total
.
├─ Cargo.lock
├─ Cargo.toml
└─ src
└─ main.rs
For now, we'll focus on main.rs
and discuss Cargo.lock
and Cargo.toml
later.
main.rs
main.rs
is a Rust source file that gets created as part of a package. It
includes a function named main
that serves as the program entry point.
The contents of main.rs
:
fn main() { println!("Hello, world!"); }
Some of the syntax should look familiar:
- Blocks of code are surrounded by curly braces
{}
. - Statements end with a semicolon
;
. - Function parameters are enclosed in parentheses
()
. - String literals are enclosed in double quotes
""
.
What may not be familiar:
fn
is the keyword used to declare a function.println!
is actually a macro, denoted by the!
suffix.
Summary
In this section, we:
- Explored the boilerplate code generated by
cargo new
. - Reviewed some basic Rust syntax.
Next
Let's dive into building our grep program. We'll start by setting up the basic structure and then gradually add more functionality.
String literals are actually string slices with type 'static &str
.
Macros are custom definitions that extend Rust's functionality and syntax. When invoked, macros are expanded at compile time and replaced with their definitions.
Language Basics
Rust includes the primitive types,1 statements, and expressions2 you're already familiar with. However, it also incorporates some powerful concepts from functional languages that we will explore throughout this course. To begin with our grep program, we'll keep things simple and gradually refactor the code to introduce more of what Rust has to offer. Let's start with variables.
Details about the built-in types in Rust can be found in the official Rust documentation.
Details about language statements and expressions can be found in the official Rust documentation.
Variables
Rust is a statically typed language, meaning that all variables have a type, and
this type must be known at compile time. The let
keyword is used to declare
variables, or more precisely, for variable binding.
The anatomy of a let
statement1 2:
let identifier: type = expression;
Type Inference
Since y
is passed as an argument to the print_value
function, which requires
a signed 32-bit integer, the compiler infers its type. Therefore, the explicit
type declaration for x
can be omitted.
fn print_value(value: i32) { println!("{value}"); } fn main() { let x: i32 = 10; let y = 20; print_value(x); print_value(y); }
Grep Variables
To begin with our grep program, we'll avoid handling user input via command line arguments for now. Instead, we'll hard code some strings and perform some simple grepping. Let's use the famous poem My Shadow by the poet Robert Louis Stevenson as our input.
fn main() { let pattern = "him"; let poem = "I have a little shadow that goes in and out with me, And what can be the use of him is more than I can see. He is very, very like me from the heels up to the head; And I see him jump before me, when I jump into my bed. The funniest thing about him is the way he likes to grow - Not at all like proper children, which is always very slow; For he sometimes shoots up taller like an india-rubber ball, And he sometimes gets so little that there’s none of him at all."; }
The next step is to search the poem for occurrences of the pattern and print the results. To achieve this, we'll need to learn a bit about control flow.
Summary
In this section, we:
- Learned how to declare variables.
- Explored Rust's type inference.
Next
Onward to control flow.
let
statements support more advanced features that are not being covered
yet.
In many cases, the compiler can infer the type allowing you to omit it.
Control Flow
Some of the common methods in Rust for controlling the flow of a program include:
if
andif let
expressions- Loop expressions
match
expressions
These methods help you manage the execution flow and make your code more efficient and readable.
We will use a for
loop to iterate over all the lines in the poem, checking
each for the substring specified by pattern
. Each line will be evaluated using
a match
expression.
fn main() { let pattern = "him"; let poem = "I have a little shadow that goes in and out with me, And what can be the use of him is more than I can see. He is very, very like me from the heels up to the head; And I see him jump before me, when I jump into my bed. The funniest thing about him is the way he likes to grow - Not at all like proper children, which is always very slow; For he sometimes shoots up taller like an india-rubber ball, And he sometimes gets so little that there’s none of him at all."; for line in poem.lines() { match line.contains(pattern) { true => println!("{line}"), false => (), } } }
for
Loops
The syntax of a for
loop should look familiar. However, something interesting
is happening on that line. Notice the method call poem.lines()
. You might have
thought poem was a string literal. How can it have methods? Well, as you guessed
and as was mentioned earlier, string slices are more than just string literals.
We'll explore them in more detail, so keep that in mind.
The purpose of the loop is quite clear: it iterates over each line in the poem.
match
Expressions
You might have already figured it out, but the match
expression is similar to
an if
expression in that it introduces a branch in the code execution.
However, it has a very powerful feature: when used, the compiler ensures that
all possible results of the scrutinee are covered. This aspect of match
expressions guarantees that all cases are handled.
Let's ensure we fully understand this. In the code snippet, comment out line 16
by prefixing it with //
, and then run the code.
Friendly compiler errors
Take a moment to appreciate just how helpful this compiler error is. The Rust compiler is truly your friend!
Compiling playground v0.0.1 (/playground)
error[E0004]: non-exhaustive patterns: `false` not covered
--> src/main.rs:14:15
|
14 | match line.contains(pattern) {
| ^^^^^^^^^^^^^^^^^^^^^^ pattern `false` not covered
|
= note: the matched value is of type `bool`
help: ensure that all possible cases are being handled by adding a match arm with a
| wildcard pattern or an explicit pattern as shown
|
15 ~ true => println!("{line}"),
16 ~ false => todo!(),
|
For more information about this error, try `rustc --explain E0004`.
error: could not compile `playground` (bin "playground") due to 1 previous error
The compiler error message is informing you that:
- The
contains
method returns a boolean value. - The
false
case is not covered. - The Rust compiler can provide more information via
rustc --explain E0004
.
Let's see what the Rust compiler has to say:
$ rustc --explain E0004
This error indicates that the compiler cannot guarantee a matching pattern for
one or more possible inputs to a match expression. Guaranteed matches are
required in order to assign values to match expressions, or alternatively,
determine the flow of execution.
Erroneous code example:
enum Terminator {
HastaLaVistaBaby,
TalkToMyHand,
}
let x = Terminator::HastaLaVistaBaby;
match x { // error: non-exhaustive patterns: `HastaLaVistaBaby` not covered
Terminator::TalkToMyHand => {}
}
If you encounter this error you must alter your patterns so that every possible
value of the input type is matched. For types with a small number of variants
(like enums) you should probably cover all cases explicitly. Alternatively, the
underscore `_` wildcard pattern can be added after all other patterns to match
"anything else". Example:
enum Terminator {
HastaLaVistaBaby,
TalkToMyHand,
}
let x = Terminator::HastaLaVistaBaby;
match x {
Terminator::TalkToMyHand => {}
Terminator::HastaLaVistaBaby => {}
}
// or:
match x {
Terminator::TalkToMyHand => {}
_ => {}
}
match
arms have the following structure:
match VALUE {
PATTERN = EXPRESSION,
PATTERN = EXPRESSION,
PATTERN = EXPRESSION,
}
()
Unit Type
In situations where we don't want to perform any action, such as in the false
arm, we can use the empty unit type ()
. We'll explore this more as we progress
through the course.
Summary
In this section, we:
- Explored
for
loop expressions. - Examined
match
expressions. - Learned how the Rust compiler provides helpful error messages.
- Were introduced to the empty unit type
()
.
Exercise
- Replace the
match
expression with anif
expression.
Solution
fn main() { let pattern = "him"; let poem = "I have a little shadow that goes in and out with me, And what can be the use of him is more than I can see. He is very, very like me from the heels up to the head; And I see him jump before me, when I jump into my bed. The funniest thing about him is the way he likes to grow - Not at all like proper children, which is always very slow; For he sometimes shoots up taller like an india-rubber ball, And he sometimes gets so little that there’s none of him at all."; for line in poem.lines() { if line.contains(pattern) { println!("{line}"); } } }
Next
Take a break if you need, and then let's continue!
Types
Let's take a moment to discuss the different types we've encountered:
- The empty unit type
()
is used when a function or expression does not return a value. It's essentially a placeholder indicating the absence of a meaningful value. - String slices
&str
are references to a portion of a string. They allow you to work with parts of a string without needing to own the entire string, making them efficient and flexible. - Signed 32-bit integers
i32
are integers that can store both positive and negative values within a specific range. They are commonly used for numerical operations where the size of the number is known and fits within the 32-bit limit.
Understanding these types will help us write more efficient and effective Rust code as we continue to build our grep program.
String Slice str
The most interesting type is probably str
. str
is a primitive string type in
Rust, and it's usually seen in its borrowed1 form &str
.
You can think of a string slice as an object with two components:
field | value |
---|---|
ptr | address |
len | unsigned integer |
Imagine a block of memory starting at address 0x10
containing the bytes for
the string literal "rust"
. This block of memory would store the individual
bytes representing each character in the string. In this case, the memory would
contain the bytes for 'r'
, 'u'
, 's'
, and 't'
, sequentially stored
starting from address 0x10.
This visualization helps in understanding how string literals are stored and accessed in memory:
0x10 | 0x11 | 0x12 | 0x13 |
---|---|---|---|
r | u | s | t |
When we bind the string literal "rust"
to the variable language
:
let language = "rust";
The memory would look like:
field | value |
---|---|
ptr | 0x10 |
len | 4 |
Primitive Types
Here are some additional primitive types in Rust:
type | description |
---|---|
array | A fixed-size array, denoted [T; N] , for the element type, T , and the non-negative compile-time constant size, N . |
bool | The boolean type. |
char | A character type. |
f32 | A 32-bit floating-point type (specifically, the “binary32” type defined in IEEE 754-2008). |
f64 | A 64-bit floating-point type (specifically, the “binary64” type defined in IEEE 754-2008). |
i8 | The 8-bit signed integer type. |
i16 | The 16-bit signed integer type. |
i32 | The 32-bit signed integer type. |
i64 | The 64-bit signed integer type. |
i128 | The 128-bit signed integer type. |
isize | The pointer-sized signed integer type. |
str | String slices. |
u8 | The 8-bit unsigned integer type. |
u16 | The 16-bit unsigned integer type. |
u32 | The 32-bit unsigned integer type. |
u64 | The 64-bit unsigned integer type. |
u128 | The 128-bit unsigned integer type. |
usize | The pointer-sized unsigned integer type. |
User-defined Types
In addition to the primitive types, Rust supports user-defined types. We'll cover them in more detail as we build our grep program. But to satisfy your curiosity, some of the user-defined types include:
type | description |
---|---|
struct | A heterogeneous product of other types, called the fields of the type.2 |
enum | An enumerated type is a nominal, heterogeneous disjoint union type, denoted by the name of an enum item.3 |
union | A union type is a nominal, heterogeneous C-like union, denoted by the name of a union item. |
These user-defined types allow for more complex and expressive code, enabling you to model real-world concepts more effectively. We'll explore these in greater depth as we progress through our project.
We'll explore what borrowing means during the course.
struct
types are analogous to struct
types in C.
The enum
type is analogous to a data constructor declaration in Haskell,
or a pick ADT in Limbo.
Programming Concepts
It's time to start adding some functionality to our grep program. In the next few sections, we'll cover topics such as variables and mutability, iterators, optional values, and closures. These topics are quite extensive, and covering them in full detail is not the objective. Instead, we'll lay the foundation and share core fundamentals to help you gain enough understanding to leverage the documentation successfully. Using the provided links is highly recommended for full details and comprehension.
Grep Features
Grep is a fairly large program with numerous features. In this course, we won't be implementing all of them. Instead, we'll concentrate on a subset of features (with slight variations in behavior) that will help you learn Rust. The features we'll be implementing are:
command line argument | description |
---|---|
-n , --line-number | Prefix each line of output with the 1-based line number within its input file. |
-A num , --after-context=num | Print num lines of trailing context after matching lines. |
-B num , --before-context=num | Print num lines of leading context before matching lines. |
Printing Line Numbers
The first feature we're going to add to our grep program is the ability to print the line number of any lines that match our pattern. Coming from a language like C/C++, you may be inclined to handle this with a variable that we increment with each loop iteration. While this would work, it introduces some of the issues that plague languages like C and C++ (e.g., out-of-bounds indexing, integer overflow, and more). We'll demonstrate this and follow it up with a more idiomatic approach. Let's get going!
Mutability
Two lines of code were added to our grep program on lines 13 and 18. The
variable i
will get incremented with each loop iteration and we'll use it to
print the line number. Go ahead and run the code.
fn main() { let pattern = "him"; let poem = "I have a little shadow that goes in and out with me, And what can be the use of him is more than I can see. He is very, very like me from the heels up to the head; And I see him jump before me, when I jump into my bed. The funniest thing about him is the way he likes to grow - Not at all like proper children, which is always very slow; For he sometimes shoots up taller like an india-rubber ball, And he sometimes gets so little that there’s none of him at all."; let i = 1; for line in poem.lines() { if line.contains(pattern) { println!("{i}: {line}"); } i += 1; } }
Uh-oh! Something went wrong. Luckily, the compiler (rustc
) emits very helpful
error messages.1 The variable i
is immutable!
Variables and Mutability
Variables in Rust are immutable by default. This design choice helps you write
code that leverages Rust's memory safety and fearless concurrency features. For
more details, you can refer to the section on variables and mutability in the
Rust documentation. While the language encourages immutability, there are
situations where mutating values is necessary. In such cases, we use the mut
keyword.
let mut i = 1;
Go ahead and add the mut
keyword to line 13 and run the program again.
We're going to explore cases where immutability is needed and appropriate. However, let's see how iterators can be used to avoid the need for a mutable counter.
A lot of effort has been put into making rustc
have great error messages.
To help understand how to interpret them, refer to the diagnostic
structure in the documentation.
Iterators
Iterators are pervasive in Rust, and delving into them in full detail would necessitate renaming this course to Slow Track to Rust. We'll cover the basics and direct you to the documentation for all the intricate details about iterators.
The Control Flow section explained that the call to
lines()
in the for
loop returns an iterator. We say that the for
loop
consumes the iterator. However, in Rust, we can do much more than just consume
iterators!
Iterator Adaptors
New iterators can be created using iterator adaptors. These adaptors generate new iterators that modify some aspect of the original. Let's use one to print line numbers.
The enumerate
function creates an iterator that yields the current iteration
count along with the value from the previous iterator, as tuples1 in the form
(index, value)
. In the loop, the type for i
is usize
, and the type for
line
is &str
.
fn main() { let pattern = "him"; let poem = "I have a little shadow that goes in and out with me, And what can be the use of him is more than I can see. He is very, very like me from the heels up to the head; And I see him jump before me, when I jump into my bed. The funniest thing about him is the way he likes to grow - Not at all like proper children, which is always very slow; For he sometimes shoots up taller like an india-rubber ball, And he sometimes gets so little that there’s none of him at all."; for (i, line) in poem.lines().enumerate() { if line.contains(pattern) { println!("{}: {line}", i + 1); } } }
Advanced Iterators
We can take it a step further by using the filter_map
iterator adapter to
return an iterator that includes only the items we want!
fn main() { let pattern = "him"; let poem = "I have a little shadow that goes in and out with me, And what can be the use of him is more than I can see. He is very, very like me from the heels up to the head; And I see him jump before me, when I jump into my bed. The funniest thing about him is the way he likes to grow - Not at all like proper children, which is always very slow; For he sometimes shoots up taller like an india-rubber ball, And he sometimes gets so little that there’s none of him at all."; for (line_no, line) in poem.lines() .enumerate() .filter_map(|(i, line)| match line.contains(pattern) { true => Some((i + 1, line)), false => None, }) { println!("{line_no}: {line}"); } }
Whoa! What just happened? We're on the fast track to learning Rust, so we're picking up the pace! Let's break this down because this code snippet needs some unpacking.
A tuple is a collection of values of different types and is constructed
using parentheses ()
.
Option
The Option
type represents an optional value that can either be Some
with a
value or None
. They are prevalent in Rust and serve various purposes. Options
are often used with pattern matching (i.e., match
expression). The
documentation for Option
provides extensive details on its usage.
Some(T)
and None
are two variants of the Option<T>
enum
.1
pub enum Option<T> {
None,
Some(T),
}
We haven't discussed generics yet. For now, just note that the
T
in theenum
represents any type.
Returning to our Loop
In the code snippet, we check if the pattern is a substring of the current line.
If it is, we return a tuple
with the line number (i+1
)2 and the line
,
enclosed within the Some
variant of the Option
enum
. If not, we return the
None
variant.
for (line_no, line) in
poem.lines()
.enumerate()
.filter_map(|(i, line)| match line.contains(pattern) {
true => Some((i + 1, line)),
false => None,
})
{
println!("{line_no}: {line}");
}
enum
s in Rust are similar to those of other compiled languages like C,
but have important differences that make them considerably more powerful.
Recall that grep uses 1-based line numbering. enumerate
begins at 0.
Closures
In Rust, closures are functions that can capture their surrounding environment, clearly express the programmer's intent, and achieve this with low-level performance. They are quite common in functional programming languages.
Closures in Rust have some distinct characteristics:
- Parameters are enclosed with
||
instead of()
. - Curly braces
{}
around the function body are optional for single-line expressions.
Returning to our Loop
fn main() { let pattern = "him"; let poem = "I have a little shadow that goes in and out with me, And what can be the use of him is more than I can see. He is very, very like me from the heels up to the head; And I see him jump before me, when I jump into my bed. The funniest thing about him is the way he likes to grow - Not at all like proper children, which is always very slow; For he sometimes shoots up taller like an india-rubber ball, And he sometimes gets so little that there’s none of him at all."; for (line_no, line) in poem.lines() .enumerate() .filter_map(|(i, line)| match line.contains(pattern) { true => Some((i + 1, line)), false => None, }) { println!("{line_no}: {line}"); } }
In the code snippet, the closure is:
|(i, line)| match line.contains(pattern) {
true => Some((i + 1, line)),
false => None,
}
filter_map
The filter_map
1 function creates an iterator that both filters and maps.
It calls a closure and filters out the results that are None
, leaving us with
an iterator of tuples containing the line number and line for all matches of our
pattern. When the for
loop consumes the iterator, we only need to print the
results, as we are left with only the lines that contained the pattern!
Recall that
enumerate
returns an iterator oftuple
s.filter_map
receives thetuple
as the argument to the closure.
Exercise
- Replace
filter_map
withmap
andfilter
to achieve the same output. Notice howfilter_map
results in more efficient and concise code.
Solution
fn main() { let pattern = "him"; let poem = "I have a little shadow that goes in and out with me, And what can be the use of him is more than I can see. He is very, very like me from the heels up to the head; And I see him jump before me, when I jump into my bed. The funniest thing about him is the way he likes to grow - Not at all like proper children, which is always very slow; For he sometimes shoots up taller like an india-rubber ball, And he sometimes gets so little that there’s none of him at all."; for (line_no, line) in poem .lines() .enumerate() .map(|(i, line)| (i + 1, line)) .filter(|(_line_no, line)| line.contains(pattern)) { println!("{line_no}: {line}"); } }
The underscore
_
prefix in_line_no
is how we tell the Rust compiler that we are intentionally ignoring the first argument. Without it the compiler will complain.
Summary
In this section we:
- introduced the
Option
enum
- learned about closures
- worked with iterators
- learned about the
mut
keyword
Next
Onward to collections!
Separate filter
and map
iterators also exist.
Collections
Rust's standard library features some highly powerful and useful data structures, referred to as collections. While we'll explore one of them in this course, it's definitely worth spending time reviewing the entire collections library.
The collections are commonly grouped into four categories:
- Sequences: Vec, VecDeque, LinkedList
- Maps: HashMap, BTreeMap
- Sets: HashSet, BTreeSet
- Misc: BinaryHeap
Grep
Currently, our grep program only outputs the line number and the matching line for each pattern match. It doesn't yet support printing lines before or after the match. So far, we haven't had a straightforward way to implement this functionality, and there are several challenges we need to address.
Handling Context
The --before-context
option requires us to print preceding lines when we
encounter a match. This necessitates a mechanism to reference those lines.
However, we should only print each line once to avoid duplicating lines when
consecutive lines within the context contain matches.
Consider the poem from our code:
I have a little shadow that goes in and out with me,
And what can be the use of him is more than I can see.
He is very, very like me from the heels up to the head;
And I see him jump before me, when I jump into my bed.
The funniest thing about him is the way he likes to grow -
Not at all like proper children, which is always very slow;
For he sometimes shoots up taller like an india-rubber ball,
And he sometimes gets so little that there’s none of him at all.
If the pattern is "him" and the number of lines to print before each match is
set to 2 (--before-context=2
), without a method to track printed lines, we
would end up with the following output:
1: I have a little shadow that goes in and out with me,
2: And what can be the use of him is more than I can see.
2: And what can be the use of him is more than I can see.
3: He is very, very like me from the heels up to the head;
4: And I see him jump before me, when I jump into my bed.
5:
4: And I see him jump before me, when I jump into my bed.
5:
6: The funniest thing about him is the way he likes to grow -
7: Not at all like proper children, which is always very slow;
8: For he sometimes shoots up taller like an india-rubber ball,
9: And he sometimes gets so little that there’s none of him at all.
This is not the behavior we want from our grep program. Instead, we aim for the following output:
1: I have a little shadow that goes in and out with me,
2: And what can be the use of him is more than I can see.
3: He is very, very like me from the heels up to the head;
4: And I see him jump before me, when I jump into my bed.
5:
6: The funniest thing about him is the way he likes to grow -
7: Not at all like proper children, which is always very slow;
8: For he sometimes shoots up taller like an india-rubber ball,
9: And he sometimes gets so little that there’s none of him at all.
A similar issue arises with --after-context
. Our solution involves using a
vector and tuples (since we are now familiar with them) to create intervals over
the ranges of lines we want to print.
Accessing Lines from the Input
The context problem indicates that we need a method to reference a range of
lines around each match. We'll address this by storing the lines in a collection
from the Rust standard library called a vector Vec<T>
.
Tracking the Lines Surrounding Each Match
Since we are familiar with them, we can use a tuple
to represent the starting
and ending intervals around each match. We'll need to track these intervals and
merge overlapping ones to avoid printing the same line multiple times. Once
again, a vector will be useful!
Next
Let's start coding!
Vec
A Vec
, short for vector, is a contiguous growable array type, written as
Vec<T>
. Although we haven't covered user-defined types yet, Vec
is
implemented using a struct
.1 It supports many familiar operations like
push
, pop
, and len
, but also includes some operations you might not be
familiar with.
Storing our Lines in a Vector
Let's populate a vector with the input we've been using so far in our grep program. There are several ways to create a vector, and the documentation provides plenty of examples. We'll explore a few of them.
Using push
Someone with a C++ background might be inclined to achieve this using a variant of the following form:
Hidden code
To make reading the code easier, only the new parts are visible. You can view previously seen code by clicking the Show hidden lines button when hovering over the code block.
fn main() { let poem = "I have a little shadow that goes in and out with me, And what can be the use of him is more than I can see. He is very, very like me from the heels up to the head; And I see him jump before me, when I jump into my bed. The funniest thing about him is the way he likes to grow - Not at all like proper children, which is always very slow; For he sometimes shoots up taller like an india-rubber ball, And he sometimes gets so little that there’s none of him at all."; let mut lines = Vec::new(); // each call to push mutates the vector for line in poem.lines() { lines.push(line); } // format text for debugging purposes println!("{lines:?}"); }
We haven't discussed traits yet, so for now, just know that :?
specifies
formatting the output with the Debug
2 trait. The vector type implements the
fmt::Debug
trait.
Using an Iterator
While using push
is functionally correct, it introduces a mutable variable
unnecessarily. An idiomatic solution would look like this:
use std::iter::FromIterator; // this line addresses a rust playground bug fn main() { let poem = "I have a little shadow that goes in and out with me, And what can be the use of him is more than I can see. He is very, very like me from the heels up to the head; And I see him jump before me, when I jump into my bed. The funniest thing about him is the way he likes to grow - Not at all like proper children, which is always very slow; For he sometimes shoots up taller like an india-rubber ball, And he sometimes gets so little that there’s none of him at all."; // convert poem into lines let lines = Vec::from_iter(poem.lines()); // format text for debugging purposes println!("{lines:?}"); }
Favoring immutability
The concept of being immutable by default is one of the many ways Rust encourages you to write code that leverages its safety features and supports easy concurrency.
Tracking all Matches
Now that we have a vector containing all the lines in our poem, let's create another vector to hold the line numbers where the pattern was found.
use std::iter::FromIterator; // this line addresses a rust playground bug fn main() { let poem = "I have a little shadow that goes in and out with me, And what can be the use of him is more than I can see. He is very, very like me from the heels up to the head; And I see him jump before me, when I jump into my bed. The funniest thing about him is the way he likes to grow - Not at all like proper children, which is always very slow; For he sometimes shoots up taller like an india-rubber ball, And he sometimes gets so little that there’s none of him at all."; let pattern = "all"; // convert the poem into lines let lines = Vec::from_iter(poem.lines()); // store the 0-based line number for any matched line let match_lines: Vec<_> = lines // inferred type (_) .iter() .enumerate() .filter_map(|(i, line)| match line.contains(pattern) { true => Some(i), false => None, }) .collect(); // turns anything iterable into a collection // format text for debugging purposes println!("{match_lines:?}"); }
Instead of using from_iter
, you can also use collect
.
The inferred type (
_
) inVec<_>
asks the compiler to infer the type if possible based on the surrounding context.2
Creating Intervals from Matched Lines
The map
function invokes the closure
once for each value in the vector,
passing line_no
(the line number) to the function. We use this to identify the
lines before and after the match that we want to print. To handle potential
out-of-bounds indexing, we use saturating_add
and saturating_sub
.
use std::iter::FromIterator; // this line addresses a rust playground bug fn main() { let poem = "I have a little shadow that goes in and out with me, And what can be the use of him is more than I can see. He is very, very like me from the heels up to the head; And I see him jump before me, when I jump into my bed. The funniest thing about him is the way he likes to grow - Not at all like proper children, which is always very slow; For he sometimes shoots up taller like an india-rubber ball, And he sometimes gets so little that there’s none of him at all."; let pattern = "all"; let before_context = 1; let after_context = 1; // convert the poem into lines let lines = Vec::from_iter(poem.lines()); // store the 0-based line number for any matched line let match_lines: Vec<_> = lines .iter() .enumerate() .filter_map(|(i, line)| match line.contains(pattern) { true => Some(i), false => None, }) .collect(); // turns anything iterable into a collection // create intervals of the form [a,b] with the before/after context let intervals: Vec<_> = match_lines .iter() .map(|line_no| { ( // prevent underflow line_no.saturating_sub(before_context), // prevent overflow line_no.saturating_add(after_context), ) }) .collect(); // format text for debugging purposes println!("{intervals:?}"); }
Merging Intervals
Merging overlapping intervals is straightforward because they are already
sorted, and the ending value of the next interval must be greater than the
current one. We could store the merged intervals in a new Vector, but that would
be inefficient. Instead, we'll make our intervals
vector mutable and take
advantage of the dedup_by
iterator adaptor to perform the merge.
use std::iter::FromIterator; // this line addresses a rust playground bug fn main() { let poem = "I have a little shadow that goes in and out with me, And what can be the use of him is more than I can see. He is very, very like me from the heels up to the head; And I see him jump before me, when I jump into my bed. The funniest thing about him is the way he likes to grow - Not at all like proper children, which is always very slow; For he sometimes shoots up taller like an india-rubber ball, And he sometimes gets so little that there’s none of him at all."; let pattern = "all"; let before_context = 1; let after_context = 1; // convert the poem into lines let lines = Vec::from_iter(poem.lines()); // store the 0-based line number for any matched line let match_lines: Vec<_> = lines .iter() .enumerate() .filter_map(|(i, line)| match line.contains(pattern) { true => Some(i), false => None, }) .collect(); // turns anything iterable into a collection // create intervals of the form [a,b] with the before/after context let mut intervals: Vec<_> = match_lines .iter() .map(|line| { ( line.saturating_sub(before_context), line.saturating_add(after_context), ) }) .collect(); // merge overlapping intervals intervals.dedup_by(|next, prev| { if prev.1 < next.0 { false } else { prev.1 = next.1; true } }); // format text for debugging purposes println!("{intervals:?}"); }
The dedup_by
iterator adaptor removes consecutive identical elements based
on a provided closure. It starts by calling the closure with the first (prev
)
and second (next
) elements in the collection. If the closure returns false
,
prev
becomes next
, next
is advanced, and the closure is called with the
new elements. If the closure returns true
, only next
is advanced. This
behavior allows us to update prev
to be the merged interval, enabling
efficient merging of overlapping intervals.
Tuple fields are accessed using a 0-based index, just like array elements.
Print the Results
With our intervals merged, we can now print out the results correctly!
use std::iter::FromIterator; // this line addresses a rust playground bug fn main() { let poem = "I have a little shadow that goes in and out with me, And what can be the use of him is more than I can see. He is very, very like me from the heels up to the head; And I see him jump before me, when I jump into my bed. The funniest thing about him is the way he likes to grow - Not at all like proper children, which is always very slow; For he sometimes shoots up taller like an india-rubber ball, And he sometimes gets so little that there’s none of him at all."; let pattern = "all"; let before_context = 1; let after_context = 1; // convert the poem into lines let lines = Vec::from_iter(poem.lines()); // store the 0-based line number for any matched line let match_lines: Vec<_> = lines .iter() .enumerate() .filter_map(|(i, line)| match line.contains(pattern) { true => Some(i), false => None, }) .collect(); // turns anything iterable into a collection // create intervals of the form [a,b] with the before/after context let mut intervals: Vec<_> = match_lines .iter() .map(|line| { ( line.saturating_sub(before_context), line.saturating_add(after_context), ) }) .collect(); // merge overlapping intervals intervals.dedup_by(|next, prev| { if prev.1 < next.0 { false } else { prev.1 = next.1; true } }); // print the lines for (start, end) in intervals { for (line_no, line) in lines .iter() .enumerate() .take(end + 1) .skip(start) { println!("{}: {}", line_no + 1, line) } } }
We Did It!
We have a working grep program! Take some time to review everything we've covered so far. The rest of the course is going to move quickly.
Summary
We covered quite a bit in this section:
collect
converts any iterable into a collection.saturating_add
andsaturating_sub
prevent overflow.skip
creates an iterator that skips elements untiln
elements are skipped or the end of the iterator is reached, whichever comes first.take
creates an iterator that yields elements untiln
elements are yielded or the end of the iterator is reached, whichever comes first.- The
println!
macro supports the:?
format specifier, which is intended to help with debugging Rust code. All public types should implement thefmt::Debug trait
.3
Next
Let's dive into the concepts of ownership and borrowing.
Refer to the Vec
source code see it's full implementation. The struct
definition is on line 397.
The Inferred type is part of the type system in Rust.
Details about the Debug
trait can be found
here.
Ownership
Ownership and borrowing are fundamental principles in Rust that ensure memory safety without needing a garbage collector. Ownership dictates how memory is managed, while borrowing allows you to reference data without taking ownership. Understanding these concepts is crucial for writing efficient and safe Rust programs.
The Rust documentation provides an in-depth exploration of these topics in Understanding Ownership, and it's highly recommended to spend some time reading that material. For now, let's focus on the core principles of ownership.
Core Principles
Keep these rules about ownership in mind as we progress through the course:
- Each value in Rust has an owner.
- There can only be one owner at a time.
- When the owner goes out of scope, the value is dropped.1
The term dropped means the memory is freed and the object's lifetime has ended.
There Can Be Only One
Currently, our entire grep program is contained within the main
function. This
approach was chosen to address the challenges of teaching a course in a way that
introduces important concepts logically and in an easily digestible manner.
However, the time has come to refactor some of the code from the main
function
into separate functions.
Let's start by creating a function to handle the work of finding all the lines in the input that contain the specified pattern. Here is the function signature:
fn find_matching_lines(lines: Vec<&str>, pattern: &str) -> Vec<usize>
This is the code that we'll transfer into the new function:
// store the 0-based line number for any matched line
let match_lines: Vec<_> = lines
.iter()
.enumerate()
.filter_map(|(i, line)| match line.contains(pattern) {
true => Some(i),
false => None,
})
.collect(); // turns anything iterable into a collection
Here is the revised code with the changes implemented. Review it and run the code.
use std::iter::FromIterator; // this line addresses a rust playground bug fn find_matching_lines(lines: Vec<&str>, pattern: &str) -> Vec<usize> { lines .iter() .enumerate() .filter_map(|(i, line)| match line.contains(pattern) { true => Some(i), false => None, }) .collect() // turns anything iterable into a collection // The return keyword is unnecessary when the returned value is the // final expression in a function. In this scenario, the semicolon (;) // is omitted. } fn main() { let poem = "I have a little shadow that goes in and out with me, And what can be the use of him is more than I can see. He is very, very like me from the heels up to the head; And I see him jump before me, when I jump into my bed. The funniest thing about him is the way he likes to grow - Not at all like proper children, which is always very slow; For he sometimes shoots up taller like an india-rubber ball, And he sometimes gets so little that there’s none of him at all."; // command line arguments let pattern = "all"; let before_context = 1; let after_context = 1; // convert the poem into lines let lines = Vec::from_iter(poem.lines()); // store the 0-based line number for any matched line let match_lines = find_matching_lines(lines, pattern); // create intervals of the form [a,b] with the before/after context let mut intervals: Vec<_> = match_lines .iter() .map(|line| { ( line.saturating_sub(before_context), line.saturating_add(after_context), ) }) .collect(); // merge overlapping intervals intervals.dedup_by(|next, prev| { if prev.1 < next.0 { false } else { prev.1 = next.1; true } }); // print the lines for (start, end) in intervals { for (line_no, line) in lines.iter().enumerate().take(end + 1).skip(start) { println!("{}: {}", line_no + 1, line) } } }
Uh-oh!
A minor code change caused an issue with the program. Fortunately, the Rust compiler offers helpful information for diagnosing the problem. However, if you're not familiar with Rust's ownership rules, understanding this error can be challenging. Let's break down the error and understand what went wrong. Here are the key details from the error message, cleaned up for readability:
error[E0382]: borrow of moved value: `lines`
--> src/main.rs:63:13
|
34 | let lines = Vec::from_iter(poem.lines());
| ----- move occurs because `lines` has type `Vec<&str>`, which
| does not implement the `Copy` trait
...
37 | let match_lines = find_matching_lines(lines, pattern);
| ----- value moved here
...
63 | lines.iter().enumerate().take(end + 1).skip(start)
| ^^^^^^^^^^^^ value borrowed here after move
|
Unpacking the Error
Copy trait, value moved, value borrowed. What the heck does all this mean?
Copy Trait
We'll explore traits in more detail later in the course. For now, just know
that when a type implements the Copy
trait, its values are duplicated when
assigned to a new variable. This means that after an assignment, both the
original and the new variable can be used independently.
The lines
vector we passed to the find_matching_lines
function does not
implement the Copy
trait.
Value Moved
By default, variable bindings in Rust follow move semantics.1 When a value is moved, its ownership is transferred to the new variable, rendering the original variable invalid and unusable.
Since the lines
vector does not implement the Copy
trait, its value was
moved, rendering the original value in main invalid.
Value Borrowed
Because the lines
variable in main
becomes invalid due to the move, any
attempt to borrow or reference its value is invalid. This is why the
compiler generates the message "value borrowed here after move".
Move Semantics
Rust's move semantics play an integral part in ensuring memory safety by detecting common memory-related errors, like null pointer dereferencing, buffer overflows, and use-after-free, during compile time, thereby preventing them from happening at runtime.
There are some exceptions in Rust. For example, most primitive types
implement the Copy
trait.
Borrowing
Given the error in the previous section about borrowing a value after it has been moved, let's now focus on how to borrow a value.
References
Recall from the section on the string slice str
that we said it's usually
seen in it's borrowed form &str
. The &
operator1 in the prefix position
represents a borrow. In find_matching_lines
, pattern
is borrowed.
fn find_matching_lines(lines: Vec<&str>, pattern: &str) -> Vec<usize>
Borrowing lines
In find_matching_lines
, we can borrow lines
by prefixing the parameter's
type with an &
, changing it to &Vec<&str>
, and by prefixing the variable
lines
in main
to &lines
. After making these changes and re-running the
program, we can see that it now works.
use std::iter::FromIterator; // this line addresses a rust playground bug fn find_matching_lines(lines: &Vec<&str>, pattern: &str) -> Vec<usize> { lines .iter() .enumerate() .filter_map(|(i, line)| match line.contains(pattern) { true => Some(i), false => None, }) .collect() // turns anything iterable into a collection } fn main() { let poem = "I have a little shadow that goes in and out with me, And what can be the use of him is more than I can see. He is very, very like me from the heels up to the head; And I see him jump before me, when I jump into my bed. The funniest thing about him is the way he likes to grow - Not at all like proper children, which is always very slow; For he sometimes shoots up taller like an india-rubber ball, And he sometimes gets so little that there’s none of him at all."; // command line arguments let pattern = "all"; let before_context = 1; let after_context = 1; // convert the poem into lines let lines = Vec::from_iter(poem.lines()); // store the 0-based line number for any matched line let match_lines = find_matching_lines(&lines, pattern); // create intervals of the form [a,b] with the before/after context let mut intervals: Vec<_> = match_lines .iter() .map(|line| { ( line.saturating_sub(before_context), line.saturating_add(after_context), ) }) .collect(); // merge overlapping intervals intervals.dedup_by(|next, prev| { if prev.1 < next.0 { false } else { prev.1 = next.1; true } }); // print the lines for (start, end) in intervals { for (line_no, line) in lines.iter().enumerate().take(end + 1).skip(start) { println!("{}: {}", line_no + 1, line) } } }
Continuing to Refactor
Let's create another function to handle the creation of our intervals. Here is the function signature:
fn create_intervals(
lines: Vec<usize>,
before_context: usize,
after_context: usize,
) -> Vec<(usize, usize)>
This is the code that we'll transfer into the new function:
// create intervals of the form [a,b] with the before/after context
let mut intervals: Vec<_> = match_lines
.iter()
.map(|line| {
(
line.saturating_sub(before_context),
line.saturating_add(after_context),
)
})
.collect();
Here is the revised code with the changes implemented. Review it and run the code.
use std::iter::FromIterator; // this line addresses a rust playground bug fn find_matching_lines(lines: &Vec<&str>, pattern: &str) -> Vec<usize> { lines .iter() .enumerate() .filter_map(|(i, line)| match line.contains(pattern) { true => Some(i), false => None, }) .collect() // turns anything iterable into a collection } fn create_intervals( lines: Vec<usize>, before_context: usize, after_context: usize, ) -> Vec<(usize, usize)> { lines .iter() .map(|line| { ( line.saturating_sub(before_context), line.saturating_add(after_context), ) }) .collect() } fn main() { let poem = "I have a little shadow that goes in and out with me, And what can be the use of him is more than I can see. He is very, very like me from the heels up to the head; And I see him jump before me, when I jump into my bed. The funniest thing about him is the way he likes to grow - Not at all like proper children, which is always very slow; For he sometimes shoots up taller like an india-rubber ball, And he sometimes gets so little that there’s none of him at all."; // command line arguments let pattern = "all"; let before_context = 1; let after_context = 1; // convert the poem into lines let lines = Vec::from_iter(poem.lines()); // store the 0-based line number for any matched line let match_lines = find_matching_lines(&lines, pattern); // create intervals of the form [a,b] with the before/after context let mut intervals = create_intervals(match_lines, before_context, after_context); // merge overlapping intervals intervals.dedup_by(|next, prev| { if prev.1 < next.0 { false } else { prev.1 = next.1; true } }); // print the lines for (start, end) in intervals { for (line_no, line) in lines.iter().enumerate().take(end + 1).skip(start) { println!("{}: {}", line_no + 1, line) } } }
Moving vs Borrowing
- Why is it possible to move
match_lines
without causing an error?- Considering heap allocations, what advantages might there be in moving
match_lines
instead of borrowing it?
Exercise
Develop a function named merge_intervals and transfer the specified code from main into this function, making any necessary updates. Construct another function called print_results and relocate the specified code from main into this function, updating it as needed.
- Create a function named
merge_intervals
and move the specified code frommain
into this function, making any necessary updates.// merge overlapping intervals intervals.dedup_by(|next, prev| { if prev.1 < next.0 { false } else { prev.1 = next.1; true } });
- Create another function called
print_results
and move the specified code frommain
into this function, updating it as needed.// print the lines for (start, end) in intervals { for (line_no, line) in lines.iter().enumerate().take(end + 1).skip(start) { println!("{}: {}", line_no + 1, line) } }
- Modify
main
to utilize these newly created functions.
You can complete these exercises by updating the most recent version of the code provided above.
Solution
use std::iter::FromIterator; // this line addresses a rust playground bug fn find_matching_lines(lines: &Vec<&str>, pattern: &str) -> Vec<usize> { lines .iter() .enumerate() .filter_map(|(i, line)| match line.contains(pattern) { true => Some(i), false => None, }) .collect() // turns anything iterable into a collection } fn create_intervals( lines: Vec<usize>, before_context: usize, after_context: usize, ) -> Vec<(usize, usize)> { lines .iter() .map(|line| { ( line.saturating_sub(before_context), line.saturating_add(after_context), ) }) .collect() } fn merge_intervals(intervals: &mut Vec<(usize, usize)>) { // merge overlapping intervals intervals.dedup_by(|next, prev| { if prev.1 < next.0 { false } else { prev.1 = next.1; true } }) } fn print_results(intervals: Vec<(usize, usize)>, lines: Vec<&str>) { for (start, end) in intervals { for (line_no, line) in lines.iter().enumerate().take(end + 1).skip(start) { println!("{}: {}", line_no + 1, line) } } } fn main() { let poem = "I have a little shadow that goes in and out with me, And what can be the use of him is more than I can see. He is very, very like me from the heels up to the head; And I see him jump before me, when I jump into my bed. The funniest thing about him is the way he likes to grow - Not at all like proper children, which is always very slow; For he sometimes shoots up taller like an india-rubber ball, And he sometimes gets so little that there’s none of him at all."; // command line arguments let pattern = "all"; let before_context = 1; let after_context = 1; // convert the poem into lines let lines = Vec::from_iter(poem.lines()); // store the 0-based line number for any matched line let match_lines = find_matching_lines(&lines, pattern); // create intervals of the form [a,b] with the before/after context let mut intervals = create_intervals(match_lines, before_context, after_context); // merge overlapping intervals merge_intervals(&mut intervals); // print the lines print_results(intervals, lines); }
Summary
Although the concepts of ownership and borrowing are relatively straightforward, they can be frustrating when learning Rust. Reading through the official documentation on Understanding Ownership will certainly help overcome this challenge. Keep the following points in mind as you continue on your journey with Rust:
Ownership Rules
- Each value in Rust has an owner.
- There can only be one owner at a time.
- When the owner goes out of scope, the value is dropped.
Borrowing Rules
- At any given time, you can have either one mutable reference or any number of immutable references.
- References must always be valid.
Next
With our new understanding of ownership and borrowing, let's switch our focus to error handling.
The &
operator can have different meanings depending on the context. For
example, when used an infix operator, it becomes a bitwise AND.
Error Handling
Many programs use mechanisms like exceptions for handling errors, but Rust takes
a different approach. Rust doesn't have exceptions. Instead, it uses the
Result
type for recoverable errors and the panic!
macro for unrecoverable
errors. Similar to the Option
enum
we previously discussed, Result
is
another enum
.
Result<T, E>
Result<T, E>
is the type used for returning and propagating errors. Similar to
Option
, it has two variants: Ok(T)
, which represents success and contains a
value, and Err(E)
, which represents an error and contains an error value. The
T
and E
are generic type parameters that we'll discuss in more detail soon.
For now, it's important to know that T
represents the type of the value
returned in a success case within the Ok
variant, and E
represents the type
of the error returned in a failure case within the Err
variant.
enum Result<T, E> {
Ok(T),
Err(E),
}
Additional Resources
Several sections in the Rust documentation thoroughly cover error handling.
Here are some useful links:
- Unrecoverable Errors with panic!: Useful information on how to troubleshoot code that panics.
- Recoverable Errors with Result: Deep dive into
Result
and general error handling. - To panic! or Not to panic!: General guidelines on how to decide whether to panic in library code.
- The question mark operator: Useful for propagating errors to the calling function.1
We'll cover the question mark operator ?
when we refactor the code.
File I/O
Our grep program wouldn't be complete without the ability to search text files.
Given the potential for I/O errors, adding this capability now is convenient as
we explore error handling and the Result
type. This also introduces us to
additional packages in Rust's standard library.
Up to this point, we've been able to use string literals in our grep program because dynamic memory allocation wasn't needed. However, now that we will be reading from a file, dynamic memory allocation becomes necessary. The string slice is no longer sufficient, so we need to utilize the
String
1 type in Rust.
Storing Data on the Heap
Should you find yourself needing to allocate memory directly on the heap, the
Box
type is commonly used. You can find numerous examples on its usage in the documentation on theBox
type.
Let's start by creating a function that reads a file and returns a vector of
strings (Vec<String>
) where each string represents a line. Here is the
function signature:2 3
fn read_file(file: File) -> Vec<String> {
todo!(); // see the footnote [^3]
}
This is the code that we'll add to the read_file
function:
BufReader::new(file).lines().map_while(Result::ok).collect()
The read_file
function accepts a file handle and utilizes BufReader
to
efficiently read the file line by line, storing each line in a vector of strings
(Vec<String>
), which it then returns to the caller.
Many less efficient methods for reading a file and storing the results in a collection typically involve iterating over each line, converting it to a string, and then pushing the string into a vector. This approach requires intermediate memory allocations, which can become costly for large files. Additionally, each line read from the file potentially involves a system call. The
BufReader
uses an internal buffer to read large chunks of data from the file, minimizing both memory allocations and system calls.
The modifications to the main
function:
fn main() {
// command line arguments
let pattern = "all";
let before_context = 1;
let after_context = 1;
let filename = "poem.txt";
// attempt to open the file
let lines = match File::open(filename) {
// convert the poem into lines
Ok(file) => read_file(file),
Err(e) => {
eprintln!("Error opening {filename}: {e}");
exit(1);
}
};
// store the 0-based line number for any matched line
let match_lines = find_matching_lines(&lines, pattern);
// create intervals of the form [a,b] with the before/after context
let mut intervals =
create_intervals(match_lines, before_context, after_context);
// merge overlapping intervals
merge_intervals(&mut intervals);
// print the lines
print_results(intervals, lines);
}
Unpacking the Code
There's a lot going on here, so let's break it down step by step.
read_file
fn read_file(file: File) -> Vec<String> {
BufReader::new(file).lines().map_while(Result::ok).collect()
}
-
BufReader
:BufReader::new(file)
creates a buffered reader from the providedFile
. This helps in efficiently reading the file line by line. -
lines()
: Thelines()
method onBufReader
returns an iterator over the lines in the file. Because reading from a file can file, each line is wrapped in aResult
, which can be eitherOk
(containing the line) orErr
(containing an error). -
map_while(Result::ok)
: Themap_while
method is used to transform the iterator. It applies theResult::ok
function to each item, which convertsOk(line)
toSome(line)
andErr(_)
toNone
. The iteration stops when the firstNone
is encountered. Here are the relevant parts from the source code, cleaned up for readability:pub enum Result<T, E> { Ok(T), Err(E), } impl<T, E> Result<T, E> { pub fn ok(self) -> Option<T> { match self { Ok(x) => Some(x), Err(_) => None, } } }
This conversion is necessary because the map method requires the closure to return an
Option
. ConvertingErr
toNone
drops the error value and causesmap_while
to stop yielding. -
collect()
: Thecollect()
method gathers all theSome(line)
values into aVec<String>
that gets returned to the caller.
main
In the main
function, we attempt to open a file, which can fail for various
reasons. If the Result
is Ok
, we call read_file
with the file value. Since
we don't need the file handle afterward, borrowing isn't necessary. If an error
occurs while opening the file, we use the eprintln!
macro to print the error
to standard error and then exit.
Putting it All Together
Here are the changes with the unrelated parts of the program hidden:
use std::fs::File;
use std::io::{BufRead, BufReader};
use std::process::exit;
fn find_matching_lines(lines: &[String], pattern: &str) -> Vec<usize> {
lines
.iter()
.enumerate()
.filter_map(|(i, line)| match line.contains(pattern) {
true => Some(i),
false => None,
})
.collect() // turns anything iterable into a collection
}
fn create_intervals(
lines: Vec<usize>,
before_context: usize,
after_context: usize,
) -> Vec<(usize, usize)> {
lines
.iter()
.map(|line| {
(
line.saturating_sub(before_context),
line.saturating_add(after_context),
)
})
.collect()
}
fn merge_intervals(intervals: &mut Vec<(usize, usize)>) {
// merge overlapping intervals
intervals.dedup_by(|next, prev| {
if prev.1 < next.0 {
false
} else {
prev.1 = next.1;
true
}
})
}
fn print_results(intervals: Vec<(usize, usize)>, lines: Vec<String>) {
for (start, end) in intervals {
for (line_no, line) in
lines.iter().enumerate().take(end + 1).skip(start)
{
println!("{}: {}", line_no + 1, line)
}
}
}
fn read_file(file: File) -> Vec<String> {
BufReader::new(file).lines().map_while(Result::ok).collect()
}
fn main() {
// command line arguments
let pattern = "all";
let before_context = 1;
let after_context = 1;
let filename = "poem.txt";
// attempt to open the file
let lines = match File::open(filename) {
// convert the poem into lines
Ok(file) => read_file(file),
Err(e) => {
eprintln!("Error opening {filename}: {e}");
exit(1);
}
};
// store the 0-based line number for any matched line
let match_lines = find_matching_lines(&lines, pattern);
// create intervals of the form [a,b] with the before/after context
let mut intervals =
create_intervals(match_lines, before_context, after_context);
// merge overlapping intervals
merge_intervals(&mut intervals);
// print the lines
print_results(intervals, lines);
}
Don't forget, you can reveal the hidden parts by clicking Show hidden lines.
Summary
- Rust requires acknowledging and handling errors before code compilation, ensuring robustness.
- Errors are categorized into recoverable (e.g., file not found) and unrecoverable (e.g., out-of-bounds access).
- Rust uses
Result<T, E>
for recoverable errors andpanic!
for unrecoverable errors, unlike other languages that use exceptions.
Next
To continue using the Rust Playground, opening an actual file isn't going to work. Let's see how we can leverage an in-memory buffer to represent an open file.
Strings are implemented as Vec<u8>
in Rust. Reference the
API for details.
Unfortunately, the Rust Playground doesn't support opening files, so you'll need to run this part of the code on your local machine.
Rust offers several useful macros that are handy for developing and
prototyping your program. todo!()
is one of them, and another is
unimplemented!()
.
Unlike many object-oriented programming languages that use this
, Rust uses
self
.
Cursor
Our production grep program now has the capability to access real files. However, the Rust Playground does not support opening files directly. To ensure this course remains functional in your web browser, we need to use an in-memory buffer to simulate a file. This technique of mocking an open file is also commonly used in unit testing, making it a valuable concept to explore.
The Cursor
1 is used with in-memory buffers to provide Read
or
Write
functionality. Without digressing too much, the BufReader
we are
using works on objects that implement the Read
trait. For now, think of a
trait as an interface in object-oriented programming languages, with some
differences.
Mocking a File with Cursor
We only need to make a few changes to our program to utilize Cursor
. Let's
start by updating the read_file
function to accept any object that implements
the Read
trait as an argument.
fn read_file(file: impl Read) -> Vec<String>
Next, we'll reintroduce our famous poem and use it as our in-memory buffer to represent our file.
let poem = "I have a little shadow that goes in and out with me,
And what can be the use of him is more than I can see.
He is very, very like me from the heels up to the head;
And I see him jump before me, when I jump into my bed.
The funniest thing about him is the way he likes to grow -
Not at all like proper children, which is always very slow;
For he sometimes shoots up taller like an india-rubber ball,
And he sometimes gets so little that there’s none of him at all.";
let mock_file = std::io::Cursor::new(poem);
Finally, we comment out the lines that handled opening a file and calling
read_file
, and instead, we directly call read_file
with mock_file
.
// attempt to open the file
let lines = read_file(mock_file);
//let lines = match File::open(filename) {
// // convert the poem into lines
// Ok(file) => read_file(file),
// Err(e) => {
// eprintln!("Error opening {filename}: {e}");
// exit(1);
// }
//};
Putting it All Together
With these changes applied to our grep program, we can once again utilize the Rust Playground to extend its functionality and continue learning Rust.
#![allow(unused_imports)] use std::fs::File; use std::io::Read; use std::io::{BufRead, BufReader}; use std::process::exit; fn find_matching_lines(lines: &[String], pattern: &str) -> Vec<usize> { lines .iter() .enumerate() .filter_map(|(i, line)| match line.contains(pattern) { true => Some(i), false => None, }) .collect() // turns anything iterable into a collection } fn create_intervals( lines: Vec<usize>, before_context: usize, after_context: usize, ) -> Vec<(usize, usize)> { lines .iter() .map(|line| { ( line.saturating_sub(before_context), line.saturating_add(after_context), ) }) .collect() } fn merge_intervals(intervals: &mut Vec<(usize, usize)>) { // merge overlapping intervals intervals.dedup_by(|next, prev| { if prev.1 < next.0 { false } else { prev.1 = next.1; true } }) } fn print_results(intervals: Vec<(usize, usize)>, lines: Vec<String>) { for (start, end) in intervals { for (line_no, line) in lines.iter().enumerate().take(end + 1).skip(start) { println!("{}: {}", line_no + 1, line) } } } fn read_file(file: impl Read) -> Vec<String> { BufReader::new(file).lines().map_while(Result::ok).collect() } fn main() { let poem = "I have a little shadow that goes in and out with me, And what can be the use of him is more than I can see. He is very, very like me from the heels up to the head; And I see him jump before me, when I jump into my bed. The funniest thing about him is the way he likes to grow - Not at all like proper children, which is always very slow; For he sometimes shoots up taller like an india-rubber ball, And he sometimes gets so little that there’s none of him at all."; let mock_file = std::io::Cursor::new(poem); // command line arguments let pattern = "all"; let before_context = 1; let after_context = 1; // attempt to open the file let lines = read_file(mock_file); //let lines = match File::open(filename) { // // convert the poem into lines // Ok(file) => read_file(file), // Err(e) => { // eprintln!("Error opening {filename}: {e}"); // exit(1); // } //}; // store the 0-based line number for any matched line let match_lines = find_matching_lines(&lines, pattern); // create intervals of the form [a,b] with the before/after context let mut intervals = create_intervals(match_lines, before_context, after_context); // merge overlapping intervals merge_intervals(&mut intervals); // print the lines print_results(intervals, lines); }
What? You don't believe me! Give it a whirl and see for yourself! 😄
Since we commented out the file handling code, some previously necessary imports are now unused. The
#![allow(unused_imports)]
attribute in Rust instructs the compiler to permit these unused imports without issuing warnings. We'll delve deeper into attributes when we discuss custom types and implement command line argument support.
Next
We're ready to add support for regular expressions for pattern matching. We'll take a brief detour to learn about project management in Rust, which will allow us to use packages (also known as crates) to add this functionality.
A Cursor wraps an in-memory buffer and provides it with a Seek
implementation.
Project Management
As your programs grow in size, it becomes crucial to organize your code effectively. By grouping related functionalities and separating distinct features, you make it easier to locate the code responsible for a specific feature and to modify how that feature operates. Let's look at how a Rust project can be organized.
Paraphrasing the Rust documentation, Rust offers several features to help you manage your code’s organization, including which details are exposed, which are private, and what names are in each scope of your programs. These features, often collectively referred to as the module system, include:
- Packages: A Cargo feature that lets you build, test, and share crates.
- Crates: A tree of modules that produces a library or executable.
- Modules and use: Allow you to control the organization, scope, and privacy of paths.
- Paths: A way of naming an item, such as a struct, function, or module.
All the details about managing projects can be found in the Managing Growing Projects with Packages, Crates, and Modules section of The Rust Programming Language documentation.
Crates
Crates are the smallest units of code that the Rust compiler processes at a time. For instance, the grep program we're developing is recognized as a crate by the Rust compiler.
Types of Crates
Crates are available in two types: binary and library. The compiler identifies
the type of crate being compiled based on the presence of either a main.rs
or
lib.rs
file.
Binary Crates
Binary crates contain a main.rs
file in the src
directory, feature a main
function, and are compiled into an executable.
Library Crates
Library crates contain a lib.rs
file in the src
directory, lack a main
function, and do not compile into an executable. Instead, they are designed to
be shared with other projects.
Packages
A package is a collection of one or more crates that provide a set of
functionalities. It includes a Cargo.toml
file that outlines how to build
those crates. A package can contain multiple binary crates, but only one library
crate at most. However, a package must contain at least one crate, whether it's
a library or binary crate.
Refer to the section on Packages and Crates in the Rust documentation.
Modules
Modules help us organize code within a crate for better readability and easy reuse. They also allow us to control the privacy of items, as code within a module is private by default. Private items are internal implementation details not accessible from outside. We can choose to make modules and their items public, exposing them for external code to use and depend on.
Refer to the section on Defining Modules to Control Scope and Privacy in the Rust documentation.
Summary
In this section, we were introduced to:
- Crates: A tree of modules that produces a library or executable
- Packages: A Cargo feature that lets you build, test, and share crates
- Modules: Let you control the organization of your project
Next
With a basic understanding of crates, packages, and modules, let's utilize one for regular expression support.
Dependencies
crates.io is the central package registry for the Rust community, serving as
a hub to discover and download packages. cargo
supports adding and managing
dependencies for you and is configured to use the package registry by
default.
We're going to explore a few crates in this course:
- regex: An implementation of regular expressions for Rust
- itertools: Extra iterator adaptors, iterator methods, free functions, and macros
- clap: A simple to use, efficient, and full-featured Command Line Argument Parser
- crossbeam: A set of tools for concurrent programming
Cargo.toml
Cargo.toml
is one of the boilerplate files generated by cargo new
. This is
where we manage dependencies and other configurations. The starter file is quite
simple.
[package]
name = "grep"
version = "0.1.0"
edition = "2021"
[dependencies]
Adding a Dependency
Adding dependencies is as simple as specifying the name of the crate and the
desired version1 in the [dependencies]
section of the Cargo.toml
file.
Here's how we add the regex
crate.
regex = "1.11.1"
We can also use cargo add
:
$ cargo add regex
Updating crates.io index
Adding regex v1.11.1 to dependencies
Features:
+ perf
+ perf-backtrack
+ perf-cache
+ perf-dfa
+ perf-inline
+ perf-literal
+ perf-onepass
+ std
+ unicode
+ unicode-age
+ unicode-bool
+ unicode-case
+ unicode-gencat
+ unicode-perl
+ unicode-script
+ unicode-segment
- logging
- pattern
- perf-dfa-full
- unstable
- use_std
Updating crates.io index
Locking 5 packages to latest compatible versions
Adding aho-corasick v1.1.3
Adding memchr v2.7.4
Adding regex v1.11.1
Adding regex-automata v0.4.9
Adding regex-syntax v0.8.5
Either method produces the same result.
[package]
name = "grep"
version = "0.1.0"
edition = "2021"
[dependencies]
regex = "1.11.1"
Exercise
These exercises must be completed locally.
- Add the
clap
crate and include thederive
2 feature. More information to accomplish this can be found here and here.
Solution
clap = { version = "4.5.21", features = ["derive"] }
regex = "1.11.1"
Semantic version numbers (i.e. SemVer) are supported. Refer to the documentation on specifying dependencies for more advanced version control.
Features of dependencies can be enabled within the dependency declaration. The features key indicates which features to enable. The Cargo Book covers this under the Dependency features section.
use-ing Dependencies
Now that we've added the regex
crate as a dependency, we are ready to use it
(pun intended). You can refer to the crate by the full path name regex::Regex
(crate::module) or, more commonly, by bringing the module into scope with the
use
1 keyword.
We've been doing this in our grep program:
use std::fs::File;
use std::io::Read;
use std::io::{BufRead, BufReader};
use std::process::exit;
By bringing the module into scope with use regex::Regex;
, we can reference it
with Regex
.
Now that we know how to add dependencies to our project and use them, it's time to add support for regular expressions!
The use
keyword can do more than just bring things into scope, so be
sure to check the documentation on use declarations for more details on
use
.
Regular Expressions
With the Regex
crate added to our project, we'll replace the pattern
string
slice we've been using with a regular expression.
Using Regex
The Regex
modules defines a new
method that takes a regular expression,
attempts to compile it, and returns a Regex
object.1
let pattern = "[Ee]xample";
let re = Regex::new(pattern);
Since compiling a regular expression can fail (e.g., due to an invalid pattern),
new
returns a Result
. Here is the function signature for new
2:
fn new(re: &str) -> Result<Regex, Error>
The function signature indicates that the Ok
variant returns a Regex
, while
the Err
variant returns an Error
. Since our grep program can't continue with
an invalid regular expression, we need to catch that case, display a helpful
error message, and exit the program.
Let's put all this together:
extern crate regex; // this is needed for the playground use regex::Regex; use std::process::exit; fn main() { let pattern = "(missing the closing parenthesis"; // invalid expression // compile the regular expression match Regex::new(pattern) { // the underscore (_) means we are ignoring the value returned by new Ok(_) => println!("{pattern} is a valid regular expression!"), // e is the error value returned by new Err(e) => { eprintln!("{e}"); // eprintln! writes to standard error exit(1); // exit with error code 1 } }; }
Run the code to see the error. Then, correct the it by adding the missing
parenthesis )
and re-run the code.
Updating Grep
We now have enough context to modify our Grep program to include regular expression support. Below are the changes, with the unrelated parts of the program hidden:
#![allow(unused_imports)] use std::fs::File; use std::io::Read; use std::io::{BufRead, BufReader}; use std::process::exit; extern crate regex; // this is needed for the playground use regex::Regex; fn find_matching_lines(lines: &[String], regex: Regex) -> Vec<usize> { lines .iter() .enumerate() .filter_map(|(i, line)| match regex.is_match(line) { true => Some(i), false => None, }) .collect() // turns anything iterable into a collection } fn create_intervals( lines: Vec<usize>, before_context: usize, after_context: usize, ) -> Vec<(usize, usize)> { lines .iter() .map(|line| { ( line.saturating_sub(before_context), line.saturating_add(after_context), ) }) .collect() } fn merge_intervals(intervals: &mut Vec<(usize, usize)>) { // merge overlapping intervals intervals.dedup_by(|next, prev| { if prev.1 < next.0 { false } else { prev.1 = next.1; true } }) } fn print_results(intervals: Vec<(usize, usize)>, lines: Vec<String>) { for (start, end) in intervals { for (line_no, line) in lines.iter().enumerate().take(end + 1).skip(start) { println!("{}: {}", line_no + 1, line) } } } fn read_file(file: impl Read) -> Vec<String> { BufReader::new(file).lines().map_while(Result::ok).collect() } fn main() { let poem = "I have a little shadow that goes in and out with me, And what can be the use of him is more than I can see. He is very, very like me from the heels up to the head; And I see him jump before me, when I jump into my bed. The funniest thing about him is the way he likes to grow - Not at all like proper children, which is always very slow; For he sometimes shoots up taller like an india-rubber ball, And he sometimes gets so little that there’s none of him at all."; let mock_file = std::io::Cursor::new(poem); // command line arguments let pattern = "(all)|(little)"; let before_context = 1; let after_context = 1; // attempt to open the file let lines = read_file(mock_file); //let lines = match File::open(filename) { // // convert the poem into lines // Ok(file) => read_file(file), // Err(e) => { // eprintln!("Error opening {filename}: {e}"); // exit(1); // } //}; // compile the regular expression let regex = match Regex::new(pattern) { Ok(re) => re, // bind re to regex Err(e) => { eprintln!("{e}"); // write to standard error exit(1); } }; // store the 0-based line number for any matched line let match_lines = find_matching_lines(&lines, regex); // create intervals of the form [a,b] with the before/after context let mut intervals = create_intervals(match_lines, before_context, after_context); // merge overlapping intervals merge_intervals(&mut intervals); // print the lines print_results(intervals, lines); }
Don't forget, you can reveal the hidden parts by clicking Show hidden lines.
The let regex = match Regex::new(pattern)
variable binding expression might
seem a bit unusual. The pattern is discussed in the Rust documentation section
on Recoverable Errors with Result. To briefly explain: When the result is
Ok
, this code extracts the inner re
value from the Ok
variant and moves it
to the variable regex
.
Next
Onward to creating our own module!
The Regex
crate includes excellent
documentation and detailed
examples to learn from.
The source code for new
can be found
here.
Custom Types
Rust provides two options for creating custom types: enum
and struct
.
The key difference between them is that enums have a fixed set of variants,
meaning they are used to indicate that a value is one of a specific set of
possible values. On the other hand, struct
is not limited to a set of
possible values. Structures allow for packaging related values into meaningful
groups.
We've already encountered enums when we looked at
Option
andResult
.
In our current grep program, we use tuples to represent intervals. In this
section, we’ll replace tuples with a custom Interval
type using a struct
,
and enhance it by adding methods for various operations. Finally, we’ll define
an enum
to represent potential error values when creating and interacting
with intervals.
Struct
We're going to create a type to represent an interval. Rust follows a
standardized naming convention for types, where structures use
UpperCamelCase
1 for "type-level" constructs. Therefore, we'll name our
struct
Interval
.
Defining our Struct
Our closed interval is going to represent the start and end of the lines to print:
struct Interval {
start: usize,
end: usize,
}
Defining Behavior
Functions and methods are added by defining them inside an impl
block:
impl Interval {
// Methods definitions
}
The new
Function
As it is common convention in Rust to define a function called new
for
creating an object, we'll begin by defining one to return an Interval
.
impl Interval {
fn new(start: usize, end: usize) -> Self {
Self { start, end }
}
}
The keyword
Self
The
Self
keywords in the return type and the body of the function are aliases for the type specified after theimpl
keyword, which in this case isInterval
. While the type can be explicitly stated, the common convention is to useSelf
.
Exclusion of the
return
keyword and (;
)The
return
keyword is unnecessary when the returned value is the final expression in a function. In this scenario, the semicolon (;
) is omitted. 2
Methods
Recall from our grep program that we merged overlapping intervals to prevent printing the same line multiple times. It would be useful to create a method to check if two intervals overlap and another method to merge overlapping intervals. Let's outline these methods!
fn overlaps(&self, other: &Interval) -> bool {
todo!();
}
fn merge(&self, other: &Self) -> Self {
todo!();
}
The
todo!()
andunimplemented!()
macros can be useful if you are prototyping and just want a placeholder to let your code pass type analysis.
The self
Method Receiver
You're probably accustomed to using the implicit this
pointer (a hidden first
parameter) in your class methods. In Rust, the self
keyword is used to
represent the receiver of a method and the convention is to omit the type for
this parameter. Depending on the intended behavior, it can be specified as
self
, &self
or &mut self
.3
Omitting the type after
self
is syntactic sugar. It's short forself: Self
. AsSelf
is an alias to the actual type, the full expansion would beself: Interval
,self: &Interval
orself: &mut Interval
.
Implementing the Methods
Remember that our grep program processes lines sequentially. This allows us to
optimize the detection of overlapping intervals. However, this approach limits
the versatility of our Interval
type. As an exercise, you can work on making
it more generic.
overlaps
The overlaps
method is fairly straightforward. We check if the end
of the
first interval is greater than or equal to the start
of the next interval. The
only caveat is the order of the comparison.
fn overlaps(&self, other: &Interval) -> bool {
self.end >= other.start
}
merge
The merge
method returns a new Interval
using the start
of the first
interval and the end
of the second. The same caveat applies: the receiver must
be the interval that comes first in the sequence.
In both cases, an immutable borrow for both intervals is sufficient since we do not need to mutate either value.
fn merge(&self, other: &Self) -> Self {
Interval::new(self.start, other.end)
}
Implementing merge_intervals
Rust programmers coming from a C++ background might be inclined to implement this function using some variation of the following algorithm:
fn merge_intervals(intervals: Vec<Interval>) -> Vec<Interval> {
let mut merged_intervals: Vec<Interval> = Vec::new();
let mut iter = intervals.into_iter();
match iter.next() {
Some(interval) => merged_intervals.push(interval),
None => return merged_intervals,
};
for interval in iter {
if let Some(previous_interval) = merged_intervals.last() {
if previous_interval.overlaps(&interval) {
let new_interval = previous_interval.merge(&interval);
merged_intervals.pop();
merged_intervals.push(new_interval);
} else {
merged_intervals.push(interval);
}
}
}
merged_intervals
}
While functionally correct, Rust features powerful crates that can make
implementing this behavior more concise. One such crate is Itertools
, which
provides extra iterator adaptors. To use this crate, specify it as a dependency
in Crates.toml
and include it in main.rs
with use itertools::Itertools
.
Let's see how the coalesce
adaptor can simplify the code:
use itertools::Itertools;
fn merge_intervals(intervals: Vec<Interval>) -> Vec<Interval> {
intervals
.into_iter()
.coalesce(|p, c| {
if p.overlaps(&c) {
Ok(p.merge(&c))
} else {
Err((p, c))
}
})
.collect()
}
into_iter
In both implementations, we use
into_iter
, which creates a consuming iterator that moves each value out of the vector from start to end. This is another example of how we can take advantage of move semantics.NOTE: Because the iterator consumes the data, the vector cannot be used afterward.
Updating Grep
It's time to update our Grep program to utilize our new type. Additionally, a few minor changes have been made to enhance the design, demonstrate some additional language features, and leverage move semantics. Give the program a run!
#![allow(unused_imports)] extern crate regex; // this is needed for the playground use itertools::Itertools; use regex::Regex; use std::fs::File; use std::io::Read; use std::io::{BufRead, BufReader}; use std::process::exit; fn find_matching_lines(lines: &[String], regex: Regex) -> Vec<usize> { lines .iter() .enumerate() .filter_map(|(i, line)| match regex.is_match(line) { true => Some(i), false => None, }) .collect() // turns anything iterable into a collection } fn create_intervals( lines: Vec<usize>, before_context: usize, after_context: usize, ) -> Vec<Interval> { lines .iter() .map(|line| { let start = line.saturating_sub(before_context); let end = line.saturating_add(after_context); Interval::new(start, end) }) .collect() } fn merge_intervals(intervals: Vec<Interval>) -> Vec<Interval> { // merge overlapping intervals intervals .into_iter() .coalesce(|p, c| { if p.overlaps(&c) { Ok(p.merge(&c)) } else { Err((p, c)) } }) .collect() } fn print_results(intervals: Vec<Interval>, lines: Vec<String>) { for interval in intervals { for (line_no, line) in lines .iter() .enumerate() .take(interval.end + 1) .skip(interval.start) { println!("{}: {}", line_no + 1, line) } } } fn read_file(file: impl Read) -> Vec<String> { BufReader::new(file).lines().map_while(Result::ok).collect() } fn main() { let poem = "I have a little shadow that goes in and out with me, And what can be the use of him is more than I can see. He is very, very like me from the heels up to the head; And I see him jump before me, when I jump into my bed. The funniest thing about him is the way he likes to grow - Not at all like proper children, which is always very slow; For he sometimes shoots up taller like an india-rubber ball, And he sometimes gets so little that there’s none of him at all."; let mock_file = std::io::Cursor::new(poem); // command line arguments let pattern = "(all)|(little)"; let before_context = 1; let after_context = 1; // attempt to open the file let lines = read_file(mock_file); //let lines = match File::open(filename) { // // convert the poem into lines // Ok(file) => read_file(file), // Err(e) => { // eprintln!("Error opening {filename}: {e}"); // exit(1); // } //}; // compile the regular expression let regex = match Regex::new(pattern) { Ok(re) => re, // bind re to regex Err(e) => { eprintln!("{e}"); // write to standard error exit(1); } }; // store the 0-based line number for any matched line let match_lines = find_matching_lines(&lines, regex); // create intervals of the form [a,b] with the before/after context let intervals = create_intervals(match_lines, before_context, after_context); // merge overlapping intervals let intervals = merge_intervals(intervals); // print the lines print_results(intervals, lines); } struct Interval { start: usize, end: usize, } impl Interval { fn new(start: usize, end: usize) -> Self { Self { start, end } } fn overlaps(&self, other: &Interval) -> bool { self.end >= other.start } fn merge(&self, other: &Self) -> Self { Interval::new(self.start, other.end) } }
Summarizing the Changes
Our grep program now makes use of our custom type and includes a few other enhancements.
Let's review the changes:
create_intervals
was updated with the following changes:
- The return type was changed to
Vec<Interval>
- The tuple created from
start
andend
are now used to create anInterval
merge_intervals
was updated with the following changes:
- The argument
intervals
now has a type ofVec<Interval>
and is moved instead of the mutable borrow dedup_by
was replaced withcoalesce
print_results
was updated with the following changes:
- The argument
intervals
is now aVec<Interval>
- The
take
andskip
iterator adaptors were updated to use the fields from theInterval
Each interval is dropped at the end of the loop iteration when written as
for interval in intervals
. If the loop were written asfor interval in &intervals
, we would borrow each value. The same applies if we had written the loop asfor interval in intervals.iter()
. The former is syntactic sugar for the latter.4
Casing conforms to RFC 430 (C-CASE).
Exclusion of return
is discussed
here.
Method syntax is covered in full detail here.
Details about consuming collections with for in
and into_iter
can be
found
here.
Enum
An enumeration, commonly known as an enum, is declared using the keyword
enum
. Compared to other languges, enums in Rust offer greater flexibility and
power. For instance, they support generics (as seen with the Option
enum
)
and can encapsulate values within their variants! 1
Using an enum
for Errors
For our grep program, we’ll create an enum
to represent possible errors during
Interval
operations. For example, when a user creates an Interval
, the
starting value must be less than or equal to the ending value. Additionally, if
a user wants to merge two intervals, they must overlap. If these conditions
aren't met, we’ll return an error (Err
) using one of our enum
variants.
Defining an enum
Below is the IntervalError
enum
, which lists the errors that we may need to
return:
enum IntervalError {
StartEndRangeInvalid,
NonOverlappingInterval,
}
Updating our Interval
First, we'll change the return type for the function new
and the method
merge
to a Result
with the Ok
variant being an Interval
and the Err
variant the appropriate IntervalError
:
fn new(start: usize, end: usize) -> Result<Self, IntervalError> {
if start <= end {
Ok(Self { start, end })
} else {
Err(IntervalError::StartEndRangeInvalid)
}
}
fn merge(&self, other: &Self) -> Result<Self, IntervalError> {
if self.overlaps(other) {
Ok(Self {
start: self.start,
end: other.end,
})
} else {
Err(IntervalError::NonOverlappingInterval)
}
}
Observe how an
Interval
is created innew
as opposed tomerge
. Since the parameter names in new precisely match the fields in theInterval
struct
definition, you can omit the field specifiers. This technique is referred to as field init shorthand syntax. 2
Updating Grep
With the Interval
changes implemented, we need to update the
create_intervals
and merge_intervals
functions to accommodate the Result
return type.
fn create_intervals(
lines: Vec<usize>,
before_context: usize,
after_context: usize,
) -> Result<Vec<Interval>, IntervalError> {
lines
.iter()
.map(|line| {
let start = line.saturating_sub(before_context);
let end = line.saturating_add(after_context);
Interval::new(start, end)
})
.collect()
}
In create_intervals
, the only change was the return type, which changed from
Vec<Interval>
to Result<Vec<Interval>, IntervalError>
. You might wonder why
this works. The concise answer is that the Result
type implements the
FromIterator
trait.
Imagine an intermediary collection of Result
values created from calls to
Interval::new(start, end)
. The FromIterator
trait implementation allows an
iterator over Result
values to be collected into a Result
containing a
collection of the underlying values or an Err
. 3 In other words, the value
contained in the Result
returned by Interval::new(start, end)
is stored in
the Vec<Interval>
collection which is then wrapped in a Result
.
We'll be exploring traits in the next section.
fn merge_intervals(intervals: Vec<Interval>) -> Vec<Interval> {
// merge overlapping intervals
intervals
.into_iter()
.coalesce(|p, c| p.merge(&c).or(Err((p, c))))
.collect()
}
In merge_intervals
, the only change required is to the closure used in
coalesce
. We attempt to merge two intervals and invoke the or
method of
Result
. If the merge is successful, returning Ok
, the value is passed back
to coalesce
. Otherwise, the Err((p, c))
value provided to or
is returned.
Result
methods likemap_err
,or
, andor_else
are often used in error handling because they allow benign errors to be managed while letting successful results pass through. Since themerge
method only merges overlapping intervals, we replace theErr
variant it returns with the tuple(p, c)
needed bycoalesce
.
Eager vs Lazy Evaluation
Depending on the use case, different
Result
methods may be more efficient. It's important to read the documentation to determine the best choice. For example, arguments passed toor
are eagerly evaluated. If you're passing the result of a function call, it's better to useor_else
, which is lazily evaluated.
The final change required is in main
. Since create_intervals
now returns a
Result
, we use a match
expression to check if the operation was successful.
In the case of an Err
, since it’s unrecoverable, we print an error message and
exit.
// create intervals of the form [a,b] with the before/after context
let intervals =
match create_intervals(match_lines, before_context, after_context) {
Ok(intervals) => intervals,
Err(_) => {
eprintln!("An error occurred while creating intervals");
exit(1);
}
};
Updating Grep
With our changes in place, our Interval now supports error handling via Result
and our grep program properly handles any errors.
#![allow(unused_imports)] extern crate regex; // this is needed for the playground use itertools::Itertools; use regex::Regex; use std::fs::File; use std::io::Read; use std::io::{BufRead, BufReader}; use std::process::exit; fn find_matching_lines(lines: &[String], regex: Regex) -> Vec<usize> { lines .iter() .enumerate() .filter_map(|(i, line)| match regex.is_match(line) { true => Some(i), false => None, }) .collect() // turns anything iterable into a collection } fn create_intervals( lines: Vec<usize>, before_context: usize, after_context: usize, ) -> Result<Vec<Interval>, IntervalError> { lines .iter() .map(|line| { let start = line.saturating_sub(before_context); let end = line.saturating_add(after_context); Interval::new(start, end) }) .collect() } fn merge_intervals(intervals: Vec<Interval>) -> Vec<Interval> { // merge overlapping intervals intervals .into_iter() .coalesce(|p, c| p.merge(&c).or(Err((p, c)))) .collect() } fn print_results(intervals: Vec<Interval>, lines: Vec<String>) { for interval in intervals { for (line_no, line) in lines .iter() .enumerate() .take(interval.end + 1) .skip(interval.start) { println!("{}: {}", line_no + 1, line) } } } fn read_file(file: impl Read) -> Vec<String> { BufReader::new(file).lines().map_while(Result::ok).collect() } fn main() { let poem = "I have a little shadow that goes in and out with me, And what can be the use of him is more than I can see. He is very, very like me from the heels up to the head; And I see him jump before me, when I jump into my bed. The funniest thing about him is the way he likes to grow - Not at all like proper children, which is always very slow; For he sometimes shoots up taller like an india-rubber ball, And he sometimes gets so little that there’s none of him at all."; let mock_file = std::io::Cursor::new(poem); // command line arguments let pattern = "(all)|(little)"; let before_context = 1; let after_context = 1; // attempt to open the file let lines = read_file(mock_file); //let lines = match File::open(filename) { // // convert the poem into lines // Ok(file) => read_file(file), // Err(e) => { // eprintln!("Error opening {filename}: {e}"); // exit(1); // } //}; // compile the regular expression let regex = match Regex::new(pattern) { Ok(re) => re, // bind re to regex Err(e) => { eprintln!("{e}"); // write to standard error exit(1); } }; // store the 0-based line number for any matched line let match_lines = find_matching_lines(&lines, regex); // create intervals of the form [a,b] with the before/after context let intervals = match create_intervals(match_lines, before_context, after_context) { Ok(intervals) => intervals, Err(_) => { eprintln!("An error occurred while creating intervals"); exit(1); } }; // merge overlapping intervals let intervals = merge_intervals(intervals); // print the lines print_results(intervals, lines); } enum IntervalError { StartEndRangeInvalid, NonOverlappingInterval, } struct Interval { start: usize, end: usize, } impl Interval { fn new(start: usize, end: usize) -> Result<Self, IntervalError> { if start <= end { Ok(Self { start, end }) } else { Err(IntervalError::StartEndRangeInvalid) } } fn overlaps(&self, other: &Interval) -> bool { self.end >= other.start } fn merge(&self, other: &Self) -> Result<Self, IntervalError> { if self.overlaps(other) { Ok(Self { start: self.start, end: other.end, }) } else { Err(IntervalError::NonOverlappingInterval) } } }
Summary
The decision to use the Result
type for error handling in Rust provides a
robust and flexible way of managing errors such as:
- Explicit Error Handling: Using
Result
makes error handling explicit. Functions that can fail return aResult
, which forces the caller to handle the potential error, making the code more reliable and less prone to unexpected failures. - Recoverable Errors: The
Result
type allows for recoverable errors. By returning aResult
, the function gives the caller the option to handle the error in a way that makes sense for their specific context, rather than immediately terminating the program. - Type Safety: Rust’s type system ensures that errors are handled
correctly. The
Result
type is part of this system, helping to prevent common errors like null pointer dereferencing and making the code safer and more predictable. - Composability: The
Result
type implements traits likeFromIterator
, which allows for powerful and flexible error handling patterns. This makes it easier to work with collections of results and to propagate errors through multiple layers of function calls.
Overall, the use of Result
aligns with Rust’s goals of safety, concurrency,
and performance, providing a clear and structured way to handle errors.
Exercise
-
Replace
Result::or
withResult::map_err
inmerge_intervals
.Solution
fn merge_intervals(intervals: Vec<Interval>) -> Vec<Interval> { // merge overlapping intervals intervals .into_iter() .coalesce(|p, c| p.merge(&c).map_err(|_| (p, c))) .collect() }
#![allow(unused_imports)] extern crate regex; // this is needed for the playground use itertools::Itertools; use regex::Regex; use std::fs::File; use std::io::Read; use std::io::{BufRead, BufReader}; use std::process::exit; fn find_matching_lines(lines: &[String], regex: Regex) -> Vec<usize> { lines .iter() .enumerate() .filter_map(|(i, line)| match regex.is_match(line) { true => Some(i), false => None, }) .collect() // turns anything iterable into a collection } fn create_intervals( lines: Vec<usize>, before_context: usize, after_context: usize, ) -> Result<Vec<Interval>, IntervalError> { lines .iter() .map(|line| { let start = line.saturating_sub(before_context); let end = line.saturating_add(after_context); Interval::new(start, end) }) .collect() } fn merge_intervals(intervals: Vec<Interval>) -> Vec<Interval> { // merge overlapping intervals intervals .into_iter() .coalesce(|p, c| p.merge(&c).or(Err((p, c)))) .collect() } fn print_results(intervals: Vec<Interval>, lines: Vec<String>) { for interval in intervals { for (line_no, line) in lines .iter() .enumerate() .take(interval.end + 1) .skip(interval.start) { println!("{}: {}", line_no + 1, line) } } } fn read_file(file: impl Read) -> Vec<String> { BufReader::new(file).lines().map_while(Result::ok).collect() } fn main() { let poem = "I have a little shadow that goes in and out with me, And what can be the use of him is more than I can see. He is very, very like me from the heels up to the head; And I see him jump before me, when I jump into my bed. The funniest thing about him is the way he likes to grow - Not at all like proper children, which is always very slow; For he sometimes shoots up taller like an india-rubber ball, And he sometimes gets so little that there’s none of him at all."; let mock_file = std::io::Cursor::new(poem); // command line arguments let pattern = "(all)|(little)"; let before_context = 1; let after_context = 1; // attempt to open the file let lines = read_file(mock_file); //let lines = match File::open(filename) { // // convert the poem into lines // Ok(file) => read_file(file), // Err(e) => { // eprintln!("Error opening {filename}: {e}"); // exit(1); // } //}; // compile the regular expression let regex = match Regex::new(pattern) { Ok(re) => re, // bind re to regex Err(e) => { eprintln!("{e}"); // write to standard error exit(1); } }; // store the 0-based line number for any matched line let match_lines = find_matching_lines(&lines, regex); // create intervals of the form [a,b] with the before/after context let intervals = match create_intervals(match_lines, before_context, after_context) { Ok(intervals) => intervals, Err(_) => { eprintln!("An error occurred while creating intervals"); exit(1); } }; // merge overlapping intervals let intervals = merge_intervals(intervals); // print the lines print_results(intervals, lines); } enum IntervalError { StartEndRangeInvalid, NonOverlappingInterval, } struct Interval { start: usize, end: usize, } impl Interval { fn new(start: usize, end: usize) -> Result<Self, IntervalError> { if start <= end { Ok(Self { start, end }) } else { Err(IntervalError::StartEndRangeInvalid) } } fn overlaps(&self, other: &Interval) -> bool { self.end >= other.start } fn merge(&self, other: &Self) -> Result<Self, IntervalError> { if self.overlaps(other) { Ok(Self { start: self.start, end: other.end, }) } else { Err(IntervalError::NonOverlappingInterval) } } }
Next
With our Interval complete, let’s make it a module!
Refer to the documentation on enum values.
Refer to the documentation on field init shorthand syntax.
Refer to the documentation on Collecting into a Result
for detailed
explanation.
Creating Modules
In the previous section, we defined an enum
and struct
. However, we didn't
actually create a module! In this section, we will. Before diving into the
details, the Rust documentation provides a Module Cheat Sheet which will be a
handy reference on your Rust journey.
Since we're building our module to work in this online book, we'll deviate
slightly from standard practice. Typically, modules are created in separate
files. For instance, our Interval
would be placed in a new file called
interval.rs
.
grep
├─ Cargo.lock
├─ Cargo.toml
└─ src
├─ interval.rs
└─ main.rs
Document generation
As we develop our
interval
module, it's a great opportunity to learn aboutrustdoc
and documentation generation. Theinterval
module will be annotated to enable documentation generation. If you're building the code locally, the simplest way to generate and view documentation is by usingcargo doc --open
.
Let's get started!
Scope and Privacy
We're going to convert all our Interval
related code into a module. To define
a module, we use the mod
keyword followed by the module's name and enclose the
body within curly braces.
mod interval {
// module body
}
Here's the new version of our grep program with the Interval
related parts
moved inside the interval
module. In addition, comments have been added to the
module which will be used to generate documentation in an upcoming section. Go
ahead and run the code to see it in action!
#![allow(unused_imports)] extern crate regex; // this is needed for the playground use itertools::Itertools; use regex::Regex; use std::fs::File; use std::io::Read; use std::io::{BufRead, BufReader}; use std::process::exit; fn find_matching_lines(lines: &[String], regex: Regex) -> Vec<usize> { lines .iter() .enumerate() .filter_map(|(i, line)| match regex.is_match(line) { true => Some(i), false => None, }) .collect() // turns anything iterable into a collection } fn create_intervals( lines: Vec<usize>, before_context: usize, after_context: usize, ) -> Result<Vec<Interval>, IntervalError> { lines .iter() .map(|line| { let start = line.saturating_sub(before_context); let end = line.saturating_add(after_context); Interval::new(start, end) }) .collect() } fn merge_intervals(intervals: Vec<Interval>) -> Vec<Interval> { // merge overlapping intervals intervals .into_iter() .coalesce(|p, c| p.merge(&c).map_err(|_| (p, c))) .collect() } fn print_results(intervals: Vec<Interval>, lines: Vec<String>) { for interval in intervals { for (line_no, line) in lines .iter() .enumerate() .take(interval.end + 1) .skip(interval.start) { println!("{}: {}", line_no + 1, line) } } } fn read_file(file: impl Read) -> Vec<String> { BufReader::new(file).lines().map_while(Result::ok).collect() } fn main() { let poem = "I have a little shadow that goes in and out with me, And what can be the use of him is more than I can see. He is very, very like me from the heels up to the head; And I see him jump before me, when I jump into my bed. The funniest thing about him is the way he likes to grow - Not at all like proper children, which is always very slow; For he sometimes shoots up taller like an india-rubber ball, And he sometimes gets so little that there’s none of him at all."; let mock_file = std::io::Cursor::new(poem); // command line arguments let pattern = "(all)|(little)"; let before_context = 1; let after_context = 1; // attempt to open the file let lines = read_file(mock_file); //let lines = match File::open(filename) { // // convert the poem into lines // Ok(file) => read_file(file), // Err(e) => { // eprintln!("Error opening {filename}: {e}"); // exit(1); // } //}; // compile the regular expression let regex = match Regex::new(pattern) { Ok(re) => re, // bind re to regex Err(e) => { eprintln!("{e}"); // write to standard error exit(1); } }; // store the 0-based line number for any matched line let match_lines = find_matching_lines(&lines, regex); // create intervals of the form [a,b] with the before/after context let intervals = match create_intervals(match_lines, before_context, after_context) { Ok(intervals) => intervals, Err(_) => { eprintln!("An error occurred while creating intervals"); exit(1); } }; // merge overlapping intervals let intervals = merge_intervals(intervals); // print the lines print_results(intervals, lines); } mod interval { /// A list specifying general categories of Interval errors. enum IntervalError { /// Start is not less than or equal to end StartEndRangeInvalid, /// Two intervals to be merged do not overlap NonOverlappingInterval, } /// A closed-interval [`start`, `end`] type used for representing a range of /// values between `start` and `end` inclusively. /// /// # Examples /// /// You can create an `Interval` using `new`. /// /// ```rust /// let interval = Interval::new(1, 10).unwrap(); /// assert_eq!(interval.start, 1); /// assert_eq!(interval.end, 10); /// ``` struct Interval { start: usize, end: usize, } impl Interval { /// Creates a new `Interval` set to `start` and `end`. /// /// # Examples /// /// ```rust /// let interval = Interval::new(1, 10).unwrap(); /// assert_eq!(interval.start, 1); /// assert_eq!(interval.end, 10); /// ``` fn new(start: usize, end: usize) -> Result<Self, IntervalError> { if start <= end { Ok(Self { start, end }) } else { Err(IntervalError::StartEndRangeInvalid) } } /// Checks if two intervals overlap. Overlapping intervals have at least /// one point in common. /// /// # Examples /// /// ```rust /// let a = Interval::new(1, 3).unwrap(); /// let b = Interval::new(3, 5).unwrap(); /// assert_eq!(a.overlaps(&b), true); /// assert_eq!(b.overlaps(&a), true); /// ``` /// /// ```rust /// let a = Interval::new(1, 5).unwrap(); /// let b = Interval::new(2, 4).unwrap(); /// assert_eq!(a.overlaps(&b), true); /// assert_eq!(b.overlaps(&a), true); /// ``` /// /// ```rust /// let a = Interval::new(1, 3).unwrap(); /// let b = Interval::new(4, 6).unwrap(); /// assert_eq!(a.overlaps(&b), false); /// assert_eq!(b.overlaps(&a), true); /// ``` fn overlaps(&self, other: &Interval) -> bool { self.end >= other.start } /// Merges two intervals returning a new `Interval`. /// /// The merged `Interval` range includes the union of ranges from each /// `Interval`. /// /// # Examples /// /// ```rust /// let a = Interval::new(1, 3).unwrap(); /// let b = Interval::new(3, 5).unwrap(); /// let c = a.merge(&b).unwrap(); /// assert_eq!(c.start, 1); /// assert_eq!(c.end, 5); /// ``` fn merge(&self, other: &Self) -> Result<Self, IntervalError> { if self.overlaps(other) { Ok(Self { start: self.start, end: other.end, }) } else { Err(IntervalError::NonOverlappingInterval) } } } }
Uh-oh! Looks like we have all kinds of compiler errors! Among the chaos is two different errors having to do with scope and privacy.
- [E0412]: cannot find type
Interval
in this scope - [E0433]: failed to resolve: use of undeclared type
Interval
Remember you can always use
rustc --explain EXXXX
which provides a detailed explanation of an error message.
Scope
The first error indicates that the Interval
type is not in scope. This is
because it is now part of the interval
module. To access it, we can use the
full path name interval::Interval
or create a shortcut with use
. Since we
don't want to replace every occurrence of Interval
in our code with
interval::Interval
, we'll take advantage of use
. Additionally, we need
access to IntervalError
, so we'll bring that into scope at the same time. When
bringing multiple types into scope, we can wrap them in {}
and separate them
with ,
.
With that change in place, run the code again.
#![allow(unused_imports)] extern crate regex; // this is needed for the playground use interval::{Interval, IntervalError}; use itertools::Itertools; use regex::Regex; use std::fs::File; use std::io::Read; use std::io::{BufRead, BufReader}; use std::process::exit; fn find_matching_lines(lines: &[String], regex: Regex) -> Vec<usize> { lines .iter() .enumerate() .filter_map(|(i, line)| match regex.is_match(line) { true => Some(i), false => None, }) .collect() // turns anything iterable into a collection } fn create_intervals( lines: Vec<usize>, before_context: usize, after_context: usize, ) -> Result<Vec<Interval>, IntervalError> { lines .iter() .map(|line| { let start = line.saturating_sub(before_context); let end = line.saturating_add(after_context); Interval::new(start, end) }) .collect() } fn merge_intervals(intervals: Vec<Interval>) -> Vec<Interval> { // merge overlapping intervals intervals .into_iter() .coalesce(|p, c| p.merge(&c).map_err(|_| (p, c))) .collect() } fn print_results(intervals: Vec<Interval>, lines: Vec<String>) { for interval in intervals { for (line_no, line) in lines .iter() .enumerate() .take(interval.end + 1) .skip(interval.start) { println!("{}: {}", line_no + 1, line) } } } fn read_file(file: impl Read) -> Vec<String> { BufReader::new(file).lines().map_while(Result::ok).collect() } fn main() { let poem = "I have a little shadow that goes in and out with me, And what can be the use of him is more than I can see. He is very, very like me from the heels up to the head; And I see him jump before me, when I jump into my bed. The funniest thing about him is the way he likes to grow - Not at all like proper children, which is always very slow; For he sometimes shoots up taller like an india-rubber ball, And he sometimes gets so little that there’s none of him at all."; let mock_file = std::io::Cursor::new(poem); // command line arguments let pattern = "(all)|(little)"; let before_context = 1; let after_context = 1; // attempt to open the file let lines = read_file(mock_file); //let lines = match File::open(filename) { // // convert the poem into lines // Ok(file) => read_file(file), // Err(e) => { // eprintln!("Error opening {filename}: {e}"); // exit(1); // } //}; // compile the regular expression let regex = match Regex::new(pattern) { Ok(re) => re, // bind re to regex Err(e) => { eprintln!("{e}"); // write to standard error exit(1); } }; // store the 0-based line number for any matched line let match_lines = find_matching_lines(&lines, regex); // create intervals of the form [a,b] with the before/after context let intervals = match create_intervals(match_lines, before_context, after_context) { Ok(intervals) => intervals, Err(_) => { eprintln!("An error occurred while creating intervals"); exit(1); } }; // merge overlapping intervals let intervals = merge_intervals(intervals); // print the lines print_results(intervals, lines); } mod interval { /// A list specifying general categories of Interval errors. enum IntervalError { /// Start is not less than or equal to end StartEndRangeInvalid, /// Two intervals to be merged do not overlap NonOverlappingInterval, } /// A closed-interval [`start`, `end`] type used for representing a range of /// values between `start` and `end` inclusively. /// /// # Examples /// /// You can create an `Interval` using `new`. /// /// ```rust /// let interval = Interval::new(1, 10).unwrap(); /// assert_eq!(interval.start, 1); /// assert_eq!(interval.end, 10); /// ``` struct Interval { start: usize, end: usize, } impl Interval { /// Creates a new `Interval` set to `start` and `end`. /// /// # Examples /// /// ```rust /// let interval = Interval::new(1, 10).unwrap(); /// assert_eq!(interval.start, 1); /// assert_eq!(interval.end, 10); /// ``` fn new(start: usize, end: usize) -> Result<Self, IntervalError> { if start <= end { Ok(Self { start, end }) } else { Err(IntervalError::StartEndRangeInvalid) } } /// Checks if two intervals overlap. Overlapping intervals have at least /// one point in common. /// /// # Examples /// /// ```rust /// let a = Interval::new(1, 3).unwrap(); /// let b = Interval::new(3, 5).unwrap(); /// assert_eq!(a.overlaps(&b), true); /// assert_eq!(b.overlaps(&a), true); /// ``` /// /// ```rust /// let a = Interval::new(1, 5).unwrap(); /// let b = Interval::new(2, 4).unwrap(); /// assert_eq!(a.overlaps(&b), true); /// assert_eq!(b.overlaps(&a), true); /// ``` /// /// ```rust /// let a = Interval::new(1, 3).unwrap(); /// let b = Interval::new(4, 6).unwrap(); /// assert_eq!(a.overlaps(&b), false); /// assert_eq!(b.overlaps(&a), true); /// ``` fn overlaps(&self, other: &Interval) -> bool { self.end >= other.start } /// Merges two intervals returning a new `Interval`. /// /// The merged `Interval` range includes the union of ranges from each /// `Interval`. /// /// # Examples /// /// ```rust /// let a = Interval::new(1, 3).unwrap(); /// let b = Interval::new(3, 5).unwrap(); /// let c = a.merge(&b).unwrap(); /// assert_eq!(c.start, 1); /// assert_eq!(c.end, 5); /// ``` fn merge(&self, other: &Self) -> Result<Self, IntervalError> { if self.overlaps(other) { Ok(Self { start: self.start, end: other.end, }) } else { Err(IntervalError::NonOverlappingInterval) } } } }
Bugger! More compiler errors!
Privacy
Before we defined the interval
module, all the types and their methods were
public. However, once we enclosed everything in a module, they became private!
By default, code within a module is private from its parent modules. To make a
module public, we need to declare it with pub mod
instead of just mod
. Let's
add the pub
prefix and run the program again.
#![allow(unused_imports)] extern crate regex; // this is needed for the playground use interval::{Interval, IntervalError}; use itertools::Itertools; use regex::Regex; use std::fs::File; use std::io::Read; use std::io::{BufRead, BufReader}; use std::process::exit; fn find_matching_lines(lines: &[String], regex: Regex) -> Vec<usize> { lines .iter() .enumerate() .filter_map(|(i, line)| match regex.is_match(line) { true => Some(i), false => None, }) .collect() // turns anything iterable into a collection } fn create_intervals( lines: Vec<usize>, before_context: usize, after_context: usize, ) -> Result<Vec<Interval>, IntervalError> { lines .iter() .map(|line| { let start = line.saturating_sub(before_context); let end = line.saturating_add(after_context); Interval::new(start, end) }) .collect() } fn merge_intervals(intervals: Vec<Interval>) -> Vec<Interval> { // merge overlapping intervals intervals .into_iter() .coalesce(|p, c| p.merge(&c).map_err(|_| (p, c))) .collect() } fn print_results(intervals: Vec<Interval>, lines: Vec<String>) { for interval in intervals { for (line_no, line) in lines .iter() .enumerate() .take(interval.end + 1) .skip(interval.start) { println!("{}: {}", line_no + 1, line) } } } fn read_file(file: impl Read) -> Vec<String> { BufReader::new(file).lines().map_while(Result::ok).collect() } fn main() { let poem = "I have a little shadow that goes in and out with me, And what can be the use of him is more than I can see. He is very, very like me from the heels up to the head; And I see him jump before me, when I jump into my bed. The funniest thing about him is the way he likes to grow - Not at all like proper children, which is always very slow; For he sometimes shoots up taller like an india-rubber ball, And he sometimes gets so little that there’s none of him at all."; let mock_file = std::io::Cursor::new(poem); // command line arguments let pattern = "(all)|(little)"; let before_context = 1; let after_context = 1; // attempt to open the file let lines = read_file(mock_file); //let lines = match File::open(filename) { // // convert the poem into lines // Ok(file) => read_file(file), // Err(e) => { // eprintln!("Error opening {filename}: {e}"); // exit(1); // } //}; // compile the regular expression let regex = match Regex::new(pattern) { Ok(re) => re, // bind re to regex Err(e) => { eprintln!("{e}"); // write to standard error exit(1); } }; // store the 0-based line number for any matched line let match_lines = find_matching_lines(&lines, regex); // create intervals of the form [a,b] with the before/after context let intervals = match create_intervals(match_lines, before_context, after_context) { Ok(intervals) => intervals, Err(_) => { eprintln!("An error occurred while creating intervals"); exit(1); } }; // merge overlapping intervals let intervals = merge_intervals(intervals); // print the lines print_results(intervals, lines); } pub mod interval { /// A list specifying general categories of Interval errors. enum IntervalError { /// Start is not less than or equal to end StartEndRangeInvalid, /// Two intervals to be merged do not overlap NonOverlappingInterval, } /// A closed-interval [`start`, `end`] type used for representing a range of /// values between `start` and `end` inclusively. /// /// # Examples /// /// You can create an `Interval` using `new`. /// /// ```rust /// let interval = Interval::new(1, 10).unwrap(); /// assert_eq!(interval.start, 1); /// assert_eq!(interval.end, 10); /// ``` struct Interval { start: usize, end: usize, } impl Interval { /// Creates a new `Interval` set to `start` and `end`. /// /// # Examples /// /// ```rust /// let interval = Interval::new(1, 10).unwrap(); /// assert_eq!(interval.start, 1); /// assert_eq!(interval.end, 10); /// ``` fn new(start: usize, end: usize) -> Result<Self, IntervalError> { if start <= end { Ok(Self { start, end }) } else { Err(IntervalError::StartEndRangeInvalid) } } /// Checks if two intervals overlap. Overlapping intervals have at least /// one point in common. /// /// # Examples /// /// ```rust /// let a = Interval::new(1, 3).unwrap(); /// let b = Interval::new(3, 5).unwrap(); /// assert_eq!(a.overlaps(&b), true); /// assert_eq!(b.overlaps(&a), true); /// ``` /// /// ```rust /// let a = Interval::new(1, 5).unwrap(); /// let b = Interval::new(2, 4).unwrap(); /// assert_eq!(a.overlaps(&b), true); /// assert_eq!(b.overlaps(&a), true); /// ``` /// /// ```rust /// let a = Interval::new(1, 3).unwrap(); /// let b = Interval::new(4, 6).unwrap(); /// assert_eq!(a.overlaps(&b), false); /// assert_eq!(b.overlaps(&a), true); /// ``` fn overlaps(&self, other: &Interval) -> bool { self.end >= other.start } /// Merges two intervals returning a new `Interval`. /// /// The merged `Interval` range includes the union of ranges from each /// `Interval`. /// /// # Examples /// /// ```rust /// let a = Interval::new(1, 3).unwrap(); /// let b = Interval::new(3, 5).unwrap(); /// let c = a.merge(&b).unwrap(); /// assert_eq!(c.start, 1); /// assert_eq!(c.end, 5); /// ``` fn merge(&self, other: &Self) -> Result<Self, IntervalError> { if self.overlaps(other) { Ok(Self { start: self.start, end: other.end, }) } else { Err(IntervalError::NonOverlappingInterval) } } } }
More compiler errors! Who would have thought creating a module could be this
complicated? The reason for this series of errors is to help us understand
module scope and privacy. The emerging pattern is that modules default
everything to private. Simply declaring the module public isn't enough; every
type and method within the module must also be explicitly declared public if
that is the intended design. So, let's prefix the methods and types with pub
and hopefully achieve a successful compile!
#![allow(unused_imports)] extern crate regex; // this is needed for the playground use interval::{Interval, IntervalError}; use itertools::Itertools; use regex::Regex; use std::fs::File; use std::io::Read; use std::io::{BufRead, BufReader}; use std::process::exit; fn find_matching_lines(lines: &[String], regex: Regex) -> Vec<usize> { lines .iter() .enumerate() .filter_map(|(i, line)| match regex.is_match(line) { true => Some(i), false => None, }) .collect() // turns anything iterable into a collection } fn create_intervals( lines: Vec<usize>, before_context: usize, after_context: usize, ) -> Result<Vec<Interval>, IntervalError> { lines .iter() .map(|line| { let start = line.saturating_sub(before_context); let end = line.saturating_add(after_context); Interval::new(start, end) }) .collect() } fn merge_intervals(intervals: Vec<Interval>) -> Vec<Interval> { // merge overlapping intervals intervals .into_iter() .coalesce(|p, c| p.merge(&c).map_err(|_| (p, c))) .collect() } fn print_results(intervals: Vec<Interval>, lines: Vec<String>) { for interval in intervals { for (line_no, line) in lines .iter() .enumerate() .take(interval.end + 1) .skip(interval.start) { println!("{}: {}", line_no + 1, line) } } } fn read_file(file: impl Read) -> Vec<String> { BufReader::new(file).lines().map_while(Result::ok).collect() } fn main() { let poem = "I have a little shadow that goes in and out with me, And what can be the use of him is more than I can see. He is very, very like me from the heels up to the head; And I see him jump before me, when I jump into my bed. The funniest thing about him is the way he likes to grow - Not at all like proper children, which is always very slow; For he sometimes shoots up taller like an india-rubber ball, And he sometimes gets so little that there’s none of him at all."; let mock_file = std::io::Cursor::new(poem); // command line arguments let pattern = "(all)|(little)"; let before_context = 1; let after_context = 1; // attempt to open the file let lines = read_file(mock_file); //let lines = match File::open(filename) { // // convert the poem into lines // Ok(file) => read_file(file), // Err(e) => { // eprintln!("Error opening {filename}: {e}"); // exit(1); // } //}; // compile the regular expression let regex = match Regex::new(pattern) { Ok(re) => re, // bind re to regex Err(e) => { eprintln!("{e}"); // write to standard error exit(1); } }; // store the 0-based line number for any matched line let match_lines = find_matching_lines(&lines, regex); // create intervals of the form [a,b] with the before/after context let intervals = match create_intervals(match_lines, before_context, after_context) { Ok(intervals) => intervals, Err(_) => { eprintln!("An error occurred while creating intervals"); exit(1); } }; // merge overlapping intervals let intervals = merge_intervals(intervals); // print the lines print_results(intervals, lines); } pub mod interval { /// A list specifying general categories of Interval errors. pub enum IntervalError { /// Start is not less than or equal to end StartEndRangeInvalid, /// Two intervals to be merged do not overlap NonOverlappingInterval, } /// A closed-interval [`start`, `end`] type used for representing a range of /// values between `start` and `end` inclusively. /// /// # Examples /// /// You can create an `Interval` using `new`. /// /// ```rust /// let interval = Interval::new(1, 10).unwrap(); /// assert_eq!(interval.start, 1); /// assert_eq!(interval.end, 10); /// ``` pub struct Interval { start: usize, end: usize, } impl Interval { /// Creates a new `Interval` set to `start` and `end`. /// /// # Examples /// /// ```rust /// let interval = Interval::new(1, 10).unwrap(); /// assert_eq!(interval.start, 1); /// assert_eq!(interval.end, 10); /// ``` pub fn new(start: usize, end: usize) -> Result<Self, IntervalError> { if start <= end { Ok(Self { start, end }) } else { Err(IntervalError::StartEndRangeInvalid) } } /// Checks if two intervals overlap. Overlapping intervals have at least /// one point in common. /// /// # Examples /// /// ```rust /// let a = Interval::new(1, 3).unwrap(); /// let b = Interval::new(3, 5).unwrap(); /// assert_eq!(a.overlaps(&b), true); /// assert_eq!(b.overlaps(&a), true); /// ``` /// /// ```rust /// let a = Interval::new(1, 5).unwrap(); /// let b = Interval::new(2, 4).unwrap(); /// assert_eq!(a.overlaps(&b), true); /// assert_eq!(b.overlaps(&a), true); /// ``` /// /// ```rust /// let a = Interval::new(1, 3).unwrap(); /// let b = Interval::new(4, 6).unwrap(); /// assert_eq!(a.overlaps(&b), false); /// assert_eq!(b.overlaps(&a), true); /// ``` pub fn overlaps(&self, other: &Interval) -> bool { self.end >= other.start } /// Merges two intervals returning a new `Interval`. /// /// The merged `Interval` range includes the union of ranges from each /// `Interval`. /// /// # Examples /// /// ```rust /// let a = Interval::new(1, 3).unwrap(); /// let b = Interval::new(3, 5).unwrap(); /// let c = a.merge(&b).unwrap(); /// assert_eq!(c.start, 1); /// assert_eq!(c.end, 5); /// ``` pub fn merge(&self, other: &Self) -> Result<Self, IntervalError> { if self.overlaps(other) { Ok(Self { start: self.start, end: other.end, }) } else { Err(IntervalError::NonOverlappingInterval) } } } }
Alright, last time, I promise! As a final note on privacy, and as the compiler
error pointed out, even the fields of a struct
are private by default. So, the
last step is to make the fields public. Let's do that and finally achieve a
successful compile. Fingers crossed!
#![allow(unused_imports)] extern crate regex; // this is needed for the playground use interval::{Interval, IntervalError}; use itertools::Itertools; use regex::Regex; use std::fs::File; use std::io::Read; use std::io::{BufRead, BufReader}; use std::process::exit; fn find_matching_lines(lines: &[String], regex: Regex) -> Vec<usize> { lines .iter() .enumerate() .filter_map(|(i, line)| match regex.is_match(line) { true => Some(i), false => None, }) .collect() // turns anything iterable into a collection } fn create_intervals( lines: Vec<usize>, before_context: usize, after_context: usize, ) -> Result<Vec<Interval>, IntervalError> { lines .iter() .map(|line| { let start = line.saturating_sub(before_context); let end = line.saturating_add(after_context); Interval::new(start, end) }) .collect() } fn merge_intervals(intervals: Vec<Interval>) -> Vec<Interval> { // merge overlapping intervals intervals .into_iter() .coalesce(|p, c| p.merge(&c).map_err(|_| (p, c))) .collect() } fn print_results(intervals: Vec<Interval>, lines: Vec<String>) { for interval in intervals { for (line_no, line) in lines .iter() .enumerate() .take(interval.end + 1) .skip(interval.start) { println!("{}: {}", line_no + 1, line) } } } fn read_file(file: impl Read) -> Vec<String> { BufReader::new(file).lines().map_while(Result::ok).collect() } fn main() { let poem = "I have a little shadow that goes in and out with me, And what can be the use of him is more than I can see. He is very, very like me from the heels up to the head; And I see him jump before me, when I jump into my bed. The funniest thing about him is the way he likes to grow - Not at all like proper children, which is always very slow; For he sometimes shoots up taller like an india-rubber ball, And he sometimes gets so little that there’s none of him at all."; let mock_file = std::io::Cursor::new(poem); // command line arguments let pattern = "(all)|(little)"; let before_context = 1; let after_context = 1; // attempt to open the file let lines = read_file(mock_file); //let lines = match File::open(filename) { // // convert the poem into lines // Ok(file) => read_file(file), // Err(e) => { // eprintln!("Error opening {filename}: {e}"); // exit(1); // } //}; // compile the regular expression let regex = match Regex::new(pattern) { Ok(re) => re, // bind re to regex Err(e) => { eprintln!("{e}"); // write to standard error exit(1); } }; // store the 0-based line number for any matched line let match_lines = find_matching_lines(&lines, regex); // create intervals of the form [a,b] with the before/after context let intervals = match create_intervals(match_lines, before_context, after_context) { Ok(intervals) => intervals, Err(_) => { eprintln!("An error occurred while creating intervals"); exit(1); } }; // merge overlapping intervals let intervals = merge_intervals(intervals); // print the lines print_results(intervals, lines); } pub mod interval { /// A list specifying general categories of Interval errors. pub enum IntervalError { /// Start is not less than or equal to end StartEndRangeInvalid, /// Two intervals to be merged do not overlap NonOverlappingInterval, } /// A closed-interval [`start`, `end`] type used for representing a range of /// values between `start` and `end` inclusively. /// /// # Examples /// /// You can create an `Interval` using `new`. /// /// ```rust /// let interval = Interval::new(1, 10).unwrap(); /// assert_eq!(interval.start, 1); /// assert_eq!(interval.end, 10); /// ``` pub struct Interval { pub start: usize, pub end: usize, } impl Interval { /// Creates a new `Interval` set to `start` and `end`. /// /// # Examples /// /// ```rust /// let interval = Interval::new(1, 10).unwrap(); /// assert_eq!(interval.start, 1); /// assert_eq!(interval.end, 10); /// ``` pub fn new(start: usize, end: usize) -> Result<Self, IntervalError> { if start <= end { Ok(Self { start, end }) } else { Err(IntervalError::StartEndRangeInvalid) } } /// Checks if two intervals overlap. Overlapping intervals have at least /// one point in common. /// /// # Examples /// /// ```rust /// let a = Interval::new(1, 3).unwrap(); /// let b = Interval::new(3, 5).unwrap(); /// assert_eq!(a.overlaps(&b), true); /// assert_eq!(b.overlaps(&a), true); /// ``` /// /// ```rust /// let a = Interval::new(1, 5).unwrap(); /// let b = Interval::new(2, 4).unwrap(); /// assert_eq!(a.overlaps(&b), true); /// assert_eq!(b.overlaps(&a), true); /// ``` /// /// ```rust /// let a = Interval::new(1, 3).unwrap(); /// let b = Interval::new(4, 6).unwrap(); /// assert_eq!(a.overlaps(&b), false); /// assert_eq!(b.overlaps(&a), true); /// ``` pub fn overlaps(&self, other: &Interval) -> bool { self.end >= other.start } /// Merges two intervals returning a new `Interval`. /// /// The merged `Interval` range includes the union of ranges from each /// `Interval`. /// /// # Examples /// /// ```rust /// let a = Interval::new(1, 3).unwrap(); /// let b = Interval::new(3, 5).unwrap(); /// let c = a.merge(&b).unwrap(); /// assert_eq!(c.start, 1); /// assert_eq!(c.end, 5); /// ``` pub fn merge(&self, other: &Self) -> Result<Self, IntervalError> { if self.overlaps(other) { Ok(Self { start: self.start, end: other.end, }) } else { Err(IntervalError::NonOverlappingInterval) } } } }
Finally, the code compiles! Now, you might be wondering why we didn't have to
declare the enum
variants as public. If you are, good catch! The reason is
that there are two exceptions to Rust's everything is private1 behavior:
- Associated items in a
pub
Trait are public by default. enum
variants in apub enum
are public by default.
Summary
This section focused heavily on emphasizing Rust's everything is private default behavior when it comes to modules. As you develop software, use privacy to your advantage and carefully decide what parts of the API should be made public.
Rust goes into great detail with regard to project management. As you create your own modules and crates, reviewing the section on Managing Growing Projects with Packages, Crates, and Modules will be extremely valuable.
Exercises
These exercises must be completed locally.
- Move the
interval
module out ofmain.rs
and into a separate file. Explore two approaches:- First, create a file
interval.rs
in thesrc
directory. - Next, create a directory under
src
calledinterval
and moveinterval.rs
inside.
- First, create a file
- Advanced: Creating an external crate
- Create a library crate
cargo new --lib interval
and move the interval code into it. - If you haven't already, create a binary crate
cargo new grep
and update theCargo.toml
file to use the external crate from the previous step. Refer to the section on Specifying Dependencies in The Cargo Book for guidance.
- Create a library crate
Next
Now, let's dive into generic types and make our Interval
generic!
Refer to the Visibility and Privacy reference for details.
Generic Types
Many programming languages include tools for effectively handling the
duplication of concepts, and Rust is no exception. Rust offers powerful support
for generics, a topic that could easily warrant its own book. However, since
we're on the Fast Track to Rust, we'll focus on the core concepts of generics
and traits by making the interval
module from the previous section generic.
We've already been working with generics throughout this course, using types such as
Vec<T>
,Option<T>
, andResult<T, E>
.
Similar to our approach in the previous section, we'll examine the helpful
compiler errors as we work towards making the interval
module generic. The
rationale behind exposing you to these compiler errors is to illustrate how
beneficial the compiler can be when working with the language.
Let's get started!
Interval
The current version of our Interval
is limited to the usize
type. If we want
to support floating point values, negative integers, or other exotic numeric
types, it won't work. We're going to address this by making it compatible with
any numeric type, with some restrictions.1
Generic Data Type
The current definition of our Interval
struct
specifies the field types as
usize
. To redefine the struct
to use a generic type parameter for its
fields, we use the <>
syntax.
pub struct Interval<T> {
pub start: T,
pub end: T,
}
By using a single generic type to define
Interval<T>
, we indicate that theInterval<T>
struct
is generic over some typeT
, and the fieldsstart
andend
are both of that same type, whatever it may be. If we attempt to create an instance ofInterval<T>
with fields of different types, our code won't compile. To support different types for the fields, we would need to specify additional generic types, such asstruct Interval<T, U>
.
You've probably guessed it already, but this code won't compile. Take a moment to think about why that might be, and then run the code to see what the compiler has to say.
#![allow(unused_imports)] extern crate regex; // this is needed for the playground use interval::{Interval, IntervalError}; use itertools::Itertools; use regex::Regex; use std::fs::File; use std::io::Read; use std::io::{BufRead, BufReader}; use std::process::exit; fn find_matching_lines(lines: &[String], regex: Regex) -> Vec<usize> { lines .iter() .enumerate() .filter_map(|(i, line)| match regex.is_match(line) { true => Some(i), false => None, }) .collect() // turns anything iterable into a collection } fn create_intervals( lines: Vec<usize>, before_context: usize, after_context: usize, ) -> Result<Vec<Interval>, IntervalError> { lines .iter() .map(|line| { let start = line.saturating_sub(before_context); let end = line.saturating_add(after_context); Interval::new(start, end) }) .collect() } fn merge_intervals(intervals: Vec<Interval>) -> Vec<Interval> { // merge overlapping intervals intervals .into_iter() .coalesce(|p, c| p.merge(&c).map_err(|_| (p, c))) .collect() } fn print_results(intervals: Vec<Interval>, lines: Vec<String>) { for interval in intervals { for (line_no, line) in lines .iter() .enumerate() .take(interval.end + 1) .skip(interval.start) { println!("{}: {}", line_no + 1, line) } } } fn read_file(file: impl Read) -> Vec<String> { BufReader::new(file).lines().map_while(Result::ok).collect() } fn main() { let poem = "I have a little shadow that goes in and out with me, And what can be the use of him is more than I can see. He is very, very like me from the heels up to the head; And I see him jump before me, when I jump into my bed. The funniest thing about him is the way he likes to grow - Not at all like proper children, which is always very slow; For he sometimes shoots up taller like an india-rubber ball, And he sometimes gets so little that there’s none of him at all."; let mock_file = std::io::Cursor::new(poem); // command line arguments let pattern = "(all)|(little)"; let before_context = 1; let after_context = 1; // attempt to open the file let lines = read_file(mock_file); //let lines = match File::open(filename) { // // convert the poem into lines // Ok(file) => read_file(file), // Err(e) => { // eprintln!("Error opening {filename}: {e}"); // exit(1); // } //}; // compile the regular expression let regex = match Regex::new(pattern) { Ok(re) => re, // bind re to regex Err(e) => { eprintln!("{e}"); // write to standard error exit(1); } }; // store the 0-based line number for any matched line let match_lines = find_matching_lines(&lines, regex); // create intervals of the form [a,b] with the before/after context let intervals = match create_intervals(match_lines, before_context, after_context) { Ok(intervals) => intervals, Err(_) => { eprintln!("An error occurred while creating intervals"); exit(1); } }; // merge overlapping intervals let intervals = merge_intervals(intervals); // print the lines print_results(intervals, lines); } pub mod interval { /// A list specifying general categories of Interval errors. pub enum IntervalError { /// Start is not less than or equal to end StartEndRangeInvalid, /// Two intervals to be merged do not overlap NonOverlappingInterval, } /// A closed-interval [`start`, `end`] type used for representing a range of /// values between `start` and `end` inclusively. /// /// # Examples /// /// You can create an `Interval` using `new`. /// /// ```rust /// let interval = Interval::new(1, 10).unwrap(); /// assert_eq!(interval.start, 1); /// assert_eq!(interval.end, 10); /// ``` pub struct Interval<T> { pub start: T, pub end: T, } impl Interval { /// Creates a new `Interval` set to `start` and `end`. /// /// # Examples /// /// ```rust /// let interval = Interval::new(1, 10).unwrap(); /// assert_eq!(interval.start, 1); /// assert_eq!(interval.end, 10); /// ``` pub fn new(start: usize, end: usize) -> Result<Self, IntervalError> { if start <= end { Ok(Self { start, end }) } else { Err(IntervalError::StartEndRangeInvalid) } } /// Checks if two intervals overlap. Overlapping intervals have at least /// one point in common. /// /// # Examples /// /// ```rust /// let a = Interval::new(1, 3).unwrap(); /// let b = Interval::new(3, 5).unwrap(); /// assert_eq!(a.overlaps(&b), true); /// assert_eq!(b.overlaps(&a), true); /// ``` /// /// ```rust /// let a = Interval::new(1, 5).unwrap(); /// let b = Interval::new(2, 4).unwrap(); /// assert_eq!(a.overlaps(&b), true); /// assert_eq!(b.overlaps(&a), true); /// ``` /// /// ```rust /// let a = Interval::new(1, 3).unwrap(); /// let b = Interval::new(4, 6).unwrap(); /// assert_eq!(a.overlaps(&b), false); /// assert_eq!(b.overlaps(&a), true); /// ``` pub fn overlaps(&self, other: &Interval) -> bool { self.end >= other.start } /// Merges two intervals returning a new `Interval`. /// /// The merged `Interval` range includes the union of ranges from each /// `Interval`. /// /// # Examples /// /// ```rust /// let a = Interval::new(1, 3).unwrap(); /// let b = Interval::new(3, 5).unwrap(); /// let c = a.merge(&b).unwrap(); /// assert_eq!(c.start, 1); /// assert_eq!(c.end, 5); /// ``` pub fn merge(&self, other: &Self) -> Result<Self, IntervalError> { if self.overlaps(other) { Ok(Self { start: self.start, end: other.end, }) } else { Err(IntervalError::NonOverlappingInterval) } } } }
As you guessed it, there's a lot of compiler errors and they're very helpful!
- [E0107]: missing generics for
struct
Interval
Let's walk through the errors one by one. The first error informs us that the
return value on line 26 references a generic type but fails to specify one. It
shows the definition of the type on line 141 and provides a helpful tip for how
to correct it. The compiler suggests appending <T>
to Interval
and
specifying the type for T
, which in our case will be usize
.
error[E0107]: missing generics for struct `Interval`
--> src/main.rs:26:17
|
26 | ) -> Result<Vec<Interval>, IntervalError> {
| ^^^^^^^^ expected 1 generic argument
|
note: struct defined here, with 1 generic parameter: `T`
--> src/main.rs:141:16
|
141 | pub struct Interval<T> {
| ^^^^^^^^ -
help: add missing generic argument
|
26 | ) -> Result<Vec<Interval<T>>, IntervalError> {
| +++
The fix should be relatively straightforward to understand, as we've used the same approach for other generic types we've worked with. Let's apply the fix to all the functions and try again!
#![allow(unused_imports)] extern crate regex; // this is needed for the playground use interval::{Interval, IntervalError}; use itertools::Itertools; use regex::Regex; use std::fs::File; use std::io::Read; use std::io::{BufRead, BufReader}; use std::process::exit; fn find_matching_lines(lines: &[String], regex: Regex) -> Vec<usize> { lines .iter() .enumerate() .filter_map(|(i, line)| match regex.is_match(line) { true => Some(i), false => None, }) .collect() // turns anything iterable into a collection } fn create_intervals( lines: Vec<usize>, before_context: usize, after_context: usize, ) -> Result<Vec<Interval<usize>>, IntervalError> { lines .iter() .map(|line| { let start = line.saturating_sub(before_context); let end = line.saturating_add(after_context); Interval::new(start, end) }) .collect() } fn merge_intervals(intervals: Vec<Interval<usize>>) -> Vec<Interval<usize>> { // merge overlapping intervals intervals .into_iter() .coalesce(|p, c| p.merge(&c).map_err(|_| (p, c))) .collect() } fn print_results(intervals: Vec<Interval<usize>>, lines: Vec<String>) { for interval in intervals { for (line_no, line) in lines .iter() .enumerate() .take(interval.end + 1) .skip(interval.start) { println!("{}: {}", line_no + 1, line) } } } fn read_file(file: impl Read) -> Vec<String> { BufReader::new(file).lines().map_while(Result::ok).collect() } fn main() { let poem = "I have a little shadow that goes in and out with me, And what can be the use of him is more than I can see. He is very, very like me from the heels up to the head; And I see him jump before me, when I jump into my bed. The funniest thing about him is the way he likes to grow - Not at all like proper children, which is always very slow; For he sometimes shoots up taller like an india-rubber ball, And he sometimes gets so little that there’s none of him at all."; let mock_file = std::io::Cursor::new(poem); // command line arguments let pattern = "(all)|(little)"; let before_context = 1; let after_context = 1; // attempt to open the file let lines = read_file(mock_file); //let lines = match File::open(filename) { // // convert the poem into lines // Ok(file) => read_file(file), // Err(e) => { // eprintln!("Error opening {filename}: {e}"); // exit(1); // } //}; // compile the regular expression let regex = match Regex::new(pattern) { Ok(re) => re, // bind re to regex Err(e) => { eprintln!("{e}"); // write to standard error exit(1); } }; // store the 0-based line number for any matched line let match_lines = find_matching_lines(&lines, regex); // create intervals of the form [a,b] with the before/after context let intervals = match create_intervals(match_lines, before_context, after_context) { Ok(intervals) => intervals, Err(_) => { eprintln!("An error occurred while creating intervals"); exit(1); } }; // merge overlapping intervals let intervals = merge_intervals(intervals); // print the lines print_results(intervals, lines); } pub mod interval { /// A list specifying general categories of Interval errors. pub enum IntervalError { /// Start is not less than or equal to end StartEndRangeInvalid, /// Two intervals to be merged do not overlap NonOverlappingInterval, } /// A closed-interval [`start`, `end`] type used for representing a range of /// values between `start` and `end` inclusively. /// /// # Examples /// /// You can create an `Interval` using `new`. /// /// ```rust /// let interval = Interval::new(1, 10).unwrap(); /// assert_eq!(interval.start, 1); /// assert_eq!(interval.end, 10); /// ``` pub struct Interval<T> { pub start: T, pub end: T, } impl Interval { /// Creates a new `Interval` set to `start` and `end`. /// /// # Examples /// /// ```rust /// let interval = Interval::new(1, 10).unwrap(); /// assert_eq!(interval.start, 1); /// assert_eq!(interval.end, 10); /// ``` pub fn new(start: usize, end: usize) -> Result<Self, IntervalError> { if start <= end { Ok(Self { start, end }) } else { Err(IntervalError::StartEndRangeInvalid) } } /// Checks if two intervals overlap. Overlapping intervals have at least /// one point in common. /// /// # Examples /// /// ```rust /// let a = Interval::new(1, 3).unwrap(); /// let b = Interval::new(3, 5).unwrap(); /// assert_eq!(a.overlaps(&b), true); /// assert_eq!(b.overlaps(&a), true); /// ``` /// /// ```rust /// let a = Interval::new(1, 5).unwrap(); /// let b = Interval::new(2, 4).unwrap(); /// assert_eq!(a.overlaps(&b), true); /// assert_eq!(b.overlaps(&a), true); /// ``` /// /// ```rust /// let a = Interval::new(1, 3).unwrap(); /// let b = Interval::new(4, 6).unwrap(); /// assert_eq!(a.overlaps(&b), false); /// assert_eq!(b.overlaps(&a), true); /// ``` pub fn overlaps(&self, other: &Interval) -> bool { self.end >= other.start } /// Merges two intervals returning a new `Interval`. /// /// The merged `Interval` range includes the union of ranges from each /// `Interval`. /// /// # Examples /// /// ```rust /// let a = Interval::new(1, 3).unwrap(); /// let b = Interval::new(3, 5).unwrap(); /// let c = a.merge(&b).unwrap(); /// assert_eq!(c.start, 1); /// assert_eq!(c.end, 5); /// ``` pub fn merge(&self, other: &Self) -> Result<Self, IntervalError> { if self.overlaps(other) { Ok(Self { start: self.start, end: other.end, }) } else { Err(IntervalError::NonOverlappingInterval) } } } }
As expected, we still have some more compiler errors to work through and these lead us into how we implement methods for a generic type!
error[E0107]: missing generics for struct `Interval`
--> src/main.rs:146:10
|
146 | impl Interval {
| ^^^^^^^^ expected 1 generic argument
|
note: struct defined here, with 1 generic parameter: `T`
--> src/main.rs:141:16
|
141 | pub struct Interval<T> {
| ^^^^^^^^ -
help: add missing generic argument
|
146 | impl Interval<T> {
| +++
error[E0107]: missing generics for struct `Interval`
--> src/main.rs:189:40
|
189 | pub fn overlaps(&self, other: &Interval) -> bool {
| ^^^^^^^^ expected 1 generic argument
|
note: struct defined here, with 1 generic parameter: `T`
--> src/main.rs:141:16
|
141 | pub struct Interval<T> {
| ^^^^^^^^ -
help: add missing generic argument
|
189 | pub fn overlaps(&self, other: &Interval<T>) -> bool {
| +++
For more information about this error, try `rustc --explain E0107`.
Inherent Implementation
There are two main types of implementation we define: inherent implementations (standalone) and trait implementations.2 We'll focus on inherent implementation first and explore trait implementations towards the end of this section.
We need to update our impl
block to support generic type parameters for our
generic type, and the syntax is as follows:
impl<T> Type<T> {} // Type is replaced with Interval for our case
Here are the necessary changes. Review them and run the program again.
#![allow(unused_imports)] extern crate regex; // this is needed for the playground use interval::{Interval, IntervalError}; use itertools::Itertools; use regex::Regex; use std::fs::File; use std::io::Read; use std::io::{BufRead, BufReader}; use std::process::exit; fn find_matching_lines(lines: &[String], regex: Regex) -> Vec<usize> { lines .iter() .enumerate() .filter_map(|(i, line)| match regex.is_match(line) { true => Some(i), false => None, }) .collect() // turns anything iterable into a collection } fn create_intervals( lines: Vec<usize>, before_context: usize, after_context: usize, ) -> Result<Vec<Interval<usize>>, IntervalError> { lines .iter() .map(|line| { let start = line.saturating_sub(before_context); let end = line.saturating_add(after_context); Interval::new(start, end) }) .collect() } fn merge_intervals(intervals: Vec<Interval<usize>>) -> Vec<Interval<usize>> { // merge overlapping intervals intervals .into_iter() .coalesce(|p, c| p.merge(&c).map_err(|_| (p, c))) .collect() } fn print_results(intervals: Vec<Interval<usize>>, lines: Vec<String>) { for interval in intervals { for (line_no, line) in lines .iter() .enumerate() .take(interval.end + 1) .skip(interval.start) { println!("{}: {}", line_no + 1, line) } } } fn read_file(file: impl Read) -> Vec<String> { BufReader::new(file).lines().map_while(Result::ok).collect() } fn main() { let poem = "I have a little shadow that goes in and out with me, And what can be the use of him is more than I can see. He is very, very like me from the heels up to the head; And I see him jump before me, when I jump into my bed. The funniest thing about him is the way he likes to grow - Not at all like proper children, which is always very slow; For he sometimes shoots up taller like an india-rubber ball, And he sometimes gets so little that there’s none of him at all."; let mock_file = std::io::Cursor::new(poem); // command line arguments let pattern = "(all)|(little)"; let before_context = 1; let after_context = 1; // attempt to open the file let lines = read_file(mock_file); //let lines = match File::open(filename) { // // convert the poem into lines // Ok(file) => read_file(file), // Err(e) => { // eprintln!("Error opening {filename}: {e}"); // exit(1); // } //}; // compile the regular expression let regex = match Regex::new(pattern) { Ok(re) => re, // bind re to regex Err(e) => { eprintln!("{e}"); // write to standard error exit(1); } }; // store the 0-based line number for any matched line let match_lines = find_matching_lines(&lines, regex); // create intervals of the form [a,b] with the before/after context let intervals = match create_intervals(match_lines, before_context, after_context) { Ok(intervals) => intervals, Err(_) => { eprintln!("An error occurred while creating intervals"); exit(1); } }; // merge overlapping intervals let intervals = merge_intervals(intervals); // print the lines print_results(intervals, lines); } pub mod interval { /// A list specifying general categories of Interval errors. pub enum IntervalError { /// Start is not less than or equal to end StartEndRangeInvalid, /// Two intervals to be merged do not overlap NonOverlappingInterval, } /// A closed-interval [`start`, `end`] type used for representing a range of /// values between `start` and `end` inclusively. /// /// # Examples /// /// You can create an `Interval` using `new`. /// /// ```rust /// let interval = Interval::new(1, 10).unwrap(); /// assert_eq!(interval.start, 1); /// assert_eq!(interval.end, 10); /// ``` pub struct Interval<T> { pub start: T, pub end: T, } impl<T> Interval<T> { /// Creates a new `Interval` set to `start` and `end`. /// /// # Examples /// /// ```rust /// let interval = Interval::new(1, 10).unwrap(); /// assert_eq!(interval.start, 1); /// assert_eq!(interval.end, 10); /// ``` pub fn new(start: T, end: T) -> Result<Self, IntervalError> { if start <= end { Ok(Self { start, end }) } else { Err(IntervalError::StartEndRangeInvalid) } } /// Checks if two intervals overlap. Overlapping intervals have at least /// one point in common. /// /// # Examples /// /// ```rust /// let a = Interval::new(1, 3).unwrap(); /// let b = Interval::new(3, 5).unwrap(); /// assert_eq!(a.overlaps(&b), true); /// assert_eq!(b.overlaps(&a), true); /// ``` /// /// ```rust /// let a = Interval::new(1, 5).unwrap(); /// let b = Interval::new(2, 4).unwrap(); /// assert_eq!(a.overlaps(&b), true); /// assert_eq!(b.overlaps(&a), true); /// ``` /// /// ```rust /// let a = Interval::new(1, 3).unwrap(); /// let b = Interval::new(4, 6).unwrap(); /// assert_eq!(a.overlaps(&b), false); /// assert_eq!(b.overlaps(&a), true); /// ``` pub fn overlaps(&self, other: &Interval<T>) -> bool { self.end >= other.start } /// Merges two intervals returning a new `Interval`. /// /// The merged `Interval` range includes the union of ranges from each /// `Interval`. /// /// # Examples /// /// ```rust /// let a = Interval::new(1, 3).unwrap(); /// let b = Interval::new(3, 5).unwrap(); /// let c = a.merge(&b).unwrap(); /// assert_eq!(c.start, 1); /// assert_eq!(c.end, 5); /// ``` pub fn merge(&self, other: &Self) -> Result<Self, IntervalError> { if self.overlaps(other) { Ok(Self { start: self.start, end: other.end, }) } else { Err(IntervalError::NonOverlappingInterval) } } } }
Well, that didn't work! Now we have two new compiler errors:
- [E0369]: binary operation
<=
cannot be applied to typeT
- [E0507]: cannot move out of
self.start
which is behind a shared reference
That's okay, because these errors are a perfect segue into trait bounds!
Trait Bounds
The compiler generated a few very helpful error messages, which we can categorize into two types, both of which suggest the use of trait bounds:
error[E0369]: binary operation `<=` cannot be applied to type `T`
--> src/main.rs:157:22
|
157 | if start <= end {
| ----- ^^ --- T
| |
| T
|
help: consider restricting type parameter `T`
|
146 | impl<T: std::cmp::PartialOrd> Interval<T> {
| ++++++++++++++++++++++
error[E0507]: cannot move out of `self.start` which is behind a shared reference
--> src/main.rs:210:28
|
210 | start: self.start,
| ^^^^^^^^^^ move occurs because `self.start` has
type `T`, which does not implement the
`Copy` trait
|
help: if `T` implemented `Clone`, you could clone the value
--> src/main.rs:146:10
|
146 | impl<T> Interval<T> {
| ^ consider constraining this type parameter with `Clone`
...
210 | start: self.start,
| ---------- you could clone this value
With an introduction to trait bounds, everything will become clear. Let's begin
with the first error. The compiler indicates that a comparison between two
generic types is being attempted in the new
function, and not all types
support this capability. In other words, not all types implement the
PartialOrd
trait, which is necessary for using comparison operators like
<=
and >=
.
For those with an object-oriented programming background, a trait can be thought of as similar to an interface.
pub fn new(start: T, end: T) -> Result<Self, IntervalError> {
if start <= end {
Ok(Self { start, end })
} else {
Err(IntervalError::StartEndRangeInvalid)
}
}
Since our Interval
struct
is generic over T
, the types for start
and
end
can be anything, including types that aren't comparable. Therefore, the
compiler is instructing us to restrict T
to any type that implements the
PartialOrd
trait. In other words, we need to place a bound on T
.
The next error requires us to revisit the topic of move semantics in Rust.
Except for primitive types, most standard library types and those found in
third-party crates do not implement the Copy
trait. In other words, they do
not have copy semantics. Let's examine the function where the error occurs:
pub fn merge(&self, other: &Self) -> Result<Self, IntervalError> {
if self.overlaps(other) {
Ok(Self {
start: self.start,
end: other.end,
})
} else {
Err(IntervalError::NonOverlappingInterval)
}
}
The merge
function borrows the values for self
and other
. If the two
intervals overlap, a new interval is returned with the values copied into it.
Since we have been using usize
, a type that implements the Copy
trait, this
worked. To ensure it continues to work, we need to place another bound on T
,
limiting it to types that implement the Copy
trait.
The compiler also mentions the
Clone
trait, which allows for explicit duplication of an object via theclone
method. This means we could specify the trait bound asClone
instead ofCopy
and update the method to explicitly callclone
.
With an understanding of trait bounds, let's examine how to impose these
restrictions on our Interval
. Does the program work now?
#![allow(unused_imports)] extern crate regex; // this is needed for the playground use interval::{Interval, IntervalError}; use itertools::Itertools; use regex::Regex; use std::fs::File; use std::io::Read; use std::io::{BufRead, BufReader}; use std::process::exit; fn find_matching_lines(lines: &[String], regex: Regex) -> Vec<usize> { lines .iter() .enumerate() .filter_map(|(i, line)| match regex.is_match(line) { true => Some(i), false => None, }) .collect() // turns anything iterable into a collection } fn create_intervals( lines: Vec<usize>, before_context: usize, after_context: usize, ) -> Result<Vec<Interval<usize>>, IntervalError> { lines .iter() .map(|line| { let start = line.saturating_sub(before_context); let end = line.saturating_add(after_context); Interval::new(start, end) }) .collect() } fn merge_intervals(intervals: Vec<Interval<usize>>) -> Vec<Interval<usize>> { // merge overlapping intervals intervals .into_iter() .coalesce(|p, c| p.merge(&c).map_err(|_| (p, c))) .collect() } fn print_results(intervals: Vec<Interval<usize>>, lines: Vec<String>) { for interval in intervals { for (line_no, line) in lines .iter() .enumerate() .take(interval.end + 1) .skip(interval.start) { println!("{}: {}", line_no + 1, line) } } } fn read_file(file: impl Read) -> Vec<String> { BufReader::new(file).lines().map_while(Result::ok).collect() } fn main() { let poem = "I have a little shadow that goes in and out with me, And what can be the use of him is more than I can see. He is very, very like me from the heels up to the head; And I see him jump before me, when I jump into my bed. The funniest thing about him is the way he likes to grow - Not at all like proper children, which is always very slow; For he sometimes shoots up taller like an india-rubber ball, And he sometimes gets so little that there’s none of him at all."; let mock_file = std::io::Cursor::new(poem); // command line arguments let pattern = "(all)|(little)"; let before_context = 1; let after_context = 1; // attempt to open the file let lines = read_file(mock_file); //let lines = match File::open(filename) { // // convert the poem into lines // Ok(file) => read_file(file), // Err(e) => { // eprintln!("Error opening {filename}: {e}"); // exit(1); // } //}; // compile the regular expression let regex = match Regex::new(pattern) { Ok(re) => re, // bind re to regex Err(e) => { eprintln!("{e}"); // write to standard error exit(1); } }; // store the 0-based line number for any matched line let match_lines = find_matching_lines(&lines, regex); // create intervals of the form [a,b] with the before/after context let intervals = match create_intervals(match_lines, before_context, after_context) { Ok(intervals) => intervals, Err(_) => { eprintln!("An error occurred while creating intervals"); exit(1); } }; // merge overlapping intervals let intervals = merge_intervals(intervals); // print the lines print_results(intervals, lines); } pub mod interval { /// A list specifying general categories of Interval errors. pub enum IntervalError { /// Start is not less than or equal to end StartEndRangeInvalid, /// Two intervals to be merged do not overlap NonOverlappingInterval, } /// A closed-interval [`start`, `end`] type used for representing a range of /// values between `start` and `end` inclusively. /// /// # Examples /// /// You can create an `Interval` using `new`. /// /// ```rust /// let interval = Interval::new(1, 10).unwrap(); /// assert_eq!(interval.start, 1); /// assert_eq!(interval.end, 10); /// ``` pub struct Interval<T> { pub start: T, pub end: T, } impl<T: Copy + PartialOrd> Interval<T> { /// Creates a new `Interval` set to `start` and `end`. /// /// # Examples /// /// ```rust /// let interval = Interval::new(1, 10).unwrap(); /// assert_eq!(interval.start, 1); /// assert_eq!(interval.end, 10); /// ``` pub fn new(start: T, end: T) -> Result<Self, IntervalError> { if start <= end { Ok(Self { start, end }) } else { Err(IntervalError::StartEndRangeInvalid) } } /// Checks if two intervals overlap. Overlapping intervals have at least /// one point in common. /// /// # Examples /// /// ```rust /// let a = Interval::new(1, 3).unwrap(); /// let b = Interval::new(3, 5).unwrap(); /// assert_eq!(a.overlaps(&b), true); /// assert_eq!(b.overlaps(&a), true); /// ``` /// /// ```rust /// let a = Interval::new(1, 5).unwrap(); /// let b = Interval::new(2, 4).unwrap(); /// assert_eq!(a.overlaps(&b), true); /// assert_eq!(b.overlaps(&a), true); /// ``` /// /// ```rust /// let a = Interval::new(1, 3).unwrap(); /// let b = Interval::new(4, 6).unwrap(); /// assert_eq!(a.overlaps(&b), false); /// assert_eq!(b.overlaps(&a), true); /// ``` pub fn overlaps(&self, other: &Interval<T>) -> bool { self.end >= other.start } /// Merges two intervals returning a new `Interval`. /// /// The merged `Interval` range includes the union of ranges from each /// `Interval`. /// /// # Examples /// /// ```rust /// let a = Interval::new(1, 3).unwrap(); /// let b = Interval::new(3, 5).unwrap(); /// let c = a.merge(&b).unwrap(); /// assert_eq!(c.start, 1); /// assert_eq!(c.end, 5); /// ``` pub fn merge(&self, other: &Self) -> Result<Self, IntervalError> { if self.overlaps(other) { Ok(Self { start: self.start, end: other.end, }) } else { Err(IntervalError::NonOverlappingInterval) } } } }
Voila! And we now have a generic Interval
module that can support any type
implementing the Copy
and PartialOrd
traits!
Exercise
- Modify the program to use the
Clone
trait instead ofCopy
.
Solution
impl<T: Clone + PartialOrd> Interval<T> {
pub fn merge(&self, other: &Self) -> Result<Self, IntervalError> {
if self.overlaps(other) {
Ok(Self {
start: self.start.clone(),
end: other.end.clone(),
})
} else {
Err(IntervalError::NonOverlappingInterval)
}
}
}
#![allow(unused_imports)] extern crate regex; // this is needed for the playground use interval::{Interval, IntervalError}; use itertools::Itertools; use regex::Regex; use std::fs::File; use std::io::Read; use std::io::{BufRead, BufReader}; use std::process::exit; fn find_matching_lines(lines: &[String], regex: Regex) -> Vec<usize> { lines .iter() .enumerate() .filter_map(|(i, line)| match regex.is_match(line) { true => Some(i), false => None, }) .collect() // turns anything iterable into a collection } fn create_intervals( lines: Vec<usize>, before_context: usize, after_context: usize, ) -> Result<Vec<Interval<usize>>, IntervalError> { lines .iter() .map(|line| { let start = line.saturating_sub(before_context); let end = line.saturating_add(after_context); Interval::new(start, end) }) .collect() } fn merge_intervals(intervals: Vec<Interval<usize>>) -> Vec<Interval<usize>> { // merge overlapping intervals intervals .into_iter() .coalesce(|p, c| p.merge(&c).map_err(|_| (p, c))) .collect() } fn print_results(intervals: Vec<Interval<usize>>, lines: Vec<String>) { for interval in intervals { for (line_no, line) in lines .iter() .enumerate() .take(interval.end + 1) .skip(interval.start) { println!("{}: {}", line_no + 1, line) } } } fn read_file(file: impl Read) -> Vec<String> { BufReader::new(file).lines().map_while(Result::ok).collect() } fn main() { let poem = "I have a little shadow that goes in and out with me, And what can be the use of him is more than I can see. He is very, very like me from the heels up to the head; And I see him jump before me, when I jump into my bed. The funniest thing about him is the way he likes to grow - Not at all like proper children, which is always very slow; For he sometimes shoots up taller like an india-rubber ball, And he sometimes gets so little that there’s none of him at all."; let mock_file = std::io::Cursor::new(poem); // command line arguments let pattern = "(all)|(little)"; let before_context = 1; let after_context = 1; // attempt to open the file let lines = read_file(mock_file); //let lines = match File::open(filename) { // // convert the poem into lines // Ok(file) => read_file(file), // Err(e) => { // eprintln!("Error opening {filename}: {e}"); // exit(1); // } //}; // compile the regular expression let regex = match Regex::new(pattern) { Ok(re) => re, // bind re to regex Err(e) => { eprintln!("{e}"); // write to standard error exit(1); } }; // store the 0-based line number for any matched line let match_lines = find_matching_lines(&lines, regex); // create intervals of the form [a,b] with the before/after context let intervals = match create_intervals(match_lines, before_context, after_context) { Ok(intervals) => intervals, Err(_) => { eprintln!("An error occurred while creating intervals"); exit(1); } }; // merge overlapping intervals let intervals = merge_intervals(intervals); // print the lines print_results(intervals, lines); } pub mod interval { /// A list specifying general categories of Interval errors. pub enum IntervalError { /// Start is not less than or equal to end StartEndRangeInvalid, /// Two intervals to be merged do not overlap NonOverlappingInterval, } /// A closed-interval [`start`, `end`] type used for representing a range of /// values between `start` and `end` inclusively. /// /// # Examples /// /// You can create an `Interval` using `new`. /// /// ```rust /// let interval = Interval::new(1, 10).unwrap(); /// assert_eq!(interval.start, 1); /// assert_eq!(interval.end, 10); /// ``` pub struct Interval<T> { pub start: T, pub end: T, } impl<T: Copy + PartialOrd> Interval<T> { /// Creates a new `Interval` set to `start` and `end`. /// /// # Examples /// /// ```rust /// let interval = Interval::new(1, 10).unwrap(); /// assert_eq!(interval.start, 1); /// assert_eq!(interval.end, 10); /// ``` pub fn new(start: T, end: T) -> Result<Self, IntervalError> { if start <= end { Ok(Self { start, end }) } else { Err(IntervalError::StartEndRangeInvalid) } } /// Checks if two intervals overlap. Overlapping intervals have at least /// one point in common. /// /// # Examples /// /// ```rust /// let a = Interval::new(1, 3).unwrap(); /// let b = Interval::new(3, 5).unwrap(); /// assert_eq!(a.overlaps(&b), true); /// assert_eq!(b.overlaps(&a), true); /// ``` /// /// ```rust /// let a = Interval::new(1, 5).unwrap(); /// let b = Interval::new(2, 4).unwrap(); /// assert_eq!(a.overlaps(&b), true); /// assert_eq!(b.overlaps(&a), true); /// ``` /// /// ```rust /// let a = Interval::new(1, 3).unwrap(); /// let b = Interval::new(4, 6).unwrap(); /// assert_eq!(a.overlaps(&b), false); /// assert_eq!(b.overlaps(&a), true); /// ``` pub fn overlaps(&self, other: &Interval<T>) -> bool { self.end >= other.start } /// Merges two intervals returning a new `Interval`. /// /// The merged `Interval` range includes the union of ranges from each /// `Interval`. /// /// # Examples /// /// ```rust /// let a = Interval::new(1, 3).unwrap(); /// let b = Interval::new(3, 5).unwrap(); /// let c = a.merge(&b).unwrap(); /// assert_eq!(c.start, 1); /// assert_eq!(c.end, 5); /// ``` pub fn merge(&self, other: &Self) -> Result<Self, IntervalError> { if self.overlaps(other) { Ok(Self { start: self.start, end: other.end, }) } else { Err(IntervalError::NonOverlappingInterval) } } } }
Next
Let's dive deeper into traits!
The restrictions we place on the supported types are known as trait bounds.
For more information on impl Trait
syntax, see the Rust book
Traits
In the previous section, we specified trait bounds to restrict the types that
can be used with our Interval
. Let's formalize what a trait is:
A trait is officially defined as a collection of methods for an unknown type:
Self
.1
In simpler terms, a trait allows you to define shared behavior in an abstract way. It specifies a set of methods that a type must implement, similar to interfaces in other programming languages.
The Rust By Example book includes many examples of traits and explores their usage in depth. Reviewing this material is highly recommended.
The compiler can automatically provide basic implementations for certain traits using the derive (
#[derive]
) attribute. However, if more complex behavior is needed, these traits can still be manually implemented. Here is a list of derivable traits:
- Comparison traits:
Eq
,PartialEq
, Ord,PartialOrd
. Clone
, to createT
from&T
via a copy.Copy
, to give a type copy semantics instead of move semantics.Hash
, to compute a hash from&T
.Default
, to create an empty instance of a data type.Debug
, to format a value using the{:?}
formatter.
Deriving a Trait
Someone using our Interval
might find it helpful to have an easy way to
compare them. For instance, a user may want to check if two intervals are equal.
The PartialEq
trait, part of the cmp
module, specifies two methods that
any type must define to implement the PartialEq
trait.
pub trait PartialEq<Rhs = Self>
where
Rhs: ?Sized,
{
// Required method
fn eq(&self, other: &Rhs) -> bool;
// Provided method
fn ne(&self, other: &Rhs) -> bool { ... }
}
Notice the comment above the
ne
method that says "Provided method." This indicates that if we manually implement this trait, we only need to provide an implementation foreq
.
Here's our revised Interval
with support for PartialEq
, thanks to the derive
attribute and the Rust compiler. Give it a try!
The
Debug
trait has also been derived, enabling intervals to be printed in a programmer-facing context using the:?
operator. This common pattern in Rust provides more helpful output in the program below.
use interval::Interval; pub mod interval { /// A list specifying general categories of Interval errors. #[derive(Debug)] pub enum IntervalError { /// Start is not less than or equal to end StartEndRangeInvalid, /// Two intervals to be merged do not overlap NonOverlappingInterval, } /// A closed-interval [`start`, `end`] type used for representing a range of /// values between `start` and `end` inclusively. /// /// # Examples /// /// You can create an `Interval` using `new`. /// /// ```rust /// let interval = Interval::new(1, 10).unwrap(); /// assert_eq!(interval.start, 1); /// assert_eq!(interval.end, 10); /// ``` #[derive(Debug, PartialEq)] pub struct Interval<T> { pub start: T, pub end: T, } impl<T: Copy + PartialOrd> Interval<T> { /// Creates a new `Interval` set to `start` and `end`. /// /// # Examples /// /// ```rust /// let interval = Interval::new(1, 10).unwrap(); /// assert_eq!(interval.start, 1); /// assert_eq!(interval.end, 10); /// ``` pub fn new(start: T, end: T) -> Result<Self, IntervalError> { if start <= end { Ok(Self { start, end }) } else { Err(IntervalError::StartEndRangeInvalid) } } /// Checks if two intervals overlap. Overlapping intervals have at least /// one point in common. /// /// # Examples /// /// ```rust /// let a = Interval::new(1, 3).unwrap(); /// let b = Interval::new(3, 5).unwrap(); /// assert_eq!(a.overlaps(&b), true); /// assert_eq!(b.overlaps(&a), true); /// ``` /// /// ```rust /// let a = Interval::new(1, 5).unwrap(); /// let b = Interval::new(2, 4).unwrap(); /// assert_eq!(a.overlaps(&b), true); /// assert_eq!(b.overlaps(&a), true); /// ``` /// /// ```rust /// let a = Interval::new(1, 3).unwrap(); /// let b = Interval::new(4, 6).unwrap(); /// assert_eq!(a.overlaps(&b), false); /// assert_eq!(b.overlaps(&a), true); /// ``` pub fn overlaps(&self, other: &Interval<T>) -> bool { self.end >= other.start } /// Merges two intervals returning a new `Interval`. /// /// The merged `Interval` range includes the union of ranges from each /// `Interval`. /// /// # Examples /// /// ```rust /// let a = Interval::new(1, 3).unwrap(); /// let b = Interval::new(3, 5).unwrap(); /// let c = a.merge(&b).unwrap(); /// assert_eq!(c.start, 1); /// assert_eq!(c.end, 5); /// ``` pub fn merge(&self, other: &Self) -> Result<Self, IntervalError> { if self.overlaps(other) { Ok(Self { start: self.start, end: other.end, }) } else { Err(IntervalError::NonOverlappingInterval) } } } } fn main() { let a = Interval::new(1, 5).unwrap(); // unwrap returns the Ok value or panics let b = Interval::new(1, 5).unwrap(); let c = Interval::new(2, 4).unwrap(); // Comparing intervals with eq and ne. println!("{:?} == {:?} => {}", a, b, a.eq(&b)); println!("{:?} == {:?} => {}", a, c, a.eq(&c)); println!("{:?} != {:?} => {}", a, b, a.ne(&b)); println!("{:?} != {:?} => {}", a, c, a.ne(&c)); // Rust supports operator overloading too! println!("{:?} == {:?} => {}", a, b, a == b); println!("{:?} != {:?} => {}", a, c, a != c); }
The
unwrap()
method is implemented byResult
and other types likeOption
. ForResult
, it returns theOk
value or panics if the value isErr
. Its usage is generally discouraged and is only used here for brevity.
Operator Overloading
You might have noticed the comparison between intervals using
==
and!=
. This works because Rust includes support for operator overloading! You can find all the detailed information in the Operator Overloading section of the Rust by Example book and in theops
module.
Implementing a Trait
Just as comparing traits for equality can be helpful, so can generating a
user-facing representation of an Interval
. By implementing the Display
trait, we can achieve this quite efficiently. Additionally, implementing the
Display
trait automatically implements the ToString
trait, enabling the
use of the to_string()
method. Let's enhance our Interval
code to print an
interval in the form [x, y]
.
Here's the trait definition for Display
:
pub trait Display {
// Required method
fn fmt(&self, f: &mut Formatter<'_>) -> Result<(), Error>;
}
The
<'_>
suffix next toFormatter
is known as lifetime annotation syntax.2
Here's our revised Interval
with the Display
trait implemented. Give it a
try!
use interval::Interval; pub mod interval { /// A list specifying general categories of Interval errors. #[derive(Debug)] pub enum IntervalError { /// Start is not less than or equal to end StartEndRangeInvalid, /// Two intervals to be merged do not overlap NonOverlappingInterval, } /// A closed-interval [`start`, `end`] type used for representing a range of /// values between `start` and `end` inclusively. /// /// # Examples /// /// You can create an `Interval` using `new`. /// /// ```rust /// let interval = Interval::new(1, 10).unwrap(); /// assert_eq!(interval.start, 1); /// assert_eq!(interval.end, 10); /// ``` #[derive(Debug, PartialEq)] pub struct Interval<T> { pub start: T, pub end: T, } use std::fmt; impl<T: fmt::Display> fmt::Display for Interval<T> { fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> fmt::Result { write!(f, "[{}, {}]", self.start, self.end) } } impl<T: Copy + PartialOrd> Interval<T> { /// Creates a new `Interval` set to `start` and `end`. /// /// # Examples /// /// ```rust /// let interval = Interval::new(1, 10).unwrap(); /// assert_eq!(interval.start, 1); /// assert_eq!(interval.end, 10); /// ``` pub fn new(start: T, end: T) -> Result<Self, IntervalError> { if start <= end { Ok(Self { start, end }) } else { Err(IntervalError::StartEndRangeInvalid) } } /// Checks if two intervals overlap. Overlapping intervals have at least /// one point in common. /// /// # Examples /// /// ```rust /// let a = Interval::new(1, 3).unwrap(); /// let b = Interval::new(3, 5).unwrap(); /// assert_eq!(a.overlaps(&b), true); /// assert_eq!(b.overlaps(&a), true); /// ``` /// /// ```rust /// let a = Interval::new(1, 5).unwrap(); /// let b = Interval::new(2, 4).unwrap(); /// assert_eq!(a.overlaps(&b), true); /// assert_eq!(b.overlaps(&a), true); /// ``` /// /// ```rust /// let a = Interval::new(1, 3).unwrap(); /// let b = Interval::new(4, 6).unwrap(); /// assert_eq!(a.overlaps(&b), false); /// assert_eq!(b.overlaps(&a), true); /// ``` pub fn overlaps(&self, other: &Interval<T>) -> bool { self.end >= other.start } /// Merges two intervals returning a new `Interval`. /// /// The merged `Interval` range includes the union of ranges from each /// `Interval`. /// /// # Examples /// /// ```rust /// let a = Interval::new(1, 3).unwrap(); /// let b = Interval::new(3, 5).unwrap(); /// let c = a.merge(&b).unwrap(); /// assert_eq!(c.start, 1); /// assert_eq!(c.end, 5); /// ``` pub fn merge(&self, other: &Self) -> Result<Self, IntervalError> { if self.overlaps(other) { Ok(Self { start: self.start, end: other.end, }) } else { Err(IntervalError::NonOverlappingInterval) } } } } fn main() { let a = Interval::new(1, 5).unwrap(); // unwrap returns the Ok value or panics let b = Interval::new(1, 5).unwrap(); let b_str = b.to_string(); let c = Interval::new(2, 4).unwrap(); println!( "Comparisons between three intervals: {}, {}, {}...", a, b_str, c ); // Comparing intervals with eq and ne. println!("{} == {} => {}", a, b, a.eq(&b)); println!("{} == {} => {}", a, c, a.eq(&c)); println!("{} != {} => {}", a, b, a.ne(&b)); println!("{} != {} => {}", a, c, a.ne(&c)); // Rust supports operator overloading too! println!("{} == {} => {}", a, b, a == b); println!("{} != {} => {}", a, c, a != c); }
Summary
- A trait defines shared behavior abstractly, specifying methods a type must implement, similar to interfaces in other languages.
- The compiler can automatically provide basic implementations for certain
traits using the
#[derive]
attribute. - Rust includes support for operator overloading.
Exercise
- Implement the
PartialOrd
trait forInterval
. To keep the code simple, you can returnNone
for overlapping intervals.
use interval::Interval; pub mod interval { /// A list specifying general categories of Interval errors. #[derive(Debug)] pub enum IntervalError { /// Start is not less than or equal to end StartEndRangeInvalid, /// Two intervals to be merged do not overlap NonOverlappingInterval, } /// A closed-interval [`start`, `end`] type used for representing a range of /// values between `start` and `end` inclusively. /// /// # Examples /// /// You can create an `Interval` using `new`. /// /// ```rust /// let interval = Interval::new(1, 10).unwrap(); /// assert_eq!(interval.start, 1); /// assert_eq!(interval.end, 10); /// ``` #[derive(Debug, PartialEq)] pub struct Interval<T> { pub start: T, pub end: T, } use std::fmt; impl<T: fmt::Display> fmt::Display for Interval<T> { fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> fmt::Result { write!(f, "[{}, {}]", self.start, self.end) } } impl<T: Copy + PartialOrd> Interval<T> { /// Creates a new `Interval` set to `start` and `end`. /// /// # Examples /// /// ```rust /// let interval = Interval::new(1, 10).unwrap(); /// assert_eq!(interval.start, 1); /// assert_eq!(interval.end, 10); /// ``` pub fn new(start: T, end: T) -> Result<Self, IntervalError> { if start <= end { Ok(Self { start, end }) } else { Err(IntervalError::StartEndRangeInvalid) } } /// Checks if two intervals overlap. Overlapping intervals have at least /// one point in common. /// /// # Examples /// /// ```rust /// let a = Interval::new(1, 3).unwrap(); /// let b = Interval::new(3, 5).unwrap(); /// assert_eq!(a.overlaps(&b), true); /// assert_eq!(b.overlaps(&a), true); /// ``` /// /// ```rust /// let a = Interval::new(1, 5).unwrap(); /// let b = Interval::new(2, 4).unwrap(); /// assert_eq!(a.overlaps(&b), true); /// assert_eq!(b.overlaps(&a), true); /// ``` /// /// ```rust /// let a = Interval::new(1, 3).unwrap(); /// let b = Interval::new(4, 6).unwrap(); /// assert_eq!(a.overlaps(&b), false); /// assert_eq!(b.overlaps(&a), true); /// ``` pub fn overlaps(&self, other: &Interval<T>) -> bool { self.end >= other.start } /// Merges two intervals returning a new `Interval`. /// /// The merged `Interval` range includes the union of ranges from each /// `Interval`. /// /// # Examples /// /// ```rust /// let a = Interval::new(1, 3).unwrap(); /// let b = Interval::new(3, 5).unwrap(); /// let c = a.merge(&b).unwrap(); /// assert_eq!(c.start, 1); /// assert_eq!(c.end, 5); /// ``` pub fn merge(&self, other: &Self) -> Result<Self, IntervalError> { if self.overlaps(other) { Ok(Self { start: self.start, end: other.end, }) } else { Err(IntervalError::NonOverlappingInterval) } } } } fn main() { use std::cmp::Ordering::{Equal, Greater, Less}; let a = Interval::new(0, 1).unwrap(); let b = Interval::new(0, 1).unwrap(); assert_eq!(a.partial_cmp(&b), Some(Equal)); assert_eq!(b.partial_cmp(&a), Some(Equal)); let a = Interval::new(0, 1).unwrap(); let b = Interval::new(2, 3).unwrap(); assert_eq!(a.partial_cmp(&b), Some(Less)); assert_eq!(b.partial_cmp(&a), Some(Greater)); println!("Nice Work!"); }
Solution
use interval::Interval; pub mod interval { /// A list specifying general categories of Interval errors. #[derive(Debug)] pub enum IntervalError { /// Start is not less than or equal to end StartEndRangeInvalid, /// Two intervals to be merged do not overlap NonOverlappingInterval, } /// A closed-interval [`start`, `end`] type used for representing a range of /// values between `start` and `end` inclusively. /// /// # Examples /// /// You can create an `Interval` using `new`. /// /// ```rust /// let interval = Interval::new(1, 10).unwrap(); /// assert_eq!(interval.start, 1); /// assert_eq!(interval.end, 10); /// ``` #[derive(Debug, PartialEq)] pub struct Interval<T> { pub start: T, pub end: T, } use std::fmt; impl<T: fmt::Display> fmt::Display for Interval<T> { fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> fmt::Result { write!(f, "[{}, {}]", self.start, self.end) } } use std::cmp::Ordering; impl<T: PartialEq + PartialOrd> PartialOrd for Interval<T> { fn partial_cmp(&self, other: &Self) -> Option<Ordering> { if self == other { Some(Ordering::Equal) } else if self.end < other.start { Some(Ordering::Less) } else if self.start > other.end { Some(Ordering::Greater) } else { None // Intervals overlap } } } impl<T: Copy + PartialOrd> Interval<T> { /// Creates a new `Interval` set to `start` and `end`. /// /// # Examples /// /// ```rust /// let interval = Interval::new(1, 10).unwrap(); /// assert_eq!(interval.start, 1); /// assert_eq!(interval.end, 10); /// ``` pub fn new(start: T, end: T) -> Result<Self, IntervalError> { if start <= end { Ok(Self { start, end }) } else { Err(IntervalError::StartEndRangeInvalid) } } /// Checks if two intervals overlap. Overlapping intervals have at least /// one point in common. /// /// # Examples /// /// ```rust /// let a = Interval::new(1, 3).unwrap(); /// let b = Interval::new(3, 5).unwrap(); /// assert_eq!(a.overlaps(&b), true); /// assert_eq!(b.overlaps(&a), true); /// ``` /// /// ```rust /// let a = Interval::new(1, 5).unwrap(); /// let b = Interval::new(2, 4).unwrap(); /// assert_eq!(a.overlaps(&b), true); /// assert_eq!(b.overlaps(&a), true); /// ``` /// /// ```rust /// let a = Interval::new(1, 3).unwrap(); /// let b = Interval::new(4, 6).unwrap(); /// assert_eq!(a.overlaps(&b), false); /// assert_eq!(b.overlaps(&a), true); /// ``` pub fn overlaps(&self, other: &Interval<T>) -> bool { self.end >= other.start } /// Merges two intervals returning a new `Interval`. /// /// The merged `Interval` range includes the union of ranges from each /// `Interval`. /// /// # Examples /// /// ```rust /// let a = Interval::new(1, 3).unwrap(); /// let b = Interval::new(3, 5).unwrap(); /// let c = a.merge(&b).unwrap(); /// assert_eq!(c.start, 1); /// assert_eq!(c.end, 5); /// ``` pub fn merge(&self, other: &Self) -> Result<Self, IntervalError> { if self.overlaps(other) { Ok(Self { start: self.start, end: other.end, }) } else { Err(IntervalError::NonOverlappingInterval) } } } } fn main() { use std::cmp::Ordering::{Equal, Greater, Less}; let a = Interval::new(0, 1).unwrap(); let b = Interval::new(0, 1).unwrap(); assert_eq!(a.partial_cmp(&b), Some(Equal)); assert_eq!(b.partial_cmp(&a), Some(Equal)); let a = Interval::new(0, 1).unwrap(); let b = Interval::new(2, 3).unwrap(); assert_eq!(a.partial_cmp(&b), Some(Less)); assert_eq!(b.partial_cmp(&a), Some(Greater)); println!("Nice Work!"); }
Next
Let's combine all the code and take a brief moment to explain the attributes.
https://doc.rust-lang.org/rust-by-example/trait.html
This is a separate topic that you can learn more about in the Validating References with Lifetimes section of the Rust Programming Language and the Lifetimes section of Rust By Example.
Grep
Now that we've covered traits, let's ensure our projects are synchronized. Our
interval module has been enhanced to support several new capabilities. We can
print programmer-facing intervals using the Debug
trait and user-facing
intervals using the Display
trait. Additionally, we have limited support for
comparing intervals through the PartialEq
and PartialOrd
traits. Here's the
completed code:
#![allow(unused_imports)] extern crate regex; // this is needed for the playground use interval::{Interval, IntervalError}; use itertools::Itertools; use regex::Regex; use std::fs::File; use std::io::Read; use std::io::{BufRead, BufReader}; use std::process::exit; fn find_matching_lines(lines: &[String], regex: Regex) -> Vec<usize> { lines .iter() .enumerate() .filter_map(|(i, line)| match regex.is_match(line) { true => Some(i), false => None, }) .collect() // turns anything iterable into a collection } fn create_intervals( lines: Vec<usize>, before_context: usize, after_context: usize, ) -> Result<Vec<Interval<usize>>, IntervalError> { lines .iter() .map(|line| { let start = line.saturating_sub(before_context); let end = line.saturating_add(after_context); Interval::new(start, end) }) .collect() } fn merge_intervals(intervals: Vec<Interval<usize>>) -> Vec<Interval<usize>> { // merge overlapping intervals intervals .into_iter() .coalesce(|p, c| p.merge(&c).map_err(|_| (p, c))) .collect() } fn print_results(intervals: Vec<Interval<usize>>, lines: Vec<String>) { for interval in intervals { for (line_no, line) in lines .iter() .enumerate() .take(interval.end + 1) .skip(interval.start) { println!("{}: {}", line_no + 1, line) } } } fn read_file(file: impl Read) -> Vec<String> { BufReader::new(file).lines().map_while(Result::ok).collect() } fn main() { let poem = "I have a little shadow that goes in and out with me, And what can be the use of him is more than I can see. He is very, very like me from the heels up to the head; And I see him jump before me, when I jump into my bed. The funniest thing about him is the way he likes to grow - Not at all like proper children, which is always very slow; For he sometimes shoots up taller like an india-rubber ball, And he sometimes gets so little that there’s none of him at all."; let mock_file = std::io::Cursor::new(poem); // command line arguments let pattern = "(all)|(little)"; let before_context = 1; let after_context = 1; // attempt to open the file let lines = read_file(mock_file); //let lines = match File::open(filename) { // // convert the poem into lines // Ok(file) => read_file(file), // Err(e) => { // eprintln!("Error opening {filename}: {e}"); // exit(1); // } //}; // compile the regular expression let regex = match Regex::new(pattern) { Ok(re) => re, // bind re to regex Err(e) => { eprintln!("{e}"); // write to standard error exit(1); } }; // store the 0-based line number for any matched line let match_lines = find_matching_lines(&lines, regex); // create intervals of the form [a,b] with the before/after context let intervals = match create_intervals(match_lines, before_context, after_context) { Ok(intervals) => intervals, Err(_) => { eprintln!("An error occurred while creating intervals"); exit(1); } }; // merge overlapping intervals let intervals = merge_intervals(intervals); // print the lines print_results(intervals, lines); } pub mod interval { /// A list specifying general categories of Interval errors. #[derive(Debug)] pub enum IntervalError { /// Start is not less than or equal to end StartEndRangeInvalid, /// Two intervals to be merged do not overlap NonOverlappingInterval, } /// A closed-interval [`start`, `end`] type used for representing a range of /// values between `start` and `end` inclusively. /// /// # Examples /// /// You can create an `Interval` using `new`. /// /// ```rust /// let interval = Interval::new(1, 10).unwrap(); /// assert_eq!(interval.start, 1); /// assert_eq!(interval.end, 10); /// ``` #[derive(Debug, PartialEq)] pub struct Interval<T> { pub start: T, pub end: T, } impl<T: Copy + PartialOrd> Interval<T> { /// Creates a new `Interval` set to `start` and `end`. /// /// # Examples /// /// ```rust /// let interval = Interval::new(1, 10).unwrap(); /// assert_eq!(interval.start, 1); /// assert_eq!(interval.end, 10); /// ``` pub fn new(start: T, end: T) -> Result<Self, IntervalError> { if start <= end { Ok(Self { start, end }) } else { Err(IntervalError::StartEndRangeInvalid) } } /// Checks if two intervals overlap. Overlapping intervals have at least /// one point in common. /// /// # Examples /// /// ```rust /// let a = Interval::new(1, 3).unwrap(); /// let b = Interval::new(3, 5).unwrap(); /// assert_eq!(a.overlaps(&b), true); /// assert_eq!(b.overlaps(&a), true); /// ``` /// /// ```rust /// let a = Interval::new(1, 5).unwrap(); /// let b = Interval::new(2, 4).unwrap(); /// assert_eq!(a.overlaps(&b), true); /// assert_eq!(b.overlaps(&a), true); /// ``` /// /// ```rust /// let a = Interval::new(1, 3).unwrap(); /// let b = Interval::new(4, 6).unwrap(); /// assert_eq!(a.overlaps(&b), false); /// assert_eq!(b.overlaps(&a), true); /// ``` pub fn overlaps(&self, other: &Interval<T>) -> bool { self.end >= other.start } /// Merges two intervals returning a new `Interval`. /// /// The merged `Interval` range includes the union of ranges from each /// `Interval`. /// /// # Examples /// /// ```rust /// let a = Interval::new(1, 3).unwrap(); /// let b = Interval::new(3, 5).unwrap(); /// let c = a.merge(&b).unwrap(); /// assert_eq!(c.start, 1); /// assert_eq!(c.end, 5); /// ``` pub fn merge(&self, other: &Self) -> Result<Self, IntervalError> { if self.overlaps(other) { Ok(Self { start: self.start, end: other.end, }) } else { Err(IntervalError::NonOverlappingInterval) } } } use std::fmt; impl<T: fmt::Display> fmt::Display for Interval<T> { fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> fmt::Result { write!(f, "[{}, {}]", self.start, self.end) } } use std::cmp::Ordering; impl<T: PartialEq + PartialOrd> PartialOrd for Interval<T> { fn partial_cmp(&self, other: &Self) -> Option<Ordering> { if self == other { Some(Ordering::Equal) } else if self.end < other.start { Some(Ordering::Less) } else if self.start > other.end { Some(Ordering::Greater) } else { None // Intervals overlap } } } }
Next
Let's shift our focus to attributes!
Attributes
Throughout this course, we've been using attributes, but we haven't delved into them yet. Let's briefly discuss the attributes you've encountered and provide some useful resources for further exploration.
At a high level, an attribute is metadata applied to a crate, module, or item. This metadata serves various purposes. Attributes follow this syntax:
InnerAttribute :
# ! [ Attr ]
OuterAttribute :
# [ Attr ]
Attr :
SimplePath AttrInput?
| unsafe ( SimplePath AttrInput? )
AttrInput :
DelimTokenTree
| = Expression
Attributes come in two forms: InnerAttribute
and OuterAttribute
. The
difference between them is that an InnerAttribute
includes a bang (!
)
between the #
and the [
. An InnerAttribute
applies to the item it is
declared within, while an OuterAttribute
applies to whatever follows it.
Inner Attributes
For example, in our grep program, we've used the inner attribute
#![allow(unused_imports)]
. By default, the Rust compiler issues warnings if
your program imports packages via use
but doesn't actually utilize them. Our
program imports the File
module with use std::fs::File;
, but since we are
using Cursor
for development and have commented out the file I/O code, the
compiler complains about the unused import. This attribute disables that warning
for the crate.
Outer Attributes
Recall in the last section, we derived the Debug
and PartialEq
traits. We
achieved this using the #[derive(...)]
attribute. Notice how this attribute is
placed directly before the definition of the Interval
struct
. This is an
example of an OuterAttribute
.
#[derive(Debug, PartialEq)]
pub struct Interval<T> {
pub start: T,
pub end: T,
}
Classification of Attributes
Attributes are commonly grouped into the following categories:
Summary
The Rust Reference provides an in-depth look at attributes, and you are encouraged to explore it for a comprehensive understanding of their uses. Additionally, Rust By Example offers extra material and illustrates various use cases.
Next
Attributes offer even greater utility, which we'll explore through command line argument parsing!
Command Line Arguments
In the Cargo.toml
section, you were asked to add the clap
crate as a
dependency to support command line argument parsing for our grep program. With
our current understanding of attributes, it's an ideal time to utilize this
crate by extending our grep program to handle command line arguments.
One method for defining command line arguments with clap
involves using custom
attributes. This is why we specified the derive feature in the dependency.
If you didn't complete the exercise and are working through this course locally, you can add the dependency to the grep crate with:
$ cargo add clap --features derive
Advanced Attributes
The topic of macro attributes and derive macro helper attributes is quite advanced and beyond the scope of this course. Therefore, we won't delve into their inner workings in depth. However, keep this section in mind as you continue your Rust journey, because the time will likely come when you will need to implement similar attributes yourself for code generation.
Extending Grep
We're going to add command line argument support to our program. However, we'll
continue to mock the command line arguments to keep developing this course
online. After configuring clap
, this will be the auto-generated help (i.e.,
typing grep --help
in the terminal).
$ grep.exe --help
Usage: grep.exe [OPTIONS] <PATTERN> [FILES]...
Arguments:
<PATTERN> The regular expression to match
[FILES]... List of files to search
Options:
-l, --line-number Prefix each line of output with the 1-based line
number within its input file
-b, --before-context <num> Print num lines of trailing context before matching
lines [default: 0]
-a, --after-context <num> Print num lines of trailing context after matching
lines [default: 0]
-h, --help Print help
-V, --version Print version
The
CLAP
documentation is very extensive and includes plenty of examples to learn from. If you find yourself building a CLI application, you will most certainly want to reference the documentation.
Here's the updated version of grep with command line argument support added:
#![allow(unused_imports)] extern crate regex; // this is needed for the playground use clap::Parser; use interval::{Interval, IntervalError}; use itertools::Itertools; use regex::Regex; use std::fs::File; use std::io::Read; use std::io::{BufRead, BufReader}; use std::path::PathBuf; use std::process::exit; fn find_matching_lines(lines: &[String], regex: Regex) -> Vec<usize> { lines .iter() .enumerate() .filter_map(|(i, line)| match regex.is_match(line) { true => Some(i), false => None, }) .collect() // turns anything iterable into a collection } fn create_intervals( lines: Vec<usize>, before_context: usize, after_context: usize, ) -> Result<Vec<Interval<usize>>, IntervalError> { lines .iter() .map(|line| { let start = line.saturating_sub(before_context); let end = line.saturating_add(after_context); Interval::new(start, end) }) .collect() } fn merge_intervals(intervals: Vec<Interval<usize>>) -> Vec<Interval<usize>> { // merge overlapping intervals intervals .into_iter() .coalesce(|p, c| p.merge(&c).map_err(|_| (p, c))) .collect() } fn print_results( intervals: Vec<Interval<usize>>, lines: Vec<String>, line_number: bool, ) { for interval in intervals { for (line_no, line) in lines .iter() .enumerate() .take(interval.end + 1) .skip(interval.start) { if line_number { print!("{}: ", line_no + 1); } println!("{}", line); } } } fn read_file(file: impl Read) -> Vec<String> { BufReader::new(file).lines().map_while(Result::ok).collect() } #[derive(Parser)] #[command(version, about, long_about = None)] struct Cli { /// Prefix each line of output with the 1-based line number within its /// input file. #[arg(short, long, default_value_t = false)] line_number: bool, /// Print num lines of trailing context before matching lines. #[arg(short, long, default_value_t = 0, value_name = "num")] before_context: u8, /// Print num lines of trailing context after matching lines. #[arg(short, long, default_value_t = 0, value_name = "num")] after_context: u8, /// The regular expression to match. #[arg(required = true)] pattern: String, /// List of files to search. #[arg(required = true)] files: Vec<PathBuf>, } fn main() { let poem = "I have a little shadow that goes in and out with me, And what can be the use of him is more than I can see. He is very, very like me from the heels up to the head; And I see him jump before me, when I jump into my bed. The funniest thing about him is the way he likes to grow - Not at all like proper children, which is always very slow; For he sometimes shoots up taller like an india-rubber ball, And he sometimes gets so little that there’s none of him at all."; let mock_file = std::io::Cursor::new(poem); // let cli = Cli::parse(); // for production use // mock command line arguments let cli = match Cli::try_parse_from([ "grep", // executable name "--line-number", "--before-context", "1", "--after-context", "1", "(all)|(little)", // pattern "poem.txt", // file ]) { Ok(cli) => cli, Err(e) => { eprintln!("Error parsing command line arguments: {e:?}"); exit(1); } }; // get values from clap let pattern = cli.pattern; let line_number = cli.line_number; let before_context = cli.before_context as usize; let after_context = cli.after_context as usize; // attempt to open the file let lines = read_file(mock_file); //let lines = match File::open(filename) { // // convert the poem into lines // Ok(file) => read_file(file), // Err(e) => { // eprintln!("Error opening {filename}: {e}"); // exit(1); // } //}; // compile the regular expression let regex = match Regex::new(&pattern) { Ok(re) => re, // bind re to regex Err(e) => { eprintln!("{e}"); // write to standard error exit(1); } }; // store the 0-based line number for any matched line let match_lines = find_matching_lines(&lines, regex); // create intervals of the form [a,b] with the before/after context let intervals = match create_intervals(match_lines, before_context, after_context) { Ok(intervals) => intervals, Err(_) => { eprintln!("An error occurred while creating intervals"); exit(1); } }; // merge overlapping intervals let intervals = merge_intervals(intervals); // print the lines print_results(intervals, lines, line_number); } pub mod interval { /// A list specifying general categories of Interval errors. #[derive(Debug)] pub enum IntervalError { /// Start is not less than or equal to end StartEndRangeInvalid, /// Two intervals to be merged do not overlap NonOverlappingInterval, } /// A closed-interval [`start`, `end`] type used for representing a range of /// values between `start` and `end` inclusively. /// /// # Examples /// /// You can create an `Interval` using `new`. /// /// ```rust /// let interval = Interval::new(1, 10).unwrap(); /// assert_eq!(interval.start, 1); /// assert_eq!(interval.end, 10); /// ``` #[derive(Debug, PartialEq)] pub struct Interval<T> { pub start: T, pub end: T, } impl<T: Copy + PartialOrd> Interval<T> { /// Creates a new `Interval` set to `start` and `end`. /// /// # Examples /// /// ```rust /// let interval = Interval::new(1, 10).unwrap(); /// assert_eq!(interval.start, 1); /// assert_eq!(interval.end, 10); /// ``` pub fn new(start: T, end: T) -> Result<Self, IntervalError> { if start <= end { Ok(Self { start, end }) } else { Err(IntervalError::StartEndRangeInvalid) } } /// Checks if two intervals overlap. Overlapping intervals have at least /// one point in common. /// /// # Examples /// /// ```rust /// let a = Interval::new(1, 3).unwrap(); /// let b = Interval::new(3, 5).unwrap(); /// assert_eq!(a.overlaps(&b), true); /// assert_eq!(b.overlaps(&a), true); /// ``` /// /// ```rust /// let a = Interval::new(1, 5).unwrap(); /// let b = Interval::new(2, 4).unwrap(); /// assert_eq!(a.overlaps(&b), true); /// assert_eq!(b.overlaps(&a), true); /// ``` /// /// ```rust /// let a = Interval::new(1, 3).unwrap(); /// let b = Interval::new(4, 6).unwrap(); /// assert_eq!(a.overlaps(&b), false); /// assert_eq!(b.overlaps(&a), true); /// ``` pub fn overlaps(&self, other: &Interval<T>) -> bool { self.end >= other.start } /// Merges two intervals returning a new `Interval`. /// /// The merged `Interval` range includes the union of ranges from each /// `Interval`. /// /// # Examples /// /// ```rust /// let a = Interval::new(1, 3).unwrap(); /// let b = Interval::new(3, 5).unwrap(); /// let c = a.merge(&b).unwrap(); /// assert_eq!(c.start, 1); /// assert_eq!(c.end, 5); /// ``` pub fn merge(&self, other: &Self) -> Result<Self, IntervalError> { if self.overlaps(other) { Ok(Self { start: self.start, end: other.end, }) } else { Err(IntervalError::NonOverlappingInterval) } } } use std::fmt; impl<T: fmt::Display> fmt::Display for Interval<T> { fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> fmt::Result { write!(f, "[{}, {}]", self.start, self.end) } } use std::cmp::Ordering; impl<T: PartialEq + PartialOrd> PartialOrd for Interval<T> { fn partial_cmp(&self, other: &Self) -> Option<Ordering> { if self == other { Some(Ordering::Equal) } else if self.end < other.start { Some(Ordering::Less) } else if self.start > other.end { Some(Ordering::Greater) } else { None // Intervals overlap } } } }
Summary
We made some minor code changes along with configuring clap
. Most of it should
be straightforward by now, so we'll focus on the new aspects:
- The
PathBuf
module facilitates cross-platform path manipulation. Since our grep program requires files to search, it's better to usePathBuf
for handling that input. - We updated
print_results
to take a new boolean argument,line_number
, which specifies whether or not to print line numbers in the output. Iftrue
, theprint!
macro is used to output the line number without emitting a newline. - We used
clap
attributes to derive our command line arguments. Theclap
documentation covers these attributes in full detail. - In
main
, we mock command line arguments with an array of string slices that gets parsed byclap
.The
match
expression is for development purposes. If we made any mistakes in our mock command line arguments, we would catch the error and print it. - Since we defined the
before_context
andafter_context
variables asu8
in theCli
structure but useusize
throughout the code, an explicit cast usingas
is necessary when assigning these values to local variables.
Next
Onward to concurrency!
Concurrency
We're nearing the end of the course, and there's no better way to wrap things up than with concurrency! Rust, like many other languages, includes a native thread module for spawning threads and performing tasks in parallel. However, numerous crates build on Rust's native multithreading capabilities, simplifying the process and offering various threading patterns.
For instance, if you've programmed in Go, you're likely familiar with channels
for moving data between goroutines
. Similarly, if you've worked with
JavaScript, you probably know about async
and await
. Both of these
paradigms, along with many others, are available as crates in Rust.
Here's a few examples worth exploring:
- crossbeam: A set of tools for concurrent programming
- rayon: A data-parallelism library for Rust.
- tokio: An event-driven, non-blocking I/O platform for writing asynchronous I/O backed applications.
Let's dive into multithreading!
Non-scoped Threads
The simplest way to enable multithreading support in our Grep program is to spawn a thread for each specified file, perform the pattern matching steps we've developed, and print the results using a fork/join pattern. If you have a C++ background, you might start by exploring the functions in the thread module, discovering the spawn function, and diving right in.
However, this approach is flawed! While it might be a valuable learning experience, it will likely be time-consuming, frustrating, and confusing. Let's examine some of the pitfalls of this approach and then explore a more effective solution.
The spawn
Function
Here's the signature for the spawn
function:
pub fn spawn<F, T>(f: F) -> JoinHandle<T>
where
F: FnOnce() -> T + Send + 'static,
T: Send + 'static,
The spawn
function takes a closure (f
) as an argument, spawns a new thread
to run the closure, and returns a JoinHandle
. As the name suggests, the
JoinHandle
provides a join
method to wait for the spawned thread to
finish. So far, so good. However, the 'static
1 constraint on F
and the
return value T
is where, as the Bard would say, lies the rub.
The 'static
constraint requires that the closure and its return value must
live for the entire duration of the program's execution. This is necessary
because threads can outlive the context in which they were created.
If a thread, and by extension its return value, can outlive its caller, we need
to ensure they remain valid afterward. Since we can't predict when the thread
will finish, they must be valid until the end of the program. Hence, the
'static
lifetime.
Don't worry if this concept is still unclear; it's one of the most challenging aspects of Rust, and many developers find it difficult at first.
Let's examine a simple program to better understand these concepts. Review the following code snippet and then run the program.
fn main() { use std::thread; let error = 0373; // Compiler error E0373 let handler = thread::spawn(|| { println!("{error}"); }); handler.join().unwrap(); }
This simple program seems perfectly valid. So why does the Rust compiler
complain about code that would run without issues in most other languages? Even
though it's not the case in this example, in a more general sense, the spawned
thread created in main
could potentially outlive the thread from which it was
created. Because the closure borrows the error
value, the parent thread could
go out of scope, causing the error
value to be dropped rendering the reference
invalid. Because the compiler can't know whether or not that's going to
happen, we see the error:
error[E0373]: closure may outlive the current function, but it borrows
error
, which is owned by the current function
The compiler does offer a solution that may work under some circumstances:
help: to force the closure to take ownership of
error
(and any other referenced variables), use themove
2 keyword
Let's see what happens if we do that:
fn main() { use std::thread; let error = 0373; // Compiler error E0373 let handler = thread::spawn(move || { println!("{error}"); }); handler.join().unwrap(); }
Recalling our discussion on copy semantics, it probably makes sense why this
worked. The value was copied into the scope of the spawned thread, making it
independent from the original and solving the lifetime problem. The variable is
now guaranteed to live for as long as the thread. This works fine for primitive
types and types implementing the Copy
trait. But what if copying is an
expensive operation, or the object can't be copied? This leads to the next
challenge that developers learning Rust often face: what happens if we need to
spawn additional threads that also need access to the same variable?
For example, consider this case:
fn main() { use std::thread; let error = String::from("E0373"); // Compiler error E0373 let handler1 = thread::spawn(move || { println!("{error}"); }); let handler2 = thread::spawn(move || { println!("{error}"); }); handler1.join().unwrap(); handler2.join().unwrap(); }
While these strict rules may seem daunting, especially in larger programs, they are the foundation of Rust's fearless concurrency goals. These rules ensure that concurrency in your programs is both safe and efficient.
Now that we understand why the traditional fork/join model, which works in many other languages, is likely to fail in Rust, let's explore how to correctly implement this approach!
Lifetimes are another type of generic. However, instead of ensuring that a type has the desired behavior, they ensure that references remain valid for the desired duration.
move
converts any variables captured by reference or mutable reference to
variables captured by value.
Scoped Threads
In the previous section, we explored the spawn
method in the thread
module
and noted its limitation: it cannot borrow non-'static
data because the
compiler cannot guarantee that all threads will be joined before the lifetimes
of any borrowed values expire. To address this issue, the thread
module also
offers the scope
function, which works in conjunction with Scope
. This
combination allows for borrowing non-'static
data by ensuring that all spawned
threads within the scope are automatically joined before the function in which
they were created returns, unless they were manually joined earlier.
The scope
Function
Here's the signature for the scope
function:
pub fn scope<'env, F, T>(f: F) -> T
where
F: for<'scope> FnOnce(&'scope Scope<'scope, 'env>) -> T,
The scope
function takes a closure (f
) as an argument, which receives a
Scope
object (created by scope
). This Scope
object allows threads to be
spawned using the spawn
method. The spawn method returns a
ScopedJoinHandle
, which, as the name suggests, provides a join
method to
wait for the spawned thread to complete.
Scoped threads involve two lifetimes: 'scope
and 'env
:
'env
: This is a lifetime parameter that represents the lifetime of the environment data that theScope
can borrow (meaning the data from outside the scope). It ensures that any references to the environment data outlive the scope.'scope
: This is another lifetime parameter that represents the lifetime of the scope itself. This is the period during which new scoped threads can be spawned and running. It ensures that any running threads are joined before the lifetime ends.
In plain English, the scope
function takes a closure f
that can borrow data
with a specific lifetime 'env
and ensures that all threads spawned within the
scope are joined before the function returns.
Let's revisit one of the examples from the previous section that failed to compile to better understand these concepts. Review the following code snippet and then run the program.
fn main() { use std::thread; let error = String::from("E0373"); // Compiler error E0373 thread::scope(|s| { let handler = s.spawn(|| { println!("{error}"); }); s.spawn(|| { println!("{error}"); }); handler.join().unwrap(); // Manually join the first thread // Second thread is automatically joined when the closure returns. }); }
Through these ownership and type system tools, it is guaranteed that all threads
created within scope
are joined before their lifetimes end. This allows the
Rust compiler to be certain that all borrowed data (in this case, the error
String
) remains valid for the lifetime of the threads. This is how Rust turns
many runtime errors into compile-time errors. Concepts like this facilitate
fearless concurrency in Rust!
Multithreaded Grep
With our understanding of scoped vs non-scoped threads, we are now prepared to correctly update Grep to process each file specified on the command line in a separate thread.
Here's the updated version of the program with the changes visible.
#![allow(unused_imports)] extern crate regex; // this is needed for the playground use clap::Parser; use interval::{Interval, IntervalError}; use itertools::Itertools; use regex::Regex; use std::collections::HashMap; use std::fs::File; use std::io::Read; use std::io::{BufRead, BufReader}; use std::path::PathBuf; use std::process::exit; use std::thread; fn find_matching_lines(lines: &[String], regex: &Regex) -> Vec<usize> { lines .iter() .enumerate() .filter_map(|(i, line)| match regex.is_match(line) { true => Some(i), false => None, }) .collect() // turns anything iterable into a collection } fn create_intervals( lines: Vec<usize>, before_context: usize, after_context: usize, ) -> Result<Vec<Interval<usize>>, IntervalError> { lines .iter() .map(|line| { let start = line.saturating_sub(before_context); let end = line.saturating_add(after_context); Interval::new(start, end) }) .collect() } fn merge_intervals(intervals: Vec<Interval<usize>>) -> Vec<Interval<usize>> { // merge overlapping intervals intervals .into_iter() .coalesce(|p, c| p.merge(&c).map_err(|_| (p, c))) .collect() } fn print_results( intervals: Vec<Interval<usize>>, lines: Vec<String>, line_number: bool, ) { for interval in intervals { for (line_no, line) in lines .iter() .enumerate() .take(interval.end + 1) .skip(interval.start) { if line_number { print!("{}: ", line_no + 1); } println!("{}", line); } } } fn read_file(file: impl Read) -> Vec<String> { BufReader::new(file).lines().map_while(Result::ok).collect() } #[derive(Parser)] #[command(version, about, long_about = None)] struct Cli { /// Prefix each line of output with the 1-based line number within its /// input file. #[arg(short, long, default_value_t = false)] line_number: bool, /// Print num lines of trailing context before matching lines. #[arg(short, long, default_value_t = 0, value_name = "num")] before_context: u8, /// Print num lines of trailing context after matching lines. #[arg(short, long, default_value_t = 0, value_name = "num")] after_context: u8, /// The regular expression to match. #[arg(required = true)] pattern: String, /// List of files to search. #[arg(required = true)] files: Vec<PathBuf>, } // Result from a thread struct GrepSuccess { intervals: Vec<Interval<usize>>, lines: Vec<String>, } // Result from a failed thread struct GrepFailure { error: String, } fn main() { // let cli = Cli::parse(); // for production use // mock command line arguments let cli = match Cli::try_parse_from([ "grep", // executable name "--line-number", "--before-context", "1", "--after-context", "1", "(all)|(will)", // pattern // file(s)... "poem.txt", "bad_file.txt", // intention failure "scoped_threads.txt", ]) { Ok(cli) => cli, Err(e) => { eprintln!("Error parsing command line arguments: {e:?}"); exit(1); } }; // map of filename/file contents to simulate opening a file let mock_disk = HashMap::from([ ( "poem.txt", "I have a little shadow that goes in and out with me, And what can be the use of him is more than I can see. He is very, very like me from the heels up to the head; And I see him jump before me, when I jump into my bed. The funniest thing about him is the way he likes to grow - Not at all like proper children, which is always very slow; For he sometimes shoots up taller like an india-rubber ball, And he sometimes gets so little that there’s none of him at all.", ), ( "scoped_threads.txt", "When we work with scoped threads, the compiler can clearly see, if the variables we want to use will be avilable to me. Because of this visiblity, I'm runtime error free! And issues in my code will be exposed by rustc. If this sort of safety is provided at native speeds, there's simply no compelling case to stick with cpp!", ), ]); // get values from clap let pattern = cli.pattern; let line_number = cli.line_number; let before_context = cli.before_context as usize; let after_context = cli.after_context as usize; let files = cli.files; // compile the regular expression let regex = match Regex::new(&pattern) { Ok(re) => re, // bind re to regex Err(e) => { eprintln!("{e}"); // write to standard error exit(1); } }; thread::scope(|s| { let handles: Vec<_> = files .iter() .map(|file| { let filename = match file.to_str() { Some(filename) => filename, None => { return Err(GrepFailure { error: format!( "Invalid filename: {}", file.display() ), }) } }; // attempt to open the file //let lines = match File::open(filename) { // // convert the poem into lines // Ok(file) => read_file(file), // Err(e) => { // eprintln!("Error opening {filename}: {e}"); // exit(1); // } //}; if !mock_disk.contains_key(filename) { return Err(GrepFailure { error: format!("File not found: {}", filename), }); } Ok(filename) }) .map_ok(|filename| { // only spawn a thread for accessible file s.spawn(|| { let contents = mock_disk.get(filename).unwrap(); let mock_file = std::io::Cursor::new(contents); let lines = read_file(mock_file); // store the 0-based line number for any matched line let match_lines = find_matching_lines(&lines, ®ex); // create intervals of the form [a,b] with the before/after context let intervals = match create_intervals( match_lines, before_context, after_context, ) { Ok(intervals) => intervals, Err(_) => return Err(GrepFailure { error: String::from( "An error occurred while creating intervals", ), }), }; // merge overlapping intervals let intervals = merge_intervals(intervals); Ok(GrepSuccess { intervals, lines }) }) }) .collect(); // process all the results for handle in handles { let result = match handle { Ok(scoped_join_handle) => scoped_join_handle, Err(e) => { eprintln!("{}", e.error); continue; } }; if let Ok(result) = result.join() { match result { Ok(result) => print_results( result.intervals, result.lines, line_number, ), Err(e) => eprintln!("{}", e.error), }; }; } }); } pub mod interval { /// A list specifying general categories of Interval errors. #[derive(Debug)] pub enum IntervalError { /// Start is not less than or equal to end StartEndRangeInvalid, /// Two intervals to be merged do not overlap NonOverlappingInterval, } /// A closed-interval [`start`, `end`] type used for representing a range of /// values between `start` and `end` inclusively. /// /// # Examples /// /// You can create an `Interval` using `new`. /// /// ```rust /// let interval = Interval::new(1, 10).unwrap(); /// assert_eq!(interval.start, 1); /// assert_eq!(interval.end, 10); /// ``` #[derive(Debug, PartialEq)] pub struct Interval<T> { pub start: T, pub end: T, } impl<T: Copy + PartialOrd> Interval<T> { /// Creates a new `Interval` set to `start` and `end`. /// /// # Examples /// /// ```rust /// let interval = Interval::new(1, 10).unwrap(); /// assert_eq!(interval.start, 1); /// assert_eq!(interval.end, 10); /// ``` pub fn new(start: T, end: T) -> Result<Self, IntervalError> { if start <= end { Ok(Self { start, end }) } else { Err(IntervalError::StartEndRangeInvalid) } } /// Checks if two intervals overlap. Overlapping intervals have at least /// one point in common. /// /// # Examples /// /// ```rust /// let a = Interval::new(1, 3).unwrap(); /// let b = Interval::new(3, 5).unwrap(); /// assert_eq!(a.overlaps(&b), true); /// assert_eq!(b.overlaps(&a), true); /// ``` /// /// ```rust /// let a = Interval::new(1, 5).unwrap(); /// let b = Interval::new(2, 4).unwrap(); /// assert_eq!(a.overlaps(&b), true); /// assert_eq!(b.overlaps(&a), true); /// ``` /// /// ```rust /// let a = Interval::new(1, 3).unwrap(); /// let b = Interval::new(4, 6).unwrap(); /// assert_eq!(a.overlaps(&b), false); /// assert_eq!(b.overlaps(&a), true); /// ``` pub fn overlaps(&self, other: &Interval<T>) -> bool { self.end >= other.start } /// Merges two intervals returning a new `Interval`. /// /// The merged `Interval` range includes the union of ranges from each /// `Interval`. /// /// # Examples /// /// ```rust /// let a = Interval::new(1, 3).unwrap(); /// let b = Interval::new(3, 5).unwrap(); /// let c = a.merge(&b).unwrap(); /// assert_eq!(c.start, 1); /// assert_eq!(c.end, 5); /// ``` pub fn merge(&self, other: &Self) -> Result<Self, IntervalError> { if self.overlaps(other) { Ok(Self { start: self.start, end: other.end, }) } else { Err(IntervalError::NonOverlappingInterval) } } } use std::fmt; impl<T: fmt::Display> fmt::Display for Interval<T> { fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> fmt::Result { write!(f, "[{}, {}]", self.start, self.end) } } use std::cmp::Ordering; impl<T: PartialEq + PartialOrd> PartialOrd for Interval<T> { fn partial_cmp(&self, other: &Self) -> Option<Ordering> { if self == other { Some(Ordering::Equal) } else if self.end < other.start { Some(Ordering::Less) } else if self.start > other.end { Some(Ordering::Greater) } else { None // Intervals overlap } } } }
Missing Error Output
File not found: bad_file.txt
The error message for the bad file doesn't appear in the playground output because standard error output isn't captured.
Our grep program is now multithreaded and processes all the input files in parallel! Let's walk through the code changes.
mock_disk
The mock_disk
HashMap
simulates disk access to check if a file exists and
is accessible. The filename serves as the key, while the file's contents are the
value. This approach aids in testing and development, but its use here is solely
for Rust playground compatibility. To ensure the creation of multiple threads, I
added a new poem written by yours truly
.
thread::scope
- Iterate over all files specified on the command line using the
map
iterator adapter, which returns aResult
. TheOk
variant holds the filename if valid, while theError
holds the error for an invalid filename. - The
map_ok
iterator adapter processes eachResult
, calling the provided closure on anyOk
values, allowing us to ignore any invalid filenames. The providedScope
(s
) spawns one thread per file for processing. The closure returns aResult
:Err
with an error message in aGrepFailure
struct if processing fails, orOk
with aGrepSuccess
struct containing intervals and lines from the input file if successful. - Use
collect
to create a vector (Vec
) of results from each file iteration, binding it tohandles
. - Finally, iterate over the elements in the
handles
vector using a for loop. Print any errors to standard error, and pass successful pattern matching results to theprint_results
function for output to standard output.
find_matching_lines
Since each thread needs access to the regex
object, the value is borrowed
instead of moved.
Summary
- Ownership and type systems are powerful tools for managing memory safety and concurrency issues. By leveraging ownership and type checking, many concurrency errors in Rust are caught at compile time rather than at runtime.
- Unlike non-scoped threads, scoped threads can borrow non-
'static
data because the scope ensures all threads are joined before it ends.
Conclusion
Congratulations on reaching the end of the course! I truly hope you found this material helpful as you continue your journey as a Rust developer. Hopefully, you've gained a solid understanding of some foundational concepts in Rust that many beginners find challenging. What we've covered here is just the beginning. There are many more topics to explore! Stay curious and enjoy the journey. Rust is the future!
Here are some resources used throughout this course, listed alphabetically:
- The Cargo Book
- Compiler Error Code Index
- Learn Rust
- Office Rust Site
- Rust By Example
- Rust Playground
- The Rust Programming Language
- The Rust Standard Library
- The Rustc Book
- The Rustdoc Book
A Note About Copilot
The content in this book is the author's own words. However, some of the content was improved with the assistance of Copilot to enhance clarity and maintain a consistent writing style throughout the book.
If you find that anything appears to be copied from another source without proper accreditation, please start a discussion on GitHub.
Final Grep
As we reach the end of this journey, I am pleased to present the final version
of the Grep program, ready for production use. To compile a release version, be
sure to use cargo build --release
. I hope you have enjoyed this journey as
much as I have enjoyed guiding you through it. Thank you very much for
completing the course. Have fun on your Rust journey!
use clap::Parser;
use interval::{Interval, IntervalError};
use itertools::Itertools;
use regex::Regex;
use std::fs::File;
use std::io::Read;
use std::io::{BufRead, BufReader};
use std::path::PathBuf;
use std::process::exit;
use std::thread;
fn find_matching_lines(lines: &[String], regex: &Regex) -> Vec<usize> {
lines
.iter()
.enumerate()
.filter_map(|(i, line)| match regex.is_match(line) {
true => Some(i),
false => None,
})
.collect() // turns anything iterable into a collection
}
fn create_intervals(
lines: Vec<usize>,
before_context: usize,
after_context: usize,
) -> Result<Vec<Interval<usize>>, IntervalError> {
lines
.iter()
.map(|line| {
let start = line.saturating_sub(before_context);
let end = line.saturating_add(after_context);
Interval::new(start, end)
})
.collect()
}
fn merge_intervals(intervals: Vec<Interval<usize>>) -> Vec<Interval<usize>> {
// merge overlapping intervals
intervals
.into_iter()
.coalesce(|p, c| p.merge(&c).map_err(|_| (p, c)))
.collect()
}
fn print_results(
intervals: Vec<Interval<usize>>,
lines: Vec<String>,
line_number: bool,
) {
for interval in intervals {
for (line_no, line) in lines
.iter()
.enumerate()
.take(interval.end + 1)
.skip(interval.start)
{
if line_number {
print!("{}: ", line_no + 1);
}
println!("{}", line);
}
}
}
fn read_file(file: impl Read) -> Vec<String> {
BufReader::new(file).lines().map_while(Result::ok).collect()
}
#[derive(Parser)]
#[command(version, about, long_about = None)]
struct Cli {
/// Prefix each line of output with the 1-based line number within its
/// input file.
#[arg(short, long, default_value_t = false)]
line_number: bool,
/// Print num lines of trailing context before matching lines.
#[arg(short, long, default_value_t = 0, value_name = "num")]
before_context: u8,
/// Print num lines of trailing context after matching lines.
#[arg(short, long, default_value_t = 0, value_name = "num")]
after_context: u8,
/// The regular expression to match.
#[arg(required = true)]
pattern: String,
/// List of files to search.
#[arg(required = true)]
files: Vec<PathBuf>,
}
// Result from a thread
struct GrepSuccess {
intervals: Vec<Interval<usize>>,
lines: Vec<String>,
}
// Result from a failed thread
struct GrepFailure {
error: String,
}
fn main() {
let cli = Cli::parse();
// get values from clap
let pattern = cli.pattern;
let line_number = cli.line_number;
let before_context = cli.before_context as usize;
let after_context = cli.after_context as usize;
let files = cli.files;
// compile the regular expression
let regex = match Regex::new(&pattern) {
Ok(re) => re, // bind re to regex
Err(e) => {
eprintln!("{e}"); // write to standard error
exit(1);
}
};
thread::scope(|s| {
let handles: Vec<_> = files
.iter()
.map(|file| {
let filename = match file.to_str() {
Some(filename) => filename,
None => {
return Err(GrepFailure {
error: format!(
"Invalid filename: {}",
file.display()
),
})
}
};
// attempt to open the file
File::open(filename).map_err(|e| GrepFailure {
error: format!("Error opening {filename}: {e}"),
})
})
.map_ok(|file| {
// only spawn a thread for accessible file
s.spawn(|| {
let lines = read_file(file);
// store the 0-based line number for any matched line
let match_lines = find_matching_lines(&lines, ®ex);
// create intervals of the form [a,b] with the before/after context
let intervals = match create_intervals(
match_lines,
before_context,
after_context,
) {
Ok(intervals) => intervals,
Err(_) => return Err(GrepFailure {
error: String::from(
"An error occurred while creating intervals",
),
}),
};
// merge overlapping intervals
let intervals = merge_intervals(intervals);
Ok(GrepSuccess { intervals, lines })
})
})
.collect();
// process all the results
for handle in handles {
let result = match handle {
Ok(scoped_join_handle) => scoped_join_handle,
Err(e) => {
eprintln!("{}", e.error);
continue;
}
};
if let Ok(result) = result.join() {
match result {
Ok(result) => print_results(
result.intervals,
result.lines,
line_number,
),
Err(e) => eprintln!("{}", e.error),
};
};
}
});
}
pub mod interval {
/// A list specifying general categories of Interval errors.
#[derive(Debug)]
pub enum IntervalError {
/// Start is not less than or equal to end
StartEndRangeInvalid,
/// Two intervals to be merged do not overlap
NonOverlappingInterval,
}
/// A closed-interval [`start`, `end`] type used for representing a range of
/// values between `start` and `end` inclusively.
///
/// # Examples
///
/// You can create an `Interval` using `new`.
///
/// ```rust
/// let interval = Interval::new(1, 10).unwrap();
/// assert_eq!(interval.start, 1);
/// assert_eq!(interval.end, 10);
/// ```
#[derive(Debug, PartialEq)]
pub struct Interval<T> {
pub start: T,
pub end: T,
}
impl<T: Copy + PartialOrd> Interval<T> {
/// Creates a new `Interval` set to `start` and `end`.
///
/// # Examples
///
/// ```rust
/// let interval = Interval::new(1, 10).unwrap();
/// assert_eq!(interval.start, 1);
/// assert_eq!(interval.end, 10);
/// ```
pub fn new(start: T, end: T) -> Result<Self, IntervalError> {
if start <= end {
Ok(Self { start, end })
} else {
Err(IntervalError::StartEndRangeInvalid)
}
}
/// Checks if two intervals overlap. Overlapping intervals have at least
/// one point in common.
///
/// # Examples
///
/// ```rust
/// let a = Interval::new(1, 3).unwrap();
/// let b = Interval::new(3, 5).unwrap();
/// assert_eq!(a.overlaps(&b), true);
/// assert_eq!(b.overlaps(&a), true);
/// ```
///
/// ```rust
/// let a = Interval::new(1, 5).unwrap();
/// let b = Interval::new(2, 4).unwrap();
/// assert_eq!(a.overlaps(&b), true);
/// assert_eq!(b.overlaps(&a), true);
/// ```
///
/// ```rust
/// let a = Interval::new(1, 3).unwrap();
/// let b = Interval::new(4, 6).unwrap();
/// assert_eq!(a.overlaps(&b), false);
/// assert_eq!(b.overlaps(&a), true);
/// ```
pub fn overlaps(&self, other: &Interval<T>) -> bool {
self.end >= other.start
}
/// Merges two intervals returning a new `Interval`.
///
/// The merged `Interval` range includes the union of ranges from each
/// `Interval`.
///
/// # Examples
///
/// ```rust
/// let a = Interval::new(1, 3).unwrap();
/// let b = Interval::new(3, 5).unwrap();
/// let c = a.merge(&b).unwrap();
/// assert_eq!(c.start, 1);
/// assert_eq!(c.end, 5);
/// ```
pub fn merge(&self, other: &Self) -> Result<Self, IntervalError> {
if self.overlaps(other) {
Ok(Self {
start: self.start,
end: other.end,
})
} else {
Err(IntervalError::NonOverlappingInterval)
}
}
}
use std::fmt;
impl<T: fmt::Display> fmt::Display for Interval<T> {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> fmt::Result {
write!(f, "[{}, {}]", self.start, self.end)
}
}
use std::cmp::Ordering;
impl<T: PartialEq + PartialOrd> PartialOrd for Interval<T> {
fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
if self == other {
Some(Ordering::Equal)
} else if self.end < other.start {
Some(Ordering::Less)
} else if self.start > other.end {
Some(Ordering::Greater)
} else {
None // Intervals overlap
}
}
}
}
You can find the final version of the Grep program we built together during this course here.