Welcome to Fast Track to Rust

This course is designed to introduce you to the Rust programming language by building an actual program. We'll be developing a grep-like program with a minimal subset of the features found in the GNU 1 grep 2 utility. This means we'll start with what we know and iterate upon the design as we learn more about the language.

The official URL for this online course is: https://freddiehaddad.github.io/fast-track-to-rust

The source code can be found at: https://github.com/freddiehaddad/fast-track-to-rust

The goal is to teach you Rust, so it's assumed you don't know anything about the language. However, it's also assumed that you have experience programming in other languages like C++, and are familiar with multithreading, data structures, and program memory organization (i.e., heap, stack, etc.).

This course moves relatively quickly. Most topics are covered enough to provide a fundamental understanding. As you progress, references to the official Rust documentation are provided for further exploration.

Enough chit chat, let's get started!

Fast Track to Rust Logo


1

GNU is an collection of free software commonly used as an operating system. The family of operating systems, known as Linux, are built from them.

2

Grep searches one or more input files for lines containing a match to a specified pattern. By default, Grep outputs the matching lines.

Shortcuts

This course, developed with mdBook, features efficient keyboard shortcuts for streamlined navigation.

Keyboard Shortcuts

  • Arrow-Left: Navigate to the previous page.
  • Arrow-Right: Navigate to the next page.
  • s: Activate the search bar.
  • Ctrl+Enter: Execute the code in the focused code block.

Installation

Most exercises in this book can be completed in your web browser, with a few exceptions. However, it is recommended to install the Rust development toolchain on your system. Once installed, you'll have everything you need to build and run Rust code locally.

Installation is accomplished using rustup, the Rust toolchain installer. Once completed, you'll have rustup, cargo, and rustc.

Follow the instructions at https://rustup.rs/ to complete the installation.

Configuring your IDE

It is recommended (but not required) to configure your editor to work with Rust. With Cargo installed, most editors, such as VS Code, will automatically install rust-analyzer, the LSP implementation for the Rust programming language.

Rust Playground

In addition to running and editing code in the browser, you might find the Rust Playground helpful. It's a web interface for running Rust code.

fn main() {
    println!("Welcome to Fast Track to Rust!");
}

Exercise

  • Hover the mouse over the code block and click the arrow button to run the code.
  • Modify the "Welcome to Fast Track to Rust!" string and re-run the code.
  • Revert your changes by hovering your mouse over the code block and clicking the Undo changes button.
  • Run the hello world program in the playground.

Using Cargo

In this chapter we'll cover using Cargo to create a package, compile the boilerplate code, and run the program. The package we create will be used for the remainder of the book to build our Grep program.

Package Creation

Let's make sure all the prerequisites are in place. You should have followed the installation instructions to prepare your development environment. After those steps are complete, you should be able to run the following commands:

$ rustc --version
rustc 1.81.0 (eeb90cda1 2024-09-04)

$ cargo --version
cargo 1.81.0 (2dbb1af80 2024-08-20)

The version numbers might be different, but the output should look relatively similar.

If the above commands worked, you're ready to go!

  1. Use cargo new grep to create a new Rust package named grep for our project:

    $ cargo new grep
    ∙
        Creating binary (application) `grep` package
    note: see more `Cargo.toml` keys and their definitions at
    https://doc.rust-lang.org/cargo/reference/manifest.html
    
  2. Navigate to the grep directory and use cargo run to build and run the program:

    $ cd grep
    $ cargo run
       Compiling grep v0.1.0 (S:\projects\git\grep)
        Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.43s
         Running `target\debug\grep.exe`
    Hello, world!
    
  3. Explore some of the other actions you can perform with cargo using cargo --help.

    • cargo build - compile the current package
    • cargo build --release - build the project in release mode (with optimizations)
    • cargo check - analyze the current package and report errors (no object files generated)
    • cargo test - run unit tests
    • and more...

Summary

In this section, we:

  • Created a package.
  • Compiled and ran the boilerplate code.
  • Learned a bit about Cargo.

Next

Let's see what cargo new actually did!

Boilerplate

In the previous section, we used cargo new to create a Rust package, which is the most common way to create a package. Running this command generated some boilerplate code. When we used cargo run, a target directory was created and the Rust compiler rustc was invoked, generating compiler-related files and an executable binary within the target directory.

Let's clean everything up with cargo clean and return to the boilerplate. The cargo clean command removes the target directory where all the generated code resides. After running cargo clean, we're left with the boilerplate.

$ cargo clean
     Removed 23 files, 2.9MiB total
.
├─ Cargo.lock
├─ Cargo.toml
└─ src
   └─ main.rs

For now, we'll focus on main.rs and discuss Cargo.lock and Cargo.toml later.

main.rs

main.rs is a Rust source file that gets created as part of a package. It includes a function named main that serves as the program entry point.

The contents of main.rs:

fn main() {
    println!("Hello, world!");
}

Some of the syntax should look familiar:

  • Blocks of code are surrounded by curly braces {}.
  • Statements end with a semicolon ;.
  • Function parameters are enclosed in parentheses ().
  • String literals are enclosed in double quotes "".

What may not be familiar:

  • fn is the keyword used to declare a function.
  • println! is actually a macro, denoted by the ! suffix.

Summary

In this section, we:

  • Explored the boilerplate code generated by cargo new.
  • Reviewed some basic Rust syntax.

Next

Let's dive into building our grep program. We'll start by setting up the basic structure and then gradually add more functionality.


1

String literals are actually string slices with type 'static &str.

2

Macros are custom definitions that extend Rust's functionality and syntax. When invoked, macros are expanded at compile time and replaced with their definitions.

Language Basics

Rust includes the primitive types,1 statements, and expressions2 you're already familiar with. However, it also incorporates some powerful concepts from functional languages that we will explore throughout this course. To begin with our grep program, we'll keep things simple and gradually refactor the code to introduce more of what Rust has to offer. Let's start with variables.


1

Details about the built-in types in Rust can be found in the official Rust documentation.

2

Details about language statements and expressions can be found in the official Rust documentation.

Variables

Rust is a statically typed language, meaning that all variables have a type, and this type must be known at compile time. The let keyword is used to declare variables, or more precisely, for variable binding.

The anatomy of a let statement1 2:

let identifier: type = expression;

Type Inference

Since y is passed as an argument to the print_value function, which requires a signed 32-bit integer, the compiler infers its type. Therefore, the explicit type declaration for x can be omitted.

fn print_value(value: i32) {
    println!("{value}");
}

fn main() {
    let x: i32 = 10;
    let y = 20;

    print_value(x);
    print_value(y);
}

Grep Variables

To begin with our grep program, we'll avoid handling user input via command line arguments for now. Instead, we'll hard code some strings and perform some simple grepping. Let's use the famous poem My Shadow by the poet Robert Louis Stevenson as our input.

fn main() {
    let pattern = "him";
    let poem = "I have a little shadow that goes in and out with me,
                And what can be the use of him is more than I can see.
                He is very, very like me from the heels up to the head;
                And I see him jump before me, when I jump into my bed.

                The funniest thing about him is the way he likes to grow -
                Not at all like proper children, which is always very slow;
                For he sometimes shoots up taller like an india-rubber ball,
                And he sometimes gets so little that there’s none of him at all.";
}

The next step is to search the poem for occurrences of the pattern and print the results. To achieve this, we'll need to learn a bit about control flow.

Summary

In this section, we:

  • Learned how to declare variables.
  • Explored Rust's type inference.

Next

Onward to control flow.


1

let statements support more advanced features that are not being covered yet.

2

In many cases, the compiler can infer the type allowing you to omit it.

Control Flow

Some of the common methods in Rust for controlling the flow of a program include:

These methods help you manage the execution flow and make your code more efficient and readable.

We will use a for loop to iterate over all the lines in the poem, checking each for the substring specified by pattern. Each line will be evaluated using a match expression.

fn main() {
    let pattern = "him";
    let poem = "I have a little shadow that goes in and out with me,
                And what can be the use of him is more than I can see.
                He is very, very like me from the heels up to the head;
                And I see him jump before me, when I jump into my bed.

                The funniest thing about him is the way he likes to grow -
                Not at all like proper children, which is always very slow;
                For he sometimes shoots up taller like an india-rubber ball,
                And he sometimes gets so little that there’s none of him at all.";

    for line in poem.lines() {
        match line.contains(pattern) {
            true => println!("{line}"),
            false => (),
        }
    }
}

for Loops

The syntax of a for loop should look familiar. However, something interesting is happening on that line. Notice the method call poem.lines(). You might have thought poem was a string literal. How can it have methods? Well, as you guessed and as was mentioned earlier, string slices are more than just string literals. We'll explore them in more detail, so keep that in mind.

The purpose of the loop is quite clear: it iterates over each line in the poem.

match Expressions

You might have already figured it out, but the match expression is similar to an if expression in that it introduces a branch in the code execution. However, it has a very powerful feature: when used, the compiler ensures that all possible results of the scrutinee are covered. This aspect of match expressions guarantees that all cases are handled.

Let's ensure we fully understand this. In the code snippet, comment out line 16 by prefixing it with //, and then run the code.

Friendly compiler errors

Take a moment to appreciate just how helpful this compiler error is. The Rust compiler is truly your friend!

   Compiling playground v0.0.1 (/playground)
error[E0004]: non-exhaustive patterns: `false` not covered
  --> src/main.rs:14:15
   |
14 |         match line.contains(pattern) {
   |               ^^^^^^^^^^^^^^^^^^^^^^ pattern `false` not covered
   |
   = note: the matched value is of type `bool`
help: ensure that all possible cases are being handled by adding a match arm with a
   |  wildcard pattern or an explicit pattern as shown
   |
15 ~             true => println!("{line}"),
16 ~             false => todo!(),
   |

For more information about this error, try `rustc --explain E0004`.
error: could not compile `playground` (bin "playground") due to 1 previous error

The compiler error message is informing you that:

  • The contains method returns a boolean value.
  • The false case is not covered.
  • The Rust compiler can provide more information via rustc --explain E0004.

Let's see what the Rust compiler has to say:

$ rustc --explain E0004
This error indicates that the compiler cannot guarantee a matching pattern for
one or more possible inputs to a match expression. Guaranteed matches are
required in order to assign values to match expressions, or alternatively,
determine the flow of execution.

Erroneous code example:

enum Terminator {
    HastaLaVistaBaby,
    TalkToMyHand,
}

let x = Terminator::HastaLaVistaBaby;

match x { // error: non-exhaustive patterns: `HastaLaVistaBaby` not covered
    Terminator::TalkToMyHand => {}
}

If you encounter this error you must alter your patterns so that every possible
value of the input type is matched. For types with a small number of variants
(like enums) you should probably cover all cases explicitly. Alternatively, the
underscore `_` wildcard pattern can be added after all other patterns to match
"anything else". Example:

enum Terminator {
    HastaLaVistaBaby,
    TalkToMyHand,
}

let x = Terminator::HastaLaVistaBaby;

match x {
    Terminator::TalkToMyHand => {}
    Terminator::HastaLaVistaBaby => {}
}

// or:

match x {
    Terminator::TalkToMyHand => {}
    _ => {}
}

match arms have the following structure:

match VALUE {
    PATTERN = EXPRESSION,
    PATTERN = EXPRESSION,
    PATTERN = EXPRESSION,
}

() Unit Type

In situations where we don't want to perform any action, such as in the false arm, we can use the empty unit type (). We'll explore this more as we progress through the course.

Summary

In this section, we:

  • Explored for loop expressions.
  • Examined match expressions.
  • Learned how the Rust compiler provides helpful error messages.
  • Were introduced to the empty unit type ().

Exercise

  • Replace the match expression with an if expression.
Solution
fn main() {
    let pattern = "him";
    let poem = "I have a little shadow that goes in and out with me,
                And what can be the use of him is more than I can see.
                He is very, very like me from the heels up to the head;
                And I see him jump before me, when I jump into my bed.

                The funniest thing about him is the way he likes to grow -
                Not at all like proper children, which is always very slow;
                For he sometimes shoots up taller like an india-rubber ball,
                And he sometimes gets so little that there’s none of him at all.";

    for line in poem.lines() {
        if line.contains(pattern) {
            println!("{line}");
        }
    }
}

Next

Take a break if you need, and then let's continue!

Types

Let's take a moment to discuss the different types we've encountered:

  • The empty unit type () is used when a function or expression does not return a value. It's essentially a placeholder indicating the absence of a meaningful value.
  • String slices &str are references to a portion of a string. They allow you to work with parts of a string without needing to own the entire string, making them efficient and flexible.
  • Signed 32-bit integers i32 are integers that can store both positive and negative values within a specific range. They are commonly used for numerical operations where the size of the number is known and fits within the 32-bit limit.

Understanding these types will help us write more efficient and effective Rust code as we continue to build our grep program.

String Slice str

The most interesting type is probably str. str is a primitive string type in Rust, and it's usually seen in its borrowed1 form &str.

You can think of a string slice as an object with two components:

fieldvalue
ptraddress
lenunsigned integer

Imagine a block of memory starting at address 0x10 containing the bytes for the string literal "rust". This block of memory would store the individual bytes representing each character in the string. In this case, the memory would contain the bytes for 'r', 'u', 's', and 't', sequentially stored starting from address 0x10.

This visualization helps in understanding how string literals are stored and accessed in memory:

0x100x110x120x13
rust

When we bind the string literal "rust" to the variable language:

let language = "rust";

The memory would look like:

fieldvalue
ptr0x10
len4

Primitive Types

Here are some additional primitive types in Rust:

typedescription
arrayA fixed-size array, denoted [T; N], for the element type, T, and the non-negative compile-time constant size, N.
boolThe boolean type.
charA character type.
f32A 32-bit floating-point type (specifically, the “binary32” type defined in IEEE 754-2008).
f64A 64-bit floating-point type (specifically, the “binary64” type defined in IEEE 754-2008).
i8The 8-bit signed integer type.
i16The 16-bit signed integer type.
i32The 32-bit signed integer type.
i64The 64-bit signed integer type.
i128The 128-bit signed integer type.
isizeThe pointer-sized signed integer type.
strString slices.
u8The 8-bit unsigned integer type.
u16The 16-bit unsigned integer type.
u32The 32-bit unsigned integer type.
u64The 64-bit unsigned integer type.
u128The 128-bit unsigned integer type.
usizeThe pointer-sized unsigned integer type.

User-defined Types

In addition to the primitive types, Rust supports user-defined types. We'll cover them in more detail as we build our grep program. But to satisfy your curiosity, some of the user-defined types include:

typedescription
structA heterogeneous product of other types, called the fields of the type.2
enumAn enumerated type is a nominal, heterogeneous disjoint union type, denoted by the name of an enum item.3
unionA union type is a nominal, heterogeneous C-like union, denoted by the name of a union item.

These user-defined types allow for more complex and expressive code, enabling you to model real-world concepts more effectively. We'll explore these in greater depth as we progress through our project.


1

We'll explore what borrowing means during the course.

2

struct types are analogous to struct types in C.

3

The enum type is analogous to a data constructor declaration in Haskell, or a pick ADT in Limbo.

Programming Concepts

It's time to start adding some functionality to our grep program. In the next few sections, we'll cover topics such as variables and mutability, iterators, optional values, and closures. These topics are quite extensive, and covering them in full detail is not the objective. Instead, we'll lay the foundation and share core fundamentals to help you gain enough understanding to leverage the documentation successfully. Using the provided links is highly recommended for full details and comprehension.

Grep Features

Grep is a fairly large program with numerous features. In this course, we won't be implementing all of them. Instead, we'll concentrate on a subset of features (with slight variations in behavior) that will help you learn Rust. The features we'll be implementing are:

command line argumentdescription
-n, --line-numberPrefix each line of output with the 1-based line number within its input file.
-A num, --after-context=numPrint num lines of trailing context after matching lines.
-B num, --before-context=numPrint num lines of leading context before matching lines.

Printing Line Numbers

The first feature we're going to add to our grep program is the ability to print the line number of any lines that match our pattern. Coming from a language like C/C++, you may be inclined to handle this with a variable that we increment with each loop iteration. While this would work, it introduces some of the issues that plague languages like C and C++ (e.g., out-of-bounds indexing, integer overflow, and more). We'll demonstrate this and follow it up with a more idiomatic approach. Let's get going!

Mutability

Two lines of code were added to our grep program on lines 13 and 18. The variable i will get incremented with each loop iteration and we'll use it to print the line number. Go ahead and run the code.

fn main() {
    let pattern = "him";
    let poem = "I have a little shadow that goes in and out with me,
                And what can be the use of him is more than I can see.
                He is very, very like me from the heels up to the head;
                And I see him jump before me, when I jump into my bed.

                The funniest thing about him is the way he likes to grow -
                Not at all like proper children, which is always very slow;
                For he sometimes shoots up taller like an india-rubber ball,
                And he sometimes gets so little that there’s none of him at all.";

    let i = 1;
    for line in poem.lines() {
        if line.contains(pattern) {
            println!("{i}: {line}");
        }
        i += 1;
    }
}

Uh-oh! Something went wrong. Luckily, the compiler (rustc) emits very helpful error messages.1 The variable i is immutable!

Variables and Mutability

Variables in Rust are immutable by default. This design choice helps you write code that leverages Rust's memory safety and fearless concurrency features. For more details, you can refer to the section on variables and mutability in the Rust documentation. While the language encourages immutability, there are situations where mutating values is necessary. In such cases, we use the mut keyword.

let mut i = 1;

Go ahead and add the mut keyword to line 13 and run the program again.

We're going to explore cases where immutability is needed and appropriate. However, let's see how iterators can be used to avoid the need for a mutable counter.


1

A lot of effort has been put into making rustc have great error messages. To help understand how to interpret them, refer to the diagnostic structure in the documentation.

Iterators

Iterators are pervasive in Rust, and delving into them in full detail would necessitate renaming this course to Slow Track to Rust. We'll cover the basics and direct you to the documentation for all the intricate details about iterators.

The Control Flow section explained that the call to lines() in the for loop returns an iterator. We say that the for loop consumes the iterator. However, in Rust, we can do much more than just consume iterators!

Iterator Adaptors

New iterators can be created using iterator adaptors. These adaptors generate new iterators that modify some aspect of the original. Let's use one to print line numbers.

The enumerate function creates an iterator that yields the current iteration count along with the value from the previous iterator, as tuples1 in the form (index, value). In the loop, the type for i is usize, and the type for line is &str.

fn main() {
    let pattern = "him";
    let poem = "I have a little shadow that goes in and out with me,
                And what can be the use of him is more than I can see.
                He is very, very like me from the heels up to the head;
                And I see him jump before me, when I jump into my bed.

                The funniest thing about him is the way he likes to grow -
                Not at all like proper children, which is always very slow;
                For he sometimes shoots up taller like an india-rubber ball,
                And he sometimes gets so little that there’s none of him at all.";

    for (i, line) in poem.lines().enumerate() {
        if line.contains(pattern) {
            println!("{}: {line}", i + 1);
        }
    }
}

Advanced Iterators

We can take it a step further by using the filter_map iterator adapter to return an iterator that includes only the items we want!

fn main() {
    let pattern = "him";
    let poem = "I have a little shadow that goes in and out with me,
                And what can be the use of him is more than I can see.
                He is very, very like me from the heels up to the head;
                And I see him jump before me, when I jump into my bed.

                The funniest thing about him is the way he likes to grow -
                Not at all like proper children, which is always very slow;
                For he sometimes shoots up taller like an india-rubber ball,
                And he sometimes gets so little that there’s none of him at all.";

    for (line_no, line) in
        poem.lines()
            .enumerate()
            .filter_map(|(i, line)| match line.contains(pattern) {
                true => Some((i + 1, line)),
                false => None,
            })
    {
        println!("{line_no}: {line}");
    }
}

Whoa! What just happened? We're on the fast track to learning Rust, so we're picking up the pace! Let's break this down because this code snippet needs some unpacking.


1

A tuple is a collection of values of different types and is constructed using parentheses ().

Option

The Option type represents an optional value that can either be Some with a value or None. They are prevalent in Rust and serve various purposes. Options are often used with pattern matching (i.e., match expression). The documentation for Option provides extensive details on its usage.

Some(T) and None are two variants of the Option<T> enum.1

pub enum Option<T> {
    None,
    Some(T),
}

We haven't discussed generics yet. For now, just note that the T in the enum represents any type.

Returning to our Loop

In the code snippet, we check if the pattern is a substring of the current line. If it is, we return a tuple with the line number (i+1)2 and the line, enclosed within the Some variant of the Option enum. If not, we return the None variant.

    for (line_no, line) in
        poem.lines()
            .enumerate()
            .filter_map(|(i, line)| match line.contains(pattern) {
                true => Some((i + 1, line)),
                false => None,
            })
    {
        println!("{line_no}: {line}");
    }

1

enums in Rust are similar to those of other compiled languages like C, but have important differences that make them considerably more powerful.

2

Recall that grep uses 1-based line numbering. enumerate begins at 0.

Closures

In Rust, closures are functions that can capture their surrounding environment, clearly express the programmer's intent, and achieve this with low-level performance. They are quite common in functional programming languages.

Closures in Rust have some distinct characteristics:

  • Parameters are enclosed with || instead of ().
  • Curly braces {} around the function body are optional for single-line expressions.

Returning to our Loop

fn main() {
    let pattern = "him";
    let poem = "I have a little shadow that goes in and out with me,
                And what can be the use of him is more than I can see.
                He is very, very like me from the heels up to the head;
                And I see him jump before me, when I jump into my bed.

                The funniest thing about him is the way he likes to grow -
                Not at all like proper children, which is always very slow;
                For he sometimes shoots up taller like an india-rubber ball,
                And he sometimes gets so little that there’s none of him at all.";

    for (line_no, line) in
        poem.lines()
            .enumerate()
            .filter_map(|(i, line)| match line.contains(pattern) {
                true => Some((i + 1, line)),
                false => None,
            })
    {
        println!("{line_no}: {line}");
    }
}

In the code snippet, the closure is:

|(i, line)| match line.contains(pattern) {
    true => Some((i + 1, line)),
    false => None,
}

filter_map

The filter_map 1 function creates an iterator that both filters and maps. It calls a closure and filters out the results that are None, leaving us with an iterator of tuples containing the line number and line for all matches of our pattern. When the for loop consumes the iterator, we only need to print the results, as we are left with only the lines that contained the pattern!

Recall that enumerate returns an iterator of tuples. filter_map receives the tuple as the argument to the closure.

Exercise

  • Replace filter_map with map and filter to achieve the same output. Notice how filter_map results in more efficient and concise code.
Solution
fn main() {
    let pattern = "him";
    let poem = "I have a little shadow that goes in and out with me,
                And what can be the use of him is more than I can see.
                He is very, very like me from the heels up to the head;
                And I see him jump before me, when I jump into my bed.

                The funniest thing about him is the way he likes to grow -
                Not at all like proper children, which is always very slow;
                For he sometimes shoots up taller like an india-rubber ball,
                And he sometimes gets so little that there’s none of him at all.";

    for (line_no, line) in poem
        .lines()
        .enumerate()
        .map(|(i, line)| (i + 1, line))
        .filter(|(_line_no, line)| line.contains(pattern))
    {
        println!("{line_no}: {line}");
    }
}

The underscore _ prefix in _line_no is how we tell the Rust compiler that we are intentionally ignoring the first argument. Without it the compiler will complain.

Summary

In this section we:

  • introduced the Option enum
  • learned about closures
  • worked with iterators
  • learned about the mut keyword

Next

Onward to collections!


1

Separate filter and map iterators also exist.

Collections

Rust's standard library features some highly powerful and useful data structures, referred to as collections. While we'll explore one of them in this course, it's definitely worth spending time reviewing the entire collections library.

The collections are commonly grouped into four categories:

Grep

Currently, our grep program only outputs the line number and the matching line for each pattern match. It doesn't yet support printing lines before or after the match. So far, we haven't had a straightforward way to implement this functionality, and there are several challenges we need to address.

Handling Context

The --before-context option requires us to print preceding lines when we encounter a match. This necessitates a mechanism to reference those lines. However, we should only print each line once to avoid duplicating lines when consecutive lines within the context contain matches.

Consider the poem from our code:

I have a little shadow that goes in and out with me,
And what can be the use of him is more than I can see.
He is very, very like me from the heels up to the head;
And I see him jump before me, when I jump into my bed.

The funniest thing about him is the way he likes to grow -
Not at all like proper children, which is always very slow;
For he sometimes shoots up taller like an india-rubber ball,
And he sometimes gets so little that there’s none of him at all.

If the pattern is "him" and the number of lines to print before each match is set to 2 (--before-context=2), without a method to track printed lines, we would end up with the following output:

1: I have a little shadow that goes in and out with me,
2: And what can be the use of him is more than I can see.
2: And what can be the use of him is more than I can see.
3: He is very, very like me from the heels up to the head;
4: And I see him jump before me, when I jump into my bed.
5:
4: And I see him jump before me, when I jump into my bed.
5:
6: The funniest thing about him is the way he likes to grow -
7: Not at all like proper children, which is always very slow;
8: For he sometimes shoots up taller like an india-rubber ball,
9: And he sometimes gets so little that there’s none of him at all.

This is not the behavior we want from our grep program. Instead, we aim for the following output:

1: I have a little shadow that goes in and out with me,
2: And what can be the use of him is more than I can see.
3: He is very, very like me from the heels up to the head;
4: And I see him jump before me, when I jump into my bed.
5:
6: The funniest thing about him is the way he likes to grow -
7: Not at all like proper children, which is always very slow;
8: For he sometimes shoots up taller like an india-rubber ball,
9: And he sometimes gets so little that there’s none of him at all.

A similar issue arises with --after-context. Our solution involves using a vector and tuples (since we are now familiar with them) to create intervals over the ranges of lines we want to print.

Accessing Lines from the Input

The context problem indicates that we need a method to reference a range of lines around each match. We'll address this by storing the lines in a collection from the Rust standard library called a vector Vec<T>.

Tracking the Lines Surrounding Each Match

Since we are familiar with them, we can use a tuple to represent the starting and ending intervals around each match. We'll need to track these intervals and merge overlapping ones to avoid printing the same line multiple times. Once again, a vector will be useful!

Next

Let's start coding!

Vec

A Vec, short for vector, is a contiguous growable array type, written as Vec<T>. Although we haven't covered user-defined types yet, Vec is implemented using a struct.1 It supports many familiar operations like push, pop, and len, but also includes some operations you might not be familiar with.

Storing our Lines in a Vector

Let's populate a vector with the input we've been using so far in our grep program. There are several ways to create a vector, and the documentation provides plenty of examples. We'll explore a few of them.

Using push

Someone with a C++ background might be inclined to achieve this using a variant of the following form:

Hidden code

To make reading the code easier, only the new parts are visible. You can view previously seen code by clicking the Show hidden lines button when hovering over the code block.

fn main() {
    let poem = "I have a little shadow that goes in and out with me,
                And what can be the use of him is more than I can see.
                He is very, very like me from the heels up to the head;
                And I see him jump before me, when I jump into my bed.

                The funniest thing about him is the way he likes to grow -
                Not at all like proper children, which is always very slow;
                For he sometimes shoots up taller like an india-rubber ball,
                And he sometimes gets so little that there’s none of him at all.";

    let mut lines = Vec::new(); // each call to push mutates the vector
    for line in poem.lines() {
        lines.push(line);
    }

    // format text for debugging purposes
    println!("{lines:?}");
}

We haven't discussed traits yet, so for now, just know that :? specifies formatting the output with the Debug2 trait. The vector type implements the fmt::Debug trait.

Using an Iterator

While using push is functionally correct, it introduces a mutable variable unnecessarily. An idiomatic solution would look like this:

use std::iter::FromIterator; // this line addresses a rust playground bug

fn main() {
    let poem = "I have a little shadow that goes in and out with me,
                And what can be the use of him is more than I can see.
                He is very, very like me from the heels up to the head;
                And I see him jump before me, when I jump into my bed.

                The funniest thing about him is the way he likes to grow -
                Not at all like proper children, which is always very slow;
                For he sometimes shoots up taller like an india-rubber ball,
                And he sometimes gets so little that there’s none of him at all.";

    // convert poem into lines
    let lines = Vec::from_iter(poem.lines());

    // format text for debugging purposes
    println!("{lines:?}");
}

Favoring immutability

The concept of being immutable by default is one of the many ways Rust encourages you to write code that leverages its safety features and supports easy concurrency.

Tracking all Matches

Now that we have a vector containing all the lines in our poem, let's create another vector to hold the line numbers where the pattern was found.

use std::iter::FromIterator; // this line addresses a rust playground bug

fn main() {
    let poem = "I have a little shadow that goes in and out with me,
                And what can be the use of him is more than I can see.
                He is very, very like me from the heels up to the head;
                And I see him jump before me, when I jump into my bed.

                The funniest thing about him is the way he likes to grow -
                Not at all like proper children, which is always very slow;
                For he sometimes shoots up taller like an india-rubber ball,
                And he sometimes gets so little that there’s none of him at all.";

    let pattern = "all";

    // convert the poem into lines
    let lines = Vec::from_iter(poem.lines());

    // store the 0-based line number for any matched line
    let match_lines: Vec<_> = lines // inferred type (_)
        .iter()
        .enumerate()
        .filter_map(|(i, line)| match line.contains(pattern) {
            true => Some(i),
            false => None,
        })
        .collect(); // turns anything iterable into a collection

    // format text for debugging purposes
    println!("{match_lines:?}");
}

Instead of using from_iter, you can also use collect.

The inferred type (_) in Vec<_> asks the compiler to infer the type if possible based on the surrounding context.2

Creating Intervals from Matched Lines

The map function invokes the closure once for each value in the vector, passing line_no (the line number) to the function. We use this to identify the lines before and after the match that we want to print. To handle potential out-of-bounds indexing, we use saturating_add and saturating_sub.

use std::iter::FromIterator; // this line addresses a rust playground bug

fn main() {
    let poem = "I have a little shadow that goes in and out with me,
                And what can be the use of him is more than I can see.
                He is very, very like me from the heels up to the head;
                And I see him jump before me, when I jump into my bed.

                The funniest thing about him is the way he likes to grow -
                Not at all like proper children, which is always very slow;
                For he sometimes shoots up taller like an india-rubber ball,
                And he sometimes gets so little that there’s none of him at all.";

    let pattern = "all";
    let before_context = 1;
    let after_context = 1;

    // convert the poem into lines
    let lines = Vec::from_iter(poem.lines());

    // store the 0-based line number for any matched line
    let match_lines: Vec<_> = lines
        .iter()
        .enumerate()
        .filter_map(|(i, line)| match line.contains(pattern) {
            true => Some(i),
            false => None,
        })
        .collect(); // turns anything iterable into a collection

    // create intervals of the form [a,b] with the before/after context
    let intervals: Vec<_> = match_lines
        .iter()
        .map(|line_no| {
            (
                // prevent underflow
                line_no.saturating_sub(before_context),
                // prevent overflow
                line_no.saturating_add(after_context),
            )
        })
        .collect();

    // format text for debugging purposes
    println!("{intervals:?}");
}

Merging Intervals

Merging overlapping intervals is straightforward because they are already sorted, and the ending value of the next interval must be greater than the current one. We could store the merged intervals in a new Vector, but that would be inefficient. Instead, we'll make our intervals vector mutable and take advantage of the dedup_by iterator adaptor to perform the merge.

use std::iter::FromIterator; // this line addresses a rust playground bug

fn main() {
    let poem = "I have a little shadow that goes in and out with me,
                And what can be the use of him is more than I can see.
                He is very, very like me from the heels up to the head;
                And I see him jump before me, when I jump into my bed.

                The funniest thing about him is the way he likes to grow -
                Not at all like proper children, which is always very slow;
                For he sometimes shoots up taller like an india-rubber ball,
                And he sometimes gets so little that there’s none of him at all.";

    let pattern = "all";
    let before_context = 1;
    let after_context = 1;

    // convert the poem into lines
    let lines = Vec::from_iter(poem.lines());

    // store the 0-based line number for any matched line
    let match_lines: Vec<_> = lines
        .iter()
        .enumerate()
        .filter_map(|(i, line)| match line.contains(pattern) {
            true => Some(i),
            false => None,
        })
        .collect(); // turns anything iterable into a collection

    // create intervals of the form [a,b] with the before/after context
    let mut intervals: Vec<_> = match_lines
        .iter()
        .map(|line| {
            (
                line.saturating_sub(before_context),
                line.saturating_add(after_context),
            )
        })
        .collect();

    // merge overlapping intervals
    intervals.dedup_by(|next, prev| {
        if prev.1 < next.0 {
            false
        } else {
            prev.1 = next.1;
            true
        }
    });

    // format text for debugging purposes
    println!("{intervals:?}");
}

The dedup_by iterator adaptor removes consecutive identical elements based on a provided closure. It starts by calling the closure with the first (prev) and second (next) elements in the collection. If the closure returns false, prev becomes next, next is advanced, and the closure is called with the new elements. If the closure returns true, only next is advanced. This behavior allows us to update prev to be the merged interval, enabling efficient merging of overlapping intervals.

Tuple fields are accessed using a 0-based index, just like array elements.

With our intervals merged, we can now print out the results correctly!

use std::iter::FromIterator; // this line addresses a rust playground bug

fn main() {
    let poem = "I have a little shadow that goes in and out with me,
                And what can be the use of him is more than I can see.
                He is very, very like me from the heels up to the head;
                And I see him jump before me, when I jump into my bed.

                The funniest thing about him is the way he likes to grow -
                Not at all like proper children, which is always very slow;
                For he sometimes shoots up taller like an india-rubber ball,
                And he sometimes gets so little that there’s none of him at all.";

    let pattern = "all";
    let before_context = 1;
    let after_context = 1;

    // convert the poem into lines
    let lines = Vec::from_iter(poem.lines());

    // store the 0-based line number for any matched line
    let match_lines: Vec<_> = lines
        .iter()
        .enumerate()
        .filter_map(|(i, line)| match line.contains(pattern) {
            true => Some(i),
            false => None,
        })
        .collect(); // turns anything iterable into a collection

    // create intervals of the form [a,b] with the before/after context
    let mut intervals: Vec<_> = match_lines
        .iter()
        .map(|line| {
            (
                line.saturating_sub(before_context),
                line.saturating_add(after_context),
            )
        })
        .collect();

    // merge overlapping intervals
    intervals.dedup_by(|next, prev| {
        if prev.1 < next.0 {
            false
        } else {
            prev.1 = next.1;
            true
        }
    });

    // print the lines
    for (start, end) in intervals {
        for (line_no, line) in lines
                .iter()
                .enumerate()
                .take(end + 1)
                .skip(start) {
            println!("{}: {}", line_no + 1, line)
        }
    }
}

We Did It!

We have a working grep program! Take some time to review everything we've covered so far. The rest of the course is going to move quickly.

Summary

We covered quite a bit in this section:

  • collect converts any iterable into a collection.
  • saturating_add and saturating_sub prevent overflow.
  • skip creates an iterator that skips elements until n elements are skipped or the end of the iterator is reached, whichever comes first.
  • take creates an iterator that yields elements until n elements are yielded or the end of the iterator is reached, whichever comes first.
  • The println! macro supports the :? format specifier, which is intended to help with debugging Rust code. All public types should implement the fmt::Debug trait.3

Next

Let's dive into the concepts of ownership and borrowing.


1

Refer to the Vec source code see it's full implementation. The struct definition is on line 397.

2

The Inferred type is part of the type system in Rust.

3

Details about the Debug trait can be found here.

Ownership

Ownership and borrowing are fundamental principles in Rust that ensure memory safety without needing a garbage collector. Ownership dictates how memory is managed, while borrowing allows you to reference data without taking ownership. Understanding these concepts is crucial for writing efficient and safe Rust programs.

The Rust documentation provides an in-depth exploration of these topics in Understanding Ownership, and it's highly recommended to spend some time reading that material. For now, let's focus on the core principles of ownership.

Core Principles

Keep these rules about ownership in mind as we progress through the course:

  • Each value in Rust has an owner.
  • There can only be one owner at a time.
  • When the owner goes out of scope, the value is dropped.1

1

The term dropped means the memory is freed and the object's lifetime has ended.

There Can Be Only One

Currently, our entire grep program is contained within the main function. This approach was chosen to address the challenges of teaching a course in a way that introduces important concepts logically and in an easily digestible manner. However, the time has come to refactor some of the code from the main function into separate functions.

Let's start by creating a function to handle the work of finding all the lines in the input that contain the specified pattern. Here is the function signature:

fn find_matching_lines(lines: Vec<&str>, pattern: &str) -> Vec<usize>

This is the code that we'll transfer into the new function:

// store the 0-based line number for any matched line
let match_lines: Vec<_> = lines
    .iter()
    .enumerate()
    .filter_map(|(i, line)| match line.contains(pattern) {
        true => Some(i),
        false => None,
    })
    .collect(); // turns anything iterable into a collection

Here is the revised code with the changes implemented. Review it and run the code.

use std::iter::FromIterator; // this line addresses a rust playground bug

fn find_matching_lines(lines: Vec<&str>, pattern: &str) -> Vec<usize> {
    lines
        .iter()
        .enumerate()
        .filter_map(|(i, line)| match line.contains(pattern) {
            true => Some(i),
            false => None,
        })
        .collect() // turns anything iterable into a collection
        // The return keyword is unnecessary when the returned value is the
        // final expression in a function. In this scenario, the semicolon (;)
        // is omitted.
}

fn main() {
    let poem = "I have a little shadow that goes in and out with me,
                And what can be the use of him is more than I can see.
                He is very, very like me from the heels up to the head;
                And I see him jump before me, when I jump into my bed.

                The funniest thing about him is the way he likes to grow -
                Not at all like proper children, which is always very slow;
                For he sometimes shoots up taller like an india-rubber ball,
                And he sometimes gets so little that there’s none of him at all.";

    // command line arguments
    let pattern = "all";
    let before_context = 1;
    let after_context = 1;

    // convert the poem into lines
    let lines = Vec::from_iter(poem.lines());

    // store the 0-based line number for any matched line
    let match_lines = find_matching_lines(lines, pattern);

    // create intervals of the form [a,b] with the before/after context
    let mut intervals: Vec<_> = match_lines
        .iter()
        .map(|line| {
            (
                line.saturating_sub(before_context),
                line.saturating_add(after_context),
            )
        })
        .collect();

    // merge overlapping intervals
    intervals.dedup_by(|next, prev| {
        if prev.1 < next.0 {
            false
        } else {
            prev.1 = next.1;
            true
        }
    });

    // print the lines
    for (start, end) in intervals {
        for (line_no, line) in
            lines.iter().enumerate().take(end + 1).skip(start)
        {
            println!("{}: {}", line_no + 1, line)
        }
    }
}

Uh-oh!

A minor code change caused an issue with the program. Fortunately, the Rust compiler offers helpful information for diagnosing the problem. However, if you're not familiar with Rust's ownership rules, understanding this error can be challenging. Let's break down the error and understand what went wrong. Here are the key details from the error message, cleaned up for readability:

error[E0382]: borrow of moved value: `lines`
    --> src/main.rs:63:13
     |
34   |     let lines = Vec::from_iter(poem.lines());
     |         ----- move occurs because `lines` has type `Vec<&str>`, which
     |               does not implement the `Copy` trait
...
37   |     let match_lines = find_matching_lines(lines, pattern);
     |                                           ----- value moved here
...
63   |             lines.iter().enumerate().take(end + 1).skip(start)
     |             ^^^^^^^^^^^^ value borrowed here after move
     |

Unpacking the Error

Copy trait, value moved, value borrowed. What the heck does all this mean?

Copy Trait

We'll explore traits in more detail later in the course. For now, just know that when a type implements the Copy trait, its values are duplicated when assigned to a new variable. This means that after an assignment, both the original and the new variable can be used independently.

The lines vector we passed to the find_matching_lines function does not implement the Copy trait.

Value Moved

By default, variable bindings in Rust follow move semantics.1 When a value is moved, its ownership is transferred to the new variable, rendering the original variable invalid and unusable.

Since the lines vector does not implement the Copy trait, its value was moved, rendering the original value in main invalid.

Value Borrowed

Because the lines variable in main becomes invalid due to the move, any attempt to borrow or reference its value is invalid. This is why the compiler generates the message "value borrowed here after move".

Move Semantics

Rust's move semantics play an integral part in ensuring memory safety by detecting common memory-related errors, like null pointer dereferencing, buffer overflows, and use-after-free, during compile time, thereby preventing them from happening at runtime.


1

There are some exceptions in Rust. For example, most primitive types implement the Copy trait.

Borrowing

Given the error in the previous section about borrowing a value after it has been moved, let's now focus on how to borrow a value.

References

Recall from the section on the string slice str that we said it's usually seen in it's borrowed form &str. The & operator1 in the prefix position represents a borrow. In find_matching_lines, pattern is borrowed.

fn find_matching_lines(lines: Vec<&str>, pattern: &str) -> Vec<usize>

Borrowing lines

In find_matching_lines, we can borrow lines by prefixing the parameter's type with an &, changing it to &Vec<&str>, and by prefixing the variable lines in main to &lines. After making these changes and re-running the program, we can see that it now works.

use std::iter::FromIterator; // this line addresses a rust playground bug

fn find_matching_lines(lines: &Vec<&str>, pattern: &str) -> Vec<usize> {
    lines
        .iter()
        .enumerate()
        .filter_map(|(i, line)| match line.contains(pattern) {
            true => Some(i),
            false => None,
        })
        .collect() // turns anything iterable into a collection
}

fn main() {
    let poem = "I have a little shadow that goes in and out with me,
                And what can be the use of him is more than I can see.
                He is very, very like me from the heels up to the head;
                And I see him jump before me, when I jump into my bed.

                The funniest thing about him is the way he likes to grow -
                Not at all like proper children, which is always very slow;
                For he sometimes shoots up taller like an india-rubber ball,
                And he sometimes gets so little that there’s none of him at all.";

    // command line arguments
    let pattern = "all";
    let before_context = 1;
    let after_context = 1;

    // convert the poem into lines
    let lines = Vec::from_iter(poem.lines());

    // store the 0-based line number for any matched line
    let match_lines = find_matching_lines(&lines, pattern);

    // create intervals of the form [a,b] with the before/after context
    let mut intervals: Vec<_> = match_lines
        .iter()
        .map(|line| {
            (
                line.saturating_sub(before_context),
                line.saturating_add(after_context),
            )
        })
        .collect();

    // merge overlapping intervals
    intervals.dedup_by(|next, prev| {
        if prev.1 < next.0 {
            false
        } else {
            prev.1 = next.1;
            true
        }
    });

    // print the lines
    for (start, end) in intervals {
        for (line_no, line) in
            lines.iter().enumerate().take(end + 1).skip(start)
        {
            println!("{}: {}", line_no + 1, line)
        }
    }
}

Continuing to Refactor

Let's create another function to handle the creation of our intervals. Here is the function signature:

fn create_intervals(
    lines: Vec<usize>,
    before_context: usize,
    after_context: usize,
) -> Vec<(usize, usize)>

This is the code that we'll transfer into the new function:

// create intervals of the form [a,b] with the before/after context
let mut intervals: Vec<_> = match_lines
    .iter()
    .map(|line| {
        (
            line.saturating_sub(before_context),
            line.saturating_add(after_context),
        )
    })
    .collect();

Here is the revised code with the changes implemented. Review it and run the code.

use std::iter::FromIterator; // this line addresses a rust playground bug

fn find_matching_lines(lines: &Vec<&str>, pattern: &str) -> Vec<usize> {
    lines
        .iter()
        .enumerate()
        .filter_map(|(i, line)| match line.contains(pattern) {
            true => Some(i),
            false => None,
        })
        .collect() // turns anything iterable into a collection
}

fn create_intervals(
    lines: Vec<usize>,
    before_context: usize,
    after_context: usize,
) -> Vec<(usize, usize)> {
    lines
        .iter()
        .map(|line| {
            (
                line.saturating_sub(before_context),
                line.saturating_add(after_context),
            )
        })
        .collect()
}

fn main() {
    let poem = "I have a little shadow that goes in and out with me,
                And what can be the use of him is more than I can see.
                He is very, very like me from the heels up to the head;
                And I see him jump before me, when I jump into my bed.

                The funniest thing about him is the way he likes to grow -
                Not at all like proper children, which is always very slow;
                For he sometimes shoots up taller like an india-rubber ball,
                And he sometimes gets so little that there’s none of him at all.";

    // command line arguments
    let pattern = "all";
    let before_context = 1;
    let after_context = 1;

    // convert the poem into lines
    let lines = Vec::from_iter(poem.lines());

    // store the 0-based line number for any matched line
    let match_lines = find_matching_lines(&lines, pattern);

    // create intervals of the form [a,b] with the before/after context
    let mut intervals =
        create_intervals(match_lines, before_context, after_context);

    // merge overlapping intervals
    intervals.dedup_by(|next, prev| {
        if prev.1 < next.0 {
            false
        } else {
            prev.1 = next.1;
            true
        }
    });

    // print the lines
    for (start, end) in intervals {
        for (line_no, line) in
            lines.iter().enumerate().take(end + 1).skip(start)
        {
            println!("{}: {}", line_no + 1, line)
        }
    }
}

Moving vs Borrowing

  • Why is it possible to move match_lines without causing an error?
  • Considering heap allocations, what advantages might there be in moving match_lines instead of borrowing it?

Exercise

Develop a function named merge_intervals and transfer the specified code from main into this function, making any necessary updates. Construct another function called print_results and relocate the specified code from main into this function, updating it as needed.

  • Create a function named merge_intervals and move the specified code from main into this function, making any necessary updates.
    // merge overlapping intervals
    intervals.dedup_by(|next, prev| {
        if prev.1 < next.0 {
            false
        } else {
            prev.1 = next.1;
            true
        }
    });
  • Create another function called print_results and move the specified code from main into this function, updating it as needed.
    // print the lines
    for (start, end) in intervals {
        for (line_no, line) in
            lines.iter().enumerate().take(end + 1).skip(start)
        {
            println!("{}: {}", line_no + 1, line)
        }
    }
  • Modify main to utilize these newly created functions.

You can complete these exercises by updating the most recent version of the code provided above.

Solution
use std::iter::FromIterator; // this line addresses a rust playground bug

fn find_matching_lines(lines: &Vec<&str>, pattern: &str) -> Vec<usize> {
    lines
        .iter()
        .enumerate()
        .filter_map(|(i, line)| match line.contains(pattern) {
            true => Some(i),
            false => None,
        })
        .collect() // turns anything iterable into a collection
}

fn create_intervals(
    lines: Vec<usize>,
    before_context: usize,
    after_context: usize,
) -> Vec<(usize, usize)> {
    lines
        .iter()
        .map(|line| {
            (
                line.saturating_sub(before_context),
                line.saturating_add(after_context),
            )
        })
        .collect()
}

fn merge_intervals(intervals: &mut Vec<(usize, usize)>) {
    // merge overlapping intervals
    intervals.dedup_by(|next, prev| {
        if prev.1 < next.0 {
            false
        } else {
            prev.1 = next.1;
            true
        }
    })
}

fn print_results(intervals: Vec<(usize, usize)>, lines: Vec<&str>) {
    for (start, end) in intervals {
        for (line_no, line) in
            lines.iter().enumerate().take(end + 1).skip(start)
        {
            println!("{}: {}", line_no + 1, line)
        }
    }
}

fn main() {
    let poem = "I have a little shadow that goes in and out with me,
                And what can be the use of him is more than I can see.
                He is very, very like me from the heels up to the head;
                And I see him jump before me, when I jump into my bed.

                The funniest thing about him is the way he likes to grow -
                Not at all like proper children, which is always very slow;
                For he sometimes shoots up taller like an india-rubber ball,
                And he sometimes gets so little that there’s none of him at all.";

    // command line arguments
    let pattern = "all";
    let before_context = 1;
    let after_context = 1;

    // convert the poem into lines
    let lines = Vec::from_iter(poem.lines());

    // store the 0-based line number for any matched line
    let match_lines = find_matching_lines(&lines, pattern);

    // create intervals of the form [a,b] with the before/after context
    let mut intervals =
        create_intervals(match_lines, before_context, after_context);

    // merge overlapping intervals
    merge_intervals(&mut intervals);

    // print the lines
    print_results(intervals, lines);
}

Summary

Although the concepts of ownership and borrowing are relatively straightforward, they can be frustrating when learning Rust. Reading through the official documentation on Understanding Ownership will certainly help overcome this challenge. Keep the following points in mind as you continue on your journey with Rust:

Ownership Rules

  • Each value in Rust has an owner.
  • There can only be one owner at a time.
  • When the owner goes out of scope, the value is dropped.

Borrowing Rules

  • At any given time, you can have either one mutable reference or any number of immutable references.
  • References must always be valid.

Next

With our new understanding of ownership and borrowing, let's switch our focus to error handling.


1

The & operator can have different meanings depending on the context. For example, when used an infix operator, it becomes a bitwise AND.

Error Handling

Many programs use mechanisms like exceptions for handling errors, but Rust takes a different approach. Rust doesn't have exceptions. Instead, it uses the Result type for recoverable errors and the panic! macro for unrecoverable errors. Similar to the Option enum we previously discussed, Result is another enum.

Result<T, E>

Result<T, E> is the type used for returning and propagating errors. Similar to Option, it has two variants: Ok(T), which represents success and contains a value, and Err(E), which represents an error and contains an error value. The T and E are generic type parameters that we'll discuss in more detail soon. For now, it's important to know that T represents the type of the value returned in a success case within the Ok variant, and E represents the type of the error returned in a failure case within the Err variant.

enum Result<T, E> {
   Ok(T),
   Err(E),
}

Additional Resources

Several sections in the Rust documentation thoroughly cover error handling.

Here are some useful links:


1

We'll cover the question mark operator ? when we refactor the code.

File I/O

Our grep program wouldn't be complete without the ability to search text files. Given the potential for I/O errors, adding this capability now is convenient as we explore error handling and the Result type. This also introduces us to additional packages in Rust's standard library.

Up to this point, we've been able to use string literals in our grep program because dynamic memory allocation wasn't needed. However, now that we will be reading from a file, dynamic memory allocation becomes necessary. The string slice is no longer sufficient, so we need to utilize the String 1 type in Rust.

Storing Data on the Heap

Should you find yourself needing to allocate memory directly on the heap, the Box type is commonly used. You can find numerous examples on its usage in the documentation on the Box type.

Let's start by creating a function that reads a file and returns a vector of strings (Vec<String>) where each string represents a line. Here is the function signature:2 3

fn read_file(file: File) -> Vec<String> {
    todo!(); // see the footnote [^3]
}

This is the code that we'll add to the read_file function:

BufReader::new(file).lines().map_while(Result::ok).collect()

The read_file function accepts a file handle and utilizes BufReader to efficiently read the file line by line, storing each line in a vector of strings (Vec<String>), which it then returns to the caller.

Many less efficient methods for reading a file and storing the results in a collection typically involve iterating over each line, converting it to a string, and then pushing the string into a vector. This approach requires intermediate memory allocations, which can become costly for large files. Additionally, each line read from the file potentially involves a system call. The BufReader uses an internal buffer to read large chunks of data from the file, minimizing both memory allocations and system calls.

The modifications to the main function:

fn main() {
    // command line arguments
    let pattern = "all";
    let before_context = 1;
    let after_context = 1;
    let filename = "poem.txt";

    // attempt to open the file
    let lines = match File::open(filename) {
        // convert the poem into lines
        Ok(file) => read_file(file),
        Err(e) => {
            eprintln!("Error opening {filename}: {e}");
            exit(1);
        }
    };

    // store the 0-based line number for any matched line
    let match_lines = find_matching_lines(&lines, pattern);

    // create intervals of the form [a,b] with the before/after context
    let mut intervals =
        create_intervals(match_lines, before_context, after_context);

    // merge overlapping intervals
    merge_intervals(&mut intervals);

    // print the lines
    print_results(intervals, lines);
}

Unpacking the Code

There's a lot going on here, so let's break it down step by step.

read_file

fn read_file(file: File) -> Vec<String> {
    BufReader::new(file).lines().map_while(Result::ok).collect()
}
  1. BufReader: BufReader::new(file) creates a buffered reader from the provided File. This helps in efficiently reading the file line by line.

  2. lines(): The lines() method on BufReader returns an iterator over the lines in the file. Because reading from a file can file, each line is wrapped in a Result, which can be either Ok (containing the line) or Err (containing an error).

  3. map_while(Result::ok): The map_while method is used to transform the iterator. It applies the Result::ok function to each item, which converts Ok(line) to Some(line) and Err(_) to None. The iteration stops when the first None is encountered. Here are the relevant parts from the source code, cleaned up for readability:

    pub enum Result<T, E> {
       Ok(T),
       Err(E),
    }
    
    impl<T, E> Result<T, E> {
        pub fn ok(self) -> Option<T> {
            match self {
                Ok(x) => Some(x),
                Err(_) => None,
            }
        }
    }

    This conversion is necessary because the map method requires the closure to return an Option. Converting Err to None drops the error value and causes map_while to stop yielding.

  4. collect(): The collect() method gathers all the Some(line) values into a Vec<String> that gets returned to the caller.

main

In the main function, we attempt to open a file, which can fail for various reasons. If the Result is Ok, we call read_file with the file value. Since we don't need the file handle afterward, borrowing isn't necessary. If an error occurs while opening the file, we use the eprintln! macro to print the error to standard error and then exit.

Putting it All Together

Here are the changes with the unrelated parts of the program hidden:

use std::fs::File;
use std::io::{BufRead, BufReader};
use std::process::exit;

fn find_matching_lines(lines: &[String], pattern: &str) -> Vec<usize> {
    lines
        .iter()
        .enumerate()
        .filter_map(|(i, line)| match line.contains(pattern) {
            true => Some(i),
            false => None,
        })
        .collect() // turns anything iterable into a collection
}

fn create_intervals(
    lines: Vec<usize>,
    before_context: usize,
    after_context: usize,
) -> Vec<(usize, usize)> {
    lines
        .iter()
        .map(|line| {
            (
                line.saturating_sub(before_context),
                line.saturating_add(after_context),
            )
        })
        .collect()
}

fn merge_intervals(intervals: &mut Vec<(usize, usize)>) {
    // merge overlapping intervals
    intervals.dedup_by(|next, prev| {
        if prev.1 < next.0 {
            false
        } else {
            prev.1 = next.1;
            true
        }
    })
}

fn print_results(intervals: Vec<(usize, usize)>, lines: Vec<String>) {
    for (start, end) in intervals {
        for (line_no, line) in
            lines.iter().enumerate().take(end + 1).skip(start)
        {
            println!("{}: {}", line_no + 1, line)
        }
    }
}

fn read_file(file: File) -> Vec<String> {
    BufReader::new(file).lines().map_while(Result::ok).collect()
}

fn main() {
    // command line arguments
    let pattern = "all";
    let before_context = 1;
    let after_context = 1;
    let filename = "poem.txt";

    // attempt to open the file
    let lines = match File::open(filename) {
        // convert the poem into lines
        Ok(file) => read_file(file),
        Err(e) => {
            eprintln!("Error opening {filename}: {e}");
            exit(1);
        }
    };

    // store the 0-based line number for any matched line
    let match_lines = find_matching_lines(&lines, pattern);

    // create intervals of the form [a,b] with the before/after context
    let mut intervals =
        create_intervals(match_lines, before_context, after_context);

    // merge overlapping intervals
    merge_intervals(&mut intervals);

    // print the lines
    print_results(intervals, lines);
}

Don't forget, you can reveal the hidden parts by clicking Show hidden lines.

Summary

  • Rust requires acknowledging and handling errors before code compilation, ensuring robustness.
  • Errors are categorized into recoverable (e.g., file not found) and unrecoverable (e.g., out-of-bounds access).
  • Rust uses Result<T, E> for recoverable errors and panic! for unrecoverable errors, unlike other languages that use exceptions.

Next

To continue using the Rust Playground, opening an actual file isn't going to work. Let's see how we can leverage an in-memory buffer to represent an open file.


1

Strings are implemented as Vec<u8> in Rust. Reference the API for details.

2

Unfortunately, the Rust Playground doesn't support opening files, so you'll need to run this part of the code on your local machine.

3

Rust offers several useful macros that are handy for developing and prototyping your program. todo!() is one of them, and another is unimplemented!().

4

Unlike many object-oriented programming languages that use this, Rust uses self.

Cursor

Our production grep program now has the capability to access real files. However, the Rust Playground does not support opening files directly. To ensure this course remains functional in your web browser, we need to use an in-memory buffer to simulate a file. This technique of mocking an open file is also commonly used in unit testing, making it a valuable concept to explore.

The Cursor 1 is used with in-memory buffers to provide Read or Write functionality. Without digressing too much, the BufReader we are using works on objects that implement the Read trait. For now, think of a trait as an interface in object-oriented programming languages, with some differences.

Mocking a File with Cursor

We only need to make a few changes to our program to utilize Cursor. Let's start by updating the read_file function to accept any object that implements the Read trait as an argument.

fn read_file(file: impl Read) -> Vec<String>

Next, we'll reintroduce our famous poem and use it as our in-memory buffer to represent our file.

let poem = "I have a little shadow that goes in and out with me,
            And what can be the use of him is more than I can see.
            He is very, very like me from the heels up to the head;
            And I see him jump before me, when I jump into my bed.

            The funniest thing about him is the way he likes to grow -
            Not at all like proper children, which is always very slow;
            For he sometimes shoots up taller like an india-rubber ball,
            And he sometimes gets so little that there’s none of him at all.";

let mock_file = std::io::Cursor::new(poem);

Finally, we comment out the lines that handled opening a file and calling read_file, and instead, we directly call read_file with mock_file.

// attempt to open the file
let lines = read_file(mock_file);
//let lines = match File::open(filename) {
//    // convert the poem into lines
//    Ok(file) => read_file(file),
//    Err(e) => {
//        eprintln!("Error opening {filename}: {e}");
//        exit(1);
//    }
//};

Putting it All Together

With these changes applied to our grep program, we can once again utilize the Rust Playground to extend its functionality and continue learning Rust.

#![allow(unused_imports)]
use std::fs::File;
use std::io::Read;
use std::io::{BufRead, BufReader};
use std::process::exit;

fn find_matching_lines(lines: &[String], pattern: &str) -> Vec<usize> {
    lines
        .iter()
        .enumerate()
        .filter_map(|(i, line)| match line.contains(pattern) {
            true => Some(i),
            false => None,
        })
        .collect() // turns anything iterable into a collection
}

fn create_intervals(
    lines: Vec<usize>,
    before_context: usize,
    after_context: usize,
) -> Vec<(usize, usize)> {
    lines
        .iter()
        .map(|line| {
            (
                line.saturating_sub(before_context),
                line.saturating_add(after_context),
            )
        })
        .collect()
}

fn merge_intervals(intervals: &mut Vec<(usize, usize)>) {
    // merge overlapping intervals
    intervals.dedup_by(|next, prev| {
        if prev.1 < next.0 {
            false
        } else {
            prev.1 = next.1;
            true
        }
    })
}

fn print_results(intervals: Vec<(usize, usize)>, lines: Vec<String>) {
    for (start, end) in intervals {
        for (line_no, line) in
            lines.iter().enumerate().take(end + 1).skip(start)
        {
            println!("{}: {}", line_no + 1, line)
        }
    }
}

fn read_file(file: impl Read) -> Vec<String> {
    BufReader::new(file).lines().map_while(Result::ok).collect()
}

fn main() {
    let poem = "I have a little shadow that goes in and out with me,
            And what can be the use of him is more than I can see.
            He is very, very like me from the heels up to the head;
            And I see him jump before me, when I jump into my bed.

            The funniest thing about him is the way he likes to grow -
            Not at all like proper children, which is always very slow;
            For he sometimes shoots up taller like an india-rubber ball,
            And he sometimes gets so little that there’s none of him at all.";

    let mock_file = std::io::Cursor::new(poem);

    // command line arguments
    let pattern = "all";
    let before_context = 1;
    let after_context = 1;

    // attempt to open the file
    let lines = read_file(mock_file);
    //let lines = match File::open(filename) {
    //    // convert the poem into lines
    //    Ok(file) => read_file(file),
    //    Err(e) => {
    //        eprintln!("Error opening {filename}: {e}");
    //        exit(1);
    //    }
    //};

    // store the 0-based line number for any matched line
    let match_lines = find_matching_lines(&lines, pattern);

    // create intervals of the form [a,b] with the before/after context
    let mut intervals =
        create_intervals(match_lines, before_context, after_context);

    // merge overlapping intervals
    merge_intervals(&mut intervals);

    // print the lines
    print_results(intervals, lines);
}

What? You don't believe me! Give it a whirl and see for yourself! 😄

Since we commented out the file handling code, some previously necessary imports are now unused. The #![allow(unused_imports)] attribute in Rust instructs the compiler to permit these unused imports without issuing warnings. We'll delve deeper into attributes when we discuss custom types and implement command line argument support.

Next

We're ready to add support for regular expressions for pattern matching. We'll take a brief detour to learn about project management in Rust, which will allow us to use packages (also known as crates) to add this functionality.


1

A Cursor wraps an in-memory buffer and provides it with a Seek implementation.

Project Management

As your programs grow in size, it becomes crucial to organize your code effectively. By grouping related functionalities and separating distinct features, you make it easier to locate the code responsible for a specific feature and to modify how that feature operates. Let's look at how a Rust project can be organized.

Paraphrasing the Rust documentation, Rust offers several features to help you manage your code’s organization, including which details are exposed, which are private, and what names are in each scope of your programs. These features, often collectively referred to as the module system, include:

  • Packages: A Cargo feature that lets you build, test, and share crates.
  • Crates: A tree of modules that produces a library or executable.
  • Modules and use: Allow you to control the organization, scope, and privacy of paths.
  • Paths: A way of naming an item, such as a struct, function, or module.

All the details about managing projects can be found in the Managing Growing Projects with Packages, Crates, and Modules section of The Rust Programming Language documentation.

Crates

Crates are the smallest units of code that the Rust compiler processes at a time. For instance, the grep program we're developing is recognized as a crate by the Rust compiler.

Types of Crates

Crates are available in two types: binary and library. The compiler identifies the type of crate being compiled based on the presence of either a main.rs or lib.rs file.

Binary Crates

Binary crates contain a main.rs file in the src directory, feature a main function, and are compiled into an executable.

Library Crates

Library crates contain a lib.rs file in the src directory, lack a main function, and do not compile into an executable. Instead, they are designed to be shared with other projects.

Packages

A package is a collection of one or more crates that provide a set of functionalities. It includes a Cargo.toml file that outlines how to build those crates. A package can contain multiple binary crates, but only one library crate at most. However, a package must contain at least one crate, whether it's a library or binary crate.

Refer to the section on Packages and Crates in the Rust documentation.

Modules

Modules help us organize code within a crate for better readability and easy reuse. They also allow us to control the privacy of items, as code within a module is private by default. Private items are internal implementation details not accessible from outside. We can choose to make modules and their items public, exposing them for external code to use and depend on.

Refer to the section on Defining Modules to Control Scope and Privacy in the Rust documentation.

Summary

In this section, we were introduced to:

  • Crates: A tree of modules that produces a library or executable
  • Packages: A Cargo feature that lets you build, test, and share crates
  • Modules: Let you control the organization of your project

Next

With a basic understanding of crates, packages, and modules, let's utilize one for regular expression support.

Dependencies

crates.io is the central package registry for the Rust community, serving as a hub to discover and download packages. cargo supports adding and managing dependencies for you and is configured to use the package registry by default.

We're going to explore a few crates in this course:

  • regex: An implementation of regular expressions for Rust
  • itertools: Extra iterator adaptors, iterator methods, free functions, and macros
  • clap: A simple to use, efficient, and full-featured Command Line Argument Parser
  • crossbeam: A set of tools for concurrent programming

Cargo.toml

Cargo.toml is one of the boilerplate files generated by cargo new. This is where we manage dependencies and other configurations. The starter file is quite simple.

[package]
name = "grep"
version = "0.1.0"
edition = "2021"

[dependencies]

Adding a Dependency

Adding dependencies is as simple as specifying the name of the crate and the desired version1 in the [dependencies] section of the Cargo.toml file. Here's how we add the regex crate.

regex = "1.11.1"

We can also use cargo add:

$ cargo add regex
    Updating crates.io index
      Adding regex v1.11.1 to dependencies
             Features:
             + perf
             + perf-backtrack
             + perf-cache
             + perf-dfa
             + perf-inline
             + perf-literal
             + perf-onepass
             + std
             + unicode
             + unicode-age
             + unicode-bool
             + unicode-case
             + unicode-gencat
             + unicode-perl
             + unicode-script
             + unicode-segment
             - logging
             - pattern
             - perf-dfa-full
             - unstable
             - use_std
    Updating crates.io index
     Locking 5 packages to latest compatible versions
      Adding aho-corasick v1.1.3
      Adding memchr v2.7.4
      Adding regex v1.11.1
      Adding regex-automata v0.4.9
      Adding regex-syntax v0.8.5

Either method produces the same result.

[package]
name = "grep"
version = "0.1.0"
edition = "2021"

[dependencies]
regex = "1.11.1"

Exercise

These exercises must be completed locally.

  • Add the clap crate and include the derive2 feature. More information to accomplish this can be found here and here.
Solution
clap = { version = "4.5.21", features = ["derive"] }
regex = "1.11.1"
1

Semantic version numbers (i.e. SemVer) are supported. Refer to the documentation on specifying dependencies for more advanced version control.

2

Features of dependencies can be enabled within the dependency declaration. The features key indicates which features to enable. The Cargo Book covers this under the Dependency features section.

use-ing Dependencies

Now that we've added the regex crate as a dependency, we are ready to use it (pun intended). You can refer to the crate by the full path name regex::Regex (crate::module) or, more commonly, by bringing the module into scope with the use1 keyword.

We've been doing this in our grep program:

use std::fs::File;
use std::io::Read;
use std::io::{BufRead, BufReader};
use std::process::exit;

By bringing the module into scope with use regex::Regex;, we can reference it with Regex.

Now that we know how to add dependencies to our project and use them, it's time to add support for regular expressions!


1

The use keyword can do more than just bring things into scope, so be sure to check the documentation on use declarations for more details on use.

Regular Expressions

With the Regex crate added to our project, we'll replace the pattern string slice we've been using with a regular expression.

Using Regex

The Regex modules defines a new method that takes a regular expression, attempts to compile it, and returns a Regex object.1

let pattern = "[Ee]xample";
let re = Regex::new(pattern);

Since compiling a regular expression can fail (e.g., due to an invalid pattern), new returns a Result. Here is the function signature for new2:

fn new(re: &str) -> Result<Regex, Error>

The function signature indicates that the Ok variant returns a Regex, while the Err variant returns an Error. Since our grep program can't continue with an invalid regular expression, we need to catch that case, display a helpful error message, and exit the program.

Let's put all this together:

extern crate regex; // this is needed for the playground
use regex::Regex;
use std::process::exit;


fn main() {
    let pattern = "(missing the closing parenthesis"; // invalid expression

    // compile the regular expression
    match Regex::new(pattern) {
        // the underscore (_) means we are ignoring the value returned by new
        Ok(_) => println!("{pattern} is a valid regular expression!"),

        // e is the error value returned by new
        Err(e) => {
            eprintln!("{e}"); // eprintln! writes to standard error
            exit(1);          // exit with error code 1
        }
    };
}

Run the code to see the error. Then, correct the it by adding the missing parenthesis ) and re-run the code.

Updating Grep

We now have enough context to modify our Grep program to include regular expression support. Below are the changes, with the unrelated parts of the program hidden:

#![allow(unused_imports)]
use std::fs::File;
use std::io::Read;
use std::io::{BufRead, BufReader};
use std::process::exit;
extern crate regex; // this is needed for the playground
use regex::Regex;

fn find_matching_lines(lines: &[String], regex: Regex) -> Vec<usize> {
    lines
        .iter()
        .enumerate()
        .filter_map(|(i, line)| match regex.is_match(line) {
            true => Some(i),
            false => None,
        })
        .collect() // turns anything iterable into a collection
}

fn create_intervals(
    lines: Vec<usize>,
    before_context: usize,
    after_context: usize,
) -> Vec<(usize, usize)> {
    lines
        .iter()
        .map(|line| {
            (
                line.saturating_sub(before_context),
                line.saturating_add(after_context),
            )
        })
        .collect()
}

fn merge_intervals(intervals: &mut Vec<(usize, usize)>) {
    // merge overlapping intervals
    intervals.dedup_by(|next, prev| {
        if prev.1 < next.0 {
            false
        } else {
            prev.1 = next.1;
            true
        }
    })
}

fn print_results(intervals: Vec<(usize, usize)>, lines: Vec<String>) {
    for (start, end) in intervals {
        for (line_no, line) in
            lines.iter().enumerate().take(end + 1).skip(start)
        {
            println!("{}: {}", line_no + 1, line)
        }
    }
}

fn read_file(file: impl Read) -> Vec<String> {
    BufReader::new(file).lines().map_while(Result::ok).collect()
}

fn main() {
    let poem = "I have a little shadow that goes in and out with me,
                And what can be the use of him is more than I can see.
                He is very, very like me from the heels up to the head;
                And I see him jump before me, when I jump into my bed.

                The funniest thing about him is the way he likes to grow -
                Not at all like proper children, which is always very slow;
                For he sometimes shoots up taller like an india-rubber ball,
                And he sometimes gets so little that there’s none of him at all.";

    let mock_file = std::io::Cursor::new(poem);

    // command line arguments
    let pattern = "(all)|(little)";
    let before_context = 1;
    let after_context = 1;

    // attempt to open the file
    let lines = read_file(mock_file);
    //let lines = match File::open(filename) {
    //    // convert the poem into lines
    //    Ok(file) => read_file(file),
    //    Err(e) => {
    //        eprintln!("Error opening {filename}: {e}");
    //        exit(1);
    //    }
    //};

    // compile the regular expression
    let regex = match Regex::new(pattern) {
        Ok(re) => re, // bind re to regex
        Err(e) => {
            eprintln!("{e}"); // write to standard error
            exit(1);
        }
    };

    // store the 0-based line number for any matched line
    let match_lines = find_matching_lines(&lines, regex);

    // create intervals of the form [a,b] with the before/after context
    let mut intervals =
        create_intervals(match_lines, before_context, after_context);

    // merge overlapping intervals
    merge_intervals(&mut intervals);

    // print the lines
    print_results(intervals, lines);
}

Don't forget, you can reveal the hidden parts by clicking Show hidden lines.

The let regex = match Regex::new(pattern) variable binding expression might seem a bit unusual. The pattern is discussed in the Rust documentation section on Recoverable Errors with Result. To briefly explain: When the result is Ok, this code extracts the inner re value from the Ok variant and moves it to the variable regex.

Next

Onward to creating our own module!


1

The Regex crate includes excellent documentation and detailed examples to learn from.

2

The source code for new can be found here.

Custom Types

Rust provides two options for creating custom types: enum and struct. The key difference between them is that enums have a fixed set of variants, meaning they are used to indicate that a value is one of a specific set of possible values. On the other hand, struct is not limited to a set of possible values. Structures allow for packaging related values into meaningful groups.

We've already encountered enums when we looked at Option and Result.

In our current grep program, we use tuples to represent intervals. In this section, we’ll replace tuples with a custom Interval type using a struct, and enhance it by adding methods for various operations. Finally, we’ll define an enum to represent potential error values when creating and interacting with intervals.

Struct

We're going to create a type to represent an interval. Rust follows a standardized naming convention for types, where structures use UpperCamelCase1 for "type-level" constructs. Therefore, we'll name our struct Interval.

Defining our Struct

Our closed interval is going to represent the start and end of the lines to print:

struct Interval {
    start: usize,
    end: usize,
}

Defining Behavior

Functions and methods are added by defining them inside an impl block:

impl Interval {
    // Methods definitions
}

The new Function

As it is common convention in Rust to define a function called new for creating an object, we'll begin by defining one to return an Interval.

impl Interval {
    fn new(start: usize, end: usize) -> Self {
        Self { start, end }
    }
}

The keyword Self

The Self keywords in the return type and the body of the function are aliases for the type specified after the impl keyword, which in this case is Interval. While the type can be explicitly stated, the common convention is to use Self.

Exclusion of the return keyword and (;)

The return keyword is unnecessary when the returned value is the final expression in a function. In this scenario, the semicolon (;) is omitted. 2

Methods

Recall from our grep program that we merged overlapping intervals to prevent printing the same line multiple times. It would be useful to create a method to check if two intervals overlap and another method to merge overlapping intervals. Let's outline these methods!

fn overlaps(&self, other: &Interval) -> bool {
    todo!();
}

fn merge(&self, other: &Self) -> Self {
    todo!();
}

The todo!() and unimplemented!() macros can be useful if you are prototyping and just want a placeholder to let your code pass type analysis.

The self Method Receiver

You're probably accustomed to using the implicit this pointer (a hidden first parameter) in your class methods. In Rust, the self keyword is used to represent the receiver of a method and the convention is to omit the type for this parameter. Depending on the intended behavior, it can be specified as self, &self or &mut self.3

Omitting the type after self is syntactic sugar. It's short for self: Self. As Self is an alias to the actual type, the full expansion would be self: Interval, self: &Interval or self: &mut Interval.

Implementing the Methods

Remember that our grep program processes lines sequentially. This allows us to optimize the detection of overlapping intervals. However, this approach limits the versatility of our Interval type. As an exercise, you can work on making it more generic.

overlaps

The overlaps method is fairly straightforward. We check if the end of the first interval is greater than or equal to the start of the next interval. The only caveat is the order of the comparison.

fn overlaps(&self, other: &Interval) -> bool {
    self.end >= other.start
}

merge

The merge method returns a new Interval using the start of the first interval and the end of the second. The same caveat applies: the receiver must be the interval that comes first in the sequence.

In both cases, an immutable borrow for both intervals is sufficient since we do not need to mutate either value.

fn merge(&self, other: &Self) -> Self {
    Interval::new(self.start, other.end)
}

Implementing merge_intervals

Rust programmers coming from a C++ background might be inclined to implement this function using some variation of the following algorithm:

fn merge_intervals(intervals: Vec<Interval>) -> Vec<Interval> {
    let mut merged_intervals: Vec<Interval> = Vec::new();

    let mut iter = intervals.into_iter();
    match iter.next() {
        Some(interval) => merged_intervals.push(interval),
        None => return merged_intervals,
    };

    for interval in iter {
        if let Some(previous_interval) = merged_intervals.last() {
            if previous_interval.overlaps(&interval) {
                let new_interval = previous_interval.merge(&interval);
                merged_intervals.pop();
                merged_intervals.push(new_interval);
            } else {
                merged_intervals.push(interval);
            }
        }
    }

    merged_intervals
}

While functionally correct, Rust features powerful crates that can make implementing this behavior more concise. One such crate is Itertools, which provides extra iterator adaptors. To use this crate, specify it as a dependency in Crates.toml and include it in main.rs with use itertools::Itertools. Let's see how the coalesce adaptor can simplify the code:

use itertools::Itertools;

fn merge_intervals(intervals: Vec<Interval>) -> Vec<Interval> {
    intervals
        .into_iter()
        .coalesce(|p, c| {
            if p.overlaps(&c) {
                Ok(p.merge(&c))
            } else {
                Err((p, c))
            }
        })
        .collect()
}

into_iter

In both implementations, we use into_iter, which creates a consuming iterator that moves each value out of the vector from start to end. This is another example of how we can take advantage of move semantics.

NOTE: Because the iterator consumes the data, the vector cannot be used afterward.

Updating Grep

It's time to update our Grep program to utilize our new type. Additionally, a few minor changes have been made to enhance the design, demonstrate some additional language features, and leverage move semantics. Give the program a run!

#![allow(unused_imports)]
extern crate regex; // this is needed for the playground
use itertools::Itertools;
use regex::Regex;
use std::fs::File;
use std::io::Read;
use std::io::{BufRead, BufReader};
use std::process::exit;

fn find_matching_lines(lines: &[String], regex: Regex) -> Vec<usize> {
    lines
        .iter()
        .enumerate()
        .filter_map(|(i, line)| match regex.is_match(line) {
            true => Some(i),
            false => None,
        })
        .collect() // turns anything iterable into a collection
}

fn create_intervals(
    lines: Vec<usize>,
    before_context: usize,
    after_context: usize,
) -> Vec<Interval> {
    lines
        .iter()
        .map(|line| {
            let start = line.saturating_sub(before_context);
            let end = line.saturating_add(after_context);
            Interval::new(start, end)
        })
        .collect()
}

fn merge_intervals(intervals: Vec<Interval>) -> Vec<Interval> {
    // merge overlapping intervals
    intervals
        .into_iter()
        .coalesce(|p, c| {
            if p.overlaps(&c) {
                Ok(p.merge(&c))
            } else {
                Err((p, c))
            }
        })
        .collect()
}

fn print_results(intervals: Vec<Interval>, lines: Vec<String>) {
    for interval in intervals {
        for (line_no, line) in lines
            .iter()
            .enumerate()
            .take(interval.end + 1)
            .skip(interval.start)
        {
            println!("{}: {}", line_no + 1, line)
        }
    }
}

fn read_file(file: impl Read) -> Vec<String> {
    BufReader::new(file).lines().map_while(Result::ok).collect()
}

fn main() {
    let poem = "I have a little shadow that goes in and out with me,
                And what can be the use of him is more than I can see.
                He is very, very like me from the heels up to the head;
                And I see him jump before me, when I jump into my bed.

                The funniest thing about him is the way he likes to grow -
                Not at all like proper children, which is always very slow;
                For he sometimes shoots up taller like an india-rubber ball,
                And he sometimes gets so little that there’s none of him at all.";

    let mock_file = std::io::Cursor::new(poem);

    // command line arguments
    let pattern = "(all)|(little)";
    let before_context = 1;
    let after_context = 1;

    // attempt to open the file
    let lines = read_file(mock_file);
    //let lines = match File::open(filename) {
    //    // convert the poem into lines
    //    Ok(file) => read_file(file),
    //    Err(e) => {
    //        eprintln!("Error opening {filename}: {e}");
    //        exit(1);
    //    }
    //};

    // compile the regular expression
    let regex = match Regex::new(pattern) {
        Ok(re) => re, // bind re to regex
        Err(e) => {
            eprintln!("{e}"); // write to standard error
            exit(1);
        }
    };

    // store the 0-based line number for any matched line
    let match_lines = find_matching_lines(&lines, regex);

    // create intervals of the form [a,b] with the before/after context
    let intervals =
        create_intervals(match_lines, before_context, after_context);

    // merge overlapping intervals
    let intervals = merge_intervals(intervals);

    // print the lines
    print_results(intervals, lines);
}

struct Interval {
    start: usize,
    end: usize,
}

impl Interval {
    fn new(start: usize, end: usize) -> Self {
        Self { start, end }
    }

    fn overlaps(&self, other: &Interval) -> bool {
        self.end >= other.start
    }

    fn merge(&self, other: &Self) -> Self {
        Interval::new(self.start, other.end)
    }
}

Summarizing the Changes

Our grep program now makes use of our custom type and includes a few other enhancements.

Let's review the changes:

create_intervals was updated with the following changes:

  • The return type was changed to Vec<Interval>
  • The tuple created from start and end are now used to create an Interval

merge_intervals was updated with the following changes:

  • The argument intervals now has a type of Vec<Interval> and is moved instead of the mutable borrow
  • dedup_by was replaced with coalesce

print_results was updated with the following changes:

  • The argument intervals is now a Vec<Interval>
  • The take and skip iterator adaptors were updated to use the fields from the Interval

Each interval is dropped at the end of the loop iteration when written as for interval in intervals. If the loop were written as for interval in &intervals, we would borrow each value. The same applies if we had written the loop as for interval in intervals.iter(). The former is syntactic sugar for the latter.4


1

Casing conforms to RFC 430 (C-CASE).

2

Exclusion of return is discussed here.

3

Method syntax is covered in full detail here.

4

Details about consuming collections with for in and into_iter can be found here.

Enum

An enumeration, commonly known as an enum, is declared using the keyword enum. Compared to other languges, enums in Rust offer greater flexibility and power. For instance, they support generics (as seen with the Option enum) and can encapsulate values within their variants! 1

Using an enum for Errors

For our grep program, we’ll create an enum to represent possible errors during Interval operations. For example, when a user creates an Interval, the starting value must be less than or equal to the ending value. Additionally, if a user wants to merge two intervals, they must overlap. If these conditions aren't met, we’ll return an error (Err) using one of our enum variants.

Defining an enum

Below is the IntervalError enum, which lists the errors that we may need to return:

enum IntervalError {
    StartEndRangeInvalid,
    NonOverlappingInterval,
}

Updating our Interval

First, we'll change the return type for the function new and the method merge to a Result with the Ok variant being an Interval and the Err variant the appropriate IntervalError:

fn new(start: usize, end: usize) -> Result<Self, IntervalError> {
    if start <= end {
        Ok(Self { start, end })
    } else {
        Err(IntervalError::StartEndRangeInvalid)
    }
}
fn merge(&self, other: &Self) -> Result<Self, IntervalError> {
    if self.overlaps(other) {
        Ok(Self {
            start: self.start,
            end: other.end,
        })
    } else {
        Err(IntervalError::NonOverlappingInterval)
    }
}

Observe how an Interval is created in new as opposed to merge. Since the parameter names in new precisely match the fields in the Interval struct definition, you can omit the field specifiers. This technique is referred to as field init shorthand syntax. 2

Updating Grep

With the Interval changes implemented, we need to update the create_intervals and merge_intervals functions to accommodate the Result return type.

fn create_intervals(
    lines: Vec<usize>,
    before_context: usize,
    after_context: usize,
) -> Result<Vec<Interval>, IntervalError> {
    lines
        .iter()
        .map(|line| {
            let start = line.saturating_sub(before_context);
            let end = line.saturating_add(after_context);
            Interval::new(start, end)
        })
        .collect()
}

In create_intervals, the only change was the return type, which changed from Vec<Interval> to Result<Vec<Interval>, IntervalError>. You might wonder why this works. The concise answer is that the Result type implements the FromIterator trait.

Imagine an intermediary collection of Result values created from calls to Interval::new(start, end). The FromIterator trait implementation allows an iterator over Result values to be collected into a Result containing a collection of the underlying values or an Err. 3 In other words, the value contained in the Result returned by Interval::new(start, end) is stored in the Vec<Interval> collection which is then wrapped in a Result.

We'll be exploring traits in the next section.

fn merge_intervals(intervals: Vec<Interval>) -> Vec<Interval> {
    // merge overlapping intervals
    intervals
        .into_iter()
        .coalesce(|p, c| p.merge(&c).or(Err((p, c))))
        .collect()
}

In merge_intervals, the only change required is to the closure used in coalesce. We attempt to merge two intervals and invoke the or method of Result. If the merge is successful, returning Ok, the value is passed back to coalesce. Otherwise, the Err((p, c)) value provided to or is returned.

Result methods like map_err, or, and or_else are often used in error handling because they allow benign errors to be managed while letting successful results pass through. Since the merge method only merges overlapping intervals, we replace the Err variant it returns with the tuple (p, c) needed by coalesce.

Eager vs Lazy Evaluation

Depending on the use case, different Result methods may be more efficient. It's important to read the documentation to determine the best choice. For example, arguments passed to or are eagerly evaluated. If you're passing the result of a function call, it's better to use or_else, which is lazily evaluated.

The final change required is in main. Since create_intervals now returns a Result, we use a match expression to check if the operation was successful. In the case of an Err, since it’s unrecoverable, we print an error message and exit.

// create intervals of the form [a,b] with the before/after context
let intervals =
    match create_intervals(match_lines, before_context, after_context) {
        Ok(intervals) => intervals,
        Err(_) => {
            eprintln!("An error occurred while creating intervals");
            exit(1);
        }
    };

Updating Grep

With our changes in place, our Interval now supports error handling via Result and our grep program properly handles any errors.

#![allow(unused_imports)]
extern crate regex; // this is needed for the playground
use itertools::Itertools;
use regex::Regex;
use std::fs::File;
use std::io::Read;
use std::io::{BufRead, BufReader};
use std::process::exit;

fn find_matching_lines(lines: &[String], regex: Regex) -> Vec<usize> {
    lines
        .iter()
        .enumerate()
        .filter_map(|(i, line)| match regex.is_match(line) {
            true => Some(i),
            false => None,
        })
        .collect() // turns anything iterable into a collection
}

fn create_intervals(
    lines: Vec<usize>,
    before_context: usize,
    after_context: usize,
) -> Result<Vec<Interval>, IntervalError> {
    lines
        .iter()
        .map(|line| {
            let start = line.saturating_sub(before_context);
            let end = line.saturating_add(after_context);
            Interval::new(start, end)
        })
        .collect()
}

fn merge_intervals(intervals: Vec<Interval>) -> Vec<Interval> {
    // merge overlapping intervals
    intervals
        .into_iter()
        .coalesce(|p, c| p.merge(&c).or(Err((p, c))))
        .collect()
}

fn print_results(intervals: Vec<Interval>, lines: Vec<String>) {
    for interval in intervals {
        for (line_no, line) in lines
            .iter()
            .enumerate()
            .take(interval.end + 1)
            .skip(interval.start)
        {
            println!("{}: {}", line_no + 1, line)
        }
    }
}

fn read_file(file: impl Read) -> Vec<String> {
    BufReader::new(file).lines().map_while(Result::ok).collect()
}

fn main() {
    let poem = "I have a little shadow that goes in and out with me,
                And what can be the use of him is more than I can see.
                He is very, very like me from the heels up to the head;
                And I see him jump before me, when I jump into my bed.

                The funniest thing about him is the way he likes to grow -
                Not at all like proper children, which is always very slow;
                For he sometimes shoots up taller like an india-rubber ball,
                And he sometimes gets so little that there’s none of him at all.";

    let mock_file = std::io::Cursor::new(poem);

    // command line arguments
    let pattern = "(all)|(little)";
    let before_context = 1;
    let after_context = 1;

    // attempt to open the file
    let lines = read_file(mock_file);
    //let lines = match File::open(filename) {
    //    // convert the poem into lines
    //    Ok(file) => read_file(file),
    //    Err(e) => {
    //        eprintln!("Error opening {filename}: {e}");
    //        exit(1);
    //    }
    //};

    // compile the regular expression
    let regex = match Regex::new(pattern) {
        Ok(re) => re, // bind re to regex
        Err(e) => {
            eprintln!("{e}"); // write to standard error
            exit(1);
        }
    };

    // store the 0-based line number for any matched line
    let match_lines = find_matching_lines(&lines, regex);

    // create intervals of the form [a,b] with the before/after context
    let intervals =
        match create_intervals(match_lines, before_context, after_context) {
            Ok(intervals) => intervals,
            Err(_) => {
                eprintln!("An error occurred while creating intervals");
                exit(1);
            }
        };

    // merge overlapping intervals
    let intervals = merge_intervals(intervals);

    // print the lines
    print_results(intervals, lines);
}

enum IntervalError {
    StartEndRangeInvalid,
    NonOverlappingInterval,
}

struct Interval {
    start: usize,
    end: usize,
}

impl Interval {
    fn new(start: usize, end: usize) -> Result<Self, IntervalError> {
        if start <= end {
            Ok(Self { start, end })
        } else {
            Err(IntervalError::StartEndRangeInvalid)
        }
    }

    fn overlaps(&self, other: &Interval) -> bool {
        self.end >= other.start
    }

    fn merge(&self, other: &Self) -> Result<Self, IntervalError> {
        if self.overlaps(other) {
            Ok(Self {
                start: self.start,
                end: other.end,
            })
        } else {
            Err(IntervalError::NonOverlappingInterval)
        }
    }
}

Summary

The decision to use the Result type for error handling in Rust provides a robust and flexible way of managing errors such as:

  1. Explicit Error Handling: Using Result makes error handling explicit. Functions that can fail return a Result, which forces the caller to handle the potential error, making the code more reliable and less prone to unexpected failures.
  2. Recoverable Errors: The Result type allows for recoverable errors. By returning a Result, the function gives the caller the option to handle the error in a way that makes sense for their specific context, rather than immediately terminating the program.
  3. Type Safety: Rust’s type system ensures that errors are handled correctly. The Result type is part of this system, helping to prevent common errors like null pointer dereferencing and making the code safer and more predictable.
  4. Composability: The Result type implements traits like FromIterator, which allows for powerful and flexible error handling patterns. This makes it easier to work with collections of results and to propagate errors through multiple layers of function calls.

Overall, the use of Result aligns with Rust’s goals of safety, concurrency, and performance, providing a clear and structured way to handle errors.

Exercise

  • Replace Result::or with Result::map_err in merge_intervals.

    Solution
    fn merge_intervals(intervals: Vec<Interval>) -> Vec<Interval> {
        // merge overlapping intervals
        intervals
            .into_iter()
            .coalesce(|p, c| p.merge(&c).map_err(|_| (p, c)))
            .collect()
    }
#![allow(unused_imports)]
extern crate regex; // this is needed for the playground
use itertools::Itertools;
use regex::Regex;
use std::fs::File;
use std::io::Read;
use std::io::{BufRead, BufReader};
use std::process::exit;

fn find_matching_lines(lines: &[String], regex: Regex) -> Vec<usize> {
    lines
        .iter()
        .enumerate()
        .filter_map(|(i, line)| match regex.is_match(line) {
            true => Some(i),
            false => None,
        })
        .collect() // turns anything iterable into a collection
}

fn create_intervals(
    lines: Vec<usize>,
    before_context: usize,
    after_context: usize,
) -> Result<Vec<Interval>, IntervalError> {
    lines
        .iter()
        .map(|line| {
            let start = line.saturating_sub(before_context);
            let end = line.saturating_add(after_context);
            Interval::new(start, end)
        })
        .collect()
}

fn merge_intervals(intervals: Vec<Interval>) -> Vec<Interval> {
    // merge overlapping intervals
    intervals
        .into_iter()
        .coalesce(|p, c| p.merge(&c).or(Err((p, c))))
        .collect()
}

fn print_results(intervals: Vec<Interval>, lines: Vec<String>) {
    for interval in intervals {
        for (line_no, line) in lines
            .iter()
            .enumerate()
            .take(interval.end + 1)
            .skip(interval.start)
        {
            println!("{}: {}", line_no + 1, line)
        }
    }
}

fn read_file(file: impl Read) -> Vec<String> {
    BufReader::new(file).lines().map_while(Result::ok).collect()
}

fn main() {
    let poem = "I have a little shadow that goes in and out with me,
                And what can be the use of him is more than I can see.
                He is very, very like me from the heels up to the head;
                And I see him jump before me, when I jump into my bed.

                The funniest thing about him is the way he likes to grow -
                Not at all like proper children, which is always very slow;
                For he sometimes shoots up taller like an india-rubber ball,
                And he sometimes gets so little that there’s none of him at all.";

    let mock_file = std::io::Cursor::new(poem);

    // command line arguments
    let pattern = "(all)|(little)";
    let before_context = 1;
    let after_context = 1;

    // attempt to open the file
    let lines = read_file(mock_file);
    //let lines = match File::open(filename) {
    //    // convert the poem into lines
    //    Ok(file) => read_file(file),
    //    Err(e) => {
    //        eprintln!("Error opening {filename}: {e}");
    //        exit(1);
    //    }
    //};

    // compile the regular expression
    let regex = match Regex::new(pattern) {
        Ok(re) => re, // bind re to regex
        Err(e) => {
            eprintln!("{e}"); // write to standard error
            exit(1);
        }
    };

    // store the 0-based line number for any matched line
    let match_lines = find_matching_lines(&lines, regex);

    // create intervals of the form [a,b] with the before/after context
    let intervals =
        match create_intervals(match_lines, before_context, after_context) {
            Ok(intervals) => intervals,
            Err(_) => {
                eprintln!("An error occurred while creating intervals");
                exit(1);
            }
        };

    // merge overlapping intervals
    let intervals = merge_intervals(intervals);

    // print the lines
    print_results(intervals, lines);
}

enum IntervalError {
    StartEndRangeInvalid,
    NonOverlappingInterval,
}

struct Interval {
    start: usize,
    end: usize,
}

impl Interval {
    fn new(start: usize, end: usize) -> Result<Self, IntervalError> {
        if start <= end {
            Ok(Self { start, end })
        } else {
            Err(IntervalError::StartEndRangeInvalid)
        }
    }

    fn overlaps(&self, other: &Interval) -> bool {
        self.end >= other.start
    }

    fn merge(&self, other: &Self) -> Result<Self, IntervalError> {
        if self.overlaps(other) {
            Ok(Self {
                start: self.start,
                end: other.end,
            })
        } else {
            Err(IntervalError::NonOverlappingInterval)
        }
    }
}

Next

With our Interval complete, let’s make it a module!


1

Refer to the documentation on enum values.

2

Refer to the documentation on field init shorthand syntax.

3

Refer to the documentation on Collecting into a Result for detailed explanation.

Creating Modules

In the previous section, we defined an enum and struct. However, we didn't actually create a module! In this section, we will. Before diving into the details, the Rust documentation provides a Module Cheat Sheet which will be a handy reference on your Rust journey.

Since we're building our module to work in this online book, we'll deviate slightly from standard practice. Typically, modules are created in separate files. For instance, our Interval would be placed in a new file called interval.rs.

grep
├─ Cargo.lock
├─ Cargo.toml
└─ src
   ├─ interval.rs
   └─ main.rs

Document generation

As we develop our interval module, it's a great opportunity to learn about rustdoc and documentation generation. The interval module will be annotated to enable documentation generation. If you're building the code locally, the simplest way to generate and view documentation is by using cargo doc --open.

Let's get started!

Scope and Privacy

We're going to convert all our Interval related code into a module. To define a module, we use the mod keyword followed by the module's name and enclose the body within curly braces.

mod interval {
    // module body
}

Here's the new version of our grep program with the Interval related parts moved inside the interval module. In addition, comments have been added to the module which will be used to generate documentation in an upcoming section. Go ahead and run the code to see it in action!

#![allow(unused_imports)]
extern crate regex; // this is needed for the playground
use itertools::Itertools;
use regex::Regex;
use std::fs::File;
use std::io::Read;
use std::io::{BufRead, BufReader};
use std::process::exit;

fn find_matching_lines(lines: &[String], regex: Regex) -> Vec<usize> {
    lines
        .iter()
        .enumerate()
        .filter_map(|(i, line)| match regex.is_match(line) {
            true => Some(i),
            false => None,
        })
        .collect() // turns anything iterable into a collection
}

fn create_intervals(
    lines: Vec<usize>,
    before_context: usize,
    after_context: usize,
) -> Result<Vec<Interval>, IntervalError> {
    lines
        .iter()
        .map(|line| {
            let start = line.saturating_sub(before_context);
            let end = line.saturating_add(after_context);
            Interval::new(start, end)
        })
        .collect()
}

fn merge_intervals(intervals: Vec<Interval>) -> Vec<Interval> {
    // merge overlapping intervals
    intervals
        .into_iter()
        .coalesce(|p, c| p.merge(&c).map_err(|_| (p, c)))
        .collect()
}

fn print_results(intervals: Vec<Interval>, lines: Vec<String>) {
    for interval in intervals {
        for (line_no, line) in lines
            .iter()
            .enumerate()
            .take(interval.end + 1)
            .skip(interval.start)
        {
            println!("{}: {}", line_no + 1, line)
        }
    }
}

fn read_file(file: impl Read) -> Vec<String> {
    BufReader::new(file).lines().map_while(Result::ok).collect()
}

fn main() {
    let poem = "I have a little shadow that goes in and out with me,
                And what can be the use of him is more than I can see.
                He is very, very like me from the heels up to the head;
                And I see him jump before me, when I jump into my bed.

                The funniest thing about him is the way he likes to grow -
                Not at all like proper children, which is always very slow;
                For he sometimes shoots up taller like an india-rubber ball,
                And he sometimes gets so little that there’s none of him at all.";

    let mock_file = std::io::Cursor::new(poem);

    // command line arguments
    let pattern = "(all)|(little)";
    let before_context = 1;
    let after_context = 1;

    // attempt to open the file
    let lines = read_file(mock_file);
    //let lines = match File::open(filename) {
    //    // convert the poem into lines
    //    Ok(file) => read_file(file),
    //    Err(e) => {
    //        eprintln!("Error opening {filename}: {e}");
    //        exit(1);
    //    }
    //};

    // compile the regular expression
    let regex = match Regex::new(pattern) {
        Ok(re) => re, // bind re to regex
        Err(e) => {
            eprintln!("{e}"); // write to standard error
            exit(1);
        }
    };

    // store the 0-based line number for any matched line
    let match_lines = find_matching_lines(&lines, regex);

    // create intervals of the form [a,b] with the before/after context
    let intervals =
        match create_intervals(match_lines, before_context, after_context) {
            Ok(intervals) => intervals,
            Err(_) => {
                eprintln!("An error occurred while creating intervals");
                exit(1);
            }
        };

    // merge overlapping intervals
    let intervals = merge_intervals(intervals);

    // print the lines
    print_results(intervals, lines);
}

mod interval {
    /// A list specifying general categories of Interval errors.
    enum IntervalError {
        /// Start is not less than or equal to end
        StartEndRangeInvalid,
        /// Two intervals to be merged do not overlap
        NonOverlappingInterval,
    }

    /// A closed-interval [`start`, `end`] type used for representing a range of
    /// values between `start` and `end` inclusively.
    ///
    /// # Examples
    ///
    /// You can create an `Interval` using `new`.
    ///
    /// ```rust
    /// let interval = Interval::new(1, 10).unwrap();
    /// assert_eq!(interval.start, 1);
    /// assert_eq!(interval.end, 10);
    /// ```
    struct Interval {
        start: usize,
        end: usize,
    }

    impl Interval {
        /// Creates a new `Interval` set to `start` and `end`.
        ///
        /// # Examples
        ///
        /// ```rust
        /// let interval = Interval::new(1, 10).unwrap();
        /// assert_eq!(interval.start, 1);
        /// assert_eq!(interval.end, 10);
        /// ```
        fn new(start: usize, end: usize) -> Result<Self, IntervalError> {
            if start <= end {
                Ok(Self { start, end })
            } else {
                Err(IntervalError::StartEndRangeInvalid)
            }
        }

        /// Checks if two intervals overlap. Overlapping intervals have at least
        /// one point in common.
        ///
        /// # Examples
        ///
        /// ```rust
        /// let a = Interval::new(1, 3).unwrap();
        /// let b = Interval::new(3, 5).unwrap();
        /// assert_eq!(a.overlaps(&b), true);
        /// assert_eq!(b.overlaps(&a), true);
        /// ```
        ///
        /// ```rust
        /// let a = Interval::new(1, 5).unwrap();
        /// let b = Interval::new(2, 4).unwrap();
        /// assert_eq!(a.overlaps(&b), true);
        /// assert_eq!(b.overlaps(&a), true);
        /// ```
        ///
        /// ```rust
        /// let a = Interval::new(1, 3).unwrap();
        /// let b = Interval::new(4, 6).unwrap();
        /// assert_eq!(a.overlaps(&b), false);
        /// assert_eq!(b.overlaps(&a), true);
        /// ```
        fn overlaps(&self, other: &Interval) -> bool {
            self.end >= other.start
        }

        /// Merges two intervals returning a new `Interval`.
        ///
        /// The merged `Interval` range includes the union of ranges from each
        /// `Interval`.
        ///
        /// # Examples
        ///
        /// ```rust
        /// let a = Interval::new(1, 3).unwrap();
        /// let b = Interval::new(3, 5).unwrap();
        /// let c = a.merge(&b).unwrap();
        /// assert_eq!(c.start, 1);
        /// assert_eq!(c.end, 5);
        /// ```
        fn merge(&self, other: &Self) -> Result<Self, IntervalError> {
            if self.overlaps(other) {
                Ok(Self {
                    start: self.start,
                    end: other.end,
                })
            } else {
                Err(IntervalError::NonOverlappingInterval)
            }
        }
    }
}

Uh-oh! Looks like we have all kinds of compiler errors! Among the chaos is two different errors having to do with scope and privacy.

  • [E0412]: cannot find type Interval in this scope
  • [E0433]: failed to resolve: use of undeclared type Interval

Remember you can always use rustc --explain EXXXX which provides a detailed explanation of an error message.

Scope

The first error indicates that the Interval type is not in scope. This is because it is now part of the interval module. To access it, we can use the full path name interval::Interval or create a shortcut with use. Since we don't want to replace every occurrence of Interval in our code with interval::Interval, we'll take advantage of use. Additionally, we need access to IntervalError, so we'll bring that into scope at the same time. When bringing multiple types into scope, we can wrap them in {} and separate them with ,.

With that change in place, run the code again.

#![allow(unused_imports)]
extern crate regex; // this is needed for the playground
use interval::{Interval, IntervalError};
use itertools::Itertools;
use regex::Regex;
use std::fs::File;
use std::io::Read;
use std::io::{BufRead, BufReader};
use std::process::exit;

fn find_matching_lines(lines: &[String], regex: Regex) -> Vec<usize> {
    lines
        .iter()
        .enumerate()
        .filter_map(|(i, line)| match regex.is_match(line) {
            true => Some(i),
            false => None,
        })
        .collect() // turns anything iterable into a collection
}

fn create_intervals(
    lines: Vec<usize>,
    before_context: usize,
    after_context: usize,
) -> Result<Vec<Interval>, IntervalError> {
    lines
        .iter()
        .map(|line| {
            let start = line.saturating_sub(before_context);
            let end = line.saturating_add(after_context);
            Interval::new(start, end)
        })
        .collect()
}

fn merge_intervals(intervals: Vec<Interval>) -> Vec<Interval> {
    // merge overlapping intervals
    intervals
        .into_iter()
        .coalesce(|p, c| p.merge(&c).map_err(|_| (p, c)))
        .collect()
}

fn print_results(intervals: Vec<Interval>, lines: Vec<String>) {
    for interval in intervals {
        for (line_no, line) in lines
            .iter()
            .enumerate()
            .take(interval.end + 1)
            .skip(interval.start)
        {
            println!("{}: {}", line_no + 1, line)
        }
    }
}

fn read_file(file: impl Read) -> Vec<String> {
    BufReader::new(file).lines().map_while(Result::ok).collect()
}

fn main() {
    let poem = "I have a little shadow that goes in and out with me,
                And what can be the use of him is more than I can see.
                He is very, very like me from the heels up to the head;
                And I see him jump before me, when I jump into my bed.

                The funniest thing about him is the way he likes to grow -
                Not at all like proper children, which is always very slow;
                For he sometimes shoots up taller like an india-rubber ball,
                And he sometimes gets so little that there’s none of him at all.";

    let mock_file = std::io::Cursor::new(poem);

    // command line arguments
    let pattern = "(all)|(little)";
    let before_context = 1;
    let after_context = 1;

    // attempt to open the file
    let lines = read_file(mock_file);
    //let lines = match File::open(filename) {
    //    // convert the poem into lines
    //    Ok(file) => read_file(file),
    //    Err(e) => {
    //        eprintln!("Error opening {filename}: {e}");
    //        exit(1);
    //    }
    //};

    // compile the regular expression
    let regex = match Regex::new(pattern) {
        Ok(re) => re, // bind re to regex
        Err(e) => {
            eprintln!("{e}"); // write to standard error
            exit(1);
        }
    };

    // store the 0-based line number for any matched line
    let match_lines = find_matching_lines(&lines, regex);

    // create intervals of the form [a,b] with the before/after context
    let intervals =
        match create_intervals(match_lines, before_context, after_context) {
            Ok(intervals) => intervals,
            Err(_) => {
                eprintln!("An error occurred while creating intervals");
                exit(1);
            }
        };

    // merge overlapping intervals
    let intervals = merge_intervals(intervals);

    // print the lines
    print_results(intervals, lines);
}

mod interval {
    /// A list specifying general categories of Interval errors.
    enum IntervalError {
        /// Start is not less than or equal to end
        StartEndRangeInvalid,
        /// Two intervals to be merged do not overlap
        NonOverlappingInterval,
    }

    /// A closed-interval [`start`, `end`] type used for representing a range of
    /// values between `start` and `end` inclusively.
    ///
    /// # Examples
    ///
    /// You can create an `Interval` using `new`.
    ///
    /// ```rust
    /// let interval = Interval::new(1, 10).unwrap();
    /// assert_eq!(interval.start, 1);
    /// assert_eq!(interval.end, 10);
    /// ```
    struct Interval {
        start: usize,
        end: usize,
    }

    impl Interval {
        /// Creates a new `Interval` set to `start` and `end`.
        ///
        /// # Examples
        ///
        /// ```rust
        /// let interval = Interval::new(1, 10).unwrap();
        /// assert_eq!(interval.start, 1);
        /// assert_eq!(interval.end, 10);
        /// ```
        fn new(start: usize, end: usize) -> Result<Self, IntervalError> {
            if start <= end {
                Ok(Self { start, end })
            } else {
                Err(IntervalError::StartEndRangeInvalid)
            }
        }

        /// Checks if two intervals overlap. Overlapping intervals have at least
        /// one point in common.
        ///
        /// # Examples
        ///
        /// ```rust
        /// let a = Interval::new(1, 3).unwrap();
        /// let b = Interval::new(3, 5).unwrap();
        /// assert_eq!(a.overlaps(&b), true);
        /// assert_eq!(b.overlaps(&a), true);
        /// ```
        ///
        /// ```rust
        /// let a = Interval::new(1, 5).unwrap();
        /// let b = Interval::new(2, 4).unwrap();
        /// assert_eq!(a.overlaps(&b), true);
        /// assert_eq!(b.overlaps(&a), true);
        /// ```
        ///
        /// ```rust
        /// let a = Interval::new(1, 3).unwrap();
        /// let b = Interval::new(4, 6).unwrap();
        /// assert_eq!(a.overlaps(&b), false);
        /// assert_eq!(b.overlaps(&a), true);
        /// ```
        fn overlaps(&self, other: &Interval) -> bool {
            self.end >= other.start
        }

        /// Merges two intervals returning a new `Interval`.
        ///
        /// The merged `Interval` range includes the union of ranges from each
        /// `Interval`.
        ///
        /// # Examples
        ///
        /// ```rust
        /// let a = Interval::new(1, 3).unwrap();
        /// let b = Interval::new(3, 5).unwrap();
        /// let c = a.merge(&b).unwrap();
        /// assert_eq!(c.start, 1);
        /// assert_eq!(c.end, 5);
        /// ```
        fn merge(&self, other: &Self) -> Result<Self, IntervalError> {
            if self.overlaps(other) {
                Ok(Self {
                    start: self.start,
                    end: other.end,
                })
            } else {
                Err(IntervalError::NonOverlappingInterval)
            }
        }
    }
}

Bugger! More compiler errors!

Privacy

Before we defined the interval module, all the types and their methods were public. However, once we enclosed everything in a module, they became private! By default, code within a module is private from its parent modules. To make a module public, we need to declare it with pub mod instead of just mod. Let's add the pub prefix and run the program again.

#![allow(unused_imports)]
extern crate regex; // this is needed for the playground
use interval::{Interval, IntervalError};
use itertools::Itertools;
use regex::Regex;
use std::fs::File;
use std::io::Read;
use std::io::{BufRead, BufReader};
use std::process::exit;

fn find_matching_lines(lines: &[String], regex: Regex) -> Vec<usize> {
    lines
        .iter()
        .enumerate()
        .filter_map(|(i, line)| match regex.is_match(line) {
            true => Some(i),
            false => None,
        })
        .collect() // turns anything iterable into a collection
}

fn create_intervals(
    lines: Vec<usize>,
    before_context: usize,
    after_context: usize,
) -> Result<Vec<Interval>, IntervalError> {
    lines
        .iter()
        .map(|line| {
            let start = line.saturating_sub(before_context);
            let end = line.saturating_add(after_context);
            Interval::new(start, end)
        })
        .collect()
}

fn merge_intervals(intervals: Vec<Interval>) -> Vec<Interval> {
    // merge overlapping intervals
    intervals
        .into_iter()
        .coalesce(|p, c| p.merge(&c).map_err(|_| (p, c)))
        .collect()
}

fn print_results(intervals: Vec<Interval>, lines: Vec<String>) {
    for interval in intervals {
        for (line_no, line) in lines
            .iter()
            .enumerate()
            .take(interval.end + 1)
            .skip(interval.start)
        {
            println!("{}: {}", line_no + 1, line)
        }
    }
}

fn read_file(file: impl Read) -> Vec<String> {
    BufReader::new(file).lines().map_while(Result::ok).collect()
}

fn main() {
    let poem = "I have a little shadow that goes in and out with me,
                And what can be the use of him is more than I can see.
                He is very, very like me from the heels up to the head;
                And I see him jump before me, when I jump into my bed.

                The funniest thing about him is the way he likes to grow -
                Not at all like proper children, which is always very slow;
                For he sometimes shoots up taller like an india-rubber ball,
                And he sometimes gets so little that there’s none of him at all.";

    let mock_file = std::io::Cursor::new(poem);

    // command line arguments
    let pattern = "(all)|(little)";
    let before_context = 1;
    let after_context = 1;

    // attempt to open the file
    let lines = read_file(mock_file);
    //let lines = match File::open(filename) {
    //    // convert the poem into lines
    //    Ok(file) => read_file(file),
    //    Err(e) => {
    //        eprintln!("Error opening {filename}: {e}");
    //        exit(1);
    //    }
    //};

    // compile the regular expression
    let regex = match Regex::new(pattern) {
        Ok(re) => re, // bind re to regex
        Err(e) => {
            eprintln!("{e}"); // write to standard error
            exit(1);
        }
    };

    // store the 0-based line number for any matched line
    let match_lines = find_matching_lines(&lines, regex);

    // create intervals of the form [a,b] with the before/after context
    let intervals =
        match create_intervals(match_lines, before_context, after_context) {
            Ok(intervals) => intervals,
            Err(_) => {
                eprintln!("An error occurred while creating intervals");
                exit(1);
            }
        };

    // merge overlapping intervals
    let intervals = merge_intervals(intervals);

    // print the lines
    print_results(intervals, lines);
}

pub mod interval {
    /// A list specifying general categories of Interval errors.
    enum IntervalError {
        /// Start is not less than or equal to end
        StartEndRangeInvalid,
        /// Two intervals to be merged do not overlap
        NonOverlappingInterval,
    }

    /// A closed-interval [`start`, `end`] type used for representing a range of
    /// values between `start` and `end` inclusively.
    ///
    /// # Examples
    ///
    /// You can create an `Interval` using `new`.
    ///
    /// ```rust
    /// let interval = Interval::new(1, 10).unwrap();
    /// assert_eq!(interval.start, 1);
    /// assert_eq!(interval.end, 10);
    /// ```
    struct Interval {
        start: usize,
        end: usize,
    }

    impl Interval {
        /// Creates a new `Interval` set to `start` and `end`.
        ///
        /// # Examples
        ///
        /// ```rust
        /// let interval = Interval::new(1, 10).unwrap();
        /// assert_eq!(interval.start, 1);
        /// assert_eq!(interval.end, 10);
        /// ```
        fn new(start: usize, end: usize) -> Result<Self, IntervalError> {
            if start <= end {
                Ok(Self { start, end })
            } else {
                Err(IntervalError::StartEndRangeInvalid)
            }
        }

        /// Checks if two intervals overlap. Overlapping intervals have at least
        /// one point in common.
        ///
        /// # Examples
        ///
        /// ```rust
        /// let a = Interval::new(1, 3).unwrap();
        /// let b = Interval::new(3, 5).unwrap();
        /// assert_eq!(a.overlaps(&b), true);
        /// assert_eq!(b.overlaps(&a), true);
        /// ```
        ///
        /// ```rust
        /// let a = Interval::new(1, 5).unwrap();
        /// let b = Interval::new(2, 4).unwrap();
        /// assert_eq!(a.overlaps(&b), true);
        /// assert_eq!(b.overlaps(&a), true);
        /// ```
        ///
        /// ```rust
        /// let a = Interval::new(1, 3).unwrap();
        /// let b = Interval::new(4, 6).unwrap();
        /// assert_eq!(a.overlaps(&b), false);
        /// assert_eq!(b.overlaps(&a), true);
        /// ```
        fn overlaps(&self, other: &Interval) -> bool {
            self.end >= other.start
        }

        /// Merges two intervals returning a new `Interval`.
        ///
        /// The merged `Interval` range includes the union of ranges from each
        /// `Interval`.
        ///
        /// # Examples
        ///
        /// ```rust
        /// let a = Interval::new(1, 3).unwrap();
        /// let b = Interval::new(3, 5).unwrap();
        /// let c = a.merge(&b).unwrap();
        /// assert_eq!(c.start, 1);
        /// assert_eq!(c.end, 5);
        /// ```
        fn merge(&self, other: &Self) -> Result<Self, IntervalError> {
            if self.overlaps(other) {
                Ok(Self {
                    start: self.start,
                    end: other.end,
                })
            } else {
                Err(IntervalError::NonOverlappingInterval)
            }
        }
    }
}

More compiler errors! Who would have thought creating a module could be this complicated? The reason for this series of errors is to help us understand module scope and privacy. The emerging pattern is that modules default everything to private. Simply declaring the module public isn't enough; every type and method within the module must also be explicitly declared public if that is the intended design. So, let's prefix the methods and types with pub and hopefully achieve a successful compile!

#![allow(unused_imports)]
extern crate regex; // this is needed for the playground
use interval::{Interval, IntervalError};
use itertools::Itertools;
use regex::Regex;
use std::fs::File;
use std::io::Read;
use std::io::{BufRead, BufReader};
use std::process::exit;

fn find_matching_lines(lines: &[String], regex: Regex) -> Vec<usize> {
    lines
        .iter()
        .enumerate()
        .filter_map(|(i, line)| match regex.is_match(line) {
            true => Some(i),
            false => None,
        })
        .collect() // turns anything iterable into a collection
}

fn create_intervals(
    lines: Vec<usize>,
    before_context: usize,
    after_context: usize,
) -> Result<Vec<Interval>, IntervalError> {
    lines
        .iter()
        .map(|line| {
            let start = line.saturating_sub(before_context);
            let end = line.saturating_add(after_context);
            Interval::new(start, end)
        })
        .collect()
}

fn merge_intervals(intervals: Vec<Interval>) -> Vec<Interval> {
    // merge overlapping intervals
    intervals
        .into_iter()
        .coalesce(|p, c| p.merge(&c).map_err(|_| (p, c)))
        .collect()
}

fn print_results(intervals: Vec<Interval>, lines: Vec<String>) {
    for interval in intervals {
        for (line_no, line) in lines
            .iter()
            .enumerate()
            .take(interval.end + 1)
            .skip(interval.start)
        {
            println!("{}: {}", line_no + 1, line)
        }
    }
}

fn read_file(file: impl Read) -> Vec<String> {
    BufReader::new(file).lines().map_while(Result::ok).collect()
}

fn main() {
    let poem = "I have a little shadow that goes in and out with me,
                And what can be the use of him is more than I can see.
                He is very, very like me from the heels up to the head;
                And I see him jump before me, when I jump into my bed.

                The funniest thing about him is the way he likes to grow -
                Not at all like proper children, which is always very slow;
                For he sometimes shoots up taller like an india-rubber ball,
                And he sometimes gets so little that there’s none of him at all.";

    let mock_file = std::io::Cursor::new(poem);

    // command line arguments
    let pattern = "(all)|(little)";
    let before_context = 1;
    let after_context = 1;

    // attempt to open the file
    let lines = read_file(mock_file);
    //let lines = match File::open(filename) {
    //    // convert the poem into lines
    //    Ok(file) => read_file(file),
    //    Err(e) => {
    //        eprintln!("Error opening {filename}: {e}");
    //        exit(1);
    //    }
    //};

    // compile the regular expression
    let regex = match Regex::new(pattern) {
        Ok(re) => re, // bind re to regex
        Err(e) => {
            eprintln!("{e}"); // write to standard error
            exit(1);
        }
    };

    // store the 0-based line number for any matched line
    let match_lines = find_matching_lines(&lines, regex);

    // create intervals of the form [a,b] with the before/after context
    let intervals =
        match create_intervals(match_lines, before_context, after_context) {
            Ok(intervals) => intervals,
            Err(_) => {
                eprintln!("An error occurred while creating intervals");
                exit(1);
            }
        };

    // merge overlapping intervals
    let intervals = merge_intervals(intervals);

    // print the lines
    print_results(intervals, lines);
}

pub mod interval {
    /// A list specifying general categories of Interval errors.
    pub enum IntervalError {
        /// Start is not less than or equal to end
        StartEndRangeInvalid,
        /// Two intervals to be merged do not overlap
        NonOverlappingInterval,
    }

    /// A closed-interval [`start`, `end`] type used for representing a range of
    /// values between `start` and `end` inclusively.
    ///
    /// # Examples
    ///
    /// You can create an `Interval` using `new`.
    ///
    /// ```rust
    /// let interval = Interval::new(1, 10).unwrap();
    /// assert_eq!(interval.start, 1);
    /// assert_eq!(interval.end, 10);
    /// ```
    pub struct Interval {
        start: usize,
        end: usize,
    }

    impl Interval {
        /// Creates a new `Interval` set to `start` and `end`.
        ///
        /// # Examples
        ///
        /// ```rust
        /// let interval = Interval::new(1, 10).unwrap();
        /// assert_eq!(interval.start, 1);
        /// assert_eq!(interval.end, 10);
        /// ```
        pub fn new(start: usize, end: usize) -> Result<Self, IntervalError> {
            if start <= end {
                Ok(Self { start, end })
            } else {
                Err(IntervalError::StartEndRangeInvalid)
            }
        }

        /// Checks if two intervals overlap. Overlapping intervals have at least
        /// one point in common.
        ///
        /// # Examples
        ///
        /// ```rust
        /// let a = Interval::new(1, 3).unwrap();
        /// let b = Interval::new(3, 5).unwrap();
        /// assert_eq!(a.overlaps(&b), true);
        /// assert_eq!(b.overlaps(&a), true);
        /// ```
        ///
        /// ```rust
        /// let a = Interval::new(1, 5).unwrap();
        /// let b = Interval::new(2, 4).unwrap();
        /// assert_eq!(a.overlaps(&b), true);
        /// assert_eq!(b.overlaps(&a), true);
        /// ```
        ///
        /// ```rust
        /// let a = Interval::new(1, 3).unwrap();
        /// let b = Interval::new(4, 6).unwrap();
        /// assert_eq!(a.overlaps(&b), false);
        /// assert_eq!(b.overlaps(&a), true);
        /// ```
        pub fn overlaps(&self, other: &Interval) -> bool {
            self.end >= other.start
        }

        /// Merges two intervals returning a new `Interval`.
        ///
        /// The merged `Interval` range includes the union of ranges from each
        /// `Interval`.
        ///
        /// # Examples
        ///
        /// ```rust
        /// let a = Interval::new(1, 3).unwrap();
        /// let b = Interval::new(3, 5).unwrap();
        /// let c = a.merge(&b).unwrap();
        /// assert_eq!(c.start, 1);
        /// assert_eq!(c.end, 5);
        /// ```
        pub fn merge(&self, other: &Self) -> Result<Self, IntervalError> {
            if self.overlaps(other) {
                Ok(Self {
                    start: self.start,
                    end: other.end,
                })
            } else {
                Err(IntervalError::NonOverlappingInterval)
            }
        }
    }
}

Alright, last time, I promise! As a final note on privacy, and as the compiler error pointed out, even the fields of a struct are private by default. So, the last step is to make the fields public. Let's do that and finally achieve a successful compile. Fingers crossed!

#![allow(unused_imports)]
extern crate regex; // this is needed for the playground
use interval::{Interval, IntervalError};
use itertools::Itertools;
use regex::Regex;
use std::fs::File;
use std::io::Read;
use std::io::{BufRead, BufReader};
use std::process::exit;

fn find_matching_lines(lines: &[String], regex: Regex) -> Vec<usize> {
    lines
        .iter()
        .enumerate()
        .filter_map(|(i, line)| match regex.is_match(line) {
            true => Some(i),
            false => None,
        })
        .collect() // turns anything iterable into a collection
}

fn create_intervals(
    lines: Vec<usize>,
    before_context: usize,
    after_context: usize,
) -> Result<Vec<Interval>, IntervalError> {
    lines
        .iter()
        .map(|line| {
            let start = line.saturating_sub(before_context);
            let end = line.saturating_add(after_context);
            Interval::new(start, end)
        })
        .collect()
}

fn merge_intervals(intervals: Vec<Interval>) -> Vec<Interval> {
    // merge overlapping intervals
    intervals
        .into_iter()
        .coalesce(|p, c| p.merge(&c).map_err(|_| (p, c)))
        .collect()
}

fn print_results(intervals: Vec<Interval>, lines: Vec<String>) {
    for interval in intervals {
        for (line_no, line) in lines
            .iter()
            .enumerate()
            .take(interval.end + 1)
            .skip(interval.start)
        {
            println!("{}: {}", line_no + 1, line)
        }
    }
}

fn read_file(file: impl Read) -> Vec<String> {
    BufReader::new(file).lines().map_while(Result::ok).collect()
}

fn main() {
    let poem = "I have a little shadow that goes in and out with me,
                And what can be the use of him is more than I can see.
                He is very, very like me from the heels up to the head;
                And I see him jump before me, when I jump into my bed.

                The funniest thing about him is the way he likes to grow -
                Not at all like proper children, which is always very slow;
                For he sometimes shoots up taller like an india-rubber ball,
                And he sometimes gets so little that there’s none of him at all.";

    let mock_file = std::io::Cursor::new(poem);

    // command line arguments
    let pattern = "(all)|(little)";
    let before_context = 1;
    let after_context = 1;

    // attempt to open the file
    let lines = read_file(mock_file);
    //let lines = match File::open(filename) {
    //    // convert the poem into lines
    //    Ok(file) => read_file(file),
    //    Err(e) => {
    //        eprintln!("Error opening {filename}: {e}");
    //        exit(1);
    //    }
    //};

    // compile the regular expression
    let regex = match Regex::new(pattern) {
        Ok(re) => re, // bind re to regex
        Err(e) => {
            eprintln!("{e}"); // write to standard error
            exit(1);
        }
    };

    // store the 0-based line number for any matched line
    let match_lines = find_matching_lines(&lines, regex);

    // create intervals of the form [a,b] with the before/after context
    let intervals =
        match create_intervals(match_lines, before_context, after_context) {
            Ok(intervals) => intervals,
            Err(_) => {
                eprintln!("An error occurred while creating intervals");
                exit(1);
            }
        };

    // merge overlapping intervals
    let intervals = merge_intervals(intervals);

    // print the lines
    print_results(intervals, lines);
}

pub mod interval {
    /// A list specifying general categories of Interval errors.
    pub enum IntervalError {
        /// Start is not less than or equal to end
        StartEndRangeInvalid,
        /// Two intervals to be merged do not overlap
        NonOverlappingInterval,
    }

    /// A closed-interval [`start`, `end`] type used for representing a range of
    /// values between `start` and `end` inclusively.
    ///
    /// # Examples
    ///
    /// You can create an `Interval` using `new`.
    ///
    /// ```rust
    /// let interval = Interval::new(1, 10).unwrap();
    /// assert_eq!(interval.start, 1);
    /// assert_eq!(interval.end, 10);
    /// ```
    pub struct Interval {
        pub start: usize,
        pub end: usize,
    }

    impl Interval {
        /// Creates a new `Interval` set to `start` and `end`.
        ///
        /// # Examples
        ///
        /// ```rust
        /// let interval = Interval::new(1, 10).unwrap();
        /// assert_eq!(interval.start, 1);
        /// assert_eq!(interval.end, 10);
        /// ```
        pub fn new(start: usize, end: usize) -> Result<Self, IntervalError> {
            if start <= end {
                Ok(Self { start, end })
            } else {
                Err(IntervalError::StartEndRangeInvalid)
            }
        }

        /// Checks if two intervals overlap. Overlapping intervals have at least
        /// one point in common.
        ///
        /// # Examples
        ///
        /// ```rust
        /// let a = Interval::new(1, 3).unwrap();
        /// let b = Interval::new(3, 5).unwrap();
        /// assert_eq!(a.overlaps(&b), true);
        /// assert_eq!(b.overlaps(&a), true);
        /// ```
        ///
        /// ```rust
        /// let a = Interval::new(1, 5).unwrap();
        /// let b = Interval::new(2, 4).unwrap();
        /// assert_eq!(a.overlaps(&b), true);
        /// assert_eq!(b.overlaps(&a), true);
        /// ```
        ///
        /// ```rust
        /// let a = Interval::new(1, 3).unwrap();
        /// let b = Interval::new(4, 6).unwrap();
        /// assert_eq!(a.overlaps(&b), false);
        /// assert_eq!(b.overlaps(&a), true);
        /// ```
        pub fn overlaps(&self, other: &Interval) -> bool {
            self.end >= other.start
        }

        /// Merges two intervals returning a new `Interval`.
        ///
        /// The merged `Interval` range includes the union of ranges from each
        /// `Interval`.
        ///
        /// # Examples
        ///
        /// ```rust
        /// let a = Interval::new(1, 3).unwrap();
        /// let b = Interval::new(3, 5).unwrap();
        /// let c = a.merge(&b).unwrap();
        /// assert_eq!(c.start, 1);
        /// assert_eq!(c.end, 5);
        /// ```
        pub fn merge(&self, other: &Self) -> Result<Self, IntervalError> {
            if self.overlaps(other) {
                Ok(Self {
                    start: self.start,
                    end: other.end,
                })
            } else {
                Err(IntervalError::NonOverlappingInterval)
            }
        }
    }
}

Finally, the code compiles! Now, you might be wondering why we didn't have to declare the enum variants as public. If you are, good catch! The reason is that there are two exceptions to Rust's everything is private1 behavior:

  1. Associated items in a pub Trait are public by default.
  2. enum variants in a pub enum are public by default.

Summary

This section focused heavily on emphasizing Rust's everything is private default behavior when it comes to modules. As you develop software, use privacy to your advantage and carefully decide what parts of the API should be made public.

Rust goes into great detail with regard to project management. As you create your own modules and crates, reviewing the section on Managing Growing Projects with Packages, Crates, and Modules will be extremely valuable.

Exercises

These exercises must be completed locally.

  • Move the interval module out of main.rs and into a separate file. Explore two approaches:
    1. First, create a file interval.rs in the src directory.
    2. Next, create a directory under src called interval and move interval.rs inside.
  • Advanced: Creating an external crate
    1. Create a library crate cargo new --lib interval and move the interval code into it.
    2. If you haven't already, create a binary crate cargo new grep and update the Cargo.toml file to use the external crate from the previous step. Refer to the section on Specifying Dependencies in The Cargo Book for guidance.

Next

Now, let's dive into generic types and make our Interval generic!


1

Refer to the Visibility and Privacy reference for details.

Generic Types

Many programming languages include tools for effectively handling the duplication of concepts, and Rust is no exception. Rust offers powerful support for generics, a topic that could easily warrant its own book. However, since we're on the Fast Track to Rust, we'll focus on the core concepts of generics and traits by making the interval module from the previous section generic.

We've already been working with generics throughout this course, using types such as Vec<T>, Option<T>, and Result<T, E>.

Similar to our approach in the previous section, we'll examine the helpful compiler errors as we work towards making the interval module generic. The rationale behind exposing you to these compiler errors is to illustrate how beneficial the compiler can be when working with the language.

Let's get started!

Interval

The current version of our Interval is limited to the usize type. If we want to support floating point values, negative integers, or other exotic numeric types, it won't work. We're going to address this by making it compatible with any numeric type, with some restrictions.1

Generic Data Type

The current definition of our Interval struct specifies the field types as usize. To redefine the struct to use a generic type parameter for its fields, we use the <> syntax.

pub struct Interval<T> {
    pub start: T,
    pub end: T,
}

By using a single generic type to define Interval<T>, we indicate that the Interval<T> struct is generic over some type T, and the fields start and end are both of that same type, whatever it may be. If we attempt to create an instance of Interval<T> with fields of different types, our code won't compile. To support different types for the fields, we would need to specify additional generic types, such as struct Interval<T, U>.

You've probably guessed it already, but this code won't compile. Take a moment to think about why that might be, and then run the code to see what the compiler has to say.

#![allow(unused_imports)]
extern crate regex; // this is needed for the playground
use interval::{Interval, IntervalError};
use itertools::Itertools;
use regex::Regex;
use std::fs::File;
use std::io::Read;
use std::io::{BufRead, BufReader};
use std::process::exit;

fn find_matching_lines(lines: &[String], regex: Regex) -> Vec<usize> {
    lines
        .iter()
        .enumerate()
        .filter_map(|(i, line)| match regex.is_match(line) {
            true => Some(i),
            false => None,
        })
        .collect() // turns anything iterable into a collection
}

fn create_intervals(
    lines: Vec<usize>,
    before_context: usize,
    after_context: usize,
) -> Result<Vec<Interval>, IntervalError> {
    lines
        .iter()
        .map(|line| {
            let start = line.saturating_sub(before_context);
            let end = line.saturating_add(after_context);
            Interval::new(start, end)
        })
        .collect()
}

fn merge_intervals(intervals: Vec<Interval>) -> Vec<Interval> {
    // merge overlapping intervals
    intervals
        .into_iter()
        .coalesce(|p, c| p.merge(&c).map_err(|_| (p, c)))
        .collect()
}

fn print_results(intervals: Vec<Interval>, lines: Vec<String>) {
    for interval in intervals {
        for (line_no, line) in lines
            .iter()
            .enumerate()
            .take(interval.end + 1)
            .skip(interval.start)
        {
            println!("{}: {}", line_no + 1, line)
        }
    }
}

fn read_file(file: impl Read) -> Vec<String> {
    BufReader::new(file).lines().map_while(Result::ok).collect()
}

fn main() {
    let poem = "I have a little shadow that goes in and out with me,
                And what can be the use of him is more than I can see.
                He is very, very like me from the heels up to the head;
                And I see him jump before me, when I jump into my bed.

                The funniest thing about him is the way he likes to grow -
                Not at all like proper children, which is always very slow;
                For he sometimes shoots up taller like an india-rubber ball,
                And he sometimes gets so little that there’s none of him at all.";

    let mock_file = std::io::Cursor::new(poem);

    // command line arguments
    let pattern = "(all)|(little)";
    let before_context = 1;
    let after_context = 1;

    // attempt to open the file
    let lines = read_file(mock_file);
    //let lines = match File::open(filename) {
    //    // convert the poem into lines
    //    Ok(file) => read_file(file),
    //    Err(e) => {
    //        eprintln!("Error opening {filename}: {e}");
    //        exit(1);
    //    }
    //};

    // compile the regular expression
    let regex = match Regex::new(pattern) {
        Ok(re) => re, // bind re to regex
        Err(e) => {
            eprintln!("{e}"); // write to standard error
            exit(1);
        }
    };

    // store the 0-based line number for any matched line
    let match_lines = find_matching_lines(&lines, regex);

    // create intervals of the form [a,b] with the before/after context
    let intervals =
        match create_intervals(match_lines, before_context, after_context) {
            Ok(intervals) => intervals,
            Err(_) => {
                eprintln!("An error occurred while creating intervals");
                exit(1);
            }
        };

    // merge overlapping intervals
    let intervals = merge_intervals(intervals);

    // print the lines
    print_results(intervals, lines);
}

pub mod interval {
    /// A list specifying general categories of Interval errors.
    pub enum IntervalError {
        /// Start is not less than or equal to end
        StartEndRangeInvalid,
        /// Two intervals to be merged do not overlap
        NonOverlappingInterval,
    }

    /// A closed-interval [`start`, `end`] type used for representing a range of
    /// values between `start` and `end` inclusively.
    ///
    /// # Examples
    ///
    /// You can create an `Interval` using `new`.
    ///
    /// ```rust
    /// let interval = Interval::new(1, 10).unwrap();
    /// assert_eq!(interval.start, 1);
    /// assert_eq!(interval.end, 10);
    /// ```
    pub struct Interval<T> {
        pub start: T,
        pub end: T,
    }

    impl Interval {
        /// Creates a new `Interval` set to `start` and `end`.
        ///
        /// # Examples
        ///
        /// ```rust
        /// let interval = Interval::new(1, 10).unwrap();
        /// assert_eq!(interval.start, 1);
        /// assert_eq!(interval.end, 10);
        /// ```
        pub fn new(start: usize, end: usize) -> Result<Self, IntervalError> {
            if start <= end {
                Ok(Self { start, end })
            } else {
                Err(IntervalError::StartEndRangeInvalid)
            }
        }

        /// Checks if two intervals overlap. Overlapping intervals have at least
        /// one point in common.
        ///
        /// # Examples
        ///
        /// ```rust
        /// let a = Interval::new(1, 3).unwrap();
        /// let b = Interval::new(3, 5).unwrap();
        /// assert_eq!(a.overlaps(&b), true);
        /// assert_eq!(b.overlaps(&a), true);
        /// ```
        ///
        /// ```rust
        /// let a = Interval::new(1, 5).unwrap();
        /// let b = Interval::new(2, 4).unwrap();
        /// assert_eq!(a.overlaps(&b), true);
        /// assert_eq!(b.overlaps(&a), true);
        /// ```
        ///
        /// ```rust
        /// let a = Interval::new(1, 3).unwrap();
        /// let b = Interval::new(4, 6).unwrap();
        /// assert_eq!(a.overlaps(&b), false);
        /// assert_eq!(b.overlaps(&a), true);
        /// ```
        pub fn overlaps(&self, other: &Interval) -> bool {
            self.end >= other.start
        }

        /// Merges two intervals returning a new `Interval`.
        ///
        /// The merged `Interval` range includes the union of ranges from each
        /// `Interval`.
        ///
        /// # Examples
        ///
        /// ```rust
        /// let a = Interval::new(1, 3).unwrap();
        /// let b = Interval::new(3, 5).unwrap();
        /// let c = a.merge(&b).unwrap();
        /// assert_eq!(c.start, 1);
        /// assert_eq!(c.end, 5);
        /// ```
        pub fn merge(&self, other: &Self) -> Result<Self, IntervalError> {
            if self.overlaps(other) {
                Ok(Self {
                    start: self.start,
                    end: other.end,
                })
            } else {
                Err(IntervalError::NonOverlappingInterval)
            }
        }
    }
}

As you guessed it, there's a lot of compiler errors and they're very helpful!

  • [E0107]: missing generics for struct Interval

Let's walk through the errors one by one. The first error informs us that the return value on line 26 references a generic type but fails to specify one. It shows the definition of the type on line 141 and provides a helpful tip for how to correct it. The compiler suggests appending <T> to Interval and specifying the type for T, which in our case will be usize.

error[E0107]: missing generics for struct `Interval`
   --> src/main.rs:26:17
    |
26  | ) -> Result<Vec<Interval>, IntervalError> {
    |                 ^^^^^^^^ expected 1 generic argument
    |
note: struct defined here, with 1 generic parameter: `T`
   --> src/main.rs:141:16
    |
141 |     pub struct Interval<T> {
    |                ^^^^^^^^ -
help: add missing generic argument
    |
26  | ) -> Result<Vec<Interval<T>>, IntervalError> {
    |                         +++

The fix should be relatively straightforward to understand, as we've used the same approach for other generic types we've worked with. Let's apply the fix to all the functions and try again!

#![allow(unused_imports)]
extern crate regex; // this is needed for the playground
use interval::{Interval, IntervalError};
use itertools::Itertools;
use regex::Regex;
use std::fs::File;
use std::io::Read;
use std::io::{BufRead, BufReader};
use std::process::exit;

fn find_matching_lines(lines: &[String], regex: Regex) -> Vec<usize> {
    lines
        .iter()
        .enumerate()
        .filter_map(|(i, line)| match regex.is_match(line) {
            true => Some(i),
            false => None,
        })
        .collect() // turns anything iterable into a collection
}

fn create_intervals(
    lines: Vec<usize>,
    before_context: usize,
    after_context: usize,
) -> Result<Vec<Interval<usize>>, IntervalError> {
    lines
        .iter()
        .map(|line| {
            let start = line.saturating_sub(before_context);
            let end = line.saturating_add(after_context);
            Interval::new(start, end)
        })
        .collect()
}

fn merge_intervals(intervals: Vec<Interval<usize>>) -> Vec<Interval<usize>> {
    // merge overlapping intervals
    intervals
        .into_iter()
        .coalesce(|p, c| p.merge(&c).map_err(|_| (p, c)))
        .collect()
}

fn print_results(intervals: Vec<Interval<usize>>, lines: Vec<String>) {
    for interval in intervals {
        for (line_no, line) in lines
            .iter()
            .enumerate()
            .take(interval.end + 1)
            .skip(interval.start)
        {
            println!("{}: {}", line_no + 1, line)
        }
    }
}

fn read_file(file: impl Read) -> Vec<String> {
    BufReader::new(file).lines().map_while(Result::ok).collect()
}

fn main() {
    let poem = "I have a little shadow that goes in and out with me,
                And what can be the use of him is more than I can see.
                He is very, very like me from the heels up to the head;
                And I see him jump before me, when I jump into my bed.

                The funniest thing about him is the way he likes to grow -
                Not at all like proper children, which is always very slow;
                For he sometimes shoots up taller like an india-rubber ball,
                And he sometimes gets so little that there’s none of him at all.";

    let mock_file = std::io::Cursor::new(poem);

    // command line arguments
    let pattern = "(all)|(little)";
    let before_context = 1;
    let after_context = 1;

    // attempt to open the file
    let lines = read_file(mock_file);
    //let lines = match File::open(filename) {
    //    // convert the poem into lines
    //    Ok(file) => read_file(file),
    //    Err(e) => {
    //        eprintln!("Error opening {filename}: {e}");
    //        exit(1);
    //    }
    //};

    // compile the regular expression
    let regex = match Regex::new(pattern) {
        Ok(re) => re, // bind re to regex
        Err(e) => {
            eprintln!("{e}"); // write to standard error
            exit(1);
        }
    };

    // store the 0-based line number for any matched line
    let match_lines = find_matching_lines(&lines, regex);

    // create intervals of the form [a,b] with the before/after context
    let intervals =
        match create_intervals(match_lines, before_context, after_context) {
            Ok(intervals) => intervals,
            Err(_) => {
                eprintln!("An error occurred while creating intervals");
                exit(1);
            }
        };

    // merge overlapping intervals
    let intervals = merge_intervals(intervals);

    // print the lines
    print_results(intervals, lines);
}

pub mod interval {
    /// A list specifying general categories of Interval errors.
    pub enum IntervalError {
        /// Start is not less than or equal to end
        StartEndRangeInvalid,
        /// Two intervals to be merged do not overlap
        NonOverlappingInterval,
    }

    /// A closed-interval [`start`, `end`] type used for representing a range of
    /// values between `start` and `end` inclusively.
    ///
    /// # Examples
    ///
    /// You can create an `Interval` using `new`.
    ///
    /// ```rust
    /// let interval = Interval::new(1, 10).unwrap();
    /// assert_eq!(interval.start, 1);
    /// assert_eq!(interval.end, 10);
    /// ```
    pub struct Interval<T> {
        pub start: T,
        pub end: T,
    }

    impl Interval {
        /// Creates a new `Interval` set to `start` and `end`.
        ///
        /// # Examples
        ///
        /// ```rust
        /// let interval = Interval::new(1, 10).unwrap();
        /// assert_eq!(interval.start, 1);
        /// assert_eq!(interval.end, 10);
        /// ```
        pub fn new(start: usize, end: usize) -> Result<Self, IntervalError> {
            if start <= end {
                Ok(Self { start, end })
            } else {
                Err(IntervalError::StartEndRangeInvalid)
            }
        }

        /// Checks if two intervals overlap. Overlapping intervals have at least
        /// one point in common.
        ///
        /// # Examples
        ///
        /// ```rust
        /// let a = Interval::new(1, 3).unwrap();
        /// let b = Interval::new(3, 5).unwrap();
        /// assert_eq!(a.overlaps(&b), true);
        /// assert_eq!(b.overlaps(&a), true);
        /// ```
        ///
        /// ```rust
        /// let a = Interval::new(1, 5).unwrap();
        /// let b = Interval::new(2, 4).unwrap();
        /// assert_eq!(a.overlaps(&b), true);
        /// assert_eq!(b.overlaps(&a), true);
        /// ```
        ///
        /// ```rust
        /// let a = Interval::new(1, 3).unwrap();
        /// let b = Interval::new(4, 6).unwrap();
        /// assert_eq!(a.overlaps(&b), false);
        /// assert_eq!(b.overlaps(&a), true);
        /// ```
        pub fn overlaps(&self, other: &Interval) -> bool {
            self.end >= other.start
        }

        /// Merges two intervals returning a new `Interval`.
        ///
        /// The merged `Interval` range includes the union of ranges from each
        /// `Interval`.
        ///
        /// # Examples
        ///
        /// ```rust
        /// let a = Interval::new(1, 3).unwrap();
        /// let b = Interval::new(3, 5).unwrap();
        /// let c = a.merge(&b).unwrap();
        /// assert_eq!(c.start, 1);
        /// assert_eq!(c.end, 5);
        /// ```
        pub fn merge(&self, other: &Self) -> Result<Self, IntervalError> {
            if self.overlaps(other) {
                Ok(Self {
                    start: self.start,
                    end: other.end,
                })
            } else {
                Err(IntervalError::NonOverlappingInterval)
            }
        }
    }
}

As expected, we still have some more compiler errors to work through and these lead us into how we implement methods for a generic type!

error[E0107]: missing generics for struct `Interval`
   --> src/main.rs:146:10
    |
146 |     impl Interval {
    |          ^^^^^^^^ expected 1 generic argument
    |
note: struct defined here, with 1 generic parameter: `T`
   --> src/main.rs:141:16
    |
141 |     pub struct Interval<T> {
    |                ^^^^^^^^ -
help: add missing generic argument
    |
146 |     impl Interval<T> {
    |                  +++

error[E0107]: missing generics for struct `Interval`
   --> src/main.rs:189:40
    |
189 |         pub fn overlaps(&self, other: &Interval) -> bool {
    |                                        ^^^^^^^^ expected 1 generic argument
    |
note: struct defined here, with 1 generic parameter: `T`
   --> src/main.rs:141:16
    |
141 |     pub struct Interval<T> {
    |                ^^^^^^^^ -
help: add missing generic argument
    |
189 |         pub fn overlaps(&self, other: &Interval<T>) -> bool {
    |                                                +++

For more information about this error, try `rustc --explain E0107`.

Inherent Implementation

There are two main types of implementation we define: inherent implementations (standalone) and trait implementations.2 We'll focus on inherent implementation first and explore trait implementations towards the end of this section.

We need to update our impl block to support generic type parameters for our generic type, and the syntax is as follows:

impl<T> Type<T> {}  // Type is replaced with Interval for our case

Here are the necessary changes. Review them and run the program again.

#![allow(unused_imports)]
extern crate regex; // this is needed for the playground
use interval::{Interval, IntervalError};
use itertools::Itertools;
use regex::Regex;
use std::fs::File;
use std::io::Read;
use std::io::{BufRead, BufReader};
use std::process::exit;

fn find_matching_lines(lines: &[String], regex: Regex) -> Vec<usize> {
    lines
        .iter()
        .enumerate()
        .filter_map(|(i, line)| match regex.is_match(line) {
            true => Some(i),
            false => None,
        })
        .collect() // turns anything iterable into a collection
}

fn create_intervals(
    lines: Vec<usize>,
    before_context: usize,
    after_context: usize,
) -> Result<Vec<Interval<usize>>, IntervalError> {
    lines
        .iter()
        .map(|line| {
            let start = line.saturating_sub(before_context);
            let end = line.saturating_add(after_context);
            Interval::new(start, end)
        })
        .collect()
}

fn merge_intervals(intervals: Vec<Interval<usize>>) -> Vec<Interval<usize>> {
    // merge overlapping intervals
    intervals
        .into_iter()
        .coalesce(|p, c| p.merge(&c).map_err(|_| (p, c)))
        .collect()
}

fn print_results(intervals: Vec<Interval<usize>>, lines: Vec<String>) {
    for interval in intervals {
        for (line_no, line) in lines
            .iter()
            .enumerate()
            .take(interval.end + 1)
            .skip(interval.start)
        {
            println!("{}: {}", line_no + 1, line)
        }
    }
}

fn read_file(file: impl Read) -> Vec<String> {
    BufReader::new(file).lines().map_while(Result::ok).collect()
}

fn main() {
    let poem = "I have a little shadow that goes in and out with me,
                And what can be the use of him is more than I can see.
                He is very, very like me from the heels up to the head;
                And I see him jump before me, when I jump into my bed.

                The funniest thing about him is the way he likes to grow -
                Not at all like proper children, which is always very slow;
                For he sometimes shoots up taller like an india-rubber ball,
                And he sometimes gets so little that there’s none of him at all.";

    let mock_file = std::io::Cursor::new(poem);

    // command line arguments
    let pattern = "(all)|(little)";
    let before_context = 1;
    let after_context = 1;

    // attempt to open the file
    let lines = read_file(mock_file);
    //let lines = match File::open(filename) {
    //    // convert the poem into lines
    //    Ok(file) => read_file(file),
    //    Err(e) => {
    //        eprintln!("Error opening {filename}: {e}");
    //        exit(1);
    //    }
    //};

    // compile the regular expression
    let regex = match Regex::new(pattern) {
        Ok(re) => re, // bind re to regex
        Err(e) => {
            eprintln!("{e}"); // write to standard error
            exit(1);
        }
    };

    // store the 0-based line number for any matched line
    let match_lines = find_matching_lines(&lines, regex);

    // create intervals of the form [a,b] with the before/after context
    let intervals =
        match create_intervals(match_lines, before_context, after_context) {
            Ok(intervals) => intervals,
            Err(_) => {
                eprintln!("An error occurred while creating intervals");
                exit(1);
            }
        };

    // merge overlapping intervals
    let intervals = merge_intervals(intervals);

    // print the lines
    print_results(intervals, lines);
}

pub mod interval {
    /// A list specifying general categories of Interval errors.
    pub enum IntervalError {
        /// Start is not less than or equal to end
        StartEndRangeInvalid,
        /// Two intervals to be merged do not overlap
        NonOverlappingInterval,
    }

    /// A closed-interval [`start`, `end`] type used for representing a range of
    /// values between `start` and `end` inclusively.
    ///
    /// # Examples
    ///
    /// You can create an `Interval` using `new`.
    ///
    /// ```rust
    /// let interval = Interval::new(1, 10).unwrap();
    /// assert_eq!(interval.start, 1);
    /// assert_eq!(interval.end, 10);
    /// ```
    pub struct Interval<T> {
        pub start: T,
        pub end: T,
    }

    impl<T> Interval<T> {
        /// Creates a new `Interval` set to `start` and `end`.
        ///
        /// # Examples
        ///
        /// ```rust
        /// let interval = Interval::new(1, 10).unwrap();
        /// assert_eq!(interval.start, 1);
        /// assert_eq!(interval.end, 10);
        /// ```
        pub fn new(start: T, end: T) -> Result<Self, IntervalError> {
            if start <= end {
                Ok(Self { start, end })
            } else {
                Err(IntervalError::StartEndRangeInvalid)
            }
        }

        /// Checks if two intervals overlap. Overlapping intervals have at least
        /// one point in common.
        ///
        /// # Examples
        ///
        /// ```rust
        /// let a = Interval::new(1, 3).unwrap();
        /// let b = Interval::new(3, 5).unwrap();
        /// assert_eq!(a.overlaps(&b), true);
        /// assert_eq!(b.overlaps(&a), true);
        /// ```
        ///
        /// ```rust
        /// let a = Interval::new(1, 5).unwrap();
        /// let b = Interval::new(2, 4).unwrap();
        /// assert_eq!(a.overlaps(&b), true);
        /// assert_eq!(b.overlaps(&a), true);
        /// ```
        ///
        /// ```rust
        /// let a = Interval::new(1, 3).unwrap();
        /// let b = Interval::new(4, 6).unwrap();
        /// assert_eq!(a.overlaps(&b), false);
        /// assert_eq!(b.overlaps(&a), true);
        /// ```
        pub fn overlaps(&self, other: &Interval<T>) -> bool {
            self.end >= other.start
        }

        /// Merges two intervals returning a new `Interval`.
        ///
        /// The merged `Interval` range includes the union of ranges from each
        /// `Interval`.
        ///
        /// # Examples
        ///
        /// ```rust
        /// let a = Interval::new(1, 3).unwrap();
        /// let b = Interval::new(3, 5).unwrap();
        /// let c = a.merge(&b).unwrap();
        /// assert_eq!(c.start, 1);
        /// assert_eq!(c.end, 5);
        /// ```
        pub fn merge(&self, other: &Self) -> Result<Self, IntervalError> {
            if self.overlaps(other) {
                Ok(Self {
                    start: self.start,
                    end: other.end,
                })
            } else {
                Err(IntervalError::NonOverlappingInterval)
            }
        }
    }
}

Well, that didn't work! Now we have two new compiler errors:

  • [E0369]: binary operation <= cannot be applied to type T
  • [E0507]: cannot move out of self.start which is behind a shared reference

That's okay, because these errors are a perfect segue into trait bounds!

Trait Bounds

The compiler generated a few very helpful error messages, which we can categorize into two types, both of which suggest the use of trait bounds:

error[E0369]: binary operation `<=` cannot be applied to type `T`
   --> src/main.rs:157:22
    |
157 |             if start <= end {
    |                ----- ^^ --- T
    |                |
    |                T
    |
help: consider restricting type parameter `T`
    |
146 |     impl<T: std::cmp::PartialOrd> Interval<T> {
    |           ++++++++++++++++++++++

error[E0507]: cannot move out of `self.start` which is behind a shared reference
   --> src/main.rs:210:28
    |
210 |                     start: self.start,
    |                            ^^^^^^^^^^ move occurs because `self.start` has
                                            type `T`, which does not implement the
                                            `Copy` trait
    |
help: if `T` implemented `Clone`, you could clone the value
   --> src/main.rs:146:10
    |
146 |     impl<T> Interval<T> {
    |          ^ consider constraining this type parameter with `Clone`
...
210 |                     start: self.start,
    |                            ---------- you could clone this value

With an introduction to trait bounds, everything will become clear. Let's begin with the first error. The compiler indicates that a comparison between two generic types is being attempted in the new function, and not all types support this capability. In other words, not all types implement the PartialOrd trait, which is necessary for using comparison operators like <= and >=.

For those with an object-oriented programming background, a trait can be thought of as similar to an interface.

pub fn new(start: T, end: T) -> Result<Self, IntervalError> {
    if start <= end {
        Ok(Self { start, end })
    } else {
        Err(IntervalError::StartEndRangeInvalid)
    }
}

Since our Interval struct is generic over T, the types for start and end can be anything, including types that aren't comparable. Therefore, the compiler is instructing us to restrict T to any type that implements the PartialOrd trait. In other words, we need to place a bound on T.

The next error requires us to revisit the topic of move semantics in Rust. Except for primitive types, most standard library types and those found in third-party crates do not implement the Copy trait. In other words, they do not have copy semantics. Let's examine the function where the error occurs:

pub fn merge(&self, other: &Self) -> Result<Self, IntervalError> {
    if self.overlaps(other) {
        Ok(Self {
            start: self.start,
            end: other.end,
        })
    } else {
        Err(IntervalError::NonOverlappingInterval)
    }
}

The merge function borrows the values for self and other. If the two intervals overlap, a new interval is returned with the values copied into it. Since we have been using usize, a type that implements the Copy trait, this worked. To ensure it continues to work, we need to place another bound on T, limiting it to types that implement the Copy trait.

The compiler also mentions the Clone trait, which allows for explicit duplication of an object via the clone method. This means we could specify the trait bound as Clone instead of Copy and update the method to explicitly call clone.

With an understanding of trait bounds, let's examine how to impose these restrictions on our Interval. Does the program work now?

#![allow(unused_imports)]
extern crate regex; // this is needed for the playground
use interval::{Interval, IntervalError};
use itertools::Itertools;
use regex::Regex;
use std::fs::File;
use std::io::Read;
use std::io::{BufRead, BufReader};
use std::process::exit;

fn find_matching_lines(lines: &[String], regex: Regex) -> Vec<usize> {
    lines
        .iter()
        .enumerate()
        .filter_map(|(i, line)| match regex.is_match(line) {
            true => Some(i),
            false => None,
        })
        .collect() // turns anything iterable into a collection
}

fn create_intervals(
    lines: Vec<usize>,
    before_context: usize,
    after_context: usize,
) -> Result<Vec<Interval<usize>>, IntervalError> {
    lines
        .iter()
        .map(|line| {
            let start = line.saturating_sub(before_context);
            let end = line.saturating_add(after_context);
            Interval::new(start, end)
        })
        .collect()
}

fn merge_intervals(intervals: Vec<Interval<usize>>) -> Vec<Interval<usize>> {
    // merge overlapping intervals
    intervals
        .into_iter()
        .coalesce(|p, c| p.merge(&c).map_err(|_| (p, c)))
        .collect()
}

fn print_results(intervals: Vec<Interval<usize>>, lines: Vec<String>) {
    for interval in intervals {
        for (line_no, line) in lines
            .iter()
            .enumerate()
            .take(interval.end + 1)
            .skip(interval.start)
        {
            println!("{}: {}", line_no + 1, line)
        }
    }
}

fn read_file(file: impl Read) -> Vec<String> {
    BufReader::new(file).lines().map_while(Result::ok).collect()
}

fn main() {
    let poem = "I have a little shadow that goes in and out with me,
                And what can be the use of him is more than I can see.
                He is very, very like me from the heels up to the head;
                And I see him jump before me, when I jump into my bed.

                The funniest thing about him is the way he likes to grow -
                Not at all like proper children, which is always very slow;
                For he sometimes shoots up taller like an india-rubber ball,
                And he sometimes gets so little that there’s none of him at all.";

    let mock_file = std::io::Cursor::new(poem);

    // command line arguments
    let pattern = "(all)|(little)";
    let before_context = 1;
    let after_context = 1;

    // attempt to open the file
    let lines = read_file(mock_file);
    //let lines = match File::open(filename) {
    //    // convert the poem into lines
    //    Ok(file) => read_file(file),
    //    Err(e) => {
    //        eprintln!("Error opening {filename}: {e}");
    //        exit(1);
    //    }
    //};

    // compile the regular expression
    let regex = match Regex::new(pattern) {
        Ok(re) => re, // bind re to regex
        Err(e) => {
            eprintln!("{e}"); // write to standard error
            exit(1);
        }
    };

    // store the 0-based line number for any matched line
    let match_lines = find_matching_lines(&lines, regex);

    // create intervals of the form [a,b] with the before/after context
    let intervals =
        match create_intervals(match_lines, before_context, after_context) {
            Ok(intervals) => intervals,
            Err(_) => {
                eprintln!("An error occurred while creating intervals");
                exit(1);
            }
        };

    // merge overlapping intervals
    let intervals = merge_intervals(intervals);

    // print the lines
    print_results(intervals, lines);
}

pub mod interval {
    /// A list specifying general categories of Interval errors.
    pub enum IntervalError {
        /// Start is not less than or equal to end
        StartEndRangeInvalid,
        /// Two intervals to be merged do not overlap
        NonOverlappingInterval,
    }

    /// A closed-interval [`start`, `end`] type used for representing a range of
    /// values between `start` and `end` inclusively.
    ///
    /// # Examples
    ///
    /// You can create an `Interval` using `new`.
    ///
    /// ```rust
    /// let interval = Interval::new(1, 10).unwrap();
    /// assert_eq!(interval.start, 1);
    /// assert_eq!(interval.end, 10);
    /// ```
    pub struct Interval<T> {
        pub start: T,
        pub end: T,
    }

    impl<T: Copy + PartialOrd> Interval<T> {
        /// Creates a new `Interval` set to `start` and `end`.
        ///
        /// # Examples
        ///
        /// ```rust
        /// let interval = Interval::new(1, 10).unwrap();
        /// assert_eq!(interval.start, 1);
        /// assert_eq!(interval.end, 10);
        /// ```
        pub fn new(start: T, end: T) -> Result<Self, IntervalError> {
            if start <= end {
                Ok(Self { start, end })
            } else {
                Err(IntervalError::StartEndRangeInvalid)
            }
        }

        /// Checks if two intervals overlap. Overlapping intervals have at least
        /// one point in common.
        ///
        /// # Examples
        ///
        /// ```rust
        /// let a = Interval::new(1, 3).unwrap();
        /// let b = Interval::new(3, 5).unwrap();
        /// assert_eq!(a.overlaps(&b), true);
        /// assert_eq!(b.overlaps(&a), true);
        /// ```
        ///
        /// ```rust
        /// let a = Interval::new(1, 5).unwrap();
        /// let b = Interval::new(2, 4).unwrap();
        /// assert_eq!(a.overlaps(&b), true);
        /// assert_eq!(b.overlaps(&a), true);
        /// ```
        ///
        /// ```rust
        /// let a = Interval::new(1, 3).unwrap();
        /// let b = Interval::new(4, 6).unwrap();
        /// assert_eq!(a.overlaps(&b), false);
        /// assert_eq!(b.overlaps(&a), true);
        /// ```
        pub fn overlaps(&self, other: &Interval<T>) -> bool {
            self.end >= other.start
        }

        /// Merges two intervals returning a new `Interval`.
        ///
        /// The merged `Interval` range includes the union of ranges from each
        /// `Interval`.
        ///
        /// # Examples
        ///
        /// ```rust
        /// let a = Interval::new(1, 3).unwrap();
        /// let b = Interval::new(3, 5).unwrap();
        /// let c = a.merge(&b).unwrap();
        /// assert_eq!(c.start, 1);
        /// assert_eq!(c.end, 5);
        /// ```
        pub fn merge(&self, other: &Self) -> Result<Self, IntervalError> {
            if self.overlaps(other) {
                Ok(Self {
                    start: self.start,
                    end: other.end,
                })
            } else {
                Err(IntervalError::NonOverlappingInterval)
            }
        }
    }
}

Voila! And we now have a generic Interval module that can support any type implementing the Copy and PartialOrd traits!

Exercise

  • Modify the program to use the Clone trait instead of Copy.
Solution
impl<T: Clone + PartialOrd> Interval<T> {
    pub fn merge(&self, other: &Self) -> Result<Self, IntervalError> {
        if self.overlaps(other) {
            Ok(Self {
                start: self.start.clone(),
                end: other.end.clone(),
            })
        } else {
            Err(IntervalError::NonOverlappingInterval)
        }
    }
}
#![allow(unused_imports)]
extern crate regex; // this is needed for the playground
use interval::{Interval, IntervalError};
use itertools::Itertools;
use regex::Regex;
use std::fs::File;
use std::io::Read;
use std::io::{BufRead, BufReader};
use std::process::exit;

fn find_matching_lines(lines: &[String], regex: Regex) -> Vec<usize> {
    lines
        .iter()
        .enumerate()
        .filter_map(|(i, line)| match regex.is_match(line) {
            true => Some(i),
            false => None,
        })
        .collect() // turns anything iterable into a collection
}

fn create_intervals(
    lines: Vec<usize>,
    before_context: usize,
    after_context: usize,
) -> Result<Vec<Interval<usize>>, IntervalError> {
    lines
        .iter()
        .map(|line| {
            let start = line.saturating_sub(before_context);
            let end = line.saturating_add(after_context);
            Interval::new(start, end)
        })
        .collect()
}

fn merge_intervals(intervals: Vec<Interval<usize>>) -> Vec<Interval<usize>> {
    // merge overlapping intervals
    intervals
        .into_iter()
        .coalesce(|p, c| p.merge(&c).map_err(|_| (p, c)))
        .collect()
}

fn print_results(intervals: Vec<Interval<usize>>, lines: Vec<String>) {
    for interval in intervals {
        for (line_no, line) in lines
            .iter()
            .enumerate()
            .take(interval.end + 1)
            .skip(interval.start)
        {
            println!("{}: {}", line_no + 1, line)
        }
    }
}

fn read_file(file: impl Read) -> Vec<String> {
    BufReader::new(file).lines().map_while(Result::ok).collect()
}

fn main() {
    let poem = "I have a little shadow that goes in and out with me,
                And what can be the use of him is more than I can see.
                He is very, very like me from the heels up to the head;
                And I see him jump before me, when I jump into my bed.

                The funniest thing about him is the way he likes to grow -
                Not at all like proper children, which is always very slow;
                For he sometimes shoots up taller like an india-rubber ball,
                And he sometimes gets so little that there’s none of him at all.";

    let mock_file = std::io::Cursor::new(poem);

    // command line arguments
    let pattern = "(all)|(little)";
    let before_context = 1;
    let after_context = 1;

    // attempt to open the file
    let lines = read_file(mock_file);
    //let lines = match File::open(filename) {
    //    // convert the poem into lines
    //    Ok(file) => read_file(file),
    //    Err(e) => {
    //        eprintln!("Error opening {filename}: {e}");
    //        exit(1);
    //    }
    //};

    // compile the regular expression
    let regex = match Regex::new(pattern) {
        Ok(re) => re, // bind re to regex
        Err(e) => {
            eprintln!("{e}"); // write to standard error
            exit(1);
        }
    };

    // store the 0-based line number for any matched line
    let match_lines = find_matching_lines(&lines, regex);

    // create intervals of the form [a,b] with the before/after context
    let intervals =
        match create_intervals(match_lines, before_context, after_context) {
            Ok(intervals) => intervals,
            Err(_) => {
                eprintln!("An error occurred while creating intervals");
                exit(1);
            }
        };

    // merge overlapping intervals
    let intervals = merge_intervals(intervals);

    // print the lines
    print_results(intervals, lines);
}

pub mod interval {
    /// A list specifying general categories of Interval errors.
    pub enum IntervalError {
        /// Start is not less than or equal to end
        StartEndRangeInvalid,
        /// Two intervals to be merged do not overlap
        NonOverlappingInterval,
    }

    /// A closed-interval [`start`, `end`] type used for representing a range of
    /// values between `start` and `end` inclusively.
    ///
    /// # Examples
    ///
    /// You can create an `Interval` using `new`.
    ///
    /// ```rust
    /// let interval = Interval::new(1, 10).unwrap();
    /// assert_eq!(interval.start, 1);
    /// assert_eq!(interval.end, 10);
    /// ```
    pub struct Interval<T> {
        pub start: T,
        pub end: T,
    }

    impl<T: Copy + PartialOrd> Interval<T> {
        /// Creates a new `Interval` set to `start` and `end`.
        ///
        /// # Examples
        ///
        /// ```rust
        /// let interval = Interval::new(1, 10).unwrap();
        /// assert_eq!(interval.start, 1);
        /// assert_eq!(interval.end, 10);
        /// ```
        pub fn new(start: T, end: T) -> Result<Self, IntervalError> {
            if start <= end {
                Ok(Self { start, end })
            } else {
                Err(IntervalError::StartEndRangeInvalid)
            }
        }

        /// Checks if two intervals overlap. Overlapping intervals have at least
        /// one point in common.
        ///
        /// # Examples
        ///
        /// ```rust
        /// let a = Interval::new(1, 3).unwrap();
        /// let b = Interval::new(3, 5).unwrap();
        /// assert_eq!(a.overlaps(&b), true);
        /// assert_eq!(b.overlaps(&a), true);
        /// ```
        ///
        /// ```rust
        /// let a = Interval::new(1, 5).unwrap();
        /// let b = Interval::new(2, 4).unwrap();
        /// assert_eq!(a.overlaps(&b), true);
        /// assert_eq!(b.overlaps(&a), true);
        /// ```
        ///
        /// ```rust
        /// let a = Interval::new(1, 3).unwrap();
        /// let b = Interval::new(4, 6).unwrap();
        /// assert_eq!(a.overlaps(&b), false);
        /// assert_eq!(b.overlaps(&a), true);
        /// ```
        pub fn overlaps(&self, other: &Interval<T>) -> bool {
            self.end >= other.start
        }

        /// Merges two intervals returning a new `Interval`.
        ///
        /// The merged `Interval` range includes the union of ranges from each
        /// `Interval`.
        ///
        /// # Examples
        ///
        /// ```rust
        /// let a = Interval::new(1, 3).unwrap();
        /// let b = Interval::new(3, 5).unwrap();
        /// let c = a.merge(&b).unwrap();
        /// assert_eq!(c.start, 1);
        /// assert_eq!(c.end, 5);
        /// ```
        pub fn merge(&self, other: &Self) -> Result<Self, IntervalError> {
            if self.overlaps(other) {
                Ok(Self {
                    start: self.start,
                    end: other.end,
                })
            } else {
                Err(IntervalError::NonOverlappingInterval)
            }
        }
    }
}

Next

Let's dive deeper into traits!

1

The restrictions we place on the supported types are known as trait bounds.

2

For more information on impl Trait syntax, see the Rust book

Traits

In the previous section, we specified trait bounds to restrict the types that can be used with our Interval. Let's formalize what a trait is:

A trait is officially defined as a collection of methods for an unknown type: Self.1

In simpler terms, a trait allows you to define shared behavior in an abstract way. It specifies a set of methods that a type must implement, similar to interfaces in other programming languages.

The Rust By Example book includes many examples of traits and explores their usage in depth. Reviewing this material is highly recommended.

The compiler can automatically provide basic implementations for certain traits using the derive (#[derive]) attribute. However, if more complex behavior is needed, these traits can still be manually implemented. Here is a list of derivable traits:

  • Comparison traits: Eq, PartialEq, Ord, PartialOrd.
  • Clone, to create T from &T via a copy.
  • Copy, to give a type copy semantics instead of move semantics.
  • Hash, to compute a hash from &T.
  • Default, to create an empty instance of a data type.
  • Debug, to format a value using the {:?} formatter.

Deriving a Trait

Someone using our Interval might find it helpful to have an easy way to compare them. For instance, a user may want to check if two intervals are equal. The PartialEq trait, part of the cmp module, specifies two methods that any type must define to implement the PartialEq trait.

pub trait PartialEq<Rhs = Self>
where
    Rhs: ?Sized,
{
    // Required method
    fn eq(&self, other: &Rhs) -> bool;

    // Provided method
    fn ne(&self, other: &Rhs) -> bool { ... }
}

Notice the comment above the ne method that says "Provided method." This indicates that if we manually implement this trait, we only need to provide an implementation for eq.

Here's our revised Interval with support for PartialEq, thanks to the derive attribute and the Rust compiler. Give it a try!

The Debug trait has also been derived, enabling intervals to be printed in a programmer-facing context using the :? operator. This common pattern in Rust provides more helpful output in the program below.

use interval::Interval;

pub mod interval {
    /// A list specifying general categories of Interval errors.
    #[derive(Debug)]
    pub enum IntervalError {
        /// Start is not less than or equal to end
        StartEndRangeInvalid,
        /// Two intervals to be merged do not overlap
        NonOverlappingInterval,
    }

    /// A closed-interval [`start`, `end`] type used for representing a range of
    /// values between `start` and `end` inclusively.
    ///
    /// # Examples
    ///
    /// You can create an `Interval` using `new`.
    ///
    /// ```rust
    /// let interval = Interval::new(1, 10).unwrap();
    /// assert_eq!(interval.start, 1);
    /// assert_eq!(interval.end, 10);
    /// ```
    #[derive(Debug, PartialEq)]
    pub struct Interval<T> {
        pub start: T,
        pub end: T,
    }

    impl<T: Copy + PartialOrd> Interval<T> {
        /// Creates a new `Interval` set to `start` and `end`.
        ///
        /// # Examples
        ///
        /// ```rust
        /// let interval = Interval::new(1, 10).unwrap();
        /// assert_eq!(interval.start, 1);
        /// assert_eq!(interval.end, 10);
        /// ```
        pub fn new(start: T, end: T) -> Result<Self, IntervalError> {
            if start <= end {
                Ok(Self { start, end })
            } else {
                Err(IntervalError::StartEndRangeInvalid)
            }
        }

        /// Checks if two intervals overlap. Overlapping intervals have at least
        /// one point in common.
        ///
        /// # Examples
        ///
        /// ```rust
        /// let a = Interval::new(1, 3).unwrap();
        /// let b = Interval::new(3, 5).unwrap();
        /// assert_eq!(a.overlaps(&b), true);
        /// assert_eq!(b.overlaps(&a), true);
        /// ```
        ///
        /// ```rust
        /// let a = Interval::new(1, 5).unwrap();
        /// let b = Interval::new(2, 4).unwrap();
        /// assert_eq!(a.overlaps(&b), true);
        /// assert_eq!(b.overlaps(&a), true);
        /// ```
        ///
        /// ```rust
        /// let a = Interval::new(1, 3).unwrap();
        /// let b = Interval::new(4, 6).unwrap();
        /// assert_eq!(a.overlaps(&b), false);
        /// assert_eq!(b.overlaps(&a), true);
        /// ```
        pub fn overlaps(&self, other: &Interval<T>) -> bool {
            self.end >= other.start
        }

        /// Merges two intervals returning a new `Interval`.
        ///
        /// The merged `Interval` range includes the union of ranges from each
        /// `Interval`.
        ///
        /// # Examples
        ///
        /// ```rust
        /// let a = Interval::new(1, 3).unwrap();
        /// let b = Interval::new(3, 5).unwrap();
        /// let c = a.merge(&b).unwrap();
        /// assert_eq!(c.start, 1);
        /// assert_eq!(c.end, 5);
        /// ```
        pub fn merge(&self, other: &Self) -> Result<Self, IntervalError> {
            if self.overlaps(other) {
                Ok(Self {
                    start: self.start,
                    end: other.end,
                })
            } else {
                Err(IntervalError::NonOverlappingInterval)
            }
        }
    }
}

fn main() {
    let a = Interval::new(1, 5).unwrap(); // unwrap returns the Ok value or panics
    let b = Interval::new(1, 5).unwrap();
    let c = Interval::new(2, 4).unwrap();

    // Comparing intervals with eq and ne.
    println!("{:?} == {:?} => {}", a, b, a.eq(&b));
    println!("{:?} == {:?} => {}", a, c, a.eq(&c));
    println!("{:?} != {:?} => {}", a, b, a.ne(&b));
    println!("{:?} != {:?} => {}", a, c, a.ne(&c));

    // Rust supports operator overloading too!
    println!("{:?} == {:?} => {}", a, b, a == b);
    println!("{:?} != {:?} => {}", a, c, a != c);
}

The unwrap() method is implemented by Result and other types like Option. For Result, it returns the Ok value or panics if the value is Err. Its usage is generally discouraged and is only used here for brevity.

Operator Overloading

You might have noticed the comparison between intervals using == and !=. This works because Rust includes support for operator overloading! You can find all the detailed information in the Operator Overloading section of the Rust by Example book and in the ops module.

Implementing a Trait

Just as comparing traits for equality can be helpful, so can generating a user-facing representation of an Interval. By implementing the Display trait, we can achieve this quite efficiently. Additionally, implementing the Display trait automatically implements the ToString trait, enabling the use of the to_string() method. Let's enhance our Interval code to print an interval in the form [x, y].

Here's the trait definition for Display:

pub trait Display {
    // Required method
    fn fmt(&self, f: &mut Formatter<'_>) -> Result<(), Error>;
}

The <'_> suffix next to Formatter is known as lifetime annotation syntax.2

Here's our revised Interval with the Display trait implemented. Give it a try!

use interval::Interval;

pub mod interval {
    /// A list specifying general categories of Interval errors.
    #[derive(Debug)]
    pub enum IntervalError {
        /// Start is not less than or equal to end
        StartEndRangeInvalid,
        /// Two intervals to be merged do not overlap
        NonOverlappingInterval,
    }

    /// A closed-interval [`start`, `end`] type used for representing a range of
    /// values between `start` and `end` inclusively.
    ///
    /// # Examples
    ///
    /// You can create an `Interval` using `new`.
    ///
    /// ```rust
    /// let interval = Interval::new(1, 10).unwrap();
    /// assert_eq!(interval.start, 1);
    /// assert_eq!(interval.end, 10);
    /// ```
    #[derive(Debug, PartialEq)]
    pub struct Interval<T> {
        pub start: T,
        pub end: T,
    }

    use std::fmt;
    impl<T: fmt::Display> fmt::Display for Interval<T> {
        fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> fmt::Result {
            write!(f, "[{}, {}]", self.start, self.end)
        }
    }

    impl<T: Copy + PartialOrd> Interval<T> {
        /// Creates a new `Interval` set to `start` and `end`.
        ///
        /// # Examples
        ///
        /// ```rust
        /// let interval = Interval::new(1, 10).unwrap();
        /// assert_eq!(interval.start, 1);
        /// assert_eq!(interval.end, 10);
        /// ```
        pub fn new(start: T, end: T) -> Result<Self, IntervalError> {
            if start <= end {
                Ok(Self { start, end })
            } else {
                Err(IntervalError::StartEndRangeInvalid)
            }
        }

        /// Checks if two intervals overlap. Overlapping intervals have at least
        /// one point in common.
        ///
        /// # Examples
        ///
        /// ```rust
        /// let a = Interval::new(1, 3).unwrap();
        /// let b = Interval::new(3, 5).unwrap();
        /// assert_eq!(a.overlaps(&b), true);
        /// assert_eq!(b.overlaps(&a), true);
        /// ```
        ///
        /// ```rust
        /// let a = Interval::new(1, 5).unwrap();
        /// let b = Interval::new(2, 4).unwrap();
        /// assert_eq!(a.overlaps(&b), true);
        /// assert_eq!(b.overlaps(&a), true);
        /// ```
        ///
        /// ```rust
        /// let a = Interval::new(1, 3).unwrap();
        /// let b = Interval::new(4, 6).unwrap();
        /// assert_eq!(a.overlaps(&b), false);
        /// assert_eq!(b.overlaps(&a), true);
        /// ```
        pub fn overlaps(&self, other: &Interval<T>) -> bool {
            self.end >= other.start
        }

        /// Merges two intervals returning a new `Interval`.
        ///
        /// The merged `Interval` range includes the union of ranges from each
        /// `Interval`.
        ///
        /// # Examples
        ///
        /// ```rust
        /// let a = Interval::new(1, 3).unwrap();
        /// let b = Interval::new(3, 5).unwrap();
        /// let c = a.merge(&b).unwrap();
        /// assert_eq!(c.start, 1);
        /// assert_eq!(c.end, 5);
        /// ```
        pub fn merge(&self, other: &Self) -> Result<Self, IntervalError> {
            if self.overlaps(other) {
                Ok(Self {
                    start: self.start,
                    end: other.end,
                })
            } else {
                Err(IntervalError::NonOverlappingInterval)
            }
        }
    }
}

fn main() {
    let a = Interval::new(1, 5).unwrap(); // unwrap returns the Ok value or panics
    let b = Interval::new(1, 5).unwrap();
    let b_str = b.to_string();
    let c = Interval::new(2, 4).unwrap();

    println!(
        "Comparisons between three intervals: {}, {}, {}...",
        a, b_str, c
    );

    // Comparing intervals with eq and ne.
    println!("{} == {} => {}", a, b, a.eq(&b));
    println!("{} == {} => {}", a, c, a.eq(&c));
    println!("{} != {} => {}", a, b, a.ne(&b));
    println!("{} != {} => {}", a, c, a.ne(&c));

    // Rust supports operator overloading too!
    println!("{} == {} => {}", a, b, a == b);
    println!("{} != {} => {}", a, c, a != c);
}

Summary

  • A trait defines shared behavior abstractly, specifying methods a type must implement, similar to interfaces in other languages.
  • The compiler can automatically provide basic implementations for certain traits using the #[derive] attribute.
  • Rust includes support for operator overloading.

Exercise

  • Implement the PartialOrd trait for Interval. To keep the code simple, you can return None for overlapping intervals.
use interval::Interval;

pub mod interval {
    /// A list specifying general categories of Interval errors.
    #[derive(Debug)]
    pub enum IntervalError {
        /// Start is not less than or equal to end
        StartEndRangeInvalid,
        /// Two intervals to be merged do not overlap
        NonOverlappingInterval,
    }

    /// A closed-interval [`start`, `end`] type used for representing a range of
    /// values between `start` and `end` inclusively.
    ///
    /// # Examples
    ///
    /// You can create an `Interval` using `new`.
    ///
    /// ```rust
    /// let interval = Interval::new(1, 10).unwrap();
    /// assert_eq!(interval.start, 1);
    /// assert_eq!(interval.end, 10);
    /// ```
    #[derive(Debug, PartialEq)]
    pub struct Interval<T> {
        pub start: T,
        pub end: T,
    }

    use std::fmt;
    impl<T: fmt::Display> fmt::Display for Interval<T> {
        fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> fmt::Result {
            write!(f, "[{}, {}]", self.start, self.end)
        }
    }

    impl<T: Copy + PartialOrd> Interval<T> {
        /// Creates a new `Interval` set to `start` and `end`.
        ///
        /// # Examples
        ///
        /// ```rust
        /// let interval = Interval::new(1, 10).unwrap();
        /// assert_eq!(interval.start, 1);
        /// assert_eq!(interval.end, 10);
        /// ```
        pub fn new(start: T, end: T) -> Result<Self, IntervalError> {
            if start <= end {
                Ok(Self { start, end })
            } else {
                Err(IntervalError::StartEndRangeInvalid)
            }
        }

        /// Checks if two intervals overlap. Overlapping intervals have at least
        /// one point in common.
        ///
        /// # Examples
        ///
        /// ```rust
        /// let a = Interval::new(1, 3).unwrap();
        /// let b = Interval::new(3, 5).unwrap();
        /// assert_eq!(a.overlaps(&b), true);
        /// assert_eq!(b.overlaps(&a), true);
        /// ```
        ///
        /// ```rust
        /// let a = Interval::new(1, 5).unwrap();
        /// let b = Interval::new(2, 4).unwrap();
        /// assert_eq!(a.overlaps(&b), true);
        /// assert_eq!(b.overlaps(&a), true);
        /// ```
        ///
        /// ```rust
        /// let a = Interval::new(1, 3).unwrap();
        /// let b = Interval::new(4, 6).unwrap();
        /// assert_eq!(a.overlaps(&b), false);
        /// assert_eq!(b.overlaps(&a), true);
        /// ```
        pub fn overlaps(&self, other: &Interval<T>) -> bool {
            self.end >= other.start
        }

        /// Merges two intervals returning a new `Interval`.
        ///
        /// The merged `Interval` range includes the union of ranges from each
        /// `Interval`.
        ///
        /// # Examples
        ///
        /// ```rust
        /// let a = Interval::new(1, 3).unwrap();
        /// let b = Interval::new(3, 5).unwrap();
        /// let c = a.merge(&b).unwrap();
        /// assert_eq!(c.start, 1);
        /// assert_eq!(c.end, 5);
        /// ```
        pub fn merge(&self, other: &Self) -> Result<Self, IntervalError> {
            if self.overlaps(other) {
                Ok(Self {
                    start: self.start,
                    end: other.end,
                })
            } else {
                Err(IntervalError::NonOverlappingInterval)
            }
        }
    }
}

fn main() {
    use std::cmp::Ordering::{Equal, Greater, Less};

    let a = Interval::new(0, 1).unwrap();
    let b = Interval::new(0, 1).unwrap();
    assert_eq!(a.partial_cmp(&b), Some(Equal));
    assert_eq!(b.partial_cmp(&a), Some(Equal));

    let a = Interval::new(0, 1).unwrap();
    let b = Interval::new(2, 3).unwrap();
    assert_eq!(a.partial_cmp(&b), Some(Less));
    assert_eq!(b.partial_cmp(&a), Some(Greater));

    println!("Nice Work!");
}
Solution
use interval::Interval;

pub mod interval {
    /// A list specifying general categories of Interval errors.
    #[derive(Debug)]
    pub enum IntervalError {
        /// Start is not less than or equal to end
        StartEndRangeInvalid,
        /// Two intervals to be merged do not overlap
        NonOverlappingInterval,
    }

    /// A closed-interval [`start`, `end`] type used for representing a range of
    /// values between `start` and `end` inclusively.
    ///
    /// # Examples
    ///
    /// You can create an `Interval` using `new`.
    ///
    /// ```rust
    /// let interval = Interval::new(1, 10).unwrap();
    /// assert_eq!(interval.start, 1);
    /// assert_eq!(interval.end, 10);
    /// ```
    #[derive(Debug, PartialEq)]
    pub struct Interval<T> {
        pub start: T,
        pub end: T,
    }

    use std::fmt;
    impl<T: fmt::Display> fmt::Display for Interval<T> {
        fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> fmt::Result {
            write!(f, "[{}, {}]", self.start, self.end)
        }
    }

    use std::cmp::Ordering;
    impl<T: PartialEq + PartialOrd> PartialOrd for Interval<T> {
        fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
            if self == other {
                Some(Ordering::Equal)
            } else if self.end < other.start {
                Some(Ordering::Less)
            } else if self.start > other.end {
                Some(Ordering::Greater)
            } else {
                None // Intervals overlap
            }
        }
    }

    impl<T: Copy + PartialOrd> Interval<T> {
        /// Creates a new `Interval` set to `start` and `end`.
        ///
        /// # Examples
        ///
        /// ```rust
        /// let interval = Interval::new(1, 10).unwrap();
        /// assert_eq!(interval.start, 1);
        /// assert_eq!(interval.end, 10);
        /// ```
        pub fn new(start: T, end: T) -> Result<Self, IntervalError> {
            if start <= end {
                Ok(Self { start, end })
            } else {
                Err(IntervalError::StartEndRangeInvalid)
            }
        }

        /// Checks if two intervals overlap. Overlapping intervals have at least
        /// one point in common.
        ///
        /// # Examples
        ///
        /// ```rust
        /// let a = Interval::new(1, 3).unwrap();
        /// let b = Interval::new(3, 5).unwrap();
        /// assert_eq!(a.overlaps(&b), true);
        /// assert_eq!(b.overlaps(&a), true);
        /// ```
        ///
        /// ```rust
        /// let a = Interval::new(1, 5).unwrap();
        /// let b = Interval::new(2, 4).unwrap();
        /// assert_eq!(a.overlaps(&b), true);
        /// assert_eq!(b.overlaps(&a), true);
        /// ```
        ///
        /// ```rust
        /// let a = Interval::new(1, 3).unwrap();
        /// let b = Interval::new(4, 6).unwrap();
        /// assert_eq!(a.overlaps(&b), false);
        /// assert_eq!(b.overlaps(&a), true);
        /// ```
        pub fn overlaps(&self, other: &Interval<T>) -> bool {
            self.end >= other.start
        }

        /// Merges two intervals returning a new `Interval`.
        ///
        /// The merged `Interval` range includes the union of ranges from each
        /// `Interval`.
        ///
        /// # Examples
        ///
        /// ```rust
        /// let a = Interval::new(1, 3).unwrap();
        /// let b = Interval::new(3, 5).unwrap();
        /// let c = a.merge(&b).unwrap();
        /// assert_eq!(c.start, 1);
        /// assert_eq!(c.end, 5);
        /// ```
        pub fn merge(&self, other: &Self) -> Result<Self, IntervalError> {
            if self.overlaps(other) {
                Ok(Self {
                    start: self.start,
                    end: other.end,
                })
            } else {
                Err(IntervalError::NonOverlappingInterval)
            }
        }
    }
}

fn main() {
    use std::cmp::Ordering::{Equal, Greater, Less};

    let a = Interval::new(0, 1).unwrap();
    let b = Interval::new(0, 1).unwrap();
    assert_eq!(a.partial_cmp(&b), Some(Equal));
    assert_eq!(b.partial_cmp(&a), Some(Equal));

    let a = Interval::new(0, 1).unwrap();
    let b = Interval::new(2, 3).unwrap();
    assert_eq!(a.partial_cmp(&b), Some(Less));
    assert_eq!(b.partial_cmp(&a), Some(Greater));

    println!("Nice Work!");
}

Next

Let's combine all the code and take a brief moment to explain the attributes.


1

https://doc.rust-lang.org/rust-by-example/trait.html

2

This is a separate topic that you can learn more about in the Validating References with Lifetimes section of the Rust Programming Language and the Lifetimes section of Rust By Example.

Grep

Now that we've covered traits, let's ensure our projects are synchronized. Our interval module has been enhanced to support several new capabilities. We can print programmer-facing intervals using the Debug trait and user-facing intervals using the Display trait. Additionally, we have limited support for comparing intervals through the PartialEq and PartialOrd traits. Here's the completed code:

#![allow(unused_imports)]
extern crate regex; // this is needed for the playground
use interval::{Interval, IntervalError};
use itertools::Itertools;
use regex::Regex;
use std::fs::File;
use std::io::Read;
use std::io::{BufRead, BufReader};
use std::process::exit;

fn find_matching_lines(lines: &[String], regex: Regex) -> Vec<usize> {
    lines
        .iter()
        .enumerate()
        .filter_map(|(i, line)| match regex.is_match(line) {
            true => Some(i),
            false => None,
        })
        .collect() // turns anything iterable into a collection
}

fn create_intervals(
    lines: Vec<usize>,
    before_context: usize,
    after_context: usize,
) -> Result<Vec<Interval<usize>>, IntervalError> {
    lines
        .iter()
        .map(|line| {
            let start = line.saturating_sub(before_context);
            let end = line.saturating_add(after_context);
            Interval::new(start, end)
        })
        .collect()
}

fn merge_intervals(intervals: Vec<Interval<usize>>) -> Vec<Interval<usize>> {
    // merge overlapping intervals
    intervals
        .into_iter()
        .coalesce(|p, c| p.merge(&c).map_err(|_| (p, c)))
        .collect()
}

fn print_results(intervals: Vec<Interval<usize>>, lines: Vec<String>) {
    for interval in intervals {
        for (line_no, line) in lines
            .iter()
            .enumerate()
            .take(interval.end + 1)
            .skip(interval.start)
        {
            println!("{}: {}", line_no + 1, line)
        }
    }
}

fn read_file(file: impl Read) -> Vec<String> {
    BufReader::new(file).lines().map_while(Result::ok).collect()
}

fn main() {
    let poem = "I have a little shadow that goes in and out with me,
                And what can be the use of him is more than I can see.
                He is very, very like me from the heels up to the head;
                And I see him jump before me, when I jump into my bed.

                The funniest thing about him is the way he likes to grow -
                Not at all like proper children, which is always very slow;
                For he sometimes shoots up taller like an india-rubber ball,
                And he sometimes gets so little that there’s none of him at all.";

    let mock_file = std::io::Cursor::new(poem);

    // command line arguments
    let pattern = "(all)|(little)";
    let before_context = 1;
    let after_context = 1;

    // attempt to open the file
    let lines = read_file(mock_file);
    //let lines = match File::open(filename) {
    //    // convert the poem into lines
    //    Ok(file) => read_file(file),
    //    Err(e) => {
    //        eprintln!("Error opening {filename}: {e}");
    //        exit(1);
    //    }
    //};

    // compile the regular expression
    let regex = match Regex::new(pattern) {
        Ok(re) => re, // bind re to regex
        Err(e) => {
            eprintln!("{e}"); // write to standard error
            exit(1);
        }
    };

    // store the 0-based line number for any matched line
    let match_lines = find_matching_lines(&lines, regex);

    // create intervals of the form [a,b] with the before/after context
    let intervals =
        match create_intervals(match_lines, before_context, after_context) {
            Ok(intervals) => intervals,
            Err(_) => {
                eprintln!("An error occurred while creating intervals");
                exit(1);
            }
        };

    // merge overlapping intervals
    let intervals = merge_intervals(intervals);

    // print the lines
    print_results(intervals, lines);
}

pub mod interval {
    /// A list specifying general categories of Interval errors.
    #[derive(Debug)]
    pub enum IntervalError {
        /// Start is not less than or equal to end
        StartEndRangeInvalid,
        /// Two intervals to be merged do not overlap
        NonOverlappingInterval,
    }

    /// A closed-interval [`start`, `end`] type used for representing a range of
    /// values between `start` and `end` inclusively.
    ///
    /// # Examples
    ///
    /// You can create an `Interval` using `new`.
    ///
    /// ```rust
    /// let interval = Interval::new(1, 10).unwrap();
    /// assert_eq!(interval.start, 1);
    /// assert_eq!(interval.end, 10);
    /// ```
    #[derive(Debug, PartialEq)]
    pub struct Interval<T> {
        pub start: T,
        pub end: T,
    }

    impl<T: Copy + PartialOrd> Interval<T> {
        /// Creates a new `Interval` set to `start` and `end`.
        ///
        /// # Examples
        ///
        /// ```rust
        /// let interval = Interval::new(1, 10).unwrap();
        /// assert_eq!(interval.start, 1);
        /// assert_eq!(interval.end, 10);
        /// ```
        pub fn new(start: T, end: T) -> Result<Self, IntervalError> {
            if start <= end {
                Ok(Self { start, end })
            } else {
                Err(IntervalError::StartEndRangeInvalid)
            }
        }

        /// Checks if two intervals overlap. Overlapping intervals have at least
        /// one point in common.
        ///
        /// # Examples
        ///
        /// ```rust
        /// let a = Interval::new(1, 3).unwrap();
        /// let b = Interval::new(3, 5).unwrap();
        /// assert_eq!(a.overlaps(&b), true);
        /// assert_eq!(b.overlaps(&a), true);
        /// ```
        ///
        /// ```rust
        /// let a = Interval::new(1, 5).unwrap();
        /// let b = Interval::new(2, 4).unwrap();
        /// assert_eq!(a.overlaps(&b), true);
        /// assert_eq!(b.overlaps(&a), true);
        /// ```
        ///
        /// ```rust
        /// let a = Interval::new(1, 3).unwrap();
        /// let b = Interval::new(4, 6).unwrap();
        /// assert_eq!(a.overlaps(&b), false);
        /// assert_eq!(b.overlaps(&a), true);
        /// ```
        pub fn overlaps(&self, other: &Interval<T>) -> bool {
            self.end >= other.start
        }

        /// Merges two intervals returning a new `Interval`.
        ///
        /// The merged `Interval` range includes the union of ranges from each
        /// `Interval`.
        ///
        /// # Examples
        ///
        /// ```rust
        /// let a = Interval::new(1, 3).unwrap();
        /// let b = Interval::new(3, 5).unwrap();
        /// let c = a.merge(&b).unwrap();
        /// assert_eq!(c.start, 1);
        /// assert_eq!(c.end, 5);
        /// ```
        pub fn merge(&self, other: &Self) -> Result<Self, IntervalError> {
            if self.overlaps(other) {
                Ok(Self {
                    start: self.start,
                    end: other.end,
                })
            } else {
                Err(IntervalError::NonOverlappingInterval)
            }
        }
    }

    use std::fmt;
    impl<T: fmt::Display> fmt::Display for Interval<T> {
        fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> fmt::Result {
            write!(f, "[{}, {}]", self.start, self.end)
        }
    }

    use std::cmp::Ordering;
    impl<T: PartialEq + PartialOrd> PartialOrd for Interval<T> {
        fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
            if self == other {
                Some(Ordering::Equal)
            } else if self.end < other.start {
                Some(Ordering::Less)
            } else if self.start > other.end {
                Some(Ordering::Greater)
            } else {
                None // Intervals overlap
            }
        }
    }
}

Next

Let's shift our focus to attributes!

Attributes

Throughout this course, we've been using attributes, but we haven't delved into them yet. Let's briefly discuss the attributes you've encountered and provide some useful resources for further exploration.

At a high level, an attribute is metadata applied to a crate, module, or item. This metadata serves various purposes. Attributes follow this syntax:

InnerAttribute :
   # ! [ Attr ]

OuterAttribute :
   # [ Attr ]

Attr :
      SimplePath AttrInput?
   | unsafe ( SimplePath AttrInput? )

AttrInput :
      DelimTokenTree
   | = Expression

Attributes come in two forms: InnerAttribute and OuterAttribute. The difference between them is that an InnerAttribute includes a bang (!) between the # and the [. An InnerAttribute applies to the item it is declared within, while an OuterAttribute applies to whatever follows it.

Inner Attributes

For example, in our grep program, we've used the inner attribute #![allow(unused_imports)]. By default, the Rust compiler issues warnings if your program imports packages via use but doesn't actually utilize them. Our program imports the File module with use std::fs::File;, but since we are using Cursor for development and have commented out the file I/O code, the compiler complains about the unused import. This attribute disables that warning for the crate.

Outer Attributes

Recall in the last section, we derived the Debug and PartialEq traits. We achieved this using the #[derive(...)] attribute. Notice how this attribute is placed directly before the definition of the Interval struct. This is an example of an OuterAttribute.

#[derive(Debug, PartialEq)]
pub struct Interval<T> {
    pub start: T,
    pub end: T,
}

Classification of Attributes

Attributes are commonly grouped into the following categories:

Summary

The Rust Reference provides an in-depth look at attributes, and you are encouraged to explore it for a comprehensive understanding of their uses. Additionally, Rust By Example offers extra material and illustrates various use cases.

Next

Attributes offer even greater utility, which we'll explore through command line argument parsing!

Command Line Arguments

In the Cargo.toml section, you were asked to add the clap crate as a dependency to support command line argument parsing for our grep program. With our current understanding of attributes, it's an ideal time to utilize this crate by extending our grep program to handle command line arguments.

One method for defining command line arguments with clap involves using custom attributes. This is why we specified the derive feature in the dependency.

If you didn't complete the exercise and are working through this course locally, you can add the dependency to the grep crate with:

$ cargo add clap --features derive

Advanced Attributes

The topic of macro attributes and derive macro helper attributes is quite advanced and beyond the scope of this course. Therefore, we won't delve into their inner workings in depth. However, keep this section in mind as you continue your Rust journey, because the time will likely come when you will need to implement similar attributes yourself for code generation.

Extending Grep

We're going to add command line argument support to our program. However, we'll continue to mock the command line arguments to keep developing this course online. After configuring clap, this will be the auto-generated help (i.e., typing grep --help in the terminal).

$ grep.exe --help
Usage: grep.exe [OPTIONS] <PATTERN> [FILES]...

Arguments:
  <PATTERN>   The regular expression to match
  [FILES]...  List of files to search

Options:
  -l, --line-number           Prefix each line of output with the 1-based line
                              number within its input file
  -b, --before-context <num>  Print num lines of trailing context before matching
                              lines [default: 0]
  -a, --after-context <num>   Print num lines of trailing context after matching
                              lines [default: 0]
  -h, --help                  Print help
  -V, --version               Print version

The CLAP documentation is very extensive and includes plenty of examples to learn from. If you find yourself building a CLI application, you will most certainly want to reference the documentation.

Here's the updated version of grep with command line argument support added:

#![allow(unused_imports)]
extern crate regex; // this is needed for the playground
use clap::Parser;
use interval::{Interval, IntervalError};
use itertools::Itertools;
use regex::Regex;
use std::fs::File;
use std::io::Read;
use std::io::{BufRead, BufReader};
use std::path::PathBuf;
use std::process::exit;

fn find_matching_lines(lines: &[String], regex: Regex) -> Vec<usize> {
    lines
        .iter()
        .enumerate()
        .filter_map(|(i, line)| match regex.is_match(line) {
            true => Some(i),
            false => None,
        })
        .collect() // turns anything iterable into a collection
}

fn create_intervals(
    lines: Vec<usize>,
    before_context: usize,
    after_context: usize,
) -> Result<Vec<Interval<usize>>, IntervalError> {
    lines
        .iter()
        .map(|line| {
            let start = line.saturating_sub(before_context);
            let end = line.saturating_add(after_context);
            Interval::new(start, end)
        })
        .collect()
}

fn merge_intervals(intervals: Vec<Interval<usize>>) -> Vec<Interval<usize>> {
    // merge overlapping intervals
    intervals
        .into_iter()
        .coalesce(|p, c| p.merge(&c).map_err(|_| (p, c)))
        .collect()
}

fn print_results(
    intervals: Vec<Interval<usize>>,
    lines: Vec<String>,
    line_number: bool,
) {
    for interval in intervals {
        for (line_no, line) in lines
            .iter()
            .enumerate()
            .take(interval.end + 1)
            .skip(interval.start)
        {
            if line_number {
                print!("{}: ", line_no + 1);
            }
            println!("{}", line);
        }
    }
}

fn read_file(file: impl Read) -> Vec<String> {
    BufReader::new(file).lines().map_while(Result::ok).collect()
}

#[derive(Parser)]
#[command(version, about, long_about = None)]
struct Cli {
    /// Prefix each line of output with the 1-based line number within its
    /// input file.
    #[arg(short, long, default_value_t = false)]
    line_number: bool,

    /// Print num lines of trailing context before matching lines.
    #[arg(short, long, default_value_t = 0, value_name = "num")]
    before_context: u8,

    /// Print num lines of trailing context after matching lines.
    #[arg(short, long, default_value_t = 0, value_name = "num")]
    after_context: u8,

    /// The regular expression to match.
    #[arg(required = true)]
    pattern: String,

    /// List of files to search.
    #[arg(required = true)]
    files: Vec<PathBuf>,
}

fn main() {
    let poem = "I have a little shadow that goes in and out with me,
                And what can be the use of him is more than I can see.
                He is very, very like me from the heels up to the head;
                And I see him jump before me, when I jump into my bed.

                The funniest thing about him is the way he likes to grow -
                Not at all like proper children, which is always very slow;
                For he sometimes shoots up taller like an india-rubber ball,
                And he sometimes gets so little that there’s none of him at all.";

    let mock_file = std::io::Cursor::new(poem);

    // let cli = Cli::parse(); // for production use
    // mock command line arguments
    let cli = match Cli::try_parse_from([
        "grep", // executable name
        "--line-number",
        "--before-context",
        "1",
        "--after-context",
        "1",
        "(all)|(little)", // pattern
        "poem.txt",       // file
    ]) {
        Ok(cli) => cli,
        Err(e) => {
            eprintln!("Error parsing command line arguments: {e:?}");
            exit(1);
        }
    };

    // get values from clap
    let pattern = cli.pattern;
    let line_number = cli.line_number;
    let before_context = cli.before_context as usize;
    let after_context = cli.after_context as usize;

    // attempt to open the file
    let lines = read_file(mock_file);
    //let lines = match File::open(filename) {
    //    // convert the poem into lines
    //    Ok(file) => read_file(file),
    //    Err(e) => {
    //        eprintln!("Error opening {filename}: {e}");
    //        exit(1);
    //    }
    //};

    // compile the regular expression
    let regex = match Regex::new(&pattern) {
        Ok(re) => re, // bind re to regex
        Err(e) => {
            eprintln!("{e}"); // write to standard error
            exit(1);
        }
    };

    // store the 0-based line number for any matched line
    let match_lines = find_matching_lines(&lines, regex);

    // create intervals of the form [a,b] with the before/after context
    let intervals =
        match create_intervals(match_lines, before_context, after_context) {
            Ok(intervals) => intervals,
            Err(_) => {
                eprintln!("An error occurred while creating intervals");
                exit(1);
            }
        };

    // merge overlapping intervals
    let intervals = merge_intervals(intervals);

    // print the lines
    print_results(intervals, lines, line_number);
}

pub mod interval {
    /// A list specifying general categories of Interval errors.
    #[derive(Debug)]
    pub enum IntervalError {
        /// Start is not less than or equal to end
        StartEndRangeInvalid,
        /// Two intervals to be merged do not overlap
        NonOverlappingInterval,
    }

    /// A closed-interval [`start`, `end`] type used for representing a range of
    /// values between `start` and `end` inclusively.
    ///
    /// # Examples
    ///
    /// You can create an `Interval` using `new`.
    ///
    /// ```rust
    /// let interval = Interval::new(1, 10).unwrap();
    /// assert_eq!(interval.start, 1);
    /// assert_eq!(interval.end, 10);
    /// ```
    #[derive(Debug, PartialEq)]
    pub struct Interval<T> {
        pub start: T,
        pub end: T,
    }

    impl<T: Copy + PartialOrd> Interval<T> {
        /// Creates a new `Interval` set to `start` and `end`.
        ///
        /// # Examples
        ///
        /// ```rust
        /// let interval = Interval::new(1, 10).unwrap();
        /// assert_eq!(interval.start, 1);
        /// assert_eq!(interval.end, 10);
        /// ```
        pub fn new(start: T, end: T) -> Result<Self, IntervalError> {
            if start <= end {
                Ok(Self { start, end })
            } else {
                Err(IntervalError::StartEndRangeInvalid)
            }
        }

        /// Checks if two intervals overlap. Overlapping intervals have at least
        /// one point in common.
        ///
        /// # Examples
        ///
        /// ```rust
        /// let a = Interval::new(1, 3).unwrap();
        /// let b = Interval::new(3, 5).unwrap();
        /// assert_eq!(a.overlaps(&b), true);
        /// assert_eq!(b.overlaps(&a), true);
        /// ```
        ///
        /// ```rust
        /// let a = Interval::new(1, 5).unwrap();
        /// let b = Interval::new(2, 4).unwrap();
        /// assert_eq!(a.overlaps(&b), true);
        /// assert_eq!(b.overlaps(&a), true);
        /// ```
        ///
        /// ```rust
        /// let a = Interval::new(1, 3).unwrap();
        /// let b = Interval::new(4, 6).unwrap();
        /// assert_eq!(a.overlaps(&b), false);
        /// assert_eq!(b.overlaps(&a), true);
        /// ```
        pub fn overlaps(&self, other: &Interval<T>) -> bool {
            self.end >= other.start
        }

        /// Merges two intervals returning a new `Interval`.
        ///
        /// The merged `Interval` range includes the union of ranges from each
        /// `Interval`.
        ///
        /// # Examples
        ///
        /// ```rust
        /// let a = Interval::new(1, 3).unwrap();
        /// let b = Interval::new(3, 5).unwrap();
        /// let c = a.merge(&b).unwrap();
        /// assert_eq!(c.start, 1);
        /// assert_eq!(c.end, 5);
        /// ```
        pub fn merge(&self, other: &Self) -> Result<Self, IntervalError> {
            if self.overlaps(other) {
                Ok(Self {
                    start: self.start,
                    end: other.end,
                })
            } else {
                Err(IntervalError::NonOverlappingInterval)
            }
        }
    }

    use std::fmt;
    impl<T: fmt::Display> fmt::Display for Interval<T> {
        fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> fmt::Result {
            write!(f, "[{}, {}]", self.start, self.end)
        }
    }

    use std::cmp::Ordering;
    impl<T: PartialEq + PartialOrd> PartialOrd for Interval<T> {
        fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
            if self == other {
                Some(Ordering::Equal)
            } else if self.end < other.start {
                Some(Ordering::Less)
            } else if self.start > other.end {
                Some(Ordering::Greater)
            } else {
                None // Intervals overlap
            }
        }
    }
}

Summary

We made some minor code changes along with configuring clap. Most of it should be straightforward by now, so we'll focus on the new aspects:

  • The PathBuf module facilitates cross-platform path manipulation. Since our grep program requires files to search, it's better to use PathBuf for handling that input.
  • We updated print_results to take a new boolean argument, line_number, which specifies whether or not to print line numbers in the output. If true, the print! macro is used to output the line number without emitting a newline.
  • We used clap attributes to derive our command line arguments. The clap documentation covers these attributes in full detail.
  • In main, we mock command line arguments with an array of string slices that gets parsed by clap.

    The match expression is for development purposes. If we made any mistakes in our mock command line arguments, we would catch the error and print it.

  • Since we defined the before_context and after_context variables as u8 in the Cli structure but use usize throughout the code, an explicit cast using as is necessary when assigning these values to local variables.

Next

Onward to concurrency!

Concurrency

We're nearing the end of the course, and there's no better way to wrap things up than with concurrency! Rust, like many other languages, includes a native thread module for spawning threads and performing tasks in parallel. However, numerous crates build on Rust's native multithreading capabilities, simplifying the process and offering various threading patterns.

For instance, if you've programmed in Go, you're likely familiar with channels for moving data between goroutines. Similarly, if you've worked with JavaScript, you probably know about async and await. Both of these paradigms, along with many others, are available as crates in Rust.

Here's a few examples worth exploring:

  • crossbeam: A set of tools for concurrent programming
  • rayon: A data-parallelism library for Rust.
  • tokio: An event-driven, non-blocking I/O platform for writing asynchronous I/O backed applications.

Let's dive into multithreading!

Non-scoped Threads

The simplest way to enable multithreading support in our Grep program is to spawn a thread for each specified file, perform the pattern matching steps we've developed, and print the results using a fork/join pattern. If you have a C++ background, you might start by exploring the functions in the thread module, discovering the spawn function, and diving right in.

However, this approach is flawed! While it might be a valuable learning experience, it will likely be time-consuming, frustrating, and confusing. Let's examine some of the pitfalls of this approach and then explore a more effective solution.

The spawn Function

Here's the signature for the spawn function:

pub fn spawn<F, T>(f: F) -> JoinHandle<T>
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,

The spawn function takes a closure (f) as an argument, spawns a new thread to run the closure, and returns a JoinHandle. As the name suggests, the JoinHandle provides a join method to wait for the spawned thread to finish. So far, so good. However, the 'static1 constraint on F and the return value T is where, as the Bard would say, lies the rub.

The 'static constraint requires that the closure and its return value must live for the entire duration of the program's execution. This is necessary because threads can outlive the context in which they were created.

If a thread, and by extension its return value, can outlive its caller, we need to ensure they remain valid afterward. Since we can't predict when the thread will finish, they must be valid until the end of the program. Hence, the 'static lifetime.

Don't worry if this concept is still unclear; it's one of the most challenging aspects of Rust, and many developers find it difficult at first.

Let's examine a simple program to better understand these concepts. Review the following code snippet and then run the program.

fn main() {
    use std::thread;

    let error = 0373; // Compiler error E0373

    let handler = thread::spawn(|| {
        println!("{error}");
    });

    handler.join().unwrap();
}

This simple program seems perfectly valid. So why does the Rust compiler complain about code that would run without issues in most other languages? Even though it's not the case in this example, in a more general sense, the spawned thread created in main could potentially outlive the thread from which it was created. Because the closure borrows the error value, the parent thread could go out of scope, causing the error value to be dropped rendering the reference invalid. Because the compiler can't know whether or not that's going to happen, we see the error:

error[E0373]: closure may outlive the current function, but it borrows error, which is owned by the current function

The compiler does offer a solution that may work under some circumstances:

help: to force the closure to take ownership of error (and any other referenced variables), use the move 2 keyword

Let's see what happens if we do that:

fn main() {
    use std::thread;

    let error = 0373; // Compiler error E0373

    let handler = thread::spawn(move || {
        println!("{error}");
    });

    handler.join().unwrap();
}

Recalling our discussion on copy semantics, it probably makes sense why this worked. The value was copied into the scope of the spawned thread, making it independent from the original and solving the lifetime problem. The variable is now guaranteed to live for as long as the thread. This works fine for primitive types and types implementing the Copy trait. But what if copying is an expensive operation, or the object can't be copied? This leads to the next challenge that developers learning Rust often face: what happens if we need to spawn additional threads that also need access to the same variable?

For example, consider this case:

fn main() {
    use std::thread;

    let error = String::from("E0373"); // Compiler error E0373

    let handler1 = thread::spawn(move || {
        println!("{error}");
    });

    let handler2 = thread::spawn(move || {
        println!("{error}");
    });

    handler1.join().unwrap();
    handler2.join().unwrap();
}

While these strict rules may seem daunting, especially in larger programs, they are the foundation of Rust's fearless concurrency goals. These rules ensure that concurrency in your programs is both safe and efficient.

Now that we understand why the traditional fork/join model, which works in many other languages, is likely to fail in Rust, let's explore how to correctly implement this approach!


1

Lifetimes are another type of generic. However, instead of ensuring that a type has the desired behavior, they ensure that references remain valid for the desired duration.

2

move converts any variables captured by reference or mutable reference to variables captured by value.

Scoped Threads

In the previous section, we explored the spawn method in the thread module and noted its limitation: it cannot borrow non-'static data because the compiler cannot guarantee that all threads will be joined before the lifetimes of any borrowed values expire. To address this issue, the thread module also offers the scope function, which works in conjunction with Scope. This combination allows for borrowing non-'static data by ensuring that all spawned threads within the scope are automatically joined before the function in which they were created returns, unless they were manually joined earlier.

The scope Function

Here's the signature for the scope function:

pub fn scope<'env, F, T>(f: F) -> T
where
    F: for<'scope> FnOnce(&'scope Scope<'scope, 'env>) -> T,

The scope function takes a closure (f) as an argument, which receives a Scope object (created by scope). This Scope object allows threads to be spawned using the spawn method. The spawn method returns a ScopedJoinHandle, which, as the name suggests, provides a join method to wait for the spawned thread to complete.

Scoped threads involve two lifetimes: 'scope and 'env:

  • 'env: This is a lifetime parameter that represents the lifetime of the environment data that the Scope can borrow (meaning the data from outside the scope). It ensures that any references to the environment data outlive the scope.
  • 'scope: This is another lifetime parameter that represents the lifetime of the scope itself. This is the period during which new scoped threads can be spawned and running. It ensures that any running threads are joined before the lifetime ends.

In plain English, the scope function takes a closure f that can borrow data with a specific lifetime 'env and ensures that all threads spawned within the scope are joined before the function returns.

Let's revisit one of the examples from the previous section that failed to compile to better understand these concepts. Review the following code snippet and then run the program.

fn main() {
    use std::thread;

    let error = String::from("E0373"); // Compiler error E0373

    thread::scope(|s| {
        let handler = s.spawn(|| {
            println!("{error}");
        });

        s.spawn(|| {
            println!("{error}");
        });

        handler.join().unwrap(); // Manually join the first thread

        // Second thread is automatically joined when the closure returns.
    });
}

Through these ownership and type system tools, it is guaranteed that all threads created within scope are joined before their lifetimes end. This allows the Rust compiler to be certain that all borrowed data (in this case, the error String) remains valid for the lifetime of the threads. This is how Rust turns many runtime errors into compile-time errors. Concepts like this facilitate fearless concurrency in Rust!

Multithreaded Grep

With our understanding of scoped vs non-scoped threads, we are now prepared to correctly update Grep to process each file specified on the command line in a separate thread.

Here's the updated version of the program with the changes visible.

#![allow(unused_imports)]
extern crate regex; // this is needed for the playground
use clap::Parser;
use interval::{Interval, IntervalError};
use itertools::Itertools;
use regex::Regex;
use std::collections::HashMap;
use std::fs::File;
use std::io::Read;
use std::io::{BufRead, BufReader};
use std::path::PathBuf;
use std::process::exit;
use std::thread;

fn find_matching_lines(lines: &[String], regex: &Regex) -> Vec<usize> {
    lines
        .iter()
        .enumerate()
        .filter_map(|(i, line)| match regex.is_match(line) {
            true => Some(i),
            false => None,
        })
        .collect() // turns anything iterable into a collection
}

fn create_intervals(
    lines: Vec<usize>,
    before_context: usize,
    after_context: usize,
) -> Result<Vec<Interval<usize>>, IntervalError> {
    lines
        .iter()
        .map(|line| {
            let start = line.saturating_sub(before_context);
            let end = line.saturating_add(after_context);
            Interval::new(start, end)
        })
        .collect()
}

fn merge_intervals(intervals: Vec<Interval<usize>>) -> Vec<Interval<usize>> {
    // merge overlapping intervals
    intervals
        .into_iter()
        .coalesce(|p, c| p.merge(&c).map_err(|_| (p, c)))
        .collect()
}

fn print_results(
    intervals: Vec<Interval<usize>>,
    lines: Vec<String>,
    line_number: bool,
) {
    for interval in intervals {
        for (line_no, line) in lines
            .iter()
            .enumerate()
            .take(interval.end + 1)
            .skip(interval.start)
        {
            if line_number {
                print!("{}: ", line_no + 1);
            }
            println!("{}", line);
        }
    }
}

fn read_file(file: impl Read) -> Vec<String> {
    BufReader::new(file).lines().map_while(Result::ok).collect()
}

#[derive(Parser)]
#[command(version, about, long_about = None)]
struct Cli {
    /// Prefix each line of output with the 1-based line number within its
    /// input file.
    #[arg(short, long, default_value_t = false)]
    line_number: bool,

    /// Print num lines of trailing context before matching lines.
    #[arg(short, long, default_value_t = 0, value_name = "num")]
    before_context: u8,

    /// Print num lines of trailing context after matching lines.
    #[arg(short, long, default_value_t = 0, value_name = "num")]
    after_context: u8,

    /// The regular expression to match.
    #[arg(required = true)]
    pattern: String,

    /// List of files to search.
    #[arg(required = true)]
    files: Vec<PathBuf>,
}

// Result from a thread
struct GrepSuccess {
    intervals: Vec<Interval<usize>>,
    lines: Vec<String>,
}

// Result from a failed thread
struct GrepFailure {
    error: String,
}

fn main() {
    // let cli = Cli::parse(); // for production use
    // mock command line arguments
    let cli = match Cli::try_parse_from([
        "grep", // executable name
        "--line-number",
        "--before-context",
        "1",
        "--after-context",
        "1",
        "(all)|(will)", // pattern
        // file(s)...
        "poem.txt",
        "bad_file.txt", // intention failure
        "scoped_threads.txt",
    ]) {
        Ok(cli) => cli,
        Err(e) => {
            eprintln!("Error parsing command line arguments: {e:?}");
            exit(1);
        }
    };

    // map of filename/file contents to simulate opening a file
    let mock_disk = HashMap::from([
        (
            "poem.txt",
            "I have a little shadow that goes in and out with me,
And what can be the use of him is more than I can see.
He is very, very like me from the heels up to the head;
And I see him jump before me, when I jump into my bed.

The funniest thing about him is the way he likes to grow -
Not at all like proper children, which is always very slow;
For he sometimes shoots up taller like an india-rubber ball,
And he sometimes gets so little that there’s none of him at all.",
        ),
        (
            "scoped_threads.txt",
            "When we work with scoped threads, the compiler can clearly see,
if the variables we want to use will be avilable to me.
Because of this visiblity, I'm runtime error free!
And issues in my code will be exposed by rustc.
If this sort of safety is provided at native speeds,
there's simply no compelling case to stick with cpp!",
        ),
    ]);

    // get values from clap
    let pattern = cli.pattern;
    let line_number = cli.line_number;
    let before_context = cli.before_context as usize;
    let after_context = cli.after_context as usize;
    let files = cli.files;

    // compile the regular expression
    let regex = match Regex::new(&pattern) {
        Ok(re) => re, // bind re to regex
        Err(e) => {
            eprintln!("{e}"); // write to standard error
            exit(1);
        }
    };

    thread::scope(|s| {
        let handles: Vec<_> = files
            .iter()
            .map(|file| {
                let filename = match file.to_str() {
                    Some(filename) => filename,
                    None => {
                        return Err(GrepFailure {
                            error: format!(
                                "Invalid filename: {}",
                                file.display()
                            ),
                        })
                    }
                };

                // attempt to open the file
                //let lines = match File::open(filename) {
                //    // convert the poem into lines
                //    Ok(file) => read_file(file),
                //    Err(e) => {
                //        eprintln!("Error opening {filename}: {e}");
                //        exit(1);
                //    }
                //};

                if !mock_disk.contains_key(filename) {
                    return Err(GrepFailure {
                        error: format!("File not found: {}", filename),
                    });
                }

                Ok(filename)
            })
            .map_ok(|filename| {
                // only spawn a thread for accessible file
                s.spawn(|| {
                    let contents = mock_disk.get(filename).unwrap();
                    let mock_file = std::io::Cursor::new(contents);
                    let lines = read_file(mock_file);

                    // store the 0-based line number for any matched line
                    let match_lines = find_matching_lines(&lines, &regex);

                    // create intervals of the form [a,b] with the before/after context
                    let intervals = match create_intervals(
                        match_lines,
                        before_context,
                        after_context,
                    ) {
                        Ok(intervals) => intervals,
                        Err(_) => return Err(GrepFailure {
                            error: String::from(
                                "An error occurred while creating intervals",
                            ),
                        }),
                    };

                    // merge overlapping intervals
                    let intervals = merge_intervals(intervals);
                    Ok(GrepSuccess { intervals, lines })
                })
            })
            .collect();

        // process all the results
        for handle in handles {
            let result = match handle {
                Ok(scoped_join_handle) => scoped_join_handle,
                Err(e) => {
                    eprintln!("{}", e.error);
                    continue;
                }
            };

            if let Ok(result) = result.join() {
                match result {
                    Ok(result) => print_results(
                        result.intervals,
                        result.lines,
                        line_number,
                    ),
                    Err(e) => eprintln!("{}", e.error),
                };
            };
        }
    });
}

pub mod interval {
    /// A list specifying general categories of Interval errors.
    #[derive(Debug)]
    pub enum IntervalError {
        /// Start is not less than or equal to end
        StartEndRangeInvalid,
        /// Two intervals to be merged do not overlap
        NonOverlappingInterval,
    }

    /// A closed-interval [`start`, `end`] type used for representing a range of
    /// values between `start` and `end` inclusively.
    ///
    /// # Examples
    ///
    /// You can create an `Interval` using `new`.
    ///
    /// ```rust
    /// let interval = Interval::new(1, 10).unwrap();
    /// assert_eq!(interval.start, 1);
    /// assert_eq!(interval.end, 10);
    /// ```
    #[derive(Debug, PartialEq)]
    pub struct Interval<T> {
        pub start: T,
        pub end: T,
    }

    impl<T: Copy + PartialOrd> Interval<T> {
        /// Creates a new `Interval` set to `start` and `end`.
        ///
        /// # Examples
        ///
        /// ```rust
        /// let interval = Interval::new(1, 10).unwrap();
        /// assert_eq!(interval.start, 1);
        /// assert_eq!(interval.end, 10);
        /// ```
        pub fn new(start: T, end: T) -> Result<Self, IntervalError> {
            if start <= end {
                Ok(Self { start, end })
            } else {
                Err(IntervalError::StartEndRangeInvalid)
            }
        }

        /// Checks if two intervals overlap. Overlapping intervals have at least
        /// one point in common.
        ///
        /// # Examples
        ///
        /// ```rust
        /// let a = Interval::new(1, 3).unwrap();
        /// let b = Interval::new(3, 5).unwrap();
        /// assert_eq!(a.overlaps(&b), true);
        /// assert_eq!(b.overlaps(&a), true);
        /// ```
        ///
        /// ```rust
        /// let a = Interval::new(1, 5).unwrap();
        /// let b = Interval::new(2, 4).unwrap();
        /// assert_eq!(a.overlaps(&b), true);
        /// assert_eq!(b.overlaps(&a), true);
        /// ```
        ///
        /// ```rust
        /// let a = Interval::new(1, 3).unwrap();
        /// let b = Interval::new(4, 6).unwrap();
        /// assert_eq!(a.overlaps(&b), false);
        /// assert_eq!(b.overlaps(&a), true);
        /// ```
        pub fn overlaps(&self, other: &Interval<T>) -> bool {
            self.end >= other.start
        }

        /// Merges two intervals returning a new `Interval`.
        ///
        /// The merged `Interval` range includes the union of ranges from each
        /// `Interval`.
        ///
        /// # Examples
        ///
        /// ```rust
        /// let a = Interval::new(1, 3).unwrap();
        /// let b = Interval::new(3, 5).unwrap();
        /// let c = a.merge(&b).unwrap();
        /// assert_eq!(c.start, 1);
        /// assert_eq!(c.end, 5);
        /// ```
        pub fn merge(&self, other: &Self) -> Result<Self, IntervalError> {
            if self.overlaps(other) {
                Ok(Self {
                    start: self.start,
                    end: other.end,
                })
            } else {
                Err(IntervalError::NonOverlappingInterval)
            }
        }
    }

    use std::fmt;
    impl<T: fmt::Display> fmt::Display for Interval<T> {
        fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> fmt::Result {
            write!(f, "[{}, {}]", self.start, self.end)
        }
    }

    use std::cmp::Ordering;
    impl<T: PartialEq + PartialOrd> PartialOrd for Interval<T> {
        fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
            if self == other {
                Some(Ordering::Equal)
            } else if self.end < other.start {
                Some(Ordering::Less)
            } else if self.start > other.end {
                Some(Ordering::Greater)
            } else {
                None // Intervals overlap
            }
        }
    }
}

Missing Error Output

File not found: bad_file.txt

The error message for the bad file doesn't appear in the playground output because standard error output isn't captured.

Our grep program is now multithreaded and processes all the input files in parallel! Let's walk through the code changes.

mock_disk

The mock_disk HashMap simulates disk access to check if a file exists and is accessible. The filename serves as the key, while the file's contents are the value. This approach aids in testing and development, but its use here is solely for Rust playground compatibility. To ensure the creation of multiple threads, I added a new poem written by yours truly.

thread::scope

  1. Iterate over all files specified on the command line using the map iterator adapter, which returns a Result. The Ok variant holds the filename if valid, while the Error holds the error for an invalid filename.
  2. The map_ok iterator adapter processes each Result, calling the provided closure on any Ok values, allowing us to ignore any invalid filenames. The provided Scope (s) spawns one thread per file for processing. The closure returns a Result: Err with an error message in a GrepFailure struct if processing fails, or Ok with a GrepSuccess struct containing intervals and lines from the input file if successful.
  3. Use collect to create a vector (Vec) of results from each file iteration, binding it to handles.
  4. Finally, iterate over the elements in the handles vector using a for loop. Print any errors to standard error, and pass successful pattern matching results to the print_results function for output to standard output.

find_matching_lines

Since each thread needs access to the regex object, the value is borrowed instead of moved.

Summary

  • Ownership and type systems are powerful tools for managing memory safety and concurrency issues. By leveraging ownership and type checking, many concurrency errors in Rust are caught at compile time rather than at runtime.
  • Unlike non-scoped threads, scoped threads can borrow non-'static data because the scope ensures all threads are joined before it ends.

Conclusion

Congratulations on reaching the end of the course! I truly hope you found this material helpful as you continue your journey as a Rust developer. Hopefully, you've gained a solid understanding of some foundational concepts in Rust that many beginners find challenging. What we've covered here is just the beginning. There are many more topics to explore! Stay curious and enjoy the journey. Rust is the future!

Here are some resources used throughout this course, listed alphabetically:

A Note About Copilot

The content in this book is the author's own words. However, some of the content was improved with the assistance of Copilot to enhance clarity and maintain a consistent writing style throughout the book.

If you find that anything appears to be copied from another source without proper accreditation, please start a discussion on GitHub.

Final Grep

As we reach the end of this journey, I am pleased to present the final version of the Grep program, ready for production use. To compile a release version, be sure to use cargo build --release. I hope you have enjoyed this journey as much as I have enjoyed guiding you through it. Thank you very much for completing the course. Have fun on your Rust journey!

use clap::Parser;
use interval::{Interval, IntervalError};
use itertools::Itertools;
use regex::Regex;
use std::fs::File;
use std::io::Read;
use std::io::{BufRead, BufReader};
use std::path::PathBuf;
use std::process::exit;
use std::thread;

fn find_matching_lines(lines: &[String], regex: &Regex) -> Vec<usize> {
    lines
        .iter()
        .enumerate()
        .filter_map(|(i, line)| match regex.is_match(line) {
            true => Some(i),
            false => None,
        })
        .collect() // turns anything iterable into a collection
}

fn create_intervals(
    lines: Vec<usize>,
    before_context: usize,
    after_context: usize,
) -> Result<Vec<Interval<usize>>, IntervalError> {
    lines
        .iter()
        .map(|line| {
            let start = line.saturating_sub(before_context);
            let end = line.saturating_add(after_context);
            Interval::new(start, end)
        })
        .collect()
}

fn merge_intervals(intervals: Vec<Interval<usize>>) -> Vec<Interval<usize>> {
    // merge overlapping intervals
    intervals
        .into_iter()
        .coalesce(|p, c| p.merge(&c).map_err(|_| (p, c)))
        .collect()
}

fn print_results(
    intervals: Vec<Interval<usize>>,
    lines: Vec<String>,
    line_number: bool,
) {
    for interval in intervals {
        for (line_no, line) in lines
            .iter()
            .enumerate()
            .take(interval.end + 1)
            .skip(interval.start)
        {
            if line_number {
                print!("{}: ", line_no + 1);
            }
            println!("{}", line);
        }
    }
}

fn read_file(file: impl Read) -> Vec<String> {
    BufReader::new(file).lines().map_while(Result::ok).collect()
}

#[derive(Parser)]
#[command(version, about, long_about = None)]
struct Cli {
    /// Prefix each line of output with the 1-based line number within its
    /// input file.
    #[arg(short, long, default_value_t = false)]
    line_number: bool,

    /// Print num lines of trailing context before matching lines.
    #[arg(short, long, default_value_t = 0, value_name = "num")]
    before_context: u8,

    /// Print num lines of trailing context after matching lines.
    #[arg(short, long, default_value_t = 0, value_name = "num")]
    after_context: u8,

    /// The regular expression to match.
    #[arg(required = true)]
    pattern: String,

    /// List of files to search.
    #[arg(required = true)]
    files: Vec<PathBuf>,
}

// Result from a thread
struct GrepSuccess {
    intervals: Vec<Interval<usize>>,
    lines: Vec<String>,
}

// Result from a failed thread
struct GrepFailure {
    error: String,
}

fn main() {
    let cli = Cli::parse();

    // get values from clap
    let pattern = cli.pattern;
    let line_number = cli.line_number;
    let before_context = cli.before_context as usize;
    let after_context = cli.after_context as usize;
    let files = cli.files;

    // compile the regular expression
    let regex = match Regex::new(&pattern) {
        Ok(re) => re, // bind re to regex
        Err(e) => {
            eprintln!("{e}"); // write to standard error
            exit(1);
        }
    };

    thread::scope(|s| {
        let handles: Vec<_> = files
            .iter()
            .map(|file| {
                let filename = match file.to_str() {
                    Some(filename) => filename,
                    None => {
                        return Err(GrepFailure {
                            error: format!(
                                "Invalid filename: {}",
                                file.display()
                            ),
                        })
                    }
                };

                // attempt to open the file
                File::open(filename).map_err(|e| GrepFailure {
                    error: format!("Error opening {filename}: {e}"),
                })
            })
            .map_ok(|file| {
                // only spawn a thread for accessible file
                s.spawn(|| {
                    let lines = read_file(file);

                    // store the 0-based line number for any matched line
                    let match_lines = find_matching_lines(&lines, &regex);

                    // create intervals of the form [a,b] with the before/after context
                    let intervals = match create_intervals(
                        match_lines,
                        before_context,
                        after_context,
                    ) {
                        Ok(intervals) => intervals,
                        Err(_) => return Err(GrepFailure {
                            error: String::from(
                                "An error occurred while creating intervals",
                            ),
                        }),
                    };

                    // merge overlapping intervals
                    let intervals = merge_intervals(intervals);
                    Ok(GrepSuccess { intervals, lines })
                })
            })
            .collect();

        // process all the results
        for handle in handles {
            let result = match handle {
                Ok(scoped_join_handle) => scoped_join_handle,
                Err(e) => {
                    eprintln!("{}", e.error);
                    continue;
                }
            };

            if let Ok(result) = result.join() {
                match result {
                    Ok(result) => print_results(
                        result.intervals,
                        result.lines,
                        line_number,
                    ),
                    Err(e) => eprintln!("{}", e.error),
                };
            };
        }
    });
}

pub mod interval {
    /// A list specifying general categories of Interval errors.
    #[derive(Debug)]
    pub enum IntervalError {
        /// Start is not less than or equal to end
        StartEndRangeInvalid,
        /// Two intervals to be merged do not overlap
        NonOverlappingInterval,
    }

    /// A closed-interval [`start`, `end`] type used for representing a range of
    /// values between `start` and `end` inclusively.
    ///
    /// # Examples
    ///
    /// You can create an `Interval` using `new`.
    ///
    /// ```rust
    /// let interval = Interval::new(1, 10).unwrap();
    /// assert_eq!(interval.start, 1);
    /// assert_eq!(interval.end, 10);
    /// ```
    #[derive(Debug, PartialEq)]
    pub struct Interval<T> {
        pub start: T,
        pub end: T,
    }

    impl<T: Copy + PartialOrd> Interval<T> {
        /// Creates a new `Interval` set to `start` and `end`.
        ///
        /// # Examples
        ///
        /// ```rust
        /// let interval = Interval::new(1, 10).unwrap();
        /// assert_eq!(interval.start, 1);
        /// assert_eq!(interval.end, 10);
        /// ```
        pub fn new(start: T, end: T) -> Result<Self, IntervalError> {
            if start <= end {
                Ok(Self { start, end })
            } else {
                Err(IntervalError::StartEndRangeInvalid)
            }
        }

        /// Checks if two intervals overlap. Overlapping intervals have at least
        /// one point in common.
        ///
        /// # Examples
        ///
        /// ```rust
        /// let a = Interval::new(1, 3).unwrap();
        /// let b = Interval::new(3, 5).unwrap();
        /// assert_eq!(a.overlaps(&b), true);
        /// assert_eq!(b.overlaps(&a), true);
        /// ```
        ///
        /// ```rust
        /// let a = Interval::new(1, 5).unwrap();
        /// let b = Interval::new(2, 4).unwrap();
        /// assert_eq!(a.overlaps(&b), true);
        /// assert_eq!(b.overlaps(&a), true);
        /// ```
        ///
        /// ```rust
        /// let a = Interval::new(1, 3).unwrap();
        /// let b = Interval::new(4, 6).unwrap();
        /// assert_eq!(a.overlaps(&b), false);
        /// assert_eq!(b.overlaps(&a), true);
        /// ```
        pub fn overlaps(&self, other: &Interval<T>) -> bool {
            self.end >= other.start
        }

        /// Merges two intervals returning a new `Interval`.
        ///
        /// The merged `Interval` range includes the union of ranges from each
        /// `Interval`.
        ///
        /// # Examples
        ///
        /// ```rust
        /// let a = Interval::new(1, 3).unwrap();
        /// let b = Interval::new(3, 5).unwrap();
        /// let c = a.merge(&b).unwrap();
        /// assert_eq!(c.start, 1);
        /// assert_eq!(c.end, 5);
        /// ```
        pub fn merge(&self, other: &Self) -> Result<Self, IntervalError> {
            if self.overlaps(other) {
                Ok(Self {
                    start: self.start,
                    end: other.end,
                })
            } else {
                Err(IntervalError::NonOverlappingInterval)
            }
        }
    }

    use std::fmt;
    impl<T: fmt::Display> fmt::Display for Interval<T> {
        fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> fmt::Result {
            write!(f, "[{}, {}]", self.start, self.end)
        }
    }

    use std::cmp::Ordering;
    impl<T: PartialEq + PartialOrd> PartialOrd for Interval<T> {
        fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
            if self == other {
                Some(Ordering::Equal)
            } else if self.end < other.start {
                Some(Ordering::Less)
            } else if self.start > other.end {
                Some(Ordering::Greater)
            } else {
                None // Intervals overlap
            }
        }
    }
}

You can find the final version of the Grep program we built together during this course here.