U00: Let’s Rust
Welcome to DSys GmbH, we are happy to have you as a new junior engineer.

On your first day, we present this onboarding slide deck to you and let you set up your rusty workstation for the time ahead.

We are looking forward to this, as you are going to work on some exciting projects and we will help you learn about dependability & dependable systems and software.

Rust

We at DSys believe that Rust (the programming language and the ecosystem) is the future for dependable software systems.

We are particularly drawn to the mission of the Rust project, namely to be “a language empowering everyone to build reliable and efficient software” — including you!

Hence, you have to learn Rust in the following weeks. But do not be afraid, we are here to help.

So let us dive right in and get our hands rusty.

Set Up Your Development System

First, make sure you have Rust installed; the project has an excellent guide to doing so… if you struggle, let us know!

For development tools, we highly recommend:

- Visual Studio Code (VSCode)
  - including the VSCode extensions: rust-analyzer and BetterToml
- A decent terminal emulator:
  - on Linux/Mac, you are covered already
  - on Windows, we recommend Windows Terminal

Finally, you must use Git, as software development cannot be dependable without a version control system. How to set it up is described on their website.

Optionally, you might already create an account on GitLab.com. We do not use it right away, but it is required later on. It could also help you store the software you produce in a safe place starting from day 1.
Hello World

Now with Rust installed on your system, you can run the famous Hello World program:

fn main() {
    println!("Hello World");
}

You can run it right inside this book, but you’re here to build something, so do the following:

- In a new folder, run cargo new --bin hello.
- Enter the folder hello, run code . to edit things.
- Run cargo check when you are done (e.g., by putting the above snippet into main.rs).
- Ideally, this succeeded, and your program is accepted by the compiler. If not, we hope that the compiler provided you with some helpful error messages.
- Now use cargo build, which might take a little bit of time, and afterwards, you have an executable binary in target/debug. Have a look and execute the binary from your console.
- Back in your project folder, make a change to main.rs and, e.g., change the text. Now type cargo run and see what happens.
- Interesting! Apparently, run first checks, then builds, and finally executes your binary.
- Why didn’t we show you run in the first place? Due to Rust being a compiled language, it often makes sense to only do check while you work on the project. When you want to share your code, build is the way.
println!()

Let’s have a closer look at this program and, in particular, println!(). To be precise, this is a macro, as indicated by the !.

The macro prints its arguments to stdout. There is also eprintln!() to print to stderr.

Here are some ways to use the macro to format arguments in different ways:

println!("Hello");                     // no args
println!("Hello {}", "world");         // simple
println!("Hello {1} {0}", "world", 1); // positional
println!("{value}", value=4);          // named
println!("Hello {:?}", ("world", 5));  // debug
println!("Hello {:#?}", ("world", 5)); // pretty-print

More details on formatting in std::fmt.
Fibonacci

Now let’s do something more sensible and compute a Fibonacci number:

{{#playground fib.rs}}

Here, you see how you 1) define a function, 2) use control flow (if, else), and 3) call a function (recursively).
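The playground file fib.rs is not reproduced here; as a rough idea, a recursive version in the style of the sample solutions could look like this (a sketch, not necessarily the exact playground content):

fn fib(n: u8) -> u16 {
    if n == 0 || n == 1 {
        1
    } else {
        // recursive calls: fib(n) = fib(n-1) + fib(n-2)
        fib(n - 1) + fib(n - 2)
    }
}

fn main() {
    println!("fib(9) = {}", fib(9));
}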
Cargo.toml and Cargo.lock

The following shows our hello world’s Cargo.toml, specifying the package’s name, the version, and the used Rust edition (see below). There could also be third-party crates to be imported under [dependencies]:

[package]
name = "helloworld"
version = "0.1.0"
edition = "2018"

[dependencies]
...
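A dependency entry consists of a crate name and a version requirement. For example (rand is just an illustrative pick, not a dependency of our hello world):

[dependencies]
rand = "0.8"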
In general, reproducible builds (i.e., building code produces identical output) are getting more relevant to counter, e.g., security and consistency problems.

The file Cargo.lock is created when cargo build is invoked and the current dependencies are resolved. Thereby, versions are entirely fixed and reproduced when another developer reuses this Cargo.lock file.

# This file is automatically @generated by Cargo.
# It is not intended for manual editing.
version = 3

[[package]]
name = "helloworld"
version = "0.1.0"
dependencies = [
 "foobar",
]

[[package]]
name = "foobar"
version = "0.42.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "..."
dependencies = [
 ...
]
Maintaining Rust

There won’t be a Rust 2.0

Versions

- Rust 1.0 was released in May 2015.
- A new version is released every 6 weeks.
- The latest Rust version can be found on What Rust is it?.

Editions (Theme)

- 2015: Stability
- 2018: Productivity. Introduced keywords (async, await, try).
- 2021: Sustainability
- 2024: Scale Empowerment

Editions are compatible and opt-in. Use a 2015 crate X in your 2018 crate Y (and vice versa).

More details are in the Edition Guide.
S00: Sample Solution

Getting Rusty

- https://www.rust-lang.org/tools/install

- fn hello(name: &str, age: i32) {
      println!(
          "Hello, my name is {} and I am {} year{} old.",
          name,
          age,
          if age == 1 { "" } else { "s" }
      );
  }

  fn main() {
      let name = "Ferris";
      let age = 11;
      hello(name, age);
  }

- fn fib(n: u8) -> u16 {
      if n == 0 || n == 1 {
          return 1;
      }

      let mut res = 0;
      let mut last = 1;
      let mut curr = 1;
      for _i in 1..n {
          res = last + curr;
          last = curr;
          curr = res;
      }
      res
  }

  fn main() {
      let n = 9;
      let res = fib(n);
      println!("fib({}) = {}", n, res);
  }
Dependability

- Discussed in plenum.

Summary

What did you learn?

- Why Rust and Dependability are important and the topic of this course.
- How to set up Rust on your system.
- How to write first programs in Rust.

Where can you learn more?

- Rust Basics:
  - Rust Book: Ch. 01 + 02
  - Programming Rust: Ch. 01 + 02
  - Rust in Action: Ch. 01
  - cheats.rs: Hello Rust
- Rust Way of Life:
  - Rust for Rustaceans: Ch. 01
  - Rustacean Principles
- Dependability Basics:
  - “Basic Concepts and Taxonomy of Dependable and Secure Computing” by Laprie et al.
  - “Are We Really Engineers?” by Hillel Wayne (interesting comparison of software and other engineers)
- Modern Unix (a collection of CLI utilities, many of them written in Rust)
- atuin - 🐢 magical shell history
- Sustainability with Rust
W00: Work Sheet

Getting Rusty

- Set up Rust on your system.
- Modify Hello World to print your name and age, both provided as arguments.
- Rewrite fib to compute the value using a for loop.
- Set up Rustlings on your system, ideally with rust-analyzer support. You will need this in the upcoming units.

Dependability

- Do an online search for definitions of the term dependability and its attributes. Bring them to the next plenum.
Computing with Rust

We already covered how to print on the console and how to compute a Fibonacci number. In this section, we take a detailed look at what programs in Rust can look like.

This section is intentionally kept brief and you should read the excellent 3rd chapter of the Rust book if you have any doubts or want a more in-depth introduction to the common programming concepts.

Variables

First, let’s have a look at how variables are declared in Rust. We declare variables using let and can assign a type with : Type:

let variable : Type = something(); // Type could, e.g., be u16, i64, bool, String, ...

Type can be omitted if the compiler can infer[1] it, i.e. it is unambiguous.

let variable = something(); // fn something() -> Type
Mutability

Now, let’s try to change the variable:

let variable = 3;
variable = 5;

When we run the code, we get:

error[E0384]: cannot assign twice to immutable variable `variable`
 --> src/main.rs:5:1
  |
4 | let variable = 3;
  |     --------
  |     |
  |     first assignment to `variable`
  |     help: consider making this binding mutable: `mut variable`
5 | variable = 5;
  | ^^^^^^^^^^^^ cannot assign twice to immutable variable

error: aborting due to previous error

For more information about this error, try `rustc --explain E0384`.
error: could not compile `playground`

To learn more, run the command again with --verbose.

We learn that variables can be immutable (which they are by default) or mutable. We can specify that using mut:

let mut variable = 3;
variable = 5;
This is an aspect where Rust is different from many other languages. First, by making mutability explicit, it requires programmers to state their intent (to others AND themselves). Second, by making immutability the default, it takes a safe route. This is because it is easier to reason about immutable variables and you immediately spot the rare mutable variables due to the keyword mut. Later, when we cover functional programming, you will see that you can get far without using any mutable variables. We can consider the mut annotation a feature of Rust that encourages the creation of dependable code.
Constants

Rust also allows declaring constants using const instead of let and specifying the type. The value you assign to them also has to be constant, i.e. it is fixed at compile time. Here is an example:

const PI: f32 = 3.14;

Shadowing

Finally, Rust is different from many other languages in that it supports shadowing, i.e. a variable name can be reused in a code block. This is particularly helpful when parsing:

let mut guess = String::new();
// ... read from stdin into guess
let guess : i32 = guess.parse().unwrap();

This means there is no need to invent artificial variable names with type suffixes, e.g. input_str, input_i32 (which is common in older languages). Thanks to type inference and picking a normal name, the name is always accurate and refactorings do not force you to change the variable name.
Functions

Apart from variables, we also need functions to build reusable blocks of code. One of the most important functions is main, which serves as the entry point to programs that compile to executable binaries. You already saw it in the previous unit. Here is another example:

fn mul(x: i32, y: i32) -> i32 {
    x * y
}

Function bodies contain a series of statements (none in this case) and optionally an ending expression that defines the return value (x * y in this case). Here is how such a declaration is decomposed:

fn mul(x: i32, y: i32) -> i32 {
^^ ^^^ ^^^^^^^^^^^^^^    ^^^  ^
|   |         |           |   |
|   |         |           |   begin of function body
|   |         |           return type
|   |         parameters (identifier: type)
|   identifier
keyword

In Rust’s function signatures, you MUST declare all parameter types; they are not inferred by the compiler.
Recursion

In Rust you can call functions recursively, just like in this function for computing the greatest common divisor using Euclid’s algorithm:

fn gcd(m: i32, n: i32) -> i32 {
    if m == 0 {
        n.abs()
    } else {
        gcd(n % m, m)
    }
}
Namespaces

When you start producing more and more code, you will certainly run into the following issues:

1. you want to reuse a name (e.g. parse might be defined for multiple types)
2. you want to group things together
3. you want to hide certain functionality from others

For this, Rust provides you with several means:

- Crates: confined collections of functionality by a single vendor. So far, you have created a single crate and used other crates (if you experimented).
- Modules: layered (sub-)sets of functionality within a crate.

The latter is created and used like this:

mod math {
    pub fn gcd(m: i32, n: i32) -> i32 {
        // ...
    }

    pub fn fib(n: u32) -> u32 {
        // ...
    }
}

fn main() {
    use math::fib;
    let gcd = math::gcd(30, 12);
    let f = fib(3);
    println!("gcd: {}, fib: {}", gcd, f);
}

The mod keyword adds a module (module / package in Python / Java). With pub we allow gcd and fib to be accessed from the parent module. Everything in mod math must be accessed via math::. With use, a binding can be introduced that allows you to shorten a path (see use math::fib and the usage fib above).

We could have put math into a separate file math.rs and use it like this:

mod math;

fn main() {
    // ...
}
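In that case, the file math.rs contains just the module body. A minimal sketch (the function bodies here are only placeholders):

// math.rs
pub fn gcd(m: i32, n: i32) -> i32 {
    if m == 0 { n.abs() } else { gcd(n % m, m) }
}

pub fn fib(n: u32) -> u32 {
    if n < 2 { 1 } else { fib(n - 1) + fib(n - 2) }
}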
This way of structuring your Rust programs is further discussed in a later unit.

Control Flow

Finally, we need to introduce control flow constructs to allow conditions, loops, etc.

Rust is expression-based, which means that control flow expressions have a value, like here:

let condition = true;
let number = if condition { 5 } else { 6 };
println!("The value of number is: {}", number);
Loops

loop

With loop the block of code is executed over and over again (in other languages this is done using while true, but this is not idiomatic Rust code). The only way to stop it is with a panic (where the whole program ends) or a break statement.

loop {
    let interval = time::Duration::from_secs(1);
    match send_heartbeat() {
        Heartbeat::Success => {
            thread::sleep(interval);
        },
        Heartbeat::Timeout => {
            break;
        },
        Heartbeat::Error => panic!("unexpected condition"),
    }
}
// handle reconnection in case of timeout

Ignore the details of match for now and look at the structure: On success, the thread waits until one interval has passed. On timeout, the loop is exited and reconnection happens. Only on error, the program halts.
while

Here is how to compute the greatest common divisor using the iterative Euclidean algorithm:

fn gcd(mut m: i32, mut n: i32) -> i32 {
    while m != 0 {
        let old_m = m;
        m = n % m;
        n = old_m;
    }
    n.abs()
}

for

Finally, we have the for loop that works on iterators (they are covered in a later unit). For now, consider (n..m), which gives you a range from \(n\) to \(m-1\) (i.e. an exclusive range). Here, this is used to compute the Fibonacci number with a loop:

fn fib(n: u8) -> u16 {
    let mut fib = (1, 1);
    for _ in 0..n {
        fib = (fib.1, fib.0 + fib.1)
    }
    fib.0
}

fn main() {
    let n = 4;
    let res = fib(n);
    println!("fib({}) = {}", n, res);
}
Fundamentals of Dependability

Dependability is a broad term with lots of different meanings and subsumes a large set of properties that all contribute to a system that one can depend on. The goal of this section is to a) give you an intuitive introduction to the different terms, using everyday examples, and b) show you how these terms are defined. These definitions are important, as developing dependable systems often involves people from very different backgrounds (e.g. safety, security, psychology, philosophy, …) and is applied in different domains (e.g. medical, transport, manufacturing, energy, …).

But first, let’s start from scratch with some psychology. Safety and security are — relatively fundamental — human needs:

{{#include img/maslow.svg }}

Androidmarsexpress, Maslow’s Hierarchy of Needs2, colors changed by Andreas Schmidt, CC BY-SA 4.0

Hence, we as humans long for our environment to be safe and secure, i.e. that it can be depended upon for living. In the following, we look at various everyday situations where concepts are put in place to provide us with safety and security.
At Home

Everyone knows that “most accidents happen at home”. But what is an accident?

Definition: Accident is an undesired and unplanned (but not necessarily unexpected) event that results in a specified level of loss. - Safeware

At this stage, this might sound rather cryptic to you. Speaking of our household example, a loss could be that you cut your finger when using a knife — which is undesired and unplanned, as you have to put something on the wound and have to stop cutting stuff for a while.

If at some point you almost cut your finger, that is an incident and might tell you that you should concentrate more to avoid this… or ask someone else to do it.

Definition: Incident or near-miss is an event that involves no loss (or only minor loss) but with the potential for loss under different circumstances. - Safeware

From this definition, it could also be that your small cut is an incident, while chopping your finger off is an accident. As you see, these definitions lead to subjective results — a pattern that you encounter throughout safety considerations.
Now apart from knives, what else do you have at home that is dangerous? Consider the following:

- Electric sockets
- Hot liquids
- Slippery floor
- Toxic material (medicines, sanitizers)
- Sharp edges

These things cause hazards and expose risks:

Definition: Hazard is a state or set of conditions of a system (or an object) that, together with other conditions in the environment of the system (or object), will lead inevitably to an accident (loss event). - Safeware

Definition: Risk is the hazard level combined with (1) the likelihood of the hazard leading to an accident (sometimes called danger) and (2) hazard exposure or duration (sometimes called latency). - Safeware

Now that you know what hazards and risks are, we ask you in the work sheet to find safety concepts. But what is such a concept?

Definition: A safety concept is a measure taken to remove, contain, or reduce a hazard. - Own

But hold on, we have not defined safety yet:

Definition: Safety is freedom from accidents or losses. - Safeware

We also have to make a distinction:

- Safety is freedom from negative impact from the environment, e.g. not getting hurt by a falling roof tile.
- Security is freedom from negative impact by a hostile person, e.g. not getting hurt by a falling piano that was pushed to hurt us.

But instead of walking the streets in the city and keeping close to buildings from which tiles and pianos can fall, we leave the house towards the train station.
Construction Site

When we cross a construction site, we realize that there are concepts that are used to provide safety. Consider the following:

- A fence surrounds the place.
- People wear helmets.
- There are warning signs all over the place.

On the work sheet, we ask you to think about the hazards that lead to these concepts.

Train Station

Arriving at the train station, we ask ourselves: what are objects / concepts that are related to our current idea of dependability (safety + security)?

- Safety window glass.
- Doors only open when the train stops.
- Metro walls between platform and vehicle (in metropolitan areas).
- Staff members with pepper spray walking the place.

Note that, at the train station, we also care that the train comes on time, so that we don’t get a delay in our journey’s schedule. This leads us to reliability:

Definition: Reliability is the probability that a piece of equipment or component will perform its intended function satisfactorily for a prescribed time and under stipulated environmental conditions. - Safeware

Before you are allowed to enter a train, you have to purchase a ticket at the ticket machine. Occasionally, this machine is defective and needs to be repaired by staff members — it is unavailable. This leads us to availability:

Definition:

Availability is the readiness for correct service. - Laprie et al.

Availability is the fraction of time the system is operational. - Better Embedded System Software
Onward with Dependability

With these intuitive definitions and examples from the previous section in mind, we want to stress that for the rest of the course, we are following:

- “Basic Concepts and Taxonomy of Dependable and Secure Computing” by Laprie et al.
- Safeware (whose engineering terms are preferred over terms misused by computer scientists)

They define a set of dimensions that form dependability.

Dependability Dimensions

- Availability: readiness for correct service
- Reliability: continuity of correct service
- Safety: absence of catastrophic consequences on the user(s) and the environment
- Integrity: absence of improper system alterations
- Maintainability: ability to undergo modifications and repairs
- Confidentiality: the absence of unauthorized disclosure of information
- Security: concurrent existence of confidentiality, integrity, and availability
- Survivability: chance of surviving a catastrophic failure

As you see, all dimensions are about a service provided. The dimensions are orthogonal to each other and you should not assume any relationship between them. For instance, a system might be highly available, but totally unmaintainable because the inventor ceased to exist. Similarly, a system might be perfectly safe, but not perform its original service (i.e. it is unavailable). As with other engineering problems that are quantified with respect to different dimensions, one cannot maximize all of them simultaneously – hence trade-offs are required.

During this course, we regularly refer back to these dimensions and highlight which tool, process, or language construct has an effect on which dependability dimension.
The Eternal Chain of Events

Before we dive into detail, we also look at faults that interfere with these dimensions:

Fault (active / dormant) → Error → Failure → activates next fault → …

Definition: Fault is the adjudged or hypothesized cause of an error. - Taxonomy of Dependable Computing

Definition: Error is a design flaw or deviation from a desired or intended state. - Safeware

Definition: Failure is the nonperformance or inability of the system or component to perform its intended function for a specified time under specified environmental conditions. - Safeware
Dependability Means

In essence, achieving dependability is about dealing with faults. This can be achieved both at system design-time and operation-time using the following classes of approaches:

- Fault prevention is about avoiding the occurrence or introduction of faults in the first place.
- Fault tolerance is about keeping the service operational, even if a fault happens.
- Fault removal is about reducing the number and decreasing the severity of faults.
- Fault forecasting is about estimating the current number of faults, in order to predict future faults.

These faults can be further divided, depending on when they occur:

- Development faults may occur while a system is envisioned and created.
- Physical faults include everything that involves hardware (and non-electric parts too).
- Interaction faults are everything where the external environment is the cause.

For the remainder of the course, we encounter approaches to improve the different dependability dimensions of a service and improve our systems and software by tackling faults.
Dependability Process

The following diagram shows the development cycle used to produce dependable products — commonly known as the V (or “Vee”) Model due to its shape:

+--------------+                                              +--------------+
|   Specify    |<------ Traceability & Verification --------->|  Acceptance  |--> Product
|   Product    |            Test Plan & Results               |     Test     |
+--------------+                                              +--------------+
    Product |                                                       ^ Software
    Requirements |                                                  | Test
                 v                                                  | Results
    +--------------+                                       +--------------+
    |   Specify    |<------------------------------------->|   Software   |
    |   Software   |         Test Plan & Results           |     Test     |
    +--------------+                                       +--------------+
        Software |                                              ^ Integration
        Requirements |                                          | Test
                     v                                          | Results
        +--------------+                               +--------------+
        |  Create SW   |<----------------------------->| Integration  |
        | Architecture |     Test Plan & Results       |     Test     |
        +--------------+                               +--------------+
            High Level |                                    ^ Unit
            Design     |                                    | Test
                       v                                    | Results
            +--------------+                       +--------------+
            |    Design    |<--------------------->|     Unit     |
            |   Modules    | Test Plan & Results   |     Test     |
            +--------------+                       +--------------+
                Detailed |                              ^
                Design   |                              | Source
                         v                              | Code
                         +--------------+
                         |  Implement   |
                         +--------------+

What we see from this diagram are multiple things:

- in the left half, we go from high-level product specification down to the minutiae of implementing software code
- in the right half, we go from pieces of source code to a full-fledged product
- on each horizontal layer, we have a specification on the left and a verification means on the right — both having the same abstraction level
The V process is, quite helpfully in terms of abbreviations, amended by so-called verification & validation (V & V) activities. Note that these two V-terms are often used in confusing or even wrong ways — even by laws and standards. We use the following (German) article as a basis for this course.

Definition: Verification is the check, using objective means, that specified properties (of products or components) are fulfilled. - Translation of Johner-Institute Definition

In our diagram, verification activities deal with the horizontal, left-to-right, double-ended arrows. Hence, a verification always deals with a single layer in the V-model, e.g. correctness of software modules is proven by unit tests.

The article further defines validation:

Definition: Validation is the check, using objective means, that the specified users can, in a specified context, reach specified usage goals. - Translation of Johner-Institute Definition

Note that this is a high-level activity performed once (the whole / one iteration of) the V process has been executed. We often find another definition of validation in everyday dependability conversation, which works as follows:

Definition: Validation is the check that a step in the development process produces the intended outputs.

Looking at the diagram, this means that validation activities deal with the top-down/bottom-up, single-ended arrows. For instance, peer review can be used as a means to validate the transformation of software requirements into a high-level design.
U01: Computing Dependably

Now that you have your system up and running, we want to get our hands dirty with Rust by learning how to compute with Rust. But we also have a bit of brain work to do by digging into what dependability is.

S01: Sample Solution

Rust

fn is_prime(n: u32) -> bool {
    let limit = (n as f64).sqrt() as u32;
    // check divisors up to and including the integer square root;
    // 0 and 1 are not prime
    n > 1 && !(2..=limit).any(|a| n % a == 0)
}
Dependability

- Household safety concepts:
  - Knives are stored in a drawer; sharp knives have a sheath.
  - Electric sockets are connected to a fuse.
  - Slippery floors get warning signs.
  - Toxic materials are stored behind locked doors.
  - On edges you put bumpers.
- Journey reliability concepts:
  - Aim for an earlier train. If there is a delay, you might still be on time.
  - Be early at the train station to make sure you don’t miss the departure.
- Kitchen availability concepts:
  - Have a French press in case your automatic coffee machine (Kaffeevollautomat) fails.
  - Have a microwave to prepare food in case your oven is broken.
  - Have more knives than you need, so that more people can work.

Summary

What did you learn?

- Rust:
  - How variables, statements and expressions form functions.
  - How control flow can be specified.
  - How modules allow grouping related code together.
- Dependability:
  - What dependability, safety, security, … are and why they are important.
  - How the world around you is full of hazards, risks, as well as accidents and incidents.
  - How faults lead to errors, to failures, and potentially repeat.
Where can you learn more?

- Rust:
  - Rust Book: Ch. 02, 03, 07.2
  - Programming Rust: Ch. 03 + 06
  - Rust in Action: Ch. 02
  - cheats.rs: Control Flow
  - Exercism Rust Track
- Dependability:
  - Embedded Software Development for Safety-Critical Systems: Ch. 02
  - Safeware: Ch. 08 + 09
  - Safety is a System Property, not a Software Property

W01: Work Sheet

Rust

- The section on Rust programming concepts is intentionally kept brief. Make sure you read the associated Rust Book chapter if you couldn’t follow or have doubts. This allows you to answer the following questions:
  - How are immutable variables different from constants?
  - How is shadowing different from reassignment of a mut variable?
  - Why does let number = if condition { 5 } else { "six" }; not compile?
- Implement a function for computing whether a number is a prime number: fn is_prime(n: u32) -> bool. % is the modulo operator, which should be helpful.
Rustlings

The Rustlings project provides small exercises to practice specific features of the Rust language.

- Set up Rustlings following this tutorial.
- Do the Rustlings exercises variables, functions, and if.

Dependability

- Based on the hazards we identified at home, name safety mechanisms that avoid that these hazards cause harm to a human. State whether they prevent/tolerate/remove/predict faults.
- For the construction site, you learned about safety concepts. Which hazard are they tackling? Do they prevent/tolerate/remove/predict faults?
- Your journey involves using the train. What can you do to improve the reliability of your journey (i.e. the odds of you reaching the destination on time)?
- Consider your kitchen. Where do you have availability concepts?
cargo Tools

When you develop code, there are many things that can bug you:

- Broken (aka non-compiling) code on main/development branches.
- Badly formatted code.
- Smelly code (e.g. unnecessary mutability, &Vec<T>).
- Inappropriate 3rd-party licenses. You maintain a permissive FOSS project and someone adds a GPL3 dependency.
- Uncovered code, i.e. code not covered by tests.
- Undocumented code.
- Manual builds.
- No cross-platform support.
- Manual releases… sent via email.

The good news is, Rust’s cargo is here to help with its many functions (advanced tools in brackets):

- cargo check
- cargo-about
- cargo-udeps
- cargo clippy
- cargo fmt
- cargo test (cargo-tarpaulin)
- cargo doc
- cargo build (cargo-cross)
- cargo publish
check

Comes with your rustup installation.

First and foremost, the cargo command you will probably use the most: check. This command checks your code and all of its dependencies for errors (type system, ownership, …). At the same time, it does not create compiled artifacts, which means it completes very quickly so you have rapid feedback.

about

cargo install cargo-about

Some of the software we develop at DSys is open source software. This means that it must be appropriately licensed and we have to track the licenses of our third-party libraries as well (more on this later). cargo-about helps you by:

- listing the crates you use
- collecting their licenses
- checking for acceptance

To support this, there is an about.toml configuration file that defines:

- which licenses are [accepted]
- [[DEPENDENCY.additional]] if the license is not discoverable by cargo
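As a rough sketch, the accepted-license list in about.toml uses SPDX identifiers (the identifiers below are only examples, not our actual policy):

accepted = [
    "MIT",
    "Apache-2.0",
]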
Finally, there is an about.hbs template HTML file to generate a webpage that contains all licenses of third-party crates. cargo-about exits with non-zero when a crate uses a non-accepted license, which makes it ideal for continuous integration tests.

You can set up cargo-about for your project with cargo about init. Afterwards, the following lets you generate the licenses page:

cargo about generate about.hbs > license.html
udeps

cargo install cargo-udeps --locked

During development, it can happen that you add a crate that later becomes unused, i.e. you are no longer using any of its functionality. cargo-udeps helps you identify exactly these crates and keeps your Cargo.toml cleaner. It requires nightly, so you typically run it like this:

cargo +nightly udeps --all-targets

Note, however, that it does not recognize an unused dependency that is relevant transitively.
clippy

rustup component add clippy

Remember Karl Klammer (Clippy)? He is back in Rust and way less annoying. clippy works similar to check, but provides more information, e.g. warnings for common mistakes.
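For instance, clippy flags redundant comparisons and non-idiomatic emptiness checks (a small illustrative snippet, not taken from the original slides):

fn example(flag: bool, items: &[u32]) {
    if flag == true {     // clippy: bool_comparison — just write `if flag`
        println!("flag set");
    }
    if items.len() == 0 { // clippy: len_zero — prefer `items.is_empty()`
        println!("empty");
    }
}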
Note that after check you need to clean before clippy reports anything (so it is advised to use clippy over check when you are interested in hints).
fmt

cargo fmt allows you to automatically & consistently format all the files in your crate. Furthermore, it can be used as a linter, indicating whether the crate fulfills all formatting rules or not.

When reading code, formatting can help or impede understanding what is going on. While a particular formatting rule might not be measurably better than another (i.e. having all language elements of a certain type in camelCase vs. snake_case makes no difference), it is important that the formatting is consistent, so that readers can focus on the code itself and not the formatting. With Rust, we tend to build systems out of many third-party dependencies, which means that the total number of different authors that contribute to the code used to compile a single piece of software can easily be in the 10s or even beyond 100. Hence, cargo fmt is a valuable tool, as it comes with a default configuration that is the convention for most Rust code developed (and published as FOSS). While you can configure it using rustfmt.toml, you should not and instead stick to the default configuration. This should also help to keep yourself out of religious discussions that sometimes emerge in communities where there is no well-established standard.
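The linter mode mentioned above is typically invoked like this in CI (it exits non-zero if any file is not formatted; exact flags may vary by toolchain version):

cargo fmt -- --check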
build

So far, we have only checked our code for functional or aesthetic issues, but never actually created working software. With cargo build you can build a binary or library. If you want to use it productively, add the --release flag, which tells the compiler to optimize:

cargo build --release
cross-compilation made easy

cargo install cross

If you want to create software for multiple target platforms (Windows, Linux, different architectures, …), you can use cross, which behaves as a 1:1 replacement for cargo (i.e. it uses the same CLI parameters). cross makes use of Docker to pull in appropriate build environments. If you, for instance, want to create a standalone Linux binary (using musl), you can do so like this:

cross build --release --target x86_64-unknown-linux-musl
publish

You have already worked with many other crates that you have downloaded from crates.io. Now you might ask yourself how you can publish something there. In order to learn this (and not pollute crates.io with our experiments), we provide you with a private crate registry based on kellnr. To work with this, you have to execute the following steps:

- Log in at kellnr.hod.cs.uni-saarland.de using the credentials provided to you.
- Create an Authentication Token by going to Settings. Keep that token somewhere, it is only displayed once.
- Change your local ~/.cargo/config.toml and add the following:

[net]
git-fetch-with-cli = true

[registries]
kellnr = { index = "git://kellnr.hod.cs.uni-saarland.de/index", token = "YOURTOKEN" }

- Alternatively, you can use cargo login to connect to the registry or use --token <YOURTOKEN> when you publish.
- Now in the crate you want to publish, make sure the Cargo.toml looks like this:

[package]
# ...
publish = ["kellnr"]

Now you are ready to publish. But keep in mind:

This is irrevocable! Once published, forever it shall remain! Probably.

cargo publish will only work if some requirements are met:

1. The name is not taken
2. Your crate can be built
3. Your Cargo.toml does not prohibit publishing
4. You specified the authors, license, homepage, documentation, repository, and readme file plus provided a description in your Cargo.toml (only true for crates.io)
5. Your local files do not diverge from the ones in the repository

A dry run performs all checks without publishing and does not require a login → perfect for continuous integration tests.

Always check first with cargo publish --dry-run.
GitLab

GitLab is open source software to collaborate on code.

GitLab offers:

- Git repositories and source code management
- Continuous integration and deployment
- Issue trackers
- Wikis
- Hosting static websites
- Package registries

In Free Open Source Software (FOSS) jargon, platforms such as GitLab, GitHub, and BitBucket are called software forges.

In case you do not know Git, please check out Learn Git Branching and this chapter of the Missing Semester.

For the sake of this course, you are going to use the first two features, as they relate the most to dependability. Notably, the repository creates a traceable history of changes to files that are part of the repository.

Projects

Projects can be created by going to the projects view and hitting the New Project button. Afterwards go for Create blank project, pick a name and description. It is good practice to initialize the project with a README.md. This file can be used to store helpful information that first-time users of your repo see immediately. During the course, we ask you to either 1) turn some of your projects public or 2) give a special user access to the project, so that we can access them.
Continuous Software Development

There are various terms you find online (like Continuous Integration, CI/CD, DevOps) that relate to the following practice:

When developing code collaboratively, regularly merge, check, test, build, and even deploy your software in a shared environment. In this context, regular means once per day or even multiple times a day.

The idea behind this is that the changes a developer makes only deviate from the mainline (the shared ground truth) for a short period of time (while developing fixes or new features). Afterwards, the code is merged with changes by others, and it is checked whether the changes still conform with good practice in the project (e.g. they always build successfully, don’t introduce failing tests, …).

If such a regular integration happens, we speak about continuous integration (CI). Some companies even take it one step further, i.e. when an integration is successful, the changed code is released (e.g. deployed to a production environment, packaged, containerized, …). The latter is called continuous delivery (CD).

With GitLab and CI/CD, every time you push your Git commits, a set of jobs (called a pipeline) is executed to integrate and deploy your software.

Using continuous methods is recommended when developing dependable software. This approach ensures a sufficient level of quality for new commits that get pushed or merged to the mainline — in an automated fashion. Depending on the tools used in the pipelines (compilers, static checkers, linters, …), different qualities can be assessed.
gitlab-ci.yml

In GitLab, the .gitlab-ci.yml in the root of your project declares almost everything related to your CI/CD pipeline:

image: registry.gitlab.com/hands-on-dependability/docker-rust:latest

stages:
  - check
  - test
  - deploy
...

check:
  stage: check
  tags:
    - docker
  before_script:
    - rustc --version
    - cargo --version
    - mkdir -p .cargo_cache
    - export CARGO_HOME="${PWD}/.cargo_cache"
  script:
    - cargo check
  cache:
    key: ${CI_COMMIT_REF_SLUG}
    paths:
      - .cargo_cache/
      - target/

When using private GitLab repositories as cargo dependencies within your CI/CD pipeline, create a deploy token and use it like this:

before_script:
  - git config --global url."https://gitlab-ci-token:${REPO_ACCESS_TOKEN}@${CI_SERVER_HOST}/".insteadOf "https://git.example.com/"

There’s much to learn about CI/CD, check it out.
Upload & Release

Before, we learned how to publish crates. Another common form of releasing your software is by providing a release in your software forge. For GitLab, you can use the Package Registry for various package managers. There is no crates support yet, so we upload generic files.

After we have called cross (for our fancy CLI app fcapp), we also set the following environment variable:

export LINUX_X86_64_ASSET="fcapp-v${PACKAGE_VERSION}-x86_64-unknown-linux-musl.tar.gz"

Afterwards, our upload job looks like this:

upload:
  stage: upload
  image: curlimages/curl:latest
  needs:
    - job: build
      artifacts: true
  rules:
    - if: $CI_COMMIT_TAG
  script:
    - |
      tar -czvf ${LINUX_X86_64_ASSET} -C target/${LIN_TARGET}/release fcapp
    - |
      curl --upload-file ${LINUX_X86_64_ASSET} \
        --header "JOB-TOKEN: ${CI_JOB_TOKEN}" ${PACKAGE_REGISTRY_URL}/${LINUX_X86_64_ASSET}

To add a release to GitLab’s Release section (<repo url>/-/releases), we do the following:

release:
  stage: release
  image: registry.gitlab.com/gitlab-org/release-cli
  needs:
    - job: build
      artifacts: true
    - job: upload
      artifacts: false
  rules:
    - if: $CI_COMMIT_TAG
  script:
    - |
      release-cli create --name "Release $PACKAGE_VERSION" --tag-name v$PACKAGE_VERSION \
        --assets-link "{\"name\":\"${LINUX_X86_64_ASSET}\",\"url\":\"${PACKAGE_REGISTRY_URL}/${LINUX_X86_64_ASSET}\"}"
U02: Fill Your Toolbox

Excellent, you got the first pieces of Rust software running on your system. But as you can imagine, your job at DSys involves more than only running your software on your own system.

Hence, this unit gives you a deep dive into Rust’s Swiss Army knife cargo, which helps in many everyday activities (like testing, linting, …). Further, we introduce the GitLab collaboration software that allows you to work together with the other DSys engineers.

As you are now set up to contribute to production code, we have to introduce you to Test-First Coding, as that is the way DSys implements new features (our goal is to have almost all code covered by unit tests). In our experience, this paradigm leads to more dependable code and made us more productive when turning requirements into code.

Test-First in Action

- Project Template
- Consider Task
- Setup Testing Architecture
- Develop Logic Test-First
- Discuss Testing Dimensions (Qualitative, Quantitative)
S02: Sample Solution

Test-First Rust Coding & Source Control

- Problem Domain:
  - FizzBuzz algorithm (complicated), as it is not perfectly clear how this should be done.
  - CLI (complicated).
- cargo tarpaulin --verbose --all-features --ignore-tests --workspace --timeout 120 --out Xml
- Consider fizzbuzz.zip in dCMS.

GitLab CI

- Straightforward.

Summary

What did you learn?

- How cargo’s various commands help you in your process of developing software (building, checking, formatting, releasing).
- How GitLab provides you with a place to store and work on your code projects, including ways to automatically run cargo and other jobs.
- How test-first coding helps to produce dependable code that is testable and well-structured.

Where can you learn more?

- Rust & cargo:
  - Rust Book: Ch. 11
  - Programming Rust: Ch. 02 + 08
  - Rust for Rustaceans: Ch. 06
  - cheats.rs: Cargo
- GitLab:
  - GitLab Documentation
  - Netways GitLab Training
- Test-First Coding:
  - Test-First Coding by Ralf Westphal (in German)
  - Test-Driven Development Munich School (in German)
  - Effective Software Testing
Test-First Coding

You might have heard about legacy code. According to Michael Feathers[2], this is code that is not covered by tests to check for correct behaviour — hence it is not dependable, as the maintainability is lacking.

There is also the notion of ancient code: code created by one or more people that have left the organization that maintains the code.

If you strive for dependable systems, it is important to avoid both legacy and ancient code. Avoiding ancient code is an organizational matter, i.e. making sure that multiple people know the code well and that information is available in the organization[3]. Avoiding legacy code is done by writing tests. In the domain of safety-critical software, tests are even checked during certification activities to prove that the code is dependable.

In practice, there are multiple approaches as to when to write tests. Some argue that all tests must be specified before any coding starts, while a large portion of industry writes tests after the code was produced or to reproduce a bug that has been found in production.

In this section, we have a look at test-first coding, a practice that helps you develop dependable code irrespective of where you work… and ensures you do not produce legacy code.
Motivation

But before we get started, let’s think about why we would write automated tests. There are lots of good reasons to do so:

- Comfortable: Automated tests are easy to run and require no manual effort.
- Reliable: There is no way to introduce manual errors while testing.
- Traceable: Requirements are documented, as tests are executable specifications.
- Usable: Usage of code is documented, as tests are examples.
- Cheap: Tests have low costs, particularly lower than having a bug in production code.
- Stable: Acceptance tests become regression tests (detecting that behaviour has changed in an increment) over time, making software less brittle.
- Automatable: Tests can integrate into a larger automation framework (CI).
- Observable: Code test coverage can be observed.
- Ordered: Code has more order, as test automation requires code to be ready for testing.

Now that you are convinced that you must write tests, the question is why you should write them first?

- With test-first, our mind is still in conceptual solution mode and not in technical coding mode. Hence we think about the problem and not the concrete approach to solve it — leading to more expressive solutions.
- Test-first ensures that no feature is added without tests, making sure that logic is not an accident.
- Test-first enables better interfaces, as we approach a problem from the user perspective of an interface and not from the solution provider.

The ideal starting point for implementing logic is when you have an explicit function signature and a set of acceptance test cases.

Everything else is premature coding — creating production code without having at least one “red” (failing) acceptance test.
Problem Complexity Continuum

Before we dig into writing tests, we want to have a look at problems of varying difficulty. We start with the domain of travelling as an analogy and head over to coding problems right away.

Traveling Problems

Here are four tasks, with increasing difficulty:

- Commute to your school. You almost do it without thinking, as you did it every day.
- Travel to Norddeich Mole. You (probably) weren’t there yet, but know how to drive a car or book a train.
- Travel to Chhatrapati Shivaji Maharaj Vastu Sangrahalaya (formerly: Prince of Wales Museum in Mumbai). Even if you know how to book international flights, using the Indian local transport is novel to you.
- Travel to Mars. Nobody did that before…

Coding Problems

Assume your supervisor asks you to:

- Implement a Fibonacci function. You might have to look it up, but there is a best practice for writing it.
- Implement a French Deck of Cards data structure and methods (supporting sorting, shuffling, …). Using Ord, rand::SliceRandom and other traits, you can make it work.
- Implement a ToDo app. Though this is the typical “Hello World” example for MVC frameworks, the customer might have special things in mind… you have to figure out things while you go.
- Implement a Corona Warning app. Assume for a moment it is March 2020… nobody has done it before and there are tons of technical and legal challenges ahead.
Cynefin

The previous examples show different groups of problems, depending on their complexity/difficulty/novelty. We consider the Cynefin framework (Welsh for “habitat”), which can also be used for non-coding tasks:

+---------------------------+---------------------------+
|        - Complex -        |      - Complicated -      |
|                           |                           |
|   Enabling constraints    |   Governing constraints   |
|   Loosely coupled         |   Tightly coupled         |
|   Probe-Sense-Respond     |   Sense-Analyse-Respond   |
|   Emergent Practice       |   Good Practice           |
|              +------+-----+-+                         |
+--------------| - Disorder -  |------------------------+
|  - Chaotic - +------+-----+-+       - Clear -         |
|                           |                           |
|   Lacking constraints     |   Tightly constrained     |
|   Decoupled               |   No degrees of freedom   |
|   Act-Sense-Respond       |   Sense-Categorise-Respond|
|   Novel Practice          |   Best Practice           |
|                           |                           |
+---------------------------+---------------------------+

Depending on the habitat in which your problem lies, you change your behaviour when coding:

- If you are in the “clear” habitat, start coding immediately based on the tests. The problem is trivial, i.e. you know exactly what code to write right away. Note that even in this case, tests are a must. If you leave them out, you risk leaving logic uncovered that might at a later point grow to non-trivial size.
- If you are in the “complicated” habitat, try decomposing your problem step by step. If you are successful, partial problems are in the clear habitat and composing them again leads to a solution for a complicated problem.
- If you are in the “complex” habitat, use trial-and-error to learn more about the problem. Do not touch production code, but rather experiment in the testing code.
- If you are in the “chaos” habitat, don’t work in your normal codebase, rather create prototypes (standalone project, paper) to come up with acceptance tests. “Chaos” is also the habitat in which legacy code lives: no one knows what effect a change causes.
- If you are in the “disorder” habitat, try segmenting your problem into domains where you know what the habitats are and continue from there.
Step-Wise Coding in the Clear

In the clear domain, one distinguishes between trivial problems (writing the logic is totally straightforward) and simple ones (it is not 100% straightforward). A problem is simple when it is straightforward to derive test cases from the requirements with increasing difficulty (baby steps).

This stepwise/nested approach tries to trivialize the simple problem. The incremental test cases form a strict total order on difficulty, i.e. a more difficult problem subsumes the less difficult one. All tests are associated with a single API function.

Variation Dimensions

When writing incremental tests, we look at the problem from two types of dimensions: a) qualitative, b) quantitative. These dimensions affect the employed data structures and algorithms in our solution like this:

- Qualitative: handling different problem aspects
  - Data: structs, enums
  - Logic: cases
- Quantitative: handling different problem sizes
  - Data: arrays, lists, iterators
  - Logic: loops

In order to achieve increasing difficulty, the steps along a dimension must be ordered:

- Quantitative: 0, 1, 2, many
- Qualitative: whatever suits the dimension; it is non-trivial to decide which is harder
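As a hypothetical illustration of the quantitative ordering (0, 1, 2, many), the incremental tests for a function summing a slice could be added in this sequence (sum is an assumed helper, not part of the course material):

#[test]
fn sum_of_empty_slice_is_zero() {   // step "0"
    assert_eq!(sum(&[]), 0);
}

#[test]
fn sum_of_single_element() {        // step "1"
    assert_eq!(sum(&[7]), 7);
}

#[test]
fn sum_of_two_elements() {          // step "2"
    assert_eq!(sum(&[3, 4]), 7);
}

#[test]
fn sum_of_many_elements() {         // step "many"
    assert_eq!(sum(&[1, 2, 3, 4]), 10);
}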
In the example at the end of this section, we specify these domains and give the increasing difficulty steps.

The remaining domains “complex” and “complicated” are not tackled in this section, as they require more advanced techniques.
|
||
|
||
Testing in Rust
|
||
|
||
Now with this theoretical knowledge, we start doing some actual testing
|
||
in Rust. First, we learn how to run and implement tests.
|
||
|
||
cargo test or cargo tarpaulin
|
||
|
||
With cargo test, all your tests are executed in parallel. If you add
|
||
more text after cargo test, only tests whose names contain that text are run.
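
For example, assuming your crate contains tests with names like
test_fizz and test_buzz, the filter works as follows:

cargo test           # run every test
cargo test fizz      # run only tests whose name contains "fizz"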
|
||
|
||
If you are also concerned about test coverage (how much of your code is
|
||
examined by a test), cargo-tarpaulin provides this (on x86_64 and
|
||
Linux). #[cfg(not(tarpaulin_include))] helps to ignore parts where you
|
||
definitely don’t want/need coverage, e.g. getters/setters.
|
||
|
||
cargo install cargo-tarpaulin
|
||
|
||
cargo tarpaulin --verbose --all-features --ignore-tests --workspace --timeout 120 --out Xml
|
||
|
||
Writing Unit Tests
|
||
|
||
Unit tests are used to check a single unit of functionality (often one
|
||
or more functions). They are defined alongside the code, usually inside
|
||
the module like this:
|
||
|
||
// code under test
|
||
fn function(n: u32) -> u32 {
|
||
// ...
|
||
}
|
||
|
||
#[cfg(test)]
|
||
mod tests {
|
||
use super::*;
|
||
|
||
#[test]
|
||
fn test_something() {
|
||
assert_eq!(function(31), 42);
|
||
}
|
||
}
|
||
|
||
Writing Integration Tests
|
||
|
||
In contrast to unit tests, integration tests check the interaction of
|
||
functional units in an end-to-end fashion. These are defined in .rs
|
||
files in <project root>/tests and not part of the normal source code.
|
||
The tests are external, i.e. they have to import the crate that they are
|
||
testing and they can only access public APIs. The integration test file
|
||
usually looks like this:
|
||
|
||
use crate_under_test::function;
|
||
|
||
#[test]
|
||
fn test_something() {
|
||
assert_eq!(function(31), 42);
|
||
}
|
||
|
||
Assertions
|
||
|
||
Core to your tests are assertions that separate passing from failing
|
||
tests:
|
||
|
||
- assert!(arg), check for arg to be true.
|
||
- assert_eq!(left, right), check for left to be equal to right.
|
||
- assert_ne!(left, right), check for non-equal
|
||
- #[should_panic], annotate the test to expect a panic
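
For instance, a panic-expecting test might look like this (a minimal
sketch; the divide function is made up for illustration):

fn divide(a: u32, b: u32) -> u32 {
    a / b // panics if b is 0
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    #[should_panic]
    fn test_divide_by_zero() {
        divide(1, 0); // the test passes because this call panics
    }
}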
|
||
|
||
Also consider pretty_assertions as a drop-in replacement to make test
|
||
failures and their causes more visible.
|
||
|
||
Writing Documentation Tests
|
||
|
||
Finally, Rust’s documentation allows you to include code examples with
|
||
assertions. These are called documentation tests and make sure that your
|
||
documentation and code stay in sync.
|
||
|
||
/// # Fibonacci
|
||
/// Generates the n-th fibonacci number.
|
||
///
|
||
/// fib(n) = fib(n-1) + fib(n-2)
|
||
///
|
||
///
|
||
/// Example usage:
|
||
/// ```rust
|
||
/// let n = 5;
|
||
///
|
||
/// assert_eq!(fib(n), 8);
|
||
/// ```
|
||
pub fn fib(n: u32) -> u32 {
|
||
if n == 0 || n == 1 {
|
||
1
|
||
} else {
|
||
fib(n-1) + fib(n-2)
|
||
}
|
||
}
|
||
|
||
The resulting testable documentation looks like this when accessed via a
|
||
web interface:
|
||
|
||
[Documentation]
|
||
|
||
Table-Based Testing
|
||
|
||
Here, we make use of macros, which will be explained later in U10.
|
||
|
||
Often, you have a certain pattern to your test cases, i.e. you have a
|
||
string that gets converted to a well-known value, like this:
|
||
|
||
acceptance_test!(simple,
|
||
first: "XIV", 24,
|
||
second: "MCDIX", 1409,
|
||
third: "MMXXII", 2022,
|
||
);
|
||
|
||
The approach is that we pick a module name, a test-case name and then a
|
||
list of input and output values. In Rust, this kind of table-based
|
||
testing is implemented using macro_rules!:
|
||
|
||
use crate_under_test::function;
|
||
|
||
macro_rules! acceptance_test {
|
||
($suite:ident, $($name:ident: $input:expr, $output:expr,)*) => {
|
||
mod $suite {
|
||
use super::*;
|
||
$(
|
||
#[test]
|
||
fn $name() -> () {
|
||
let out = function($input);
|
||
assert_eq!($output, out);
|
||
}
|
||
)*
|
||
}
|
||
};
|
||
}
|
||
|
||
Roman Numbers Hands-On
|
||
|
||
In the following video, we put this practice in action to solve the
|
||
following problem:
|
||
|
||
Develop a library function that converts a roman number (e.g. XIV) to
|
||
a decimal number (e.g. 14) — and vice-versa.
|
||
|
||
GitLab & Testing
|
||
|
||
Test coverage results can be observed by GitLab. In -/settings/ci_cd, go
|
||
to “Test Coverage parsing” and enter ^\d+.\d+% coverage. The resulting
|
||
chart under -/graphs/<branch_name>/charts looks like this:
|
||
|
||
W02: Work Sheet
|
||
|
||
Test-First Rust Coding and Source Control
|
||
|
||
Develop Fizz buzz test-first and using a Git repository. Here are the
|
||
requirements in prose:
|
||
|
||
fizzbuzz is a command-line utility that takes a command-line argument
|
||
n and prints all numbers 1 to n (each on a separate line) while
|
||
following the Fizz Buzz rules. Every number that is divisible by 3 is
|
||
replaced with “Fizz”. Every number that is divisible by 5 is replaced
|
||
with “Buzz”. If it is divisible by both, print “FizzBuzz”.
|
||
|
||
- Think about which habitat this problem has (consider the Cynefin
|
||
model). Explain your choice.
|
||
|
||
- Make sure you watched the “Roman Numbers Hands-On” video, showing
|
||
you the test-driven development process.
|
||
|
||
- Create a GitLab project with a Git repository, named “Fizz Buzz”.
|
||
Add the template code to the repository.
|
||
|
||
- Optionally set up cargo-tarpaulin (if you are on x86 Linux) and
|
||
track your coverage while you write tests and the algorithm. Check what
|
||
happens if you disable certain tests.
|
||
|
||
- Create acceptance tests for a function fizzbuzz(n: u32) -> String.
|
||
For each case of the requirements (actual number, Fizz, Buzz,
|
||
FizzBuzz), create a dedicated #[test] function. You might also use
|
||
the macro-based approach for table-based testing. Each commit should
|
||
add either a test or respective incremental code changes (and have a
|
||
special form for the commit message, you need that for a later
|
||
unit). Use increments, where in each increment you:
|
||
|
||
- add test for one more requirement case (commit with message
|
||
starting with “test: …”) or
|
||
- change the code to make the test pass (commit with message
|
||
starting with “feat: …”).
|
||
|
||
- Finally, implement the full program that reads the CLI argument and
|
||
prints to stdout (commits should again start with “feat: …”).
|
||
|
||
GitLab Continuous Integration
|
||
|
||
- Extend your fizzbuzz project by your first CI Pipeline with
|
||
individual jobs that do the following:
|
||
- cargo tarpaulin
|
||
- cargo fmt
|
||
- cargo clippy
|
||
- Verify that they work by temporarily introducing code changes that
|
||
make the jobs fail.
|
||
|
||
Learning from the Borrow Checker
|
||
|
||
The previous sections already showed that the borrow checker might be
|
||
strict, but its help is highly appreciated as it ensures memory and
|
||
thread safety. So keep in mind:
|
||
|
||
The borrow checker is your friend, not your foe.
|
||
|
||
In addition to helping with safety, it helps to make programs more
|
||
structured.
|
||
|
||
Sea or Forest?
|
||
|
||
(Source: Programming Rust)
|
||
|
||
With the ownership system, Rust discourages the Sea of Objects that is
|
||
common in other languages:
|
||
|
||
|
|
||
V +-------+
|
||
+-------+ +------->| |-------------+
|
||
| |----+ +-------+ V
|
||
+-------+ +-------+
|
||
| +------------->| |------->
|
||
| +-------+ | +-------+
|
||
+->| |--+ +-------+ |
|
||
+-------+ +--->| |<-------------+
|
||
+-------+
|
||
|
||
In this situation, testing gets hard, as does creation of objects,
|
||
following interactions, …
|
||
|
||
Rust instead, through ownership, encourages Trees of Objects[4] which
|
||
are much easier to reason about, change, and in general: maintain.
|
||
Hence, the software can be more dependable, as it’s easier to verify and
|
||
adapt.
|
||
|
||
+-------+
|
||
| |
|
||
+-------+
|
||
|
|
||
+------------------+-------------------+
|
||
V V
|
||
+-------+ +-------+
|
||
| | | |
|
||
+-------+ +-------+
|
||
|
|
||
+------------------+-------------------+
|
||
V V
|
||
+-------+ +-------+
|
||
| | | |
|
||
+-------+ +-------+
|
||
|
|
||
+-------------+--------------+
|
||
V V
|
||
+-------+ +-------+
|
||
| | | |
|
||
+-------+ +-------+
|
||
|
||
Coupling Components
|
||
|
||
(Source: Florian Gilcher’s Talk “Ownership and Borrowing from a systems
|
||
construction point of view”.)
|
||
|
||
When we write software, we develop different components (could be as
|
||
simple as a function for now) that are dependent on each other — they
|
||
are coupled. You also learned that the borrow checker takes care of
|
||
resources (files, sockets, …), making sure that they are dropped when
|
||
they are no longer in use. With function signatures, we make it very
|
||
clear what the coupling between the caller and the callee looks like, and we
|
||
define the handover mechanism for the function parameters.
|
||
|
||
Now assume we implement a function that writes a string to a file, and
|
||
returns success/error when completed. Let us also assume that the
|
||
function is called from some other part of our code, e.g.:
|
||
|
||
// ... mystery code before
|
||
let write_result = write(example_file, example_string_buffer);
|
||
// ... mystery code after
|
||
|
||
The location in code from which the write function is called is the call
|
||
site, whereas the overall function calling write is the caller. In this
|
||
case, write is the callee, i.e. the function being called.
|
||
|
||
We can come up with at least three different variants:
|
||
|
||
fn write(file: File, string_buffer: String)
|
||
-> Result<usize, io::Error> {
|
||
}
|
||
|
||
This variant is called the independent one, as caller and callee are not
|
||
coupled. Instead, the callee gets both the file and the string and is by
|
||
itself responsible for cleaning up (i.e. closing the file eventually and
|
||
releasing the string buffer).
|
||
|
||
fn write(file: File, string_buffer: &str)
|
||
-> Result<usize, io::Error> {
|
||
}
|
||
|
||
This variant is called the coupled one, as the caller maintains ownership of
|
||
the string buffer but passes (moves) the file to the callee. However,
|
||
the callee can break the coupling as &str can be copied into a String.
|
||
So the write function could create its own copy and become independent
|
||
from the caller.
|
||
|
||
fn write(file: &mut File, string_buffer: &str)
|
||
-> Result<usize, io::Error> {
|
||
}
|
||
|
||
This variant is called the tightly coupled one, as File is neither Clone nor
|
||
Copy. Hence the callee is dependent on the caller to borrow the file and
|
||
maintain ownership.
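
As a sketch of how the tightly coupled variant could be implemented and
called (the body uses std::io::Write and is only one possible
implementation; the file name is made up):

use std::fs::File;
use std::io::{self, Write};

fn write(file: &mut File, string_buffer: &str) -> Result<usize, io::Error> {
    // the callee only borrows the file; the caller keeps ownership
    file.write(string_buffer.as_bytes())
}

fn main() -> Result<(), io::Error> {
    let mut example_file = File::create("example.txt")?;
    let bytes_written = write(&mut example_file, "Hello, Ferris!")?;
    println!("wrote {} bytes", bytes_written);
    // example_file is still owned here; the file is closed when it goes out of scope
    Ok(())
}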
|
||
|
||
Apart from coupled functions, there are also examples in the Rust
|
||
standard library where we have coupled types, i.e. one type depends on
another. An example (about which we learn more in U04) is Vec<T>, which
can be coupled with its iterator Iter<Item = T>.
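
A minimal sketch of such a coupling: while the iterator (which borrows
the vector) is still in use, the vector itself cannot be mutated:

let v = vec![1, 2, 3];
let mut iter = v.iter();   // Iter borrows v, coupling the two values
// v.push(4);              // would not compile while `iter` is used below
assert_eq!(iter.next(), Some(&1));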
|
||
|
||
In summary, the ownership and type systems go a long way in making
|
||
component coupling clear — and not relying on natural language
|
||
explanation in the documentation that is easy to miss/misunderstand.
|
||
|
||
U03: Own Your Memory and More
|
||
|
||
Are you ready for a short, but highly important, excursion into the one
|
||
language feature that sets Rust really apart from other programming
|
||
languages? Yes? Ok, then let’s first have a look at memory management
|
||
and its pitfalls. With these challenges in mind, Rust’s dependable
|
||
Ownership Model will be eye-opening. Its “enforcer”, the so-called
|
||
Borrow Checker is a tool to learn from, allowing you to write more
|
||
dependable code.
|
||
|
||
Memory and its Management in a Nutshell
|
||
|
||
Before we look into how Rust enables automatic & safe memory management,
|
||
we first have to understand what can go wrong with memory in the first
|
||
place.
|
||
|
||
Here is a view into a 16-bit- / 2-byte-aligned memory[5] region (each .
|
||
is a bit):
|
||
|
||
| 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
|
||
+--------------------------------------------------|
|
||
0 | . . . . . . . . . . . . . . . . |
|
||
16 | . . . . . . . . . . . . . . . . |
|
||
32 | . . . . . . . . . . . . . . . . |
|
||
|
||
An ‘aligned’ memory address is for example 16, which points to the byte
|
||
marked with x in the following:
|
||
|
||
| 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
|
||
+--------------------------------------------------|
|
||
0 | . . . . . . . . . . . . . . . . |
|
||
16 | x x x x x x x x . . . . . . . . |
|
||
32 | . . . . . . . . . . . . . . . . |
|
||
|
||
Memory is provided to your program from, for instance, an operating
|
||
system or a different lower layer. Working with memory appropriately was
|
||
— and is still today — a challenging task. In languages such as C, C++,
|
||
and others, it is the software developer’s task to handle memory. This
|
||
means to allocate memory (known as malloc) when needed, read and write
|
||
to appropriate memory locations and free/deallocate memory when it is no
|
||
longer needed. In these languages, you can also use pointers to refer to
|
||
memory — even if the location pointed to
|
||
|
||
1. does not belong to the program (i.e. cannot be read/written or
|
||
both),
|
||
2. has not yet been allocated by the program,
|
||
3. has been deallocated by the program, or
|
||
4. does not exist at all (e.g. pointer to 4711 if you only have 2k =
|
||
2048 bytes of memory or pointer to NULL)
|
||
|
||
In computing, some of these mistakes in memory management have special
|
||
names. Let’s have a look at each of them individually:
|
||
|
||
Using Uninitialized Memory
|
||
|
||
Assume that you are allowed to use the following region of memory, but
|
||
it has not been initialized. This means that nobody took the effort to
|
||
bring it to a well-defined state (e.g. all bits set to 0). Instead, we
|
||
find the following seemingly random memory content:
|
||
|
||
| 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
|
||
+--------------------------------------------------|
|
||
0 | 0 1 0 0 1 1 1 0 0 1 1 0 1 1 0 1 |
|
||
16 | 0 1 0 1 0 1 0 0 1 1 0 1 0 1 0 1 |
|
||
32 | 1 0 0 1 1 1 0 1 1 1 1 1 0 0 1 1 |
|
||
|
||
Now assume we say that our variable a is at location 16, hence it has
|
||
the bit pattern (right to left = high to low) of 00101010 = 42 [6]. If
|
||
we get another region of uninitialized memory, the value might differ.
|
||
Hence if our program relies on the value being 0 on initialization, we
|
||
are in a bad situation.
|
||
|
||
Use After Free / Double Free
|
||
|
||
Assume you use the byte following location 32 to store foo in this code:
|
||
|
||
#[derive(Debug)]
|
||
struct Foo {
|
||
bar: u16,
|
||
}
|
||
|
||
let foo = Foo { bar: 5 };
|
||
println!("{:?}", foo);
|
||
drop(foo); // `foo` is freed
|
||
println!("{:?}", foo); // `foo` is used after free
|
||
|
||
Note that this Rust code does not compile for a reason you learn later.
|
||
For now, you should notice that the println! after the drop would be a
|
||
use after free. If this were allowed it could happen that the freed
|
||
memory is used by someone else and filled with another value than 5,
|
||
leading to surprising results.
|
||
|
||
A similar situation is caused when a region of memory is freed twice,
|
||
which can (in languages such as C) lead to invalid state of memory
|
||
allocations. This is called a double free and can lead to security
|
||
issues.
|
||
|
||
Buffer Over- or Underflow
|
||
|
||
While we will look in more detail at arrays later, for now just imagine
|
||
that they are a fixed number of elements of same type. Let’s take for
|
||
instance foo : [u8; 3], so three bytes located at 16 and marked with 0,
|
||
1, 2 in the following memory diagram.
|
||
|
||
| 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
|
||
+--------------------------------------------------|
|
||
0 | . . . . . . . . . . . . . . . . |
|
||
16 | 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 |
|
||
32 | 2 2 2 2 2 2 2 2 . . . . . . . . |
|
||
|
||
If we access foo[n], the compiler translates this into reading bits
|
||
\(16 + 8n\) to \(16 + 8(n+1) - 1\). Now with this formula, it is
|
||
certainly possible to compute bit ranges for \(n = -1 \) or \(n = 5\).
|
||
However, if we do this, we access memory that is outside of the region
|
||
allocated for foo — a buffer under- or overflow. In memory-safe
|
||
languages, this causes an index-out-of-bounds error. In languages such
|
||
as C/C++, this is not checked automatically and it is the job of the
|
||
developer to ensure the index never goes out of bounds.
|
||
|
||
Null Dereferences
|
||
|
||
NULL has long been known to be a dangerous idea[7]; however, we
|
||
still face it in many popular programming languages. The issue is the
|
||
following: If you have a pointer that should point to an object, but,
|
||
e.g., does not yet do so, Hoare decided that one would give it the value
|
||
of NULL (0 in most languages) to make it clear that it is not yet there.
|
||
If a program is to access it, one would first need to check for NULL and
|
||
depending on the result do this or that. However, this check is not
|
||
mandatory or enforced in many languages. In memory-managed languages,
|
||
this leads, e.g., to a NullReferenceException, which is safe but might
|
||
crash your program — and can be particularly hard to debug (i.e. finding
|
||
out where it became or should become non-null).
|
||
|
||
Data Races in Concurrent Access
|
||
|
||
Assume for a moment that two threads share this region of memory:
|
||
|
||
| 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
|
||
+--------------------------------------------------|
|
||
0 | 0 1 0 0 1 1 1 0 0 1 1 0 1 1 0 1 |
|
||
16 | 0 1 0 1 0 1 0 0 1 1 0 1 0 1 0 1 |
|
||
32 | 1 0 0 1 1 1 0 1 1 1 1 1 0 0 1 1 |
|
||
|
||
At location 16, a counter variable is stored, which is initially 42.
|
||
Both threads now have the task of incrementing it by 10, which should
|
||
eventually lead to the counter being 62. The naive version for this
|
||
looks like this:
|
||
|
||
use std::thread;
|
||
|
||
fn increment(mut counter: Counter) {
|
||
for _ in 0..10 {
|
||
counter.count += 1;
|
||
}
|
||
}
|
||
|
||
#[derive(Debug)]
|
||
struct Counter {
|
||
count: u32,
|
||
}
|
||
|
||
fn main() {
|
||
let mut counter = Counter { count: 42 };
|
||
let t1 = thread::spawn(|| increment(counter));
|
||
let t2 = thread::spawn(|| increment(counter));
|
||
t1.join().unwrap();
|
||
t2.join().unwrap();
|
||
println!("{:#?}", counter);
|
||
}
|
||
|
||
At this point, it is essential to tell you that the += operation is
|
||
composed of at least three operations:
|
||
|
||
- register = load(memory)
|
||
- increment(register)
|
||
- store(memory, register)
|
||
|
||
In a concurrent setting, the three operations for both threads can
|
||
interleave in arbitrary order. For example, thread 2 could read 42, then
|
||
thread 1 executes fully, and then thread 2 continues. What would be the
|
||
result? We assure you that 62 is certainly not the answer.
|
||
|
||
A Sidenote on Garbage Collectors
|
||
|
||
As of today, there are two approaches to memory management: manual
|
||
management and garbage collection. While the former puts a focus on
|
||
control, the latter puts it on safety. With Rust, you get both as we see
|
||
in the next section. Now why is control important? If you are writing
|
||
systems that should impose dependable timing, it is imperative that they
|
||
allocate and free memory in an automated and deterministic fashion or
|
||
provide you with primitives that allow you to make it deterministic. In
|
||
C/C++ these primitives are provided, but the compiler drops all safety
|
||
guarantees. In Java, safety is provided, but the compiler drops all
|
||
timing guarantees as a piece of memory can be freed at any time after
|
||
the last reference to it was invalidated. In the past, there has been
|
||
work on real-time garbage collection, but this hasn’t made it into
|
||
readily available technology stacks. So Rust provides an interesting
|
||
trade-off here so that you neither miss the predictable timing of manual
|
||
memory management, nor the safety of garbage collection. This leads to
|
||
automatic dependable memory management.
|
||
|
||
Onward
|
||
|
||
With these five dangerous memory operations in mind, we are ready to
|
||
look at Rust’s ownership model as well as other language features that
|
||
make these five causes of bugs impossible.
|
||
|
||
Ownership in Rust
|
||
|
||
Ownership Model and Borrowing
|
||
|
||
In Rust, any piece of data (typically called value) is owned by exactly
|
||
one owner (i.e., variable or other data structure). When the owner of a
|
||
value goes out of scope, the value is dropped.
|
||
|
||
Ownership Trees
|
||
|
||
The variables of your program act as roots of ownership trees. Let’s
|
||
consider the following program:
|
||
|
||
fn main() {
|
||
let a : (u8, u8) = (5, 7);
|
||
}
|
||
|
||
Here is how this tuple looks like in memory (we do not show the byte
|
||
values):
|
||
|
||
| 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
|
||
+--------------------------------------------------|
|
||
0 | . . . . . . . . . . . . . . . . |
|
||
16 | +--------------------------------------------+ |
|
||
32 | | a | |
|
||
48 | +--------------------------------------------+ |
|
||
63 | . . . . . . . . . . . . . . . . |
|
||
|
||
With a.0 we can access the 0th element of the tuple a. So another view
|
||
would be:
|
||
|
||
| 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
|
||
+--------------------------------------------------|
|
||
0 | . . . . . . . . . . . . . . . . |
|
||
16 | +--------------------+ +---------------------+ |
|
||
32 | | a.0 | | a.1 | |
|
||
48 | +--------------------+ +---------------------+ |
|
||
63 | . . . . . . . . . . . . . . . . |
|
||
|
||
Note that in this example, a tree is constructed. a is the root and a.0
|
||
as well as a.1 are children. We can also display it like this:
|
||
|
||
Stack of main()
|
||
└── a
|
||
├── .0
|
||
└── .1
|
||
|
||
Why Less Power is Sometimes Better
|
||
|
||
With the ownership system, Rust becomes less powerful than other
|
||
languages, i.e., there are algorithms and data structures you can
|
||
express in other languages that you cannot in Rust. In practice, less
|
||
power is not always a bad thing. In fact, with the restrictions Rust
|
||
imposes, we rule out a lot of programs that are hard to analyze for
|
||
correctness or are even fundamentally broken. So you see that this
|
||
limited power contributes to our software’s dependability. Later in U13
|
||
we look at unsafe Rust, a superset of safe Rust that allows certain
|
||
operations which could (but should not) circumvent ownership.
|
||
|
||
Now if we only had single owners for values and they could not be
|
||
changed for safety reasons, Rust would indeed be rather limited in its
|
||
functionality. Instead, Rust allows the following four operations to
|
||
increase its power again:
|
||
|
||
- ownership can be moved, i.e. the ownership can be transferred from
|
||
one variable to another, e.g. in assignments or function calls
|
||
- primitive types that can be copied, allowing for functions to be
|
||
called by value
|
||
- it is possible to borrow a reference to a value
|
||
- the standard library contains generic, reference-counted types
|
||
(e.g. Rc<T>)
|
||
|
||
move vs. Clone vs. Copy
|
||
|
||
As mentioned before, ownership in Rust does not need to be static. The
|
||
value can move from one owner to another. In this case, the old place
|
||
becomes uninitialized and can no longer be used. Rust checks for this by
|
||
disallowing access to the previous owner after the move.
|
||
|
||
A type can implement the Copy trait, which indicates that one can create
|
||
a duplicate of the original value by copying it bit-by-bit. If a type
|
||
implements Copy, moves become copies (e.g. a function parameter that is
|
||
Copy is copied when the function is called). This also means that
|
||
copying happens implicitly — we never explicitly call a function to do
|
||
so (Copy is a marker trait, having no functionality except giving
|
||
information to the compiler). The copy is complete (often called deep)
|
||
and independent of the original value — changing your copy does not
|
||
affect the original.
|
||
|
||
A type can implement the Clone trait, which allows us to create
|
||
duplicates of types that cannot be copied bit-by-bit. The duplication
|
||
logic is implemented in a custom clone function or it can be
|
||
automatically derived (if all elements of a type are Clone themselves).
|
||
Afterwards, a user can do this explicitly by calling value.clone() and
|
||
continuing to work with the return value. Whether the duplicate is
|
||
deep/independent is governed by the type for which Clone is
|
||
implemented (this is not formalized; you must consider the type
|
||
documentation). For String, a clone creates a deep copy that acts
|
||
independently from the original. For Rc (reference counter), a clone
|
||
creates a shallow copy that stays connected to the other instances.
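
A minimal sketch contrasting the three behaviours (the concrete types
and values are chosen only for illustration):

let a = 42u32;              // u32 is Copy
let b = a;                  // a is duplicated bit-by-bit
println!("{} {}", a, b);    // both remain usable

let s = String::from("Ferris"); // String is Clone, but not Copy
let t = s.clone();              // explicit, deep duplicate
let u = s;                      // move: s is no longer usable afterwards
println!("{} {}", t, u);
// println!("{}", s);           // would not compile: s was moved into u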
|
||
|
||
Finally, there are types that are neither Copy nor Clone. The major
|
||
reason is that safe duplication cannot be done within Rust alone or it
|
||
would be misleading/unidiomatic. For example, a File does not implement
|
||
either of the traits: Copy does not work as a bit-by-bit copy of the
|
||
File struct would not create an independent file on the file system.
|
||
Clone could technically work, but what would be the exact semantics? If
|
||
we file.clone() what file name would the duplicate have?
|
||
|
||
You will learn more about traits later. For now, note that Clone is a
|
||
supertrait of Copy, so everything that is Copy must be Clone as well.
|
||
While Clone is a trait with an implementation you must derive or
|
||
implement, Copy only marks the type. Hence, it should be used with care
|
||
and only added to types that really fulfil the bit-by-bit copy-ability.
|
||
|
||
Owning Heap-Values
|
||
|
||
When you declare variables, the value that they are assigned to
|
||
typically lives on the stack. The stack is the area of your memory where
|
||
data related to the current scope is stored (e.g. the current function’s
|
||
body). However, if you plan to have values that live longer or that are
|
||
too large to store and move around on the stack, you must place them on
|
||
the heap. In Rust, you can do so by using various types, the easiest of
|
||
which is Box<T>. With Box::new(42u16), Rust allocates enough memory on
|
||
the heap to store a u16 and returns a Box pointer. Box implements Deref,
|
||
which means that in many cases you use it like you would use a u16
|
||
(e.g. calling methods on it). If you want to use it in an operation
|
||
(e.g. addition), you have to dereference explicitly using *. Another
|
||
example is Vec<T>, which stores a continuous collection of Ts on the
|
||
heap (cf. ArrayList in Java). Such continuous memory sections are also
|
||
referred to as being contiguous.
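
A small sketch of heap allocation with Box and Vec:

let boxed = Box::new(42u16);     // the u16 lives on the heap
let sum = *boxed + 1;            // explicit dereference for arithmetic
println!("{} {}", boxed, sum);   // Deref lets us use the Box like a u16

let v: Vec<u16> = vec![1, 2, 3]; // contiguous storage on the heap
println!("{:?}", v);
// both heap allocations are freed when the owners go out of scope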
|
||
|
||
Borrowing References
|
||
|
||
With move, Clone and Copy, we are moving and copying data around, which
|
||
is not what we want in all cases. Particularly, when the data we operate
|
||
on is large and the function we give it to does not need ownership
|
||
(because it does not manipulate it in a way that requires ownership).
|
||
For these use cases, Rust provides references. You might have heard
|
||
about a pointer before in languages such as C/C++. A pointer is a value
|
||
that is interpreted as a memory location. These languages also give the
|
||
pointer a type to allow for compile-time checks for compatibility
|
||
(though in C, e.g., it is easy to cast a pointer to a different type
|
||
which is why this is forbidden in MISRA C).
|
||
|
||
In Rust, references represent non-owning pointers to data. Doing math on
|
||
them is not possible (e.g. shifting it by a couple of bytes) as this can
|
||
lead to memory issues. References (e.g. to Point) come in two flavours:

- Shared references, indicated by &, can be used to access the data
  read-only.
- Exclusive references, indicated by &mut, can be used to mutate the
  data.
|
||
|
||
Furthermore, at compile-time, they are associated with a lifetime.
|
||
Lifetime is a concept within the Rust compiler that tracks the “time”
|
||
(portion of the program, actually) between a value being created and
|
||
dropped. The borrow checker enforces the following about references:

- A reference cannot be created to null (or any other invalid memory
  region).
- No reference may outlive its referent (this avoids dangling
  pointers).
- At one point in “time”, there can only ever be either an arbitrary
  number of shared references or exactly one exclusive reference.
- As long as there is an exclusive reference, the original owner
  cannot do anything with the data.
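
A minimal sketch of these rules in action (the commented-out line is
what the borrow checker would reject):

let mut value = 5u8;
let shared_a = &value;
let shared_b = &value;        // any number of shared references is fine
println!("{} {}", shared_a, shared_b);

let exclusive = &mut value;   // fine: the shared references are no longer used
*exclusive += 1;
// let another = &value;      // rejected: `exclusive` is still used below
println!("{}", exclusive);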
|
||
|
||
Note that at runtime, the reference again is a memory address
|
||
(i.e. pointer), as the lifetimes (and types) are only used at
|
||
compile-time and dropped afterwards. As soon as safety has been checked,
|
||
there is no need to redo this at runtime.
|
||
|
||
You might have heard that lifetimes are hard to understand and an aspect
|
||
that really sets Rust apart from other languages. For now, you should
|
||
not care too much about lifetimes, because you will not need to
|
||
explicitly use them. There are multiple reasons behind:
|
||
|
||
1. When you write application code (not libraries), you are in full
|
||
control and don’t have to accommodate various use cases of your code.
|
||
2. If you run into lifetimes issues by the borrow checker, you can
|
||
often cheat by .clone()ing the value. This is not ideal in terms of
|
||
performance (you might not really need a clone), but can help you
|
||
make progress. Later, you can do performance profiling and figure
|
||
out if this clone is really a bottleneck.
|
||
3. Lifetime elision means that Rust can infer the lifetimes in many
   common use cases.
|
||
4. If you want to write a high-performance library, e.g. some zero-copy
|
||
data processing, you should learn in detail about lifetimes. But
|
||
this is out of the scope of this course.
|
||
|
||
Ownership by Example
|
||
|
||
Now, let’s put ownership into practice: Say we have a collection of
|
||
numbers that we want to square (we discuss collections in more detail in
|
||
U04). A first attempt would look like this:
|
||
|
||
fn square_list(list: Vec<u8>) -> Vec<u8> {
|
||
let mut squares = vec![];
|
||
for item in list { // item: u8
|
||
squares.push(item.pow(2))
|
||
}
|
||
squares
|
||
}
|
||
|
||
fn main() {
|
||
let list = vec![2,3,4];
|
||
let squares = square_list(list);
|
||
// println!("{:#?}", list); <-- does not work as square_list takes ownership of list
|
||
println!("{:#?}", squares);
|
||
}
|
||
|
||
What you see is that we move the list into the function (list parameter
|
||
has no & and accessing it afterwards fails to compile). As the result of
|
||
our function is a list as well, we return a new Vec<u8> (squares) and
|
||
list is dropped at the end of square_list. This seems to be rather
|
||
complicated, given that we only want to generate a list of squares based
|
||
on an existing list.
|
||
|
||
First, we remove the cannot access list after square_list() issue by
|
||
using a reference instead of a move:
|
||
|
||
fn square_list(list: &Vec<u8>) -> Vec<u8> {
|
||
let mut squares = vec![];
|
||
for item in list { // item: &u8
|
||
squares.push(item.pow(2))
|
||
}
|
||
squares
|
||
}
|
||
|
||
fn main() {
|
||
let list = vec![2,3,4];
|
||
let squares = square_list(&list);
|
||
println!("{:#?}", list); // list is no longer moved into square_list
|
||
println!("{:#?}", squares);
|
||
}
|
||
|
||
Instead of using &Vec<T>, we can use &[T], which is a shared slice, a
|
||
special form of reference (there is also a &mut [T] exclusive slice). A
|
||
slice has a type, a start of a memory region and a count of elements.
|
||
Hence, we can do the following (which also allows us to call square_list
|
||
with arrays of numbers):
|
||
|
||
fn square_slice(list: &[u8]) -> Vec<u8> {
|
||
let mut squares = vec![];
|
||
for item in list { // item: &u8
|
||
squares.push(item.pow(2))
|
||
}
|
||
squares
|
||
}
|
||
|
||
fn main() {
|
||
let list = vec![2,3,4];
|
||
let array = &[5,6,7];
|
||
let squares = square_slice(&list);
|
||
println!("{:#?}", list);
|
||
println!("{:#?}", squares);
|
||
let squares = square_slice(array);
|
||
println!("{:#?}", array);
|
||
println!("{:#?}", squares);
|
||
}
|
||
|
||
At this point, the users of our function complain about its performance.
|
||
When calling it with large quantities of data, the algorithms seems to
|
||
be slow. They also mention that when calling the function, they are only
|
||
interested in the result and do not care about the original list. We
|
||
look at the function and see that we create a new Vec to insert the data
|
||
instead of manipulating the existing data. So we decide to change the
|
||
function as follows:
|
||
|
||
fn square_slice(list: &mut [u8]) {
|
||
for item in list { // item: &mut u8
|
||
*item = item.pow(2);
|
||
}
|
||
}
|
||
|
||
fn main() {
|
||
let mut list = vec![2,3,4];
|
||
let array = &mut [5,6,7];
|
||
square_slice(&mut list);
|
||
println!("{:#?}", list);
|
||
square_slice(array);
|
||
println!("{:#?}", array);
|
||
}
|
||
|
||
Here, we used two additional pieces of syntax:
|
||
|
||
- with ., we can interact with the reference and Rust automatically
|
||
borrows/dereferences the data.
|
||
- with *, we explicitly dereference the mutable borrow so that we can
|
||
assign the value to the original memory location.
|
||
|
||
A Visual Overview of Ownership
|
||
|
||
The concept and syntax associated with Ownership is visualized in the
|
||
following diagram:
|
||
|
||
{{#include img/rust-move-copy-borrow.svg }}
|
||
|
||
Rufflewind, Graphical depiction of ownership and borrowing in Rust, CC
|
||
BY 4.0
|
||
|
||
Legend:
|
||
|
||
- Move
- Copy
- Locked: original object is locked while borrowed — nothing can be
  done with it.
- Frozen: original object is frozen: non-mutable references can be
  taken (but no mutable references and it cannot be moved).
|
||
|
||
Revisiting Memory Management Issues in Rust
|
||
|
||
Now that we are equipped with some knowledge about the Ownership Model
|
||
and the Borrow Checker, we can revisit the memory issues we identified
|
||
before. Note that most of these checks are executed at compile-time,
|
||
making sure you can never ship software with these issues.
|
||
|
||
Uninitialized Memory
|
||
|
||
In Rust, you are not allowed to read from a variable that has not been
|
||
initialized:
|
||
|
||
let v : u32;
|
||
println!("{}", v);
|
||
|
||
Hence before you read from a variable, you have to first assign it an
|
||
initial value. Some data types work in a way that they have a
|
||
well-defined initial (or default) value, in which case you are not
|
||
required to specify it. In summary, you can rely on the fact that you
|
||
are never accessing uninitialized memory, so this memory safety issue cannot occur.
|
||
|
||
Use After Free / Double Free
|
||
|
||
In Rust, if a variable is moved or dropped, the original variable
|
||
becomes no longer usable. The following example does not compile:
|
||
|
||
#[derive(Debug)]
|
||
struct Foo {
|
||
bar: u16,
|
||
}
|
||
|
||
let foo = Foo { bar: 5 };
|
||
println!("{:?}", foo);
|
||
drop(foo); // `foo` is freed
|
||
println!("{:?}", foo); // `foo` would be used after free
|
||
|
||
As the compiler states, the drop function takes its parameter by move,
|
||
so foo is no longer valid after the call to drop — use after free is
|
||
impossible. This also means that a second drop(foo) fails for the same
|
||
reason, hence a double free is impossible as well.
|
||
|
||
Buffer Over- or Underflow
|
||
|
||
In contrast to the other checks, this one is done at run-time —
|
||
particularly because the index into a buffer is most of the time dynamic
|
||
and not known at compile-time. As opposed to other languages, Rust adds
|
||
bound-checking code to all accesses of buffers. Depending on which
|
||
access method you use, the out-of-bounds could either trigger a panic!()
|
||
or yield an Option::None.
|
||
|
||
let v = vec![5, 7, 8];
|
||
let oob = v.get(4);
|
||
println!("{:#?}", oob);
|
||
println!("{}", v[4]);
|
||
|
||
Note that the runtime cost of this check is often negligible as branch
|
||
prediction of modern CPUs can often figure out whether the bounds check
|
||
succeeds or not.
|
||
|
||
Null References
|
||
|
||
In Rust, there is no such thing as a NULL constant that can be used.
|
||
Instead, nullable references are expressed as Option<&T>, which is None
if the reference is absent. Hence, a developer has to write code in a way that the
|
||
None case is handled. There is no way to, by accident, work with a
|
||
reference if there is none.
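
A small sketch (the lookup is only for illustration): instead of a
possibly-NULL pointer, we receive an Option<&T> and must handle the
None case explicitly:

let numbers = vec![1, 2, 3];
let maybe_ref: Option<&i32> = numbers.get(7); // index out of range -> None
match maybe_ref {
    Some(n) => println!("found {}", n),
    None => println!("nothing there"), // the compiler forces us to handle this
}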
|
||
|
||
Data Races in Concurrent Access
|
||
|
||
If we take the code from before, consider that we want to work on
|
||
mutable references in increment, and have a conversation with the
|
||
compiler (implementing suggested fixes iteratively), we eventually
|
||
arrive at this:
|
||
|
||
use std::thread;
|
||
|
||
fn increment(counter: &mut Counter) {
|
||
for _ in 0..10 {
|
||
counter.count += 1;
|
||
}
|
||
}
|
||
|
||
#[derive(Debug)]
|
||
struct Counter {
|
||
count: u32,
|
||
}
|
||
|
||
fn main() {
|
||
let mut counter = Counter { count: 42 };
|
||
let t1 = thread::spawn(move || increment(&mut counter));
|
||
let t2 = thread::spawn(move || increment(&mut counter));
|
||
t1.join().unwrap();
|
||
t2.join().unwrap();
|
||
println!("{:#?}", counter);
|
||
}
|
||
|
||
Here, we again see ownership at work: Rust mandates the move keyword to
|
||
bring the counter as a reference to the threads[8]. However, you cannot
move it into both threads (and still print it afterwards): the compiler
rejects the second move and thereby prevents two threads from mutating
the counter at the same time. Also note that the closure
|
||
move || ... &mut counter does not make much sense, as you do not have to
|
||
move something to get a mutable reference to it. How we can write a
|
||
thread-safe variant of this, which counts to 62 as expected, will be
|
||
discussed later in U11.
|
||
|
||
Other Resources than Memory
|
||
|
||
Note that the concept of ownership also helps with other resources that
|
||
are not memory: If a variable owns, for instance, a file or network
|
||
socket, the ownership enforces safe access and makes sure that the
|
||
resource is released on drop.
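
As a sketch (the file name is made up), owning a File means the
underlying OS handle is closed automatically when the owner is dropped:

use std::fs::File;

fn main() -> std::io::Result<()> {
    {
        let _file = File::create("scratch.txt")?; // _file owns the OS file handle
    } // _file goes out of scope here, so the handle is closed automatically
    Ok(())
}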
|
||
|
||
Summary
|
||
|
||
What did you learn?
|
||
|
||
- How Rust’s automated memory management can save you work and avoid
|
||
mistakes.
|
||
- What ownership means, how it is enforced and how your code is
|
||
affected by it.
|
||
- How the concept of ownership can contribute to software that is
|
||
clearer and easier to maintain.
|
||
|
||
Where can you learn more?
|
||
|
||
- Rust Book: Ch. 4.1 & 4.2
|
||
- Programming Rust: Ch. 4, 5
|
||
- Rust in Action: Ch. 4.3 & 4.4
|
||
- Rust for Rustaceans: Ch. 2
|
||
- cheats.rs: References & Pointers, Memory & Lifetimes
|
||
- Compile-Time Social Coordination (RustConf2021): “This is the story
|
||
of how [Zac] stopped stepping on everyone’s toes and learned to love
|
||
the borrow checker”
|
||
- RAII: Compile-Time Memory Management in C++ and Rust
|
||
- Memory Safety Project is about rewriting core Internet technology in
|
||
Rust
|
||
|
||
W03: Work Sheet
|
||
|
||
- Practice Ownership Rules using the Rustlings move_semantics and
|
||
primitive_types.
|
||
|
||
Closures
|
||
|
||
In the previous unit, you have already seen closures in action, often in
|
||
the form of helper functions:
|
||
|
||
let pow_of_2 = std::iter::successors(Some(1u8),
|
||
|n| n.checked_mul(2) // <--- closure
|
||
);
|
||
|
||
Closures are anonymous functions with a distinct type and potentially
|
||
state associated with them. They are commonly used in iterator methods
|
||
(see above), for threading (std::thread::spawn(|| ...)), or default
|
||
value methods:
|
||
|
||
use std::collections::HashMap;
|
||
|
||
let mut map = HashMap::new();
|
||
map.insert("Ferris", 42);
|
||
map.entry("Crab").or_insert_with(|| 47);
|
||
println!("{:#?}", map);
|
||
|
||
Save the Environment
|
||
|
||
Closures have a special power, namely that they are able to save their
|
||
environment[9]. Again, we already had an example for this in the
|
||
previous unit:
|
||
|
||
fn fib_iter(n: usize) -> impl Iterator<Item = u32> {
|
||
let mut state = (1,1);
|
||
std::iter::from_fn(move || {
|
||
let current = state.0;
|
||
state = (state.1, state.0 + state.1);
|
||
Some(current)
|
||
}).take(n)
|
||
}
|
||
|
||
fn main() {
|
||
for i in fib_iter(5) {
|
||
println!("{}", i);
|
||
}
|
||
}
|
||
|
||
Here, from_fn takes a closure. The closure steals the state variable,
|
||
which is for now stored next to it and updated whenever the closure’s
|
||
code is executed. For the iterator, every time next() is called, the
|
||
closure is executed. Note that we have to write move before the closure
|
||
to indicate that we want the closure to steal the environment. Without it,
|
||
the closure is only allowed to borrow its environment (i.e. get & and
|
||
&mut references to variables). In this case, it must be ensured that the
|
||
closure does not outlive the variables to which it holds references.
|
||
|
||
Closures also, practically, save the environment because they are fast
|
||
and safe to use. The compiler is allowed to inline them, often
achieving zero overhead.
|
||
|
||
Function and Closure Types
|
||
|
||
Every closure has a distinct type (i.e. two closures with identical
|
||
input-output types are still considered different). All closures
|
||
implement the FnOnce trait. For reference, all functions are of type
|
||
fn(??) -> ?? (lower case) and one can obtain a function pointer for
|
||
them.
|
||
|
||
As you might already anticipate, there are more traits a closure can
|
||
implement. First, let’s look at a closure that drops something it stole
|
||
from the environment:
|
||
|
||
let v : Vec<u32> = vec![];
|
||
let f = || drop(v);
|
||
|
||
This closure implements FnOnce because it can only be called once
|
||
(otherwise, it would cause a double-free error). Pseudocode for this
|
||
trait would look like this:
|
||
|
||
trait FnOnce() -> R {
|
||
fn call_once(self) -> R;
|
||
}
|
||
|
||
So self is moved and hence consumed. A different closure is one that
|
||
only modifies the environment:
|
||
|
||
let mut i = 0;
|
||
let mut incr = || {
|
||
i += 1;
|
||
println!("Incremented! i is now {}", i);
|
||
};
|
||
incr();
|
||
incr();
|
||
|
||
This closure implements FnMut, as it can mutate the environment. The
|
||
pseudocode looks like this:
|
||
|
||
trait FnMut() -> R {
|
||
fn call_mut(&mut self) -> R;
|
||
}
|
||
|
||
Finally, a closure that only reads from the environment is a Fn:
|
||
|
||
trait Fn() -> R {
|
||
    fn call(&self) -> R;
|
||
}
|
||
|
||
Here is a Venn Diagram of closure traits:
|
||
|
||
+-------------------------------------+
| FnOnce(), e.g. || drop(v)           |
| +---------------------------------+ |
| | FnMut(), e.g. |arg| v.push(arg) | |
| | +-----------------------------+ | |
| | | Fn (),                      | | |
| | | e.g. |arg| arg + 1          | | |
| | | or |arg| v.contains(arg)    | | |
| | +-----------------------------+ | |
| +---------------------------------+ |
+-------------------------------------+
|
||
|
||
What we can deduce from this is that it is possible to pass a Fn to a
|
||
function that takes a FnOnce, but the opposite does not work.
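
A minimal sketch of this relationship (the function names are made up):
a function bounded by FnOnce accepts any closure, including a plain Fn:

fn run_once<F: FnOnce() -> u32>(f: F) -> u32 {
    f()
}

fn main() {
    let base = 41;
    let read_only = || base + 1;          // only reads its environment: Fn
    println!("{}", run_once(read_only));  // accepted, every Fn is also FnOnce

    let v = vec![1, 2, 3];
    let consume = move || { drop(v); 0 }; // FnOnce only: it consumes v
    println!("{}", run_once(consume));
    // a function requiring Fn would reject `consume`
}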
|
||
|
||
Closures are like any other value, hence they can be assigned to
|
||
variables, as you have seen above. They can also be moved/copied or
|
||
cloned, depending on their type. If a closure only holds references and
|
||
does not mutate (Fn), it can be copied and cloned. If it mutates, it can
|
||
be neither Clone nor Copy, because we would then have multiple mutable
|
||
references and violate memory safety guarantees. For move closures, it
|
||
depends on the type of values that are moved into the closure. If they
|
||
are all clone or copy, the closure is clone or copy, respectively:
|
||
|
||
let mut greeting = String::from("Hello, ");
|
||
let greet = move |name| {
|
||
greeting.push_str(name);
|
||
println!("{}", greeting);
|
||
};
|
||
greet.clone()("Ferris");
|
||
greet.clone()("Hobbes");
|
||
|
||
Orthogonal to the traits a closure implements, the lifetime of the
|
||
closure is also part of its type. Hence, you can have 'static Fn (which
|
||
can be called everywhere as its lifetime is the whole program) or
|
||
'a FnOnce (which can only be called as long as 'a lives and only once)
|
||
as well as all other permutations of lifetimes and closure traits.
|
||
|
||
Collections
|
||
|
||
In this section, we take a closer look at three common collections that
|
||
help you work with multiple items at the same time.
|
||
|
||
Vector Vec<T>
|
||
|
||
This section is intentionally kept brief and you should read the
|
||
excellent chapter 8.1 of the Rust book if you have any doubts or want
|
||
a more in-depth introduction to vectors.
|
||
|
||
Our first type is the vector Vec<T>, which can be created and updated as
|
||
follows:
|
||
|
||
struct Point {
|
||
x: u8,
|
||
y: u8
|
||
}
|
||
|
||
let points: Vec<Point> = Vec::new();
|
||
|
||
let mut points: Vec<Point> = vec![Point { x: 0, y: 1 }, Point { x: 2, y: 3 }];
|
||
points.push(Point { x: 0, y: 0 });
|
||
|
||
A vector represents a continuous memory region on the heap, consisting
|
||
of elements of type T:
|
||
|
||
+---------+--------------+---------+
|
||
Stack: v = | buffer | capacity = 4 | len = 3 |
|
||
+----+----+--------------+---------+
|
||
|
|
||
V
|
||
Heap: +----+----+----+----+
|
||
| 27 | 31 | 42 | |
|
||
+----+----+----+----+
|
||
|
||
In contrast to LinkedList, vectors are known to be more efficient as
|
||
fewer pointers must be dereferenced and fewer random accesses happen.
|
||
When we access element with index i in the vector, we can use either
|
||
v[i] (which panics if the index is out of bounds) or the more robust
|
||
v.get(i) that returns an Option<T>, which is None if the index is out of
|
||
bounds. These index-based accesses are very efficient due to the fact
|
||
that the elements are stored contiguously. The same holds for iteration,
|
||
which can be easily done with for element in v.
|
||
|
||
The vector also supports adding elements at the end using
|
||
push(element: T) and removing elements from the end with
|
||
pop() -> Option<T>. Further, swap(a: usize, b: usize) is efficient as
|
||
the two memory regions can be moved. Note that, when you insert or
|
||
remove elements, the data structure can do “reallocations” (e.g. when
|
||
the capacity is reached and another element is added, or when we remove
|
||
from the front). Hence, it is good practice to:
|
||
|
||
- Not use this data structure if you often remove from the front. A
|
||
better choice would be VecDeque in this case.
|
||
- Use capacity information whenever it is available. For instance,
|
||
when you create a new vector to put a list of elements into it,
|
||
initialize it with Vec::with_capacity, avoiding reallocations.
|
||
|
||
Finally, a vector implements the following useful methods:
|
||
|
||
- with join(&self, sep: Separator), we can flatten the vector,
|
||
inserting a separator in between elements
|
||
(e.g. ["Hey", "Ferris"].join(" ") -> "Hey Ferris")
|
||
- we can sort and search a vector
|
||
- using the third party rand crate, it is easily possible to shuffle
|
||
or choose from a vector.
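
For example, sorting, searching, and joining could look like this (a
minimal sketch using only the standard library):

let mut v = vec![42, 7, 13];
v.sort();                                // [7, 13, 42]
assert_eq!(v.binary_search(&13), Ok(1)); // searching requires a sorted vector

let words = vec!["Hey", "Ferris"];
assert_eq!(words.join(" "), "Hey Ferris");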
|
||
|
||
In other languages, you might have already encountered iterator
|
||
invalidation errors (in Java this is known as the runtime
|
||
ConcurrentModificationException) which happen when you attempt to
|
||
manipulate an iterator while you iterate over it. Consider the following
|
||
attempt to extend a list of even numbers by the missing odd numbers:
|
||
|
||
let mut v = vec![0,2,4,6];
|
||
for element in v {
|
||
v.push(element + 1)
|
||
}
|
||
|
||
Note that in this situation we have undefined behaviour… how would you
|
||
handle adding the element to the list? Would it become part of the
|
||
iteration, which in this case would lead to an infinite loop? Or would
|
||
you keep the old iterator and the new elements separate?
|
||
|
||
Fortunately, Rust prevents this behaviour using its ownership system:
|
||
The code does not compile, which is advantageous over Java’s runtime
|
||
error. To understand this, let’s have a closer look at the expansion of
|
||
the for-loop:
|
||
|
||
let mut iterator = (v).into_iter();
|
||
while let Some(element) = iterator.next() {
|
||
v.push(element + 1)
|
||
}
|
||
|
||
We see two accesses to v with the following function signatures:
|
||
|
||
- fn into_iter(self) -> Self::IntoIter which takes v as self
|
||
- fn push(&mut self, value: T) which takes v as &mut self
|
||
|
||
Due to the move in into_iter, v can no longer be borrowed mutably for
|
||
push. rustc suggests borrowing v instead of moving it, which leads to the
|
||
following situation:
|
||
|
||
let mut iterator = (&v).into_iter();
|
||
while let Some(element) = iterator.next() {
|
||
v.push(element + 1)
|
||
}
|
||
|
||
Now we hold an immutable reference to v, which disallows getting a
|
||
mutable reference to v to execute push. So whatever we do, iterator
|
||
invalidation is not possible.
|
||
|
||
Dictionary HashMap<K,V>
|
||
|
||
This section is intentionally kept brief and you should read the
|
||
excellent chapter 8.3 of the Rust book if you have any doubts or want
|
||
a more in-depth introduction to hash maps.
|
||
|
||
For use cases where each element of type V has an associated key of type
|
||
K, we can employ a HashMap that acts as a lookup table or dictionary.
|
||
This data structure is particularly efficient when we want to look up a
|
||
value for a specific key.
|
||
|
||
In memory, a HashMap<i32, char> looks like this:
|
||
|
||
+---------+---------------+-------+
|
||
Stack: v = | len = 4 | table_size: 8 | table |
|
||
+---------+---------------+---+---+
|
||
|
|
||
+-------------------------+
|
||
V
|
||
Heap: +------+------+------+------+------+------+------+------+
|
||
Hash Code: | cafe | 0 | c0de | dead | 0 | 0 | 0 | 4b1d |
|
||
+------+------+------+------+------+------+------+------+
|
||
Key: | 7 | | -3 | 42 | | | | 28 |
|
||
| | | | | | | | |
|
||
Value: | H | | e | H | | | | o |
|
||
+------+------+------+------+------+------+------+------+
|
||
|
||
Similar to vectors, we can collect into HashMaps and add elements like
|
||
this:
|
||
|
||
use std::collections::HashMap;
|
||
|
||
let key_values = vec![(7, 'H'), (-3, 'e')];
|
||
let mut map : HashMap<_, _> = key_values.into_iter().collect();
|
||
map.insert(42, 'H');
|
||
println!("{:#?}", map);
|
||
|
||
Again, if we know how many elements we are going to have, initializing
|
||
with_capacity is more efficient.
|
||
|
||
What is special about HashMaps is how to get elements. While get is
|
||
implemented similar to Vec::get, the entry() API is more commonly used:
|
||
|
||
use std::collections::HashMap;
|
||
|
||
let mut letters = HashMap::new();
|
||
|
||
for ch in "a practical course for computer scientists".chars() {
|
||
let counter = letters.entry(ch).or_insert(0);
|
||
*counter += 1;
|
||
}
|
||
println!("{:#?}", letters);
|
||
|
||
Here, the entry() call returns either an Occupied or a Vacant variant. This
|
||
makes it very easy to initialize an entry with a default value,
|
||
e.g. using the or_insert(self, default: V) or or_default(self) methods
|
||
as shown above.
|
||
|
||
Finally, we can iterate over a HashMap, which gives us both keys and
|
||
values:
|
||
|
||
use std::collections::HashMap;
|
||
|
||
let key_values = vec![(7, 'H'), (-3, 'e')];
|
||
let mut map : HashMap<_, _> = key_values.into_iter().collect();
|
||
map.insert(42, 'H');
|
||
|
||
for (k,v) in map {
|
||
println!("K: {}, V: {}", k, v);
|
||
}
|
||
|
||
Set HashSet<T>
|
||
|
||
Finally, we look at HashSet, which is used for situations where you want
|
||
to have set semantics: any instance of type T can be in the set only
|
||
once. A major benefit of sets is their fast membership testing function
|
||
contains. A set can be used as follows:
|
||
|
||
use std::collections::HashSet;
|
||
|
||
let mut set : HashSet<_> = [4,5,4].into_iter().collect(); // duplicates are removed
|
||
set.insert(5);
|
||
set.insert(8);
|
||
println!("{:#?}", set);
|
||
set.extend(vec![7, 5, 3].into_iter());
|
||
println!("{:#?}", set);
|
||
|
||
Sets also support typical set operations such as intersection, union,
|
||
and difference, and we can also iterate over sets:
|
||
|
||
use std::collections::HashSet;
|
||
|
||
let set_a : HashSet<_> = [1,2,3].into_iter().collect();
let set_b : HashSet<_> = [2,3,4].into_iter().collect();
for i in set_a.intersection(&set_b) {
    print!("{} ", i);
}
println!("");

for i in set_a.union(&set_b) {
    print!("{} ", i);
}
println!("");

for i in set_a.difference(&set_b) {
    print!("{} ", i);
}
println!("");
|
||
|
||
BTrees
|
||
|
||
Finally, it should be noted that there are also collections that
|
||
leverage B-trees, namely BTreeMap and BTreeSet. While the Hash*<T>
|
||
variants require you to implement Hash for T, the BTree* variants
|
||
require the Ord trait. Depending on your use case and performance
|
||
considerations, one might be better suited than the other.
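
As a small sketch of the difference, a BTreeMap keeps its keys in
sorted order, which a HashMap does not guarantee:

use std::collections::BTreeMap;

let mut scores = BTreeMap::new();
scores.insert(42, 'H');
scores.insert(-3, 'e');
scores.insert(7, 'y');

for (k, v) in &scores {
    println!("{}: {}", k, v); // keys are visited in ascending order: -3, 7, 42
}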
|
||
|
||
Enumerations
|
||
|
||
While structures serve to group behaviour and data, this section covers
|
||
enumerations (also known as sum types, discriminated unions, or
|
||
algebraic data types), which group variants and behaviour. First, we
|
||
cover C-style enumerations that only cover variants, while later, we see
|
||
that Rust also allows variants to carry data.
|
||
|
||
This section is intentionally kept brief and you should read the
|
||
excellent 6th chapter of the Rust book if you have any doubts or want
|
||
a more in-depth introduction to enumerations.

C-Style Enumerations
|
||
|
||
Here is how you can define a simple enum:
|
||
|
||
enum Ordering {
|
||
Less,
|
||
Equal,
|
||
Greater,
|
||
}
|
||
|
||
In memory, these values are stored as integers. You can also pick
|
||
distinct values for them:
|
||
|
||
enum HttpStatus {
|
||
Ok = 200,
|
||
NotModified = 304,
|
||
NotFound = 404,
|
||
...
|
||
}
|
||
|
||
When you want to convert, you can use the as syntax:
|
||
|
||
assert_eq!(HttpStatus::NotFound as i32, 404);
|
||
|
||
The other direction, however, is not allowed easily, as you could
|
||
attempt to convert a number that has no matching enum variant. Instead,
|
||
you have to write your own checked conversion:
|
||
|
||
fn http_status_from_u32(n: u32) -> Option<HttpStatus> {
|
||
match n {
|
||
200 => Some(HttpStatus::Ok),
|
||
304 => Some(HttpStatus::NotModified),
|
||
404 => Some(HttpStatus::NotFound),
|
||
...
|
||
_ => None,
|
||
}
|
||
}
|
||
|
||
The enum_primitive crate provides similar functionality.
|
||
|
||
Similar to deriving traits for structs, you can also derive traits for
|
||
enums. Finally, you can also implement methods on enums as you will see
|
||
in the next section.
|
||
|
||
Enum Variants with Data
|
||
|
||
Adding data to enum variants can use tuples or structs (and even
|
||
arbitrary combinations of the two). Here is how to declare enum tuple
|
||
variants:
|
||
|
||
enum HttpMessage {
|
||
Empty(HttpStatus),
|
||
Content(HttpStatus, String)
|
||
}
|
||
|
||
Certain HTTP messages do not contain a body (e.g. Not Modified), while
|
||
others carry both a status and the content:
|
||
|
||
let awesome = HttpMessage::Content(HttpStatus::Ok, "Ferris is awesome!".to_string());
|
||
|
||
Here is how to declare structure variants; the major benefit being that
|
||
fields are named:
|
||
|
||
enum Shape {
|
||
Rectangle { width: u32, height: u32 },
|
||
Square { side_length: u32 },
|
||
}
|
||
|
||
Generic Enums
|
||
|
||
While you learn about generics in a later unit, assume for now that
|
||
generic enums can be defined once and are instantiated for different
|
||
types. You already met two of these:
|
||
|
||
enum Option<T> {
|
||
Some(T),
|
||
None,
|
||
}
|
||
|
||
enum Result<T, E> {
|
||
Ok(T),
|
||
Err(E),
|
||
}
|
||
|
||
These two types are common in the Rust standard library and are covered
|
||
in detail in a later unit.
|
||
|
||
Let’s define a generic list that can store any type T:
|
||
|
||
enum List<T> {
|
||
Empty,
|
||
NonEmpty(Box<ListNode<T>>),
|
||
}
|
||
|
||
struct ListNode<T> {
|
||
element: T,
|
||
next: List<T>,
|
||
}
|
||
|
||
Each list is either empty or non-empty. If it is non-empty, it contains
|
||
a heap-allocated ListNode. Each list node has an element of type T and a
|
||
next list. Here is how we build a list:
|
||
|
||
use self::List::*;
|
||
let cah = NonEmpty(Box::new(ListNode {
|
||
element: "Calvin & Hobbes",
|
||
next: Empty,
|
||
}));
|
||
let peanuts = NonEmpty(Box::new(ListNode {
|
||
element: "Peanuts",
|
||
next: cah,
|
||
}));
|
||
|
||
As soon as we know more about pattern matching, we learn how to create a
|
||
convenient add method.
|
||
|
||
Enums for Dependability
|
||
|
||
Enumerations support dependable code in at least two ways:
|
||
|
||
1. Misuse-resistant storing of data in related variants.
|
||
2. Misuse-resistant encoding of boolean values.
|
||
|
||
Store data where it belongs
|
||
|
||
By allowing to store data in an enum variant, we get the opportunity to
|
||
only store it where it is needed. Languages that do not provide enums
|
||
with data often resort to solutions that are not safe to use by a
|
||
developer. This safe solution:
|
||
|
||
enum Variants {
    First(bool),
    Second(i32),
}
|
||
|
||
is then replaced with an easy-to-misuse solution:
|
||
|
||
enum Variant {
    First,
    Second,
}

struct Variants {
    variant: Variant,
    first_boolean: bool,
    second_i32: i32,
}
|
||
|
||
In this solution, the variant is decoupled from the data that is stored
|
||
inside, leading potentially to invalid accesses (variant is First, but
|
||
access second_i32).
|
||
|
||
Boolean values revisited
|
||
|
||
Another use case of enums are boolean values. In languages where enums
|
||
are not commonplace, you often run into the following issue. Assume your
|
||
hardware access library has the following method defined:
|
||
|
||
fn configure_pin(is_disabled: bool, is_output: bool);
|
||
|
||
Assume it is used here:
|
||
|
||
configure_pin(false, false)
|
||
|
||
Now, as a developer, it is your job to quickly and faithfully state if
the pin is enabled and an output pin. As you might realize, you easily
get confused with the negations (enabled = (is_disabled == false)).
Often, people argue that this is the only way to do it for efficiency
reasons (i.e. bools are cheaper to store than other types). On most
systems, this is nonsense, as booleans are put into the smallest unit of
memory, which is often a byte. Hence, we can afford to replace bool
with expressive enums:
|
||
|
||
enum Status {
|
||
Enabled,
|
||
Disabled,
|
||
}
|
||
|
||
enum Mode {
|
||
Output,
|
||
Input,
|
||
}
|
||
|
||
fn configure_pin(status: Status, mode: Mode);
|
||
|
||
The equivalent usage to the statement above then reads like:
|
||
|
||
configure_pin(Status::Enabled, Mode::Input);
|
||
|
||
making it crystal clear what the developer intended — without
|
||
compromising on efficiency (enum size is still a byte as this is enough
|
||
to express two variants).
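
If you want to verify the size claim yourself, std::mem::size_of is your
friend. A minimal sketch, assuming the Status and Mode enums from above
(on current rustc, a fieldless two-variant enum occupies a single byte,
just like bool):

fn main() {
    assert_eq!(std::mem::size_of::<Status>(), 1);
    assert_eq!(std::mem::size_of::<Mode>(), 1);
    assert_eq!(std::mem::size_of::<bool>(), 1);
    println!("No size penalty compared to bool.");
}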
|
||
|
||
Iterators
|
||
|
||
Ever since the creation of LISP (short for LISt Processor), developers
|
||
have been concerned with effective ways to work on lists of things.
|
||
Nowadays, we often talk about streams or iterators, which are a
|
||
generalization of lists; an iterator produces elements until it is
|
||
exhausted. A list could be the source of an iterator (we iterate over a
|
||
list) or the target of an iterator (we collect an iterator into a list).
|
||
|
||
In general, iterator pipelines have the following shape:
|
||
|
||
|
||
++=========++      +-------+      ++=========++
|| Produce ||----->| Adapt |----->|| Consume ||
++=========++      +-------+      ++=========++
|
||
|
||
First, items of an iterator are produced (e.g., using a range or
|
||
collection). Afterwards, they might be adapted through one or more steps
|
||
(e.g., filtered, mapped, …). Eventually, they must be consumed (i.e.,
|
||
touching each item or storing it into a value).
|
||
|
||
The last step is extremely important, as iterators in Rust are lazy.
|
||
This means that without a consuming step, no item will ever be produced
|
||
or adapted. Instead, the consumer drives the iterator, by attempting to
|
||
consume item after item from the previous step, which in turn consumes
|
||
its previous step and so on.
|
||
|
||
Iterator Trait
|
||
|
||
Before we look at different ways to use producers, adapters, and
|
||
consumers, we look at the general form an iterator has, which is defined
|
||
by the Iterator trait in the standard library:
|
||
|
||
trait Iterator {
|
||
type Item;
|
||
fn next(&mut self) -> Option<Self::Item>;
|
||
}
|
||
|
||
This tells us that each iterator has a unique Item type, specifying
|
||
which kind of items it produces. next is the function that is called to
|
||
get another item from the iterator. As it is an Option, we can either
|
||
have Some(Item) or None. In the latter case, the iterator is considered
|
||
as consumed or depleted, i.e. it is not yielding more items.
|
||
|
||
A Minimal Pipeline
|
||
|
||
The most common and straightforward producer is an inclusive range,
written as (1..=n) (if you leave out the =, the range becomes exclusive
and n is left out). A common way to consume it is a for loop, which is
designed for exactly this use case:
|
||
|
||
let r = (0..=5);
|
||
for element in r {
|
||
println!("{}", element);
|
||
}
|
||
|
||
The for loop is shorthand for directly accessing the iterator’s next
|
||
method like this:
|
||
|
||
let r = (0..=5);
|
||
let mut iterator = (r).into_iter();
|
||
while let Some(element) = iterator.next() {
|
||
println!("{}", element);
|
||
}
|
||
|
||
Producers
|
||
|
||
Let’s have a look at how we can produce an iterator in the first place.
|
||
A general form is the std::iter::from_fn function, where the closure we
|
||
pass to the function produces one item after the other:
|
||
|
||
fn fib_iter(n: usize) -> impl Iterator<Item = u32> {
|
||
let mut state = (1,1);
|
||
std::iter::from_fn(move || {
|
||
let current = state.0;
|
||
state = (state.1, state.0 + state.1);
|
||
Some(current)
|
||
}).take(n)
|
||
}
|
||
|
||
fn main() {
|
||
for i in fib_iter(5) {
|
||
println!("{}", i);
|
||
}
|
||
}
|
||
|
||
That is quite a lot in one take, so let’s walk through it. First, we
|
||
encapsulate the Fibonacci iterator into a function. The function returns
|
||
impl Iterator<Item = u32>, an existential type. You can think of this as
|
||
it returns something that is an iterator and produces u32s. The compiler
|
||
figures out which type it has exactly (actually the Take type).
|
||
|
||
Now let’s have a look at the function body. We start with a state that
|
||
captures the current pair of Fibonacci numbers (we always need the
|
||
current and the last to compute the next). With move we move the state
|
||
into the closure (more on closures in the next section; for now this is
|
||
just a function with state). The closure itself is then straightforward
|
||
if you know how to compute Fibonacci. We take the current number,
|
||
produce the next pair of numbers and return it. We have to wrap the
|
||
value in Some(), as the closure must return an Option. If we were
|
||
returning None in one step, the iteration would end.
|
||
|
||
Wait a minute… so we never return None and the iterator never ends? This
|
||
is correct, we produced an infinite iterator here (which makes sense as
|
||
the Fibonacci sequence is infinite too). In the next step, we use the
|
||
.take(n) adapter to reduce the sequence to the first n elements.
|
||
|
||
Rust also provides us with common iterators:
|
||
|
||
let once = std::iter::once(42);
|
||
for item in once {
|
||
println!("{}", item);
|
||
}
|
||
|
||
let repeat = std::iter::repeat(5).take(5);
|
||
for item in repeat {
|
||
println!("{}", item);
|
||
}
|
||
|
||
So wrapping a single value in a 1-element iterator or repeating it
|
||
infinitely work right away.
|
||
|
||
Another way to create a sequence (those that only depend on one last
|
||
item) is the std::iter::successors method. Here is how we generate
|
||
powers of two:
|
||
|
||
let pow_of_2 = std::iter::successors(Some(1u8), |n| n.checked_mul(2));
|
||
for item in pow_of_2 {
|
||
println!("{}", item);
|
||
}
|
||
|
||
Note that we do not have to take this apparently infinite iterator. The
|
||
reason is that checked_mul returns None when the type (u8 in this case)
|
||
would overflow.
|
||
|
||
Finally, the Result and Option types can also act as producers for
iterators. For Option, the Some variant behaves like a 1-element
iterator, while the None variant is an empty iterator. For Result, there
are methods to iterate over either the success value or the error value,
so we can write different code for the two cases.
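
Here is a small sketch of this behaviour (the values are arbitrary):

fn main() {
    let some_value: Option<u32> = Some(5);
    let no_value: Option<u32> = None;

    // Some acts as a 1-element iterator, None as an empty one.
    assert_eq!(some_value.into_iter().count(), 1);
    assert_eq!(no_value.into_iter().count(), 0);

    // For Result, iter() visits the success value (if any) ...
    let ok: Result<u32, String> = Ok(3);
    let err: Result<u32, String> = Err("boom".to_string());
    assert_eq!(ok.iter().count(), 1);
    // ... while an Err contributes nothing on the success side.
    assert_eq!(err.iter().count(), 0);
}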
|
||
|
||
Adapters
|
||
|
||
You already saw the take adapter for taking a number of elements from
|
||
the iterator. This is often used together with the skip() operator that
|
||
leaves some elements out before we take some:
|
||
|
||
let sequence = std::iter::successors(Some(1u8), |n| n.checked_mul(2))
|
||
.skip(2)
|
||
.take(3);
|
||
for item in sequence {
|
||
println!("{}", item);
|
||
}
|
||
|
||
Another common use case is to map each element to something else:
|
||
|
||
let pow_of_2 = (2..5).map(|n| 2_i32.pow(n));
|
||
for item in pow_of_2 {
|
||
println!("{}", item);
|
||
}
|
||
|
||
We can also leave out elements we are not interested in:
|
||
|
||
let odd_numbers = (0..10).filter(|n| n % 2 == 1);
|
||
for item in odd_numbers {
|
||
println!("{}", item);
|
||
}
|
||
|
||
This can also be combined into a single adapter:
|
||
|
||
let odd_squares = (0..10).filter_map(|n|
|
||
if n % 2 == 1 {
|
||
Some(n * n)
|
||
} else {
|
||
None
|
||
});
|
||
for item in odd_squares {
|
||
println!("{}", item);
|
||
}
|
||
|
||
Sometimes, we have iterators of iterators and want to turn this into a
|
||
flat sequence:
|
||
|
||
use std::collections::BTreeMap;
|
||
|
||
let mut comics = BTreeMap::new();
|
||
comics.insert("Peanuts", vec!["Charlie", "Linus", "Lucy", "Snoopy"]);
|
||
comics.insert("Calvin & Hobbes", vec!["Calvin", "Hobbes", "Susie"]);
|
||
|
||
for character in comics.values().flatten() {
|
||
println!("{}", character);
|
||
}
|
||
|
||
When developing an iterator pipeline, it can be helpful to inspect a
|
||
pipeline by looking at each item immutably, e.g., to print it:
|
||
|
||
use std::collections::BTreeMap;
|
||
|
||
let mut comics = BTreeMap::new();
|
||
comics.insert("Peanuts", vec!["Charlie", "Linus", "Lucy", "Snoopy"]);
|
||
comics.insert("Calvin & Hobbes", vec!["Calvin", "Hobbes", "Susie"]);
|
||
let all_characters : Vec<_> = comics
|
||
.values()
|
||
.inspect(|value| { println!("Before {:?}", value); })
|
||
.flatten()
|
||
.inspect(|value| { println!("After: {}", value); })
|
||
.collect();
|
||
println!("All: {:?}", all_characters);
|
||
|
||
Multiple iterators can also be chain-ed together:
|
||
|
||
let range = (0..5).chain((7..14));
|
||
for item in range {
|
||
println!("{}", item);
|
||
}
|
||
|
||
In some situations, we are not only interested in the element, but also
|
||
the index of the element in the iterator:
|
||
|
||
for (i, item) in (5..10).enumerate() {
|
||
println!("{}th: {}", i, item);
|
||
}
|
||
|
||
Consumers
|
||
|
||
Eventually, when we have produced and adapted our iterators, we need to
|
||
consume them. You already saw for, but note that there are actually
|
||
three variants of it:
|
||
|
||
- for element in &collection { ... }: items are taken as shared
|
||
references
|
||
- for element in &mut collection { ... }: items are taken as mutable
|
||
references
|
||
- for element in collection { ... }: items are moved out of the
|
||
collection (which gets invalidated afterwards)
|
||
|
||
Often, we are also interested in accumulating the collection using
|
||
count, sum, or product:
|
||
|
||
fn triangle(n: u64) -> u64 {
|
||
(1..=n).sum()
|
||
}
|
||
|
||
fn factorial(n: u64) -> u64 {
|
||
(1..=n).product()
|
||
}
|
||
|
||
fn main() {
|
||
let n = 5;
|
||
println!("Triangle {}: {}", n, triangle(n));
|
||
println!("Factorial {}: {}", n, factorial(n));
|
||
}
|
||
|
||
We can also identify the largest or smallest element:
|
||
|
||
println!("Max: {:?}", [-7, 5, 0, 28, -2].iter().max());
|
||
println!("Min: {:?}", [-7, 5, 0, 28, -2].iter().min());
|
||
|
||
Another common use case is fold, where we accumulate the elements using
|
||
a custom initial value and accumulation function:
|
||
|
||
let a = [1, 2, 3, 4, 5];
|
||
println!("Sum: {}", a.iter().fold(0, |n, i| n + i));
|
||
println!("Product: {}", a.iter().fold(1, |n, i| n * i));
|
||
|
||
Finally, we get to the most powerful consumer function: collect. With
|
||
collect, we can turn an iterator into a collection. Above, you already
|
||
saw how we collected the characters into a Vec. We can also collect into
|
||
HashMaps:
|
||
|
||
use std::collections::HashMap;
|
||
|
||
let comics = ["Peanuts", "Calvin and Hobbes"];
|
||
let start_dates = [1950, 1985];
|
||
let start_dates = comics
|
||
.iter()
|
||
.zip(start_dates.iter())
|
||
.collect::<HashMap<_,_>>();
|
||
println!("{:?}", start_dates);
|
||
|
||
As collect can convert into many different collections, you often either
need to annotate the let declaration with a type or use the turbofish
::<> operator. The _ lets type inference fill in the details, as the
Rust compiler can figure out the key and value types from the rest of
the code.
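
For completeness, here is the same collect written with an annotated let
instead of the turbofish; both forms are equivalent (this is just a
sketch of the alternative notation):

use std::collections::HashMap;

let comics = ["Peanuts", "Calvin and Hobbes"];
let years = [1950, 1985];
let start_dates: HashMap<_, _> = comics.iter().zip(years.iter()).collect();
println!("{:?}", start_dates);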
|
||
|
||
When working with Result<_>, collect is also handy as it can turn an
|
||
iterator of results into a result of a collection or the first error
|
||
that occurred:
|
||
|
||
fn open_file(path: String) -> Result<File, IoError> { /* ... */ }

fn open_files(paths: Vec<String>) -> Result<Vec<File>, IoError> {
    paths.into_iter()                // Iterator<Item = String>
        .map(|path| open_file(path)) // Iterator<Item = Result<File, IoError>>
        .collect()                   // Result<Vec<File>, IoError>
}
|
||
|
||
Custom Iterator
|
||
|
||
Before we close this section, we want to implement a custom iterator by
|
||
hand. Following the idea of the std::iter::once iterator, we create the
|
||
extremely helpful Twice iterator:
|
||
|
||
struct Twice {
|
||
count: u32,
|
||
element: u32,
|
||
}
|
||
|
||
|
||
fn twice(element: u32) -> Twice {
|
||
Twice {
|
||
count: 0,
|
||
element,
|
||
}
|
||
}
|
||
|
||
impl Iterator for Twice {
|
||
type Item = u32;
|
||
fn next(&mut self) -> Option<u32> {
|
||
if self.count >= 2 {
|
||
None
|
||
} else {
|
||
self.count += 1;
|
||
Some(self.element)
|
||
}
|
||
}
|
||
}
|
||
|
||
fn main() {
|
||
let t = twice(5);
|
||
let c = t.collect::<Vec<_>>();
|
||
println!("{:?}", c);
|
||
assert_eq!(c, vec![5,5]);
|
||
}
|
||
|
||
U04: Putting Data Together… and Apart
|
||
|
||
Now that you know the fundamentals of Rust, we learn how we can use
|
||
parts of the Rust standard library and language to build more advanced
|
||
programs that process data, i.e. compute in memory (as opposed to
|
||
interacting with the network or operating system). This includes:
|
||
|
||
- Structures as well as Enumerations to put related data together,
|
||
including behaviour (with methods).
|
||
- Deconstructing this related data again using Patterns.
|
||
- Leveraging Iterators that allow you to work with Collections of
|
||
data.
|
||
- Closures that act as callable inputs to functions or to be stored
|
||
inside structures.
|
||
- Finally, Strings deserve a special mention as a collection for
|
||
characters, including the intricacies of human writing systems.
|
||
|
||
Patterns
|
||
|
||
While Rust offers structs and enums to group together data, it also
|
||
provides means to destructure / decompose the same: patterns.
|
||
|
||
This section is intentionally kept brief and you should read the
|
||
excellent 6th and 18th chapter of the Rust book if you have any doubts
|
||
or want a more in-depth introduction to patterns.
|
||
|
||
Using a match statement, we can for instance implement useful methods on
|
||
the HttpStatus enumeration:
|
||
|
||
impl HttpStatus {
|
||
fn message(self) -> &'static str {
|
||
match self {
|
||
Self::Ok => "200: Ok",
|
||
Self::NotModified => "304: Not Modified",
|
||
Self::NotFound => "404: Not Found",
|
||
...
|
||
}
|
||
}
|
||
}
|
||
|
||
This is also the case for patterns that contain data:
|
||
|
||
enum List<T> {
|
||
Empty,
|
||
NonEmpty(Box<ListNode<T>>),
|
||
}
|
||
|
||
impl<T> List<T> {
|
||
fn head(self) -> Option<T> {
|
||
match self {
|
||
List::Empty => None,
|
||
List::NonEmpty(node) => {
|
||
Some(node.element)
|
||
}
|
||
}
|
||
}
|
||
}
|
||
|
||
Let’s have a look at how this matching is done by executing this piece
|
||
of code:
|
||
|
||
let mut list = List::Empty;
|
||
list.add(5);
|
||
list.add(7);
|
||
assert_eq!(list.head(), Some(5));
|
||
|
||
When we run head(), self is passed into the match statement
|
||
pattern-by-pattern from top to bottom:
|
||
|
||
value:   List::NonEmpty(ListNode { element: 5, next: ... })
                |
                X
                |
pattern: List::Empty
|
||
|
||
Hence, the first pattern is not matched and we continue with the next:
|
||
|
||
value:   List::NonEmpty( ListNode { element: 5, next: ... } )
              |                |
              OK               |
              |                V
pattern: List::NonEmpty(      node                          )
|
||
|
||
This matches with node = ListNode { element: 5, next: ... } and the
|
||
method returns Some(5).
|
||
|
||
Pattern Types
|
||
|
||
In Rust, patterns are very powerful and they can match on a lot of
|
||
different things:
|
||
|
||
- Literals (e.g. 1 or "foo")
|
||
|
||
- Ranges (e.g. 0..=42)
|
||
|
||
- Wildcard, i.e. anything (_)
|
||
|
||
- Variables, i.e. the value that matches is assigned to a local
|
||
variable (name, mut count)
|
||
|
||
- Enum variants (as seen above)
|
||
|
||
- Tuples (e.g. (key, value))
|
||
|
||
In the following, we give a couple of examples.
|
||
|
||
Literal and Variable Matching
|
||
|
||
Here is for instance a modified conversion method of the http_status
|
||
conversion method:
|
||
|
||
fn http_status_from_u32(n: u32) -> Result<HttpStatus, ParseError> {
|
||
match n {
|
||
200 => Ok(HttpStatus::Ok),
|
||
304 => Ok(HttpStatus::NotModified),
|
||
404 => Ok(HttpStatus::NotFound),
|
||
code => Err(ParseError(format!("Invalid code {}", code))),
|
||
}
|
||
}
|
||
|
||
Here, any code that is not matched by the initial literals is assigned
|
||
to code and used to create the Err variant of the Result return type.
|
||
|
||
Struct Matching
|
||
|
||
Consider the List<T> type we defined in the last section. Using struct
|
||
matching, we can implement the add method:
|
||
|
||
impl<T> List<T> {
|
||
fn add(&mut self, value: T) {
|
||
match *self {
|
||
List::Empty => {
|
||
*self = List::NonEmpty(Box::new(ListNode {
|
||
element: value,
|
||
next: List::Empty,
|
||
}))
|
||
}
|
||
List::NonEmpty(ref mut node) => {
|
||
node.next.add(value);
|
||
}
|
||
}
|
||
}
|
||
}
|
||
|
||
Using ref mut, we borrow node mutably, so that we can add the value to
|
||
it (or recurse again to eventually add it to the last element).
|
||
|
||
Matching Multiple Options
|
||
|
||
Furthermore, we can combine multiple matches into one, e.g., for another
|
||
version of FizzBuzz:
|
||
|
||
fn fizzbuzz(n: u32) -> String {
    match n % 15 {
        0 => format!("FizzBuzz"),
        3 | 6 | 9 | 12 => format!("Fizz"),
        5 | 10 => format!("Buzz"),
        _ => format!("{}", n),
    }
}
|
||
|
||
The | acts as an or so any of the options lead to a match of the
|
||
respective arm.
|
||
|
||
Dependable Patterns
|
||
|
||
With pattern matching, multiple things can go wrong. If you know switch
|
||
statements from other languages, you know that in most cases, you have
|
||
to put a break; at the end of a case:
|
||
|
||
case 3:
case 6: // <- 3 and 6 are used together
    result = "Fizz";
    break;
case 10:
    result = "Buzz";
case 0:
    result = "FizzBuzz";
    break;
|
||
|
||
This code contains an error. Namely, case 10: leads to
|
||
result = "FizzBuzz" as break is missing. In Rust, this cannot happen and
|
||
any match arm is clearly mapped to a single expression and
|
||
multi-matchings are done with |.
|
||
|
||
Another aspect are two properties match statements can have: they can be
|
||
exhaustive and/or overlapping.
|
||
|
||
The first property, exhaustiveness is checked by the compiler. You can
|
||
validate this by running the following example:
|
||
|
||
enum Variants {
    FirstHandled,
    Second,
}

impl Variants {
    fn foo(self) -> String {
        match self {
            Self::FirstHandled => format!("foo"),
        }
    }
}
|
||
|
||
As you can see, the Rust compiler rejects this code with an error.
|
||
|
||
For the second property overlap, there is also a check:
|
||
|
||
fn foo(n: u32) -> String {
|
||
match n {
|
||
0..=9 => "Below 10".to_string(),
|
||
0..=19 => "Below 20".to_string(),
|
||
n => format!("{} is nothing special", n),
|
||
}
|
||
}
|
||
|
||
fn main() {
|
||
println!("{}", foo(42));
|
||
}
|
||
|
||
Note that the code here in the book does not present you with warnings.
|
||
Here is what you get when you copy the code into a file (e.g.,
|
||
overlap.rs) and run it with cargo clippy:
|
||
|
||
❯ cargo clippy
warning: some ranges overlap
 --> src/overlap.rs:3:9
  |
3 |         0..=9 => "Below 10".to_string(),
  |         ^^^^^
  |
  = note: `#[warn(clippy::match_overlapping_arm)]` on by default
note: overlaps with this
 --> src/overlap.rs:4:9
  |
4 |         0..=19 => "Below 20".to_string(),
  |         ^^^^^^
  = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#match_overlapping_arm
|
||
|
||
A special case of overlapping is unreachable, where a pattern cannot be
|
||
reached because a previous pattern was already covering all cases. Here,
|
||
the compiler (not clippy) warns in a way similar to other forms of
|
||
unreachable code.
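
As a small sketch of such an unreachable arm (made up for illustration):
because the wildcard comes first, the second arm can never match, and the
compiler emits an "unreachable pattern" warning:

fn describe(n: u32) -> String {
    match n {
        _ => "anything".to_string(),
        0 => "zero".to_string(), // warning: unreachable pattern
    }
}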
|
||
|
||
In summary, Rust ensures that your patterns are exhaustive and warns you
|
||
if you made them overlapping by accident.
|
||
|
||
S04: Sample Solution
|
||
|
||
Lists
|
||
|
||
#[derive(PartialEq, Debug)]
|
||
enum List<T> {
|
||
Empty,
|
||
NonEmpty(Box<ListNode<T>>),
|
||
}
|
||
|
||
#[derive(PartialEq, Debug)]
|
||
struct ListNode<T> {
|
||
element: T,
|
||
next: List<T>,
|
||
}
|
||
|
||
impl<T> List<T>
|
||
where
|
||
T: Copy,
|
||
{
|
||
fn add(&mut self, value: T) {
|
||
match *self {
|
||
List::Empty => {
|
||
*self = List::NonEmpty(Box::new(ListNode {
|
||
element: value,
|
||
next: List::Empty,
|
||
}))
|
||
}
|
||
List::NonEmpty(ref mut node) => {
|
||
node.next.add(value);
|
||
}
|
||
}
|
||
}
|
||
|
||
fn length(self) -> usize {
|
||
match self {
|
||
List::Empty => 0,
|
||
List::NonEmpty(node) => 1 + node.next.length(),
|
||
}
|
||
}
|
||
|
||
fn head(self) -> Option<T> {
|
||
match self {
|
||
List::Empty => None,
|
||
List::NonEmpty(node) => Some(node.element),
|
||
}
|
||
}
|
||
|
||
fn tail(self) -> List<T> {
|
||
match self {
|
||
List::Empty => List::Empty,
|
||
List::NonEmpty(node) => node.next,
|
||
}
|
||
}
|
||
|
||
fn get(&self, index: usize) -> Option<T> {
|
||
match index {
|
||
0 => match self {
|
||
List::Empty => None,
|
||
List::NonEmpty(node) => Some(node.element),
|
||
},
|
||
_ => match self {
|
||
List::Empty => None,
|
||
List::NonEmpty(node) => node.next.get(index - 1),
|
||
},
|
||
}
|
||
}
|
||
}
|
||
|
||
fn main() {
|
||
let mut list = List::Empty;
|
||
list.add(5);
|
||
list.add(7);
|
||
assert_eq!(list.head(), Some(5));
|
||
}
|
||
|
||
#[cfg(test)]
|
||
mod tests {
|
||
use super::*;
|
||
|
||
#[test]
|
||
fn test_head() {
|
||
let mut list = List::Empty;
|
||
list.add(5);
|
||
list.add(7);
|
||
assert_eq!(list.head(), Some(5));
|
||
}
|
||
|
||
#[test]
|
||
fn test_tail() {
|
||
let mut list = List::Empty;
|
||
list.add(5);
|
||
list.add(7);
|
||
|
||
let mut list_2 = List::Empty;
|
||
list_2.add(7);
|
||
assert_eq!(list.tail(), list_2);
|
||
}
|
||
|
||
#[test]
|
||
fn test_length() {
|
||
let mut list = List::Empty;
|
||
list.add(5);
|
||
list.add(7);
|
||
assert_eq!(list.length(), 2);
|
||
}
|
||
|
||
#[test]
|
||
fn test_get() {
|
||
let mut list = List::Empty;
|
||
list.add(5);
|
||
list.add(7);
|
||
assert_eq!(list.get(0), Some(5));
|
||
assert_eq!(list.get(1), Some(7));
|
||
assert_eq!(list.get(2), None);
|
||
}
|
||
}
|
||
|
||
Shape Library
|
||
|
||
use std::f64::consts::PI;
|
||
|
||
enum Shape {
|
||
Rectangle { width: u32, height: u32 },
|
||
Square { side_length: u32 },
|
||
Circle { radius: u32 },
|
||
}
|
||
|
||
impl Shape {
|
||
fn area(self) -> f64 {
|
||
match self {
|
||
Shape::Rectangle { width, height } => (width * height).into(),
|
||
Shape::Square { side_length } => side_length.pow(2).into(),
|
||
Shape::Circle { radius } => PI * (radius.pow(2) as f64),
|
||
}
|
||
}
|
||
|
||
fn circumference(self) -> f64 {
|
||
match self {
|
||
Shape::Rectangle { width, height } => (2 * width + 2 * height).into(),
|
||
Shape::Square { side_length } => (4 * side_length).into(),
|
||
Shape::Circle { radius } => 2.0 * PI * (radius as f64),
|
||
}
|
||
}
|
||
}
|
||
|
||
fn main() {}
|
||
|
||
#[cfg(test)]
|
||
mod tests {
|
||
use super::*;
|
||
|
||
#[test]
|
||
fn test_area_rectangle() {
|
||
let rect = Shape::Rectangle {
|
||
width: 4,
|
||
height: 3,
|
||
};
|
||
assert_eq!(rect.area(), 12.0);
|
||
}
|
||
|
||
#[test]
|
||
fn test_area_square() {
|
||
let square = Shape::Square { side_length: 5 };
|
||
assert_eq!(square.area(), 25.0);
|
||
}
|
||
|
||
#[test]
|
||
fn test_area_circle() {
|
||
let circle = Shape::Circle { radius: 3 };
|
||
assert_eq!(circle.area(), PI * 9.0);
|
||
}
|
||
|
||
#[test]
|
||
fn test_circumference_rectangle() {
|
||
let rect = Shape::Rectangle {
|
||
width: 4,
|
||
height: 3,
|
||
};
|
||
assert_eq!(rect.circumference(), 14.0);
|
||
}
|
||
|
||
#[test]
|
||
fn test_circumference_square() {
|
||
let square = Shape::Square { side_length: 5 };
|
||
assert_eq!(square.circumference(), 20.0);
|
||
}
|
||
|
||
#[test]
|
||
fn test_circumference_circle() {
|
||
let circle = Shape::Circle { radius: 3 };
|
||
assert_eq!(circle.circumference(), PI * 6.0);
|
||
}
|
||
}
|
||
|
||
Iterative FizzBuzz
|
||
|
||
fn fizz_iter(n: usize) -> impl Iterator<Item = String> {
    let mut state = 0_usize;
    std::iter::from_fn(move || {
        let msg = match state % 15 {
            0 => format!("FizzBuzz"),
            3 | 6 | 9 | 12 => format!("Fizz"),
            5 | 10 => format!("Buzz"),
            _ => format!("{}", state),
        };
        state += 1;
        Some(msg)
    })
    .take(n)
}
|
||
|
||
fn main() {
|
||
for i in fizz_iter(11) {
|
||
println!("{}", i);
|
||
}
|
||
}
|
||
|
||
Word Count
|
||
|
||
use std::collections::HashMap;
|
||
use std::env;
|
||
use std::fs;
|
||
|
||
fn main() {
|
||
let args: Vec<String> = env::args().collect();
|
||
let filename = &args[1];
|
||
let file_content = fs::read_to_string(filename).expect("Something went wrong reading the file");
|
||
|
||
let mut words = HashMap::new();
|
||
for word in file_content
|
||
.lines()
|
||
.flat_map(|line| line.split_whitespace())
|
||
.collect::<Vec<_>>()
|
||
{
|
||
let counter = words.entry(word).or_insert(0);
|
||
*counter += 1;
|
||
}
|
||
println!("{:#?}", words)
|
||
}
|
||
|
||
Closure Types
|
||
|
||
- closure0: Function Pointer fn(u16) -> u16, implements FnOnce, FnMut,
|
||
Fn
|
||
- closure1: No pointer, implements Fn, FnMut, FnOnce
|
||
- closure2: No pointer, implements FnOnce
|
||
- closure3: No pointer, implements FnMut, FnOnce
|
||
|
||
Strings
|
||
|
||
Strings are complicated!
|
||
|
||
When working with collections of characters, we encounter all the
|
||
different issues that we have with written human language (e.g. what is
|
||
a character symbol, how many are there of it, do we read from
|
||
left-to-right or the other way round, …).
|
||
|
||
This section is intentionally kept brief and you should read the
|
||
excellent chapter 8.2 of the Rust book if you have any doubts or want
|
||
a more in-depth introduction to strings.
|
||
|
||
Unicode first
|
||
|
||
In contrast to other, older languages, Rust has been able to leverage
|
||
Unicode quite from the start (others needed major updates to enable
|
||
Unicode in all places). The topic itself is so complicated that there
|
||
are dedicated books to this, so we only provide a short overview here.
|
||
|
||
One of the first and still existing standardized character encoding
|
||
approaches is the American Standard Code for Information Interchange
|
||
(ASCII). ASCII uses seven bits, giving meaning to the values 0x00 to
|
||
0x7f. ISO/IEC 8859-1 is the Western European superset of ASCII which
|
||
uses 8 bits (0x00 to 0xff) to also encode characters such as ö, ç or ø.
|
||
In Unicode, this is called the Latin-1 code block. In Rust, the String
|
||
and str types use the UTF-8 encoding form, where each character is
|
||
encoded in a sequence of one to four bytes. Thereby, 0x1f980 becomes 🦀.
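
To get a feeling for what this means in practice, the following sketch
(with an arbitrary example string) contrasts the number of UTF-8 bytes
with the number of characters:

fn main() {
    let s = "Grüße, 🦀!";
    println!("bytes: {}", s.len());            // 14 bytes: ü and ß take 2, 🦀 takes 4
    println!("chars: {}", s.chars().count());  // 9 characters

    // The crab is a single char encoded in four bytes.
    assert_eq!("🦀".len(), 4);
    assert_eq!("🦀".chars().count(), 1);
}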
|
||
|
||
char
|
||
|
||
Internally, a String is a collection of bytes. Depending on the Unicode
|
||
code point, one to four bytes form a char. chars can be checked for
|
||
various properties (e.g is_numeric(), is_whitespace(), …), be converted
|
||
to_digit(radix) or char::from_digit(num, radix) using different bases.
|
||
Using to_lowercase and to_uppercase the casing can be changed. Finally,
|
||
with as u32 or from_u32 we can convert characters to integers (and
|
||
back).
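
A few of these char helpers in action (the values are chosen arbitrarily):

fn main() {
    assert!('7'.is_numeric());
    assert!(' '.is_whitespace());

    // Digit conversions with an explicit radix.
    assert_eq!('f'.to_digit(16), Some(15));
    assert_eq!(char::from_digit(9, 10), Some('9'));

    // Case conversion returns an iterator, as some mappings produce several chars.
    assert_eq!('ß'.to_uppercase().collect::<String>(), "SS");

    // Converting between char and u32.
    assert_eq!('A' as u32, 65);
    assert_eq!(char::from_u32(0x1F980), Some('🦀'));
}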
|
||
|
||
String and str
|
||
|
||
The types String and str are guaranteed to only hold valid UTF-8
|
||
characters. They can be created and modified as follows:
|
||
|
||
let s = String::new();
|
||
let s = "Hey Ferris".to_string();
|
||
println!("{}", s);
|
||
let s = String::from_utf8(vec![0xF0, 0x9F, 0xA6, 0x80]); // 🦀
|
||
println!("{:#?}", s);
|
||
let mut s : String = vec!["Hey", "Ferris"].into_iter().collect();
|
||
println!("{}", s);
|
||
s.push_str("!");
|
||
println!("{}", s);
|
||
|
||
We can search for patterns and even replace parts:
|
||
|
||
let string = "Hello Ferris. How are you doing?";
|
||
let index = string.find("are");
|
||
println!("{:#?}", index);
|
||
println!("{}", string.replace("Ferris", "Corro"));
|
||
|
||
When processing text, a common task is to split by lines, or special
|
||
characters/whitespace:
|
||
|
||
let file_content = "Id,Name\n42,Ferris\n49,Corro";
|
||
for element in file_content
|
||
.lines()
|
||
.flat_map(|line| line.split(",")
|
||
.collect::<Vec<_>>()) {
|
||
println!("{}", element);
|
||
}
|
||
|
||
Formatting
|
||
|
||
A common use case for string processing is also to format text in
|
||
various ways. The Rust standard library comes with a formatting
|
||
language, which you already encountered in U00. The language is the same
|
||
across all instances that use a formatting string, e.g. println!() as
|
||
you have seen before, but also format!(), which creates a String
|
||
in-place. The format parameters have the form {which:how}, which are
|
||
both optional — in many cases, we use {} to use the n-th argument. With
|
||
which, it is possible to select parameters by name or index. With how,
|
||
we can control the formatting itself. Depending on the type of the
|
||
argument, we have different options at our disposal. Here are several
|
||
examples in addition to those shown previously:
|
||
|
||
println!("{:+}", 108); // forced sign
|
||
println!("{:10}", 108); // minimum field width
|
||
println!("{:010}", 108); // minimum field width, leading zeros
|
||
println!("{:02x}", 108); // hexadecimal
|
||
println!("{:02x?}", [108, 11, 42]);
|
||
println!("{:12.2}", 1234.5678); // float formatting
|
||
|
||
println!("{:10}", "Ferris"); // minimal field width
|
||
println!("{:.5}", "Hello Ferris"); // text length limit
|
||
println!("{:>20}", "Hello Ferris"); // alignment
|
||
println!("{:=^20}", "Ferris"); // padding + center
|
||
|
||
let data = std::rc::Rc::new("Ferris".to_string());
|
||
println!("{:p}", data); // pointer
|
||
|
||
Structures and Methods
|
||
|
||
When we start to create larger programs, we tend to have values that
|
||
“belong” together. For instance, the vessel DSys is building has an
|
||
engine that has current parameters such as current operating temperature
|
||
and rotations per minute. Ideally, these bits of information are stored
|
||
and used together. This is what can be done with structures (or struct
|
||
for short) in Rust. Operating on these structures is done with
|
||
operations, so both data and behaviour are grouped together; increasing
|
||
the maintainability of the code, which is a dependability/quality
|
||
property of the code.
|
||
|
||
This section is intentionally kept brief and you should read the
|
||
excellent 5th chapter of the Rust book if you have any doubts or want
|
||
a more in-depth introduction to structures and methods.
|
||
|
||
In Rust, we distinguish three types of structures:
|
||
|
||
- Named-Field
|
||
- Tuple-Like
|
||
- Unit-Like
|
||
|
||
Named-Field Structures
|
||
|
||
First, let’s look at how one can declare a struct:
|
||
|
||
struct Engine {
|
||
temperature: f64,
|
||
rotations_per_minute: u64
|
||
}
|
||
|
||
The structure is composed of two fields with distinct names. Note that
|
||
the struct name is in CamelCase and the field names are in snake_case —
|
||
a convention common in Rust.
|
||
|
||
Within the same module, a struct can be used as follows:
|
||
|
||
let mut engine = Engine {
|
||
temperature: 87.5,
|
||
rotations_per_minute: 47_000,
|
||
};
|
||
println!("Temperature: {}", engine.temperature);
|
||
engine.rotations_per_minute += 1000;
|
||
|
||
So fields are accessed with .name. When creating a struct based on local
|
||
variables, there is a shorthand when variable and field name are the
|
||
same:
|
||
|
||
let temperature = measure();
// ...
let engine = Engine {
    temperature,
    rotations_per_minute: 47_000,
};
|
||
|
||
By default, fields are private in Rust. When we access a struct defined
|
||
in a different module, there are two options:
|
||
|
||
1. the field is declared public and allows for direct access
|
||
2. the field is private and provides appropriate get/set or other
|
||
manipulation methods
|
||
|
||
pub struct EngineDirect {
|
||
pub temperature: f64, // <- allowing direct access
|
||
rotations_per_minute: u64
|
||
}
|
||
|
||
pub struct EngineCapsulated {
|
||
temperature: f64,
|
||
rotations_per_minute: u64
|
||
}
|
||
|
||
impl EngineCapsulated {
|
||
fn temperature(&self) -> f64 {
|
||
self.temperature
|
||
}
|
||
}
|
||
|
||
Note that the second option is preferred in almost all cases, as it
allows clean encapsulation and even makes it possible to provide fields
whose public API is read-only or write-only.
|
||
|
||
Behaviour using impl
|
||
|
||
In the last example, you already saw a method temperature in action.
|
||
Using impl blocks, we can define functions that are either associated
|
||
with the type (associated functions) or operate on instances of the type
|
||
(methods).
|
||
|
||
A typical example for associated functions are constructors, typically
|
||
named new:
|
||
|
||
impl EngineCapsulated {
|
||
fn new(temperature: f64) -> Self {
|
||
Self {
|
||
temperature,
|
||
rotations_per_minute: 0,
|
||
}
|
||
}
|
||
}
|
||
|
||
Associated functions do not have a first-parameter self. If such a
|
||
parameter is present, we have a method. As with other variables, we can
|
||
have self in three variants:
|
||
|
||
- self, the instance is moved into the function, i.e. the function
  must take care of it from now on
- &self, the instance is borrowed immutably (typically done for
  getters)
- &mut self, the instance is borrowed mutably (typically done for
  setters)
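
Here is a compact sketch of all three variants on the EngineCapsulated
struct from above (the method names are invented for illustration):

impl EngineCapsulated {
    // &self: read-only access, typical for getters.
    fn current_temperature(&self) -> f64 {
        self.temperature
    }

    // &mut self: the instance is modified in place.
    fn throttle_up(&mut self, delta: u64) {
        self.rotations_per_minute += delta;
    }

    // self: the instance is consumed; the caller cannot use it afterwards.
    fn shut_down(self) -> f64 {
        self.temperature
    }
}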
|
||
|
||
If you already programmed in a different language, the way Rust provides
|
||
structs and methods might surprise you. In the light of dependability,
|
||
this approach has major benefits:
|
||
|
||
1. data and behaviour are separated (struct definition, impl block) —
|
||
improving the readability and avoiding that local fields are
|
||
overlooked
|
||
2. self is explicit, making it clear which functions are associated or
|
||
methods
|
||
3. the variant of self makes it clear whether the function consumes the
|
||
instance (move), reads (&) or writes (&mut).
|
||
|
||
Tuple-Like Structures
|
||
|
||
In some cases, we do not have dedicated names for fields, but have a
|
||
natural mapping to indexes of a tuple. Here is how one can define such a
|
||
structure for navigation, a two-element point:
|
||
|
||
struct Waypoint(i64, i64);
|
||
|
||
The usage works as follows:
|
||
|
||
let origin = Waypoint(0,0);
|
||
let target = Waypoint(47,11);
|
||
println!("x: {}, y: {}", target.0, target.1);
|
||
|
||
Again, elements can be made public to be directly accessed from outside
|
||
the current module:
|
||
|
||
struct Waypoint(pub i64, pub i64);
|
||
|
||
Tuple-like structs are especially useful for so-called newtypes;
|
||
wrappers around existing types to make them more expressive or usable.
|
||
|
||
One use case is annotation, e.g. to create unit-safe interfaces:
|
||
|
||
struct Nauticmiles(f64);
|
||
|
||
fn forward(distance: Nauticmiles) -> ();
|
||
|
||
In this case, the forward method must receive a Nauticmiles struct and
|
||
not an f64. Thereby the caller is forced to wrap the number; making the
|
||
intent clear and avoiding that an f64 representing imperial miles or
|
||
kilometers is passed in accidentally. This is also what the “unit of
|
||
measure” (uom) crate provides.
|
||
|
||
Another use case is to change the API of a specific type:
|
||
|
||
struct AppendOnlyLog(Vec<String>);
|
||
|
||
impl AppendOnlyLog {
|
||
fn append(&mut self, log: String) -> () {
|
||
self.0.push(log);
|
||
}
|
||
}
|
||
|
||
Here, all methods of the inner type are hidden and only the methods of
|
||
the impl block are provided. In contrast to using the Vec directly, a
|
||
user can not remove elements from the log.
|
||
|
||
Unit-Like Structures
|
||
|
||
While the use case for the previous two struct types has been clear, the
|
||
use case for unit-like structures is a bit surprising. In some
|
||
situations, you need to have a structure that does not contain data:
|
||
|
||
struct Highlander;
|
||
|
||
As the name of this specific struct implies, there can ever only be one
|
||
of it, i.e. if you create it two times, they are still considered the
|
||
same (actually Rust does not allocate anything and only operates on the
|
||
type). Now how is this useful? When we work with traits and build state
|
||
machines in U10, this comes in handy.
|
||
|
||
Deriving Common Traits
|
||
|
||
Defining structs is straightforward, though using them can be a bit
unwieldy. For instance, during development, you might want to print the
|
||
state of a structure to the console. This is provided by the Debug
|
||
trait, which you can implement by hand. As debug output is a rather
|
||
clear task, Rust comes with a set of derivable traits where the
|
||
implementation is done automatically. This is achieved as follows:
|
||
|
||
#[derive(Debug)] // <- does the magic
|
||
struct Engine {
|
||
temperature: f64,
|
||
rotations_per_minute: u64
|
||
}
|
||
|
||
fn main() {
|
||
let engine = Engine {
|
||
temperature: 74.11,
|
||
rotations_per_minute: 84_000,
|
||
};
|
||
println!("{:#?}", engine);
|
||
}
|
||
|
||
Later you learn more about these traits and how derivation works (you
|
||
can even create your own derivable traits).
|
||
|
||
Summary
|
||
|
||
What did you learn?
|
||
|
||
- The various ways how to structure data and variants in a Rust
|
||
program — and allowing to associate behaviour with it.
|
||
- Patterns that allow you to differentiate cases and destructure data.
|
||
- How to produce, adapt, and consume iterators.
|
||
- What closures are, how they can be used, as well as how their types
|
||
are determined and what this means for their capabilities.
|
||
- How the most common collections in the standard library work.
|
||
- How string and text handling work in the standard library.
|
||
|
||
Where can you learn more?
|
||
|
||
- Rust Book:
|
||
- Ch. 05
|
||
- Ch. 06
|
||
- Ch. 08
|
||
- Ch. 13
|
||
- Ch. 18
|
||
- Programming Rust: Ch. 09, 10, 14, 15, 16, 17
|
||
- Rust in Action: Ch. 02.10, 03
|
||
- cheats.rs:
|
||
- Data Structures
|
||
- Functions & Behaviour
|
||
- Pattern Matching
|
||
- Iterators
|
||
- Strings & Chars
|
||
|
||
W04: Work Sheet
|
||
|
||
Rustlings
|
||
|
||
Do the Rustlings exercises structs, enums, vecs, hashmaps and strings.
|
||
|
||
Lists
|
||
|
||
Use the List and ListNode structures defined in this unit. We provide a
slightly updated version here, i.e. including derive macros and generic
bounds (you will understand the extra syntax later):
|
||
|
||
#[derive(Debug, PartialEq)]
|
||
enum List<T: std::cmp::PartialEq + Copy> {
|
||
Empty,
|
||
NonEmpty(Box<ListNode<T>>),
|
||
}
|
||
|
||
#[derive(Debug, PartialEq)]
|
||
struct ListNode<T: std::cmp::PartialEq + Copy> {
|
||
element: T,
|
||
next: List<T>,
|
||
}
|
||
|
||
Develop the following methods:

- fn length(self) -> usize, counting the number of elements,
- fn tail(self) -> List<T>, returning a list of all but the first element,
- fn get(&self, index: usize) -> Option<T>, returning the index-th
  element if there is one.
|
||
|
||
Geometry
|
||
|
||
Write a geometry library including elementary tests (ideally, develop it
|
||
test-first). The library should provide the following:
|
||
|
||
- A shape enumeration, having the supported shapes as variants.
|
||
- Support the following shapes (as structs): Rectangle, Square,
|
||
Circle.
|
||
- Support the following methods: area, circumference.
|
||
- Elementary tests, one per (shape, method) combination.
|
||
|
||
Iterators, Collections & Strings
|
||
|
||
- Write an iterator-based FizzBuzz solution.
|
||
|
||
- Implement a word count program. Input: Path to a file with words.
|
||
Output: HashMap with keys (word) and values (count).
|
||
|
||
Closure Types
|
||
|
||
For each of the following closures, give which traits they implement.
|
||
Also indicate if a closure is a function pointer.
|
||
|
||
let closure0 = |i : u16| i + 5;
|
||
|
||
let v = vec![5, 7, 8, 19];
|
||
let closure1 = |j : u16| j * v.iter().sum::<u16>();
|
||
|
||
let v = vec![9, 8, 7];
|
||
let closure2 = move |k: u16| {
|
||
println!("Vec: {:#?}, k: {}", v, k);
|
||
v
|
||
};
|
||
|
||
let mut v = vec!['R', 'u', 's'];
|
||
let mut closure3 = |c| v.push(c);
|
||
|
||
println!("{}", closure0(5));
|
||
println!("{}", closure1(2));
|
||
closure2(3);
|
||
println!("{:#?}", closure3('t'));
|
||
println!("{:#?}", v);
|
||
|
||
Storyline
|
||
|
||
1. Look at the mess.
|
||
2. Apply Tidiness Tool 1: Functions.
|
||
- Discuss about Operation vs. Integration.
|
||
- Put functions under test.
|
||
3. Apply Tidiness Tool 2: Modules.
|
||
4. Apply Tidiness Tool 3: Objects.
|
||
- Extract the Line object.
|
||
- Show software cell in code.
|
||
5. Apply Tidiness Tool 4: Crate.
|
||
- Split binary and library crate.
|
||
6. Apply Tidiness Tool 5: Workspace.
|
||
7. Revisit Crate Structure.
|
||
|
||
U05: Tidy Code
|
||
|
||
Today is a special day, as DSys invited Ferris Kondō (a well-known
|
||
influencer and coach), who talks about:
|
||
|
||
- Minimalism
|
||
|
||
- Order
|
||
|
||
She is here to help us tidy up our code, introducing a range of tidiness
|
||
tools!
|
||
|
||
Ordnung ist das halbe Leben
|
||
|
||
The Messy Code and its Origin
|
||
|
||
Before we are getting started to learn about order, we look at an
|
||
example of where order is not present — a showcase where things are
|
||
rather messy:
|
||
|
||
fn main() {
|
||
let args = &std::env::args().into_iter().collect::<Vec<String>>()[1..];
|
||
|
||
let (path, length) = match args.len() {
|
||
2 => {
|
||
let path = args.get(0).unwrap();
|
||
let length = args.get(1).unwrap();
|
||
let length = length
|
||
.parse()
|
||
.unwrap_or_else(|_| panic!("Couldn not parse {} to number.", length));
|
||
(path, length)
|
||
}
|
||
_ => panic!("Must be called with 2 parameters: PATH LENGTH."),
|
||
};
|
||
|
||
let words: Vec<String> = std::fs::read_to_string(path)
|
||
.unwrap_or_else(|_| panic!("Could not read from file {}.", path))
|
||
.split_whitespace()
|
||
.map(|w| w.to_string())
|
||
.flat_map(|s| {
|
||
s.as_bytes()
|
||
.chunks(length)
|
||
.map(|w| String::from_utf8(w.into()).unwrap())
|
||
.collect::<Vec<String>>()
|
||
})
|
||
.collect();
|
||
|
||
let mut lines = vec![];
|
||
let mut line: Vec<String> = vec![];
|
||
for word in words {
|
||
if line.iter().map(|w| w.len()).sum::<usize>() + line.len() * 1 + word.len() <= length {
|
||
line.push(word);
|
||
} else {
|
||
lines.push(line);
|
||
line = vec![word];
|
||
}
|
||
}
|
||
lines.push(line);
|
||
|
||
let formatted = lines
|
||
.into_iter()
|
||
.map(|l| format!("{:^length$}", l.join(" ").to_string(), length = length))
|
||
.collect::<Vec<String>>()
|
||
.join("\n");
|
||
|
||
println!("{}", formatted)
|
||
}
|
||
|
||
Originally, this code was written to fulfil the following requirements:
|
||
|
||
Read from a text file and format so that the length of each line is
|
||
bound by a maximum value.
|
||
|
||
The idea is that this tool can be used at the command-line like this:
|
||
|
||
break german-tale.txt 25
|
||
|
||
taking this input
|
||
|
||
Vor einem großen Walde wohnte ein armer Holzhacker mit seiner
|
||
Frau und seinen
|
||
zwei
|
||
Kindern; das Bübchen hieß
|
||
|
||
Hänsel und das Mädchen
|
||
Gretel.
|
||
|
||
and producing this output
|
||
|
||
Vor einem großen Walde
|
||
wohnte ein armer
|
||
Holzhacker mit seiner
|
||
Frau und seinen zwei
|
||
Kindern; das Bübchen
|
||
hieß Hänsel und das
|
||
Mädchen Gretel.
|
||
|
||
Apart from the idea, there are also a couple of additional requirements
|
||
that clarify how certain situations should be handled:
|
||
|
||
- (Extraneous) whitespace of the source file is not maintained.
|
||
- Punctuation is considered to be part of the word.
|
||
- If word is longer than maximum line length, chunk it.
|
||
|
||
Why is this code messy?
|
||
|
||
First of all, this code is messy as reading the code is already hard.
|
||
Second, understanding the code is hard for several reasons:
|
||
|
||
- 40 lines are quite a long scope: variables / side-effects can happen
  easily, so tracking them can be tough
- concerns are mixed: e.g. line 38 is responsible for formatting, lines
  2-13 for argument parsing
- abstraction layers are mixed: custom logic, API calls, …
- requirements are not clearly visible: e.g. a word that is larger than
  the line length should be put on a separate line and cut in chunks.
|
||
|
||
Eventually, testing the code as well as changing it without breaking
|
||
anything is hard.
|
||
|
||
We want more order, minimalism, cleanness and hygiene.
|
||
|
||
Why is messy code a problem?
|
||
|
||
Source: Andreas Schmidt
|
||
|
||
Visible signs of disorder encourage further disorder. cf. Broken
|
||
Window Theory by Wilson and Kelling.
|
||
|
||
Principles of Order
|
||
|
||
- Don’t Repeat Yourself (DRY): Who needs twenty can openers?

- Single Responsibility Principle (SRP): Using a knife to open a can
  might not be ideal.

- Integration-Operation Separation Principle (IOSP): Anyone in your
  household is either an operator (you opening a tin) or an integrator
  (your pet telling you to open the can).

… there are more, but these are already going to help you make your code
more understandable, testable, and changeable (or, to stay in the
metaphor: cleaner, more ordered, and more hygienic).
|
||
|
||
Stratified Design
|
||
|
||
Source
|
||
|
||
This approach has originally been described in the context of Lisp
|
||
(Abelson et al. 1987). A Stratum is one of a series of layers, levels,
|
||
or gradations in an ordered system. The core metaphor here is that low
|
||
stratums serve as a basis for higher stratums. For our software,
|
||
functional dependencies should follow the abstraction gradient
|
||
(i.e. high stratums depend on lower ones).
|
||
|
||
Here is an example program, showing which functions call which other
|
||
functionality:
|
||
|
||
main.rs:
|
||
|
||
fn main() {
|
||
let application = Application::new();
|
||
application.run();
|
||
}
|
||
|
||
impl Application {
    fn run(self) -> JoinHandle<()> {
        let config = Config::new();
        thread::spawn(move || {
            // do something with `config`
        })
    }
}
|
||
|
||
lib.rs:
|
||
|
||
#[derive(serde::Serialize, serde::Deserialize)]
|
||
struct Config {
|
||
// ....
|
||
}
|
||
|
||
impl Config {
    fn new() -> Self {
        let content = std::fs::read_to_string("config").unwrap();
        let config: Config = serde_yaml::from_str(&content).unwrap();
        config
    }
}
|
||
|
||
This shows the abstraction gradient of this application (A --> B = “A
|
||
depends on B”):
|
||
|
||
High Abstraction
  |     application.run()                                  (binary crate)
  |        |              \
  |        V               \
  |     Config::new()       \                              (library crate)
  |        |        \        \
  |        |         V        \
  |        |   serde_yaml::from_str()                      (serde + yaml)
  |        V                   V
  V     std::fs::read_to_string("conf.yaml")   thread::spawn()   (std / Rust)
Low Abstraction
|
||
|
||
In line with stratified design, higher levels should depend on lower
|
||
levels and this is the case here. What should not happen is that Config
|
||
knows about the application it is used to configure (as the Application
|
||
is at a higher level than Config). If we adopt this design approach, we
|
||
also avoid Seas of Objects as mentioned before.
|
||
|
||
S05: Sample Solution
|
||
|
||
- Order Principles: discussed in class.
|
||
|
||
- Rust Order Tools: Rustlings.
|
||
|
||
- Refactor to Order: discussed in class.
|
||
|
||
Summary
|
||
|
||
On Software Architecture
|
||
|
||
What we have been talking about in this section is software design and
|
||
software architecture. DSys highly recommends Making Architecture Matter
|
||
and other videos by Martin Fowler:
|
||
|
||
What did you learn?
|
||
|
||
- Why messy code is bad!
|
||
- A number of Principles of Order, e.g. the IOSP.
|
||
- Tidiness Tools (in Rust) such as
|
||
- Functions
|
||
- Modules
|
||
- Objects
|
||
- Crates
|
||
- Workspaces
|
||
- Repos
|
||
- Rust Module & Object Systems
|
||
- Software Cells
|
||
- Software Architecture Matters
|
||
|
||
Where can you learn more?
|
||
|
||
- Rust-Book: Ch. 07, 17
|
||
- Programming Rust: Ch. 08
|
||
- Rust in Motion: Module 1
|
||
- Rust for Rustaceans: Ch. 04, 06, 14
|
||
- cheats.rs: Organizing Code, Project Anatomy
|
||
- Software Flow-Design (in Deutsch)
|
||
|
||
Let’s Tidy Up
|
||
|
||
In the following video, we use the code shown before and refactor it to
provide order:
|
||
|
||
At Rust-Saar, a similar presentation was made. There, we applied even
|
||
more refactorings (to make the code clean) but did not introduce crates.
|
||
At the end, we arrived at the following code.
|
||
|
||
Final Confession
|
||
|
||
The presented code was first designed carefully and then order was
destroyed. (Following the Software Flow-Design approach by Ralf
Westphal.)
|
||
|
||
However, the approach shown here can also be applied to code that was
|
||
not carefully designed upfront. But as you can imagine, things get
|
||
complicated quite quickly, so ideally you try to be a good boy/girl
|
||
scout.
|
||
|
||
Tidiness Tools
|
||
|
||
In order to turn our messy code into code with order (or start right
|
||
away with clean code), we introduce you to the various tools you can
|
||
use:
|
||
|
||
- Tool #1: Functions
|
||
|
||
- Tool #2: Modules
|
||
|
||
- Tool #3 Objects
|
||
|
||
- Tool #4: Crates & Packages
|
||
|
||
- Tool #5: Workspaces
|
||
|
||
- Tool #6: Repos
|
||
|
||
Tool #1: Functions
|
||
|
||
A freestanding function like
|
||
|
||
pub(crate) fn split_words(content: &str) -> Vec<String> {
|
||
content.split_whitespace().map(|w| w.to_string()).collect()
|
||
}
|
||
|
||
- encapsulates purpose: hints on the purpose are given by name,
  signature, and visibility
|
||
|
||
- can be unit tested effectively*
|
||
|
||
- has a scope that
|
||
|
||
- defines visibility (hides variable names, …)
|
||
|
||
- implements Resource acquisition is initialization (RAII)
|
||
|
||
- can be an integrating or operation function
|
||
|
||
* or at least better than one large main function; if 1) your types are
|
||
hard to construct or 2) your function works on resources, you might
|
||
still have a hard time
|
||
|
||
Operation vs. Integration
|
||
|
||
Operation
|
||
|
||
- Logic
|
||
- Operators (+, - , /)
|
||
- API calls to external functions
|
||
|
||
Operation Examples
|
||
|
||
- if x == 5 { return 0; }
|
||
- x.push_str("foo")
|
||
- fs::read_to_string("file.txt")
|
||
|
||
fn read_file(path: &str) -> String {
|
||
// could be more complex,
|
||
// e.g. with error handling
|
||
std::fs::read_to_string(path).unwrap()
|
||
}
|
||
|
||
Integration
|
||
|
||
- API calls to internal functions
|
||
|
||
Integration Examples
|
||
|
||
- any call to a function in your crate
|
||
|
||
fn main() {
|
||
let (path, size) = tui::parse_args();
|
||
let content = read_file(&path);
|
||
// ...
|
||
let task = Task::from_str(content);
|
||
let report = analyze(task, size);
|
||
tui::output(&report);
|
||
}
|
||
|
||
Tool #2: Modules
|
||
|
||
The following is a module that
|
||
|
||
mod tui {
|
||
pub(crate) fn parse_args() -> (String, usize) {
|
||
let args = &std::env::args().into_iter().collect::<Vec<String>>()[1..];
|
||
match args.len() {
|
||
2 => {
|
||
let path = args.get(0).unwrap();
|
||
let length = args.get(1).unwrap();
|
||
let length = length
|
||
.parse()
|
||
.unwrap_or_else(|_| panic!("Couldn not parse {} to number.", length));
|
||
(path.into(), length)
|
||
}
|
||
_ => panic!("Must be called with 2 parameters: PATH LENGTH."),
|
||
}
|
||
}
|
||
|
||
pub(crate) fn output(formatted: &str) {
|
||
println!("{}", formatted)
|
||
}
|
||
}
|
||
|
||
- encapsulates purpose (on higher stratum than functions)
|
||
|
||
- hides information and functionality
|
||
|
||
Rust’s Rules of Visibility
|
||
|
||
- Rust’s modules build up a tree, crate is the current crate’s root
|
||
element.
|
||
|
||
- By default, Rust items (modules, functions, …) are private, i.e. only
  visible within the current module and below
|
||
|
||
- Visibility can be changed to:
|
||
|
||
- pub: public (can be seen from everywhere)
|
||
- pub(crate): public within this crate
|
||
- pub(super): public in parent module
|
||
- pub(in path): public in module in path (path must be subpath of
|
||
item’s path)
|
||
|
||
- Items can be brought into the current namespace by useing them.
|
||
|
||
- With pub use (or any other visibility modifier), an item can be
|
||
re-exported.
|
||
|
||
Rule: Importing from above is ok, from below needs permission.
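
A small sketch of these visibility levels in action (the module and
function names are made up):

mod engine {
    pub mod sensors {
        // pub: visible everywhere.
        pub fn read_temperature() -> f64 {
            filter(87.5)
        }

        // pub(crate): visible anywhere within this crate.
        pub(crate) fn raw_adc_value() -> u16 {
            1023
        }

        // pub(super): visible in the parent module `engine`.
        pub(super) fn calibrate() {}

        // private: visible only inside `sensors` and its children.
        fn filter(value: f64) -> f64 {
            value
        }
    }

    pub fn status() -> f64 {
        sensors::calibrate(); // ok: `engine` is the parent of `sensors`
        sensors::read_temperature()
    }
}

fn main() {
    println!("{}", engine::status());
    println!("{}", engine::sensors::raw_adc_value()); // ok: same crate
}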
|
||
|
||
Modules in Separate Files
|
||
|
||
The keyword mod can be used to structure a file:
|
||
|
||
// lib.rs
|
||
mod tui {
|
||
fn output(text: &str) { ... }
|
||
...
|
||
}
|
||
|
||
mod domainlogic { ... }
|
||
|
||
However, it is more common that separate files are used:
|
||
|
||
// lib.rs
|
||
mod tui;
|
||
mod domainlogic;
|
||
|
||
// tui.rs or tui/mod.rs
|
||
fn output(text: &str) {
|
||
...
|
||
}
|
||
|
||
This results in a project structure like this:
|
||
|
||
src/
|
||
- tui/
|
||
- mod.rs // <---- either this
|
||
- format.rs
|
||
- lib.rs
|
||
- main.rs
|
||
- tui.rs // <---- or this
|
||
|
||
Having multiple files has (at least) the following benefits:
|
||
|
||
- Lessens the probability of Git merge conflicts.
|
||
- Smaller files are typically more accessible.
|
||
|
||
Preludes
|
||
|
||
Preludes can be seen as a pattern to make using multiple types more
|
||
convenient. - Rust Docs
|
||
|
||
In every Rust module, the compiler inserts:
|
||
|
||
use std::prelude::v1::*;
|
||
|
||
For many crates, there are also preludes you can import by yourself:
|
||
|
||
use chrono::prelude::*;
|
||
|
||
Note that preludes can be harmful for dependability:
|
||
|
||
- they can introduce naming conflicts, if multiple crates use types
|
||
with the same name
|
||
- they can occlude where types are coming from, making code harder to
|
||
understand
|
||
- they are hard to update, as code where they are used is often
|
||
tightly coupled to what they contain
|
||
|
||
Tool #3 Objects
|
||
|
||
Data-only objects
|
||
|
||
enum Justification {
|
||
Left,
|
||
Right,
|
||
Center,
|
||
}
|
||
|
||
pub(crate) struct Line {
|
||
words: Vec<String>,
|
||
maximum_length: usize,
|
||
}
|
||
|
||
type Words = Vec<String>;
|
||
|
||
- provide variants (enum)
|
||
|
||
- group related information in memory (enum, struct)
|
||
|
||
- provide better-to-use names (type)
|
||
|
||
- support #[derive(..)] (e.g. Debug, Eq, …)
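
As a small illustration of the last point, here is a sketch that derives a few common traits for the Justification enum from above (the chosen derives are just an example):

#[derive(Debug, Clone, PartialEq, Eq)]
enum Justification {
    Left,
    Right,
    Center,
}

fn main() {
    let all = [Justification::Left, Justification::Right, Justification::Center];
    println!("{:?}", all);                           // Debug
    assert_eq!(all[0].clone(), Justification::Left); // Clone + PartialEq
}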
|
||
|
||
Non-Anemic Data-Classes
|
||
|
||
- Sometimes, people advise making classes method-free, i.e. they only
carry data.
|
||
|
||
- Martin Fowler and Eric Evans called this the Anemic Domain Model
(anemic = too few red blood cells; lack of energy).
|
||
|
||
- When we work in object-oriented languages, our domain models should
|
||
be rich, i.e. structs should have appropriate methods.
|
||
|
||
- In many cases, dot syntax (method-style) makes your code easier to
|
||
grasp.
|
||
|
||
Rust’s Methods
|
||
|
||
Defining our data and behaviour:
|
||
|
||
pub(crate) struct Line {
|
||
words: Vec<String>,
|
||
maximum_length: usize,
|
||
}
|
||
|
||
impl Line {
|
||
// ...
|
||
}
|
||
|
||
Adding an associated function:
|
||
|
||
// in impl block
|
||
pub(crate) fn new(maximum_length: usize) -> Self {
|
||
Self {
|
||
words: vec![],
|
||
maximum_length,
|
||
}
|
||
}
|
||
|
||
Methods start with self or &self or &mut self:
|
||
|
||
// in impl block
|
||
pub(crate) fn try_push(&mut self, word: String) -> Option<String> {
|
||
let current_length: usize = self.words.iter().map(|w| w.len()).sum();
|
||
let current_length_with_separator = current_length + (self.words.len()) * SEPARATOR_LENGTH;
|
||
if current_length_with_separator + SEPARATOR_LENGTH + word.len() <= self.maximum_length {
|
||
self.words.push(word);
|
||
None
|
||
} else {
|
||
Some(word)
|
||
}
|
||
}
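
For illustration, this is how the type could be used, assuming SEPARATOR_LENGTH is defined as 1 elsewhere in the crate (a single space between words); the concrete numbers are made up:

const SEPARATOR_LENGTH: usize = 1; // assumption for this sketch

fn main() {
    let mut line = Line::new(20);
    assert!(line.try_push("Hello".to_string()).is_none());     // fits
    assert!(line.try_push("World".to_string()).is_none());     // still fits
    let rejected = line.try_push("dependability".to_string()); // would exceed 20
    assert_eq!(rejected, Some("dependability".to_string()));
}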
|
||
|
||
Extension Traits
|
||
|
||
Challenge
|
||
|
||
- roxmltree is used to work with XML structures.
|
||
|
||
- attribute(name) returns an Option, but in our context, a None would
|
||
be a ParsingError.
|
||
|
||
let name = xmlnode.attribute("name")?; // <- ? does not compile here, as attribute() returns an Option but our function returns a Result
|
||
|
||
Solution
|
||
|
||
use path::to::GetAttribute;
|
||
|
||
let name = xmlnode.try_get_attribute("name")?;
|
||
|
||
pub trait GetAttribute {
|
||
fn try_get_attribute(&self, attribute: &str) -> Result<&str, ParsingError>;
|
||
}
|
||
|
||
impl GetAttribute for roxmltree::Node<'_, '_> {
|
||
fn try_get_attribute(&self, attribute: &str) -> Result<&str, ParsingError> {
|
||
self.attribute(attribute)
|
||
.ok_or_else(|| ParsingError::MissingXMLAttribute(attribute.to_string()))
|
||
}
|
||
}
|
||
|
||
The Software Cell
|
||
|
||
- Functional Code is (usually) free of:
|
||
|
||
- mutable data
|
||
- state
|
||
- side effects
|
||
- resource access
|
||
|
||
- Functional Code is great for testability.
|
||
|
||
- Imperative Code that barely contains logic often needs no test. (What
do you get from testing whether println!() really works?)
|
||
|
||
+-----------------------------+
|
||
| |
|
||
| Imperative Shell |
|
||
| (e.g. access DB) |
|
||
| |
|
||
| +---------------------+ |
|
||
| | | |
|
||
| | Functional Core | |
|
||
| | (e.g. compute | |
|
||
| | order total) | |
|
||
| | | |
|
||
| +---------------------+ |
|
||
+-----------------------------+
|
||
|
||
Tip: Try to keep your domain logic free of imperative code and
|
||
dependencies on resources (sockets, database, but also time, …).
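
As a minimal sketch of this separation (hypothetical order-total example, not code from one of our projects):

// Functional core: pure, no I/O, trivially testable.
fn compute_order_total(prices: &[f64], discount: f64) -> f64 {
    prices.iter().sum::<f64>() * (1.0 - discount)
}

// Imperative shell: does the resource access, contains barely any logic.
fn main() {
    let prices = vec![9.99, 4.50]; // in reality: loaded from a database or file
    let total = compute_order_total(&prices, 0.1);
    println!("Total: {:.2}", total); // in reality: sent to a socket or UI
}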
|
||
|
||
For more details, consider this Twitter client example.
|
||
|
||
Tool #4: Crates & Packages
|
||
|
||
Crates
|
||
|
||
- are composed of modules, with a crate root being the top-level
|
||
module
|
||
- lib.rs is the default top-level file for library crates
|
||
- main.rs or bin/*.rs are the default top-level files for binary
|
||
crates
|
||
|
||
Packages
|
||
|
||
- improve separation and support collaboration
|
||
- have a version (could also be just a Git commit hash)
|
||
- contain zero or one library crate and arbitrarily many binary crates
|
||
- can be put on crates.io
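
For example, a single package could lay out its crates like this (hypothetical file names):

my-package/
- Cargo.toml
- src/
  - lib.rs       // <---- zero or one library crate
  - main.rs      // <---- the default binary crate
  - bin/
    - extra.rs   // <---- arbitrarily many further binary crates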
|
||
|
||
Tool #5: Workspaces
|
||
|
||
Workspaces can be used for grouping together multiple parallel packages
|
||
(e.g. in a single repo). To do so, we put [workspace] in the
|
||
top-level Cargo.toml like this:
|
||
|
||
[workspace]
|
||
members = [
|
||
"fancy-rs",
|
||
"fancy-rs-cli-util",
|
||
"cli",
|
||
]
|
||
|
||
As a result, Cargo.lock, compilation settings, and output directories
|
||
(target) are now shared for all packages in the workspace.
|
||
|
||
More details can be found in the Cargo Book.
|
||
|
||
Tool #6: Repos
|
||
|
||
- Allow you to organize your project’s history (commits) and variants
|
||
(branches).
|
||
|
||
- Supporting tools (e.g. GitLab) allow to manage the project
|
||
surroundings (issues, wiki, website, continuous integration, …).
|
||
|
||
- Normally, each package on crates.io has a dedicated repo (often on
|
||
GitHub) to facilitate collaboration.
|
||
|
||
How to size your repo is a popular topic of discussion: Mono- or
|
||
Multi-Repo?
|
||
|
||
W05: Work Sheet
|
||
|
||
Order Principles
|
||
|
||
- Think about the last time you had to review someone else’s code (if
|
||
you haven’t yet or can’t remember, ask a fellow student to show you
|
||
some recent code). Describe how well you could comprehend the code
|
||
and describe which principles the code adhered or didn’t adhere to.
|
||
Come up with ideas on how the code can be changed to have more
|
||
order.
|
||
|
||
- Reconsider your fizzbuzz code you wrote in U02 and make it adhere to
|
||
the IOSP principle.
|
||
|
||
Rust Order Tools
|
||
|
||
- Do the Rustlings exercises modules.
|
||
|
||
Refactor to Order
|
||
|
||
Consider the following binary Rust crate with its Cargo.toml:
|
||
|
||
[package]
|
||
name = "greeter"
|
||
version = "0.1.0"
|
||
authors = ["Ferris Kondō"]
|
||
edition = "2018"
|
||
|
||
[dependencies]
|
||
csv = "1.1"
|
||
|
||
and main.rs:
|
||
|
||
fn main() {
|
||
println!("Name:");
|
||
let mut name = String::new();
|
||
std::io::stdin()
|
||
.read_line(&mut name)
|
||
.expect("Failed to read line");
|
||
let name : &str = name.trim().into();
|
||
|
||
const GUEST_FILE: &str = "guests.csv";
|
||
|
||
let file = std::fs::OpenOptions::new()
|
||
.create(true)
|
||
.append(true)
|
||
.open(GUEST_FILE)
|
||
.expect("Could not work with file.");
|
||
|
||
csv::Writer::from_writer(file)
|
||
.write_record(&[name])
|
||
.expect("Could not write.");
|
||
|
||
let file = std::fs::OpenOptions::new()
|
||
.create(true)
|
||
.read(true)
|
||
.write(true)
|
||
.open(GUEST_FILE)
|
||
.expect("Could not work with file");
|
||
|
||
let visits = csv::Reader::from_reader(file)
|
||
.records()
|
||
.into_iter()
|
||
.filter_map(|result| {
|
||
let record = result.expect("Couldn't read entry");
|
||
if let Some(r) = record.get(0) {
|
||
if r == name {
|
||
return Some(1);
|
||
}
|
||
}
|
||
None
|
||
})
|
||
.sum();
|
||
|
||
let greeting = match visits {
|
||
1 => format!("Hello, {}!", name),
|
||
2 => format!("Welcome back, {}!", name),
|
||
25 => format!(
|
||
"Hello my good friend, {}! Congrats! You are now a platinum guest!",
|
||
name
|
||
),
|
||
_ => format!("Hello my good friend, {}!", name),
|
||
};
|
||
|
||
println!("{}", greeting);
|
||
}
|
||
|
||
Your task is now to refactor this into something that has more order, is
|
||
cleaner and hence more comprehensible and maintainable. Proceed as
|
||
follows:
|
||
|
||
1. Bring the system under test to ensure you are not breaking anything.
|
||
Do so by
|
||
|
||
1. identifying the domain logic in the program,
|
||
2. extracting it into a function, and
|
||
3. writing regression tests against it that capture what the system
|
||
is currently doing.
|
||
|
||
2. Use your first tidiness tool and introduce functions, where you feel
|
||
like blocks of code belong together. At least your main() function
|
||
should become a pure integration function.
|
||
|
||
3. Use your second tidiness tool and introduce modules to group
|
||
functionality together (e.g. by concern).
|
||
|
||
4. Use your third tidiness tool and introduce objects (they can share a
|
||
module):
|
||
|
||
- VisitEntry: The entry can be constructed from a multi-line
|
||
string (fn from_str(name: &str) -> VisitEntry), can be turned
|
||
into a greeting (fn to_greeting(&self) -> String) and has public
|
||
getters for its fields.
|
||
- VisitDatabase: The database can be created by specifying a path
|
||
(fn new(path: &str) -> VisitDatabase) and supports two
|
||
functions: fn register_visit(&mut self, name: &str) -> () and
|
||
fn retrieve_visits(&self, name: &str) -> u32.
|
||
|
||
5. Use your fourth tidiness tool and split the functionality into
|
||
crates. There are some functions that deal with logic you need for a
|
||
command-line interface application. These should remain in the
|
||
binary crate. Extract the remaining functions into a parallel
|
||
library crate so that other user interfaces (e.g. a web GUI) can be
|
||
used with the same logic. The binary crate afterwards uses the
|
||
public API of greeter.
|
||
|
||
6. Use your fifth tidiness tool to split the crates into several
|
||
folders of a workspace. After 5. you have two crates in one folder:
|
||
a binary and a library. Change the structure into a Rust workspace,
|
||
where you have two members: greeter (the CLI) and greetings (the
|
||
library).
|
||
|
||
7. Use your sixth tidiness tool to turn your workspace into a Git
|
||
repository. Add a README.md explaining the usage and a sensible
|
||
.gitignore. Push the results to a GitLab repository.
|
||
|
||
Storyline
|
||
|
||
1. Look at the initial code.
|
||
2. Extract a function.
|
||
3. Deal with two error types.
|
||
4. Introduce ?.
|
||
5. Introduce custom error type.
|
||
6. Add thiserror.
|
||
7. Add color_eyre.
|
||
|
||
U06: How to Err
|
||
|
||
After having developed algorithms and data structures to compute things,
|
||
the senior engineers want to introduce you to the code at DSys that
|
||
involves interacting with the operating system or other systems. First,
|
||
you learn about what can go wrong (in Rust and other languages) and what
|
||
different handling strategies there are. With these basic differences in
|
||
mind, we first look at std support for errors and later at third-party
|
||
crates to work with errors.
|
||
|
||
S06: Sample Solution
|
||
|
||
- Rustlings: discussed in class.
|
||
|
||
- Refactor: discussed in class.
|
||
|
||
std Error Handling
|
||
|
||
Don’t panic! …unless something happened that must never ever happen
|
||
|
||
panic!() is your Emergency Stop and allows you to handle programming
|
||
mistakes.
|
||
|
||
enum Color {
|
||
Orange,
|
||
Boring
|
||
}
|
||
|
||
fn parse(color: &str) -> Color {
|
||
match color {
|
||
"Orange" => Color::Orange,
|
||
"Boring" => Color::Boring,
|
||
_ => unimplemented!("All colors but orange are boring")
|
||
}
|
||
}
|
||
|
||
In this example, any non-orange color is considered boring and if a
|
||
different string is passed to parse, the program panics (maybe this is a
|
||
bit exaggerated behaviour by Ferris).
|
||
|
||
When should you panic?
|
||
|
||
If you answer any of the following with yes, then panic!():
|
||
|
||
- Is continuing with the program incorrect?
|
||
|
||
- Did you attempt to access memory that you must not? (either because
it’s not yours or uninitialized…)
|
||
|
||
- Is there no way that your caller could recover from the current
situation? (e.g. the caller asked you to do something that is knowingly
unimplemented!())
|
||
|
||
- Would you need to change the code to fix it?
|
||
|
||
- Is this failure really absolutely unexpected?
|
||
|
||
Are you writing a library? If yes, panicking is generally discouraged.
|
||
|
||
Panic first, change later! (aka “Fail fast”), except if you write
safety-critical software where stopping is not a safe state!
|
||
|
||
Nice Panicking Macros
|
||
|
||
- unreachable!: impossible location (at least this is the programmer’s
assumption)
|
||
|
||
- todo! / unimplemented!: not yet implemented
|
||
|
||
- assert!: check preconditions, tests
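
A small, made-up sketch using assert!, todo!, and unreachable! together:

fn weekday_name(day: u8) -> &'static str {
    assert!((1..=7).contains(&day), "day must be in 1..=7"); // precondition check
    match day {
        1 => "Monday",
        2 => "Tuesday",
        3 => "Wednesday",
        4 => "Thursday",
        5 => "Friday",
        6 | 7 => todo!("weekend handling is not implemented yet"),
        _ => unreachable!("excluded by the assert! above"),
    }
}

fn main() {
    println!("{}", weekday_name(3)); // Wednesday
    // weekday_name(0) would panic via assert!
    // weekday_name(6) would panic via todo!
}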
|
||
|
||
A Matter of Expectations
|
||
|
||
Expect Results
|
||
|
||
enum Result<T,E> {
|
||
Ok(T),
|
||
Err(E)
|
||
}
|
||
|
||
Success is expected and Failure the exception
|
||
|
||
Example: Parsing Numbers
|
||
|
||
let number : Result<u32, _> = guess.parse();
|
||
|
||
Check your Options
|
||
|
||
enum Option<T> {
|
||
Some(T),
|
||
None
|
||
}
|
||
|
||
Both cases are expected
|
||
|
||
Example: Vector Access
|
||
|
||
let head : Option<T> = list.get(0);
|
||
|
||
What to do with Results & Options?
|
||
|
||
Success (Ok(T) / Some(T))
|
||
|
||
- unwrap: turns the recoverable error into an unrecoverable panic!
- expect(".."): preferred over unwrap
- unwrap_or_else(|| Default {}): the closure generates a default value
- unwrap_or_default(): if T implements Default
- is_ok, is_some: mostly used in tests
|
||
|
||
Failure (Err(E) / None)
|
||
|
||
- unwrap_err: panics if Ok; common in tests
- expect_err(".."): analogous; common in tests
- is_err, is_none: mostly used in tests
|
||
|
||
General Handling
|
||
|
||
- match option { ... }: for any non-boilerplate handling
- if let Some(..) = opt { ... }: might produce confusing code
|
||
|
||
Conversions
|
||
|
||
- result.ok(): Result<T,E> -> Option<T>
- opt.ok_or(err_value : E): Option<T> -> Result<T,E>
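
To see a few of these in action, here is a small, self-contained sketch (made-up values):

fn main() {
    let input = "17";

    // Result: success is expected, failure is the exception
    let parsed: Result<u32, _> = input.parse();
    let number = parsed.unwrap_or_default(); // 0 if parsing failed

    // Option: both cases are expected
    let list = vec![1, 2, 3];
    let head: Option<&i32> = list.get(0);
    let head = head.unwrap_or(&0);

    // Conversions between the two
    let as_option: Option<u32> = input.parse::<u32>().ok();
    let as_result: Result<u32, &str> = as_option.ok_or("not a number");

    println!("{} {} {:?} {:?}", number, head, as_option, as_result);
}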
|
||
|
||
Return Results
|
||
|
||
Return a Result:
|
||
|
||
fn get_guess() -> Result<u32, std::num::ParseIntError> {
|
||
let mut guess = String::new();
|
||
io::stdin()
|
||
.read_line(&mut guess)
|
||
.expect("Failed to read line");
|
||
guess.trim().parse()
|
||
}
|
||
|
||
Alternatively, you can return an opaque error:
|
||
|
||
fn get_guess() -> Result<u32, Box<dyn std::error::Error>> {
|
||
let mut guess = String::new();
|
||
match io::stdin().read_line(&mut guess) {
|
||
Ok(_) => {}
|
||
Err(e) => return Err(Box::new(e)),
|
||
}
|
||
match guess.trim().parse() {
|
||
Ok(r) => Ok(r),
|
||
Err(e) => return Err(Box::new(e)),
|
||
}
|
||
}
|
||
|
||
What ?
|
||
|
||
fn get_guess() -> Result<u32, Box<dyn std::error::Error>> {
|
||
let mut guess = String::new();
|
||
io::stdin().read_line(&mut guess)?;
|
||
|
||
Ok(guess.trim().parse()?)
|
||
}
|
||
|
||
- Leverages the From trait. In our case, the error is automatically boxed
into a Box<dyn std::error::Error>.
|
||
|
||
- Older code used try!(..), which does the same. It is no longer
recommended, as it is more verbose and less “chainable”.
|
||
|
||
Mapping Errors
|
||
|
||
- Imagine Result and Option as lists with either 0 or 1 element.
|
||
|
||
- map and map_err allow to transform one of the variants, while
|
||
keeping the other.
|
||
|
||
- Example: Transformation into custom errors (e.g. in a library).
|
||
|
||
let threshold : f64 = threshold.parse().map_err(|_| {
|
||
MarvinRsError::ParsingError(format!("Could not parse threshold: {}", threshold))
|
||
})?;
|
||
|
||
Use Your Results for Great Good
|
||
|
||
We lied to you a little bit before. As in C, Rust allows you to
|
||
accidentally ignore an error, if the function returns Result<(), E>
|
||
(i.e. no result is returned that you would consume). However, Result in
|
||
Rust is #[must_use], so by default rustc warns you in this case:
|
||
|
||
Compiling readfile v0.1.0 (file:///.../readfile)
|
||
warning: unused `std::result::Result` which must be used
|
||
--> src/main.rs:8:5
|
||
|
|
||
8 | file.read_to_string(&mut content);
|
||
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||
= note: #[warn(unused_must_use)] on by default
|
||
|
||
And you can do even better with this at the top-level module in all your
|
||
crates:
|
||
|
||
#![deny(unused_results)]
|
||
|
||
Now every unused result hinders successful compilation!
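
A minimal sketch of what this looks like in practice (assuming a Cargo.toml exists next to the binary):

#![deny(unused_results)] // at the top of main.rs / lib.rs

use std::io::Read;

fn main() -> std::io::Result<()> {
    let mut file = std::fs::File::open("Cargo.toml")?;
    let mut content = String::new();

    // file.read_to_string(&mut content)?;           // error: unused result (the byte count)
    let _bytes = file.read_to_string(&mut content)?;  // ok: result explicitly bound

    println!("{}", content.len());
    Ok(())
}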
|
||
|
||
Custom Errors
|
||
|
||
If you are writing a fancy lib crate, here is how you can implement your
|
||
custom error:
|
||
|
||
use std::{error::Error, fmt::Display};

#[derive(Debug)]
|
||
enum CustomError {
|
||
Io,
|
||
Parsing,
|
||
}
|
||
|
||
impl Error for CustomError {}
|
||
|
||
impl Display for CustomError {
|
||
...
|
||
}
|
||
|
||
impl From<std::io::Error> for CustomError {
|
||
fn from(_: std::io::Error) -> Self {
|
||
Self::Io {}
|
||
}
|
||
}
|
||
|
||
This is lots of work… and we see later how to save effort here.
|
||
|
||
Rust’s error handling is cool
|
||
|
||
As usual with Rust, it forces you to be explicit and say what you want!
|
||
For instance, you are forced to clearly separate panics from recoverable
|
||
errors. In consequence, your programs fail fast, loud, and very close to
|
||
the fault:
|
||
|
||
- loud: you cannot easily ignore errors (compare C)
|
||
- fast: a panic immediately halts your program
|
||
- close: usually, there is no need to search long for the root cause
|
||
|
||
The expect(...) function is a good way to document the programmer’s
|
||
assumption. Furthermore, function signatures make failure possibility
|
||
explicit. Finally, the compiler enforces error handling, as there is no
|
||
way to access the inner value of Result without handling.
|
||
|
||
But Rust error handling is also tedious, as…
|
||
|
||
- the compiler tells you all possible ways in which your program can
fail (even impossible ones that it simply can’t check statically)
|
||
|
||
- Rust focusses on the sad path through your program instead of the
happy path (is rustc a relative of Marvin?)
|
||
|
||
Prototyping Tip: Use expect a lot or cheat your way to success with
|
||
unwrap.
|
||
|
||
Summary
|
||
|
||
What did you learn?
|
||
|
||
- std Error Handling
|
||
- panic! for things that should never ever happen (and in which case
crashing is safe)
|
||
- Result for things that should work
|
||
- Option for things that could work
|
||
- 3rd party error handling
|
||
- anyhow if you don’t care too much and talk to a user
|
||
- thiserror if you care and talk to other software components
|
||
|
||
Where can you learn more?
|
||
|
||
- Rust Book: Ch. 09
|
||
- Programming Rust: Ch. 7
|
||
- Rust in Action: Ch. 8.5
|
||
- Rust for Rustaceans: Ch. 5
|
||
- Embedded Software Development for Safety-Critical Systems: Ch. 8, 9
|
||
- Nick Cameron’s Error Docs
|
||
|
||
Third-Party Error Handling Crates
|
||
|
||
WARNING: Ecosystem under heavy construction work
|
||
|
||
Consider for example the following online resources:
|
||
|
||
- Aug 20th 2020: RustConf, Jane Lusby on Error Handling
|
||
- Sep 18th 2020: Announcing the Error Handling WG
|
||
|
||
However, we recommend two crates that help you handle and report errors.
|
||
|
||
thiserror
|
||
|
||
Use if you care about the exact error, because
|
||
|
||
- you write a library that you provide to others
|
||
- your code is communicating with other pieces of software, that might
|
||
be able to recover
|
||
|
||
[dependencies]
|
||
thiserror = "1.0.20"
|
||
|
||
use thiserror::Error;
|
||
|
||
#[non_exhaustive]
|
||
#[derive(Error, Debug)]
|
||
enum CustomError {
|
||
#[error("IO")]
|
||
Io(#[from] std::io::Error),
|
||
#[error("Parsing : {0:#?}")]
|
||
Parsing(#[from] std::num::ParseIntError),
|
||
}
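
With the #[from] attributes above, the ? operator converts the std errors into CustomError automatically; a small sketch (hypothetical function, continuing the snippet above):

fn read_number(path: &str) -> Result<u32, CustomError> {
    let text = std::fs::read_to_string(path)?; // std::io::Error -> CustomError::Io via #[from]
    Ok(text.trim().parse()?)                   // ParseIntError -> CustomError::Parsing via #[from]
}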
|
||
|
||
anyhow
|
||
|
||
Use if you don’t care too much about the exact error, because
|
||
|
||
- you present it to a user and not another piece of software
|
||
- you have custom error handling / reporting mechanism
|
||
|
||
This is forked by eyre to include error reporting with backtraces.
|
||
color-eyre encapsulates eyre and improves the visual representation.
|
||
|
||
[dependencies]
|
||
color-eyre = "0.5"
|
||
|
||
use color_eyre::eyre::Result;
|
||
|
||
fn main() -> Result<()> {
|
||
color_eyre::install()?;
|
||
|
||
// ...
|
||
Ok(())
|
||
}
|
||
|
||
Other crates you might encounter
|
||
|
||
- quick-error + error-chain: The old guard (may be encountered in
|
||
older code)
|
||
|
||
- failure: Precursor to thiserror.
|
||
|
||
- fehler: Pitching #[throws] syntax and implicit Ok-wrapping.
|
||
|
||
- snafu: Similar to thiserror.
|
||
|
||
W06: Work Sheet
|
||
|
||
- Do the Rustlings exercises error_handling.
|
||
|
||
- Consider the “Refactor to Order” task on W05. There are several
|
||
instances of errors being expect-ed. Your task is to
|
||
|
||
- introduce an error enumeration using thiserror,
|
||
- change the main and refactored functions to return Results. For
|
||
the main function, use the color_eyre Result as you only report
|
||
errors; other functions should have your custom error type as
|
||
the error variant of Result,
|
||
- replace all calls to expect with appropriate calls to ?; use
|
||
map_err if you need to convert a std error to your custom error
|
||
type, and
|
||
- validate the created implementation by intentionally introducing
|
||
faults that lead to errors.
|
||
|
||
Let’s Work with Errors
|
||
|
||
In the following video, we use the Guessing Game from Rust Book Chapter
|
||
2 as a basis and introduce more elaborate error handling and reporting:
|
||
|
||
At the end of the video, Andreas forgot to add
|
||
|
||
color_eyre::install()?
|
||
|
||
to the beginning of main. If you do so, the output is also colorful:
|
||
|
||
[Color Eyre Output]
|
||
|
||
What Can Go Wrong?
|
||
|
||
Before you get started, you think about what can go wrong in larger
|
||
software systems and come up with the following answers:
|
||
|
||
- Programming Faults (e.g., bugs, errors in specification, …)

- System Errors (e.g., can’t open file)

- User Errors (e.g., providing wrong input… intentionally?)
|
||
|
||
Remember our considerations about faults, errors, and failures in U01.
|
||
The major focus of this unit is going to be on
|
||
|
||
- fault prevention (some concepts we learn help us avoid introducing
faults) and

- fault tolerance (by handling or reporting errors that are caused by a
fault)
|
||
|
||
to create more dependable systems.
|
||
|
||
When an error occurs, how can this be handled?
|
||
|
||
- Stop the program immediately
|
||
- Attempt to recover from the situation by…
|
||
- Repeating
|
||
- Doing something else
|
||
- Resorting to a well-known default
|
||
- Pass it up the responsibility chain
|
||
- Notify the user
|
||
|
||
C Way to “Exception Handling”
|
||
|
||
Before we look into how error / exception handling is done in Rust, we
|
||
have a look at how the C language handles this:
|
||
|
||
struct sockaddr_in address;
|
||
int sockfd = socket(AF_INET, SOCK_STREAM, 0);
|
||
// everything alright?
|
||
address.sin_family = AF_INET;
|
||
address.sin_addr.s_addr = INADDR_ANY;
|
||
address.sin_port = htons( PORT );
|
||
bind(sockfd, (struct sockaddr *) &address, sizeof(address));
|
||
// still?
|
||
|
||
In this snippet, there are multiple places where we can fail, e.g.
|
||
|
||
- socket (might fail due to missing permissions, lack of unused file
|
||
descriptors, …) and
|
||
- bind (might fail due to invalid configuration).
|
||
|
||
The code does not show any kind of error handling and this is in fact
|
||
the case: no handling is done and in the erroneous case, the program
|
||
continues… doing potentially harmful things (e.g. binding to a negative
|
||
sockfd).
|
||
|
||
Rust Approach with Result<T,E>
|
||
|
||
In Rust, the approach is quite different. Here, you see a similar
|
||
example (network handling):
|
||
|
||
fn handle_client(stream: TcpStream) { ... }
|
||
|
||
fn main() -> std::io::Result<()> {
|
||
let listener_r : Result<TcpListener, std::io::Error> = TcpListener::bind("127.0.0.1:80");
|
||
let listener : TcpListener = match listener_r {
|
||
Ok(l) => l,
|
||
Err(_) => panic!("Failed to bind"),
|
||
};
|
||
// let listener = TcpListener::bind("127.0.0.1:80");
|
||
for stream in listener.incoming() {
|
||
// defined on TcpListener ^ not on Result
|
||
handle_client(stream?);
|
||
}
|
||
Ok(())
|
||
}
|
||
|
||
Rust makes sure that
|
||
|
||
- you are able to properly implement these different error causes and
|
||
error handling mechanisms.
|
||
- you do it properly — by enforcing error handling.
|
||
|
||
thereby making you create more reliable software.
|
||
|
||
Null Handling
|
||
|
||
A somewhat similar case to the error handling is handling of the NULL
|
||
value you already learned about in U03 — Tony Hoare’s billion-dollar
|
||
mistake.
|
||
|
||
In many older languages (or code written in old versions of them),
|
||
handling is done like in this Java example:
|
||
|
||
public class MainClass {
|
||
static String hundredth_function(DbEntry entry) {
|
||
return entry.name;
|
||
}
|
||
|
||
public static void main(String[] args) {
|
||
// ...
|
||
DbEntry entry = db.get_entry();
|
||
first_function(entry);
|
||
}
|
||
}
|
||
|
||
which might lead to
|
||
|
||
Exception in thread "main" java.lang.NullPointerException: Cannot read field "name" because "<parameter1>"
|
||
is null
|
||
at MainClass.hundredth_function(MainClass.java:6)
|
||
at MainClass.main(MainClass.java:11)
|
||
|
||
In this scenario, you can enjoy tracing back the error to the point
|
||
where it became null but shouldn’t have. The issue here is that null has
|
||
the same static type as an instance of the type used. Hence, checking
|
||
for null must be done manually (and also causes runtime costs).
|
||
|
||
In Rust (and nowadays in modern C#, Kotlin, …) we have the Option type:
|
||
|
||
fn hundredth_function(entry: Entry) -> String {
|
||
entry.name.clone()
|
||
}
|
||
|
||
// ...
|
||
|
||
fn main() {
|
||
let entry : Option<Entry> = db.get_entry(...);
|
||
first_function(entry);
|
||
}
|
||
|
||
Here, to debug a None value, you only have to check where Options are
|
||
passed around. Furthermore, checks for None are enforced before you are
|
||
allowed to access the inside of the Option.
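
As a self-contained sketch (with a made-up lookup function instead of the db placeholder above):

struct Entry {
    name: String,
}

fn lookup(id: u32) -> Option<Entry> {
    if id == 1 { Some(Entry { name: "Ferris".into() }) } else { None }
}

fn main() {
    // The compiler forces us to handle None before touching the inner value:
    match lookup(2) {
        Some(entry) => println!("{}", entry.name),
        None => eprintln!("no entry found"),
    }
}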
|
||
|
||
Note that some of the older languages nowadays have support for an
|
||
Option-like language construct. However, they do not apply this as
|
||
thoroughly as Rust, as legacy code was created without this approach and
|
||
is still around.
|
||
|
||
Application Programming Interfaces (APIs)
|
||
|
||
When you develop software, it always provides a means to interface with
|
||
it. While applications provide, for instance, graphical or terminal user
|
||
interfaces, a software library or framework provides an Application
|
||
Programming Interface. What is important about the latter is that
|
||
applications or other libraries again build on top of the library, hence
|
||
depend on the API. When you are the author of that library, users care
|
||
about the way your API is designed and maintained.
|
||
|
||
API Properties
|
||
|
||
Rust for Rustaceans introduces four properties an ideal API should have:
|
||
unsurprising, flexible, obvious, and constrained. As always with
|
||
properties, they cannot be maximized at the same time, so it is your
|
||
task to find a good balance.
|
||
|
||
Unsurprising
|
||
|
||
There are situations in life where surprises might be appropriate or
|
||
even appreciated[10]. When developing dependable software, that is
|
||
certainly not the case. Surprises come in many forms, but at the core,
|
||
they are expectations that are not met. For instance, a functionality
|
||
having a surprising name (e.g., frobnicate on a list to add an element)
|
||
or functionality not being provided as expected (e.g., as in other
|
||
established solutions).
|
||
|
||
This brings us to the principle of least surprise / law of least
|
||
astonishment, stating that an interface should work in a way it is
|
||
expected by an as-large-as-possible group of users. For our dependable
|
||
Rust code, this means that we:
|
||
|
||
- Follow naming practices: the standard library as well as popular
|
||
third-party crates have their own taxonomy to name behaviour they
|
||
are providing, e.g., iter() methods to produce an iterator of
|
||
elements. If you provide a way to iterate over your data structure
|
||
that method should for sure use iter() as a name. If you do so, make
|
||
sure the behaviour is really consistent with the way how others
|
||
implement iter(), because it might also be surprising if you re-use
|
||
a name for slightly different functionality. Finally, if you work in
|
||
a certain application domain, it is also good advice to use terms
|
||
from this domain as consistently as possible.
|
||
- Implement common traits: the standard library as well as popular
|
||
third-party crates (e.g., serde) provide traits that might be
|
||
interesting for the data structures in your API. This is especially
|
||
important because users of your API cannot retroactively add it to
|
||
the types you are defining (you can only implement traits on types
|
||
from the crate you are developing). Hence, if any of the standard
|
||
traits (e.g., Debug, Clone, or Default) make sense for your
|
||
implementation, add them. In many cases, you might also want to
|
||
allow for equality checks (PartialEq and Eq), ordering (PartialOrd
|
||
and Ord), or hashing (Hash).
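
A short sketch of both points, using hypothetical types (not an API from one of our crates):

// Derive the common traits users will expect on a data type.
#[derive(Debug, Clone, Default, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub struct Measurement {
    pub millimeters: u64,
}

pub struct Series {
    values: Vec<Measurement>,
}

impl Series {
    // `iter()` is the name users expect for iteration, as in the standard library.
    pub fn iter(&self) -> impl Iterator<Item = &Measurement> + '_ {
        self.values.iter()
    }
}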
|
||
|
||
Flexible
|
||
|
||
Additionally, APIs should be flexible so that users have the option to
|
||
use them in as many contexts as possible. This includes avoiding
|
||
unnecessary restrictions that usually come in the form of function
|
||
parameter types. An example restriction would be to only implement a
|
||
function for a String parameter and not for &str or other Rust string
|
||
types. Function return types and values are what our API promises — and
|
||
should be only limited to those that it can keep.
|
||
|
||
A set of examples are the following function signatures that implement
|
||
different contracts (have different restrictions and promises):
|
||
|
||
fn frobnicate1(s: String) -> String
|
||
fn frobnicate2(s: &str) -> Cow<'_, str>
|
||
fn frobnicate3(s: impl AsRef<str>) -> impl AsRef<str>
|
||
|
||
All have in common that they take and return string types. For the
|
||
first, the caller must own the String and move it into the function,
|
||
which in turn returns another owned String. Making this function
|
||
allocation-free is not possible in a backwards-compatible way. For the
|
||
second, the caller is not required to own the string, but only needs a
|
||
reference (if they own it, they must convert it to &str). Returning a
|
||
Cow (copy-on-write) means it could be a reference or owned variant.
|
||
Changing this later is also not backwards-compatible. For the third, we
|
||
have very low restrictions as it only specifies that something that can
|
||
be converted to a string reference is passed in and returned by the
|
||
function.
|
||
|
||
Note that there is no better or worse API, but it depends on what you
|
||
want to achieve today and how you expect this API to change in the
|
||
future. Deciding whether parameters must be owned or borrowed is one of
|
||
the most common API decisions you have to make.
|
||
|
||
Obvious
|
||
|
||
An obvious API makes it as easy as possible for users to understand the
|
||
interface and as hard as possible for them to use it incorrectly. This
|
||
can be achieved by two means:
|
||
|
||
First, by elaborate documentation. This includes special sections on
|
||
panics (i.e., where the API could be used inappropriately, stopping
|
||
everything), errors (i.e., where inappropriate usages can be handled),
|
||
and safety aspects (i.e., invariants that must be upheld when working
|
||
with unsafe interfaces). Ideally, the documentation also contains
|
||
end-to-end examples, showcasing how to use the API.
|
||
|
||
Second, the type system helps to encode how the API should be used.
|
||
Having dedicated types, using traits for shared functionality, etc. help
|
||
to make the interface obvious, self-documenting (no additional text is
|
||
needed), and misuse-resistant (type mismatches are caused by
|
||
inappropriate usage of types). One example for the latter is semantic
|
||
typing that you have already seen in U04, where we used enums to
|
||
properly name boolean variants or newtype structs.
|
||
|
||
Constrained
|
||
|
||
Finally, it is a common truth that at some point in time, every piece of
|
||
your API (everything that is public) will be used by someone and changes
|
||
to these elements become backwards-incompatible.
|
||
|
||
For our dependable Rust code this means we should:
|
||
|
||
- Be careful with public fields. If all fields of a struct are public,
|
||
the struct can be created using the StructName { ... } syntax. If we
|
||
later want to add or remove a field from the struct, this breaks all
|
||
usages. Instead, it is advised to either a) not use public fields
|
||
at all or b) declare #[non_exhaustive] on the struct, to prohibit
|
||
the use of said construction mechanism.
|
||
- When re-exporting types from other libraries, the newtype pattern
|
||
should be applied and methods should be provided on the newtype.
|
||
Thereby, we promise less and changes to the inner type can be hidden
|
||
from the outside.
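
A sketch of both techniques with hypothetical types (chrono is only used as an example dependency here):

// a) #[non_exhaustive]: users cannot construct or exhaustively match the struct,
//    so fields can be added later without breaking them.
#[non_exhaustive]
pub struct Config {
    pub verbose: bool,
}

// b) Newtype instead of re-exporting a third-party type: we only promise
//    the methods we expose ourselves.
pub struct Timestamp(chrono::DateTime<chrono::Utc>);

impl Timestamp {
    pub fn now() -> Self {
        Timestamp(chrono::Utc::now())
    }

    pub fn to_rfc3339(&self) -> String {
        self.0.to_rfc3339()
    }
}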
|
||
|
||
(Semantic) Versioning
|
||
|
||
Though there are plenty of ways to identify versions of software, the
|
||
Semantic Versioning (SemVer) is one of the most common approaches to
|
||
this. These version numbers most of the time consist of three parts:
|
||
MAJOR.MINOR.PATCH (e.g., 3.1.4). Sometimes, additional labels are added
|
||
to indicate pre-release versions or build metadata (e.g., 3.1.4-alpha or
|
||
3.1.4-b68177). SemVer forces you to increment the:
|
||
|
||
1. MAJOR version when you make incompatible API changes (aka breaking
|
||
changes),
|
||
2. MINOR version when you add functionality in a backwards-compatible
|
||
manner, and
|
||
3. PATCH version when you make backwards-compatible bug fixes.
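
For example, starting from version 3.1.4: a backwards-compatible bug fix
leads to 3.1.5, new backwards-compatible functionality leads to 3.2.0, and
a breaking change (such as removing a public function) leads to 4.0.0.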
|
||
|
||
Using conventional commits we covered before, we add a BREAKING CHANGE:
|
||
footer to the respective commit message (e.g., like this). Afterwards,
|
||
an increment in the MAJOR version is required.
|
||
|
||
A special case of semantic versioning is Calendar Versioning (CalVer).
|
||
Many projects out there use a date-based version (e.g. using the release
|
||
year as the major version). CalVer is an attempt to standardize date-
|
||
(or better calendar-)based version schemes. A popular example is the
|
||
Ubuntu Linux operating system that uses the scheme YY.MM. Ubuntu is
|
||
released twice a year (in April and October), so that this year’s
|
||
releases would be 22.04 and 22.10.
|
||
|
||
cargo-semver-checks
|
||
|
||
Cargo packages are built with the SemVer approach in mind. Hence, when
|
||
you are providing a library crate with an API, you should ensure that
|
||
your package versioning policy follows SemVer. The Cargo Book has a
|
||
chapter on SemVer Compatibility, outlining how modifications of your API
|
||
should be reflected in the version. This is in plain English, and to be
|
||
honest, it is very easy to modify your code and forget about its impact
|
||
on the API. Therefore, the community has created cargo-semver-checks to
|
||
automate the process—allowing CI release checks as well. Eventually, it
|
||
is planned that this plugin becomes part of cargo itself.
|
||
|
||
Assume you have the following lib.rs:
|
||
|
||
pub fn get_blacklist() -> Vec<&'static str> {
|
||
vec![
|
||
"8.8.8.8"
|
||
]
|
||
}
|
||
|
||
published using the following Cargo.toml
|
||
|
||
[package]
|
||
name = "foss-rs"
|
||
version = "1.0.0"
|
||
edition = "2021"
|
||
|
||
Following the general trend to avoid exclusionary language, we want to
|
||
provide a denylist in the future. After changing the function name, we
|
||
run cargo semver-checks check-release --baseline-rev f7e8a5 (using a Git
|
||
revision as an example). This yields
|
||
|
||
Cloning f7e8a5
|
||
Parsing foss-rs v0.1.0 (current)
|
||
Parsing foss-rs v0.1.0 (baseline)
|
||
Checking foss-rs v0.1.0 -> v0.1.0 (no change)
|
||
Completed [ 0.063s] 22 checks; 21 passed, 1 failed, 0 unnecessary
|
||
|
||
--- failure function_missing: pub fn removed or renamed ---
|
||
|
||
Description:
|
||
A publicly-visible function cannot be imported by its prior path. A `pub use` may have been removed, or the function itself may have been renamed or removed entirely.
|
||
ref: https://doc.rust-lang.org/cargo/reference/semver.html#item-remove
|
||
impl: https://github.com/obi1kenobi/cargo-semver-check/tree/v0.14.0/src/queries/function_missing.ron
|
||
|
||
Failed in:
|
||
function foss_rs::get_blacklist, previously in file src/lib.rs:5
|
||
Final [ 0.064s] semver requires new major version: 1 major and 0 minor checks failed
|
||
|
||
Only after changing the version to 2.0.0 does the check pass successfully.
|
||
|
||
Documentation
|
||
|
||
Let’s start from your own experience… have you ever attempted to use a
|
||
third-party library for a programming project of yours? Was it
|
||
documented at all? If yes, how good, extensive, and up-to-date was the
|
||
documentation?
|
||
|
||
As you hopefully realized, documentation is important for developing
|
||
software. And as software engineers, in many cases, it is our task to
|
||
write the docs. The good news is, however, that in many situations,
|
||
documentation can easily be written alongside code, so that the
|
||
development workflow does not need to change.
|
||
|
||
But what is documentation exactly?
|
||
|
||
When we refer to “documentation,” we’re talking about every
|
||
supplemental text that an engineer needs to write to do their job: not
|
||
only standalone documents, but code comments as well. - Software
|
||
Engineering at Google
|
||
|
||
With documentation, we answer, for instance, the following questions
|
||
from the “SWE at Google” book:
|
||
|
||
- Why were the design decisions made?
|
||
- Why did we implement code in this manner?
|
||
- Why did you implement this code in this manner, if you’re looking at
|
||
your own code two years later?
|
||
|
||
Despite being able to answer these questions and keep the software
|
||
maintainable, documentation is often seen as a burden not paying
|
||
immediate returns. We at DSys want to make it clear to you that we do
|
||
not believe in this mindset, but rather value good documentation. By the
|
||
way, this is also the case for the larger Rust ecosystem, where most
|
||
crates come at least with a minimal set of helpful documentation and
|
||
many come with extensive API documentation and handbook-style usage
|
||
references. Here are a couple of incentives for documentation:
|
||
|
||
- Writing the docs for an API helps to make it consistent and
|
||
sensible. When you struggle documenting it, most likely it is not
|
||
yet fit for use by others.
|
||
- Writing the docs helps when maintaining the code and getting into
|
||
the mindset you had when you wrote it.
|
||
- Writing the docs improves the look of the code with respect to
|
||
professionalism. If you were to pick between two third-party
|
||
libraries with similar functionality, you would for sure pick the
|
||
one with the better docs first.
|
||
- Writing the docs reduces the number of questions you get. When
|
||
explaining things multiple times, the time would have been better
|
||
spent on writing a good doc once.
|
||
|
||
But what is good documentation? Here are three attributes that good
|
||
documentation fulfils:
|
||
|
||
- Complete - everything is documented
|
||
- Accurate - everything documented is correct and precise
|
||
- Clear - everything documented is straightforward to understand
|
||
|
||
Typically, you don’t find all three at the same time, as they tend to
|
||
contradict each other (e.g. high accuracy impedes clarity, while
|
||
completeness reduces clarity). Hence, it makes sense to think about
|
||
which of the three the document should achieve for its purpose and stick
|
||
to that.
|
||
|
||
Documentation appears in different formats:
|
||
|
||
- Reference documentation (e.g. code comments)
|
||
- Design documents
|
||
- Tutorials
|
||
- Conceptual documentation
|
||
- Landing pages
|
||
|
||
For the remainder of this section, we focus on how to do code comments
|
||
in Rust and talk about one approach to provide landing pages or
|
||
conceptual docs with GitLab.
|
||
|
||
Code comments usually come in one of two forms: 1) API comments or 2)
|
||
implementation comments. The former are directed at users of the API,
|
||
while the latter are directed at implementers. Hence, they serve
|
||
different purposes and cater to different audiences.
|
||
|
||
Rust Documentation
|
||
|
||
In Rust, you can access documentation like this:
|
||
|
||
- rustup doc: local, offline documentation of Rust

- cargo doc: local, offline documentation of the current crate
|
||
|
||
Documentation is fully searchable and elements are color-coded:
|
||
|
||
- Primitive Type
|
||
- Type
|
||
- Struct
|
||
- Function
|
||
- Enum
|
||
- Trait
|
||
- Macro
|
||
- Module
|
||
|
||
Writing Documentation in .rs Files
|
||
|
||
Now, what can you document with comments in your Rust code files?
|
||
|
||
- Files - storing related functionality.
|
||
- Data structures - storing related data.
|
||
- Functions - implementing functionality.
|
||
|
||
All these language elements mentioned above can be annotated with
|
||
documentation. You can use //! for documentation from within an element
|
||
(e.g., a module) and /// for what follows (e.g., a function).
|
||
|
||
In lib.rs, you can for example do the following:
|
||
|
||
//! `fcapp` - The Fancy CLI App <--- Docs for the lib (module)
|
||
|
||
/// Generates a random number between 1 and 100 <--- Docs for `random` (function)
|
||
pub fn random() -> usize {
|
||
...
|
||
}
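
Doc comments can also carry the sections mentioned earlier (examples, panics, errors); code inside an # Examples section is compiled and run as a doctest by cargo test. A sketch extending the function above (the body is just a stub):

/// Generates a random number between 1 and 100.
///
/// # Examples
///
/// ```
/// let n = fcapp::random();
/// assert!((1..=100).contains(&n));
/// ```
///
/// # Panics
///
/// Never panics.
pub fn random() -> usize {
    // ...
    42
}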
|
||
|
||
In the Rust ecosystem, crates are hosted on crates.io and the
|
||
documentation is uploaded to docs.rs. When you publish your crate, make
|
||
sure that both locations are used to create minimal surprise for
|
||
potential users of your code.
|
||
|
||
GitLab Pages
|
||
|
||
Now assume for a moment that you either host the documents for an
|
||
internal project (so docs.rs is no option) or you want to create a
|
||
static webpage for your code repository. GitLab has you covered by
|
||
GitLab Pages, allowing you to build and serve the webpage. You need to
|
||
create the following job in the .gitlab-ci.yml:
|
||
|
||
# the 'pages' job will deploy and build your site to the 'public' path
|
||
pages:
|
||
stage: deploy
|
||
script:
|
||
- cargo doc --lib --no-deps
|
||
- cp -R ./target/doc public
|
||
artifacts:
|
||
paths:
|
||
- public
|
||
expire_in: 1 week
|
||
only:
|
||
- main
|
||
|
||
Note that any file put into /public is served later by GitLab.
|
||
|
||
Handbooks with mdbook
|
||
|
||
The RTIC framework, for instance, uses a handbook as its landing page,
|
||
also providing tutorials and conceptual documentation. The authors use
|
||
mdbook, the tool that is also behind the Rust book and this coursebook.
|
||
Here is how to configure it using a book.toml:
|
||
|
||
[book]
|
||
authors = ["Ferris"]
|
||
language = "en"
|
||
multilingual = false
|
||
src = "docs"
|
||
title = "ferris-rs"
|
||
|
||
[build]
|
||
build-dir = "public"
|
||
|
||
You can leverage GitLab pages in a similar way, by making sure mdbook
|
||
exports to the public folder.
|
||
|
||
Changelogs with git-cliff
|
||
|
||
Finally, we want to touch on another form of document: changelogs. When
|
||
software systems evolve over time and have a certain userbase, it is
|
||
common to document (at least breaking) changes. Source control such as
|
||
Git makes it easy to create these data points: a commit with a
|
||
succinct message can convey the meaning. You can even go and apply
|
||
conventional commits, a popular form of structuring your commit
|
||
messages.
|
||
|
||
When you do so, git-cliff[11] helps you to build a changelog. Here is an
|
||
example cliff.toml file where you get sections per version of your repo
|
||
and subsections per type of change.
|
||
|
||
[changelog]
|
||
header = """
|
||
# Changelog\n
|
||
"""
|
||
body = """
|
||
{% if version %}\
|
||
## [{{ version | replace(from="v", to="") }}] - {{ timestamp | date(format="%Y-%m-%d") }}
|
||
{% else %}\
|
||
## [unreleased]
|
||
{% endif %}\
|
||
{% for group, commits in commits | group_by(attribute="group") %}
|
||
### {{ group | upper_first }}
|
||
{% for commit in commits %}
|
||
- {{ commit.message | upper_first }}\
|
||
{% endfor %}
|
||
{% endfor %}\n
|
||
"""
|
||
trim = true
|
||
footer = "<!-- generated by git-cliff -->"
|
||
|
||
[git]
|
||
conventional_commits = true
|
||
commit_parsers = [
|
||
{ message = "^bump*", group = "Version Updates"},
|
||
{ message = "^chore*", group = "Miscellaneous Tasks"},
|
||
{ message = "^ci*", group = "Continuous Integration"},
|
||
{ message = "^deps*", group = "Dependencies"},
|
||
{ message = "^feat*", group = "Features"},
|
||
{ message = "^fix*", group = "Bug Fixes"},
|
||
{ message = "^doc*", group = "Documentation"},
|
||
{ message = "^perf*", group = "Performance"},
|
||
{ message = "^refactor*", group = "Refactor"},
|
||
{ message = "^style*", group = "Styling"},
|
||
{ message = "^test*", group = "Testing"},
|
||
]
|
||
filter_commits = false
|
||
tag_pattern = "v[0-9]*"
|
||
|
||
With the following command, you can generate the CHANGELOG.md for your
|
||
project:
|
||
|
||
git cliff --output CHANGELOG.md
|
||
|
||
U07: Usable Software
|
||
|
||
Alright, it’s now been quite some time since you started at DSys and one
|
||
of the projects requires you to build a new software library from
|
||
scratch. As it is clear that this library will be used by other parties
|
||
as well, you have to take special care to make it usable and
|
||
maintainable (an aspect of dependability we covered in U01). In this
|
||
unit, we discuss in which ways we can improve this dimension:
|
||
|
||
- We talk about how writing documentation for your software is
|
||
essential and makes your software more usable. This not only
|
||
includes code comments, but also other pieces of information and
|
||
tools to generate & host this information.
|
||
- We have a look at APIs — which should be carefully designed and
|
||
maintained.
|
||
- We introduce supply chains & provenance as important topics in the
|
||
sharing of software for dependable systems.
|
||
|
||
Supply Chains & Provenance
|
||
|
||
The authors of this book are no lawyers. This section is attempting to
|
||
make software supply chains more clearly defined and this includes
|
||
copyright and license information. As such, use the presented tools to
|
||
improve your software metainformation. To make sure everything you
|
||
reuse / publish is legal, however, consult your favourite lawyer.
|
||
|
||
Supply chains describe how organizations, people, processes, etc.
|
||
contribute to supplying a product or service. When we talk about
|
||
Software Supply Chains, we are often interested in how the software is
|
||
composed out of parts. Each part has a Provenance, i.e. details on where
|
||
it comes from and under which conditions it has been developed. Similar
|
||
to a bill-of-material (BOM) in industrial manufacturing, Software Bills
|
||
of Material are getting increasingly relevant. While this is currently
|
||
strongly used in the US due to the Biden Executive Order from May 2021
|
||
making this mandatory for delivering software to federal organizations,
|
||
we can expect that similar regulations will emerge in Europe.
|
||
|
||
Even though SBOMs themselves do not make the system more dependable,
they help in making its development more dependable, as we get
|
||
transparency and traceability of the composition of software. This is
|
||
particularly true with respect to the security dimension of
|
||
dependability: knowing about a vulnerability in a specific software
|
||
version allows to trace it to software that depends on it. A common
|
||
issue today is that a) building software from scratch (and in-house) is
|
||
more and more infeasible due to the increasing complexity of systems,
|
||
and b) leveraging third-party software brings a large body of
|
||
functionality in that must be scrutinized. Hence, we must accept the
|
||
fact that sharing of software must become more dependable, i.e. the
|
||
correct, security-preserving and legal usage of third-party
|
||
software must become more feasible.
|
||
|
||
A central information standard in this area is the Software Package Data
|
||
Exchange (SPDX). Beside licensing information (and a list of common
|
||
licenses), the SPDX specification allows to annotate files, store
|
||
checksums, and more. Other standards such as CycloneDX or SWID exist, but
|
||
we focus here on SPDX.
|
||
|
||
In the following, we assume that DSys wants to release the foss-rs crate
|
||
as Free Open Source (FOSS), making sure it is properly licensed and this
|
||
license is also clearly communicated.
|
||
|
||
REUSE Compliance Framework
|
||
|
||
<img src="https://reuse.software/img/reuse.png" width="15%" />
|
||
|
||
The purpose of REUSE is to clearly state the copyright and license of
|
||
any asset in your project. To this end, it offers ways to annotate any file
|
||
with copyright via SPDX-FileCopyrightText information and license via
|
||
SPDX-License-Identifier. There are three ways:
|
||
|
||
- Comments, if the considered file format is textual and allows for
|
||
comments. In Rust files, for instance, we can have
|
||
// SPDX-FileCopyrightText: 2022 Ferris at the beginning of the file.
|
||
- .license files if either a) the file format does not support text
|
||
comments or b) you do not want to store it there. In this case, a
|
||
file with the same name plus a .license suffix can be stored there
|
||
and includes the SPDX-FileCopyrightText: 2022 Ferris header without
|
||
comment markings.
|
||
- dep5 is intended for large directories, where adding copyright to
|
||
all files is not doable. This approach supports file glob patterns,
|
||
e.g. *.rs to apply the information to all Rust source code files.
|
||
|
||
REUSE also provides a linter for checking compliance, the reuse-tool.
|
||
reuse uses your VCS (Version Control System), which means that it also
|
||
respects, for instance, .gitignore, and scans all files for appropriate
|
||
information. The easiest way to run it is using a Docker container:
|
||
|
||
docker run --rm --volume $(pwd):/data fsfe/reuse lint
|
||
|
||
Initially, our project does not comply. We can change this by adding
|
||
headers to the individual files:
|
||
|
||
reuse addheader --copyright "Ferris" --license="MIT" src/lib.rs
|
||
|
||
After that, the text files look like this:
|
||
|
||
// SPDX-FileCopyrightText: 2021 Ferris
|
||
//
|
||
// SPDX-License-Identifier: MIT
|
||
|
||
...
|
||
|
||
When we have done this for all files, we can
|
||
|
||
reuse download --all
|
||
|
||
to make sure all the licenses are downloaded as text and stored in
|
||
LICENSES.
|
||
|
||
Finally with
|
||
|
||
reuse lint
|
||
|
||
we can confirm that our system is compliant:
|
||
|
||
# SUMMARY
|
||
|
||
* Bad licenses:
|
||
* Deprecated licenses:
|
||
* Licenses without file extension:
|
||
* Missing licenses:
|
||
* Unused licenses:
|
||
* Used licenses: CC0-1.0, MIT
|
||
* Read errors: 0
|
||
* Files with copyright information: 4 / 4
|
||
* Files with license information: 4 / 4
|
||
|
||
Congratulations! Your project is compliant with version 3.0 of the REUSE Specification :-)
|
||
|
||
Now if we want to make sure that any contribution to our repository is
|
||
REUSE compliant, we can add a CI job like this:
|
||
|
||
reuse:
|
||
image:
|
||
name: fsfe/reuse:latest
|
||
entrypoint: [""]
|
||
script:
|
||
- reuse lint
|
||
|
||
We can also produce a SPDX SBOM:
|
||
|
||
SPDXVersion: SPDX-2.1
|
||
DataLicense: CC0-1.0
|
||
SPDXID: SPDXRef-DOCUMENT
|
||
DocumentName: data
|
||
DocumentNamespace: http://spdx.org/spdxdocs/spdx-v2.1-3600566a-fa94-47b5-8efa-9059fc4e2d26
|
||
Creator: Person: Anonymous ()
|
||
Creator: Organization: Anonymous ()
|
||
Creator: Tool: reuse-0.13.0
|
||
Created: 2021-12-03T15:53:52Z
|
||
CreatorComment: <text>This document was created automatically using available reuse information consistent with REUSE.</text>
|
||
Relationship: SPDXRef-DOCUMENT describes SPDXRef-8540736e946d41cc9583084c3e2d52b9
|
||
Relationship: SPDXRef-DOCUMENT describes SPDXRef-20c74af6a1a744e3937396ceb3650119
|
||
Relationship: SPDXRef-DOCUMENT describes SPDXRef-b4bd5775f2f58809bef6b0e1ccf3ecdb
|
||
Relationship: SPDXRef-DOCUMENT describes SPDXRef-91b555fff6242005192e133969e3a18a
|
||
|
||
FileName: ./.gitignore
|
||
SPDXID: SPDXRef-8540736e946d41cc9583084c3e2d52b9
|
||
FileChecksum: SHA1: 43ca72cab972d025aeaa11d014427c9160f4031f
|
||
LicenseConcluded: NOASSERTION
|
||
LicenseInfoInFile: CC0-1.0
|
||
FileCopyrightText: <text>SPDX-FileCopyrightText: 2021 Ferris</text>
|
||
|
||
FileName: ./Cargo.lock
|
||
SPDXID: SPDXRef-20c74af6a1a744e3937396ceb3650119
|
||
FileChecksum: SHA1: ff0851f26122894e84fdd71281fde25b4b780bd5
|
||
LicenseConcluded: NOASSERTION
|
||
LicenseInfoInFile: MIT
|
||
FileCopyrightText: <text>SPDX-FileCopyrightText: 2021 Ferris</text>
|
||
|
||
FileName: ./Cargo.toml
|
||
SPDXID: SPDXRef-b4bd5775f2f58809bef6b0e1ccf3ecdb
|
||
FileChecksum: SHA1: aacee43aeb79bf0ce04c6254afdae22f9a909143
|
||
LicenseConcluded: NOASSERTION
|
||
LicenseInfoInFile: MIT
|
||
FileCopyrightText: <text>SPDX-FileCopyrightText: 2021 Ferris</text>
|
||
|
||
FileName: ./src/lib.rs
|
||
SPDXID: SPDXRef-91b555fff6242005192e133969e3a18a
|
||
FileChecksum: SHA1: f6e43e37ec5671f8f1b9995a0491dacf8d5dd1b0
|
||
LicenseConcluded: NOASSERTION
|
||
LicenseInfoInFile: MIT
|
||
FileCopyrightText: <text>SPDX-FileCopyrightText: 2021 Ferris</text>
|
||
|
||
ClearlyDefined
|
||
|
||
<img src="https://clearlydefined.io/static/media/logo.2bf3df78.svg" />
|
||
|
||
clearlydefined.io is an online service that automatically harvests and
|
||
allows curation of project information, with respect to the following
|
||
properties:
|
||
|
||
- Described: where is the source hosted, where can I file bugs, when
|
||
was which version released?
|
||
- Licensed: what licenses have been declared, what do they imply,
|
||
etc.?
|
||
- Secure: have there been vulnerabilities discovered with respect to a
specific project version? (this is mostly under development)
|
||
|
||
In essence, ClearlyDefined provides a database for many potential
|
||
sources (Git Repos, GitHub, PyPI or crates.io packages, …) and serves
|
||
the respective information. In Rust projects, we know all dependencies
|
||
of our software due to the Cargo.lock file. The cargo-clearlydefined
|
||
utility leverages this and queries all dependencies (specific versions)
|
||
for the associated information. The following command produces the table
|
||
below:
|
||
|
||
cargo clearlydefined --approve-osi --exclude=foss-rs --link -o markdown > cd.md
|
||
|
||
Name         Version   Declared license     License   Score
-----------  --------  -------------------  --------  ------
autocfg      1.1.0     Apache-2.0 OR MIT    ✅        [88]
num-traits   0.2.15    MIT OR Apache-2.0    ✅        [53]
typenum      1.15.0    MIT OR Apache-2.0    ✅        [88]
uom          0.33.0    Apache-2.0 OR MIT    ✅        [87]
|
||
|
||
Discussion
|
||
|
||
- --exclude=foss-rs: We exclude the crate itself (we are in the
|
||
process of publishing it, so we won’t get a high enough score right
|
||
away).
|
||
- --approve-osi: We also specify that we want to approve OSI-approved
|
||
licenses.
|
||
- Finally, the ClearlyLicensed score is taken into account. A typical
|
||
threshold value is 75 (e.g. by the Eclipse Foundation), which means
|
||
it is sufficiently defined with respect to licensing
|
||
information (metric specification). With REUSE, we also get high
|
||
ClearlyLicensed scores as they check if all files have a
|
||
discoverable license.
|
||
|
||
tern
|
||
|
||
In the last decade, (Docker) containers have become a common exchange
|
||
format for software (in addition to binaries or virtual machines).
|
||
|
||
The tern README comes with an explanation of how to analyze Docker
containers using tern itself in a Docker container as well. After setup,
|
||
you can do this:
|
||
|
||
docker run --rm ternd report -i debian:buster
|
||
|
||
which returns:
|
||
|
||
This report was generated by the Tern Project
|
||
Version: 2.10.1
|
||
|
||
Docker image: debian:buster:
|
||
Layer 1:
|
||
info: Layer created by commands: /bin/sh -c #(nop) ADD file:1fb366429a5df94c7ba642735d6aa77e201f90e0843de03721a6ad19f80ee4e0 in /
|
||
info: Found 'Debian GNU/Linux 10 (buster)' in /etc/os-release.
|
||
info: Retrieved package metadata using dpkg default method.
|
||
|
||
File licenses found in Layer: None
|
||
Packages found in Layer:
|
||
+------------------------+-------------------------+-----------------------------------------------+------------+
|
||
| Package | Version | License(s) | Pkg Format |
|
||
+------------------------+-------------------------+-----------------------------------------------+------------+
|
||
| adduser | 3.118 | | deb |
|
||
| apt | 1.8.2.3 | GPLv2+ | deb |
|
||
| base-files | 10.3+deb10u13 | | deb |
|
||
| base-passwd | 3.5.46 | GPL-2, PD | deb |
|
||
| bash | 5.0-4 | | deb |
|
||
| bsdutils | 1:2.33.1-0.1 | BSD-4-clause, LGPL-3+, MIT, LGPL-2+, | deb |
|
||
| | | LGPL-2.1+, public-domain, GPL-3+, GPL-2+, | |
|
||
| | | BSD-2-clause, BSD-3-clause, LGPL, GPL-2 | |
|
||
| coreutils | 8.30-3 | | deb |
|
||
| dash | 0.5.10.2-5 | | deb |
|
||
| debconf | 1.5.71+deb10u1 | BSD-2-clause | deb |
|
||
| debian-archive-keyring | 2019.1+deb10u1 | | deb |
|
||
| debianutils | 4.8.6.1 | | deb |
|
||
| diffutils | 1:3.7-3 | | deb |
|
||
| dpkg | 1.19.8 | public-domain-md5, GPL-2+, BSD-2-clause, | deb |
|
||
| | | public-domain-s-s-d, GPL-2 | |
|
||
| e2fsprogs | 1.44.5-1+deb10u3 | | deb |
|
||
| fdisk | 2.33.1-0.1 | BSD-4-clause, LGPL-3+, MIT, LGPL-2+, | deb |
|
||
| | | LGPL-2.1+, public-domain, GPL-3+, GPL-2+, | |
|
||
| | | BSD-2-clause, BSD-3-clause, LGPL, GPL-2 | |
|
||
| findutils | 4.6.0+git+20190209-2 | | deb |
|
||
| gcc-8-base | 8.3.0-6 | | deb |
|
||
| gpgv | 2.2.12-1+deb10u2 | LGPL-3+, LGPL-2.1+, permissive, RFC- | deb |
|
||
| | | Reference, CC0-1.0, GPL-3+, BSD-3-clause, | |
|
||
| | | TinySCHEME, Expat, GPL-3+ or BSD-3-clause | |
|
||
| grep | 3.3-1 | GPL-3+ | deb |
|
||
| gzip | 1.9-3+deb10u1 | | deb |
|
||
| hostname | 3.21 | | deb |
|
||
| init-system-helpers | 1.56+nmu1 | BSD-3-clause, GPL-2+ | deb |
|
||
| iproute2 | 4.20.0-2+deb10u1 | GPL-2 | deb |
|
||
| iputils-ping | 3:20180629-2+deb10u2 | | deb |
|
||
| libacl1 | 2.2.53-4 | LGPL-2+, GPL-2+ | deb |
|
||
| libapt-pkg5.0 | 1.8.2.3 | GPLv2+ | deb |
|
||
| libattr1 | 1:2.4.48-4 | LGPL-2+, GPL-2+ | deb |
|
||
| libaudit-common | 1:2.8.4-3 | LGPL-2.1, GPL-2 | deb |
|
||
| libaudit1 | 1:2.8.4-3 | LGPL-2.1, GPL-2 | deb |
|
||
| libblkid1 | 2.33.1-0.1 | BSD-4-clause, LGPL-3+, MIT, LGPL-2+, | deb |
|
||
| | | LGPL-2.1+, public-domain, GPL-3+, GPL-2+, | |
|
||
| | | BSD-2-clause, BSD-3-clause, LGPL, GPL-2 | |
|
||
| libbz2-1.0 | 1.0.6-9.2~deb10u2 | GPL-2, BSD-variant | deb |
|
||
| libc-bin | 2.28-10+deb10u1 | | deb |
|
||
| libc6 | 2.28-10+deb10u1 | | deb |
|
||
| libcap-ng0 | 0.7.9-2 | | deb |
|
||
| libcap2 | 1:2.25-2 | BSD-3-clause or GPL-2, GPL-2+, BSD-3-clause, | deb |
|
||
| | | BSD-3-clause or GPL-2+, GPL-2 | |
|
||
| libcap2-bin | 1:2.25-2 | BSD-3-clause or GPL-2, GPL-2+, BSD-3-clause, | deb |
|
||
| | | BSD-3-clause or GPL-2+, GPL-2 | |
|
||
| libcom-err2 | 1.44.5-1+deb10u3 | | deb |
|
||
| libdb5.3 | 5.3.28+dfsg1-0.5 | | deb |
|
||
| libdebconfclient0 | 0.249 | | deb |
|
||
| libelf1 | 0.176-1.1 | | deb |
|
||
| libext2fs2 | 1.44.5-1+deb10u3 | | deb |
|
||
| libfdisk1 | 2.33.1-0.1 | BSD-4-clause, LGPL-3+, MIT, LGPL-2+, | deb |
|
||
| | | LGPL-2.1+, public-domain, GPL-3+, GPL-2+, | |
|
||
| | | BSD-2-clause, BSD-3-clause, LGPL, GPL-2 | |
|
||
| libffi6 | 3.2.1-9 | | deb |
|
||
| libgcc1 | 1:8.3.0-6 | | deb |
|
||
| libgcrypt20 | 1.8.4-5+deb10u1 | | deb |
|
||
| libgmp10 | 2:6.1.2+dfsg-4+deb10u1 | | deb |
|
||
| libgnutls30 | 3.6.7-4+deb10u9 | LGPLv3+_or_GPLv2+, GPLv3+, Public domain. | deb |
|
||
| libgpg-error0 | 1.35-1 | LGPL-2.1+, g10-permissive, GPL-3+, | deb |
|
||
| | | BSD-3-clause, LGPL-2.1+ or BSD-3-clause | |
|
||
| libhogweed4 | 3.4.1-1+deb10u1 | other, LGPL-2+, LGPL-2.1+, GPL-2+ with | deb |
|
||
| | | Autoconf exception, public-domain, GPL-2+, | |
|
||
| | | GAP, GPL-2 | |
|
||
| libidn2-0 | 2.0.5-1+deb10u1 | LGPL-3+ or GPL-2+, LGPL-3+, GPL-3+, GPL-2+, | deb |
|
||
| | | Unicode | |
|
||
| liblz4-1 | 1.8.3-1+deb10u1 | BSD-2-clause, GPL-2, GPL-2+ | deb |
|
||
| liblzma5 | 5.2.4-1+deb10u1 | GPL-2, Autoconf, config-h, none, LGPL-2.1+, | deb |
|
||
| | | PD-debian, GPL-2+, PD, noderivs, probably-PD, | |
|
||
| | | permissive-fsf, permissive-nowarranty | |
|
||
| libmnl0 | 1.0.4-2 | LGPL-2.1, GPL-2+ | deb |
|
||
| libmount1 | 2.33.1-0.1 | BSD-4-clause, LGPL-3+, MIT, LGPL-2+, | deb |
|
||
| | | LGPL-2.1+, public-domain, GPL-3+, GPL-2+, | |
|
||
| | | BSD-2-clause, BSD-3-clause, LGPL, GPL-2 | |
|
||
| libncursesw6 | 6.1+20181013-2+deb10u2 | | deb |
|
||
| libnettle6 | 3.4.1-1+deb10u1 | other, LGPL-2+, LGPL-2.1+, GPL-2+ with | deb |
|
||
| | | Autoconf exception, public-domain, GPL-2+, | |
|
||
| | | GAP, GPL-2 | |
|
||
| libp11-kit0 | 0.23.15-2+deb10u1 | ISC, BSD-3-Clause, ISC+IBM, permissive-like- | deb |
|
||
| | | automake-output, same-as-rest-of-p11kit | |
|
||
| libpam-modules | 1.3.1-5 | | deb |
|
||
| libpam-modules-bin | 1.3.1-5 | | deb |
|
||
| libpam-runtime | 1.3.1-5 | | deb |
|
||
| libpam0g | 1.3.1-5 | | deb |
|
||
| libpcre3 | 2:8.39-12 | | deb |
|
||
| libseccomp2 | 2.3.3-4 | LGPL-2.1 | deb |
|
||
| libselinux1 | 2.8-1+b1 | | deb |
|
||
| libsemanage-common | 2.8-2 | | deb |
|
||
| libsemanage1 | 2.8-2 | | deb |
|
||
| libsepol1 | 2.8-1 | | deb |
|
||
| libsmartcols1 | 2.33.1-0.1 | BSD-4-clause, LGPL-3+, MIT, LGPL-2+, | deb |
|
||
| | | LGPL-2.1+, public-domain, GPL-3+, GPL-2+, | |
|
||
| | | BSD-2-clause, BSD-3-clause, LGPL, GPL-2 | |
|
||
| libss2 | 1.44.5-1+deb10u3 | | deb |
|
||
| libstdc++6 | 8.3.0-6 | | deb |
|
||
| libsystemd0 | 241-7~deb10u8 | LGPL-2.1+, CC0-1.0, public-domain, GPL-2+, | deb |
|
||
| | | Expat, GPL-2 | |
|
||
| libtasn1-6 | 4.13-3 | | deb |
|
||
| libtinfo6 | 6.1+20181013-2+deb10u2 | | deb |
|
||
| libudev1 | 241-7~deb10u8 | LGPL-2.1+, CC0-1.0, public-domain, GPL-2+, | deb |
|
||
| | | Expat, GPL-2 | |
|
||
| libunistring2 | 0.9.10-1 | GFDL-1.2+, LGPL-3+, MIT, GPL-3+, GPL-2+, | deb |
|
||
| | | GPL-3+ or GFDL-1.2+, LGPL-3+ or GPL-2+, | |
|
||
| | | FreeSoftware, GPL-2+ with distribution | |
|
||
| | | exception | |
|
||
| libuuid1 | 2.33.1-0.1 | BSD-4-clause, LGPL-3+, MIT, LGPL-2+, | deb |
|
||
| | | LGPL-2.1+, public-domain, GPL-3+, GPL-2+, | |
|
||
| | | BSD-2-clause, BSD-3-clause, LGPL, GPL-2 | |
|
||
| libxtables12 | 1.8.2-4 | custom, GPL-2, Artistic-2, GPL-2+ | deb |
|
||
| libzstd1 | 1.3.8+dfsg-3+deb10u2 | zlib, GPL-2+, BSD-3-clause, Expat, GPL-2, | deb |
|
||
| | | BSD-3-clause and GPL-2 | |
|
||
| login | 1:4.5-1.1 | | deb |
|
||
| mawk | 1.3.3-17+b3 | | deb |
|
||
| mount | 2.33.1-0.1 | BSD-4-clause, LGPL-3+, MIT, LGPL-2+, | deb |
|
||
| | | LGPL-2.1+, public-domain, GPL-3+, GPL-2+, | |
|
||
| | | BSD-2-clause, BSD-3-clause, LGPL, GPL-2 | |
|
||
| ncurses-base | 6.1+20181013-2+deb10u2 | | deb |
|
||
| ncurses-bin | 6.1+20181013-2+deb10u2 | | deb |
|
||
| passwd | 1:4.5-1.1 | | deb |
|
||
| perl-base | 5.28.1-6+deb10u1 | GPL-1+ or Artistic or Artistic-dist, GPL-1+ | deb |
|
||
| | | or Artistic, BSD-3-clause, SDBM-PUBLIC- | |
|
||
| | | DOMAIN, Artistic or GPL-1+ or Artistic-dist, | |
|
||
| | | GPL-1+ or Artistic, and Expat, HSIEH-BSD, | |
|
||
| | | BSD-3-clause-with-weird-numbering, ZLIB, | |
|
||
| | | BSD-3-clause-GENERIC, REGCOMP, and GPL-1+ or | |
|
||
| | | Artistic, GPL-1+ or Artistic, and | |
|
||
| | | BSD-4-clause-POWERDOG, GPL-3+-WITH-BISON- | |
|
||
| | | EXCEPTION, HSIEH-DERIVATIVE, RRA-KEEP-THIS- | |
|
||
| | | NOTICE, TEXT-TABS, GPL-1+ or Artistic, and | |
|
||
| | | BSD-3-clause-GENERIC, LGPL-2.1, Artistic-2, | |
|
||
| | | Unicode, BSD-4-clause-POWERDOG, GPL-1+, DONT- | |
|
||
| | | CHANGE-THE-GPL, CC0-1.0, GPL-1+ or Artistic, | |
|
||
| | | and Unicode, BZIP, REGCOMP, GPL-2+ or | |
|
||
| | | Artistic, GPL-2+, S2P, Artistic-dist, Expat, | |
|
||
| | | Artistic, Expat or GPL-1+ or Artistic | |
|
||
| sed | 4.7-1 | | deb |
|
||
| sysvinit-utils | 2.93-8 | GPL-2+ | deb |
|
||
| tar | 1.30+dfsg-6 | | deb |
|
||
| tzdata | 2021a-0+deb10u7 | | deb |
|
||
| util-linux | 2.33.1-0.1 | BSD-4-clause, LGPL-3+, MIT, LGPL-2+, | deb |
|
||
| | | LGPL-2.1+, public-domain, GPL-3+, GPL-2+, | |
|
||
| | | BSD-2-clause, BSD-3-clause, LGPL, GPL-2 | |
|
||
| zlib1g | 1:1.2.11.dfsg-1+deb10u2 | Zlib | deb |
|
||
+------------------------+-------------------------+-----------------------------------------------+------------+
|
||
=======================================================================================
|
||
|
||
###########################################
|
||
# Summary of licenses found in Container: #
|
||
###########################################
|
||
Public domain., LGPL, Artistic or GPL-1+ or Artistic-dist, ZLIB, LGPL-3+, public-domain-s-s-d, permissive-fsf, GPLv3+, config-h, BSD-variant, BSD-3-clause or GPL-2, public-domain, GPL-1+ or Artistic, and BSD-3-clause-GENERIC, TinySCHEME, BSD-3-clause and GPL-2, CC0-1.0, GPL-1+ or Artistic, and Unicode, BZIP, PD, noderivs, GFDL-1.2+, BSD-4-clause, g10-permissive, LGPL-3+ or GPL-2+, GPL-3+ or BSD-3-clause, GPL-1+ or Artistic, and BSD-4-clause-POWERDOG, HSIEH-DERIVATIVE, RRA-KEEP-THIS-NOTICE, GPL-2+ with distribution exception, MIT, BSD-4-clause-POWERDOG, LGPL-2.1+ or BSD-3-clause, zlib, other, REGCOMP, GAP, Expat, public-domain-md5, GPL-1+ or Artistic, BSD-3-clause, permissive-like-automake-output, BSD-3-clause-with-weird-numbering, probably-PD, Zlib, none, REGCOMP, and GPL-1+ or Artistic, FreeSoftware, ISC+IBM, BSD-3-Clause, GPL-2+ with Autoconf exception, TEXT-TABS, GPL-3+ or GFDL-1.2+, LGPL-2.1, Unicode, GPL-1+, GPL-2+, S2P, SDBM-PUBLIC-DOMAIN, Artistic, GPL-2, PD-debian, LGPL-2.1+, GPL-1+ or Artistic or Artistic-dist, permissive, Expat or GPL-1+ or Artistic, HSIEH-BSD, GPL-1+ or Artistic, and Expat, BSD-3-clause-GENERIC, RFC-Reference, GPLv2+, Autoconf, LGPL-2+, GPL-3+, custom, BSD-2-clause, Artistic-2, permissive-nowarranty, DONT-CHANGE-THE-GPL, LGPLv3+_or_GPLv2+, ISC, GPL-2+ or Artistic, Artistic-dist, BSD-3-clause or GPL-2+, same-as-rest-of-p11kit, GPL-3+-WITH-BISON-EXCEPTION
|
||
|
||
S07: Sample Solution
|
||
|
||
- Continuous Documentation: discussed in class.
|
||
|
||
- Landing Page: discussed in class.
|
||
|
||
- Changelog: discussed in class.
|
||
|
||
- APIs: 2.0.0
|
||
|
||
Summary
|
||
|
||
What did you learn?
|
||
|
||
- Why documentation is essential for dependable and, in particular,
|
||
maintainable software.
|
||
- How to leverage Rust and other tools to generate and publish
|
||
documentation for various purposes.
|
||
- What properties an API should have and how your implementation
|
||
choices have an impact on these.
|
||
- How you can make your project REUSE-able and ClearlyDefined —
|
||
providing software bills of material.
|
||
|
||
Where can you learn more?
|
||
|
||
- Documentation:
|
||
- cargo-doc
|
||
- GitLab Pages
|
||
- Software Engineering at Google: Ch. 10
|
||
- cheats.rs: Documentation
|
||
- Commit Virtual 2021: Use Gitlab to Deliver “Docs-as-Code”
|
||
Technical Documentation
|
||
- APIs:
|
||
- Rust for Rustaceans: Ch. 04
|
||
- Rust API Guidelines
|
||
- Semantic Versioning (SemVer)
|
||
- Semantic Versioning Compatibility
|
||
- “Type-Driven API Design in Rust” by Will Crichton
|
||
- Software Bills of Material:
|
||
- Why the World Needs a Software Bill of Materials Now
|
||
- What is a Software Bill of Material
|
||
- Understanding SBOM Standards
|
||
- SBOMs Supporting Safety Critical Software
|
||
|
||
W07: Work Sheet
|
||
|
||
Continuous Documentation
|
||
|
||
Re-use the FizzBuzz project created in U02 and extend it by:
|
||
|
||
- rudimentary documentation for the fizzbuzz function and the
|
||
library’s main module
|
||
- a CI job that produces the documentation
|
||
- GitLab pages to host the documentation (<pages-url>/fizzbuzz)
|
||
|
||
Landing Page and Handbook
|
||
|
||
Again, re-use the FizzBuzz project and extend it by:
|
||
|
||
- rudimentary mdbook configuration, allowing you to write
|
||
supplementary text for it
|
||
- write a page that explains how FizzBuzz works and how one can setup
|
||
your code (git clone, cargo install)
|
||
- use GitLab CI and pages to generate and host this as the landing
|
||
page (<pages-url>/)
|
||
|
||
Changelog
|
||
|
||
Use git-cliff to generate a CHANGELOG.md for FizzBuzz. If you picked
|
||
proper messages in the respective unit, you should get proper commit
|
||
groups for the [Unreleased] version of FizzBuzz.
|
||
|
||
APIs and Versions
|
||
|
||
Assume version 1.3.1 of your crate has the following code:
|
||
|
||
pub struct Engine {
|
||
pub temperature: f64,
|
||
pub rotations: u64,
|
||
}
|
||
|
||
Now you add pub kind: EngineKind, with pub enum EngineKind to tell
|
||
electric from combustion engines apart. What should the new version of
|
||
your crate be?
|
||
|
||
Coding
|
||
|
||
General Coding Process
|
||
|
||
Now that we know how error control generally works to improve
|
||
reliability of a system, we look at the process of coding information in
|
||
detail. This process looks like this:
|
||
|
||
{{#include img/CodingProcess.svg }}
|
||
|
||
We have the following variables:
|
||
|
||
- \(i \in I\): Information (Data)
- \(I\) is the information alphabet.
- Example: a set of symbols like { START, STOP, RESUME, EXIT }.
- \(r \in R^+\): Received Data
- \(R\) is the channel alphabet, i.e. each word represents a
receivable message.
- \(r\) is a non-empty word from \(R\).
- Example: binary numbers \(R = \{0,1\}\).
- \(c \in C\): Coded Data
- \(C\) is the code word alphabet. Code words can be received, but
not everything that can be received is a code word, i.e. \(C \subseteq
R^+\).
- The encoding function is \(encode : I \to C\).
- The correction function is \(correct : R^+ \to C\). Note that only
for perfect codes, this mapping is total.
- The decoding function is \(decode : C \to I\), i.e. the inverse of
\(encode\).
- \(f \in R^+\): Error
- Added by noise etc.
- \(s\): Syndrome
- Used for error detection and correction.
- \(o\): Error Locator
- Derived from \(s\) to get \(f\).
|
||
|
||
Definitions
|
||
|
||
First, we have to define terms we have used loosely in the previous
|
||
sections in a clearer way:
|
||
|
||
- Information: The actual data we want to transmit.
|
||
- Code: The mapping between information words and code words.
|
||
- Redundancy: Parts of the original information cleverly…
|
||
- (re-) arranged,
|
||
- combined, or
|
||
- otherwise mathematically transformed and
|
||
- transmitted.
|
||
|
||
Block Codes
|
||
|
||
While there are various ways to do coding, we only concentrate on block
|
||
codes in this unit.
|
||
|
||
A block code transforms \(k\) information symbols into \(n\) code symbols,
so the code rate is \(k/n\) (with \(r = n - k\) redundancy symbols).
|
||
|
||
We define the Hamming Distance \(d\) as the number of positions in which
two code words differ. The minimal distance between any two code words
(\(d_{min}\)) gives the distance of the code. This distance determines the
capability of the code. A code with distance \(d_{min}\) can:
|
||
|
||
- Detect \(e\) errors, if \(d_{min} \geq e+1\)

- Correct \(e\) errors, if \(d_{min} \geq 2e+1\)
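For illustration (a standard instance, used here only to connect the two
rules): the \(HC(7,4)\) code introduced below has \(d_{min} = 3\), so it can
detect up to 2 errors (\(3 \geq 2 + 1\)) and correct 1 error
(\(3 \geq 2 \cdot 1 + 1\)); its code rate is \(4/7\).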
|
||
|
||
Hamming Code
|
||
|
||
One example of a block code is a Hamming Code (HC). The HC operates on
symbols that are single bits. We denote it with \(HC(n,k)\). HCs make use
of so-called parity bits, which are:
|
||
|
||
- 0 if an even number of bits is set to 1 in the block.
- 1 otherwise.
|
||
|
||
These \(r\) parity bits are at positions \(2^x\) (i.e., 1, 2, 4, 8, …)
|
||
|
||
The syndrome \(s\) is computed by recalculating the parities, this time
including the parity bits. The syndrome both checks for an error and locates it:
|
||
|
||
- \(s=0\): no error.
|
||
- \(s \neq 0\): the syndrome value is the location of the error.
|
||
|
||
More information about the code generation algorithm can be found
|
||
here.
|
||
|
||
Hamming Code | Example HC(7,4)
|
||
|
||
In this example our alphabets are:
|
||
|
||
- \(\Sigma = \{0, 1\}\); note that in this case \(+\) and \(-\) become XOR.
- \(I = \Sigma^4\)
- \(r \in \Sigma^7\)
|
||
|
||
In the following, we give a worked example for a Hamming Code.
|
||
|
||
Transmitter (\(x = [x_1 … x_n]\))
|
||
|
||
Encode 4 bits \(i = 1001 = c_3 c_5 c_6 c_7\).
|
||
|
||
Parities:
|
||
|
||
\(p_1 = (c_3 + c_5 + c_7) = 0 = c_1\)
|
||
|
||
\(p_2 = (c_3 + c_6 + c_7) = 0 = c_2\)
|
||
|
||
\(p_3 = (c_5 + c_6 + c_7) = 1 = c_4\)
|
||
|
||
Result:
|
||
|
||
\(c = [ 0 0 1 1 0 0 1 ]\)
|
||
|
||
\(f = [ 0 0 0 0 0 1 0 ]\)
|
||
|
||
Receiver
|
||
|
||
\(r = [ 0 0 1 1 0 1 1 ] \neq c\) (wrong!)
|
||
|
||
Syndromes:
|
||
|
||
\(s_1 = (p_1 + c_3 + c_5 + c_7) = 0\)
|
||
|
||
\(s_2 = (p_2 + c_3 + c_6 + c_7) = 1\)
|
||
|
||
\(s_3 = (p_3 + c_5 + c_6 + c_7) = 1\)
|
||
|
||
\(s = [ 0 1 1 ] \neq 0\), hence an error occurred
|
||
|
||
Location: reading the syndrome as a binary number, \(s_3 s_2 s_1 = 110_2 = 6 = o\), so \(f\) has its 1 at position 6.
|
||
|
||
\(r' = r - f = [ 0 0 1 1 0 0 1 ] = c\)
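The worked example translates almost directly into code. The following is a
minimal sketch (our own helper functions, not part of any project skeleton)
that encodes 4 data bits into an \(HC(7,4)\) code word and corrects a single
bit error via the syndrome:

fn hc74_encode(data: [u8; 4]) -> [u8; 7] {
    // data bits map to code positions 3, 5, 6 and 7 (1-based)
    let (c3, c5, c6, c7) = (data[0], data[1], data[2], data[3]);
    let p1 = c3 ^ c5 ^ c7; // parity at position 1
    let p2 = c3 ^ c6 ^ c7; // parity at position 2
    let p3 = c5 ^ c6 ^ c7; // parity at position 4
    [p1, p2, c3, p3, c5, c6, c7]
}

fn hc74_correct(mut r: [u8; 7]) -> [u8; 7] {
    // recompute the parities, this time including the parity bits
    let s1 = r[0] ^ r[2] ^ r[4] ^ r[6];
    let s2 = r[1] ^ r[2] ^ r[5] ^ r[6];
    let s3 = r[3] ^ r[4] ^ r[5] ^ r[6];
    // the syndrome, read as a binary number, is the error position (1-based)
    let o = (s1 + 2 * s2 + 4 * s3) as usize;
    if o != 0 {
        r[o - 1] ^= 1; // flip the erroneous bit
    }
    r
}

fn main() {
    let c = hc74_encode([1, 0, 0, 1]);
    assert_eq!(c, [0, 0, 1, 1, 0, 0, 1]);
    let mut r = c;
    r[5] ^= 1; // the channel flips bit 6
    assert_eq!(hc74_correct(r), c);
    println!("corrected!");
}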
|
||
|
||
Advanced Codes
|
||
|
||
A Hamming code is a rather simple coding approach. There is a vast
|
||
amount of literature on other coding schemes, for example:
|
||
|
||
BCH Codes, where we add multiple Hamming Codes together to get more
|
||
correction capabilities.
|
||
|
||
Reed-Solomon Codes, which work on bytes rather than single bits. This
|
||
code is able to correct full bytes, independent of how many bit errors
|
||
happened within it. This is ideal for computer systems, with 8-bit
|
||
symbols (byte).
|
||
|
||
With Code Concatenation multiple codes are used inside each other. Doing
|
||
this efficiently is a complex topic on its own.
|
||
|
||
Bursts and how to get rid of them
|
||
|
||
Remember, bursts are multiple consecutive errors. Assume the following:
|
||
|
||
The information we want to send is: \([ 1 0 0 1 1 0 0 1]\), which yields
|
||
the following code: \([ 0 0 1 1 0 0 1 0 0 1 1 0 0 1]\) (HC(7,4)).
|
||
Assume the channel causes two errors in different ways:
|
||
|
||
- a) \([ 0 0 1 1 0 1 1 0 1 1 1 0 0 1]\)
|
||
|
||
- b) \([ 0 0 1 1 1 1 1 0 0 1 1 0 0 1]\)
|
||
|
||
For each option, think about whether you can correct the errors or not?
|
||
|
||
For a), we can correct as there is 1 error per block. For b), we cannot
|
||
correct as 2 errors are in the 1st block, which exceeds the correction
|
||
capabilities of HC(7,4). Now you might wonder if we can do something
|
||
about the second case, where we have enough correction capabilities but
|
||
the errors are distributed over blocks in an unfortunate way.
|
||
|
||
Interleaving
|
||
|
||
As you might have guessed, there is such an approach and it is called
|
||
interleaving. The basic idea is to scramble bit positions and spread
|
||
adjacent symbols apart. This helps with burst errors, but it is also
|
||
time-consuming, as data symbols have to be aggregated at transmitter and
|
||
receiver before sending or delivering.
|
||
|
||
An interleaver is parameterized by picking numbers for columns \(C\) and
|
||
rows \(R\). After interleaving, the new distance between originally
|
||
adjacent symbols (within block) becomes \(R\). In between blocks, the
|
||
distance is different.
|
||
|
||
At the interleaver, we fill row-wise and read column-wise:
|
||
|
||
\(i = [ 0, 1, 2, 3, 4, 5, 6, 7]\)
|
||
|
||
    0 1 2 3
    4 5 6 7
|
||
\(c = [0, 4, 1, 5, 2, 6, 3, 7]\)
|
||
|
||
At the deinterleaver, we fill column-wise, read row-wise:
|
||
|
||
\(c = [0, 4, 1, 5, 2, 6, 3, 7]\)
|
||
|
||
    0 1 2 3
    4 5 6 7
|
||
\(i = [ 0, 1, 2, 3, 4, 5, 6, 7]\)
|
||
|
||
Interleaving Example
|
||
|
||
Information: \([ 1 0 0 1 1 0 0 1 ]\)
|
||
|
||
Code: \([ 0 0 1 1 0 0 1 0 0 1 1 0 0 1]\) (HC(7,4))
|
||
|
||
Transmitting (0 = padding)
|
||
|
||
Modify code before sending: [ 0 0 1 1 0 0 1 0 0 1 1 0 0 1 0 0 ]
|
||
|
||
    0 0 1 1
    0 0 1 0
    0 1 1 0
    0 1 0 0
|
||
[ 0 0 0 0 0 0 1 1 1 1 1 0 1 0 0 0 ]
|
||
|
||
Channel
|
||
|
||
Error occurs: [ 0 0 0 0 0 1 0 1 1 1 1 0 1 0 0 0 ]
|
||
|
||
Receiving
|
||
|
||
    0 0 1 1
    0 1 1 0
    0 0 1 0
    0 1 0 0

[ 0 0 1 1 0 1 1 0 0 0 1 0 0 1 0 0 ]
|
||
|
||
Now, the two bit error burst becomes correctable!
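For completeness, here is a small sketch of such a block interleaver (our own
code, generic over the grid dimensions). interleave fills the grid row-wise
and reads it column-wise; deinterleave is the inverse operation:

/// Fill an R x C grid row-wise, read it column-wise.
fn interleave<T: Copy>(data: &[T], rows: usize, cols: usize) -> Vec<T> {
    assert_eq!(data.len(), rows * cols, "pad the data to fill the grid");
    let mut out = Vec::with_capacity(data.len());
    for c in 0..cols {
        for r in 0..rows {
            out.push(data[r * cols + c]);
        }
    }
    out
}

/// Fill an R x C grid column-wise, read it row-wise (undoes interleave).
fn deinterleave<T: Copy>(data: &[T], rows: usize, cols: usize) -> Vec<T> {
    interleave(data, cols, rows)
}

fn main() {
    let i: Vec<u8> = (0..8).collect();
    let c = interleave(&i, 2, 4);
    assert_eq!(c, vec![0, 4, 1, 5, 2, 6, 3, 7]);
    assert_eq!(deinterleave(&c, 2, 4), i);
}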
|
||
|
||
Coding Project
|
||
|
||
In P02, you must implement an FEC coding scheme by hand. Note that a
|
||
Hamming Code will suffice to pass the project. At the same time,
|
||
interleaving and Reed-Solomon codes can improve your performance and are
|
||
worth learning / applying — so don’t hesitate to try them out.
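To give you an idea of the shape such a library can take, here is a minimal
interface sketch. The trait name and signatures are our own invention and not
the actual P02 API:

/// A forward error correction scheme working on blocks of symbols.
pub trait FecScheme {
    /// Encode a block of data symbols into a longer block of code symbols.
    fn encode(&self, data: &[u8]) -> Vec<u8>;

    /// Attempt to reconstruct the original data from a possibly corrupted
    /// block; return None on a decoding failure.
    fn decode(&self, received: &[u8]) -> Option<Vec<u8>>;
}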
|
||
|
||
Error Control
|
||
|
||
As we discussed previously, faults cannot be completely avoided. In a
|
||
similar tone, a communication channel or computation is never 100%
|
||
guaranteed to be correct. Therefore, it is necessary to (a) know which
|
||
errors can happen, (b) how to detect them, and (c) how to prevent or
|
||
handle them.
|
||
|
||
Error Types
|
||
|
||
First, we look at what types of errors can happen in communication
|
||
between two systems:
|
||
|
||
{{#include img/bit_error_packet_erasure.svg }}
|
||
|
||
- Bit Errors are caused by physical problems (noise, etc.).
|
||
|
||
- Packet Erasures are caused by
|
||
|
||
- physical problems (e.g. shadowing in wireless media) or
|
||
- logical problems (e.g. buffers are filled and newly arriving
|
||
packets must be dropped).
|
||
|
||
- Delayed Packets caused by
|
||
|
||
- differing paths across a network,
|
||
- network congestion, or
|
||
- insufficient priority compared to other network traffic.
|
||
|
||
Error Distributions
|
||
|
||
Now that we know about the different types of errors, it is also
|
||
essential to look at how likely errors are — in particular how they are
|
||
distributed.
|
||
|
||
{{#include img/sporadic_burst_error.svg }}
|
||
|
||
Informally speaking, we talk about sporadic errors that happen once in a
|
||
while and only affect single or small-scale units of data. Burst errors
|
||
are instead multiple consecutive errors that indicate some error
|
||
correlation. This can be due to, e.g., a scratch in a CD (multiple bits
|
||
affected) or an intermittent link failure (multiple packets affected).
|
||
|
||
General Error Control
|
||
|
||
In accordance with the Shannon model of communication, the function of
|
||
error control is split between the transmitter and the receiver. The
|
||
transmitter has the task of providing redundancy, i.e. repeating some of
|
||
the information or coding the information into a different form to be
|
||
transmitted. The receiver has multiple tasks:
|
||
|
||
- First, it has to detect if there was an error. If this is the case,
|
||
it has two options:
|
||
- Hand over the received data to a correction task, or
|
||
- Discard the data with the erroneous data. This can transform
|
||
single bit errors into packet erasures.
|
||
- Second, if correction is attempted, the receiver locates the error.
|
||
Using this information, it searches for the closest valid code
|
||
symbol to the received non-valid symbol. Closest in this context
|
||
means that this symbol has the highest likelihood, assuming random
|
||
noise on the channel.
|
||
|
||
For the correction, there are two common approaches on how the receiver
|
||
can get access to redundant information in order to correct the error:
|
||
|
||
- Proactive, also known as forward error coding (FEC)
|
||
- Reactive, also known as automated repeat request (ARQ)
|
||
|
||
Proactive
|
||
|
||
In the proactive approach, the transmitter anticipates that some
|
||
information is lost on the channel. Therefore, it transmits more data
|
||
(i.e. data + redundancy) to increase the likelihood of enough data
|
||
arriving at the receiver to allow for decoding the original information.
|
||
|
||
There are multiple schemes to add this redundancy:
|
||
|
||
- Redundancy Packet: Send additional packets used to recover erasures.
|
||
|
||
- Robust Packet: Send packets with included redundancy to recover bit
|
||
flips.
|
||
|
||
- Piggy-Back: Include digest of packet n+1 in packet n to conceal
|
||
erasures.
|
||
|
||
This approach has the benefit that correction can be attempted without
|
||
waiting for additional redundancy to arrive (as it is sent immediately
|
||
by the transmitter). The drawbacks are:
|
||
|
||
- Data rate increases statically, independent of actual errors.
- Picking the redundancy amount is tricky: too much wastes capacity, too
little leads to regular failures.
- De- and encoding take time: the coding process has to generate
redundancy, and application data must be aggregated to allow efficient
block coding.
|
||
|
||
Reactive
|
||
|
||
A different approach is the reactive approach that is, for example,
|
||
employed in the Transmission Control Protocol (TCP). This approach is
|
||
especially efficient when transmission is fast, as:
|
||
|
||
- time consumption for correction is transmission time + repetition
|
||
timeout length, and
|
||
- if no errors occur, there is no added overhead by ARQ (it is a
|
||
reactive scheme).
|
||
|
||
The approach is problematic when transmission takes long, as:
|
||
|
||
- spending the time for a second transmission attempt can exceed timing
limits, and
- retransmission timers may expire too early and redo transmissions
without need.
|
||
|
||
Nobody is Perfect
|
||
|
||
Finally, even when we put these error-control mechanisms in place, it is
|
||
still possible that we are not successful in recovering the
|
||
transmitter’s information, due to two possible reasons:
|
||
|
||
Decoding Failure
|
||
|
||
A decoding failure happens when more than one symbol is closest to the
|
||
received one. Mathematically speaking, this means that the code’s
|
||
equation system cannot be solved and the correction cannot decide for
|
||
one of the options. Using perfect codes avoids this completely.
|
||
|
||
Decoding Error
|
||
|
||
In contrast, a decoding error is when a symbol is changed on the channel
|
||
to an extent that a different code symbol appears closer. Hence, the
|
||
correction happens but yields information that is different from the one
|
||
transmitted. In this case, one cannot blame the coding system for
|
||
providing a wrong result — rather one must change the coding system to
|
||
provide higher correction capabilities.
|
||
|
||
Detection vs. Correction
|
||
|
||
The rule of thumb to favour detection over correction or vice-versa is:
|
||
|
||
- Detection is better on reliable media: the common case is successful
transmission, so only retransmit in the rare error cases (saves data rate).
- Correction is better on unreliable media: the common case is
unsuccessful transmission, so always transmit more (saves the latency of
retransmissions).
|
||
|
||
You can find examples for this in communication protocols:
|
||
|
||
- Error detection is used in, e.g., Ethernet (802.3) or CAN bus
|
||
- Error correction is used in, e.g., WLAN (802.11), LTE, UMTS
|
||
|
||
A Quantum of Information Theory
|
||
|
||
Communication Systems
|
||
|
||
In information theory, communication systems are typically described
|
||
according to a general model developed by Claude Elwood Shannon:
|
||
|
||
{{#include img/Shannon_communication_system.min.svg}}
|
||
|
||
Source: Wikipedia
|
||
|
||
Bits
|
||
|
||
A bit is the basic unit of information in computing and digital
|
||
communications. The word is a portmanteau of binary digit. A bit can
|
||
only have two values: 0 or 1. This can be compared with a light bulb
|
||
that can be either on or off. In information theory, you also find the
|
||
unit 1 Sh (Shannon). Bit is often used for data and Shannon for
|
||
information.
|
||
|
||
Information and Entropy
|
||
|
||
When talking about the information content of some message, we use
|
||
information (measured in bits) to describe it. If we talk about the
|
||
information involved in a random process, we often use the term entropy
|
||
(expected information). The term has its origin in thermodynamics and
|
||
describes the disorder in a system. Thanks to the second law of
|
||
thermodynamics, ultimate chaos is inevitable!
|
||
|
||
In information theory, Shannon described that, in principle, the
|
||
receiver attempts to infer which message has been sent. The receiver is
|
||
uncertain about this (before receiving as well as after), but
|
||
anticipates certain information. The entropy then describes a) how
|
||
uncertain she is before the reception, b) how uncertain she is after the
|
||
reception and hence c) how much uncertainty was removed by the reception
|
||
(i.e. a - b). Information (and in turn entropy) also depends on the
|
||
number and likelihood of different options (e.g. sides of a die or a die
|
||
showing a certain number). This set of options is called \(X\) and has
\(N = |X|\) elements. In this case, \(\log_2(N)\) gives the number of bits
required to identify these options by a unique binary number (and is
optimal, if they are uniformly distributed). A single option \(x\) has
the information \(-\log_2(p_x)\). Intuitively, we have the following
|
||
relationships:
|
||
|
||
- Likely Option \(\Rightarrow\) Low Information, e.g. a white pixel of a document
scan

- Unlikely Option \(\Rightarrow\) High Information, e.g. a traffic light showing
yellow

- More Options \(\Rightarrow\) Higher Information, e.g. a traffic light vs. a 7-segm.
display
|
||
|
||
The (discrete!) entropy of the process \(H(X)\) can be quantified as the
expected information content of \(X\) and is measured in bits like this:
\(H(X) = - \sum_{x \in X} p_x \log_2(p_x)\).
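As a small sanity check, the formula is easy to evaluate in code (our own
helper, with probabilities given as f64 values):

/// Entropy in bits of a discrete distribution given by its probabilities.
fn entropy(probabilities: &[f64]) -> f64 {
    probabilities
        .iter()
        .filter(|&&p| p > 0.0)
        .map(|&p| -p * p.log2())
        .sum()
}

fn main() {
    // fair die: log2(6), roughly 2.585 bits
    println!("{:.3}", entropy(&[1.0 / 6.0; 6]));
    // fair coin: exactly 1 bit
    println!("{:.3}", entropy(&[0.5, 0.5]));
}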
|
||
|
||
Case Study: Inefficiency of Textual Protocols
|
||
|
||
Let’s look at a textual protocol involving a command field, which can be
|
||
one of the following:
|
||
|
||
- Retrieve (GET),
|
||
- Create (ADD),
|
||
- Modify (MOD), and
|
||
- Delete (DEL)
|
||
|
||
How many bits are used for the textual and binary solution?
|
||
|
||
- For textual, we have 3 characters for each command and one ASCII
char needs 7 bits (often even 1 byte, but let’s be fair). Hence, the
result is: \(3 \cdot 7\,bit = 21\,bit\)

- For binary, the 4 different commands (0,1,2,3) mean that we have 4
values that require two bits each (00, 01, 10, 11). Hence, the
result is \(2\,bit\)
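A sketch of the binary variant (our own example types, not from any DSys
protocol): the four commands fit into a two-bit value, while the textual form
spends three ASCII characters.

#[allow(dead_code)]
#[derive(Clone, Copy)]
enum Command {
    Get = 0b00,
    Add = 0b01,
    Mod = 0b10,
    Del = 0b11,
}

fn main() {
    // the textual form needs 3 ASCII characters, i.e. 21 bits
    println!("textual: {} bits", "GET".len() * 7);
    println!("binary:  2 bits (value {:02b})", Command::Get as u8);
}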
|
||
|
||
Now you might ask, why textual protocols are used at all? The answer is
|
||
that efficiency is not your only parameter! Compression can bring
|
||
efficiency without requiring explicit mapping from information to binary
|
||
sequences.
|
||
|
||
U08: Working Reliably with Codes
|
||
|
||
In U07, we told you that you have to write a software library… but what
|
||
should it do? Here we are with the following challenge: DSys products
|
||
communicate with each other using various communication means. Some of
|
||
them are even wireless, which is known to be not as reliable as cables
|
||
(reliability being one dependability dimension). As the, what network
|
||
engineers call, “lower layers” are built out of off-the-shelf WLAN
|
||
components, you can only change your communication protocol.
|
||
|
||
To prepare you for this task, we start with a little bit of information
|
||
theory, continue with the concept of error control to increase
|
||
reliability, and deal with actual codes that allow you to detect and
|
||
correct bit errors.
|
||
|
||
S08: Sample Solution
|
||
|
||
Information Theory
|
||
|
||
- Message (the information content), signal (the encoded information).
|
||
Transmitter encodes (turns information symbols into code symbols),
|
||
adds redundancy, transforms symbol into a transmissible form
|
||
(e.g. electromagnetic waves), …
|
||
|
||
- -1/6 * log2(1/6) = 0.43082708345
|
||
|
||
- Encoding Die Throw:
|
||
|
||
- ASCII: 1 byte = 7 or 8 bits;
|
||
- Binary: 3 bits (0 .. 7 -> 1 .. 6)
|
||
- 3 / 8 = 37.5% (reduced to)
|
||
|
||
- Traffic Light:
|
||
|
||
- Four phases: red active in two, yellow active in two, green
|
||
active in one
|
||
- Probabilities: Red 1/2, Yellow 1/2, Green 1/4
|
||
- Information: 1, 1, 2,
|
||
- Entropies: 0.5, 0.5, 0.5 => Total Entropy 1.5
|
||
- US Traffic:
|
||
- 1/3, 1/3, 1/3 -> 0.5283
|
||
- Total Entropy 1.5849
|
||
- US traffic lights are more “surprising” hence more dangerous
|
||
if you ask me
|
||
|
||
Error Control
|
||
|
||
- Proactive should be used if error likelihood and recovery time are
high.
|
||
|
||
- Some single bit errors might not be correctable, leading to a
|
||
discard of the larger unit of information.
|
||
|
||
- They can’t.
|
||
|
||
- Failure: cannot correct, Error: can correct, but do not arrive at
|
||
true value
|
||
|
||
Coding
|
||
|
||
- HC(7,4)
|
||
- Data: [0101]
|
||
- Encoded: [0100101]
|
||
- Error: [0000010]
|
||
- Received: [0100111]
|
||
- Syndrome: [011] -> 6 -> Error [0000010]
|
||
- Corrected: [0100101]
|
||
- Interleaver (P = Padding): [0, 3, 1, 4, 2, 5, 6, P, 7, P, P, P]
|
||
|
||
Summary
|
||
|
||
What did you learn?
|
||
|
||
- Information theory is relevant for building reliable communicating
|
||
systems.
|
||
- Errors come in various types and distributions and you should know
|
||
about them to tune your error control approach.
|
||
- Coding schemes generate redundancy based on data — their detection
|
||
and correction capabilities differ, so they must be chosen wisely.
|
||
|
||
Where can you learn more?
|
||
|
||
- Information theory and Coding theory are good starting points.
|
||
- Error Correction Code (ECC) Memory is another application of coding
|
||
that is more computation- than communication-centered.
|
||
|
||
W08: Work Sheet
|
||
|
||
Information Theory
|
||
|
||
- Explain the difference between the information source’s message and
|
||
the signal. Describe what different things can happen in the
|
||
transmitter.
|
||
|
||
- You have a 6-sided fair die. What is the entropy of throwing a 6?
|
||
|
||
- You encode the result of a die throw in ASCII text (1, 2, …, 6) and
|
||
binary. How many bits does the binary encoding need? Encoding binary
|
||
reduces the used bits to how many percent of the textual encoding?
|
||
|
||
<img src="./img/amp_seq.gif" alt="Traffic Light" />
|
||
|
||
- Assume you have a German traffic light such as in the animation on
|
||
the right. Assume for now that the different light phases are of
|
||
equal duration (as in the animation). Calculate both information and
|
||
entropy of seeing each individual light (red, yellow, green) being
|
||
“on”. Does a US traffic light have higher or lower entropy than the
|
||
German ones (assuming equal duration)?
|
||
|
||
Error Control
|
||
|
||
- Explain in which cases you should prefer proactive over reactive
|
||
error control.
|
||
|
||
- Explain how single bit errors can turn into packet erasures.
|
||
|
||
- Explain how an overly delayed packet can be told apart from a lost
|
||
packet.
|
||
|
||
- Explain the difference between a decoding failure and a decoding
|
||
error.
|
||
|
||
Coding
|
||
|
||
- Assume you use a \(HC(7,4)\) and the following bit sequences have
|
||
this form \([x_1, … x_n]\). Encode the 4 bits \([0101]\). When you
|
||
transmit, the following error happens on the channel \([0000010]\).
|
||
Compute the syndrome and show how it detects and locates the error.
|
||
|
||
- Assume you have the data sequence \([0, 1, 2, 3, 4, 5, 6, 7]\) and
|
||
you feed it into a 2 x 3 interleaver. Compute the resulting data
|
||
sequence after the interleaver.
|
||
|
||
Binary Trees
|
||
|
||
(Source: Programming Rust)
|
||
|
||
In this section, we cover binary trees, i.e. trees where elements have
|
||
0–2 children. Children can be left or right of the parent. Furthermore,
|
||
a binary search tree has the property that elements left of a parent are
|
||
<= the parent element and right of the parent are >.
|
||
|
||
Declaration
|
||
|
||
Here is how we declare types for binary trees:
|
||
|
||
enum BinaryTree<T> {
|
||
Empty,
|
||
NonEmpty(Box<TreeNode<T>>),
|
||
}
|
||
|
||
struct TreeNode<T> {
|
||
element: T,
|
||
left: BinaryTree<T>,
|
||
right: BinaryTree<T>,
|
||
}
|
||
|
||
Note that the NonEmpty variant carries a Box. Why is this the case?
|
||
Assume we would use the following:
|
||
|
||
enum BinaryTree<T> {
|
||
Empty,
|
||
NonEmpty(TreeNode<T>),
|
||
}
|
||
|
||
What can go wrong?
|
||
|
||
In fact, Rust complains because it cannot figure out the memory size of
|
||
BinaryTree as we now made it infinite. Why?
|
||
|
||
Enums are sized according to the largest type they contain. So
|
||
BinaryTree<T> has the size of TreeNode<T> plus the space to store that
|
||
it is the NonEmpty variant. Now how big is TreeNode<T>? The node
|
||
contains up to two BinaryTrees which again could, in the worst case,
|
||
contain a TreeNode<T>. So we create a recursive dependency. With Box, we
|
||
introduce a pointer with a fixed size that points to a heap-allocated
|
||
value and its size. This means that the BinaryTree<T> enum only carries
|
||
the size of a pointer.
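If you want to see this for yourself, you can print the size of the enum
(this assumes the BinaryTree and TreeNode declarations from above are in
scope; on a typical 64-bit target this prints 8, the size of a pointer,
because Box enables the niche optimization for the Empty variant):

fn main() {
    println!("{}", std::mem::size_of::<BinaryTree<i32>>());
}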
|
||
|
||
Population
|
||
|
||
Now with the data structure at hand, let’s implement our first
|
||
algorithm, namely a way to fill (or populate) the tree:
|
||
|
||
impl<T: Ord> BinaryTree<T> {
|
||
fn insert(&mut self, value: T) {
|
||
match self {
|
||
BinaryTree::Empty => {
|
||
*self = BinaryTree::NonEmpty(Box::new(TreeNode {
|
||
element: value,
|
||
left: BinaryTree::Empty,
|
||
right: BinaryTree::Empty,
|
||
}))
|
||
},
|
||
BinaryTree::NonEmpty(ref mut node) => {
|
||
if value <= node.element {
|
||
node.left.insert(value);
|
||
} else {
|
||
node.right.insert(value);
|
||
}
|
||
}
|
||
}
|
||
}
|
||
}
|
||
|
||
Here we see how two concepts play nicely together when working with tree
|
||
data structures: match expressions and recursion.
|
||
|
||
First, we split the handling of two different cases: a) empty node and
|
||
b) non-empty node. If empty, we start with a newly created node. If
|
||
non-empty, we recurse with adding to either left or right, depending on
|
||
the value to be inserted. Thereby, we ensure the order-property of the
|
||
tree is maintained.
|
||
|
||
Width
|
||
|
||
Now, it’s time to compute things while working our way through the tree.
|
||
The width gives the number of leaf elements the tree contains:
|
||
|
||
fn width(&self) -> u32 {
|
||
match self {
|
||
Self::Empty => 0,
|
||
Self::NonEmpty(t) => u32::max(1, t.left.width() + t.right.width()),
|
||
}
|
||
}
|
||
|
||
Hence, an empty tree has no leaves. A non-empty tree either has a width
|
||
of 1 (it is a leaf) or the combined width of its left and right
|
||
children. Take a piece of paper and validate that all four cases (leaf,
|
||
non-leaf with left child, with right child, with two children) yield the
|
||
correct answer.
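For example, using insert from above to build a small tree (this assumes the
definitions from this section are in scope):

fn main() {
    let mut tree = BinaryTree::Empty;
    assert_eq!(tree.width(), 0);
    for value in [4, 2, 6, 1, 3] {
        tree.insert(value);
    }
    // the leaves are 1, 3 and 6
    assert_eq!(tree.width(), 3);
}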
|
||
|
||
Projecting
|
||
|
||
Finally, a common use case for trees is to traverse them in a particular
|
||
order, e.g. to compute a projection (i.e. enumerate the elements in said
|
||
order). Here is how to compute a preorder (root, left sub-tree, right
|
||
sub-tree):
|
||
|
||
fn project_preorder(&self) -> PreOrderProjection<'_, T> {
|
||
PreOrderProjection { stack: vec![self] }
|
||
}
|
||
|
||
struct PreOrderProjection<'a, T> {
|
||
stack: Vec<&'a BinaryTree<T>>,
|
||
}
|
||
|
||
impl<'a, T> Iterator for PreOrderProjection<'a, T>
|
||
where
|
||
T: Copy,
|
||
{
|
||
type Item = T;
|
||
|
||
fn next(&mut self) -> Option<Self::Item> {
|
||
let root = self.stack.pop();
|
||
match root {
|
||
None => None,
|
||
Some(t) => match t {
|
||
BinaryTree::Empty => None,
|
||
BinaryTree::NonEmpty(t) => {
|
||
if let BinaryTree::NonEmpty(_r) = &t.right {
|
||
self.stack.push(&t.right);
|
||
}
|
||
if let BinaryTree::NonEmpty(_l) = &t.left {
|
||
self.stack.push(&t.left);
|
||
}
|
||
Some(t.element)
|
||
}
|
||
},
|
||
}
|
||
}
|
||
}
|
||
|
||
Note that we implement a custom struct that works as a projection of the
|
||
tree. It implements Iterator so that consuming code can call it as it
|
||
would call other iterators.
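A short usage sketch (again assuming the definitions above and an element
type that is Copy, e.g. i32):

fn main() {
    let mut tree = BinaryTree::Empty;
    for value in [4, 2, 6, 1, 3] {
        tree.insert(value);
    }
    // preorder: root first, then the left sub-tree, then the right sub-tree
    let elements: Vec<i32> = tree.project_preorder().collect();
    assert_eq!(elements, vec![4, 2, 1, 3, 6]);
}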
|
||
|
||
Fault Trees
|
||
|
||
When analyzing systems for safety and reliability, fault trees have
proven highly effective and have been in broad use since their invention (in
1961 by Bell Laboratories). Standards such as
|
||
|
||
- IEC 61508-3 (electrical/electronic/programmable systems),
|
||
- ISO 26262-10 (automotive),
|
||
- EN 50126-2 (rail) and
|
||
- ISO 14971 (medical)
|
||
|
||
recommend the use of fault tree analysis to check for the safety of
|
||
systems.
|
||
|
||
Fault Trees (FT) and Algorithms
|
||
|
||
Fault trees serve multiple purposes. Fault trees…

- trace back influences to a given hazard or failure,
- help to find all influences,
- graphically explain causal chains leading to the hazard,
- can be used to find event combinations that are sufficient to cause the
hazard (qualitative analysis: systematic investigation for combinations), or
- can be used to calculate the hazard probability from influence
probabilities (quantitative analysis: systematic investigation for
likelihoods).
|
||
|
||
Originally, fault trees were only boolean trees, but over time various
|
||
different forms evolved. In this section, we focus on Boolean and
|
||
Extended Boolean Fault Trees. Additional forms and analysis techniques
|
||
are left for you to explore.
|
||
|
||
Boolean Fault Trees
|
||
|
||
The concept underlying a boolean fault tree is straightforward. The tree
|
||
is built up of nodes and edges, where nodes are basic events (the leaves
|
||
of the tree) that can happen or logic gates that combine multiple basic
|
||
or intermediate events (the non-leaves of the tree). When evaluating the
|
||
tree, events can be present (true) or non-present (false) and edges
|
||
propagate this information upward. Typically, at least the Or and And
|
||
gate are supported that combine the truth value of their lower events
|
||
into a new one (using the respective boolean operation). Eventually, the
|
||
root is the top-level event in question and evaluation of the tree leads
|
||
to either true or false for this top-level event. The tree itself has
|
||
failure logic, i.e. the top-level becoming true means it failed, as is
|
||
the case for other events. So, an event failing means that the value
|
||
changes from false to true. This is in contrast to a success tree, where
|
||
true means something is successful or present. In essence, fault trees
|
||
are equivalent with negation-free boolean formulas (only And/Or are
|
||
supported).
|
||
|
||
When it comes to (graphical) notation, you find lots of different ways
|
||
to specify the same set of core elements of a fault tree. In particular,
|
||
gate symbols are often borrowed from circuit design (where the common
|
||
symbols differ from US to EU, for instance) and often it is only their
|
||
shape that indicates their function. In some cases you also get
|
||
operators (e.g. &) in the symbol itself, but this is not always the
|
||
case. In the following, we use & for And and >=1 for Or.
|
||
|
||
Assume the following fault tree that captures how it could happen that
|
||
you were late at the uni (hypothetically — we know this never happens to
|
||
you!):
|
||
|
||
Late at
|
||
the uni
|
||
|
|
||
+-----+
|
||
| >=1 |
|
||
+-----+
|
||
| |
|
||
+--+ |
|
||
| |
|
||
+-------+ |
|
||
| & | |
|
||
+-------+ |
|
||
| | |
|
||
O O O
|
||
^ ^ ^
|
||
Alarm not | Train
|
||
set | late
|
||
|
|
||
Slept
|
||
too long
|
||
|
||
Note that Fault Trees have their use, even if no analysis is carried
|
||
out. Constructing the FT already helps in understanding the system,
|
||
revealing problems, and building awareness on safety and reliability. In
|
||
this example, you already see that if the train is late, having slept
|
||
too long is not relevant anymore. Thereby, we already carried out a
|
||
qualitative analysis, i.e. checking if the top-event is reachable,
|
||
depending on the basic events.
|
||
|
||
This leads us to the definition of two special sets:
|
||
|
||
- Cut Set: set of basic events which causes the top event in
|
||
conjunction
|
||
- Path Set: set of basic events that (by being false) inhibit the
|
||
top-event from occurring
|
||
|
||
If you have a careful look, you see that these sets are usually bigger
|
||
than they need to be to fulfill their definition (e.g. a cut set
|
||
contains an event that does not need to be true for the top-event to
|
||
become true, e.g., because it is or-ed with another event that is true).
|
||
Hence, there are also:
|
||
|
||
- Minimal Cut Set (MCS): smallest set of events that, if failing, lead
|
||
to top-level fail
|
||
- Minimal Path Set (MPS): path set where removing any basic event
|
||
means it no longer is a path set
|
||
|
||
In fault tree analysis, one is usually concerned with MCS of order 1 or
|
||
2, as well as MCS with probability > 0.01 (which require quantitative
|
||
analysis that we learn later in this section). That means you focus on
|
||
single points of failure or small combinations that appear with
|
||
significant probability.
|
||
|
||
In the example above, we have the following:
|
||
|
||
- minimal cut sets: [Alarm not set, Slept too long] and [Train late].
|
||
- minimal path sets: [Train late, Alarm not set] and [Train late,
|
||
Slept too long].
|
||
|
||
Extended Boolean Fault Tree
|
||
|
||
If quantitative analysis is planned, we have to use extended boolean
|
||
FTs. The diagram we showed above is a good basis for such a tree, we
|
||
only have to decorate it with failure probabilities. This is depicted
|
||
below, where the failure probabilities of basic events induce the
|
||
probabilities further upwards in the tree (as per the rules quoted
|
||
below):
|
||
|
||
Late at
|
||
the uni
|
||
0.154
|
||
|
|
||
+-----+
|
||
| >=1 |
|
||
+-----+
|
||
| |
|
||
+--+ |
|
||
0.06 | |
|
||
+-------+ |
|
||
| & | |
|
||
+-------+ |
|
||
| | |
|
||
O O O
|
||
0.2 0.3 0.1
|
||
^ ^ ^
|
||
Alarm not | Train
|
||
set | late
|
||
|
|
||
Slept
|
||
too long
|
||
|
||
In general, for quantitative evaluation basic events should be chosen
|
||
to: a) have clear semantics, b) be self-contained and independent, c)
|
||
have a probability value assigned to them.
|
||
|
||
When asked to compute the probability of the top-level element failing,
|
||
we traverse the tree bottom-up and apply the following rules for the
|
||
gates:
|
||
|
||
- And Gate: \[P_{out} = \prod_{i=1}^{n} P_i\]
- Or Gate: \[P_{out} = 1 - \prod_{i=1}^{n} (1 - P_i)\]

Note that the Or rule echoes De Morgan’s law \[\neg(A \lor B) = \neg A \land \neg B\]
since \(1-P\) can be thought of as the probability of \(\neg X\) if \(P\) is the
probability of \(X\).
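As a quick check against the example tree above: the And gate yields
\(0.2 \cdot 0.3 = 0.06\), and the top-level Or gate yields
\(1 - (1 - 0.06) \cdot (1 - 0.1) = 1 - 0.846 = 0.154\), which matches the
annotated probabilities.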
|
||
|
||
Efficient Minimal Cut Sets Computation
|
||
|
||
As minimal cut sets are so important for analysis (e.g., finding single
|
||
points of failure), an efficient computation is essential, especially
|
||
for large trees of complex systems. For this purpose, we can use the
|
||
following algorithm to compute the set of minimal cut sets:
|
||
|
||
- Traverse the tree recursively.
|
||
- At an OR gate, generate one entry per input: \([(i_1), …, (i_n)]\).
|
||
- At an AND gate, generate one entry with all inputs: \([(i_1, …,
|
||
i_n)]\).
|
||
- Drop duplicates during the process.
|
||
|
||
Fault Trees in Rust
|
||
|
||
The following shows how a fault tree is defined in one of the projects
|
||
you will work on:
|
||
|
||
#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
|
||
pub enum Tree {
|
||
BasicEvent(Event),
|
||
IntermediateEvent(String, Box<Tree>),
|
||
Gate(Gate),
|
||
}
|
||
|
||
Similar to how we defined binary trees here, we have variants that
|
||
contain Tree — making the data structure recursive.
|
||
|
||
Gates
|
||
|
||
Gates store sub-trees and the gate-function itself:
|
||
|
||
#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
|
||
pub enum Gate {
|
||
Or(Vec<Tree>),
|
||
And(Vec<Tree>),
|
||
}
|
||
|
||
Events
|
||
|
||
Events store a name as well as a probability:
|
||
|
||
#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
|
||
pub struct Event(String, uom::si::rational64::Ratio);
|
||
|
||
Cut Sets
|
||
|
||
A cut set, as defined above, is a set of events for which the tree
|
||
evaluates to true if we set the respective events to true and traverse
|
||
the tree upwards. Here is a way to test if a certain set of events is a
|
||
cut set:
|
||
|
||
impl Tree {
|
||
fn cut_set(&self, events: &BTreeSet<Event>) -> bool {
|
||
match self {
|
||
Tree::BasicEvent(event) => events.contains(event),
|
||
Tree::IntermediateEvent(_, subtree) => subtree.cut_set(events),
|
||
Tree::Gate(gate) => match gate {
|
||
Gate::Or(subtrees) => subtrees.iter().any(|subtree| subtree.cut_set(&events)),
|
||
Gate::And(subtrees) => subtrees.iter().all(|subtree| subtree.cut_set(&events)),
|
||
},
|
||
}
|
||
}
|
||
}
|
||
|
||
You might notice how well the fault tree structure and logic map to the
algorithm’s match statement (basic event, intermediate event, gate) and the
subtree iteration (or → any, and → all).
|
||
|
||
The following computes the set of minimum cut sets in a naive fashion:
|
||
|
||
fn naive_minimal_cut_sets(&self) -> BTreeSet<BTreeSet<Event>> {
|
||
let mut last_set = self.cut_sets();
|
||
let mut current_set = self.cut_sets();
|
||
loop {
|
||
let mut drop_set = BTreeSet::new();
|
||
        for subset in &current_set {
|
||
let s = BTreeSet::from_iter(vec![subset.clone()]);
|
||
let others = current_set.difference(&s).cloned().collect::<Vec<_>>();
|
||
for other in others.into_iter() {
|
||
if subset.is_subset(&other) {
|
||
drop_set.insert(other);
|
||
}
|
||
}
|
||
}
|
||
current_set = current_set.difference(&drop_set).cloned().collect();
|
||
if current_set.len() < last_set.len() {
|
||
last_set = current_set.clone();
|
||
continue;
|
||
} else {
|
||
break;
|
||
}
|
||
}
|
||
current_set
|
||
}
|
||
|
||
The rationale is the following: We start with all cut sets (including
those that are not minimal). In every iteration of the loop, we attempt to
make this collection smaller. As soon as we no longer succeed, we break.
Removal itself works by comparing all sets with each other and, if one is
a subset of another set, dropping the other set (as it is not minimal).

Further above, we already showed one of many other algorithms that compute
the MCS with a smaller computational complexity.
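To close the loop with the quantitative analysis from earlier, the gate rules
translate into a recursive method. The sketch below uses a simplified mirror
of the types (f64 probabilities instead of the uom Ratio used in the
project), so it is an illustration rather than the project’s actual API:

#[allow(dead_code)]
enum Tree {
    BasicEvent(f64),
    IntermediateEvent(String, Box<Tree>),
    Gate(Gate),
}

#[allow(dead_code)]
enum Gate {
    Or(Vec<Tree>),
    And(Vec<Tree>),
}

impl Tree {
    /// Failure probability of the (sub-)tree, applying the And/Or gate rules.
    fn probability(&self) -> f64 {
        match self {
            Tree::BasicEvent(p) => *p,
            Tree::IntermediateEvent(_, subtree) => subtree.probability(),
            Tree::Gate(Gate::And(subtrees)) => {
                subtrees.iter().map(Tree::probability).product()
            }
            Tree::Gate(Gate::Or(subtrees)) => {
                1.0 - subtrees
                    .iter()
                    .map(|subtree| 1.0 - subtree.probability())
                    .product::<f64>()
            }
        }
    }
}

fn main() {
    // the "Late at the uni" example from above
    let tree = Tree::Gate(Gate::Or(vec![
        Tree::Gate(Gate::And(vec![
            Tree::BasicEvent(0.2), // alarm not set
            Tree::BasicEvent(0.3), // slept too long
        ])),
        Tree::BasicEvent(0.1), // train late
    ]));
    assert!((tree.probability() - 0.154).abs() < 1e-9);
}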
|
||
|
||
U09: Out in the Woods
|
||
|
||
Source: Andreas Schmidt
|
||
|
||
This has been a tough journey with DSys so far: You learned a lot, but
|
||
sitting in front of a computer all the time was quite stressful. It’s
|
||
about time to go outdoors and enjoy nature (if you are binge-learning
|
||
this course and have been sitting in front of the PC the whole day, go
|
||
outside ASAP).
|
||
|
||
Now that you are back from taking a close look at the trees outside, we
|
||
learn how to implement binary trees in Rust and afterwards we discuss
|
||
how fault trees can be used for dependability analysis.
|
||
|
||
S09: Sample Solution
|
||
|
||
Trees
|
||
|
||
- depth()
|
||
|
||
fn depth(&self) -> u32 {
|
||
match self {
|
||
Self::Empty => 0,
|
||
Self::NonEmpty(t) => 1 + u32::max(t.left.depth(), t.right.depth()),
|
||
}
|
||
}
|
||
|
||
- leaves()
|
||
|
||
fn leaves(&self) -> Vec<&T> {
|
||
match self {
|
||
Self::Empty => {
|
||
vec![]
|
||
}
|
||
Self::NonEmpty(tree) => {
|
||
let TreeNode {
|
||
element,
|
||
left,
|
||
right,
|
||
} = &**tree;
|
||
match (left, right) {
|
||
(Self::Empty, Self::Empty) => vec![element],
|
||
(left, right) => {
|
||
let mut leaves = left.leaves();
|
||
leaves.extend(right.leaves());
|
||
leaves
|
||
}
|
||
}
|
||
}
|
||
}
|
||
}
|
||
|
||
- project_inorder()
|
||
|
||
fn project_inorder(&self) -> Vec<&T> {
|
||
match self {
|
||
Self::Empty => vec![],
|
||
Self::NonEmpty(t) => {
|
||
let mut l = t.left.project_inorder();
|
||
l.push(&t.element);
|
||
l.append(&mut t.right.project_inorder());
|
||
l
|
||
}
|
||
}
|
||
}
|
||
|
||
- project_postorder()
|
||
|
||
fn project_postorder(&self) -> Vec<&T> {
|
||
match self {
|
||
Self::Empty => vec![],
|
||
Self::NonEmpty(t) => {
|
||
let mut l = t.left.project_postorder();
|
||
l.append(&mut t.right.project_postorder());
|
||
l.push(&t.element);
|
||
l
|
||
}
|
||
}
|
||
}
|
||
|
||
- find()
|
||
|
||
fn find(&self, f: fn(&T) -> bool) -> Option<&T> {
|
||
match self {
|
||
Self::Empty => None,
|
||
Self::NonEmpty(t) => {
|
||
if f(&t.element) == true {
|
||
Some(&t.element)
|
||
} else {
|
||
if let Some(v) = t.left.find(f) {
|
||
return Some(v);
|
||
}
|
||
if let Some(v) = t.right.find(f) {
|
||
return Some(v);
|
||
}
|
||
None
|
||
}
|
||
}
|
||
}
|
||
}
|
||
|
||
- balanced()
|
||
|
||
fn balanced(&self) -> bool {
|
||
match self {
|
||
Self::Empty => true,
|
||
Self::NonEmpty(t) => {
|
||
t.left.balanced()
|
||
&& t.right.balanced()
|
||
&& ((t.left.depth() as i64) - (t.right.depth() as i64)).abs() <= 1
|
||
}
|
||
}
|
||
}
|
||
|
||
- balance()
|
||
|
||
fn balance(self) -> Self {
|
||
let array: Vec<T> = self.project_inorder().into_iter().cloned().collect();
|
||
Self::from_sorted(&array)
|
||
}
|
||
|
||
fn from_sorted(slice: &[T]) -> Self {
|
||
if slice.len() == 0 {
|
||
Self::Empty
|
||
} else {
|
||
let mid_index = slice.len() / 2;
|
||
let mid = &slice[mid_index];
|
||
let left = Self::from_sorted(&slice[0..mid_index]);
|
||
let right = Self::from_sorted(&slice[mid_index + 1..slice.len()]);
|
||
Self::NonEmpty(Box::new(TreeNode {
|
||
element: mid.clone(),
|
||
left,
|
||
right,
|
||
}))
|
||
}
|
||
}
|
||
|
||
- map():
|
||
|
||
fn map<U: Ord + Clone>(self, f: fn(T) -> U) -> BinaryTree<U> {
|
||
match self {
|
||
Self::Empty => BinaryTree::Empty,
|
||
Self::NonEmpty(t) => {
|
||
let element = f(t.element);
|
||
let left = t.left.map(f);
|
||
let right = t.right.map(f);
|
||
BinaryTree::NonEmpty(Box::new(TreeNode {
|
||
element,
|
||
left,
|
||
right,
|
||
}))
|
||
}
|
||
}
|
||
}
|
||
|
||
- fold():
|
||
|
||
fn fold<A>(&self, acc: A, f: fn(A, &T) -> A) -> A {
|
||
match self {
|
||
Self::Empty => acc,
|
||
Self::NonEmpty(t) => {
|
||
let acc = t.left.fold(acc, f);
|
||
let acc = f(acc, &t.element);
|
||
t.right.fold(acc, f)
|
||
}
|
||
}
|
||
}
|
||
|
||
Fault Trees
|
||
|
||
- MCS: [[V], [S1,S2], [S1,S3], [S2,S3]]
|
||
|
||
- MPS: [[V,S1,S2],[V,S1,S3],[V,S2,S3]]
|
||
|
||
- Top-Level Probability:
|
||
|
||
- S1 & S2 (and others): 0.01
|
||
- || over &: 0.0297
|
||
- Top-Level ||: 0.0394
|
||
|
||
Summary
|
||
|
||
What did you learn?
|
||
|
||
- How tree data structures and algorithms are implemented in Rust.
|
||
- How fault trees are used to do dependability analysis.
|
||
- How fault trees and some of their algorithms can be implemented in
|
||
Rust.
|
||
|
||
Where can you learn more?
|
||
|
||
- Embedded Software Development for Safety-Critical Systems: Ch. 12
|
||
- Fault Tree Analysis:
|
||
- on Wikipedia
|
||
- Survey by Enno Ruijters and Mariëlle Stoelinga (University of
|
||
Twente)
|
||
- Overview Article by Sohag Kabir (University of Hull)
|
||
|
||
W09: Work Sheet
|
||
|
||
Tree Algorithms
|
||
|
||
One of the projects you get assigned to work on makes use of tree data
|
||
structures. To prepare you, this work sheet focuses on implementing
|
||
tree-based algorithms.
|
||
|
||
Here is an example_tree for which we show outputs for every method to be
|
||
implemented:
|
||
|
||
4
|
||
/ \
|
||
3 6
|
||
/ / \
|
||
2 5 7
|
||
/
|
||
1
|
||
|
||
Informational Algorithms
|
||
|
||
- Add a method depth that computes the depth or height of the tree:
|
||
fn depth(&self) -> u32
|
||
|
||
assert_eq!(example_tree.depth(), 4);
|
||
|
||
- Add a method leaves that returns a vector with all the leaf
|
||
elements: fn leaves(&self) -> Vec<&T>
|
||
|
||
assert_eq!(example_tree.leaves(), vec![&1, &5, &7]);
|
||
|
||
Projecting
|
||
|
||
- Add a method project_inorder that returns the tree elements
|
||
in-order: fn project_inorder(&self) -> Vec<&T>
|
||
|
||
assert_eq!(example_tree.project_inorder(), vec![&1, &2, &3, &4, &5, &6, &7]);
|
||
|
||
- Add a method project_postorder that returns the tree elements
|
||
post-order: fn project_postorder(&self) -> Vec<&T>
|
||
|
||
assert_eq!(example_tree.project_postorder(), vec![&1, &2, &3, &5, &7, &6, &4]);
|
||
|
||
Finding
|
||
|
||
- Add a method find that returns the first element where a predicate f
|
||
returns true: fn find(&self, f: fn(&T) -> bool) -> Option<&T>
|
||
|
||
assert_eq!(example_tree.find(|&e| e >= 5), Some(&6));
|
||
|
||
Balancing
|
||
|
||
- Add a method balanced that returns whether a tree is balanced or not
|
||
(height difference between leaves max. 1):
|
||
fn balanced(&self) -> bool
|
||
|
||
assert_eq!(example_tree.balanced(), false);
|
||
|
||
- Add a method balance that turns a tree into a balanced version:
|
||
fn balance(self) -> Self
|
||
|
||
4
|
||
/ \
|
||
/ \
|
||
2 6
|
||
/ \ / \
|
||
1 3 5 7
|
||
|
||
Map & Fold
|
||
|
||
- Add a method map that turns each element of the tree into something
|
||
different:
|
||
fn map<U: Ord + Clone>(self, f: fn(T) -> U) -> BinaryTree<U>
|
||
|
||
assert_eq!(example_tree.map(|e| e * 2).project_inorder(), vec![&2, &4, &6, &8, &10, &12, &14]);
|
||
|
||
- Add a method fold that traverses a tree inorder and folds the values
|
||
to an accumulator: fn fold<U>(&self, acc: A, f: fn(A, &T) -> A) -> A
|
||
|
||
assert_eq!(example_tree.fold(0, |a,e| a + e), 28);
|
||
|
||
Fault Tree Analysis
|
||
|
||
We consider a triple-modular redundancy scheme with a voter (V) and
|
||
three systems (S1, S2, S3). The fault tree of this system looks as in
|
||
the following diagram.
|
||
|
||
System
|
||
failed
|
||
|
|
||
+-----+
|
||
| >=1 |
|
||
+-----+
|
||
| |
|
||
| +--+
|
||
| |
|
||
| +-------+
|
||
| | >=1 |
|
||
| +-------+
|
||
| | | +----------------+
|
||
| | +---------+ |
|
||
| | | |
|
||
| +-------+ +-------+ +-------+
|
||
| | & | | & | | & |
|
||
| +-------+ +-------+ +-------+
|
||
| | | | | | |
|
||
O O O O O O O
|
||
^ ^ ^ ^ ^ ^ ^
|
||
V | S2 | S3 | S3
|
||
failed | failed | failed | failed
|
||
S1 S1 S2
|
||
failed failed failed
|
||
|
||
Your task is now to:
|
||
|
||
- Compute the minimal cut sets.
|
||
- Compute the minimal path sets.
|
||
- Compute the top-level failure probability using the gate formulas
|
||
and \(P_V = 0.01\) and \(P_{S1} = P_{S2} = P_{S3} = 0.1\).
|
||
|
||
Generics and Traits
|
||
|
||
You have already encountered generic types and traits and now is the
|
||
time to take a closer look at these two fundamental features of Rust.
|
||
Both allow you to write code that is able to operate on many different
types, not just a single one.
|
||
|
||
This section is intentionally kept brief and you should read the
|
||
excellent 10th chapter of the Rust book if you have any doubts or want
|
||
a more in-depth introduction to generics and traits.
|
||
|
||
Generics
|
||
|
||
Generic Structs and Enums
|
||
|
||
First, we look at a generic type: Point<T>, a 2-dimensional point that
|
||
can be defined for different scales:
|
||
|
||
struct Point<T> {
|
||
x: T,
|
||
y: T,
|
||
}
|
||
|
||
fn main() {
|
||
let integer = Point { x: 5, y: 10 };
|
||
let float = Point { x: 1.0, y: 4.0 };
|
||
// let not_possible = Point { x: 1.0, y: 4 };
|
||
}
|
||
|
||
Note how the definition itself is independent of the concrete type used for the
|
||
two dimensions. We can use, e.g., integers or floats to specify them. It
|
||
is also possible to use complex numbers (or something awkward such as
|
||
strings) as instantiations of T, as long as both are the same.
|
||
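Methods on such a generic struct are declared with the same type parameter. As a small sketch that extends the Point<T> from above (the accessor name x is our invention):

impl<T> Point<T> {
    // available for every instantiation of T
    fn x(&self) -> &T {
        &self.x
    }
}

With this, both integer.x() and float.x() from the example above work.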
|
||
Two common examples for generic enums are the Result<T,E> and Option<T>
|
||
types, that are defined like this:
|
||
|
||
enum Option<T> {
|
||
Some(T),
|
||
None,
|
||
}
|
||
|
||
enum Result<T, E> {
|
||
Ok(T),
|
||
Err(E),
|
||
}
|
||
|
||
In both cases, the variants can contain arbitrary types (or none for
|
||
None), that can, e.g., be extracted via pattern matching.
|
||
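For example, extracting the payload of an Option via pattern matching looks like this:

fn main() {
    let maybe_number: Option<u32> = Some(42);

    match maybe_number {
        Some(value) => println!("got {}", value),
        None => println!("got nothing"),
    }
}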
|
||
Generic Functions
|
||
|
||
Another typical use case for generics is functions that are capable of
|
||
working on any type. For instance, consider the following function:
|
||
|
||
fn largest<T>(list: &[T]) -> T {
|
||
let mut largest = list[0];
|
||
|
||
for &item in list {
|
||
if item > largest {
|
||
largest = item;
|
||
}
|
||
}
|
||
|
||
largest
|
||
}
|
||
|
||
First, we realize that it is generic in T. To achieve this, all code
|
||
inside the body must be independent of which T we have and what is
|
||
supported on T. If we look through it line by line, we see that a value
of type T is assigned once (which any type supports) and compared against
another value of type T. The latter aspect is the reason why you get a
|
||
compiler error when executing this code: nobody guaranteed that you can
|
||
do T > T. We can achieve this by adding a trait bound, i.e., limiting on
|
||
which types our function is defined. For supporting >, T must implement
|
||
the std::cmp::PartialOrd trait and we change the function signature to:
|
||
|
||
fn largest<T: std::cmp::PartialOrd>(list: &[T]) -> T
|
||
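One detail remains: largest = list[0] moves a value out of the slice, so the compiler additionally asks for T: Copy (alternatively, the function could return &T). A version that compiles, for illustration:

fn largest<T: std::cmp::PartialOrd + Copy>(list: &[T]) -> T {
    let mut largest = list[0];

    for &item in list {
        if item > largest {
            largest = item;
        }
    }

    largest
}

fn main() {
    assert_eq!(largest(&[1, 7, 3]), 7);
    assert_eq!(largest(&[0.5, 0.1, 0.3]), 0.5);
}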
|
||
Monomorphization
|
||
|
||
Finally, a note on performance: Supporting generics means that the
|
||
different types (remember this implements polymorphism) must be handled
|
||
differently at machine-level, despite their common definition. One way
|
||
to do this is using virtual function calls, where we have a distinction
|
||
at run-time which type is present and which code is executed. However,
|
||
Rust uses a different approach, where the generic code is monomorphized,
|
||
i.e., for each used type, a distinct implementation is generated,
|
||
optimized, and referenced at the call-site. This increases compile time
|
||
but reduces run time, making Rust generics faster than generics in some
|
||
other languages.
|
||
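Conceptually, if the Point<T> from above is used with i32 and f64, the compiler generates something along these lines (the names are invented for illustration; the real compiler uses internal symbols):

// roughly what monomorphization produces for Point<i32> and Point<f64>
struct Point_i32 { x: i32, y: i32 }
struct Point_f64 { x: f64, y: f64 }

fn main() {
    let integer = Point_i32 { x: 5, y: 10 };
    let float = Point_f64 { x: 1.0, y: 4.0 };
    println!("{} {}", integer.x, float.y);
}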
|
||
Traits
|
||
|
||
Using Traits
|
||
|
||
As you have seen already, traits encapsulate a certain feature or
|
||
property a type has or supports. It can be seen as a capability:
|
||
something a type can do. In U04, you saw the std::iter::Iterator trait
|
||
which is implemented for types that can produce a sequence of values.
|
||
|
||
When we want to use traits, we must make sure that the trait itself is
|
||
in the current scope. Some of them are already, because they are part of
|
||
std’s prelude, but others must be brought into scope with use.
|
||
|
||
For example, the following code is only valid with the first line:
|
||
|
||
use std::io::Write;
|
||
|
||
let mut buf: Vec<u8> = vec![];
|
||
buf.write_all(b"hello");
|
||
|
||
Vec<u8> implements Write, but for Write::write_all to be accessible, it
|
||
must be in scope. This is to avoid naming conflicts, as types can
|
||
implement multiple traits with, potentially, identical function names.
|
||
In these cases, you use fully qualified method calls:
|
||
|
||
Write::write_all(&mut buf, b"hello");
|
||
OtherWriteTrait::write_all(&mut buf, b"hello");
|
||
|
||
Implementing Traits
|
||
|
||
You can define your own traits like this:
|
||
|
||
/// A trait for things that can be moved around
|
||
trait Moveable {
|
||
fn move_by(&mut self, distance: Point); // `move` is a reserved keyword in Rust, so we use move_by
|
||
fn rotate(&mut self, angle: Angle);
|
||
}
|
||
|
||
Implementing it can be done like:
|
||
|
||
impl Moveable for Container {
|
||
fn move_by(&mut self, distance: Point) {
|
||
self.origin += distance;
|
||
}
|
||
|
||
fn rotate(&mut self, angle: Angle) {
|
||
// ...
|
||
}
|
||
}
|
||
|
||
Note that while you can write your own traits and implementations, it is
|
||
also possible to implement third-party traits for your own types (as you
|
||
see in a minute).
|
||
|
||
Traits can also be used to implement Default Methods. For instance,
|
||
consider a Sink writer (i.e. it implements Write) that simply discards
|
||
the data (you can think of this as > /dev/null on Linux):
|
||
|
||
pub struct Sink;
|
||
|
||
use std::io::{Write, Result};
|
||
|
||
impl Write for Sink {
|
||
fn write(&mut self, buf: &[u8]) -> Result<usize> {
|
||
Ok(buf.len()) // claim the full data has been written
|
||
}
|
||
|
||
fn flush(&mut self) -> Result<()> {
|
||
Ok(())
|
||
}
|
||
}
|
||
|
||
As you see, we only specified the write and flush methods. If something
|
||
implements Write, it also supports the write_all method you have seen
|
||
before. This is done via a default implementation in the Write trait:
|
||
|
||
trait Write {
|
||
fn write(&mut self, buf: &[u8]) -> Result<usize>;
|
||
fn flush(&mut self) -> Result<()>;
|
||
fn write_all(&mut self, buf: &[u8]) -> Result<()> {
|
||
let mut bytes_written = 0;
|
||
while bytes_written < buf.len() {
|
||
bytes_written += self.write(&buf[bytes_written..])?;
|
||
}
|
||
Ok(())
|
||
}
|
||
}
|
||
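So our Sink from above gets write_all without us writing it. A short usage sketch (assuming the Sink type defined earlier in this section):

use std::io::Write;

fn main() -> std::io::Result<()> {
    let mut sink = Sink;
    // provided by the default implementation in the Write trait
    sink.write_all(b"this text is silently discarded")?;
    Ok(())
}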
|
||
Utility Traits
|
||
|
||
We conclude this section by having a closer look at a couple of utility
|
||
traits that are part of the standard library… and can be used to work
|
||
with fantasy creatures.
|
||
|
||
With Drop, we can implement a custom destruction method for a type:
|
||
|
||
struct Pokémon {
|
||
name: String,
|
||
// ...
|
||
}
|
||
|
||
impl Pokémon {
|
||
fn new(name: String) -> Self {
|
||
println!("A wild {} appears!", name);
|
||
Self {
|
||
name,
|
||
// ...
|
||
}
|
||
}
|
||
}
|
||
|
||
impl Drop for Pokémon {
|
||
fn drop(&mut self) {
|
||
println!("{} disappears!", self.name);
|
||
}
|
||
}
|
||
|
||
fn main() {
|
||
println!("Game start.");
|
||
{
|
||
let pikachu = Pokémon::new("Pikachu".into());
|
||
} // pikachu is dropped at the scope end
|
||
println!("Game end.");
|
||
}
|
||
|
||
With Default, we can define default values:
|
||
|
||
enum Pokéball {
|
||
Empty,
|
||
Filled(Pokémon),
|
||
}
|
||
|
||
impl Default for Pokéball {
|
||
fn default() -> Self {
|
||
Pokéball::Empty
|
||
}
|
||
}
|
||
|
||
fn main() {
|
||
let ball : Pokéball = Default::default();
|
||
}
|
||
|
||
Finally, there are the From / Into and TryFrom / TryInto trait pairs
|
||
used to do conversions:
|
||
|
||
impl From<Pokémon> for Pokéball {
|
||
fn from(pokémon: Pokémon) -> Self {
|
||
println!("{} was captured.", pokémon.name);
|
||
Self::Filled(pokémon)
|
||
}
|
||
}
|
||
|
||
fn main() {
|
||
let pikachu = Pokémon::new("Pikachu".into());
|
||
let ball: Pokéball = pikachu.into();
|
||
// or
|
||
// let ball = Pokéball::from(pikachu);
|
||
}
|
||
|
||
Note that, thanks to a blanket implementation in the standard library, if
you implement From, you get the matching Into for free. The Try variants
are fallible, i.e. they return a Result<Self, E>.
|
||
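For illustration, a fallible conversion in the other direction could look like this (the String error type is our arbitrary choice):

impl TryFrom<Pokéball> for Pokémon {
    type Error = String;

    fn try_from(ball: Pokéball) -> Result<Self, Self::Error> {
        match ball {
            Pokéball::Filled(pokémon) => Ok(pokémon),
            Pokéball::Empty => Err("the ball is empty".to_string()),
        }
    }
}

fn main() {
    let result = Pokémon::try_from(Pokéball::Empty);
    assert!(result.is_err());
}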
|
||
Macros
|
||
|
||
In your computing career, you might have heard the term macros before
|
||
(which is short for macro instruction, i.e. a long/large instruction).
|
||
Abstractly, it is defined as a rule or pattern that specifies how a
|
||
certain input should be mapped to a replacement output (cf. Wikipedia).
|
||
Now this sounds like any function, and indeed a macro is a function. The
|
||
difference, though, is that macros usually produce inputs to a computer
|
||
program (e.g. characters, keystrokes, or syntax trees) — automating the
|
||
process of using the program. Spreadsheet or photography applications
|
||
often provide this to turn a sequence of arbitrary process steps into a
|
||
single instruction. For us, we mostly care about macros that take code
|
||
and produce (usually more) code. The expansion happens during
|
||
compilation, which means that compilation metadata is also available.
|
||
|
||
In this section, we will learn about different types of macros. You are
|
||
already familiar with the println! macro for printing a formatted
|
||
string.
|
||
|
||
Reasons for Macros
|
||
|
||
Before we get into the details of how to use Rust macros, let’s revisit
|
||
the various use cases:
|
||
|
||
- Avoid Boilerplate Code
|
||
- Domain Specific Languages
|
||
- Conditional Compilation
|
||
- Inlining
|
||
|
||
Avoid Boilerplate Code
|
||
|
||
First of all, as macros simply automate the process of code-production,
|
||
they can be used to simplify the generation of repetitive code. This
|
||
means, whenever you realize that you are writing the same kind of code
|
||
over and over (with the only difference that you might be doing it for
|
||
different types or other slight variations), macros can help. Note that,
|
||
in the case of your variable being a type, generics should be an easier
|
||
solution. Leveraging macros in these situations increases the
|
||
maintainability with respect to:
|
||
|
||
- readability, i.e. developers first understand the macro (or infer it
|
||
from the name) and then the usages,
|
||
- changeability, i.e. changes can be done once and are applied
|
||
everywhere.
|
||
|
||
The best code is no code at all. Every line of code you willingly
|
||
bring into the world is code that has to be debugged, code that has to
|
||
be read and understood, code that has to be supported. - Jeff Atwood
|
||
|
||
A straightforward example is the println!() macro that allows us to pass
|
||
a format string and a variable list of arguments. We can use
|
||
cargo-expand to show how all macros in our code are expanded. This piece
|
||
of code:
|
||
|
||
fn main() {
|
||
println!("Macro magic {}!", "rulz!");
|
||
}
|
||
|
||
is expanded into:
|
||
|
||
#![feature(prelude_import)]
|
||
#[prelude_import]
|
||
use std::prelude::rust_2021::*;
|
||
#[macro_use]
|
||
extern crate std;
|
||
fn main() {
|
||
{
|
||
::std::io::_print(::core::fmt::Arguments::new_v1(&["Macro magic ",
|
||
"!\n"], &[::core::fmt::ArgumentV1::new_display(&"rulz!")]));
|
||
};
|
||
}
|
||
|
||
While there are many new pieces of code added, look at the usage of
|
||
_print function. Imagine, you had to write this code every time you
|
||
wanted to print something as simple as the string above. Additionally,
|
||
println!() supports a variable number of arguments (which normal
|
||
functions do not) so the macro helps here too by turning arguments into
|
||
lists of elements.
|
||
|
||
Domain-Specific Languages (DSL)
|
||
|
||
Another common use case for macros are languages that are
|
||
domain-specific. This can mean various things, one example could already
|
||
be the table-based tests, we wrote in U02. A tester only needs to
|
||
understand the test-specification language and needs to have no clue
|
||
about Rust.
|
||
|
||
While Python is not really domain-specific (it is a general purpose
|
||
language), the following example showcases how a DSL would be used. The
|
||
inline-python crate provides the python!{...} macro that allows a
|
||
developer to write Python code in Rust. This includes that data can be
|
||
shared between the two. Here is an example where we assume we have an
|
||
existing algorithm in Python and want to use it one-to-one in Rust:
|
||
|
||
use inline_python::{Context,python};
|
||
let c: Context = python! {
|
||
def fib(n):
|
||
if n == 0 or n == 1:
|
||
return 1
|
||
else:
|
||
return fib(n-1) + fib(n-2)
|
||
res = fib(7)
|
||
};
|
||
|
||
assert_eq!(c.get::<i32>("res"), 21);
|
||
|
||
Conditional Compilation
|
||
|
||
When you develop larger software projects, you face the challenge that
|
||
some parts of your code are necessary in some situations but not all. An
|
||
example could be debugging code or platform-specific
|
||
code (e.g. Windows-specific behaviour). The manual solutions to this are
|
||
to comment in/out code on demand or introduce global boolean variables
|
||
to enable/disable functionality. Global variables have the drawback that
|
||
the code itself is still compiled into the binary, i.e. you are “paying
|
||
in binary size” for code that is never used. Comments overcome this
|
||
issue, but adding/removing comments is tedious (and does not integrate
|
||
well with version control). Furthermore, as both things are done
|
||
manually, they impose a risk for dependability (as both comments and
|
||
global variables can be overlooked).
|
||
|
||
The most elegant and dependable solution is to use conditional
|
||
compilation. This means that, at the time of compilation, various
|
||
conditions get evaluated and depending on the result, parts of the code
|
||
are still used or not. Let’s take the ntohs function as an example,
|
||
which converts a network u16 to a host u16, respecting endianness. What
|
||
does this mean? While people have agreed that multi-byte numbers on the
|
||
network are sent as big-endian (most significant byte first), most
|
||
desktop systems are little-endian (most significant byte last). Hence,
|
||
our function should take the CPU endianness into account, which is
|
||
available via the target_endian configuration option.
|
||
|
||
#[cfg(target_endian = "big")]
|
||
fn ntohs(input: u16) -> u16 {
|
||
input
|
||
}
|
||
|
||
#[cfg(target_endian = "little")]
|
||
fn ntohs(input: u16) -> u16 {
|
||
input.swap_bytes()
|
||
}
|
||
|
||
fn main() {
|
||
println!("{:X}", ntohs(0xA010));
|
||
}
|
||
|
||
Here, on a big-endian system, ntohs is an identity function (which might
|
||
be optimized away by a clever compiler). On the little-endian system,
|
||
however, the bytes must be swapped.
|
||
|
||
In fact, the attributes here are built-in attributes and not
|
||
(attribute-)macros, so the compiler itself knows how to interpret
|
||
them. However, other more complex forms of conditional compilation can
|
||
be realized using macros.
|
||
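A related tool is the built-in cfg! macro, which evaluates the same kind of condition to a boolean inside ordinary code. Note that, unlike the attribute, both branches are still compiled:

fn main() {
    if cfg!(target_endian = "little") {
        println!("running on a little-endian target");
    } else {
        println!("running on a big-endian target");
    }
}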
|
||
Inlining
|
||
|
||
If you are writing performant software in a modular way, you often end
|
||
up with functions that are called frequently (often referred to as hot
|
||
functions). Entering and exiting a function does not come for
|
||
free (variables must be copied, stacks prepared, etc.). A solution to
|
||
this is to remove the function and inline its functionality, where it is
|
||
needed. This has multiple drawbacks: a) readability is lost, as the
|
||
function with a name is replaced with a (complex) expression, b)
|
||
maintainability is lost, as changing the function means changing every
|
||
occurrence. For these reasons, inlining should not be done manually, but
|
||
rather using compiler-support. In C/C++, people often use macros for
|
||
this or the inline keyword. The former will always do the replacement,
|
||
while the latter leaves it to the compiler’s implementation.
|
||
|
||
Similar to C/C++, we can use macros to inline functionality in Rust.
|
||
However, the more common approach is to use attributes to specify
|
||
whether a function is inline or not. There are four cases:
|
||
|
||
- No attribute. If we do not specify anything, the compiler might
|
||
decide to inline it (depending on optimization level, function size
|
||
etc.). These functions are never inlined across crates.
|
||
- #[inline] suggests the function to be inlined, also across crates.
|
||
- #[inline(always)] this strongly suggests the function to be inlined,
|
||
but the compiler might still decide not to (in exceptional cases).
|
||
- #[inline(never)] strongly suggests the function should not be
|
||
inlined.
|
||
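For illustration, the attributes are simply attached to the function definition:

// strongly suggest inlining this small, hot helper
#[inline(always)]
fn squared(x: u64) -> u64 {
    x * x
}

// keep this as a real call, e.g. so it stays visible in profiles
#[inline(never)]
fn expensive_setup() {
    // ...
}

fn main() {
    let _ = squared(12);
    expensive_setup();
}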
|
||
Note that, again, these are not attribute macros but built-in
|
||
attributes. However, we mention them here because a) other languages
|
||
use macros for inlining and b) attribute syntax is used, which makes
|
||
them look similar.
|
||
|
||
On Programming Syntax
|
||
|
||
As mentioned before, macros take an input and produce and output (Rust
|
||
code). In general, the grammar of a programming language defines how a
|
||
string (x = 5) is turned into a sequence of tokens
|
||
([Variable(x), Operator(=), Literal(5)]). These tokens are the building
|
||
blocks for the syntax of a language.
|
||
|
||
In compiler terms, we go from a raw string via the process of lexing to
|
||
a token stream. The following line of code
|
||
|
||
let value = 40 + 2;
|
||
//1 2 3 4 5 67 <- index
|
||
|
||
is transformed into this stream of tokens:
|
||
|
||
TokenStream [
|
||
1 Ident { sym: let },
|
||
2 Ident { sym: value },
|
||
3 Punct { char: '=', spacing: Alone },
|
||
4 Literal { lit: 40 },
|
||
5 Punct { char: '+', spacing: Alone },
|
||
6 Literal { lit: 2 },
|
||
7 Punct { char: ';', spacing: Alone }
|
||
]
|
||
|
||
A token stream can then be transformed into a Rust syntax fragment,
|
||
e.g. a statement in the case above or an expression (based on string
|
||
5 * 5).
|
||
|
||
Macros in Rust
|
||
|
||
Having talked about the pros and cons of macros, let’s see how to use
|
||
them in Rust. First, we must distinguish between two types:
|
||
|
||
- Declarative Macros:
|
||
- declared using the macro_rules!() macro
|
||
- leverage special mini-language to declare macros (match &
|
||
replace)
|
||
- limited in functionality
|
||
- Procedural Macros:
|
||
- declared in a dedicated proc-macro crate
|
||
- take raw TokenStreams as both input and output
|
||
- offer maximum functionality
|
||
|
||
Declarative Macros
|
||
|
||
The first, and easier, class of macros are the declarative ones. They
|
||
can be defined using the macro_rules!() macro in any crate. They act in
|
||
a copy and paste manner, i.e. they have transformation rules that are
|
||
simply applied. The input to a declarative macro is a syntax
|
||
fragment (e.g. an expression, identifier, …) which is used to generate
|
||
code according to a template. Finally, a macro must be defined before
|
||
the invocation, limiting the places where it can be introduced.
|
||
|
||
The general structure of a declarative macro is as follows:
|
||
|
||
macro_rules! macro_name {
|
||
(matcher1) => { transcriber1 }
|
||
// ...
|
||
(matcherN) => { transcriberN }
|
||
}
|
||
|
||
The macro_name can be picked mostly freely and will be used to invoke
|
||
the macro. Afterwards, there is a set of matcher-transcriber pairs,
|
||
which can be thought of as patterns in pattern matching.
|
||
|
||
Matchers try to match the given syntax fragment to its own regex. The
|
||
syntax fragments are also captured in metavariables, allowing access to
|
||
them. The following illustrates that
|
||
|
||
($var:ident, $val:expr)
|
||
|
||
would match
|
||
|
||
some_variable_name, 42 + 17 * 3
|
||
|
||
Matching sequences is also possible with $()<OP>. <OP> can be
|
||
|
||
- *: any number of repetitions
|
||
- +: any number, but at least one
|
||
- ?: optional fragment, zero or one occurrence
|
||
|
||
An example would be this:
|
||
|
||
$($key:expr => $value:expr),+
|
||
|
||
would match
|
||
|
||
1 => 2 + 3, 4 => 5 * 6
|
||
|
||
These concepts are in play in the vec! macro (with invocations in
|
||
comments):
|
||
|
||
macro_rules! vec {
|
||
() => { ... }; // vec![]
|
||
($elem : expr ; $n : expr) => { ... }; // vec![1; 100]
|
||
($($x: expr),+ $(,)?) => { ... }; // vec![1,2,3] or vec![1, 2, 3]
|
||
}
|
||
|
||
Note that invocation of macros can be done with (), [], or {}. All of
|
||
them are equivalent. However, there are common conventions (e.g. [] for
|
||
collections, {} for larger blocks, and () for single-lines).
|
||
|
||
A transcriber then declares how the captured metavariables are
|
||
transformed into code. This can make use of metavariables as mentioned
|
||
before. Here is an example of a macro that creates a vector of numbers
|
||
in [min, max) (exclusive end):
|
||
|
||
macro_rules! ranged_vec {
|
||
($min:expr, $max:expr) => {
|
||
($min..$max).collect::<Vec<_>>()
|
||
};
|
||
}
|
||
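Used like this (note the exclusive end; as mentioned above, the macro must be defined before this point):

fn main() {
    assert_eq!(ranged_vec!(1, 4), vec![1, 2, 3]);
}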
|
||
Procedural Macros
|
||
|
||
This form of macro comes in three distinct types:
|
||
|
||
- Function-Like Macros: custom!(...)
|
||
- Derive Macros: #[derive(CustomDerive)]
|
||
- Attribute Macros: #[CustomAttribute]
|
||
|
||
In contrast to declarative macros, procedural macros must be defined in
|
||
a proc-macro library. They are also compiled differently and tend to
|
||
lead to an increase in compile-time compared to non-macro code. The
|
||
Cargo.toml must look like this:
|
||
|
||
[package]
|
||
name = "dsys-macros" # arbitrary name
|
||
version = "0.1.0"
|
||
edition = "2021"
|
||
|
||
[lib]
|
||
proc-macro = true
|
||
|
||
Each proc macro is then a function in this library:
|
||
|
||
#[proc_macro]
|
||
pub fn dsys(input: TokenStream) -> TokenStream {
|
||
let output = TokenStream::new();
|
||
// ... do the actual work ...
|
||
output
|
||
}
|
||
|
||
Function-like and attribute macros replace their input completely
|
||
(though parts of the input can be maintained within the transformation
|
||
function). Derive macros instead do not replace, but rather extend what
|
||
they are applied to, like this:
|
||
|
||
#[derive(CustomDerive)] // <--- this
|
||
struct CustomStruct {
|
||
// ...
|
||
}
|
||
|
||
// generates for example this:
|
||
impl CustomStruct {
|
||
// ...
|
||
}
|
||
|
||
In contrast to declarative macros, the input token streams are taken
|
||
as-is (no matching applied) and the output token stream must also be
|
||
composed manually (as opposed to the transcriber syntax). In practice,
|
||
developers use the syn crate for parsing inputs and the quote crate for
|
||
producing outputs. syn can parse arbitrary Rust code into an Abstract
|
||
Syntax Tree (AST). Afterwards, one would analyze the AST and produce
|
||
tokens accordingly. For this use-case, quote!(...) helps as Rust code
|
||
passed in as ... is a TokenStream and can be treated as data. There is
|
||
also parse_quote!(...) which returns a parsed syn element instead of a
|
||
TokenStream.
|
||
|
||
Macros in Action
|
||
|
||
Function-Like Macro
|
||
|
||
These macros are the most basic form as they can accept any input and
|
||
produce any output. The TokenStream resulting from the function is
|
||
injected inplace. This is often necessary for complex tasks, for
|
||
instance when computation over input must be done. Wherever possible,
|
||
declarative macros should be used instead of function-like procedural
|
||
macros (as they are simpler). A popular example is the json! macro from
the serde_json crate.
|
||
|
||
Here, we build our own macro timeit! that takes an arbitrary expression,
|
||
measures how long it computes, and prints this to the console:
|
||
|
||
// lib.rs
|
||
use proc_macro::TokenStream;
|
||
use quote::quote;
|
||
|
||
#[proc_macro]
|
||
pub fn timeit(input: TokenStream) -> TokenStream {
|
||
let input_code = input.to_string();
|
||
let input: proc_macro2::TokenStream = input.into();
|
||
quote!({
|
||
let start = std::time::Instant::now();
|
||
let result = #input;
|
||
println!("`{}` took {:?}", #input_code, start.elapsed());
|
||
result
|
||
})
|
||
.into()
|
||
}
|
||
|
||
Later, we use it like this:
|
||
|
||
// main.rs
|
||
|
||
use macros::timeit;
|
||
|
||
fn main() {
|
||
let f = timeit!(5 * 5);
|
||
println!("Result: {}", f);
|
||
}
|
||
|
||
Derive Macro
|
||
|
||
These macros can only be applied to struct or enum declarations and
|
||
cannot stand freely. Furthermore, they cannot alter the input stream,
|
||
but rather add functionality to the input declaration. The most common
|
||
use case is the automated implementation of traits and associated
|
||
functionality (as seen in a previous section of this unit).
|
||
|
||
Assume that we want to build a Description trait that can be
|
||
automatically derived for types, including additional attributes. The
|
||
usage looks like this:
|
||
|
||
// main.rs
|
||
|
||
#[derive(Description)]
|
||
pub enum Mode {
|
||
#[description("System completely disabled.")]
|
||
Off = 0,
|
||
#[description("System in limited recovery mode.")]
|
||
Recovery = 5,
|
||
#[description("System fully operational.")]
|
||
On = 9,
|
||
}
|
||
|
||
fn main() {
|
||
println!("{}", Mode::Recovery.description());
|
||
// Should print "[5] System in limited recovery mode."
|
||
}
|
||
|
||
The implementation of the macro looks like this:
|
||
|
||
// lib.rs
|
||
|
||
use quote::{quote, ToTokens};
use syn::{parse_macro_input, DeriveInput};
|
||
|
||
#[proc_macro_derive(Description, attributes(description))]
|
||
pub fn derive_description(input: proc_macro::TokenStream) -> proc_macro::TokenStream {
|
||
let input = parse_macro_input!(input as DeriveInput);
|
||
if let syn::Data::Enum(data) = input.data {
|
||
let arms: Vec<_> = data
|
||
.variants
|
||
.into_iter()
|
||
.map(enum_variant_to_match_arm)
|
||
.collect();
|
||
|
||
let ty = input.ident;
|
||
quote! {
|
||
impl Description for #ty {
|
||
fn description(&self) -> &str {
|
||
match self {
|
||
#(#arms),*
|
||
}
|
||
}
|
||
}
|
||
}
|
||
.into()
|
||
} else {
|
||
panic!("Description can only be derived on enums.");
|
||
}
|
||
}
|
||
|
||
fn enum_variant_to_match_arm(variant: syn::Variant) -> proc_macro2::TokenStream {
|
||
let attribute_ident: proc_macro2::Ident = quote::format_ident!("description");
|
||
|
||
let description = variant
|
||
.attrs
|
||
.iter()
|
||
.find(|attr| {
|
||
attr.path
|
||
.get_ident()
|
||
.map(|ident| ident == &attribute_ident)
|
||
.is_some()
|
||
})
|
||
.expect(
|
||
"When deriving Description, each variant must have one #[description(...)] attribute.",
|
||
);
|
||
|
||
let tokens = description.tokens.clone().into_iter().collect::<Vec<_>>();
|
||
if tokens.len() == 1 {
|
||
if let proc_macro2::TokenTree::Group(g) = &tokens[0] {
|
||
let description: syn::LitStr = syn::parse2(g.stream())
|
||
.expect("#[description(...)] argument should be a literal string.");
|
||
|
||
let discriminant = if let Some((_, discriminant)) = variant.discriminant {
|
||
discriminant.to_token_stream().to_string()
|
||
} else {
|
||
"?".to_string()
|
||
};
|
||
let result = format!("[{}] {}", discriminant, description.value());
|
||
let variant_ident = variant.ident;
|
||
quote! {
|
||
Self::#variant_ident => #result
|
||
}
|
||
} else {
|
||
panic!("#[description(...)] argument must be wrapped in ().")
|
||
}
|
||
} else {
|
||
panic!("#[description(...)] should have exactly one argument.");
|
||
}
|
||
}
|
||
|
||
The macro first checks if it is applied to an enum. If so, each enum
|
||
variant is transformed into a match arm to be later added to the
|
||
impl Description block that implements the trait. Within
|
||
enum_variant_to_match_arm, we validate that the variant has an attribute
|
||
and the attribute has the following form:
|
||
#[description("A literal string")]. Eventually, the variant identifier
|
||
and the literal string are used to compose the description text.
|
||
|
||
Attribute Macros
|
||
|
||
Finally, attribute macros also work on items (e.g. struct, enum, or
|
||
function) but replace instead of extend. This can be seen from their
|
||
signature in the following example:
|
||
|
||
#[proc_macro_attribute]
|
||
pub fn amend(attr: TokenStream, item: TokenStream) -> TokenStream {
|
||
println!("attr: \"{}\"", attr.to_string());
|
||
println!("item: \"{}\"", item.to_string());
|
||
item
|
||
}
|
||
|
||
The attribute as well as the item itself are passed to the
|
||
transformation function. Inside the attribute, we can use expressions of
|
||
arbitrary complexity. This can be seen here:
|
||
|
||
#[amend(baz => bar)]
|
||
fn foo() {}
|
||
// out: attr: "baz => bar"
|
||
// out: item: "fn foo() {}"
|
||
|
||
The use cases for this are various:
|
||
|
||
- Framework annotations, e.g. declare a function as a backend route in
|
||
rocket.rs.
|
||
- Transparent middleware, e.g. injecting tracing functionality.
|
||
- Type transformation, e.g. alter the input struct.
|
||
- Test generation, e.g. generate same test for different cases /
|
||
configurations.
|
||
|
||
A helpful crate in this case is darling, which lets us declare a struct
|
||
into which the arguments of the attribute are parsed automatically. The
|
||
following is similar to the timeit function-like macros, but this time
|
||
as an attribute that can be added to functions (as opposed to
|
||
expressions for timeit):
|
||
|
||
use macros::timed;
|
||
|
||
#[timed(fmt = "{} elapsed")]
|
||
fn the_answer() -> usize {
|
||
42
|
||
}
|
||
|
||
fn main() {
|
||
let a = the_answer(); // will print "100ns elapsed" or similar
|
||
println!("The answer is {}", a);
|
||
}
|
||
|
||
The macro is implemented as follows:
|
||
|
||
use darling::FromMeta;
use proc_macro::TokenStream;
use quote::ToTokens;
use syn::{parse_macro_input, parse_quote};
|
||
|
||
#[derive(Debug, FromMeta)]
|
||
struct MacroArgs {
|
||
fmt: String,
|
||
}
|
||
|
||
#[proc_macro_attribute]
|
||
pub fn timed(args: TokenStream, input: TokenStream) -> TokenStream {
|
||
let attr_args = parse_macro_input!(args as syn::AttributeArgs);
|
||
let input = parse_macro_input!(input as syn::ItemFn);
|
||
|
||
let args = match MacroArgs::from_list(&attr_args) {
|
||
Ok(v) => v,
|
||
Err(e) => {
|
||
return TokenStream::from(e.write_errors());
|
||
}
|
||
};
|
||
|
||
let fmt = args.fmt.replace("{}", "{0:#?}");
|
||
|
||
let block = input.block;
|
||
let block = parse_quote! {
|
||
{
|
||
let start = std::time::Instant::now();
|
||
let result = #block;
|
||
println!(#fmt, start.elapsed());
|
||
result
|
||
}
|
||
};
|
||
|
||
syn::ItemFn { block, ..input }.to_token_stream().into()
|
||
}
|
||
|
||
First, a darling::FromMeta struct is defined, which is then parsed and
|
||
used to make the resulting code argument-dependent. In particular, the
|
||
format string of println! is based on the argument. In this use case,
|
||
you also see how we can use the struct copy operation
|
||
({ changed, ..original }) to modify syn structures. Concretely, we parse
|
||
an ItemFn, modify its block (by wrapping it), and return a tokenized
|
||
version again.
|
||
|
||
Hygiene
|
||
|
||
In the context of macros, you often read about hygiene (no worries, no
|
||
showers involved): Before we define hygiene, let’s have a look at an
|
||
unhygienic C example:
|
||
|
||
#include <stdio.h>
|
||
|
||
#define TIMES_TWO(X) X + X
|
||
|
||
int main() {
|
||
int x = TIMES_TWO(3) * 2;
|
||
printf("%d", x);
|
||
return 0;
|
||
}
|
||
|
||
Given the name of the macro, the developer probably intended this to be
|
||
self-contained, i.e. the input number is doubled. However, the example
|
||
use produces 9 instead of 12, as the macro is a 1:1 replacement and
|
||
operator precedence rules are applied afterwards. A common fix is to put
|
||
brackets around these kinds of macros to overcome this (round brackets
|
||
for values; curly brackets for scopes in some C variants).
|
||
|
||
Another example is a macro that uses identifiers:
|
||
|
||
#include <stdio.h>
|
||
#define MODIFY_X(VALUE) x = VALUE;
|
||
|
||
int main() {
|
||
int x = 5;
|
||
MODIFY_X(42)
|
||
printf("%d", x);
|
||
return 0;
|
||
}
|
||
|
||
Here, by accident or not, x is used in both the macro itself and the
|
||
destination scope. Again, you can see that the 1:1 replacements could
|
||
lead to unforseen and hard to debug effects on their environment.
|
||
|
||
In consequence, we call a macro hygienic if it is neither affected by
its surroundings nor affects its surroundings. Without further
limitation, this sounds like macros can either (a) be hygienic and
useless (no effect) or (b) have an effect and be dirty. In fact, we have
|
||
to clarify surroundings more: Obviously, macros add functionality
|
||
(e.g. by introducing new items such as function, structures, statements,
|
||
etc.). This functionality sometimes includes items with identifiers
|
||
(functions, variables, structs, etc.). If a macro uses an identifier
|
||
that is already present in the scope in which it is executed, it is not
|
||
clear how disambiguities are resolved. Here, hygiene comes into play:
|
||
|
||
- For module-level items (e.g. structs, functions), the compiler
|
||
simply complains about the reused identifier (forcing the developer
|
||
to act).
|
||
- For function-level local variables, each macro invocation creates
|
||
its own scope/context.
|
||
- For expressions (as in the C example above), the macro returns an
|
||
expression that stands for itself and is not syntactically merged
|
||
with the destination code.
|
||
|
||
In the following we have two pieces of code:
|
||
|
||
macro_rules! keep_unchanged {
|
||
($x:expr) => {
|
||
value = $x;
|
||
}
|
||
}
|
||
|
||
let mut value = 1;
|
||
keep_unchanged!(2);
|
||
assert_eq!(value, 1);
|
||
|
||
The compiler complains that value is not found in the scope (showing
|
||
that the macro expansion has its own scope). In the second code example,
|
||
we pass an identifier of the environment to the macro, allowing the
|
||
macro to modify it:
|
||
|
||
macro_rules! modify {
|
||
($var:ident, $val:expr) => {
|
||
$var = $val;
|
||
};
|
||
}
|
||
|
||
let mut value = 0;
|
||
modify!(value, 42);
|
||
assert_eq!(value, 42);
|
||
|
||
Finally, what about identifiers used inside the macro, such as Instant?
When referring to items (types, functions, …), the lookup happens like
any other lookup at the call site of the macro. This means that if
|
||
Instant was not brought into scope by use, the compilation fails.
|
||
Furthermore, an item other than the intended one can get picked up
because it has the same name (shadowing). As a consequence, the
|
||
recommendation is to use fully qualified module paths to items in a
|
||
macro (e.g. std::time::Instant).
|
||
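As a small sketch of that recommendation, this macro uses a fully qualified path, so it works regardless of what the call site has imported:

macro_rules! measure {
    ($e:expr) => {{
        // fully qualified, so the caller does not need `use std::time::Instant`
        let start = ::std::time::Instant::now();
        let result = $e;
        println!("took {:?}", start.elapsed());
        result
    }};
}

fn main() {
    let x = measure!(21 * 2);
    assert_eq!(x, 42);
}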
|
||
Reasons Against Macros
|
||
|
||
Now that we have covered use cases and implementations of macros, you
|
||
are probably excited to use them (all over the place). But before you do
|
||
so, let’s think a second about what drawbacks they have:
|
||
|
||
First of all, macros increase the complexity of your code, as
|
||
|
||
- procedural macros introduce an additional crate,
|
||
- declarative macros use the special language for matchers and
|
||
transcribers, and
|
||
- non-trivial macros should be written using syn and quote, which you
|
||
must learn first.
|
||
|
||
The usages of macros tend to look simple, but can be responsible for
|
||
quite some complex code. If used wisely, this is good (as on the
|
||
usage-side, complexity is reduced). If not, a macro-solution can be less
|
||
maintainable and reliable than writing the code manually.
|
||
|
||
Second, macros can be hard to maintain, as developers must understand
|
||
the transformation logic for all the use cases. While this would be the
|
||
same for any function, macros tend not to have such a clear and obvious
|
||
API.
|
||
|
||
This also causes bugs in macros to be harder to find and fix than normal
|
||
code (though cargo-expand can help here).
|
||
|
||
Macros also make it easy to implement unidiomatic behaviour, i.e. you
|
||
can use them to write code that no longer feels like Rust — making it
|
||
potentially hard to understand for others.
|
||
|
||
New programmers especially tend to overuse macros, as they seem like
a powerful tool. Indeed, they are, but they should only be used with care
|
||
and where appropriate.
|
||
|
||
As macros are so powerful, there are also bad ways to use them. One
|
||
anecdote can be found on StackOverflow. Here, we see that the C/C++
|
||
macro system uses the so-called preprocessor, i.e. before compilation
|
||
the macros are one-to-one text replacements, agnostic of language
|
||
syntax. In fact, the macro is used to fix the broken syntax. In Rust,
|
||
this is not possible, as we always work on token trees or token streams
|
||
and not pure text. However, you get the idea that not every use of
|
||
macros is really sensible.
|
||
|
||
U10: Metaprogramming
|
||
|
||
We introduce generics and traits as a means to easily and correctly
|
||
reuse code (or write code that adapts to the use case). We make use of
|
||
them to build both run-time and compile-time state machines. Finally, we
|
||
also cover macros, another way to write code that writes code
|
||
(i.e. metaprogramming).
|
||
|
||
S10: Sample Solution
|
||
|
||
IntoIterator
|
||
|
||
struct ListIterator<T>(List<T>);
|
||
|
||
impl<T: Copy> Iterator for ListIterator<T> {
|
||
type Item = T;
|
||
|
||
fn next(&mut self) -> Option<Self::Item> {
|
||
if let Some(x) = self.0.head() {
|
||
self.0 = self.0.clone().tail();
|
||
Some(x)
|
||
} else {
|
||
None
|
||
}
|
||
}
|
||
}
|
||
|
||
impl<T: Copy> IntoIterator for List<T> {
|
||
type Item = T;
|
||
|
||
type IntoIter = ListIterator<T>;
|
||
|
||
fn into_iter(self) -> Self::IntoIter {
|
||
ListIterator(self)
|
||
}
|
||
}
|
||
|
||
#[test]
|
||
fn test_into_iter() {
|
||
let mut list = List::Empty;
|
||
list.add(5u8);
|
||
list.add(7u8);
|
||
for i in list {
|
||
println!("{}", i);
|
||
}
|
||
}
|
||
|
||
FromIterator
|
||
|
||
impl<T: Copy> FromIterator<T> for List<T> {
|
||
fn from_iter<A: IntoIterator<Item = T>>(iter: A) -> Self {
|
||
let mut list = List::Empty;
|
||
for i in iter {
|
||
list.add(i);
|
||
}
|
||
list
|
||
}
|
||
}
|
||
|
||
|
||
#[test]
|
||
fn test_from_iter() {
|
||
let numbers = std::iter::repeat(5).take(5);
|
||
let list = List::from_iter(numbers);
|
||
assert_eq!(list.length(), 5);
|
||
}
|
||
|
||
Run-Time State Machines
|
||
|
||
#[derive(Clone)]
|
||
pub struct DFA<S, I>
|
||
where
|
||
S: Clone + PartialEq,
|
||
{
|
||
start: S,
|
||
accept: Vec<S>,
|
||
transition: fn(S, I) -> S,
|
||
}
|
||
|
||
impl<S, I> DFA<S, I>
|
||
where
|
||
S: Clone + PartialEq,
|
||
{
|
||
pub fn new(start: S, accept: Vec<S>, transition: fn(S, I) -> S) -> Self {
|
||
Self {
|
||
start,
|
||
accept,
|
||
transition,
|
||
}
|
||
}
|
||
|
||
pub fn run(&self, mut input: Vec<I>) -> bool {
|
||
let mut state = self.start.clone();
|
||
input.reverse();
|
||
while let Some(symbol) = input.pop() {
|
||
state = (self.transition)(state, symbol);
|
||
}
|
||
self.accept.contains(&state)
|
||
}
|
||
}
|
||
|
||
#[derive(Clone, Copy, PartialEq)]
|
||
enum State {
|
||
Ready,
|
||
AwaitMoney { cents: u32 },
|
||
Error,
|
||
}
|
||
|
||
enum Input {
|
||
SelectBeverage,
|
||
Insert1Euro,
|
||
Insert50Cent,
|
||
Insert20Cent,
|
||
Insert10Cent,
|
||
}
|
||
|
||
fn transition(state: State, symbol: Input) -> State {
|
||
let new_state = match (state, symbol) {
|
||
(State::Ready, Input::SelectBeverage) => {
|
||
println!("You selected Ferriskola! An excellent choice :-)");
|
||
State::AwaitMoney { cents: 0 }
|
||
}
|
||
(state, Input::SelectBeverage) => {
|
||
println!("Cannot select a beverage in this state.");
|
||
state
|
||
}
|
||
(State::AwaitMoney { cents }, Input::Insert1Euro) => {
|
||
State::AwaitMoney { cents: cents + 100 }
|
||
}
|
||
(State::AwaitMoney { cents }, Input::Insert50Cent) => {
|
||
State::AwaitMoney { cents: cents + 50 }
|
||
}
|
||
(State::AwaitMoney { cents }, Input::Insert20Cent) => {
|
||
State::AwaitMoney { cents: cents + 20 }
|
||
}
|
||
(State::AwaitMoney { cents }, Input::Insert10Cent) => {
|
||
State::AwaitMoney { cents: cents + 10 }
|
||
}
|
||
(State::Ready, _) => {
|
||
println!("Pick a beverage first before putting in money.");
|
||
State::Ready
|
||
}
|
||
(State::Error, _) => {
|
||
println!("The system is in error state. Please ask the operators to fix it.");
|
||
State::Error
|
||
}
|
||
};
|
||
|
||
if let State::AwaitMoney { cents } = new_state {
|
||
if cents >= 280 {
|
||
println!("Enjoy your Ferriskola. Here are {}c back", cents - 280);
|
||
State::Ready
|
||
} else {
|
||
println!("{}c more to go", 280 - cents);
|
||
new_state
|
||
}
|
||
} else {
|
||
new_state
|
||
}
|
||
}
|
||
|
||
fn main() {
|
||
let dfa = DFA::new(State::Ready, vec![State::Ready], transition);
|
||
assert!(dfa.run(vec![
|
||
Input::SelectBeverage,
|
||
Input::Insert1Euro,
|
||
Input::Insert1Euro,
|
||
Input::Insert10Cent,
|
||
Input::Insert1Euro
|
||
]));
|
||
}
|
||
|
||
Compile-Time State Machine
|
||
|
||
use std::marker::PhantomData;
|
||
|
||
struct MiniPlumber;
|
||
struct NormalPlumber;
|
||
struct FirePlumber;
|
||
|
||
struct Plumber<S> {
|
||
data: PhantomData<S>,
|
||
}
|
||
|
||
struct Shroom;
|
||
struct FireFlower;
|
||
|
||
impl Plumber<MiniPlumber> {
|
||
fn hit(self) {
|
||
println!("Game Over");
|
||
panic!();
|
||
}
|
||
|
||
fn consume_shroom(self, _item: Shroom) -> Plumber<NormalPlumber> {
|
||
println!("Yippie!");
|
||
Plumber::<NormalPlumber> {
|
||
data: Default::default(),
|
||
}
|
||
}
|
||
|
||
fn consume_fireflower(self, _item: FireFlower) -> Plumber<FirePlumber> {
|
||
println!("Whapp whapp whapp!");
|
||
Plumber::<FirePlumber> {
|
||
data: Default::default(),
|
||
}
|
||
}
|
||
}
|
||
|
||
impl Plumber<NormalPlumber> {
|
||
fn new() -> Plumber<NormalPlumber> {
|
||
println!("Flitze-Go!");
|
||
Plumber::<NormalPlumber> {
|
||
data: Default::default(),
|
||
}
|
||
}
|
||
|
||
fn hit(self) -> Plumber<MiniPlumber> {
|
||
println!("Aua!");
|
||
Plumber::<MiniPlumber> {
|
||
data: Default::default(),
|
||
}
|
||
}
|
||
|
||
fn consume_fireflower(self, _item: FireFlower) -> Plumber<FirePlumber> {
|
||
println!("Whapp whapp whapp!");
|
||
Plumber::<FirePlumber> {
|
||
data: Default::default(),
|
||
}
|
||
}
|
||
}
|
||
|
||
impl Plumber<FirePlumber> {
|
||
fn hit(self) -> Plumber<NormalPlumber> {
|
||
println!("Aua!");
|
||
Plumber::<NormalPlumber> {
|
||
data: Default::default(),
|
||
}
|
||
}
|
||
}
|
||
|
||
fn main() {
|
||
let plumber = Plumber::new();
|
||
let plumber = plumber.hit();
|
||
let plumber = plumber.consume_fireflower(FireFlower);
|
||
let plumber = plumber.hit();
|
||
let plumber = plumber.consume_fireflower(FireFlower);
|
||
let plumber = plumber.hit();
|
||
let plumber = plumber.hit();
|
||
let plumber = plumber.consume_shroom(Shroom);
|
||
let plumber = plumber.hit();
|
||
let plumber = plumber.hit();
|
||
}
|
||
|
||
Macros
|
||
|
||
# Cargo.toml
|
||
|
||
[package]
|
||
name = "macros"
|
||
version = "0.1.0"
|
||
edition = "2021"
|
||
|
||
[lib]
|
||
proc-macro = true
|
||
|
||
[dependencies]
|
||
proc-macro2 = "1.0.32"
quote = "1.0"
|
||
|
||
[dependencies.syn]
|
||
version = "1.0.102"
|
||
features = [
|
||
"full",
|
||
]
|
||
|
||
// lib.rs
|
||
use proc_macro::TokenStream;
use quote::ToTokens;
use syn::parse_quote;
|
||
|
||
#[proc_macro_attribute]
|
||
pub fn repeat(_: TokenStream, input: TokenStream) -> TokenStream {
|
||
let input: syn::ItemFn = syn::parse2(input.into()).unwrap();
|
||
|
||
let ty = if let syn::ReturnType::Type(_, ty) = input.sig.output {
|
||
ty
|
||
} else {
|
||
parse_quote!(())
|
||
};
|
||
|
||
let output: syn::ReturnType = parse_quote! { -> impl Iterator<Item = #ty> };
|
||
|
||
let sig = syn::Signature {
|
||
output,
|
||
..input.sig
|
||
};
|
||
|
||
let block = input.block;
|
||
let block = parse_quote! {
|
||
{
|
||
let result = #block;
|
||
std::iter::repeat(result)
|
||
}
|
||
};
|
||
|
||
syn::ItemFn {
|
||
sig,
|
||
block,
|
||
..input
|
||
}
|
||
.to_token_stream()
|
||
.into()
|
||
}
|
||
|
||
State Machines
|
||
|
||
With our knowledge on generics, we start looking into state machines — a
|
||
common tool to both model and implement dependable systems.
|
||
|
||
First, we start with run-time state machines, allowing you to model or
|
||
execute them in your code. Later, we look at compile-time state
|
||
machines, allowing you to enforce that the code you write complies with
|
||
the state machine (e.g. a certain operation must be executed first,
|
||
before another is available).
|
||
|
||
Run-time
|
||
|
||
Here is a general definition for a deterministic finite automaton (DFA):
|
||
|
||
#[derive(Clone)]
|
||
pub struct DFA<S, I>
|
||
where
|
||
S: Clone + PartialEq,
|
||
{
|
||
start: S,
|
||
accept: Vec<S>,
|
||
transition: fn(S, I) -> S,
|
||
}
|
||
|
||
impl<S, I> DFA<S, I>
|
||
where
|
||
S: Clone + PartialEq,
|
||
{
|
||
pub fn new(start: S, accept: Vec<S>, transition: fn(S, I) -> S) -> Self {
|
||
Self {
|
||
start,
|
||
accept,
|
||
transition,
|
||
}
|
||
}
|
||
|
||
pub fn run(&self, mut input: Vec<I>) -> bool {
|
||
let mut state = self.start.clone();
|
||
input.reverse();
|
||
while let Some(symbol) = input.pop() {
|
||
state = (self.transition)(state, symbol);
|
||
}
|
||
self.accept.contains(&state)
|
||
}
|
||
}
|
||
|
||
Note that:
|
||
|
||
- The automaton is generic in S (the states) and I (the inputs).
|
||
- The definition enforces that our state type S is used for the single
|
||
start state, the accept states as well as an input and output of the
|
||
transition.
|
||
- The run method executes our DFA with an input vector, returning if
|
||
we end in an accept state when the input is consumed.
|
||
|
||
Here is the example usage for a DFA that checks if there is an even
|
||
count of zeros:
|
||
|
||
#[derive(Clone, Copy, PartialEq)]
|
||
enum State {
|
||
Even,
|
||
Odd,
|
||
Error,
|
||
}
|
||
|
||
fn main() {
|
||
let dfa = DFA::new(State::Even, vec![State::Even], |state, symbol| {
|
||
match (state, symbol) {
|
||
(State::Even, 0) => State::Odd,
|
||
(State::Odd, 0) => State::Even,
|
||
(state, 1) => state,
|
||
_ => State::Error,
|
||
}
|
||
});
|
||
assert!(dfa.run(vec![]));
|
||
assert!(!dfa.run(vec![0, 1]));
|
||
assert!(!dfa.run(vec![0, 1, 1]));
|
||
assert!(dfa.run(vec![0, 1, 1, 0]));
|
||
assert!(dfa.run(vec![0, 0]));
|
||
}
|
||
|
||
Note that: The input space is i32, even though we only allow 0 and 1.
|
||
Hence passing vec![-5] is valid code and leads to the DFA entering the
|
||
error state. A workaround would be to define a separate input enum with
|
||
two variants (Zero, One).
|
||
|
||
An issue with this is that invalid transitions are detected at runtime
|
||
only. Handling this means that we typically go to the error state.
|
||
However, such a transition could be due to an implementation bug,
|
||
i.e. the error state should never have been entered but rather this
|
||
transition should not be valid. With this, we come to the topic of
|
||
compile-time state machines.
|
||
|
||
Compile-time State Machines
|
||
|
||
For the coming section, we use the following state machine for a certain
|
||
device:
|
||
|
||
+----------+ +--------+ +--------+
|
||
| +----+> +----+> |
|
||
| Inactive | | Active | | Paused |
|
||
| <+----+ <+----+ |
|
||
+---+----^-+ +--------+ +----+-+-+
|
||
| | | |
|
||
| +-------------------------+ |
|
||
| |
|
||
+---+------+ |
|
||
| V | |
|
||
| Exit <+-------------------------+
|
||
| |
|
||
+----------+
|
||
|
||
We encode it as follows:
|
||
|
||
#[derive(Debug)]
|
||
struct StateMachine<S> {
|
||
shared_data_value: usize,
|
||
state: S,
|
||
}
|
||
|
||
We define the states as follows, including state-dependent data if there
|
||
is any:
|
||
|
||
#[derive(Debug)]
|
||
struct Inactive;
|
||
|
||
#[derive(Debug)]
|
||
struct Active {
|
||
value: usize,
|
||
}
|
||
|
||
#[derive(Debug)]
|
||
struct Paused {
|
||
frozen_value: usize,
|
||
}
|
||
|
||
#[derive(Debug)]
|
||
struct Exit;
|
||
|
||
We can define methods on state machines in any state S like this:
|
||
|
||
impl<S> StateMachine<S> {
|
||
fn state(&mut self) -> &mut S {
|
||
&mut self.state
|
||
}
|
||
}
|
||
|
||
We can also define methods only for machines in certain states. For
|
||
instance, only Inactive machines can be created with new and Active
|
||
state machines can have an increment() method:
|
||
|
||
impl StateMachine<Inactive> {
|
||
fn new(val: usize) -> Self {
|
||
Self {
|
||
shared_data_value: val,
|
||
state: Inactive,
|
||
}
|
||
}
|
||
}
|
||
|
||
impl StateMachine<Active> {
|
||
fn increment(&mut self) {
|
||
self.state.value += 1;
|
||
}
|
||
}
|
||
|
||
Further, we can define valid transitions and their logic using the From
|
||
traits:
|
||
|
||
impl From<StateMachine<Inactive>> for StateMachine<Active> {
|
||
fn from(val: StateMachine<Inactive>) -> StateMachine<Active> {
|
||
println!("Start");
|
||
StateMachine {
|
||
shared_data_value: val.shared_data_value,
|
||
state: Active { value: 0 },
|
||
}
|
||
}
|
||
}
|
||
|
||
impl From<StateMachine<Inactive>> for StateMachine<Exit> {
|
||
fn from(_: StateMachine<Inactive>) -> StateMachine<Exit> {
|
||
println!("Disable");
|
||
StateMachine {
|
||
shared_data_value: 0,
|
||
state: Exit,
|
||
}
|
||
}
|
||
}
|
||
|
||
impl From<StateMachine<Active>> for StateMachine<Paused> {
|
||
fn from(mut val: StateMachine<Active>) -> StateMachine<Paused> {
|
||
println!("Pause");
|
||
StateMachine {
|
||
shared_data_value: val.shared_data_value,
|
||
state: Paused {
|
||
frozen_value: val.state().value,
|
||
},
|
||
}
|
||
}
|
||
}
|
||
|
||
impl From<StateMachine<Active>> for StateMachine<Inactive> {
|
||
fn from(mut val: StateMachine<Active>) -> StateMachine<Inactive> {
|
||
println!("End with {}", val.state().value);
|
||
StateMachine {
|
||
shared_data_value: val.shared_data_value,
|
||
state: Inactive,
|
||
}
|
||
}
|
||
}
|
||
|
||
impl From<StateMachine<Paused>> for StateMachine<Active> {
|
||
fn from(mut val: StateMachine<Paused>) -> StateMachine<Active> {
|
||
println!("Resume");
|
||
StateMachine {
|
||
shared_data_value: val.shared_data_value,
|
||
state: Active {
|
||
value: val.state().frozen_value,
|
||
},
|
||
}
|
||
}
|
||
}
|
||
|
||
impl From<StateMachine<Paused>> for StateMachine<Inactive> {
|
||
fn from(mut val: StateMachine<Paused>) -> StateMachine<Inactive> {
|
||
println!("Stop with {}", val.state().frozen_value);
|
||
StateMachine {
|
||
shared_data_value: val.shared_data_value,
|
||
state: Inactive,
|
||
}
|
||
}
|
||
}
|
||
|
||
or using custom functions:
|
||
|
||
impl StateMachine<Paused> {
|
||
fn pause(mut self) -> StateMachine<Exit> {
|
||
println!("Exit with {}", self.state().frozen_value);
|
||
StateMachine {
|
||
shared_data_value: self.state().frozen_value,
|
||
state: Exit,
|
||
}
|
||
}
|
||
}
|
||
|
||
In the following code, you see this in action. Note the commented out
|
||
lines that cause a compile-time error if commented in:
|
||
|
||
fn main() {
|
||
let sm = StateMachine::new(5);
|
||
println!("{:?}", &sm);
|
||
// let sm: StateMachine<Active> = StateMachine::new(5); <-- does not work
|
||
let mut sm: StateMachine<Active> = sm.into();
|
||
println!("{:?}", &sm);
|
||
for _ in 0..5 {
|
||
sm.increment();
|
||
}
|
||
sm.shared_data_value = 7;
|
||
println!("Modified");
|
||
println!("{:?}", &sm);
|
||
let sm: StateMachine<Paused> = sm.into();
|
||
println!("{:?}", &sm);
|
||
// sm.increment(); <-- does not work
|
||
let mut sm: StateMachine<Active> = sm.into();
|
||
sm.increment();
|
||
println!("{:?}", &sm);
|
||
let sm: StateMachine<Paused> = sm.into();
|
||
println!("{:?}", &sm);
|
||
// let sm: StateMachine<Inactive> = sm.into(); <-- does not work
|
||
let sm: StateMachine<Exit> = sm.pause();
|
||
println!("{:?}", &sm);
|
||
}
|
||
|
||
This approach is also known as typestate pattern, about which you can
|
||
read more in the RustEmbedded Book.
|
||
|
||
Summary
|
||
|
||
What did you learn?
|
||
|
||
- How to write generic code in Rust and make use of traits.
|
||
- How to implement both run-time and compile-time state machines.
|
||
- How (and when) to use macros.
|
||
|
||
Where can you learn more?
|
||
|
||
- Generics:
|
||
- Rust Book: Ch. 10
|
||
- Programming Rust: Ch. 11, 13
|
||
- Rust for Rustaceans: Ch. 03
|
||
- cheats.rs: Generics & Constraints
|
||
- State Machines
|
||
- Typestate Programming in the Embedded Rust Book.
|
||
- Hoverbear’s State Machine Pattern
|
||
- Novatec GmbH’s Case for the Typestate Pattern
|
||
- Yoshua Wuyts on Future of Type States in State Machines III:
|
||
Type States
|
||
- Macros:
|
||
- Rust Book: Ch. 19.5
|
||
- Rust for Rustaceans: Ch. 07
|
||
- Rust Reference: Macros
|
||
- Rust by Example: Macros
|
||
- The Little Book of Rust Macros
|
||
- Fathomable Rust Macros
|
||
- David Tolnay’s Procedural Macros Workshop
|
||
- Nine Rules for Creating Procedural Macros
|
||
|
||
W10: Work Sheet
|
||
|
||
Generics & Traits
|
||
|
||
- Do the Rustlings exercises generics and traits.
|
||
|
||
- Revisit the List from U04. Add support for the FromIterator and
|
||
IntoIterator traits.
|
||
|
||
Run-Time State Machines
|
||
|
||
Develop a run-time state machine that implements a beverage dispenser.
|
||
Reuse the DFA definitions provided in the unit. The specification is as
|
||
follows:
|
||
|
||
- The automaton starts in the Ready state, waiting for an order.
|
||
- Upon the input SelectBeverage, it enters the AwaitMoney state.
|
||
- In this state, Insert1EUR, Insert50Cent, Insert20Cent, and
|
||
Insert10Cent inputs can happen.
|
||
- As soon as the price for the beverage (2,80EUR) has been reached,
|
||
the automaton
|
||
- prints to stdout: “Beverage dispensed”; optionally including
|
||
“Returning X.XX EUR” if too much money has been inserted.
|
||
- re-enters the Ready state.
|
||
|
||
Compile-Time State Machines
|
||
|
||
Develop a compile-time state machine for an Italian plumber:
|
||
|
||
fire flower
|
||
+-----------------------------------------------------+
|
||
| V
|
||
+--------------+ shroom +----------------+ fire flower +--------------+
|
||
| Mini Plumber | ----------->| Normal Plumber |-------------->| Fire Plumber |
|
||
+--------------+ +----------------+ +--------------+
|
||
^ hit | ^ ^ hit |
|
||
+-------------------+ | +--------------------------+
|
||
|
|
||
|
||
The following code snippets should be present in your solution (with ???
|
||
replaced appropriately) and you shall not use the From-style transitions
|
||
but custom ones:
|
||
|
||
struct Plumber<S> {
|
||
???
|
||
}
|
||
|
||
struct Shroom;
|
||
struct FireFlower;
|
||
|
||
fn hit(self) -> ???;
|
||
|
||
fn consume_shroom(self, item: Shroom) -> ???;
|
||
|
||
Macro Warm-Up
|
||
|
||
Work through the macrokata.
|
||
|
||
Custom Macro
|
||
|
||
Develop an attribute macro #[repeat], which you can apply on any
|
||
function:
|
||
|
||
#[repeat]
|
||
pub fn foo(bar: usize) -> usize {
|
||
// ...
|
||
}
|
||
|
||
The macro changes the return type to impl Iterator<Item = usize> and
|
||
wraps the return value in std::iter::repeat(). Apart from that, the
|
||
function must stay unchanged, i.e. visibility, parameters, etc. stay the
|
||
same.
|
||
|
||
The following program should compile afterwards (macros is your
|
||
proc-macro crate):
|
||
|
||
pub mod math {
|
||
use macros::repeat;
|
||
|
||
#[repeat]
|
||
pub fn the_answer() -> usize {
|
||
42
|
||
}
|
||
}
|
||
|
||
fn main() {
|
||
let answers = math::the_answer().take(5).collect::<Vec<_>>();
|
||
println!("5 Answers: {:#?}", answers);
|
||
}
|
||
|
||
and output
|
||
|
||
5 Answers: [
|
||
42,
|
||
42,
|
||
42,
|
||
42,
|
||
42,
|
||
]
|
||
|
||
Make use of quote and syn (the latter with feature full enabled).
|
||
|
||
Async Programming
|
||
|
||
Now that we have seen how to work in parallel on data and use mechanisms
|
||
to synchronize threads (locks or channels), we investigate another
|
||
approach to write concurrent programs: Cooperative Multitasking. While
|
||
multithreading and working on parallel data are to maximize the usage of
|
||
your computer’s resources (shortening computation time by increasing
|
||
throughput), cooperative multitasking is often about minimizing the
|
||
usage of resources (shortening computation time by cleverly using wait
|
||
times). Note that one is multi-threading and the other multi-tasking
|
||
(also see the terms introduced before). Frequently, you encounter the
|
||
terms compute-intensive (e.g. predict weather) and I/O-intensive tasks
|
||
(e.g. serve 10k chat users) in this context. If compute is your
|
||
bottleneck, multithreading is the first thing to try out; if I/O is it,
|
||
it is multitasking instead.
|
||
|
||
Assume you want to download all sections of this coursebook. Ignoring
|
||
that you can print them to a single page via the Print button, we assume
|
||
you do a HTTP request to each of the pages. Even if you don’t know much
|
||
about computer networking, you probably believe that for each of these
|
||
requests, we tell the operating system to:

- Open a TCP socket.
- Trigger the TCP socket to connect to the hod.cs.uni-saarland.de server.
- Issue a HTTP request to GET /units/U11.md (and other pages).
- Read the response and return it to the caller.
|
||
|
||
This involves both system calls as well as packet transmits/receives,
|
||
which take non-negligible time. Though, your software cannot progress
|
||
with the request at hand while the system call is executed or a packet
|
||
is in flight. We could use our knowledge from before to use multiple
|
||
threads to multiplex this, where each thread blocks and waits for
|
||
completion. However, each thread comes with a non-negligible overhead in
|
||
terms of memory usage (e.g. pthread on Linux uses 512KB). If we have
|
||
lots of requests we want to multiplex, this can quickly add up.
|
||
|
||
A more lightweight solution is to use tasks (sometimes referred to as
|
||
green threads). A task can optionally have task-local storage, but
|
||
usually comes with only small amounts of memory usage. In contrast to a
|
||
thread, a task is an independent unit of work, which can be processed by
|
||
a single thread or be distributed over a pool of threads — allowing
parts of a task to be executed by different threads in succession (but not at
|
||
the same time). Hence, this is a concurrency approach but not a
|
||
parallelism approach if we look at a single task (looking at multiple
|
||
tasks, we can indeed have parallelism if a pool of threads is used).
|
||
|
||
So far, the operating system has taken care of scheduling different
|
||
threads, including stopping a thread to give CPU resources to another
|
||
thread so both can make progress. With cooperative multitasking, each of
|
||
the tasks must cooperate, i.e. yield execution if it has nothing to do
|
||
or made enough progress that it can spare a pause.
|
||
|
||
TODO: Nice Diagram showing the differences
|
||
|
||
Rust’s Async Machinery
|
||
|
||
In Rust, we have the async keyword, the .await syntax as well as the
|
||
std::future::Future type to provide means for asynchronous programming.
|
||
In contrast to synchronous functions (which block the flow of execution
|
||
until completion), asynchronous functions can yield control flow and be
|
||
resumed later.
|
||
|
||
First, let’s have a look at std::future::Future:
|
||
|
||
trait Future {
|
||
type Output;
|
||
fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output>;
|
||
}
|
||
|
||
enum Poll<T> {
|
||
Ready(T),
|
||
Pending,
|
||
}
|
||
|
||
We see that a Future has an output and can be polled for it. Upon poll,
|
||
it can either return Ready if the output is there or Pending if it needs
|
||
more time. This abstraction means that we have to regularly poll a
|
||
future to make progress. Also note that futures in Rust are lazy
|
||
(similar to iterators). If nobody polls them, they do not run.
|
||
|
||
You are already familiar with std::fs::read_to_string, which has the
|
||
following signature:
|
||
|
||
fn read_to_string<P: AsRef<Path>>(path: P) -> Result<String>
|
||
|
||
An asynchronous equivalent would look like this:
|
||
|
||
fn read_to_string<P: AsRef<Path>>(path: P) -> impl Future<Output = Result<String>>
|
||
|
||
This is a common pattern you find in async code: The function parameters
|
||
stay the same, but the return value is wrapped in an impl Future.
|
||
|
||
As this is used so frequently (and in many cases involves additional
|
||
lifetime considerations that we do not show here), we can use the async
|
||
keyword to conveniently turn a sync function into an async one, which
|
||
returns a Future:
|
||
|
||
async fn read_to_string<P: AsRef<Path>>(path: P) -> Result<String>
|
||
|
||
Finally, .await can be used to consume a future. Even though it looks
|
||
like accessing a field, it is special syntax that is translated by the
|
||
compiler into code that awaits the result and returns the final value.
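
To make this concrete, here is a minimal sketch (the function double and its caller are made up for illustration): both definitions are roughly equivalent, and nothing runs until the future is .awaited inside another async context:

use std::future::Future;

// Written with the async keyword ...
async fn double(x: u32) -> u32 {
    x * 2
}

// ... is roughly equivalent to a plain function returning an impl Future:
fn double_desugared(x: u32) -> impl Future<Output = u32> {
    async move { x * 2 }
}

async fn caller() {
    // Nothing has run yet; the future only makes progress once awaited.
    let fut = double(21);
    assert_eq!(fut.await, 42);
    assert_eq!(double_desugared(21).await, 42);
}
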
|
||
|
||
Executors
|
||
|
||
Now that we have everything to create and chain futures together, we
|
||
need a way to actually execute them. While other programming languages
|
||
have a built-in global event loop into which tasks are put and where
|
||
they are executed, Rust leaves this to the application developer to
|
||
start an executor (or runtime) of their choice (also allowing to have
|
||
multiple executors at the same time).
|
||
|
||
The most prominent executors are:
|
||
|
||
- async-std - focus on compatibility with std
|
||
- tokio - focus on network applications
|
||
- embassy (EMBedded ASYnc) - focus on embedded applications
|
||
|
||
<img src="https://async.rs/images/logo.svg" width="30%" />
|
||
|
||
In the following, we use async-std, as it is easier to use than tokio
|
||
and closely mimics the std, by using the same types wherever possible.
|
||
We add it like this to our project:
|
||
|
||
[dependencies]
|
||
async-std = {version = "1.10.0", features = ["attributes", "unstable"] }
|
||
|
||
The easiest way to execute an async function is to use the block_on
|
||
primitive. We leverage async_std::fs::read_to_string(), which is similar
|
||
in functionality to the std equivalent, except that it is async.
|
||
|
||
Coming from the std equivalent, we try this:
|
||
|
||
fn main() {
|
||
let s = async_std::fs::read_to_string("ferris.txt").unwrap();
|
||
println!("{}", s);
|
||
}
|
||
|
||
Following the suggestion of the compiler, we add .await like this:
|
||
|
||
fn main() {
|
||
let s = async_std::fs::read_to_string("ferris.txt").await.unwrap();
|
||
println!("{}", s);
|
||
}
|
||
|
||
Again, the compiler complains, but this time about .await not being
|
||
allowed outside of async functions or blocks. So let us add a block:
|
||
|
||
fn main() {
|
||
let s = async {
|
||
async_std::fs::read_to_string("ferris.txt").await.unwrap()
|
||
};
|
||
println!("{}", s);
|
||
}
|
||
|
||
We would hope for s to be a String, but it is not yet. The async block
|
||
returns a Future<Output = String>. We could repeat our .await, but
|
||
obviously we start a cycle. Instead, we leverage the block_on primitive,
|
||
which blocks on the future and consumes it:
|
||
|
||
fn main() {
|
||
let s = async_std::task::block_on(async {
|
||
async_std::fs::read_to_string("ferris.txt").await.unwrap()
|
||
});
|
||
println!("{}", s);
|
||
}
|
||
|
||
Or simpler:
|
||
|
||
fn main() {
    let s = async_std::task::block_on(
        async_std::fs::read_to_string("ferris.txt")
    ).unwrap();
    println!("{}", s);
}
|
||
|
||
Now we are back to a synchronous mode of operation, but we gained
|
||
something in terms of program organization. Note that block_on, much
|
||
like any other blocking operation, should never be used in an async
|
||
function. block_on is an efficient primitive, as it goes to sleep
|
||
(instead of busy-waiting).
|
||
|
||
When we are dealing with larger async-only programs (i.e. with a single
|
||
runtime), we can simplify the above code to:
|
||
|
||
#[async_std::main]
|
||
async fn main() {
|
||
let s = async_std::fs::read_to_string("ferris.txt").await.unwrap();
|
||
println!("{}", s);
|
||
}
|
||
|
||
Essentially, the main() function becomes the function on which block_on
|
||
is applied, causing the program to run until completion (if ever).
|
||
|
||
With our program nicely organized like this, let’s try to actually
|
||
become concurrent and do multiple things (potentially) at the same time.
|
||
Therefore, we use the async_std::task::spawn_local method, which adds a
future to the thread-local executor to be polled eventually, once
block_on is used. Before we start, we add the async-log crate, so
|
||
that we can see the interleaving of events on the command-line:
|
||
|
||
[dependencies]
|
||
async-log = "2.0.0"
|
||
log = "0.4.14"
|
||
femme = "1.2.0"
|
||
|
||
Now, we will use simple HTTP requests, which we execute via the
|
||
following function:
|
||
|
||
use async_std::io::prelude::*;
|
||
use async_std::net;
use log::info;
|
||
|
||
async fn request(host: &str, port: u16, path: &str) -> std::io::Result<String> {
|
||
let mut socket = net::TcpStream::connect((host, port)).await?;
|
||
|
||
let request = format!("GET {} HTTP/1.1\r\nHost: {}\r\n\r\n", path, host);
|
||
socket.write_all(request.as_bytes()).await?;
|
||
socket.shutdown(net::Shutdown::Write)?;
|
||
info!("{} Request to {} sent", host);
|
||
|
||
let mut response = String::new();
|
||
socket.read_to_string(&mut response).await?;
|
||
info!("Response from {} received", host);
|
||
|
||
Ok(response)
|
||
}
|
||
|
||
From the main function, we now do several HTTP requests concurrently and
|
||
we also setup logging:
|
||
|
||
use async_std::task;
use log::info;

fn setup_logger() {
|
||
let logger = femme::pretty::Logger::new();
|
||
|
||
async_log::Logger::wrap(logger, || 12)
|
||
.start(log::LevelFilter::Info)
|
||
.unwrap();
|
||
}
|
||
|
||
#[async_std::main]
|
||
async fn main() {
|
||
setup_logger();
|
||
|
||
let hosts = vec!["google.com", "depend.cs.uni-saarland.de", "rustacean.net"];
|
||
|
||
let mut handles = vec![];
|
||
for host in hosts {
|
||
handles.push(task::spawn_local(request(host, 80, "/")));
|
||
}
|
||
info!("All tasks spawned!");
|
||
|
||
let mut results = vec![];
|
||
for handle in handles {
|
||
results.push(handle.await);
|
||
}
|
||
dbg!(results);
|
||
}
|
||
|
||
The async-std executor also supports a thread pool, which means that we
|
||
can use several threads in parallel to poll futures and attempt to make
|
||
progress. With spawn_local, we added the task to the same thread we are
|
||
working on now. There is also spawn, which adds it to the global
|
||
executor, allowing other threads to access it. Normally, you will use
spawn and let the executor figure out which thread should poll
the future. Note that this implies that data is shared between threads,
|
||
which we see by comparing the signatures of the two functions:
|
||
|
||
pub fn spawn_local<F, T>(future: F) -> JoinHandle<T> where
|
||
F: Future<Output = T> + 'static,
|
||
T: 'static {}
|
||
|
||
pub fn spawn<F, T>(future: F) -> JoinHandle<T> where
|
||
F: Future<Output = T> + Send + 'static,
|
||
T: Send + 'static {}
|
||
|
||
Notice that the Future we pass to spawn must be Send, allowing it to be
|
||
passed between threads.
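
As a minimal sketch of how spawn is typically used (reusing the hypothetical ferris.txt file from above), note that the returned JoinHandle is itself a future that yields the task's output when awaited:

use async_std::task;

#[async_std::main]
async fn main() {
    // The future we hand to spawn (and its output) must be Send + 'static.
    let handle = task::spawn(async {
        async_std::fs::read_to_string("ferris.txt")
            .await
            .unwrap_or_default()
    });

    // The JoinHandle is itself a future; awaiting it yields the task's output.
    let contents = handle.await;
    println!("read {} bytes", contents.len());
}
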
|
||
|
||
Async Iterators
|
||
|
||
Note: the async_std::stream::Stream type is probably going to be
|
||
replaced by the AsyncIterator in std via RFC2996.
|
||
|
||
async-std provides the Stream trait, which is very similar to Iterator
|
||
but supports async:
|
||
|
||
trait Stream {
|
||
type Item;
|
||
|
||
fn poll_next(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Option<Self::Item>>;
|
||
}
|
||
|
||
We do not go into the details of Pin, so consider Pin<&mut Self> to be
|
||
the same as &mut Self for now, knowing that this is a simplification.
|
||
|
||
Analogously to U04: Iterators, a Stream is produced, adapted, and
|
||
consumed. The major difference is that execution can “pause” at more
|
||
locations than for the sync equivalent, i.e. at .await points.
|
||
|
||
Here is how we can turn an iterator into a Stream (produce), map each
|
||
element to a request (adapt), and collect the results in a vector
|
||
(consume):
|
||
|
||
use async_std::stream::StreamExt;
|
||
|
||
let hosts = vec!["google.com", "depend.cs.uni-saarland.de", "rustacean.net"];
|
||
let hosts = async_std::stream::from_iter(hosts);
|
||
let mut requests = hosts.map(|host| request(host, 80, "/"));
|
||
let mut results = vec![];
|
||
while let Some(item) = requests.next().await {
|
||
results.push(item.await);
|
||
}
|
||
|
||
dbg!(results);
|
||
|
||
In P02, these Streams will come in handy.
|
||
|
||
Parting Words
|
||
|
||
Before we let you explore the world of async Rust on your own, you
|
||
should know that the state of the ecosystem is a bit challenging. This
|
||
situation is due to the “late” standardization of async and .await in
|
||
the std lib (Rust edition 2015 did not include this; it only got
stabilized in November 2019). At the time of stabilization, the ecosystem
had already evolved and in particular split into multiple,
incompatible solutions. Nowadays, you can use multiple executors (like
|
||
tokio and async-std) together, which was not the case before. The good
|
||
news for the future (and future) is that the Async Foundations Working
|
||
Group attempts to change the state of the ecosystem and develops a
|
||
shared vision of how async programming should work in Rust in the long
|
||
run.
|
||
|
||
Fearless Concurrency
|
||
|
||
Rust has, as mentioned before, several concepts in place that make
|
||
dealing with concurrent code indeed fearless. As you have seen in U03,
|
||
the Rust ownership model makes data races impossible — though, you can
|
||
still have race conditions as well as Heisenbugs. However, there are a
|
||
set of technical tools Rust and its ecosystem offer that allow you to
|
||
implement concurrent software in a dependable way. Therefore, we show
|
||
you how to approach this using two paradigms:
|
||
|
||
- Message Passing Concurrency using Channels
|
||
|
||
- Shared Memory Concurrency with Mutual Exclusion constructs and
|
||
Atomics
|
||
|
||
This section is intentionally kept brief and you should read the
|
||
excellent 16th chapter of the Rust book if you have any doubts or want
|
||
a more in-depth introduction to concurrency in Rust.
|
||
|
||
Message Passing
|
||
|
||
In our first concurrency approach, threads communicate with each other
|
||
via messages that are sent through channels:
|
||
|
||
+----------+ +----------+
|
||
| Thread 1 | | Thread 2 |
|
||
| | +---------+ | |
|
||
| S |--->| Channel |--->| R |
|
||
+----------+ +---------+ +----------+
|
||
|
||
A unidirectional channel:
|
||
|
||
- sits between two threads
|
||
- has a sending (S) and a receiving side (R)
|
||
- forwards messages of a certain type
|
||
- has a capacity of messages it can store (a message that was sent but
|
||
not yet received)
|
||
|
||
This is a so-called single-producer single-consumer (SPSC) channel that
|
||
links two threads. The Rust standard library contains std::sync::mpsc,
|
||
which is a multi-producer single-consumer channel. Instead of this, we
|
||
are going to show how the third-party crate crossbeam-channel can
|
||
be used, as it is in all aspects superior to the std variant (except for
|
||
the fact that you need one more crate). Note that the crossbeam channel
|
||
is a multi-producer, multi-consumer channel, but for our use case, this
|
||
does not matter.
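
Before the larger example, here is a minimal sketch of the basic crossbeam-channel API; the bounded constructor and the capacity of 2 are chosen only for illustration:

use crossbeam_channel::bounded;
use std::thread;

fn main() {
    // Capacity 2: a third send blocks until the receiver has taken a message.
    let (sender, receiver) = bounded(2);

    thread::spawn(move || {
        for i in 0..5 {
            sender.send(i).unwrap();
        }
        // Dropping the sender closes the channel and ends the loop below.
    });

    for msg in receiver {
        println!("got {}", msg);
    }
}
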
|
||
|
||
Let’s create our example system, which has the following components:
|
||
|
||
- A logger thread that waits for worker threads to produce data to be
|
||
logged and logs “nothing happened” if there was no message for a
|
||
certain time.
|
||
- Two worker threads that take different amounts of time to produce
|
||
data.
|
||
|
||
Here is the code. The Cargo.toml:
|
||
|
||
[package]
|
||
name = "messagepassing"
|
||
version = "0.1.0"
|
||
edition = "2018"
|
||
|
||
[dependencies]
|
||
crossbeam-channel = "0.5.1"
|
||
rand = "0.8.4"
|
||
|
||
and main.rs:
|
||
|
||
use crossbeam_channel::{select, unbounded};
|
||
use rand::prelude::*;
|
||
use std::{
|
||
thread,
|
||
time::{Duration, Instant},
|
||
};
|
||
|
||
fn worker_thread(sender: crossbeam_channel::Sender<u32>) {
|
||
let mut rng = thread_rng();
|
||
loop {
|
||
let number = rng.gen_range(1..=8);
|
||
thread::sleep(Duration::from_secs(number.into()));
|
||
sender.send(number).unwrap();
|
||
}
|
||
}
|
||
|
||
fn main() {
|
||
let (s1, r1) = unbounded();
|
||
let (s2, r2) = unbounded();
|
||
|
||
thread::spawn(move || worker_thread(s1));
|
||
thread::spawn(move || worker_thread(s2));
|
||
|
||
let start = Instant::now();
|
||
println!("Sec - Message");
|
||
loop {
|
||
let msg = select! {
|
||
recv(r1) -> msg => format!("R1: {}", msg.unwrap()),
|
||
recv(r2) -> msg => format!("R2: {}", msg.unwrap()),
|
||
default(Duration::from_secs(3)) => format!("nothing happened"),
|
||
};
|
||
println!("{:03} - {}", start.elapsed().as_secs(), msg);
|
||
}
|
||
}
|
||
|
||
Here is the output of a sample run:
|
||
|
||
Sec - Message
|
||
003 - nothing happened
|
||
005 - R2: 5
|
||
005 - R1: 5
|
||
008 - R1: 3
|
||
011 - nothing happened
|
||
012 - R2: 7
|
||
013 - R1: 5
|
||
016 - nothing happened
|
||
016 - R1: 3
|
||
019 - nothing happened
|
||
020 - R2: 8
|
||
023 - nothing happened
|
||
024 - R1: 8
|
||
025 - R2: 5
|
||
027 - R1: 3
|
||
030 - R1: 3
|
||
031 - R1: 1
|
||
|
||
Let’s go through this piece by piece:
|
||
|
||
- at the beginning of main, we create two unbounded channels. The
|
||
function returns both a sending as well as a receiving end, which we
|
||
can pass around.
|
||
- when we spawn the threads, we move the sending ends into them.
|
||
- the worker_thread continuously produces numbers in the range from 1
|
||
to 8 (inclusive), sleeps for these many seconds and sends the number
|
||
to the channel afterwards.
|
||
- the rest of the main function deals with simultaneously receiving
|
||
from both channels and having a timeout of 3 seconds. Whenever at
|
||
least one of the arms can be taken (a message on r1 and/or r2 and/or
|
||
the timeout) the select! call non-deterministically takes one of the
|
||
available arms.
|
||
|
||
Hopefully, you can appreciate how clean this solution is. We do not need
|
||
to care about individual parts of memory, our data is safely shared
|
||
(sent!) between threads and can be easily accessed.
|
||
|
||
Shared Memory
|
||
|
||
Here is a similar solution for the program we developed using channels
|
||
before:
|
||
|
||
use rand::prelude::*;
|
||
use std::{
|
||
sync::{Arc, Condvar, Mutex},
|
||
thread,
|
||
time::{Duration, Instant},
|
||
};
|
||
|
||
fn worker_thread(reference: Arc<(Mutex<Vec<(usize, u32)>>, Condvar)>, index: usize) {
|
||
let mut rng = thread_rng();
|
||
loop {
|
||
let number = rng.gen_range(1..=8);
|
||
thread::sleep(Duration::from_secs(number.into()));
|
||
let mut buffer = reference.0.lock().unwrap();
|
||
buffer.push((index, number));
|
||
reference.1.notify_all();
|
||
}
|
||
}
|
||
|
||
fn main() {
|
||
let shared_buffer = Arc::new((Mutex::new(vec![]), Condvar::new()));
|
||
let sb1 = shared_buffer.clone();
|
||
let sb2 = shared_buffer.clone();
|
||
|
||
thread::spawn(move || worker_thread(sb1, 1));
|
||
thread::spawn(move || worker_thread(sb2, 2));
|
||
|
||
let start = Instant::now();
|
||
println!("Sec - Message");
|
||
let mut guard = shared_buffer.0.lock().unwrap();
|
||
loop {
|
||
let mut new_guard = shared_buffer
|
||
.1
|
||
.wait_timeout(guard, Duration::from_secs(3))
|
||
.unwrap();
|
||
let msg = if new_guard.0.len() > 0 {
|
||
let e = new_guard.0.pop().unwrap();
|
||
format!("R{}: {}", e.0, e.1)
|
||
} else {
|
||
format!("nothing happened")
|
||
};
|
||
guard = new_guard.0;
|
||
println!("{:03} - {}", start.elapsed().as_secs(), msg);
|
||
}
|
||
}
|
||
|
||
The following changes have been made:
|
||
|
||
- We use an Arc<Mutex<T>> instead of a channel. Arc stands for
|
||
atomic-reference counting, i.e. a thread-safe reference counting
|
||
type. The Arc allows us to move copies of it to the other threads
|
||
when we spawn them. Mutex stands for mutual exclusion and is used
|
||
to, at run-time, ensure only one thread can hold a lock at the same
|
||
time. Whoever holds the lock can access the inside using the guard
|
||
variable after .lock() returned.
|
||
- We introduce a Condvar (conditional variable) to be able to signal
|
||
between threads that data is available. worker_thread notifies all
|
||
other threads that wait_timeout on the condvar. This is the
|
||
replacement for the select call with timeout we had before.
|
||
- Note that after wait_timeout we hold a new guard that must be used
|
||
in the following iteration.
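
The introduction above also mentioned atomics: for simple shared counters or flags, an atomic type avoids the lock entirely. This is a minimal sketch and not part of the logger example:

use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Arc;
use std::thread;

fn main() {
    let counter = Arc::new(AtomicU32::new(0));

    let handles: Vec<_> = (0..4)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..1_000 {
                    // A single atomic read-modify-write, no Mutex required.
                    counter.fetch_add(1, Ordering::Relaxed);
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
    assert_eq!(counter.load(Ordering::Relaxed), 4_000);
}
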
|
||
|
||
Communicating Processes
|
||
|
||
While we show you both approaches to concurrency, you should be aware
|
||
of:
|
||
|
||
Don’t communicate by sharing memory; share memory by communicating. -
|
||
Rob Pike
|
||
|
||
That means that, generally, message passing should be preferred over
|
||
shared memory, as it leads to solutions that are easier to implement and
|
||
reason about. This StackOverflow answer explains very well why this is
|
||
the case.
|
||
|
||
Marker Traits
|
||
|
||
Finally, we briefly want to mention two important traits:
|
||
|
||
- If something implements Send, it can be safely moved from one thread
|
||
to another.
|
||
- If something implements Sync, it can be safely used by more than a
|
||
single thread.
|
||
|
||
Both of them are marker traits, i.e. they do not carry implementations
but instead signal to the Rust compiler how a type may be used. This means
that you can also annotate your own types with them. However, doing so
yourself is strongly discouraged. The reason is that Rust automatically
marks a structure as Send or Sync if all of its elements have the
respective trait. If that is not the case, there is usually a good
reason why an element is not Sync or Send, and pretending that it is
can cause serious problems. In consequence, you only apply these marker
traits yourself when you write unsafe code (which we cover in U13) and
you are the only one who knows that the structure is Send and/or Sync.
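
As a small illustration (the first spawn is intentionally commented out because it does not compile), Rc is neither Send nor Sync, while Arc is:

use std::sync::Arc;
use std::thread;

fn main() {
    // Rc is neither Send nor Sync, so the compiler rejects moving it into a thread:
    // let local = std::rc::Rc::new(42);
    // thread::spawn(move || println!("{}", local));
    // error[E0277]: `Rc<i32>` cannot be sent between threads safely

    // Arc is Send + Sync (for Send + Sync contents), so this compiles:
    let shared = Arc::new(42);
    thread::spawn(move || println!("{}", shared)).join().unwrap();
}
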
|
||
|
||
Concurrent Introduction
|
||
|
||
While we are very used to the fact that, in the real world, things
|
||
happen at the same time (you read THIS word and a tree is planted
|
||
somewhere) or overlap in durations (you checking Facebook during a
|
||
lecture… though this never happens), talking about these concepts with
|
||
respect to computation is certainly not easy. Most likely, most of the
|
||
software you have written so far has been concerned with executing
|
||
things in a sequential manner (instruction after instruction). So let’s
|
||
enter the realm of concurrency:
|
||
|
||
In computer science, concurrency is the property of program,
|
||
algorithm, or problem decomposition into order-independent or
|
||
partially-ordered units. - Wikipedia
|
||
|
||
Now this sounds a bit complicated so let’s decompose it and play this
|
||
through with an example: Assume you are at the train station and you
|
||
just left your train to head for the connecting one. While you run (task
|
||
1), you check the boards for the platform your next train is leaving
|
||
(task 2). Let’s, for a second, ignore the fact that running without
|
||
knowing the destination might be worse than waiting and checking first.
|
||
These tasks are executed concurrently, but there is only one you so it
|
||
is not done in parallel (you might have to slow down or stand to check
|
||
the board). If you were on your phone with your partner and ask them to
|
||
check the platform, we would be in a concurrent and parallel scenario,
|
||
as now you can focus on running while someone else checks for the
|
||
destination simultaneously.
|
||
|
||
Mapping this to our definition, we see that the “algorithm to get to the
|
||
connecting train” is decomposed into two units: figuring out the
|
||
destination platform and running for it. This is partially ordered, as
|
||
you can only reliably run for a destination if you know where it is. If
|
||
more than one executing party is involved, units can, but need not, be
executed in parallel.
|
||
|
||
Benefits and Drawbacks of Concurrency
|
||
|
||
Now that we know that things don’t get easier with concurrency, the
|
||
question is why we do it in the first place. There are two major
|
||
reasons:
|
||
|
||
- First, if we have concurrency with parallelism, we have the chance
|
||
to increase our throughput (completed tasks per time) or decrease
|
||
latency (completion time of task). Assume you have a task that can
|
||
be parallelized, e.g., train attendants checking for tickets only in
|
||
their “section” of the train. The tickets checked per time increases
|
||
(throughput) and the average time between a traveller entering the
|
||
train and getting checked for their ticket decreases as well
|
||
(completion time). Note that this is not always the case: in
|
||
so-called pipelined systems, the throughput increases, even though
|
||
the end-to-end completion time (item arrives and is completely
|
||
processed) does not change.
|
||
- Second, if we are able to write programs with concurrency in mind,
|
||
we can deal better with systems that are either distributed
|
||
(messages between systems take non-negligible time and each system
|
||
can compute independently) or interfacing with the real-world (where
|
||
actions take time until a reaction follows).
|
||
|
||
Now you might be convinced that, despite concurrency being hard to talk
|
||
about, it is often a desirable concept. But there are also drawbacks of
|
||
concurrent computing systems:
|
||
|
||
- Concurrent code can exhibit race conditions, if the result of the
|
||
computation depends on the exact timing and/or the order of executed
|
||
code. A special form of this are data races when the result only
|
||
depends on the order (not the timing) in which concurrent threads
|
||
are executed. We discussed this already in U03.
|
||
- Another situation into which concurrent tasks can get is a deadlock.
|
||
When we try to synchronize tasks by using locks (which we cover in
|
||
the next section), i.e. when a system locks a resource before it
|
||
works on it, it can happen that two tasks wait on each other
|
||
indefinitely.
|
||
- Concurrent code often contains Heisenbugs (in contrast to
|
||
Bohrbugs)[12], i.e. undesired behaviour that is hard to be traced
|
||
down — where debugging is hard as the debugging process &
|
||
instrumentation itself tends to make the issue disappear as long as
|
||
it is attached.
|
||
|
||
In essence, when we strive for high performance using parallelism or
|
||
want to develop concurrent, distributed systems, we have to find ways to
|
||
compensate for the drawbacks — a topic that we cover in the next
|
||
sections.
|
||
|
||
Terms, Terms, Terms
|
||
|
||
Before we get started, we have to introduce a couple of system
|
||
programming terms, i.e. concepts coming from the operating systems
|
||
community. We follow this excellent glossary (unfortunately, English
|
||
terms are mentioned but only German explanations provided), if possible.
|
||
What we need in the following, which we translated and simplified:
|
||
|
||
Definition: A Process is a program in execution (program is the
|
||
description of what should be done).
|
||
|
||
Definition: A Thread is a strand of actions (e.g. call a function,
|
||
compute a value) with its own runtime context (i.e. state like
|
||
variables etc.).
|
||
|
||
Definition: A Task is something to be done in a process. This can be
|
||
implemented by calling a subroutine, or can have its own thread.
|
||
|
||
Definition: An Event is a set of circumstances that happen during a
|
||
process and are observed.
|
||
|
||
Definition: A Routine is a smaller program or part of a program with a
|
||
well-defined, commonly required functionality.
|
||
|
||
Definition: A Coroutine is executed together with (lat. con) other
routines, all being on the same conceptual level (in contrast to a
subroutine).
|
||
|
||
You do not have to learn them by heart, but make sure that you
|
||
understand the difference so that the following sections make more
|
||
sense.
|
||
|
||
U11: Dependable Concurrent Operation
|
||
|
||
We are getting close to the end of your junior program at DSys, which
|
||
means there is a final set of trainings given by coaches — for instance
|
||
Ferris Heisenberg, who is with us today. He is here to introduce
|
||
concurrency (including Heisenbugs), show why concurrency in Rust is
|
||
fearless, how to program asynchronous code, and demo how working with
|
||
parallel data is a breeze in Rust.
|
||
|
||
Parallel Data Processing with Rayon
|
||
|
||
So far, we have talked about parallelism in the form that two or more
|
||
threads work on the same data or do independent tasks.
|
||
|
||
A different form of concurrency is so-called data-parallelism, where you
|
||
exploit that data can be partitioned into equal units and worked upon
|
||
independently. A simple form, that even works in hardware, are
|
||
single-instruction multiple-data (SIMD) instructions certain CPUs or
|
||
GPUs provide. Instead of 8 multiplications of 8 values, you put them in
|
||
place and run an 8-value wide multiplication instruction.
|
||
|
||
At a higher level, we see this with iterators of items to which certain
|
||
modifications should be applied (remember adapters from U04). We
|
||
distinguish between mappers and reducers:
|
||
|
||
- A Mapper transforms each item into something else (fn(T) -> U). A
|
||
perfect example is the map function, but also the filter function
|
||
that “removes” elements.
|
||
- A Reducer transforms a sequence of items into something else
|
||
(fn(Iter<T>) -> U). A perfect example are fold methods and special
|
||
cases such as sum or product.
|
||
|
||
Further, we distinguish between non-blockers and blockers:
|
||
|
||
- Blockers can only produce their output when they have completely
|
||
consumed their input. An example is the fold method.
|
||
- Non-Blockers can produce outputs stepwise, without requiring the
|
||
whole input to be consumed. An example is the map method.
|
||
|
||
Depending on the chain of adapters we built up, and whether they
|
||
block/don’t block or map/reduce, we get potential ways for parallelising
|
||
things. From these definitions, it should be clear that:
|
||
|
||
- A step after a blocker cannot happen in parallel to the blocker. The
|
||
successor can only start as soon as the blocker is done.
|
||
- A step after a non-blocker can happen in parallel to the non-blocker
|
||
step, but on different items (i.e. we get a pipelined system).
|
||
- A mapper can be parallelised by applying the map function to
|
||
distinct parts of the iterator.
|
||
- A reducer can be parallelised, if the operation is associative and
|
||
commutative (e.g. sum). In this case, the input is put into batches
|
||
that are evaluated in parallel.
|
||
|
||
Benchmarking Tools
|
||
|
||
For the following use case, we leverage different benchmarking tools
|
||
that are also helpful in other situations.
|
||
|
||
hyperfine
|
||
|
||
hyperfine is a command-line benchmarking tool that can work with
|
||
anything, not just Rust binaries. You can use it like this:
|
||
|
||
hyperfine [OPTIONS] <command>
|
||
|
||
The following options are useful and often leveraged by performance
|
||
evaluations:
|
||
|
||
- --warmup <NUM> run command multiple times before benchmarking to
|
||
fill caches
|
||
- --prepare <CMD> run before the command to measure
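
A typical invocation combining both options could look like this (the binary names are placeholders):

hyperfine --warmup 3 --prepare 'cargo build --release' \
    'target/release/sequential' 'target/release/parallel'
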
|
||
|
||
btm
|
||
|
||
btm is a CLI task manager that shows how your CPUs are used:
|
||
|
||
[btm demo]
|
||
|
||
A Computation Example using Data Parallelism
|
||
|
||
The task is to compute the sum of the successors of all prime numbers
below n. While this is not particularly useful, it allows us to use a filter,
a map, and a fold/reduce adapter. We leverage the following checking
|
||
function:
|
||
|
||
pub fn is_prime(n: &u32) -> bool {
    if *n < 2 {
        // 0 and 1 are not prime
        return false;
    }
    let root = (*n as f64).sqrt().floor() as u32;
    (2..=root).all(|i| *n % i != 0)
}
|
||
|
||
First, we write a sequential solution:
|
||
|
||
use paralleldata::is_prime;
|
||
|
||
fn main() {
|
||
let n: u32 = 300_000;
|
||
let res = (1..n)
|
||
.into_iter()
|
||
.filter(is_prime)
|
||
.map(|i: u32| i + 1)
|
||
.fold(0, |a, b| a + b);
|
||
println!("res: {}", res);
|
||
}
|
||
|
||
Afterwards, we add rayon for a parallel solution:
|
||
|
||
use paralleldata::is_prime;
|
||
use rayon::prelude::*;
|
||
|
||
fn main() {
|
||
let n: u32 = 300_000;
|
||
let res = (1..n)
|
||
.into_par_iter()
|
||
.filter(is_prime)
|
||
.map(|i: u32| i + 1)
|
||
.reduce(|| 0, |a, b| a + b);
|
||
println!("res: {}", res);
|
||
}
|
||
|
||
We implemented the following changes:
|
||
|
||
- use rayon::prelude::*; imports rayon and its traits that allow
|
||
turning regular iterators into parallel ones and provides the map,
|
||
fold, … adapters.
|
||
- .into_iter() became .into_par_iter() turning it into a parallel
|
||
iterator rayon provides.
|
||
- .fold(...) now takes a closure as the first parameter, as it is
|
||
executed multiple times (parallel fold groups values and produces a
|
||
fold for each group).
|
||
- .fold(...) alone was no longer sufficient, as it produces single
|
||
values for each group. Now we do a reduce instead, which produces a
|
||
single value.
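
Since addition is associative and commutative, the reduce step can also be expressed with the sum reducer, which rayon parallelises in the same way. A minimal alternative sketch, reusing the crate's is_prime:

use paralleldata::is_prime;
use rayon::prelude::*;

fn main() {
    let n: u32 = 300_000;
    // filter and map as before, but let rayon's parallel sum() do the reduction.
    let res: u32 = (1..n)
        .into_par_iter()
        .filter(is_prime)
        .map(|i: u32| i + 1)
        .sum();
    println!("res: {}", res);
}
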
|
||
|
||
Benchmarking Results
|
||
|
||
We run the benchmarking using the following commands:
|
||
|
||
cargo build --bin sequential --release
|
||
cargo build --bin parallel --release
|
||
|
||
hyperfine target/release/sequential target/release/parallel
|
||
|
||
Here are the results:
|
||
|
||
Benchmark 1: target/release/sequential
|
||
Time (mean ± σ): 287.9 ms ± 36.9 ms [User: 280.6 ms, System: 3.5 ms]
|
||
Range (min … max): 246.6 ms … 361.4 ms 11 runs
|
||
|
||
Benchmark 2: target/release/parallel
|
||
Time (mean ± σ): 112.7 ms ± 27.6 ms [User: 401.8 ms, System: 16.7 ms]
|
||
Range (min … max): 83.6 ms … 183.6 ms 20 runs
|
||
|
||
Summary
|
||
'target/release/parallel' ran
|
||
2.55 ± 0.71 times faster than 'target/release/sequential'
|
||
|
||
Watching them live in btm
|
||
|
||
[Btm Result]
|
||
|
||
The left part with the multiple spikes shows the sequential solution
|
||
running. The right part with the significant purple spike shows the
|
||
parallel solution. It becomes clear that by using all cores, the
|
||
parallel solution is done faster.
|
||
|
||
Why is Rayon useful?
|
||
|
||
- Rayon guarantees that there are no data races introduced.
|
||
- Rayon figures out ways to parallelize steps that can be
|
||
parallelized.
|
||
- Rayon internally uses a join primitive that only executes
|
||
concurrently when cores are idle (implementing potential parallelism
|
||
in contrast to guaranteed parallelism that might cause too much
|
||
overhead).
|
||
|
||
S11: Sample Solution
|
||
|
||
Applied Concurrency in Rust
|
||
|
||
- Rustlings: Discuss in class.
|
||
|
||
10-Incrementer
|
||
|
||
Shared Memory
|
||
|
||
use std::{
|
||
sync::{Arc, Mutex},
|
||
thread,
|
||
};
|
||
|
||
fn increment(location: Arc<Mutex<u32>>) {
|
||
for _ in 0..10 {
|
||
let mut l = location.lock().unwrap();
|
||
*l = *l + 1;
|
||
}
|
||
}
|
||
|
||
fn main() {
|
||
let counter = Arc::new(Mutex::new(42));
|
||
let t1 = thread::spawn({
|
||
let counter = counter.clone();
|
||
move || increment(counter)
|
||
});
|
||
let t2 = thread::spawn({
|
||
let counter = counter.clone();
|
||
move || increment(counter)
|
||
});
|
||
t1.join().unwrap();
|
||
t2.join().unwrap();
|
||
println!("{}", counter.lock().unwrap());
|
||
}
|
||
|
||
Message Passing
|
||
|
||
// Cargo.toml
|
||
// ...
|
||
[dependencies]
|
||
crossbeam-channel = "0.5.1"
|
||
|
||
use std::thread;
|
||
|
||
use crossbeam_channel::{Receiver, Sender};
|
||
|
||
fn increment(input: Receiver<u32>, output: Sender<u32>) {
|
||
for _ in 0..10 {
|
||
output.send(input.recv().unwrap() + 1).unwrap();
|
||
}
|
||
}
|
||
|
||
fn main() {
|
||
let mut counter = 42;
|
||
let (s, r) = crossbeam_channel::unbounded();
|
||
let (s2, r2) = crossbeam_channel::unbounded();
|
||
|
||
let t1 = thread::spawn({
|
||
let r = r.clone();
|
||
let s = s2.clone();
|
||
move || increment(r, s)
|
||
});
|
||
let t2 = thread::spawn(move || increment(r, s2));
|
||
|
||
s.send(counter).unwrap();
|
||
for result in r2 {
|
||
counter = result;
|
||
match s.send(result) {
|
||
Ok(_) => continue,
|
||
Err(_) => break,
|
||
}
|
||
}
|
||
|
||
t1.join().unwrap();
|
||
t2.join().unwrap();
|
||
println!("{}", counter);
|
||
}
|
||
|
||
Rayon in Action
|
||
|
||
- TODO
|
||
|
||
Async in Action
|
||
|
||
// ...
|
||
[dependencies]
|
||
async-std = { version = "1.10.0", features = ["attributes"] }
|
||
rayon = "1.5.1"
|
||
surf = "2.3.2"
|
||
url = "2.2.2"
|
||
|
||
// main.rs
|
||
fn create_url_vector() -> Result<Vec<url::Url>, url::ParseError> {
|
||
let urls = vec![
|
||
"https://rustacean.net/assets/rustacean-orig-noshadow.png",
|
||
"https://rustacean.net/assets/rustacean-orig-noshadow.svg",
|
||
"https://rustacean.net/assets/rustacean-flat-noshadow.png",
|
||
"https://rustacean.net/assets/rustacean-flat-noshadow.svg",
|
||
"https://rustacean.net/assets/cuddlyferris.png",
|
||
"https://rustacean.net/assets/cuddlyferris.svg",
|
||
"https://rustacean.net/assets/rustacean-flat-happy.png",
|
||
"https://rustacean.net/assets/rustacean-flat-happy.svg",
|
||
"https://rustacean.net/assets/rustacean-flat-gesture.png",
|
||
"https://rustacean.net/assets/rustacean-flat-gesture.svg",
|
||
"https://rustacean.net/assets/corro.svg",
|
||
"https://rustacean.net/more-crabby-things/droidstacean-flat-happy_green.png",
|
||
];
|
||
urls.into_iter().map(url::Url::parse).collect()
|
||
}
|
||
|
||
async fn download_file(url: &url::Url) -> Result<(), Box<dyn std::error::Error>> {
|
||
let mut res = surf::get(&url).await?;
|
||
let body = res.body_bytes().await?;
|
||
|
||
let segments = url.path_segments().expect("url has no path");
|
||
let mut path = std::env::current_dir()?;
|
||
path.push("target");
|
||
async_std::fs::write(path.join(segments.last().unwrap()), &body).await?;
|
||
Ok(())
|
||
}
|
||
|
||
#[async_std::main]
|
||
async fn main() -> Result<(), Box<dyn std::error::Error>> {
|
||
let urls = create_url_vector()?;
|
||
|
||
let tasks = urls
|
||
.into_iter()
|
||
.map(|url| async move {
|
||
if let Err(error) = download_file(&url).await {
|
||
eprintln!("Error downloading `{url}`: {error}!")
|
||
}
|
||
})
|
||
.map(async_std::task::spawn)
|
||
.collect::<Vec<_>>();
|
||
|
||
for task in tasks {
|
||
task.await
|
||
}
|
||
|
||
Ok(())
|
||
}
|
||
|
||
Summary
|
||
|
||
What did you learn?
|
||
|
||
- What the difference between concurrency and parallelism is, as well
|
||
as their benefits and drawbacks.
|
||
- How to do both message passing as well as shared memory concurrency
|
||
in Rust.
|
||
- How rayon allows you to easily exploit data parallelism when working
|
||
with iterators.
|
||
- How to program asynchronous code, enabling resource-efficient
|
||
software that deals well with many I/O tasks.
|
||
|
||
Where can you learn more?
|
||
|
||
- Concurrency
|
||
- Rust Book: Ch. 16 20
|
||
- Programming Rust: Ch. 19, 20
|
||
- Rust for Rustaceans: Ch. 09, 11
|
||
- The Embedded Rust Book: Concurrency
|
||
- OneSignal: Thread Safety
|
||
- 7 Concurrency Models in 7 Weeks
|
||
- Parallel Data / rayon
|
||
- Blog Post
|
||
- RustBelt Talk
|
||
- Async Programming
|
||
- Programming Rust: Ch. 20
|
||
- Rust in Action: Ch. 10
|
||
- Rust for Rustaceans: Ch. 08
|
||
- Async IO Fundamentals
|
||
- A Practical Guide to Async in Rust
|
||
- async-rs Stop Token
|
||
- Async Read and Write Traits
|
||
|
||
W11: Work Sheet
|
||
|
||
Applied Concurrency in Rust
|
||
|
||
- Do the Rustlings exercises threads.
|
||
|
||
- Remember the 10-incrementer we have mentioned in U03. Your task is
|
||
now to take this code (which we show below) and turn it into a
|
||
concurrent solution (i.e. it keeps spawning two threads that do the
|
||
stepwise 10-increment) and produces the correct output (62). Do so
|
||
once using message passing and once using shared memory concurrency.
|
||
|
||
use std::thread;
|
||
|
||
fn increment(mut counter: Counter) {
|
||
for _ in 0..10 {
|
||
counter.count += 1;
|
||
}
|
||
}
|
||
|
||
#[derive(Debug)]
|
||
struct Counter {
|
||
count: u32,
|
||
}
|
||
|
||
fn main() {
|
||
let mut counter = Counter { count: 42 };
|
||
let t1 = thread::spawn(|| increment(counter));
|
||
let t2 = thread::spawn(|| increment(counter));
|
||
t1.join().unwrap();
|
||
t2.join().unwrap();
|
||
println!("{:#?}", counter);
|
||
}
|
||
|
||
Rayon in Action
|
||
|
||
In U04 you had to implement a word count program using iterators. For
|
||
this task, take this solution and turn it into a concurrency-enabled
|
||
solution using Rayon. Benchmark the sequential and the parallel solution
|
||
and compare the performance.
|
||
|
||
Async in Action
|
||
|
||
At the beginning of Async Programming, we described the “download all
|
||
sections of this book concurrently” use case for async. Your task is now
|
||
to do exactly that: given a vector of URIs, try to download all of them
|
||
in parallel and write them to a folder on disk. Do not attempt to
|
||
download the book sections, because they are behind HTTP basic auth,
|
||
which complicates things. In the sample solution, we download all the
|
||
~~crap~~ crabs images. Instead of doing HTTP requests by hand, leverage
|
||
the surf crate that works nicely with async-std. Benchmark your solution
|
||
while you develop it.
|
||
|
||
Hardware Dependability
|
||
|
||
System dependability can come in various forms:
|
||
|
||
- how often are reboots allowed?
|
||
- are crashes allowed and how often?
|
||
- what is the acceptable failure rate?
|
||
|
||
Every system can fail. So, you need to decide what your acceptable
|
||
failure rate is. - Better Embedded System Software
|
||
|
||
Dependability targets must exist so that systems can be designed for
|
||
this target.
|
||
|
||
The two most common issues for hardware of embedded systems are
|
||
reliability and availability. Notably, software fails in different ways
|
||
than hardware and the math we cover here cannot easily be transferred to
|
||
software components.
|
||
|
||
Typical faults that happen in hardware are that gates do not properly
|
||
compute their output or bits get corrupted in memory.
|
||
|
||
Reliability
|
||
|
||
For the remainder of this section, we define reliability as the
|
||
probability of a system to work continuously for X hours after having
|
||
been turned on. Naturally, longer uptimes induce larger probabilities of
|
||
failure. Purely mechanical components often have a high probability of
|
||
failure right after they have been produced; which is often called
|
||
burn-in phase.
|
||
|
||
Reliability is a measure of a system’s ability to work completely
|
||
failure-free for a certain length of time, or mission. - Better
|
||
Embedded System Software
|
||
|
||
Under some natural assumptions, including that of mutual independence of
|
||
failures, we can consider the probability of hardware failures as being
|
||
determined by the failure rate \(\lambda\) and time \(t\) in a negative
exponential probability distribution:

\[ R(t) = e^{-\lambda t} \]
|
||
|
||
The exponential function leads to the fact that reliability drops the
|
||
longer the mission becomes.
|
||
|
||
A typical measure is Mean Time To Failure (\(MTTF\)), the average length
|
||
of failure-free operation after initialization, which under the above
|
||
assumptions corresponds to the inverse of the failure rate: \(MTTF = 1/\lambda\).
|
||
Higher MTTF values are indicators of higher reliability.
|
||
|
||
Note that if the MTTF is 1000 hours (\(\lambda = 0.001\,/h\)), the system is not
|
||
guaranteed to work for 1000 hours — it could very well fail sooner or
|
||
later. Instead, the reliability of a 1000-hour mission is:
|
||
|
||
\[ R(1000) = e^{-0.001 \cdot 1000} = e^{-1} = 0.3679 \]
|
||
|
||
Spelled out, this means that operating for 1000 hours with such a
|
||
component, you would expect it to have failed by that time in about
63% of cases. Here is a table of mission times given a certain
|
||
MTTF and target reliability:
|
||
|
||
------------------------------------------------------------------------------
MTTF (hrs)    Mission time at       Mission time at        Mission time at
              99% reliability       99.9% reliability      99.99% reliability
------------  --------------------  ---------------------  ----------------------
10            6 minutes             36 seconds             3.6 seconds
100           1 hour                6 minutes              36 seconds
1000          10 hours              1 hour                 6 minutes
10,000        4.2 days              10 hours               1 hour
100,000       6 weeks               4.2 days               10 hours
1,000,000     60 weeks              6 weeks                4.2 days
10,000,000    11.5 years            60 weeks               6 weeks
------------------------------------------------------------------------------
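
Such a table is straightforward to reproduce: solving \(R(t) = e^{-t/MTTF}\) for \(t\) gives \(t = -MTTF \cdot \ln(R)\). A minimal sketch in Rust:

fn mission_time(mttf_hours: f64, target_reliability: f64) -> f64 {
    // From R(t) = e^(-t / MTTF), solving for t gives t = -MTTF * ln(R).
    -mttf_hours * target_reliability.ln()
}

fn main() {
    // 1000 h MTTF at 99% target reliability: roughly 10 hours, as in the table.
    println!("{:.2} h", mission_time(1000.0, 0.99));
}
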
|
||
|
||
Depending on how we connect components, the reliability can change:
|
||
|
||
- Serial connection: if one fails, the entire chain fails:
  \[R_{serial}(t) = \prod_{i} R_i(t)\]
- Parallel connection: if one fails, the other can take over:
  \[R_{parallel}(t) = 1 - \prod_{i} (1 - R_i(t))\]
|
||
|
||
We can deduce that parallel connection improves reliability, while
|
||
serial reduces it. Here is a table of the number of (redundant) parallel
|
||
components and the chances that they fail on 11-hour long missions:
|
||
|
||
------------------------------------------------------------------------
# Components   R(11) at 50,000 MTTF                Mean Number of Missions Before Failure
------------   ---------------------------------   ---------------------------------------
1              \(0.99978\)                         \(4,546\)
2              \(0.999\,999\,952\)                 \(20,665,853\)
3              \(0.999\,999\,999\,999\,999\,998\)  \(1.326 \times 10^{17}\)
------------------------------------------------------------------------
|
||
|
||
Note that all this assumes that failures are independent, which is
|
||
something engineers have to put in a lot of effort to ensure.
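
The formulas above are easy to check numerically; the following sketch reproduces the single-component and redundant-component rows of the table (assuming independent failures, as stated):

fn reliability(lambda: f64, t: f64) -> f64 {
    (-lambda * t).exp()
}

// Serial: every component must survive the mission.
fn serial(rs: &[f64]) -> f64 {
    rs.iter().product()
}

// Parallel: at least one component must survive the mission.
fn parallel(rs: &[f64]) -> f64 {
    1.0 - rs.iter().map(|r| 1.0 - r).product::<f64>()
}

fn main() {
    let r = reliability(1.0 / 50_000.0, 11.0); // R(11) at 50,000 h MTTF
    println!("1 component : {:.12}", r);
    println!("2 in series : {:.12}", serial(&[r, r]));
    println!("2 redundant : {:.12}", parallel(&[r, r]));
    println!("3 redundant : {:.12}", parallel(&[r, r, r]));
}
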
|
||
|
||
Availability
|
||
|
||
For repairable systems/components, a different view on dependability is
|
||
to look at the (long-run) availability:
|
||
|
||
Availability is the fraction of time the system is operational.
|
||
|
||
That number depends on the MTTF as well as the mean time needed to
|
||
repair the system upon failure:
|
||
|
||
\[ A = \frac{MTTF}{MTTF + MTTR} \]
|
||
|
||
Note that the availability is independent of the mission time.
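
As a quick worked example with made-up numbers: a component with \(MTTF = 999\,h\) and a mean repair time of \(MTTR = 1\,h\) has an availability of \(\frac{999}{999 + 1} = 0.999\), i.e. “three nines”, no matter how long we operate it.
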
|
||
|
||
Increasing availability is usually done via redundancy (a single failure
|
||
does not cause unavailability) or fast recovery (repair time gets
|
||
small). Approaches to do the latter can be standby-systems, fast resets,
|
||
or watchdog timers of periodic resets.
|
||
|
||
Markov Analysis
|
||
|
||
Markov models are stochastic processes describing system behaviour
|
||
stochastically using concepts similar to state machines. Central to
|
||
these systems is the Markov assumption: The probability of the next
|
||
state depends on the current state only (i.e. is independent of previous
|
||
states). Therefore, a Markov model is said to be memory-less as prior
|
||
state occupancies are not influencing the future behaviour.
|
||
|
||
There are different models:

- Discrete-time Markov chains are the most basic ones, where state
  changes are described by probabilistic experiments over successor
  states. For instance, with states being elements of
  \(\{head, tail\}^{+}\), the possible sequences of outcomes of tosses
  of a fair coin describe a discrete-time Markov chain.
|
||
|
||
- More relevant for our purposes here are continuous-time Markov
|
||
chains. They evolve in continuous time (the reals), not in discrete
|
||
time (the integers). Here, the memory-less property is also in the
|
||
time domain, which means it does not matter when the system has been
|
||
where (including how long the system has been in the current state),
|
||
only the current state determines the future behaviour. It can be
|
||
shown that state occupancy times in such models must be
|
||
exponentially distributed. The proof is beautiful and hence
|
||
recommended (alternative: join the next “Quantitative Model
Checking” lecture).
|
||
|
||
Here is an example continuous time Markov chain, where rates (of
|
||
exponential distributions) label edges.
|
||
|
||
|
||
+-----------+ μ
|
||
| A: intact |<--------+
|
||
+-----------+ |
|
||
| |
|
||
λ | +-----+-----+
|
||
+-------->| B: failed |
|
||
+-----------+
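
For this two-state chain with failure rate \(\lambda\) and repair rate \(\mu\), the long-run fraction of time spent in the intact state works out to \(\frac{\mu}{\lambda + \mu}\); with \(MTTF = 1/\lambda\) and \(MTTR = 1/\mu\), this is exactly the availability formula from the previous section.
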
|
||
|
||
Application in Dependability Analysis
|
||
|
||
Markov models are widely applied in dependability analysis. For
|
||
instance, we can do the following:
|
||
|
||
- First, we create a system model with empirically measured mean times
|
||
between failure and repair times.
|
||
- Based on this, we build a Markov model, where component states form
|
||
nodes, while failure and repair times are used on the edges between
|
||
states.
|
||
- From this model, a set of equations can be derived and used to
|
||
compute the percentage of time spent in a state, as well as visit
|
||
frequency and visit durations for states (as well as the precise
|
||
information in what state the system is at what time with what
|
||
probability).
|
||
- Finally, this information can be used to tell working system states
|
||
apart from failed system states.
|
||
- Using these two states, we can compute the availability of the
|
||
entire system and many other measures of interest, like mission
|
||
survivability.
|
||
|
||
Use Case: Remote-Controlled Robot
|
||
|
||
We reconsider our robotic application and now want to analyse it using
|
||
Markov chains: For this, we consider the system as the composition of
|
||
components that can be in working or failed state. We denote that
|
||
component X is working with X and !X if it is failed. With our 3
|
||
components, we have \(2^3\) distinct states:
|
||
|
||
+---------+
|
||
| (P,T,S) |
|
||
+---------+
|
||
| | |
|
||
+---------+ | +----------+
|
||
| | |
|
||
V V V
|
||
+-----------+ +-----------+ +-----------+
|
||
| (!P,T,S) | | (P,!T,S) | | (P,T,!S) |
|
||
+-----------+ +-----------+ +-----------+
|
||
| | | | | |
|
||
| +-----------+----+--------+---+ |
|
||
| | | | | |
|
||
| +------+ | +-----+ | |
|
||
V V V V V V
|
||
+-----------+ +-----------+ +-----------+
|
||
| (!P,!T,S) | | (P,!T,!S) | | (!P,T,!S) |
|
||
+-----------+ +-----------+ +-----------+
|
||
| | |
|
||
+---------+ | +-----------+
|
||
V V V
|
||
+------------+
|
||
| (!P,!T,!S) |
|
||
+------------+
|
||
|
||
In another step, we mark states that are failed states (where the fault
|
||
tree evaluates to true) as well as working states (marked with ==, where
|
||
the FT evaluates to false). We again consider the failure rates as
|
||
specified in the previous section. We also annotate the edges with the
|
||
rates:
|
||
|
||
+=========+
|
||
| (P,T,S) |
|
||
+=========+
|
||
λ_P | | | λ_S
|
||
+---------+ | +----------+
|
||
| | λ_T |
|
||
V V V
|
||
+-----------+ +===========+ +===========+
|
||
| (!P,T,S) | | (P,!T,S) | | (P,T,!S) |
|
||
+-----------+ +===========+ +===========+
|
||
| λ_T | λ_S | λ_P | λ_S | λ_T | λ_P
|
||
| +------+------+------+---+ |
|
||
| | +-+ | | |
|
||
| +------+ | +-----+ | |
|
||
V V V V V V
|
||
+-----------+ +-----------+ +-----------+
|
||
| (!P,!T,S) | | (P,!T,!S) | | (!P,T,!S) |
|
||
+-----------+ +-----------+ +-----------+
|
||
| λ_S | λ_P | λ_T
|
||
+---------+ | +-----------+
|
||
V V V
|
||
+------------+
|
||
| (!P,!T,!S) |
|
||
+------------+
|
||
|
||
Assume that we now want to compute the probability of the overall system
|
||
failure, i.e. the probability of being in any of the states not
|
||
surrounded with ==. To do this, we can simplify the chain (factually
|
||
being an exploitation of bisimulation on Markov chains) as follows by
|
||
collapsing failed states into one (numbers in state give the index used
|
||
for latter analysis):
|
||
|
||
+===========+
|
||
| (P,T,S) 0 |
|
||
+===========+
|
||
λ_P | | | λ_S
|
||
+---------------+ | +----------+
|
||
| | λ_T |
|
||
| V V
|
||
| +============+ +============+
|
||
| | (P,!T,S) 1 | | (P,T,!S) 2 |
|
||
| +============+ +============+
|
||
| | λ_P | λ_S | λ_T | λ_P
|
||
| | | | |
|
||
| | +-+ | |
|
||
| +------+ | +-----+ |
|
||
V V V V V
|
||
+-----------------------------------------+
|
||
| Failed 3 |
|
||
+-----------------------------------------+
|
||
|
||
Multiple outgoing edges can be combined as in the following diagram (as
|
||
a result of the fact that the minimum of exponential distributions is
|
||
exponentially distributed with the sum of the rates):
|
||
|
||
+===========+
|
||
| (P,T,S) 0 |
|
||
+===========+
|
||
λ_P | | | λ_S
|
||
+---------------+ | +----------+
|
||
| | λ_T |
|
||
| V V
|
||
| +============+ +============+
|
||
| | (P,!T,S) 1 | | (P,T,!S) 2 |
|
||
| +============+ +============+
|
||
| | λ_P + λ_S | λ_P + λ_T
|
||
| | |
|
||
| | |
|
||
| | |
|
||
V V V
|
||
+-----------------------------------------+
|
||
| Failed 3 |
|
||
+-----------------------------------------+
|
||
|
||
Final Analysis
|
||
|
||
If we apply the formulas from the previous section on the failure rates
|
||
we get \[\] The last line is obtained by evaluating \(Q_0(t)\) with time
|
||
\(t\) set to 8760 hours (1 year). The result obtained should agree with
|
||
the one of the direct analysis in the previous section, which was a lot
|
||
simpler, but needed to fix \(t\) prior to the analysis, and which is
|
||
generally unable to cover repairable systems.
|
||
|
||
U12: Dependability Theory
|
||
|
||
This time, it is Ferris McHardHat who is going to teach you about the
|
||
reliability of hardware, quantitative fault tree analysis, and Markov
|
||
analysis. As he is more of a theoretical person, he mostly wears the
|
||
hard hat for style and not for safety reasons — but that does not mean
|
||
you should pay less attention to him!
|
||
|
||
Quantitative Fault Tree Analysis
|
||
|
||
In U09 we already discussed how fault trees can be used to analyze a
|
||
system for events that can cause failures. While we looked at algorithms
|
||
to find a minimal set of these failures, we have not considered how each
|
||
event contributes to the overall reliability of the system. In this
|
||
section, we also introduce importance measures to:
|
||
|
||
- identify basic events that should be improved, maintained, or
|
||
controlled
|
||
- identify basic events that contribute significantly to the top-event
|
||
probability — which means that high-quality failure data should be
|
||
obtained.
|
||
|
||
In practice, the values computed by the importance measures differ by
|
||
orders of magnitude. Hence, it is often sufficient to look at the rough
|
||
estimates and you do not need precise results.
|
||
|
||
Use Case: Remote-Controlled Robot
|
||
|
||
Let’s consider a remote-controlled robotic system composed of the
|
||
following components with respective failure rates:
|
||
|
||
- A power supply \(P\), without which the system stops working.
  \(\lambda_P = 12.03 \times 10^{-6}\,/h\)
- A communication module, without which the mission is no longer under
  control, i.e. failed, that is composed of two redundant links:
  - A terrestrial link \(T\). \(\lambda_T = 25.47 \times 10^{-6}\,/h\)
  - A satellite link \(S\). \(\lambda_S = 40.72 \times 10^{-6}\,/h\)
|
||
|
||
Our mission is designed to last for \(t = 8760 h\) (1 year) and we now
|
||
want to know how likely a mission failure is, given these values. If
|
||
this is insufficient, we would need to improve the reliability of
|
||
components.
|
||
|
||
Based on these details, we can come up with both (a) the structure of
|
||
the fault tree and (b) the failure probabilities of all events,
\(1 - e^{-\lambda_i t}\) (check the numbers, note that we rounded gracefully):
|
||
|
||
System
|
||
failed
|
||
0.154
|
||
|
|
||
+-----+
|
||
| >=1 |
|
||
+-----+
|
||
| |
|
||
+--+ |
|
||
0.06 | |
|
||
+-------+ |
|
||
| & | |
|
||
+-------+ |
|
||
| | |
|
||
O O O
|
||
0.2 0.3 0.1
|
||
^ ^ ^
|
||
Terrestrial | Power Supply
|
||
Link failed | failed
|
||
|
|
||
Satellite
|
||
Link failed
|
||
|
||
Top-Event Probabilities
|
||
|
||
First, let us introduce a bit of notation:
|
||
|
||
- Our system is composed of \(n\) components.
- The fault tree induces a structure function \(\psi(\cdot)\), a
  negation-free Boolean formula.
- Let \(x_i\) indicate that the \(i\)th component is in working state
  (1), respectively failed state (0). \(\mathbf{x} = (x_1, x_2, \ldots, x_n)\)
  is the vector of states of all components.
- \(\varphi(\mathbf{x})\) represents the system state, which is defined
  as \(\varphi(\mathbf{x}) = 1~(0)\) if the system is working (failed).
  \(\psi(\cdot)\) is the negation of \(\varphi(\cdot)\), meaning that
  the system fails once the top-level event turns true.
- Let \(p_i\) specify the reliability of component \(i\), i.e.
  \(P(x_i = 1) = p_i = 1 - P(x_i = 0)\). \(\mathbf{p}\) is the vector of
  reliabilities of all components.
- \(R(\mathbf{p})\) is the system reliability with component
  reliability vector \(\mathbf{p}\).
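To make this notation concrete for the robot (our own reading of the
tree, not spelled out in the original material), let \(x_P, x_T, x_S\)
indicate a working power supply, terrestrial link, and satellite link:

\[\varphi(\mathbf{x}) = x_P \wedge (x_T \vee x_S)\]

and the top-level event of the fault tree corresponds to
\(\psi(\mathbf{x}) = \neg \varphi(\mathbf{x})\).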
|
||
|
||
We introduce the top-level failure probability \(Q_0\), that is computed
|
||
based on minimal cut sets. The minimal cut sets are: [Terrestrial Link
|
||
failed, Satellite Link failed] and [Power Supply failed].
|
||
|
||
If the basic events are independent and no event appears in more than
one minimal cut set, \(Q_0\) gives the top-level failure probability
exactly. Otherwise, \(Q_0\) is a conservative upper bound, as
overlapping failure combinations are counted multiple times.
|
||
|
||
First, let’s compute \(q_i\) of each cut set using the failure
probabilities from the diagram, \((1 - p_j)\):

\[q_i = \prod_{j \in MCS_i} (1 - p_j)\]
|
||
|
||
For our cut sets, this means:
|
||
|
||
\(q_1 = 0.2 \cdot 0.3 = 0.06\) [Terrestrial Link failed, Satellite Link failed]
|
||
|
||
\(q_2 = 0.1\) [Power Supply failed]
|
||
|
||
The top-level failure probability is then computed as
|
||
|
||
\[Q_0 = 1 - \prod_{i \in MCSes} (1 - q_i)\]
|
||
|
||
For our case:
|
||
|
||
\[Q_0 = 1 - (1 - 0.06)(1 - 0.1) = 1 - 0.94 \cdot 0.9 = 1 - 0.846 = 0.154\]
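As a small sketch (our own illustration, not part of the original
material), the same computation in Rust, with each minimal cut set
given as the list of its basic-event failure probabilities:

fn cut_set_probability(event_probs: &[f64]) -> f64 {
    // q_i: product of the failure probabilities of all events in the cut set
    event_probs.iter().product()
}

fn top_event_probability(min_cut_sets: &[Vec<f64>]) -> f64 {
    // Q_0 = 1 - prod_i (1 - q_i)
    1.0 - min_cut_sets
        .iter()
        .map(|mcs| 1.0 - cut_set_probability(mcs))
        .product::<f64>()
}

fn main() {
    // [Terrestrial Link, Satellite Link] and [Power Supply]
    let min_cut_sets = vec![vec![0.2, 0.3], vec![0.1]];
    println!("Q_0 = {:.3}", top_event_probability(&min_cut_sets)); // 0.154
}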
|
||
|
||
Birnbaum Importance
|
||
|
||
When it comes to importance measures, the Birnbaum Importance \(I_B\)
|
||
indicates how important a component (represented by the failure event)
|
||
is for correct functioning of the whole system. The value is computed
|
||
for an event \(i\) as follows:
|
||
|
||
\[I_B(i) = Q_0(\mathbf{p} \mid p_i = 1) - Q_0(\mathbf{p} \mid p_i = 0)\]
|
||
|
||
Hence, you re-compute \(Q_0\) twice, replacing the failure probability
of event \(i\) once with 1 and once with 0. Intuitively speaking, the
importance quantifies by how much the top-level failure probability
changes between the component being perfectly unreliable (\(p_i = 1\))
and perfectly reliable (\(p_i = 0\)).
|
||
|
||
Let’s compute this for the event Terrestrial Link failed:
|
||
|
||
\[I_B(T) = Q_0(p_T = 1) - Q_0(p_T = 0)
= \left[1 - (1 - 1 \cdot 0.3)(1 - 0.1)\right] - \left[1 - (1 - 0 \cdot 0.3)(1 - 0.1)\right]
= 0.37 - 0.1 = 0.27\]
|
||
|
||
We get the following results for the other events:
|
||
|
||
\[I_B(S) = 0.28 - 0.1 = 0.18, \qquad I_B(P) = 1 - 0.06 = 0.94\]
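The re-computation is easy to mechanize. The following sketch (ours,
not from the course material) identifies basic events by an index and
obtains the importance by re-evaluating \(Q_0\) with the event
probability forced to 1 and to 0:

// probs[i] is the failure probability of basic event i; each minimal
// cut set lists the indices of the events it contains.
fn q0(min_cut_sets: &[Vec<usize>], probs: &[f64]) -> f64 {
    1.0 - min_cut_sets
        .iter()
        .map(|mcs| 1.0 - mcs.iter().map(|&i| probs[i]).product::<f64>())
        .product::<f64>()
}

fn birnbaum(min_cut_sets: &[Vec<usize>], probs: &[f64], event: usize) -> f64 {
    let mut fails = probs.to_vec();
    let mut works = probs.to_vec();
    fails[event] = 1.0; // the event is certain to occur
    works[event] = 0.0; // the event can never occur
    q0(min_cut_sets, &fails) - q0(min_cut_sets, &works)
}

fn main() {
    // 0 = Terrestrial Link, 1 = Satellite Link, 2 = Power Supply
    let min_cut_sets = vec![vec![0, 1], vec![2]];
    let probs = [0.2, 0.3, 0.1];
    for (name, i) in [("Terrestrial", 0), ("Satellite", 1), ("Power", 2)] {
        println!("I_B({name}) = {:.2}", birnbaum(&min_cut_sets, &probs, i));
    }
}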
|
||
|
||
Improvement Potential Importance
|
||
|
||
Another importance measure is the Improvement Potential Importance,
|
||
which gives for each component how much the overall system reliability
|
||
would increase if the component were perfect. The value is computed for
|
||
an event \(i\) as follows:
|
||
|
||
\[I_{IP}(i) = R(\mathbf{p} \mid p_i = 0) - R(\mathbf{p})\]
|
||
|
||
Let us reformulate this in terms of unreliability to use our \(Q_0\)
|
||
function:
|
||
|
||
\[I_{IP}(i) = Q_0(\mathbf{p}) - Q_0(\mathbf{p} \mid p_i = 0)\]
|
||
|
||
Let’s compute this for the event Terrestrial Link failed:
|
||
|
||
\[I_{IP}(T) = Q_0 - Q_0(p_T = 0) = 0.154 - 0.1 = 0.054\]
|
||
|
||
We get the following results for the other events:
|
||
|
||
\[I_{IP}(S) = 0.154 - 0.1 = 0.054, \qquad I_{IP}(P) = 0.154 - 0.06 = 0.094\]
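The improvement potential fits into the same sketch: the following
helper (again ours, relying on the q0 function from the Birnbaum sketch
above) sets the event probability to 0 and compares against the
baseline:

fn improvement_potential(min_cut_sets: &[Vec<usize>], probs: &[f64], event: usize) -> f64 {
    let mut perfect = probs.to_vec();
    perfect[event] = 0.0; // the event can no longer occur
    q0(min_cut_sets, probs) - q0(min_cut_sets, &perfect)
}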
|
||
|
||
S12: Sample Solution
|
||
|
||
HW Reliability
|
||
|
||
- \(R(t) = e^{-t / MTTF} = e^{-0.5} = 0.606\)
|
||
|
||
- \(t = -2160\,h \cdot \ln(0.9999) \approx 2160\,h \cdot 10^{-4} = 0.216\,h = 12.96\,min\)
|
||
|
||
- \(MTTF = \frac{-t}{\ln(R(t))} = \frac{-72\,h}{\ln(0.999)} \approx 72000\,h\)
|
||
|
||
- \(N = -t _S ln(1 - R(t))= (1-0.999)= 81\)
|
||
|
||
HW Availability
|
||
|
||
- \(MTTR \approx MTTF \cdot (1 - A) = 5\,years \cdot 0.001 = 0.005\,years = 43.8\,h\)
|
||
|
||
- \(A = \frac{MTTF}{MTTF + MTTR} = \frac{3650\,d}{3650\,d + 1\,d} = 0.999726\), i.e. three nines
|
||
|
||
- \(MTTF = MTTR \cdot \frac{A}{1 - A} = 15\,min \cdot 99 = 1485\,min = 24.75\,h\)
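All three availability answers use the same relation (restated here for
reference):

\[A = \frac{MTTF}{MTTF + MTTR}
\quad\Longleftrightarrow\quad MTTR = MTTF \cdot \frac{1 - A}{A}
\quad\Longleftrightarrow\quad MTTF = MTTR \cdot \frac{A}{1 - A}\]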
|
||
|
||
QTA
|
||
|
||
\(Q_0 = 1 - 0.99^4 = 0.0394\); system reliability \(1 - Q_0 = 0.9606\)
|
||
|
||
Summary
|
||
|
||
What did you learn?
|
||
|
||
- How hardware dependability can be computed in terms of reliability
|
||
and availability.
|
||
- How to execute quantitative fault tree analysis to compute
|
||
system-level reliability as well as importance of basic events for
|
||
reliability.
|
||
- How to execute Markov analysis on systems to check for the amount of
|
||
time the system spends in the failed state.
|
||
|
||
Where can you learn more?
|
||
|
||
- Better Embedded System Software: Ch. 26 or Course Unit
|
||
- Embedded Software Development for Safety-Critical Systems: Ch. 11
|
||
Markov Chains
|
||
- Markov Chains in Detail
|
||
|
||
W12: Work Sheet
|
||
|
||
Hardware Reliability
|
||
|
||
- Given \(MTTF = 2 years = 17520 h\) and \(t = 1 year = 8760 h\), what
|
||
is \(R(t)\)?
|
||
|
||
- Given \(MTTF = 3 months = 3 24h = 2160 h\) and \(R(t) = 0.9999\)
|
||
goal, how long is \(t\)?
|
||
|
||
- Given \(R(t) = 0.999\) goal and \(t = 72h\), what is the \(MTTF\)?
|
||
|
||
- You have a component with failure rate \(\lambda = 0.001\) and a
  target mission time of \(t = 8760\,h\). How many instances of the
  component do you need in parallel to achieve a reliability of at
  least 0.9999 at \(t\)?
|
||
|
||
Hardware Availability
|
||
|
||
In dependability jargon, “X 9s” refers to an availability of X-many
nines, e.g. two 9s = 0.99.
|
||
|
||
- Given availability target of three 9s and \(MTTF\) of 5 years what
|
||
is the repair time?
|
||
|
||
- Give \(MTTF\) of 10 years and repair time of 1 day, how many 9s of
|
||
availability do you get?
|
||
|
||
- Given availability target of two 9s and repair time of 15 minutes,
|
||
how large should the \(MTTF\) be?
|
||
|
||
Quantitative Fault Tree Analysis
|
||
|
||
Consider the fault tree from W09 with failure probabilities \(P_V =
|
||
0.01\) and \(P_{S1} = P_{S2} = P_{S3} = 0.1\):
|
||
|
||
System
|
||
failed
|
||
|
|
||
+-----+
|
||
| >=1 |
|
||
+-----+
|
||
| |
|
||
| +--+
|
||
| |
|
||
| +-------+
|
||
| | >=1 |
|
||
| +-------+
|
||
| | | +----------------+
|
||
| | +---------+ |
|
||
| | | |
|
||
| +-------+ +-------+ +-------+
|
||
| | & | | & | | & |
|
||
| +-------+ +-------+ +-------+
|
||
| | | | | | |
|
||
O O O O O O O
|
||
^ ^ ^ ^ ^ ^ ^
|
||
V | S2 | S3 | S3
|
||
failed | failed | failed | failed
|
||
S1 S1 S2
|
||
failed failed failed
|
||
|
||
Your task is now to:
|
||
|
||
- Compute the top-level failure probability \(Q_0\).
|
||
- Compute the Birnbaum Importance for all basic events.
|
||
- Compute the Improvement Potential Importance for all basic events.
|
||
|
||
Assembly
|
||
|
||
As you probably know already, CPUs do not execute Rust code, but rather
work with bits and bytes that encode the CPU's instructions. The
human-readable form of this machine code is called assembly code. In
contrast to high-level, safe Rust, assembly can be broken in many ways.
The goal of this section is that you value the guarantees Rust gives
you more, and only go down to the lower layers if really necessary, or
if you want to play around with the compiler itself.
|
||
|
||
When we write Rust, we write in a high-level language and pass it to the
|
||
compiler. The translation happens through multiple layers, e.g. the
|
||
following ones for the LLVM compiler:
|
||
|
||
- Rust High-Level Code
|
||
- Rust’s Mid-Level Intermediate Representation (MIR)
|
||
- LLVM Intermediate Representation
|
||
- Target-specific Assembly Code
|
||
|
||
We can inspect the compilation process using cargo-asm, which adds both
the cargo asm and the cargo llvm-ir subcommands to our system.
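If the tool is not installed yet, it can be added the usual way
(assuming the crate name matches the tool name):

cargo install cargo-asm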
|
||
|
||
Let’s Assemble
|
||
|
||
We start with the following code that sums all the numbers in a range:
|
||
|
||
pub fn sum(range: std::ops::RangeInclusive<u8>) -> u8 {
|
||
let mut sum = 0;
|
||
for i in range {
|
||
sum += i;
|
||
}
|
||
sum
|
||
}
|
||
|
||
We do cargo llvm-ir asm::sum and see how the %sum variable is
|
||
initialized, updated, and how the loop is converted:
|
||
|
||
define i8 @asm::sum(i24 %0) unnamed_addr #0 personality i32 (i32, i32, i64, %"unwind::libunwind::_Unwind_Exception"*, %"unwind::libunwind::_Unwind_Context"
|
||
start:
|
||
%iter.sroa.0.0.extract.trunc = trunc i24 %0 to i8
|
||
%iter.sroa.5.0.extract.shift = lshr i24 %0, 8
|
||
%iter.sroa.5.0.extract.trunc = trunc i24 %iter.sroa.5.0.extract.shift to i8
|
||
%_2.not.i.i.i16 = icmp ugt i24 %0, 65535
|
||
%.not.i.i.i17 = icmp ugt i8 %iter.sroa.0.0.extract.trunc, %iter.sroa.5.0.extract.trunc
|
||
%.0.i.i.i18 = select i1 %_2.not.i.i.i16, i1 true, i1 %.not.i.i.i17
|
||
br i1 %.0.i.i.i18, label %bb6, label %bb3.i.i
|
||
|
||
bb3.i.i: ; preds = %start, %bb3.i.i
|
||
%sum.020 = phi i8 [ %3, %bb3.i.i ], [ 0, %start ]
|
||
%iter.sroa.0.019 = phi i8 [ %spec.select15, %bb3.i.i ], [ %iter.sroa.0.0.extract.trunc, %start ]
|
||
%1 = icmp ult i8 %iter.sroa.0.019, %iter.sroa.5.0.extract.trunc
|
||
%not. = xor i1 %1, true
|
||
%2 = zext i1 %1 to i8
|
||
%spec.select15 = add nuw i8 %iter.sroa.0.019, %2
|
||
%3 = add i8 %sum.020, %iter.sroa.0.019
|
||
%.not.i.i.i = icmp ugt i8 %spec.select15, %iter.sroa.5.0.extract.trunc
|
||
%.0.i.i.i = select i1 %not., i1 true, i1 %.not.i.i.i
|
||
br i1 %.0.i.i.i, label %bb6, label %bb3.i.i
|
||
|
||
bb6: ; preds = %bb3.i.i, %start
|
||
%sum.0.lcssa = phi i8 [ 0, %start ], [ %3, %bb3.i.i ]
|
||
ret i8 %sum.0.lcssa
|
||
}
|
||
|
||
We do cargo asm our_crate::sum --rust and see a similar structure (the
|
||
initialization and loop):
|
||
|
||
pub fn sum(range: std::ops::RangeInclusive<u8>) -> u8 {
|
||
mov ecx, edi
|
||
and ecx, 16777215
|
||
xor eax, eax
|
||
cmp ecx, 65535
|
||
ja .LBB0_5
|
||
mov ecx, edi
|
||
shr ecx, 8
|
||
cmp dil, cl
|
||
ja .LBB0_5
|
||
xor eax, eax
|
||
.LBB0_3:
|
||
mov edx, edi
|
||
cmp dil, cl
|
||
adc dil, 0
|
||
sum += i;
|
||
add al, dl
|
||
cmp dl, cl
|
||
jae .LBB0_5
|
||
cmp dil, cl
|
||
jbe .LBB0_3
|
||
.LBB0_5:
|
||
}
|
||
ret
|
||
|
||
Now let’s have a look at a slightly different piece of code. The change
|
||
is that we now use an exclusive Range:
|
||
|
||
pub fn sum(range: std::ops::Range<u8>) -> u8 {
|
||
let mut sum = 0;
|
||
for i in range {
|
||
sum += i;
|
||
}
|
||
sum
|
||
}
|
||
|
||
Now, our LLVM IR code looks quite different:
|
||
|
||
define i8 @asm::sum(i8 %range.0, i8 %range.1) unnamed_addr #0 personality i32 (i32, i32, i64, %"unwind::libunwind::_Unwind_Exception"*, %"unwind::libunwind::_Unwind_Context"
|
||
start:
|
||
%0 = icmp ult i8 %range.0, %range.1
|
||
br i1 %0, label %bb4.preheader, label %bb6
|
||
|
||
bb4.preheader: ; preds = %start
|
||
%1 = xor i8 %range.0, -1
|
||
%2 = add i8 %1, %range.1
|
||
%3 = add nuw i8 %range.0, 1
|
||
%4 = mul i8 %2, %3
|
||
%5 = zext i8 %2 to i9
|
||
%6 = add i8 %range.1, -2
|
||
%7 = sub i8 %6, %range.0
|
||
%8 = zext i8 %7 to i9
|
||
%9 = mul i9 %5, %8
|
||
%10 = lshr i9 %9, 1
|
||
%11 = trunc i9 %10 to i8
|
||
%12 = add i8 %4, %range.0
|
||
%13 = add i8 %12, %11
|
||
br label %bb6
|
||
|
||
bb6: ; preds = %bb4.preheader, %start
|
||
%sum.0.lcssa = phi i8 [ 0, %start ], [ %13, %bb4.preheader ]
|
||
ret i8 %sum.0.lcssa
|
||
}
|
||
|
||
What you notice is that there is no loop anymore. Instead, Rust was able
|
||
to detect that what we are doing is adding up the elements of a range
|
||
i..j (which is different from adding up arbitrary elements from a list).
|
||
In consequence, it converted this construct into an optimized version of
|
||
the well-known formulas for computing the triangular number:
|
||
|
||
\[T_n = \sum_{k=1}^{n} k = \frac{n (n + 1)}{2}\]
|
||
|
||
and the natural sum of a range: sum(i..j) = \(\sum_{k=i}^{j-1} k = T_{j-1} - T_{i-1}\)
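As a small sanity check (ours, not part of the original text) that the
closed form matches the loop for a sample range:

fn triangular(n: u32) -> u32 {
    // T_n = n * (n + 1) / 2
    n * (n + 1) / 2
}

fn main() {
    let (i, j) = (3u32, 10u32);
    let loop_sum: u32 = (i..j).sum();
    let closed_form = triangular(j - 1) - triangular(i - 1);
    assert_eq!(loop_sum, closed_form); // both are 42
    println!("{loop_sum}");
}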
|
||
|
||
In assembly, the result looks like this:
|
||
|
||
pub fn sum(range: std::ops::Range<u8>) -> u8 {
|
||
cmp dil, sil
|
||
jae .LBB0_1
|
||
mov ecx, edi
|
||
not cl
|
||
add cl, sil
|
||
lea edx, [rdi, +, 1]
|
||
mov eax, ecx
|
||
mul dl
|
||
movzx ecx, cl
|
||
sub sil, dil
|
||
add sil, -2
|
||
movzx edx, sil
|
||
imul edx, ecx
|
||
shr edx
|
||
add al, dil
|
||
add al, dl
|
||
}
|
||
ret
|
||
.LBB0_1:
|
||
xor eax, eax
|
||
ret
|
||
|
||
In summary, you should realize how efficient the Rust compiler is and
how much is done for you already. This should also motivate you to
abstain from premature optimization, and to only look at these things
if you really need to optimize the generated code further.
|
||
|
||
Foreign Function Interface
|
||
|
||
Rust, by being a language without a runtime environment, is very well
|
||
suited to interface with other programming languages and ecosystems.
|
||
This is where foreign function interfaces (FFI) come into play.
|
||
Naturally, there are several use cases for this:
|
||
|
||
- You have existing code in a language that is well-tested and
|
||
established and you do not want to touch or rewrite it. Still, you
|
||
want to use the code in Rust and for this, you would wrap the
|
||
existing code in an FFI and safely call it from Rust.
|
||
- You have existing code in a language and you want to gradually
|
||
rewrite it in Rust. In this case, you can start by writing pieces of
|
||
code in Rust, expose them via FFI, and integrate them into the
|
||
existing system — replacing existing functionality.
|
||
- You have code written in a language that is not suited for high
|
||
performance (e.g., Python) and want to remove performance
|
||
bottlenecks by rewriting critical functions in Rust.
|
||
|
||
In the following, we take a closer look at the last use case.
|
||
|
||
An FFI Fibonacci
|
||
|
||
To show you FFI in action, we demonstrate how using Rust for a core
function and calling it from Python can improve performance. We again
use the recursive Fibonacci function as an example, hoping it is clear
to you that this is a toy example and that the obvious path to improve
performance would be to use the closed form to compute the result.
|
||
|
||
Pure Python
|
||
|
||
First, to establish a baseline, we implement and benchmark the function
|
||
in pure Python:
|
||
|
||
#!/usr/bin/env python3
|
||
|
||
def fib(n):
|
||
if n == 0 or n == 1:
|
||
return 1
|
||
else:
|
||
return fib(n-1) + fib(n-2)
|
||
|
||
|
||
print(fib(34))
|
||
|
||
Running it with hyperfine yields the following results:
|
||
|
||
> hyperfine "python3 src/pure.py"
|
||
Benchmark #1: python3 src/pure.py
|
||
Time (mean ± σ): 1.099 s ± 0.046 s [User: 1.097 s, System: 0.002 s]
|
||
Range (min … max): 1.042 s … 1.192 s 10 runs
|
||
|
||
Rust-Powered Python
|
||
|
||
We know that Python, being an interpreted language, is not fast at (a)
function calls (in our case, the recursive calls) and (b) basic
mathematical operations (+ in our case). The reason behind this is that
|
||
due to the dynamic typing of Python, it cannot make the same assumptions
|
||
as other languages do. For our addition operation, Python is going to
|
||
first check what the two operands are, how + is implemented and then
|
||
apply it — despite the fact that only integers are used for which the
|
||
addition is a single machine instruction.
|
||
|
||
A typical approach for these performance-critical parts of Python
programs is to call out to code written in other programming languages,
or to use, for instance, Cython. In our case, we rewrite the core logic
in Rust and make it accessible from Python.
|
||
|
||
The PyO3 Way
|
||
|
||
While the previous version works, for many use cases it is more suitable
|
||
to use this approach for FFI. Our Cargo.toml looks like this:
|
||
|
||
[package]
|
||
name = "fibonacci"
|
||
version = "0.1.0"
|
||
edition = "2018"
|
||
|
||
[lib]
|
||
name = "fibonacci"
|
||
crate-type = ["cdylib"]
|
||
|
||
[dependencies.pyo3]
|
||
version = "0.15.1"
|
||
features = ["extension-module"]
|
||
|
||
The upper part is as before, and we further leverage pyo3, a crate
|
||
for bridging the foreign-function interface between Rust and Python
|
||
(both ways). The lib.rs with the fib function looks very similar to the
|
||
Python version:
|
||
|
||
use pyo3::prelude::*;
|
||
|
||
#[pyfunction]
|
||
fn fib(n: u32) -> u32 {
|
||
if n == 0 || n == 1 {
|
||
1
|
||
} else {
|
||
fib(n - 1) + fib(n - 2)
|
||
}
|
||
}
|
||
|
||
#[pymodule]
|
||
fn fibonacci(_py: Python, m: &PyModule) -> PyResult<()> {
|
||
m.add_function(wrap_pyfunction!(fib, m)?)?;
|
||
Ok(())
|
||
}
|
||
|
||
What is different from normal Rust functions are the attributes.
|
||
#[pyfunction] automatically wraps a function to be Python compatible.
|
||
#[pymodule] creates a Python module that can be imported. Note that the
|
||
name of the module function fibonacci must be identical with the
|
||
lib.name in the Cargo.toml. Thanks to Rust being statically typed, this
|
||
code can be compiled down to highly efficient machine code.
|
||
|
||
On the Python side of things, we declare the following in ffi.py:
|
||
|
||
#!/usr/bin/env python3
|
||
|
||
import fibonacci
|
||
|
||
print(fibonacci.fib(34))
|
||
|
||
Before we can trigger the pyo3 magic, we need to do the following to
set up a virtual environment and add maturin (a CLI tool to build and
publish PyO3 crates):
|
||
|
||
python -m venv .env # creates a virtual environment
|
||
source .env/bin/activate # activates it
|
||
pip install maturin
|
||
|
||
Now, we can build our Python extension module with
|
||
|
||
maturin develop --release
|
||
|
||
Finally, we compare the two approaches with hyperfine:
|
||
|
||
> hyperfine "python src/ffi.py" "python src/pure.py"
|
||
Benchmark 1: python src/pure.py
|
||
Time (mean ± σ): 1.197 s ± 0.218 s [User: 1.195 s, System: 0.002 s]
|
||
Range (min … max): 1.070 s … 1.801 s 10 runs
|
||
|
||
Benchmark 2: python src/ffi.py
|
||
Time (mean ± σ): 19.2 ms ± 0.8 ms [User: 17.7 ms, System: 1.4 ms]
|
||
Range (min … max): 17.9 ms … 21.2 ms 158 runs
|
||
|
||
Summary
|
||
'python src/ffi.py' ran
|
||
62.23 ± 11.66 times faster than 'python src/pure.py'
|
||
|
||
In summary, we reduce the computation time by more than a factor of 60
with our solution. For more complex functions, it is quite possible to
achieve even higher gains.
|
||
|
||
U13: unsafe(ty) last
|
||
|
||
You almost made it! In this final training, Corro the Unsafe Rusturchin
|
||
is going to teach you about unsafe Code, Debugging Tools for unsafe, Foreign
|
||
Function Interfaces, and a little bit of Assembly.
|
||
|
||
S13: Example solution
|
||
|
||
FFI with PyO3
|
||
|
||
lib.rs:
|
||
|
||
use num::complex::Complex;
|
||
use pyo3::prelude::*;
|
||
|
||
fn mandelbrot_at_point(cx: f64, cy: f64, max_iters: usize) -> usize {
|
||
let mut z = Complex { re: 0.0, im: 0.0 };
|
||
let c = Complex::new(cx, cy);
|
||
|
||
for i in 0..=max_iters {
|
||
if z.norm() > 2.0 {
|
||
return i;
|
||
}
|
||
z = z * z + c;
|
||
}
|
||
max_iters
|
||
}
|
||
|
||
#[pyfunction]
|
||
fn calculate_mandelbrot(
|
||
max_iters: usize,
|
||
x_min: f64,
|
||
x_max: f64,
|
||
y_min: f64,
|
||
y_max: f64,
|
||
width: usize,
|
||
height: usize,
|
||
) -> Vec<Vec<usize>> {
|
||
    let mut rows: Vec<_> = Vec::with_capacity(height);
|
||
for img_y in 0..height {
|
||
        let mut row: Vec<usize> = Vec::with_capacity(width);
|
||
for img_x in 0..width {
|
||
let x_percent = (img_x as f64 / width as f64);
|
||
let y_percent = (img_y as f64 / height as f64);
|
||
let cx = x_min + (x_max - x_min) * x_percent;
|
||
let cy = y_min + (y_max - y_min) * y_percent;
|
||
let escaped_at = mandelbrot_at_point(cx, cy, max_iters);
|
||
row.push(escaped_at);
|
||
}
|
||
rows.push(row);
|
||
}
|
||
rows
|
||
}
|
||
|
||
#[pymodule]
|
||
fn mandelbrot(_py: Python, m: &PyModule) -> PyResult<()> {
|
||
m.add_function(wrap_pyfunction!(calculate_mandelbrot, m)?)?;
|
||
Ok(())
|
||
}
|
||
|
||
ffi.py:
|
||
|
||
import io
|
||
from mandelbrot import calculate_mandelbrot
|
||
|
||
def render_mandelbrot(vals):
|
||
for row in vals:
|
||
line = io.StringIO()
|
||
for column in row:
|
||
if column in range(0,2):
|
||
line.write(' ')
|
||
elif column in range(3,5):
|
||
line.write('.')
|
||
elif column in range(6,10):
|
||
line.write('•')
|
||
elif column in range(11, 30):
|
||
line.write('*')
|
||
elif column in range(31, 100):
|
||
line.write('+')
|
||
elif column in range(101, 200):
|
||
line.write('x')
|
||
elif column in range(201, 400):
|
||
line.write('$')
|
||
elif column in range(401, 700):
|
||
line.write('#')
|
||
else:
|
||
line.write('%')
|
||
print(line.getvalue())
|
||
|
||
if __name__ == "__main__":
|
||
mandelbrot = calculate_mandelbrot(1000, -2.0, 1.0, -1.0, 1.0, 100, 24)
|
||
|
||
render_mandelbrot(mandelbrot)
|
||
|
||
Assembling numbers
|
||
|
||
Discuss in class.
|
||
|
||
Summary
|
||
|
||
What did you learn?
|
||
|
||
- What privileges and duties come whenever you start using unsafe in
|
||
your project.
|
||
- What rare practical use cases are there for you to employ unsafe.
|
||
Look first, if someone else already did the work for you.
|
||
- How Rust can interface with other languages using FFI, for instance,
|
||
to replace performance-critical functions with efficient Rust
|
||
implementations.
|
||
|
||
Where can you learn more?
|
||
|
||
- unsafe
|
||
- Rust Book: Ch. 19.1
|
||
- Programming Rust: Ch. 22
|
||
- Nomicon
|
||
- Unsafe Coding Guidelines
|
||
- Unsafe is not what you think it means
|
||
- Understand Unsafe Rust
|
||
- Rust for Rustaceans: Ch. 10
|
||
- Foreign Function Interfaces:
|
||
- Programming Rust: Ch. 23
|
||
- Rust for Rustaceans: Ch. 12
|
||
- Rust FFI Omnibus
|
||
- The Challenge of Using C in Safety-Critical Applications
|
||
|
||
Tools
|
||
|
||
Before we let you work with unsafe code, we want to show you two tools
|
||
that allow you to check your software for undefined behaviour.
|
||
|
||
cargo-careful
|
||
|
||
The first tool is cargo-careful, which lets you run Rust code with extra
|
||
care. You install it like this:
|
||
|
||
cargo install cargo-careful
|
||
|
||
and run it like this:
|
||
|
||
cargo +nightly careful run
|
||
|
||
run can be replaced with test to run your test suite instead of your
binary. careful changes the build process in that it builds your code
against a standard library in which all debug assertions are enabled.
Hence, execution is considerably slower, but many more assumptions are
checked at run time. There are also some nightly-only flags that add
further run-time checks against undefined behaviour.
|
||
|
||
The following shows some undefined behaviour we introduced in an unsafe
|
||
block:
|
||
|
||
fn main() {
|
||
let arr = [1, 2, 3, 4];
|
||
let slice = &arr[..2];
|
||
let value = unsafe { slice.get_unchecked(2) };
|
||
println!("The value is {}!", value);
|
||
}
|
||
|
||
If we do cargo run, value happens to become 3, but we violated the
memory rules by indexing slice out of bounds. If we run it using cargo
careful, the get_unchecked precondition is checked and an out-of-bounds
index is detected.
|
||
|
||
This is a simple (if not obvious) example, but you can imagine how
|
||
larger projects using various pieces of unsafe code could create less
|
||
obvious undefined behaviour.
|
||
|
||
miri
|
||
|
||
The second tool is miri, an interpreter for Rust’s mid-level
|
||
intermediate representation (MIR). This is not a course on compilers,
|
||
hence it should be enough for you to know that MIR is a simpler
|
||
representation of Rust code (i.e. syntax is desugared). With miri, you
|
||
can run binaries and tests of cargo projects to check for certain
|
||
classes of undefined behaviour, as we show below. If you are authoring
|
||
unsafe code, you should leverage miri to double-check if you do not
|
||
expose miri-detectable classes of undefined behaviour.
|
||
|
||
You can add miri like this:
|
||
|
||
rustup +nightly component add miri
|
||
|
||
and run it like this:
|
||
|
||
cargo +nightly miri run
|
||
|
||
The following examples have been kindly provided by Ralf Jung, the
|
||
author of miri and graduate of MPI-SWS. They represent cases where we do
|
||
not fulfill our duties mentioned before. You can try them out by copying
|
||
the code to a new crate and run the above command. Note that if you run
|
||
them with cargo run they might still produce some behaviour. However,
|
||
when multiple pieces of unsafe code work together, strange things can
|
||
happen.
|
||
|
||
Invalid Memory Access
|
||
|
||
Here, we attempt to dereference null:
|
||
|
||
#![allow(unused)]
|
||
|
||
fn main() {
|
||
unsafe {
|
||
let val = *(0 as *const u8);
|
||
}
|
||
}
|
||
|
||
Note that for this piece, cargo run already presents us with a warning
(the compiler itself detects such an obvious null dereference, even
without Miri).
|
||
|
||
For the following, this is not the case.
|
||
|
||
fn main() {
|
||
unsafe {
|
||
let x = 0u8;
|
||
let ptr = &x as *const u8;
|
||
ptr.offset(1); // okay, one-past-the-end
|
||
ptr.wrapping_offset(2); // okay, wrapping_offset may go OOB
|
||
ptr.offset(2); // UB
|
||
}
|
||
}
|
||
|
||
Here, we create a pointer to a memory region (2 bytes into ptr) that
|
||
does not belong to what we allocated (1 byte for u8).
|
||
|
||
Type Invariants
|
||
|
||
We mentioned before that the memory region of a bool should contain
|
||
either the value 0 or 1. In the following, this is not the case:
|
||
|
||
fn main() {
|
||
unsafe {
|
||
let x: bool = std::mem::transmute(2u8);
|
||
println!("{}", x);
|
||
}
|
||
}
|
||
|
||
Similarly, enum memory should only ever contain values that are
|
||
associated with a valid enum variant. Here, we disobey this rule:
|
||
|
||
#[derive(Debug)]
|
||
enum Enum {
|
||
A,
|
||
B,
|
||
C,
|
||
}
|
||
fn main() {
|
||
unsafe {
|
||
let x: Enum = std::mem::transmute(3u8);
|
||
println!("{:?}", x);
|
||
}
|
||
}
|
||
|
||
unsafe
|
||
|
||
So far in this course, we have only used safe Rust code, which means
|
||
that the code we wrote (and successfully compiled) so far could not
|
||
contain certain forms of bugs. In particular, this is concerned with
|
||
so-called undefined behaviour.
|
||
|
||
Undefined behaviour describes a situation where the language rules no
longer define what your program means, and the compiler is free to pick
any behaviour.
|
||
|
||
This is particularly bad, as arbitrarily bad things can happen. Let’s
|
||
build a crashing piece of code:
|
||
|
||
fn main() {
|
||
let mut a: usize = 0;
|
||
let ptr = &mut a as *mut usize;
|
||
unsafe {
|
||
*ptr.offset(-3) = 0x7ffff72f484c;
|
||
}
|
||
}
|
||
|
||
In the program, we take a raw pointer to the first stack variable a. In
|
||
the unsafe block, we do pointer arithmetic, leaving our original memory
|
||
area (a) and use the return address of main. In overwriting this value,
|
||
we make our program no longer well-behaved. So we have misused the
|
||
capabilities provided by unsafe. Fortunately, the operating system
|
||
provides memory separation, so we get a segmentation fault and only our
|
||
application crashes. On an embedded system (without OS), we could have
|
||
easily caused more trouble.
|
||
|
||
In summary, with the use of the unsafe keyword, we are entering the
|
||
realm of unsafe Rust where two things happen:
|
||
|
||
- First, you get more power as you can now write code that does not
|
||
need to conform with the compiler’s rules. You can think of unsafe
|
||
as a way to swear to the compiler “you don’t need to check this, I
|
||
know what I am doing”.
|
||
- Second, you get more responsibility, as it is now your fault if the
|
||
resulting code contains issues.
|
||
|
||
Metaphorically speaking, safe Rust is like a prison where you’re not
|
||
allowed to bring shovels. Even more, it is a language — so it erases the
|
||
concept of shovels & digging from the inhabitants. In consequence, they
|
||
cannot even think about the concept of a shovel. With unsafe, thinking
|
||
about this concept is allowed again — including all the, potentially
|
||
devastating, consequences.
|
||
|
||
Before we get started, let’s clarify the use cases of unsafe a bit more.
|
||
It is important that, after you read and understood this section, you
|
||
don’t feel like you should now spread unsafe blocks all over your code
|
||
because it makes things easier. If you are writing high-level,
|
||
application-layer programs, it is extremely unlikely that you need to
|
||
use unsafe — it is even discouraged. If you want to enforce this policy
|
||
in your crate, use the #![forbid(unsafe_code)] attribute in your
|
||
top-level module, so that unsafe code cannot sneak in easily (assuming
|
||
you have contributors that might not be aware of unsafe consequences).
|
||
So when is unsafe really needed?
|
||
|
||
- If you write low-level software that deals with IO, registers, or
|
||
other hardware directly. Please note that in many cases, someone
|
||
already wrote that low-level code for you and provided a library
|
||
with safe abstractions.
|
||
- If you want to write efficient data structures whose structure and
|
||
algorithms do not comply with the Rust ownership rules. Again, it is
|
||
highly likely that someone already wrote a crate for that.
|
||
|
||
With this in mind, let’s remove the safety net and get unsafe.
|
||
|
||
Unsafe Privileges and Duties
|
||
|
||
Privileges
|
||
|
||
When you mark a block of code or a function as unsafe, you get access to
|
||
the following operations:
|
||
|
||
- dereference pointers
|
||
- call other unsafe functions
|
||
- call functions from other languages (via foreign function interface)
|
||
- mutably access global variables (with 'static lifetime)
|
||
|
||
Note that while Rust no longer prevents these potentially harmful
|
||
operations, the compiler still checks for (a) types, (b) lifetimes, and
|
||
(c) bounds on data structures.
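As a minimal illustration (ours, not from the text) of two of these
privileges, dereferencing a raw pointer and mutating a global variable:

static mut COUNTER: u32 = 0;

fn main() {
    let x = 42u32;
    let ptr = &x as *const u32;
    unsafe {
        // privilege: dereference a raw pointer
        let via_ptr = *ptr;
        // privilege: mutate a global ('static mut) variable
        COUNTER += 1;
        let count = COUNTER;
        println!("via pointer: {via_ptr}, counter: {count}");
    }
}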
|
||
|
||
Duties
|
||
|
||
Now, with unsafe in place, it is your duty to uphold the Rust rules for
|
||
well-behaved programs (source: Programming Rust):
|
||
|
||
- The program must not read uninitialized memory.
|
||
- The program must not create invalid primitive values:
|
||
- References, boxes, or fn pointers that are null
|
||
- bool values that are not either 0 or 1
|
||
- enum values with invalid discriminant values
|
||
- char values that are not valid, non-surrogate Unicode code
|
||
points
|
||
- str values that are not well-formed UTF-8
|
||
- Fat pointers with invalid vtables/slice lengths
|
||
- Any value of type !
|
||
- The rules for references must be followed:
|
||
- No reference may outlive its referent
|
||
- Shared access is only read-only access
|
||
- Mutable access is exclusive access
|
||
- The program must not dereference null, incorrectly aligned or
|
||
dangling pointers
|
||
- The program must not use a pointer to access memory outside the
|
||
allocation with which the pointer is associated
|
||
- The program must be free of data races
|
||
- The program must not unwind across a call made from another
|
||
language, via the foreign function interface
|
||
- The program must comply with the contracts of standard library
|
||
functions
|
||
|
||
Rust assumes that any unsafe code never violates any of these rules. If
|
||
this is the case, Rust can guarantee that the composition of several
|
||
safe Rust components is also safe.
|
||
|
||
It is important to note that checking the above rules requires you to
look not only at the unsafe block but also at its surroundings. Bugs
|
||
before the unsafe block can break contracts, which only turns into
|
||
undefined behaviour inside the block. Also, it is possible that the
|
||
consequences of contract breaking only happen after the unsafe block.
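A made-up sketch to illustrate this: the actual bug sits in safe code
(a missing length check), but the undefined behaviour only materializes
inside the unsafe block:

fn first_n(values: &[u8], n: usize) -> Vec<u8> {
    // BUG (in safe code): nothing ensures that n <= values.len()
    let mut out = Vec::with_capacity(n);
    for i in 0..n {
        // If the caller passes a too-large n, this reads past the
        // allocation: undefined behaviour, triggered only here.
        out.push(unsafe { *values.get_unchecked(i) });
    }
    out
}

fn main() {
    let data = vec![1u8, 2, 3];
    println!("{:?}", first_n(&data, 2)); // fine; first_n(&data, 5) would be UB
}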
|
||
|
||
In essence, to be a good Rustacean, you should
|
||
|
||
- only use unsafe where needed, in blocks of code that are as small as
|
||
possible. As they must undergo review, this helps both yourself as
|
||
well as your reviewers.
|
||
- explicitly state contracts, by adding a # Safety section to each
  unsafe function you write (see the sketch after this list).
|
||
- uphold all contracts mentioned above.
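Here is a minimal sketch (the function is made up for illustration) of
how such a # Safety section can look:

/// Returns the element at `index` without bounds checking.
///
/// # Safety
///
/// The caller must guarantee that `index < slice.len()`.
pub unsafe fn element_unchecked(slice: &[u8], index: usize) -> u8 {
    // SAFETY: the caller upholds `index < slice.len()`, see above.
    unsafe { *slice.get_unchecked(index) }
}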
|
||
|
||
Using cargo-geiger
|
||
|
||
If you care about the usage of unsafe in your project and its
|
||
dependencies, you can use cargo-geiger to check all of them. It returns
|
||
the following results:
|
||
|
||
- 🔒 = No unsafe usage found, declares #![forbid(unsafe_code)]
|
||
- ❓ = No unsafe usage found, missing #![forbid(unsafe_code)]
|
||
- ☢️ = unsafe usage found
|
||
|
||
Ideally, most of your dependencies have the lock symbol. Note that this
does not mean you should eliminate all unsafe code at any cost.
Instead, the idea is to minimize the amount of unnecessary unsafe code.
So in case you have the choice between two functionally equivalent
libraries, pick the safer one.
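Installation and invocation follow the usual cargo pattern (shown
without any flags; run the second command inside the project you want
to inspect):

cargo install cargo-geiger
cargo geiger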
|
||
|
||
Unsafe in Action
|
||
|
||
Efficient ASCIIString
|
||
|
||
This example shows how you can write efficient code, when you are well
|
||
aware that certain contracts are upheld, while the compiler is not aware
|
||
of this.
|
||
|
||
mod ascii {
|
||
#[derive(Debug, Eq, PartialEq)]
|
||
pub struct Ascii(
|
||
Vec<u8>
|
||
);
|
||
|
||
impl Ascii {
|
||
pub fn from_bytes(bytes: Vec<u8>) -> Result<Ascii, NotAsciiError> {
|
||
if bytes.iter().any(|&byte| !byte.is_ascii()) {
|
||
return Err(NotAsciiError(bytes));
|
||
}
|
||
Ok(Ascii(bytes))
|
||
}
|
||
}
|
||
|
||
#[derive(Debug, Eq, PartialEq)]
|
||
pub struct NotAsciiError(pub Vec<u8>);
|
||
|
||
impl From<Ascii> for String {
|
||
fn from(ascii: Ascii) -> String {
|
||
unsafe { String::from_utf8_unchecked(ascii.0) }
|
||
}
|
||
}
|
||
}
|
||
|
||
The type Ascii operates as follows: When the type is created from a
vector of bytes, every byte is checked to be a valid ASCII character.
Only in that case is the vector moved into the inner field of Ascii. As
the from_bytes function is the only way to create Ascii instances, the
contract that the vector contains only valid ASCII bytes is upheld.
This helps when we want to convert Ascii into a String. Internally, a
String is a vector of bytes that is guaranteed to be valid UTF-8. As
any ASCII character is a valid UTF-8 character, we can in principle
reuse the Ascii vector for the String. We can do so by using the unsafe
function from_utf8_unchecked, whose safety contract is that the bytes
passed in are valid UTF-8. We checked this before, making the
transformation a simple move of the vector from the Ascii to the String
type. If we had used the safe from_utf8(), this would have been less
efficient, as it validates all bytes again.
|
||
|
||
Here is the safe Ascii type in use:
|
||
|
||
use ascii::Ascii;
|
||
|
||
let bytes: Vec<u8> = b"ASCII string example".to_vec();
|
||
|
||
let ascii : Ascii = Ascii::from_bytes(bytes) // no allocation or copy, only scan
|
||
.unwrap();
|
||
|
||
let string = String::from(ascii); // Zero-cost: no allocation, copy, or scan
|
||
|
||
assert_eq!(string, "ASCII string example");
|
||
|
||
W13: Work Sheet
|
||
|
||
FFI with PyO3
|
||
|
||
Start from the following Python code which prints the Mandelbrot set and
|
||
rewrite the core performance-critical functions in Rust:
|
||
|
||
import io
|
||
|
||
def calculate_mandelbrot(max_iters, x_min, x_max, y_min, y_max, width, height):
|
||
rows = []
|
||
for img_y in range(height):
|
||
row = []
|
||
for img_x in range(width):
|
||
x_percent = img_x / width
|
||
y_percent = img_y / height
|
||
cx = x_min + (x_max - x_min) * x_percent
|
||
cy = y_min + (y_max - y_min) * y_percent
|
||
escaped_at = mandelbrot_at_point(cx, cy, max_iters)
|
||
row.append(escaped_at)
|
||
rows.append(row)
|
||
|
||
return rows
|
||
|
||
|
||
def mandelbrot_at_point(cx, cy, max_iters):
|
||
z = complex(0.0, 0.0)
|
||
c = complex(cx, cy)
|
||
|
||
for i in range(max_iters+1):
|
||
if abs(z) > 2.0:
|
||
return i
|
||
z = (z * z) + c
|
||
return max_iters
|
||
|
||
def render_mandelbrot(vals):
|
||
for row in vals:
|
||
line = io.StringIO()
|
||
for column in row:
|
||
if column in range(0,2):
|
||
line.write(' ')
|
||
elif column in range(3,5):
|
||
line.write('.')
|
||
elif column in range(6,10):
|
||
line.write('•')
|
||
elif column in range(11, 30):
|
||
line.write('*')
|
||
elif column in range(31, 100):
|
||
line.write('+')
|
||
elif column in range(101, 200):
|
||
line.write('x')
|
||
elif column in range(201, 400):
|
||
line.write('$')
|
||
elif column in range(401, 700):
|
||
line.write('#')
|
||
else:
|
||
line.write('%')
|
||
print(line.getvalue())
|
||
|
||
if __name__ == "__main__":
|
||
mandelbrot = calculate_mandelbrot(1000, -2.0, 1.0, -1.0, 1.0, 100, 24)
|
||
|
||
render_mandelbrot(mandelbrot)
|
||
|
||
Save this as pure.py. Now start with a ffi.py (a copy of pure.py) and a
|
||
lib.rs for using pyo3 to bridge the two. In a first step, move
|
||
mandelbrot_at_point from Python to Rust. Afterwards, also move
|
||
calculate_mandelbrot to Rust. You are allowed to use
|
||
num::complex::Complex (from the third-party num crate).
|
||
|
||
Finally, run hyperfine "python src/ffi.py" "python src/pure.py" to see
|
||
how the performance improves.
|
||
|
||
Assembling numbers
|
||
|
||
Consider the following function that sums a slice of numbers (in
|
||
contrast to working on ranges as in the earlier section):
|
||
|
||
pub fn sum_numbers(numbers: &[u8]) -> u8 {
|
||
let mut sum = 0;
|
||
for num in numbers {
|
||
sum += num;
|
||
}
|
||
sum
|
||
}
|
||
|
||
Your task is now to:
|
||
|
||
- Have a close look at the LLVM-IR and assembly and annotate which
|
||
parts of the code implement which higher level function.
|
||
- Rewrite the function by using an Iterator and an appropriate
|
||
consumer function. What happens to the IR and assembly?
|
||
|
||
U14: Energy-Aware Systems
|
||
|
||
[Timo Hönig]
|
||
|
||
Timo Hönig, © RUB, Marquard
|
||
|
||
Finally, DSys invited Timo Hönig (RUB) as the last coach to give a
|
||
lecture on the design and implementation of energy-aware computing
|
||
systems. From the perspective of the practical design of operating
|
||
systems and system software, the lecture will discuss methods and
|
||
approaches to improve non-functional system properties such as
|
||
performance and dependability, in particular in consideration of the
systems’ energy demand.
|
||
|
||
[1] Niels Bohr, winner of a Nobel Prize in physics, contributed the Bohr
|
||
model of the atom, which is a rather stable and tangible model. Werner
|
||
Heisenberg, another Nobel Prize in physics winner, described the
|
||
“uncertainty principle”, where things change or disappear if you try to
|
||
measure them.
|
||
|
||
[2] Niels Bohr, winner of a Nobel Prize in physics, contributed the Bohr
|
||
model of the atom, which is a rather stable and tangible model. Werner
|
||
Heisenberg, another Nobel Prize in physics winner, described the
|
||
“uncertainty principle”, where things change or disappear if you try to
|
||
measure them.
|
||
|
||
[3] The meaning of life.
|
||
|
||
[4] Niels Bohr, winner of a Nobel Prize in physics, contributed the Bohr
|
||
model of the atom, which is a rather stable and tangible model. Werner
|
||
Heisenberg, another Nobel Prize in physics winner, described the
|
||
“uncertainty principle”, where things change or disappear if you try to
|
||
measure them.
|
||
|
||
[5] Niels Bohr, winner of a Nobel Prize in physics, contributed the Bohr
|
||
model of the atom, which is a rather stable and tangible model. Werner
|
||
Heisenberg, another Nobel Prize in physics winner, described the
|
||
“uncertainty principle”, where things change or disappear if you try to
|
||
measure them.
|
||
|
||
[6] The meaning of life.
|
||
|
||
[7] Tony Hoare invented it and calls it his billion dollar mistake.
|
||
|
||
[8] Niels Bohr, winner of a Nobel Prize in physics, contributed the Bohr
|
||
model of the atom, which is a rather stable and tangible model. Werner
|
||
Heisenberg, another Nobel Prize in physics winner, described the
|
||
“uncertainty principle”, where things change or disappear if you try to
|
||
measure them.
|
||
|
||
[9] Niels Bohr, winner of a Nobel Prize in physics, contributed the Bohr
|
||
model of the atom, which is a rather stable and tangible model. Werner
|
||
Heisenberg, another Nobel Prize in physics winner, described the
|
||
“uncertainty principle”, where things change or disappear if you try to
|
||
measure them.
|
||
|
||
[10] Niels Bohr, winner of a Nobel Prize in physics, contributed the
|
||
Bohr model of the atom, which is a rather stable and tangible model. Werner
|
||
Heisenberg, another Nobel Prize in physics winner, described the
|
||
“uncertainty principle”, where things change or disappear if you try to
|
||
measure them.
|
||
|
||
[11] Niels Bohr, winner of a Nobel Prize in physics, contributed the
|
||
Bohr model of the atom, which is a rather stable and tangible model. Werner
|
||
Heisenberg, another Nobel Prize in physics winner, described the
|
||
“uncertainty principle”, where things change or disappear if you try to
|
||
measure them.
|
||
|
||
[12] Niels Bohr, winner of a Nobel Prize in physics, contributed the
|
||
Bohr model of the atom, which is a rather stable and tangible model. Werner
|
||
Heisenberg, another Nobel Prize in physics winner, described the
|
||
“uncertainty principle”, where things change or disappear if you try to
|
||
measure them.
|