Rust: Haskell but more imperative?

Cătălin December 17, 2023 [RustHaskell] #Functional

Intro

What's the first thing that comes to mind when you hear "Rust"? I bet it's systems programming and languages such as C++. Would you consider me crazy if I said that Rust is just as much influenced by functional programming and languages like Haskell?

Have a look at this expression that calculates the factorial of a number in Rust:

fact :: Integer -> Integer
fact n = product [1..n]

Just look at how concise... oh wait, that's actually Haskell. Here's how it's done in Rust:

fn fact(n: u32) -> 32 {
    (1..=n).product()
}

Quite similar, right?

With this sample out of the way, the rest of the post will cover how Rust and Haskell use similar philosophies for handling complexity. The approaches might differ a bit, but the underlying values are the same. We're also going to cover features that are present in the both languages.

Check out the video

If you don't feel like reading:

Principles

Purity

Haskell uses the idea of purity to ensure correctness. A pure function only takes an input and returns an output, without any observable side effects such as creating state or performing IO operations.

This allows for better abstraction and composability as it's easier to understand what the function is doing - it will always do the same thing, no matter where you use it. With this, we can split up our program into separate pieces (or combine them into bigger components) without worrying about the underlying implementation of the component. We can just treat every function at any level like a black box. In imperative code, where we create state and use it in the control flow of our functions, we can't reliably do this, since components that share state will be implicitly and almost invisibly linked together.

Impure code

Of course, we can't just chill-out in pure-land, as we need to do stuff that affects the real world, so we need code that handles side-effects - and for this, Haskell uses monads. A monad is a computational construct which allows us to apply context-aware series of transformations to the data inside the context (in other words, the context ensures that the transformation will not violate it's own integrity rules).

Now, monads do not automatically allow side effects, it's just that you can create ones that handle side effects for the programmer.

As an example we have the IO monad which can be used to chain together series of IO operations, for which the Haskell runtime handles side effects. Values produced by an IO action cannot be used outside of an IO context. Because of this, the main function is an overarching IO action made up of smaller actions - modeled as monad transformations. We call bind on the monads using pure functions in order to achieve these trasnformations, and then have the side effects be executed automatically. We cannot acess the IO values produced by the actions inside the monads directly, but instead, we inject behavior into the IO monad, and the monad will then apply our function to the value inside it's context, while keeping the result wrapped in its type.

With this we have a sort of encapsulation of impurity. We know that all the code that can cause side-effects is in our monadic operations. Ideally we would keep our program's main small, having it be the only place we care about state and effects, and keep the rest of the functions we use to process data pure. This way we can combine and abstract components confidently. Of course, for more complex programs we will need to use multiple monads which will be running in the context of an overarching principal monad.

Rust's safe-unsafe separation

Rust uses a similar concept with the unsafe keyword. In this case the encapsulation of unsafe code happens towards the bottom level of the code hierarchy, unlike Haskell where the top level logic of the program has to be in a monadic context. Instead of having a main encapsulated instance at the top, we have multiple blocks closer to the low-level logic.

In order to handle systems programming tasks we have to perform operations which the compiler cannot guarantee safety for (such as direct memory manipulation). The places where we do this will be wrapped in unsafe blocks and accessed through a safe abstraction. In this way, we encapsulate the places where memory issues can occur. This will usually be done at a very low level.

Keep in mind - unsafe doesn't mean the code is unsafe, but just that the compiler cannot guarantee safety via static analysis. So, it's up to the author of the code to check and ensure safety.

Ownership and mutability semantics

Now, this still doesn't solve that problem of mutable state (Which Haskell handles with pure functions). For this, we have mutability semantics together with the ownership system. You can mutate state in Rust , but this has to be done under strict rules. You can only mutate a value when nobody is reading it, let alone trying to write to it.

You also can't change mutability for an entity unless ownership changes.

This concept of moving values from one owner to another gives fined grained control over mutation, ensuring that values don't change when we don't expect them to, but without being as rigid as pure functions. This also comes with a performance advantage. Pure functions mean we have to copy objects in order to return modified versions of them, while using the move concept in Rust allows us to modify the objects in place, without needing a copy, which can be quite expensive for certain types. Of course, you can already do this in other languages, but with Rust, you can assign the object to a mutable handler when you need to modify it, and move it to an immutable identifier once you're finished with the changes - so you can maintain a greater level of control.

So, Haskell makes you handle state and side effects using monadic operations in a smaller encapsulated part of the program, while Rust uses ownership and mutability semantics to control state transformations throughout all the codebase.

Expression based

Another defining feature of the two languages that is easy to overlook is that they're designed around expressions. An expression is a piece of code that returns something. Statements, on the other hand are parts of code that just do something, but don't return anything.

Most common languages are generally built around statements, and few instructions act as expressions. This makes the code less concise and the parts harder to chain together.

As an example, in Rust, if else blocks are expressions - so you can assign the result of a block to a variable, or even pass it directly to another expression

let x = 4;
let size = if x > 3 {"greater"} else {"smaller"};

Matches are also expressions:

let bool_val = true;

let bool_str: String = match bool_val {
	true => "true".to_owned(),
	false => "false".to_owned()
};

Even loops are expressions - Here, we return the number we break at, after adding the value 32 to it.

let mut num = 0;

    let result = loop {
        num += 1;

        if num == 10 {
            break num + 32;
        }
    };

Most instructions in Rust are expressions, except for those instructions that declare variables and assign values.

In Haskell, with all operations being function calls - virtually everything returns a value that can be passed to another function, so it's entirely based around expressions.

Types

Typeclasses

Another area in which the languages are similar is their type systems. Both use complex type systems to encode business logic and then enforce it at compile time. Here are the similarities between the two:

Typeclasses in Haskell are the equivalent of traits in Rust. With this we can use values in a polymorphic way by defining behavior that the data needs to support.

The Show typeclass allows us to encode types as Strings in order to display them.

class  Show a  where
    showsPrec        :: Int -> a -> ShowS
    show       :: a -> String 
    showList         :: [a] -> ShowS

In Rust this can be achieved by using the Display trait.

pub trait Display {
    // Required method
    fn fmt(&self, f: &mut Formatter<'_>) -> Result<(), Error>;
}

The typeclass / trait system has a powerful integration with the standard library in both languages. For example, the Read typeclass and FromStr trait allow us to construct data values from strings. These are kind of mirror traits to Show and Display which we covered earlier.

Here, we're reading the number 42 from a String using the Read typeclass:

read :: (Read a=> String -> a

//......

read "42" :: Int

Here we're using the FromStr trait for the same purpose:

pub trait FromStr: Sized {
    type Err;

    // Required method
    fn from_str(s: &str) -> Result<Self, Self::Err>;
}

//......

let x = i32::from_str("42").unwrap();

Both languages can derive traits or typeclasses:

data Point = Point Float Float deriving (Show)

show Point 4.2 4.2
#[derive(Debug)]
struct Point(f32, f32);

println!("{:?}", Point (4.2, 4.2));

Both languages support default implementations in traits / typeclasses, though Haskell can be more flexible in this area. Here's an example with the Eq typeclass, where the user can either implement the == or /= member function in order to have implementations for both:

class Eq a where  
    (==) :: a -> a -> Bool  
    (/=) :: a -> a -> Bool  
    x == y = not (x /= y)  
    x /= y = not (x == y)

Because we declare what functions need to do instead of how to do it, we can define a function in reference to another, so the user can implement either of these two and the other will be automatically available.

Both languages support generic parameters, and both allow defining type constraints on both data and interface implementations. Here we're using a generic type T with PartiarOrd as a constraint:

fn find_largest<T: std::cmp::PartialOrd>(list: &[T]) -> Option<&T> {
    if list.is_empty() {
        return None;
    }
    let mut largest = &list[0];
    for item in list {
        if item > largest {
            largest = item;
        }
    }
    Some(largest)
}

In the Haskell example, we use a generic type a that has to implement the typeclass Ord:

findLargest :: (Ord a) => [a] -> Maybe a
findLargest [] = Nothing
findLargest (x:xs) = Just $ foldl (\acc y -> if y > acc then y else acc) x xs

The associated type feature is also available in both languages. In the following example we have a trait named Contains, with associated types A and B:

struct Container(i32, i32);

trait Contains {
    type A;
    type B;

    fn contains(&self, _: &Self::A, _: &Self::B) -> bool;
    fn first(&self) -> i32;
    fn last(&self) -> i32;
}

impl Contains for Container {
    type A = i32;
    type B = i32;

    fn contains(&self, number_1: &i32, number_2: &i32) -> bool {
        (&self.0 == number_1) && (&self.1 == number_2)
    }
}

This is the same thing implemented in Haskell:

{-# LANGUAGE TypeFamilies #-}

class Contains a where
  type A a
  type B a
  contains :: a -> A a -> B a -> Bool

data Container = MkContainer Int Int

instance Contains Container where
  type A Container = Int
  type B Container = Int
  contains (MkContainer x y) a b = x == a && y == b

Algebraic data types

Both languages have algebraic data types (what we call enums in Rust). In Haskell all data types can behave like enums.

In this example we have a command that can be used to perform one of 3 actions:

enum Command {
	Move {x: f32, y: f32}
	Log(String)
	Quit
}

And here's the Haskell equivalent:

data Commmand = Move {x :: Float, y :: Float} | Log String | Quit

Don't confuse the Haskell enum with the Rust enum. An enum in Haskell is typeclass which represents types for which I can get a successor and a predecessor:

data Day = Monday | Tuesday | Wednesday | Thursday | Friday | Saturday | Sunday   
            deriving (ShowEnum)

succ Tuesday
pred Tuesday

In this example I can get the next or previous day of the week using the successor and predecessor functions.

Monads

While Rust doesn't have the ability to define the generic concept of a Monad, like Haskell does, it does have some specific instances of monads implemented in the standard library, while also allowing users to create similar constructs.

Either is a monad that models something that can have one of two types, and this is often used to represent an operation that can fail with an error - similar to the Result enum in Rust.

Here, the Left variant represents an error message:

safeDiv :: Float -> Float -> Either String Float
safeDiv x 0 = Left "Divison by zero"
safeDiv x y = Right (x / y)

In Rust, the second type in the result represents the error , which is modeled using an Error type, but you can use String messages here as well:

fn safe_div(num: f32, div: f32) -> Result<f32, String> {
    if div == 0_f32 {
        Err("Divide by zero".to_string())
    }
    else {
	    Ok(num / div)
	}
}

The Haskell Maybe represents a value that is possibly there or Nothing - same as the Option enum in Rust. Here we have the same thing as in the previous examples where the function tries to do a safe divide, just that we replace the errors with the Nothing or None variants:

safeDiv :: Float -> Float -> Maybe Float
safeDiv x 0 = Nothing
safeDiv x y = Just (x / y)
fn safe_div(num: f32, div: f32) -> Option<f32> {
    if div == 0_f32 {
        None
    }
    else {
	    Some(num / div)
	}
}

Type aliases / synonyms

We can use the type keyword to create a shorthand alias (in Rust) or synonym (in Haskell) for a certain type. In this example, we're assuming a default error type and aliasing to a result type that omits the error:

type ParseResult<T> = Result<T, ParseIntError>;
type ParseResult a = Either a ParseError

Pattern matching

Both languages have the concept of pattern matching, which can be used to easily extract values by matching against the structure of some data. Here's an example straight from the Rust book where we use pattern matching to extract inner data from the variants of the message enum. We can easily get the x and y coordinates from Move, the String from the Write tuple, and the RGB components from the ChangeColor variant.

enum Message {
    Quit,
    Move { x: i32, y: i32 },
    Write(String),
    ChangeColor(i32, i32, i32),
}

fn main() {
    let msg = Message::ChangeColor(0, 160, 255);

    match msg {
        Message::Quit => {
            println!("The Quit variant has no data to destructure.");
        }
        Message::Move { x, y } => {
            println!("Move in the x direction {} and in the y direction {}", x, y);
        }
        Message::Write(text) => {
            println!("Text message: {text}");
        }
        Message::ChangeColor(r, g, b) => {
            println!("Change the color to red {}, green {}, and blue {}", r, g, b)
        }
    }
}

And here's the same thing in Haskell:

data Message = Quit
             | Move { x :: Int, y :: Int }
             | Write String
             | ChangeColor Int Int Int

main :: IO ()
main = do
    let msg = ChangeColor 0 160 255

    case msg of
        Quit -> putStrLn "The Quit variant has no data to destructure."
        Move { x = xVal, y = yVal } ->
            putStrLn $ "Move in the x direction " ++ show xVal ++ " and in the y direction " ++ show yVal
        Write text ->
            putStrLn $ "Text message: " ++ text
        ChangeColor r g b ->
            putStrLn $ "Change the color to red " ++ show r ++ ", green " ++ show g ++ ", and blue " ++ show b

This is using the special do notion available with the IO monad. The same kind of matching can be used in normal syntax with just pure functions:

hasNumericData :: Message -> Bool
hasNumericData msg = case msg of
    Quit -> False
    Move { x = _, y = _ } -> True
    Write _ -> False
    ChangeColor _ _ _ -> True

This is a functions that returns true if the variant contains numeric values, or false otherwise. The underscores indicate that we're ignoring the matched values - we just care about the specific variant here. We could replace the underscores with proper names if we want to actually use the values in our logic.

These are just a few examples. While Rust can use pattern matching for match blocks, if and while let's, destructuring, tuples and slices - Haskell has much more powerful pattern matching. Due to Haskell's declarative way of defining functions, you can use pattern matching anywhere you could bind a variable. The examples would be too much to cover, but here's an example of a lambda expression that switches the values in an tuple between each other:

swap = \(x,y) -> (y,x)

Due to this feature, no intermediary values are necessary for the switch.

Other similarities

Both languages provide compile-time metaprogramming with macros in Rust and the TemplateHaskell extension. So, you can write code that at compile time generates other code.

A potent example of this is the ability to check SQL against the database schema at compile time. The sqlx crate in Rust and the postgres-typed package in Haskell offer this feature.

There are more similarities such as the ability to apply functional operations on lists or iterators along with functions and closures being first-class citizens, ideas such as lambdas - but these are quite common in many languages so they're not worth discussing here.

Hope you enjoyed reading! 'Till next time!

Back to top