Rust-Written Zlib-rs Is Not Only Safer But Now Outperforming Zlib C Implementations
(www.phoronix.com)
There has been a zlib in Ada for many years, doing its job quietly. Speed comparable to the C version, probably not beating it, but not trailing by much in any case. Rust is safer than C but less safe than Ada from what I can tell.
Rust (edited for clarity) looks to me to be about halfway between C and C++ in complexity, with a bunch of footguns removed, and using implicit move semantics ("borrowing") more than C++ does, and the notorious borrow checker is simply the compiler making sure you don't make RAII mistakes because of that.
It's always seemed to me that Phoronix is too often about turning mailing list drama into clickbait. I've mostly disliked it, because of that.
I don't know much about Ada, but to my knowledge, its safety is difficult to compare with Rust.
Ada has a type system that can express lots of details (like that an hour is in the range from 0 to 23), but then Rust prevents you from doing dumb things across threads (which might be part of the reason why a faster implementation was so quick to be implemented) and Rust has a more functional style, which also tends to avoid various bugs.
And then, yeah, kind of similar thing for C/C++.
If we just score the complexity and then compare numbers, I can see how you might arrive at the halfway mark. (Not knowing terribly much about C++ either, having so many legacy concepts feels incredibly daunting, so I'd put Rust rather at a third of the complexity points, but either is fair.)
But yeah, on the other hand, I would not say that Rust is as if C and C++ had a baby with some footguns removed.
C is a hardcore procedural language (aside from the ternary operator). I have to assume that C++ introduced some functional concepts at some point in its history, but Rust is much more oriented in that direction as a whole.
I also believe your description of the borrow checker simply preventing RAII mistakes is a bit too simple. As I already mentioned, Rust also prevents you from doing dumb things across threads.
It does so by having the borrow checker verify that you only have one mutable reference at a time ("mutable reference" meaning the holder can modify the value behind the pointer). It also prevents having non-mutable references while a mutable reference is being passed around. If you actually need mutable access across threads, it forces you to use a mutex or similar.
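A minimal sketch of that rule, using only std (the counter, thread count, and function name are made up for illustration): the spawned threads only get to mutate the shared value through Arc<Mutex<_>>, which hands out one mutable access at a time.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Increments a shared counter from `n_threads` threads and returns the final value.
fn parallel_count(n_threads: usize, per_thread: usize) -> usize {
    // Arc provides shared ownership across threads; Mutex grants one mutable access at a time.
    let counter = Arc::new(Mutex::new(0usize));
    let mut handles = Vec::new();

    for _ in 0..n_threads {
        let counter = Arc::clone(&counter);
        handles.push(thread::spawn(move || {
            for _ in 0..per_thread {
                // lock() returns a guard that dereferences to &mut usize.
                *counter.lock().unwrap() += 1;
            }
        }));
    }

    for h in handles {
        h.join().unwrap();
    }

    // All threads are joined, so this read sees every increment.
    let total = *counter.lock().unwrap();
    total
}

fn main() {
    // Handing the same plain `&mut usize` to several spawned threads would be
    // rejected at compile time; the Mutex is what makes the shared mutation legal.
    println!("{}", parallel_count(4, 1000));
}
```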
And yeah, the borrow checker being such an integral aspect, I'd also argue that it has other effects. In particular, it really pushes you to make your program tree-shaped: data is initialized at the top of a (sub-)tree and then only temporarily passed down into functions as references.
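A small sketch of that tree shape, with a hypothetical Config type: the owner sits at the top, and functions further down only borrow it for the duration of a call, shared (&) for reading and exclusive (&mut) for writing.

```rust
/// Hypothetical configuration owned at the top of the call tree.
struct Config {
    name: String,
    retries: u32,
}

/// Only reads the data, so a shared reference is enough.
fn describe(cfg: &Config) -> String {
    format!("{} (retries: {})", cfg.name, cfg.retries)
}

/// Modifies the data, so it borrows it mutably for the duration of the call.
fn bump_retries(cfg: &mut Config) {
    cfg.retries += 1;
}

fn main() {
    // The root of the (sub-)tree owns the data...
    let mut cfg = Config { name: String::from("demo"), retries: 2 };

    // ...and lends it out temporarily; each borrow ends when the call returns.
    bump_retries(&mut cfg);
    println!("{}", describe(&cfg));
}
```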
IMHO that's generally a good idea for making programs understandable, but it's a wild departure from the object-oriented world, for example, where everyone and everything just holds references to each other. (You can also do such references in Rust, if you need it, via Rc and Arc, but it's not the first tool you reach for.)
Rust can do that too with const generics, no?
I don't know much about Ada though, but I hear it rocks. Rust has a lot stronger community though, and that carries a lot of weight.
Hmm, I haven't really played around with const generics much, but I guess you could maybe implement a custom InRange type, so you could use it like this:
That ::new() function would contain the assertions and could be const, but I don't know if that actually makes it execute at compile-time when called in normal runtime code. Might be worth trying to implement it, just to see how it behaves.
Was that an open question or did you have a solution in mind? 😅
What would definitely work in Rust, though, is to implement a macro which checks the constraint and generates a compile_error!() when it's wrong. Typically, you'd use a (function-like) proc_macro for this, but in this case, you could even have a macro_rules! macro with 24 success cases and then a catch-all error case.
Well, and of course, it may also be fine (or even necessary) to check such numbers at runtime. For that, just a wrapper type with a ::new() function would work.
More open. I saw it land in stable some time back and haven't gotten around to playing with it. I honestly haven't done it much, because usually enums are plenty when there is a finite set of options.
And yeah, I was thinking of runtime checks with const bounds, like this:
I'm not sure how magic Ada gets with things, so maybe it's a lot nicer there, but I honestly can't see how it could really improve on handling runtime checks.
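One way the const-generic version could look, as a sketch rather than a settled design (InRange, Hour, new and new_or_panic are hypothetical names). On the compile-time question raised earlier: calling a const fn from ordinary runtime code does not force compile-time evaluation; the check only turns into a compile error when the result is required in a const context, such as a const item.

```rust
/// Hypothetical range-checked integer; the bounds are const generic parameters.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct InRange<const MIN: i64, const MAX: i64>(i64);

impl<const MIN: i64, const MAX: i64> InRange<MIN, MAX> {
    /// Runtime-checked constructor: None when `value` is outside MIN..=MAX.
    pub const fn new(value: i64) -> Option<Self> {
        if value >= MIN && value <= MAX {
            Some(Self(value))
        } else {
            None
        }
    }

    /// Panicking constructor; in a `const` context the panic becomes a compile error.
    pub const fn new_or_panic(value: i64) -> Self {
        assert!(value >= MIN && value <= MAX, "value out of range");
        Self(value)
    }

    pub const fn get(self) -> i64 {
        self.0
    }
}

/// An hour of the day, like the Ada example (0 to 23).
pub type Hour = InRange<0, 23>;

// Evaluated at compile time because it is a `const` item;
// `const BAD: Hour = Hour::new_or_panic(24);` would fail to compile.
const NOON: Hour = Hour::new_or_panic(12);

fn main() {
    // A plain call in runtime code is an ordinary runtime check.
    let parsed: i64 = "7".parse().unwrap();
    println!("{:?}", Hour::new(parsed));
    println!("{:?}", Hour::new(24));
    println!("{}", NOON.get());
}
```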
Well, I don't know much about Ada, but it's typically lauded for all its compile-time checks. Obviously, you can't compile-time check something when it's loaded at runtime from e.g. a configuration file, but yeah, I'm guessing that's probably where it shines, that it uses compile-time checks when possible.
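The macro_rules! idea from earlier in the thread (24 success cases plus a catch-all compile_error!) could look roughly like this. The Hour type and hour! macro are hypothetical, and the check only applies to plain integer literals written in the source, not to values computed or loaded at runtime.

```rust
/// Hypothetical newtype for an hour of the day.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Hour(u8);

impl Hour {
    pub fn get(self) -> u8 {
        self.0
    }
}

macro_rules! hour {
    // One success case per valid literal; each expands to nothing.
    (@check 0) => {}; (@check 1) => {}; (@check 2) => {}; (@check 3) => {};
    (@check 4) => {}; (@check 5) => {}; (@check 6) => {}; (@check 7) => {};
    (@check 8) => {}; (@check 9) => {}; (@check 10) => {}; (@check 11) => {};
    (@check 12) => {}; (@check 13) => {}; (@check 14) => {}; (@check 15) => {};
    (@check 16) => {}; (@check 17) => {}; (@check 18) => {}; (@check 19) => {};
    (@check 20) => {}; (@check 21) => {}; (@check 22) => {}; (@check 23) => {};
    // Catch-all: any other token is rejected at compile time.
    (@check $other:tt) => {
        compile_error!("hour!() expects an integer literal from 0 to 23");
    };
    // Public entry point: check the token, then build the value.
    ($h:tt) => {{
        hour!(@check $h);
        Hour($h)
    }};
}

fn main() {
    let h = hour!(7);
    println!("{}", h.get());
    // let bad = hour!(24); // compile error from the catch-all arm
}
```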
I would really like the mythical higher kinded types (which I think covers what Ada does here), but unfortunately we don't have that yet.
I had thought the C and Ada zlibs were single threaded. I do some big compression tasks sometimes but haven't felt the need for a multi-threaded zlib since I just use parallel processes to compress lots of files.
For an example of Ada safety, integer arithmetic is overflow-checked by default: the program raises Constraint_Error on overflow. Rust checks for overflow in debug builds and wraps around (modular arithmetic instead of standard arithmetic) in release builds. Ada also has design by contract (DBC) with static checking using SPARK, and Ada has a much more serious package and module system (that area is under development for Rust, though). As another example, Ada has a very rigorous specification (the ARM, or Ada Reference Manual), while Rust is something of a moving target. That again helps verify Ada programs with formal methods.
Rust doesn't currently have exceptions, so you have to check error codes pervasively through your program, and that sounds easy to mess up. I don't know whether Rust's designers think of this as a shortcoming (fixable later) or a feature.
I do get the impression that Rust lets you write some things easily that are difficult or impractical in Ada. I don't know how well Ada handles shared memory concurrency. It has language support for multitasking with tasks communicating through mailboxes, more or less.
I'll defer to you about the description of the borrow checker. But, I doubt it's idiomatic to use standard functional programming techniques in Rust, e.g. shared immutable tree structures for lookups. That usually relies on garbage collection. As you say, Rc and Arc are there in Rust, but as we saw with decades of GIL anguish from Python, it's imho preferable to do GC for real if that is what you want.
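For comparison, a sketch of a shared immutable structure in Rust: a persistent linked list whose tails are shared through Rc (the List type and helper functions are illustrative). Rc is reference counting rather than a tracing GC, so reference cycles would leak instead of being collected.

```rust
use std::rc::Rc;

/// A persistent (immutable, structurally shared) singly linked list.
enum List {
    Nil,
    Cons(i32, Rc<List>),
}

use List::{Cons, Nil};

/// Returns a new list with `value` in front; the existing tail is shared, not copied.
fn prepend(value: i32, tail: &Rc<List>) -> Rc<List> {
    Rc::new(Cons(value, Rc::clone(tail)))
}

/// Sums the elements by walking the shared structure.
fn sum(list: &List) -> i32 {
    match list {
        Nil => 0,
        Cons(v, rest) => v + sum(rest),
    }
}

fn main() {
    let base = Rc::new(Cons(2, Rc::new(Cons(3, Rc::new(Nil)))));
    // Two "versions" built on the same shared tail.
    let a = prepend(1, &base);
    let b = prepend(10, &base);
    println!("{} {} {}", sum(&a), sum(&b), Rc::strong_count(&base));
}
```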
Disclaimer: I haven't actually coded anything in Rust so far. I finally got around to reading a book about it recently and I mostly liked what I saw, and it seemed to me to be mostly familiar and reasonably comfortable. I had somehow expected the type system to be much more complicated.
Well, the others already responded to some of your points, I'll try to answer the rest.
Well, I don't know what the Ada system is like, but I will say that Rust has one of the nicest module systems, in my opinion. "Serious" isn't necessarily the adjective I would choose for it, but it works well despite being fairly simple and what I love in particular, is that you can start a codebase small and grow it larger and larger without breakage of module paths.
You do need to build a midsized codebase to really experience that, but basically you can go from a file to a folder to a folder with lots of subfolders without ever changing the imports, even when you move the actual type definition to be further down the tree.
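A single-file sketch of the "move the definition further down the tree" part, with hypothetical module names: one common way to keep the import path stable is a pub use re-export. Turning shapes.rs into a shapes/ directory likewise leaves the module path unchanged.

```rust
// Inline modules stand in for files here; `mod shapes` could live in
// shapes.rs or shapes/mod.rs without its path changing.
mod shapes {
    // The definition was moved down into a submodule...
    mod circle {
        pub struct Circle {
            pub radius: f64,
        }

        impl Circle {
            pub fn area(&self) -> f64 {
                std::f64::consts::PI * self.radius * self.radius
            }
        }
    }

    // ...but the re-export keeps the public path `shapes::Circle` unchanged.
    pub use self::circle::Circle;
}

// Callers keep the import they had before the move.
use shapes::Circle;

fn main() {
    let c = Circle { radius: 2.0 };
    println!("{:.3}", c.area());
}
```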
As the others already said, it's a feature. It comes from the functional world (putting data flow and control flow on the same path) and yeah, I find if you want to do solid error handling, it's really good at forcing you to do it.
If you don't want to do solid error handling (e.g. because you're just writing a script or the startup logic of an application), you can get behavior very similar to exceptions by using anyhow for error handling.
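Since anyhow is a third-party crate, this sketch sticks to std's Box<dyn Error>, which allows a similar exception-like style; with anyhow you would typically return anyhow::Result<T> instead. The OutOfRange error and parse_hour function are made up for illustration.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical error for a value that parsed fine but is out of range.
#[derive(Debug)]
struct OutOfRange(u32);

impl fmt::Display for OutOfRange {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{} is not a valid hour", self.0)
    }
}

impl Error for OutOfRange {}

/// `?` propagates either error type to the caller, much like an exception would.
fn parse_hour(input: &str) -> Result<u32, Box<dyn Error>> {
    // A ParseIntError is converted into Box<dyn Error> by `?`.
    let value: u32 = input.trim().parse()?;
    if value > 23 {
        return Err(OutOfRange(value).into());
    }
    Ok(value)
}

fn main() -> Result<(), Box<dyn Error>> {
    println!("{}", parse_hour(" 7 ")?);
    // The error can still be inspected where the caller does care.
    if let Err(e) = parse_hour("42") {
        println!("error: {e}");
    }
    Ok(())
}
```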
Well, the borrow checker also kind of obsoletes relying on immutability for correctness. If you actually want to share that tree between threads, you do need a mutex then, but within the same thread just the ownership and mutability rules prevent you from updating the tree while others might be reading it. Effectively, if the compiler allows you to update the tree, it is safe to do so.
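A minimal single-threaded sketch of that rule, with a Vec<String> standing in for the tree (the helper names are hypothetical); the commented-out line is the kind of mutation the compiler rejects while a shared borrow is still in use.

```rust
/// Mutation needs exclusive access, which the signature makes visible as `&mut`.
fn add_node(tree: &mut Vec<String>, label: &str) {
    tree.push(label.to_string());
}

/// Reading only needs a shared borrow.
fn first_label(tree: &[String]) -> &str {
    &tree[0]
}

fn main() {
    let mut tree = vec![String::from("root")];

    let first = first_label(&tree);
    // add_node(&mut tree, "child"); // error[E0502]: `tree` is still borrowed by `first`
    println!("{first}");

    // `first` is not used after this point, so the shared borrow has ended
    // and the mutation is accepted.
    add_node(&mut tree, "child");
    println!("{}", tree.len());
}
```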
This is IMHO not talked about nearly enough, but Rust effectively makes mutability a viable strategy again, particularly because it also forces you to make mutability explicitly visible at all times. It is somewhat antithetical to functional programming to mutate a variable, because it is a side-effect. But if this side-effect cannot bite you, it's not actually a problem.
In particular, always cloning values is only unproblematic if you're doing purist FP. As soon as you store state and duplicate that state to update it, you might end up with two different states in your application.
Re: errors: Rust's Result is an algebraic data type, namely an enum with two variants (Ok and Err). This means that you cannot use the result without checking it, making it impossible to mess up error handling. Well, you can always panic by calling unwrap(), but then you don't have a program to worry about anymore ;)
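A small sketch of what that looks like in practice, with a hypothetical divide function: the quotient is only reachable through a match (or ?, unwrap and friends) that also accounts for the Err variant.

```rust
// std's Result is, in essence: enum Result<T, E> { Ok(T), Err(E) }

/// Hypothetical division that reports failure instead of panicking.
fn divide(a: i32, b: i32) -> Result<i32, String> {
    // checked_div returns None for b == 0 and for the i32::MIN / -1 overflow.
    match a.checked_div(b) {
        Some(q) => Ok(q),
        None => Err(String::from("division by zero or overflow")),
    }
}

fn describe(a: i32, b: i32) -> String {
    // The match must also cover Err, otherwise it does not compile.
    match divide(a, b) {
        Ok(q) => format!("{a} / {b} = {q}"),
        Err(e) => format!("error: {e}"),
    }
}

fn main() {
    println!("{}", describe(7, 2));
    println!("{}", describe(1, 0));
    // divide(1, 0).unwrap() also compiles, but panics at runtime.
}
```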
You also have to check all the time for flags set by signals. Example: your aerodynamic simulation hits a numeric overflow and raises SIGFPE. How do you handle it?
Rust supports wrapping, saturating, and checked operations, which allows you to precisely define the behavior you want from your math operations, and avoiding ever hitting an (unchecked) overflow.
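For reference, the explicit variants on the primitive integer types look like this (add_variants is just a hypothetical helper bundling them). A plain a + b panics on overflow in debug builds and wraps in release builds, unless overflow-checks is enabled in the release profile.

```rust
/// Returns the four explicit overflow behaviors for `a + b`.
fn add_variants(a: i32, b: i32) -> (Option<i32>, i32, i32, (i32, bool)) {
    (
        a.checked_add(b),     // None on overflow
        a.wrapping_add(b),    // modular (two's complement) result
        a.saturating_add(b),  // clamped to i32::MIN..=i32::MAX
        a.overflowing_add(b), // wrapped result plus an overflow flag
    )
}

fn main() {
    println!("{:?}", add_variants(i32::MAX, 1));
    println!("{:?}", add_variants(2, 3));
}
```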
I saw something where you can wrap a function around an operation to say how to handle overflow, but that seems like a mistake. Modular (wrapping), saturating (sometimes useful), and checked (standard arithmetic within the machine bounds) are all good, but they should be conveyed in the datatype. Particularly, the default integer datatypes (i32, i64) should be checked. Unchecked arithmetic (including wrapping around when the application is written as if the ints were unbounded) is simply unsafe, like unchecked array subscripts.
It's ok if there is an optimization pragma to enable this for performance when necessary. Ada does it the right way, and implementations I know of have such a pragma available for when you want it. Also, while this is a matter of tooling rather than language, Ada currently has better facilities (SPARK) for statically verifying that integer arithmetic in a program doesn't overflow.
I'm not trying to bash Rust or get into a Rust vs Ada war, but am noting the differences that I see.
Wrapping and Saturating are available as data types in std. Checked can't be a (useful) data type as-is because it by definition changes the type of the return value of operations (Option<T> instead of T). But you can trivially add a noisy/signalling wrapper yourself if you wish to (basically doing checked ops and unwrapping all results). An example of something offering a noisy interface is a crate named noisy_float.
Checked arithmetic failing should raise an exception like it does in Ada. What happens if you use an out of range array subscript a[n]? Does that always return an option type? Really, these types of errors are rare enough that it's infeasible to program defensively around the possibility all the time. But they are frequent enough (especially with malicious input) that we've had 50 years of buffer overruns in C, leading to the invention of Rust among other things.
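A sketch of the type-level approach described above: std::num::Wrapping and std::num::Saturating carry the overflow behavior in the type (Saturating needs Rust 1.74 or newer), and Checked is a hypothetical hand-written noisy wrapper that panics on overflow in every build profile, which is about the closest Rust analogue to Ada raising Constraint_Error.

```rust
use std::num::{Saturating, Wrapping};
use std::ops::Add;

/// Hypothetical "noisy" integer: overflow always panics, in debug and release alike.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Checked(i32);

impl Add for Checked {
    type Output = Checked;

    fn add(self, rhs: Checked) -> Checked {
        Checked(self.0.checked_add(rhs.0).expect("integer overflow"))
    }
}

fn main() {
    // The overflow behavior is part of the type, not of each call site.
    let w = Wrapping(250u8) + Wrapping(10u8);
    let s = Saturating(250u8) + Saturating(10u8);
    println!("{} {}", w.0, s.0);

    let c = Checked(40) + Checked(2);
    println!("{:?}", c);
    // Checked(i32::MAX) + Checked(1) would panic with "integer overflow".
}
```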
Wrapping and saturating are for special purposes: wrapping for when you're explicitly dealing with computer words (as in bit operations or cryptography) and saturating in some media applications and the like. It's amusing that C in a certain sense is more correct than Rust or Java in this way. Signed arithmetic overflow in C is UB, so the compiler is at least permitted to always check the arithmetic and signal on overflow (use -ftrapv for this). C doesn't have a way to check unsigned overflow. Things were muddled in the 1970s when C was designed ;).
I think it would be an improvement to Rust to fix its arithmetic to work like Ada's.
It never returns an option type. This Index interface happens to be actually noisy as implemented for some std types, although you can implement it however you like for your own data types (including ones just wrapping the std ones). And we have checked access (example) and unchecked access (example) as methods.
It's actually astonishing the lengths you're taking to NOT learn anything, to the point of just imagining things about Rust that are supposedly done wrong compared to Ada.
I think that you would be surprised by the amount you would learn if you spent five minutes actually trying to answer your own questions, instead of treating them as proof that you just made a relevant point merely by asking them.
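To make the indexing behavior discussed above concrete, a short sketch with hypothetical helper names: a[n] panics when out of bounds, get returns an Option, and get_unchecked skips the bounds check inside an unsafe block.

```rust
/// Index syntax: panics when out of bounds, never returns an Option.
fn third_by_index(v: &[i32]) -> i32 {
    v[2]
}

/// Checked access: None instead of a panic.
fn third_checked(v: &[i32]) -> Option<i32> {
    v.get(2).copied()
}

/// Unchecked access: no bounds check, so the caller must guarantee validity.
fn first_unchecked(v: &[i32]) -> Option<i32> {
    if v.is_empty() {
        return None;
    }
    // SAFETY: index 0 is in bounds because the slice is non-empty.
    Some(unsafe { *v.get_unchecked(0) })
}

fn main() {
    let short = [10, 20];
    println!("{:?}", third_checked(&short));
    println!("{:?}", first_unchecked(&short));
    // third_by_index(&short) would panic with an index-out-of-bounds error.
}
```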
I am struggling to understand why you are getting downvoted.
I don't downvote people, but since you asked.
Who asked?
There is no one C version. The version being referred to is the original zlib, which happens to be the worst implementation of four possible zlib back-ends available in the flate2 crate. Besides the original zlib and zlib-rs, there are zlib-ng and cloudflare_zlib, both of which are also (still) implemented in C.
So being comparable to the original zlib is hardly something to shout about. In fact, individual hobbyists have been beating that implementation just for fun for many years.
That's a lot of inaccurate waffling that could have been entirely written by an LLM, except it's probably too wrong for it to have been done so.
I appreciate the reply