In languages with static and convenient type systems, I try to instead encode units as types. With clever C++ templating, you can even get implicit conversions (e.g. second -> hour) and compound types (e.g. meter and second types also generate m/s, m/s^2 and so on).
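A minimal sketch of the same idea in Go, which has no C++-style templates: the compound unit has to be declared by hand instead of being generated. The type names Meters, Seconds and MetersPerSecond and the Speed function are made up for illustration.

```go
package main

import "fmt"

// Hypothetical unit types. Each is a distinct defined type, so the compiler
// rejects mixing them without an explicit conversion.
type Meters float64
type Seconds float64
type MetersPerSecond float64

// Speed fixes the resulting unit in its signature: distance over time.
func Speed(d Meters, t Seconds) MetersPerSecond {
	return MetersPerSecond(float64(d) / float64(t))
}

func main() {
	d := Meters(100)
	t := Seconds(8)
	fmt.Println(Speed(d, t)) // 12.5

	// Speed(t, d) // does not compile: arguments are swapped
	// _ = d + t   // does not compile: mismatched types Meters and Seconds
}
```

The conversions are explicit here, unlike the implicit ones the C++ approach can offer, but swapping arguments or adding a distance to a time is still caught at compile time.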
IIRC F# even has built-in support for units.
A good example is Go's time package. You'd normally express durations like 5 * time.Second and the result is a time.Duration. Under the hood, it's just an int64 count of nanoseconds, but you'd never use it as a plain number. You'd instead call something like d.Seconds() to get whichever unit you desire.
I prefer to encode quantities as types (and store value with most precision inside) and provide functions that return it in desired unit as int/long/whatever.
E.g. Duration type that stores nanoseconds and has to_seconds(), to_milliseconds() etc. It just feels more natural to me. Why should some function care which units are used? It just needs "duration" and will convert to desired unit internally (also it won't be part of its api which is good because it's unnecessary restriction).
Of course some C++ devs will disdain this approach because it's inefficient to pass the highest-precision value around when it's not needed, but for my use cases it doesn't matter.
Also, you should always use standard types when available. E.g. C++ has std::chrono, in the Java world there are the java.time types, and Kotlin has kotlin.time.Duration.
I like how in Nim you can create a type and then overload the default operators to support custom types (operators are just functions with 2 arguments in Nim). For example, you can do Hours + Seconds or kilometers * miles, etc. It feels very organic and not like a "hack".
Those are just types. You shouldn't write types in the names. It's called Hungarian Notation, but it's just redundant. If you need to check the type of a variable, hover over it and your IDE should tell you that temperatureThreshold is type DegreesCelsius. No need to add extra cruft. There's also a question of how specific everything needs to be.
It's also especially problematic if you later refactor things. If you change units, then you have to rename every variable.
Plus, variables shouldn't really be tied to a specific unit. If you need to display in Fahrenheit, you ideally just pass temperatureThreshold and it converts types as needed. A Temperature type that has degreesF() and degreesC() functions is even cleaner. Units should just be private to the type's struct.
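Roughly what I mean, sketched in Go. The Temperature type, its constructors and the choice to store Celsius internally are all assumptions made for the example; callers never see the stored unit.

```go
package main

import "fmt"

// Temperature is a hypothetical type. The stored unit is private, so callers
// cannot depend on it.
type Temperature struct {
	celsius float64
}

// Constructors name the unit of the input explicitly.
func FromCelsius(c float64) Temperature    { return Temperature{celsius: c} }
func FromFahrenheit(f float64) Temperature { return Temperature{celsius: (f - 32) * 5 / 9} }

// Accessors name the unit of the output explicitly.
func (t Temperature) DegreesC() float64 { return t.celsius }
func (t Temperature) DegreesF() float64 { return t.celsius*9/5 + 32 }

func main() {
	temperatureThreshold := FromCelsius(100)
	fmt.Printf("%.1f\n", temperatureThreshold.DegreesF()) // 212.0
	fmt.Printf("%.1f\n", FromFahrenheit(32).DegreesC())   // 0.0
}
```

If the internal representation later changes to kelvin, no variable or caller has to be renamed.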
I absolutely agree. But:
- sometimes you need to modify existing code and you can't add the types necessary without a giant refactoring
- you can't express units with types in:
- JSON/YAML object keys
- XML tag or attribute names
- environment variable names
- CLI switch names
- database column names
- HTTP query parameters
- programming languages without a strong type system
Obviously as a Hungarian I have a soft spot for Hungarian notation :) But in these cases I think it's warranted.
Not sure what languages you commonly work with, but in good modern languages you can simply declare "feet" as an alias of integer (or double?), and no refactoring would be required.
And any good toolchain to parse / generate JSON/etc can absolutely get the types right.
There are plenty of times where the type is just something generic like an integer and making a wrapper type is not worth the effort and this is a useful approach.
fileSizeLimitGB
Surely it's fileSizeLimitGiB /s
That seems akin to commenting. The problem with this approach is that text is not code. It's very easy to forget to change text. In that case it becomes the worst of both worlds, you have a variable name that actually misleads you.
Much safer than this is to encode this kind of information into the code itself, in such a way that the program won't compile if the types are incorrect.
I understand what you mean, and I even agree with it, but just to be a little pedantic, variable names are code, or at least they are more code than comments or docs.
But yes, encoding units into the type system is a much better solution. It doesn't work however for config options, environment variables or CLI switches.
It does work - you have a function somewhere that converts your environment variable to the correct type, possibly with a default value, throwing an exception if it's an invalid value (maybe cache size has a minimum of 100MB or the software will be unusable), and has extensive unit tests for the function.
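A sketch of such a function in Go, which returns an error instead of throwing an exception. The variable name CACHE_SIZE_MB, the Megabytes type and the 512 MB default are assumptions for the example; the 100 MB minimum is the one mentioned above.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// Megabytes is a hypothetical unit type for the parsed setting.
type Megabytes int64

const (
	defaultCacheSize Megabytes = 512 // assumed default
	minCacheSize     Megabytes = 100 // below this the software is unusable
)

// parseCacheSize converts the raw setting into a typed value. An empty
// string means the variable was not set, so the default applies.
func parseCacheSize(raw string) (Megabytes, error) {
	if raw == "" {
		return defaultCacheSize, nil
	}
	n, err := strconv.ParseInt(raw, 10, 64)
	if err != nil {
		return 0, fmt.Errorf("cache size %q is not an integer: %w", raw, err)
	}
	if Megabytes(n) < minCacheSize {
		return 0, fmt.Errorf("cache size %d MB is below the minimum of %d MB", n, minCacheSize)
	}
	return Megabytes(n), nil
}

func main() {
	size, err := parseCacheSize(os.Getenv("CACHE_SIZE_MB"))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("cache size in MB:", size)
}
```

Because parseCacheSize takes a plain string, the unit tests don't need to touch the real environment.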
It's so annoying when you have to figure out what unit a variable is describing :(
That constant frustration is what made me write this post :)
Looks like Hungarian Notation
Related: Making Wrong Code Look Wrong
TL;DR: there is good and bad Hungarian notation. Encoding types (like string or int) in variable names is bad. Encoding information that cannot be expressed in the type system is good. (Though with the development of type systems, more and more of those concepts can be moved into the types, keeping variable names clean.)
But as a Hungarian, I'm obviously a little biased :)
The better fix is to try to use types that represent those units or data types (e.g. duration instead of ms). Makes it harder to accidentally use the wrong units and documents the code / intent better.
But what if the FileSize can be "1G", "1024M", "518K", etc.?
Documentation itself is much more important and modern IDEs and Editors will show you what to type in :)
In that case I would call the variable fileSizeWithUnit and also document what the possible units are. I wouldn't say that documentation is categorically more important than good naming. Both are different aspects of good software development.
Then use bytes
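That is, accept the suffixed string at the boundary and store plain bytes from there on. A rough Go sketch, assuming K, M and G are binary multiples (1024-based), which the question above doesn't specify; the Bytes type and parseSize are invented names.

```go
package main

import (
	"fmt"
	"strconv"
)

// Bytes is a hypothetical type holding a size in bytes.
type Bytes int64

// parseSize turns strings like "1G", "1024M" or "518K" into Bytes.
// Assumption: suffixes are binary multiples; no suffix means bytes.
func parseSize(s string) (Bytes, error) {
	if s == "" {
		return 0, fmt.Errorf("empty size")
	}
	digits, mult := s, Bytes(1)
	switch s[len(s)-1] {
	case 'K':
		digits, mult = s[:len(s)-1], 1<<10
	case 'M':
		digits, mult = s[:len(s)-1], 1<<20
	case 'G':
		digits, mult = s[:len(s)-1], 1<<30
	}
	n, err := strconv.ParseInt(digits, 10, 64)
	if err != nil || n < 0 {
		return 0, fmt.Errorf("invalid size %q", s)
	}
	return Bytes(n) * mult, nil
}

func main() {
	for _, in := range []string{"1G", "1024M", "518K", "2048"} {
		size, err := parseSize(in)
		if err != nil {
			panic(err)
		}
		fmt.Println(in, "=", int64(size), "bytes")
	}
}
```

Under that assumption "1G" and "1024M" both come out as 1073741824 bytes, so the rest of the program never has to ask which unit it was given.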
Typing, yay.
But command-line flags are written as strings. And that's one place this problem crops up: in the stringly-typed interface between your program and its operators, courtesy of argv and your favorite flags library.
Consider:
crapserver --cache-size=1024
Did I just give my crapserver a cache of 1024 B, or kiB, or MiB? Or maybe I told it to reserve space for 1024 cache entries? (How big is a cache entry?)
It's pretty easy to just call the flag --cache-size-bytes or --cache-max-entries so the operator can easily see what's going on.