this post was submitted on 11 Dec 2024
Advent Of Code
Learning to profile code gives you a chance to learn what makes code inefficient! I definitely like to spend some time looking at it, but at the end of the day I really need to learn more languages. For now, I am sticking with trusty Python. The image below is in microseconds.
screenshots
If the Python process was kept alive, then we are only saving 25 milliseconds, from ~250 to ~235! However, in real-world testing, the profiler does not seem to be a reliable enough benchmark, likely because it adds some overhead to each line of code.
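To make the overhead point concrete, here is a minimal sketch that runs the same workload once plainly and once under a profiler. It assumes the standard library's cProfile, which hooks function calls rather than individual lines; the profiler actually used for the screenshots is not named, and `work`, `step`, `timed` and `timed_profiled` are hypothetical names made up for this illustration.

```python
import cProfile
import time


def step(i):
    """A deliberately cheap function, so call overhead dominates."""
    return i % 7


def work(n=200_000):
    """Hypothetical stand-in workload: many small function calls."""
    total = 0
    for i in range(n):
        total += step(i)
    return total


def timed(fn):
    """Run fn() with no profiler attached and return (result, seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


def timed_profiled(fn):
    """Run fn() under cProfile and return (result, seconds)."""
    profiler = cProfile.Profile()
    start = time.perf_counter()
    result = profiler.runcall(fn)  # runcall returns whatever fn returns
    return result, time.perf_counter() - start


plain_result, plain_s = timed(work)
prof_result, prof_s = timed_profiled(work)
print(f"plain:    {plain_s * 1e3:.1f} ms")
print(f"profiled: {prof_s * 1e3:.1f} ms")
```

The gap between the two numbers is the profiler's own cost, so profiler output is probably better for finding which part is slow than for deciding how fast the whole thing is.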
Notice what happens in my solution if I add this line of code:
Notice that I load a transform_cache from a previous run from file. However, because of the "if stone in transform_cache" check, the loaded transform_cache is for some reason slower than letting it fill up again. Loading it and then clearing it is faster, though, probably because the CPU/RAM/OS are doing their own caching too. If we remove the "if stone in transform_cache" check and keep the transform_cache fully loaded, then it is faster by ~1 millisecond, down to 29 milliseconds! These are the niche problems with caching things and squeezing all the performance out of code.
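For anyone following along without the original code, here is a rough sketch of what a transform_cache with that membership check could look like for the Day 11 stone rules. Only the name transform_cache and the "if stone in transform_cache" check come from the comment; `transform`, `count_stones` and the Counter-based counting are assumptions, not the actual solution.

```python
from collections import Counter

# stone value -> tuple of stones it becomes after one blink
# (the name is borrowed from the comment; the layout is an assumption)
transform_cache = {}


def transform(stone):
    """Apply the Day 11 rules to one stone, memoising in transform_cache."""
    if stone in transform_cache:  # the membership check discussed above
        return transform_cache[stone]
    if stone == 0:
        result = (1,)
    else:
        digits = str(stone)
        if len(digits) % 2 == 0:
            # even number of digits: split into left and right halves
            half = len(digits) // 2
            result = (int(digits[:half]), int(digits[half:]))
        else:
            result = (stone * 2024,)
    transform_cache[stone] = result
    return result


def count_stones(stones, blinks):
    """Count stones after `blinks` blinks, tracking how many of each value exist."""
    counts = Counter(stones)
    for _ in range(blinks):
        next_counts = Counter()
        for stone, n in counts.items():
            for new_stone in transform(stone):
                next_counts[new_stone] += n
        counts = next_counts
    return sum(counts.values())


# "125 17" is the example input from the puzzle text
print(count_stones([125, 17], 25))  # 55312, as stated in the puzzle
```

With a cache that is guaranteed to be fully populated, the check could be replaced by a direct `transform_cache[stone]` lookup, which is roughly the variant described as ~1 millisecond faster.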
Yeah, I have been using these slow challenges to improve my profiling ability. It is a bit of a dark art though, especially with compiled languages.
My slowest part seems to be the hashmap, but there isn't much I can do about that, I think.
Also, if I do a release run I can get 10ms, but that feels like cheating :D
Hey, that is what release mode is for! Optimizing by unrolling and other tricks is needed, but in your case, I think I remember someone mentioning that their Rust release build is closer to 2 ms.
I did try something like PyPy3, but it seems to be slower by ~3 milliseconds! So I don't know where I could improve my code any further without unrolling the range(75) loop, though that would only bring it closer to ~29 ms on average.
Edit: apparently they updated their code to be closer to 250 microseconds. That is blazing fast [ link ]
Using release to beat Python code is just a very hollow victory :D
It does somewhat depend on how we measure as well: you are benching the algorithm itself, and I'm measuring the entire execution time.
You are right. I don't think loading the file from disk should be part of it, because OS process priority/queueing, disk, cache and more make it a headache to figure out what is slow. If you want to compare the entire execution, including Python startup overhead, reading from file and anything extra, it is closer to 50 to 60 ms on Linux and 80-90 ms on Windows (both hosts, not virtual machines).
My reasoning is that loading the input will eventually pull from either the website or disk. That is not part of the challenge. You could simply hard-code it.
So maybe you should look into adding code, in debug mode or in both modes, that measures only the solve instead of the entire process including loading.
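In Python terms, measuring the solve separately from the loading could look roughly like the sketch below. The `time_phases` helper and the `load` and `solve` stand-ins are hypothetical; a real solver would go where the placeholder is, and the same idea carries over to other languages.

```python
import time


def time_phases(load, solve):
    """Return (answer, load_seconds, solve_seconds), timing each phase on its own."""
    t0 = time.perf_counter()
    data = load()
    t1 = time.perf_counter()
    answer = solve(data)
    t2 = time.perf_counter()
    return answer, t1 - t0, t2 - t1


def load():
    """Hypothetical loader: hard-coded input instead of a disk or network read."""
    return [int(token) for token in "125 17".split()]


def solve(stones):
    """Placeholder for the real solver; it only sums the input."""
    return sum(stones)


answer, load_s, solve_s = time_phases(load, solve)
print(f"answer={answer} load={load_s * 1e6:.0f} us solve={solve_s * 1e6:.0f} us")
```

Reporting the two numbers side by side keeps startup and I/O noise out of the figure people actually compare.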
However, someone claims their Rust code can do 250 microseconds, so I doubt you have much excuse aside from having "inefficient" code. (You are still fast, just not at the limit of your language's performance.) Measuring only my Python algorithm, it finishes in 32000 microseconds.
https://github.com/maneatingape/advent-of-code-rust/blob/main/src/year2024/day11.rs
However, now that I am looking at their main.rs file, they do measure the completion time after process startup, covering only the algorithm.
Yeah, disk loading definitely shouldn't count if I was timing properly, I'm just lazy and don't want to do proper timing. :D
Most of my slowdown is in the hashmap. It looks like that answer deconstructs the hashmap and builds it from a fastmap and a vec. Not sure I want to go down that road at this stage.
Thanks again for your code and help :)