this post was submitted on 17 Jun 2023

174 points (95.8% liked)

YSK about compression (lemmy.world)

submitted 2 years ago by flint5436 to c/youshouldknow

YSK long noise videos can't effectively be compressed, and as such take quite a lot of storage space and bandwidth. So if you want to keep the hosting costs of a social media platform low, for example when going public, you definitely wouldn't want disenfranchised users to upload them to your platform. This is the kind of noise you would want to avoid: https://stackoverflow.com/questions/15792105/simulating-tv-noise#15795112

[–] [email protected] 52 points 2 years ago* (last edited 2 years ago) (3 children)

YSK long noise videos can't effectively be compressed

From the standpoint of loading down Reddit today, yes. But if we want to talk computer science theory and what compression algorithms one could build, that's not really true.

There are two classes of compression -- lossless compression and lossy compression.

Lossless compression retains an exact copy of the original data. Compress and then decompress and you get back the original.

So, okay. How can you compress data? I mean, if I have a byte of data, eight 1s or 0s, how can I use fewer than eight 1s or 0s to store those? For lossless compression, the answer is that you have to have some knowledge of what information it is that you're storing. If you know that certain patterns of a given length N come up more frequently than others, then you can assign a shorter pattern of length M to represent a common one, and let the less-common patterns take the longer representations. Lossless compression is just the art of reordering representations of data to more closely fit the frequency with which they arise: shorter for things that are relatively more common.
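To make the "shorter codes for more common things" idea concrete, here is a minimal sketch using Huffman coding. That is one classic way to do it, picked here for illustration and not something the argument above depends on. The names `huffman_code`, `encoded_bits` and the sample string are made up for this example.

```python
import heapq
from collections import Counter


def huffman_code(data: bytes) -> dict:
    """Build a prefix code in which frequent byte values get shorter bit strings."""
    counts = Counter(data)
    if len(counts) == 1:
        # Degenerate case: a single distinct symbol still needs one bit.
        return {next(iter(counts)): "0"}
    # Heap entries are (frequency, tie_breaker, {symbol: code_so_far}).
    # The unique tie_breaker keeps Python from ever comparing the dicts.
    heap = [(freq, i, {sym: ""}) for i, (sym, freq) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # two least frequent subtrees
        f2, _, right = heapq.heappop(heap)
        merged = {sym: "0" + code for sym, code in left.items()}
        merged.update({sym: "1" + code for sym, code in right.items()})
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]


def encoded_bits(data: bytes) -> int:
    """Total number of bits needed to store data with its Huffman code."""
    code = huffman_code(data)
    return sum(len(code[b]) for b in data)


sample = b"aaaaaaaabbbc"  # 'a' is common, 'c' is rare
code = huffman_code(sample)
print({chr(sym): bits for sym, bits in code.items()})
print(len(sample) * 8, "bits fixed-width vs", encoded_bits(sample), "bits Huffman")
```

The common `a` ends up with a 1-bit code while `b` and `c` get 2 bits each, so the 12 bytes fit in 16 bits instead of 96 (ignoring the cost of storing the code table itself).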

If you're wrong about that order, then lossless compression can make the representation of your data larger.

Now, technically the noise in there is actually probably very predictable, because it's likely based off a pseudorandom number generator (PRNG). That isn't really "random" -- it's just making numbers that look random from a single number that's hard to predict. If the PRNG isn't having more entropy injected into it over time, then all of the noise generated during the session comes down to that one small number. If you were clever and could figure out the seed -- a small number, often something like 64 bits, and often seeded off something like the Unix time at the time that the random numbers started being generated, which makes it even more predictable than that -- or at least the internal state of the PRNG, maybe 256 bits -- you could basically store the content of the whole video in just a few bytes. However, it's not always easy to determine that original state -- in the case of cryptographically secure PRNGs, it's specifically intended to be impractical.
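A minimal sketch of that "store only the seed" idea, assuming the noise came from an unreseeded, non-cryptographic PRNG and that we already know the seed (which is the hard part in practice). The frame dimensions, the seed value and the `noise_video` helper are all hypothetical; Python's `random.Random` is a Mersenne Twister and is only standing in for whatever generator was really used.

```python
import random

# Hypothetical tiny "video": 10 frames of 64x48 one-byte pixels.
WIDTH, HEIGHT, FRAMES = 64, 48, 10


def noise_video(seed: int) -> bytes:
    """Generate every pixel of the noise video from a single seed."""
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(WIDTH * HEIGHT * FRAMES))


seed = 1687000000                      # e.g. a Unix timestamp used as the seed
original = noise_video(seed)

compressed = seed.to_bytes(8, "big")   # the whole "file" is just the 8-byte seed
restored = noise_video(int.from_bytes(compressed, "big"))

print(len(original), "bytes of noise stored in", len(compressed), "bytes")
print("bit-exact:", restored == original)
```

This is lossless: the output is identical byte for byte. It only works because the decoder knows the generator and the seed.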

However, we generally treat pseudorandom noise as if it were actually truly random, rather than just pseudorandom, which means that it's totally unpredictable, and if that is the case, then you cannot losslessly compress it and make it smaller, not over a sufficient quantity of noise, because you can know nothing about the frequency with which a given pattern arises.
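You can see both effects (unpredictable data not shrinking, and a compressor actually making it slightly larger) with a quick experiment. This is just a sketch using Python's standard `zlib` module as the general-purpose lossless compressor and `os.urandom` as a stand-in for truly random noise; the sizes and the repeated `blue ` test string are arbitrary choices for the demo.

```python
import os
import zlib

SIZE = 100_000

noise = os.urandom(SIZE)               # stand-in for unpredictable noise
structured = b"blue " * (SIZE // 5)    # highly repetitive data of the same size

packed_noise = zlib.compress(noise, 9)
packed_structured = zlib.compress(structured, 9)

print("noise:     ", len(noise), "->", len(packed_noise))
print("structured:", len(structured), "->", len(packed_structured))

# Lossless means decompressing gives back exactly what went in.
assert zlib.decompress(packed_noise) == noise
assert zlib.decompress(packed_structured) == structured
```

The repetitive input collapses to a tiny fraction of its size, while the random input comes out a few bytes larger than it went in, because the compressor's framing overhead has nothing to pay for it.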

So, depending upon the source of noise used, we might be able to do lossless compression of noise, if pseudorandom noise was used (probably) and if we can figure out what that number is that was used to generate that noise.

Okay, enough about lossless compression. Can we do lossy compression of noise?

And there the answer is...yeah, probably yes.

The way lossy compression works is that we have to know something about what information is actually "important" when we get around to actually using it. That lets us throw out some of the less-important information. What we get back, unlike with lossless compression, is not true to the original, but it's a lot closer than if we just threw out information without regard for what's important and what's not. Lossy compression is often used to compress audio and video.

For a lot of things, there's a lot of not-very-important information.

Let's say that we're looking at a video of noise. Your brain doesn't care about every exact pixel there. It's looking for shapes that remain across multiple frames and move together, so it can pick out objects and the like. Your brain just basically sees the noise as one big field of stuff of an approximate color changing at a given rate. None of the specifics of that noise matter. Basically, regardless of what seed was used to generate that noise, pretty much all noise with the given properties (black and white, 1 pixel size, changes every frame, N fps) looks pretty much identical. So a good form of lossy compression in a video codec would be to detect anything that looks like noise -- and then just replace it with generated noise using a fixed seed. All noise looks pretty much identical to a human. So you'd get pretty much identical-looking output. As it happens, most existing video codecs don't have a noise detector like this, though AV1's film grain synthesis works along similar lines: the encoder can strip grain out and have the decoder regenerate similar-looking grain from a few parameters.
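Here's a toy sketch of that detect-and-replace idea. Everything in it is an assumption made for illustration: the "noise detector" is just "zlib can't shrink this frame", which is a crude heuristic and not how any real codec does it, and `FIXED_SEED`, `looks_like_noise`, `encode` and `decode` are made-up names.

```python
import random
import zlib

FIXED_SEED = 0  # hypothetical seed that every decoder agrees on


def looks_like_noise(frame: bytes, threshold: float = 0.9) -> bool:
    """Crude heuristic: treat a frame as noise if zlib barely shrinks it."""
    return len(zlib.compress(frame, 9)) >= threshold * len(frame)


def encode(frame: bytes):
    """Store noise-like frames as just their length, everything else losslessly."""
    if looks_like_noise(frame):
        return ("noise", len(frame))
    return ("raw", zlib.compress(frame, 9))


def decode(token) -> bytes:
    kind, payload = token
    if kind == "noise":
        # Regenerate *different* noise with the same statistics (lossy).
        rng = random.Random(FIXED_SEED)
        return bytes(rng.getrandbits(8) for _ in range(payload))
    return zlib.decompress(payload)


source = random.Random(123)
noisy_frame = bytes(source.getrandbits(8) for _ in range(4096))
flat_frame = bytes(4096)  # an all-black frame

print(encode(noisy_frame))                              # tiny token, no pixel data
print(decode(encode(noisy_frame)) == noisy_frame)       # not the same pixels
print(decode(encode(flat_frame)) == flat_frame)         # non-noise survives intact
```

The decoded noise frame has the right size and the same kind of content, but not the original pixels, which is exactly the trade lossy compression makes.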

So video of noise is lossily-compressible.

Now, I will grant that this is unrelated to putting load on Reddit, but, hey, might as well start filling the Fediverse with useful information now.

[–] [email protected] 15 points 2 years ago

This is a good post. I don't really have much more to add than that but I want to boost you with the interaction. Very well written and informational, and as far as I can tell, accurate.

[–] A_Toasty_Strudel 5 points 2 years ago (1 children)

I started reading and felt my brain start to numb a little. Is there an ELI5?

[–] SameOldJorts 1 points 2 years ago (1 children)

So say you have a picture, and it's made up of pixels, and you want to send that picture to someone, but in order to do so you have to make it smaller. You could send the most important bits and allow reconstruction on the receiver's end, or you could somehow make it smaller without changing the information. So if your picture is four blue pixels, followed by 3 red, and 2 yellow, you could send it just like that ("4 blue, 3 red, 2 yellow") instead of sending the entire string: blue, blue, blue, blue, red, red, red, yellow, yellow. This would be lossless; GIF and PNG are generally lossless formats. JPEG is lossy compression, and it would be like telling your friend receiving the picture "I have a picture of a bird, here's part of a beak, one wing, a tail, and one foot." Your friend, being smart, can reconstruct the data that wasn't sent (other wing and foot, body) because they have a good idea how the rest of the bird should look based on the parts they see. Lossy is better for getting files really small, but lossless is important if all the information needs to reach the receiver. Hope that helps.
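The "4 blue, 3 red, 2 yellow" trick is called run-length encoding. Here is a minimal sketch of it; the function names are made up for this example, and real GIF and PNG files use more elaborate lossless schemes (LZW and DEFLATE) rather than plain run-length encoding.

```python
from itertools import groupby


def rle_encode(pixels):
    """Collapse each run of identical pixels into a (color, count) pair."""
    return [(color, len(list(run))) for color, run in groupby(pixels)]


def rle_decode(pairs):
    """Expand (color, count) pairs back into the original pixel list."""
    return [color for color, count in pairs for _ in range(count)]


pixels = ["blue"] * 4 + ["red"] * 3 + ["yellow"] * 2

encoded = rle_encode(pixels)
print(encoded)                        # [('blue', 4), ('red', 3), ('yellow', 2)]
print(rle_decode(encoded) == pixels)  # True: nothing was lost
```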

[–] A_Toasty_Strudel 1 points 2 years ago

This was actually super informative! Thanks, my dude!

[–] flint5436 4 points 2 years ago

A fellow CS major, I see. No, I appreciate the thoroughness. You are right, I was trying to put it in layman's terms and might have been a little too cursory.

I gotta disagree on the pseudorandomness though. At least on Linux, /dev/random fills its entropy pool from device drivers. So there is no simple algorithm behind it where you can copy a seed. You would have to copy the system state and all external events happening (e.g. the Ethernet network traffic) to generate the same output.
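The difference between the two kinds of source is easy to see from Python. This is only a sketch: `random.Random(42)` stands in for a seeded userspace PRNG, and `os.urandom` asks the operating system for random bytes (on Linux that comes from the kernel's random source). It says nothing about which kind of source any particular noise video was made with.

```python
import os
import random

# Seeded userspace PRNG: same seed, same "random" output every time.
a = random.Random(42)
b = random.Random(42)
seeded_match = a.getrandbits(64) == b.getrandbits(64)

# OS-provided randomness: there is no seed argument to copy.
x = os.urandom(16)
y = os.urandom(16)

print("seeded PRNG reproducible:", seeded_match)
print("os.urandom calls differ: ", x != y)
```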