this post was submitted on 20 Jul 2024

40 points (90.0% liked)

Ask Lemmy

27036 readers

1400 users here now

A Fediverse community for open-ended, thought provoking questions

Please don't post about US Politics. If you need to do this, try [email protected]

Rules: (interactive)

1) Be nice and; have fun

Doxxing, trolling, sealioning, racism, and toxicity are not welcomed in AskLemmy. Remember what your mother said: if you can't say something nice, don't say anything at all. In addition, the site-wide Lemmy.world terms of service also apply here. Please familiarize yourself with them

2) All posts must end with a '?'

This is sort of like Jeopardy. Please phrase all post titles in the form of a proper question ending with ?

3) No spam

Please do not flood the community with nonsense. Actual suspected spammers will be banned on site. No astroturfing.

4) NSFW is okay, within reason

Just remember to tag posts with either a content warning or a [NSFW] tag. Overtly sexual posts are not allowed, please direct them to either [email protected] or [email protected]. NSFW comments should be restricted to posts tagged [NSFW].

5) This is not a support community.

It is not a place for 'how do I?', type questions. If you have any questions regarding the site itself or would like to report a community, please direct them to Lemmy.world Support or email [email protected]. For other questions check our partnered communities list, or use the search function.

Reminder: The terms of service apply here too.

Partnered Communities:

Logo design credit goes to: tubbadu

founded 1 year ago

MODERATORS

How can everything be captured in simple data like people's exact voice noise? (self.asklemmy)

submitted 4 months ago* (last edited 4 months ago) by cheese_greater to c/asklemmy

21 comments fedilink hide all child comments

If a recording of someones very rare voice is representable by mp4 or whatever, could monkeys typing out code randomly exactly reproduce their exact timbre+tone+overall sound?

I don't get how we can get rocks to think + exactly transcribe reality in the ways they do!

Edit: I don't get how audio can be fossilized/reified into plaintext

you are viewing a single comment's thread
view the rest of the comments

[–] [email protected] 20 points 4 months ago (4 children)

Yes, monkeys could type out the zeros and ones. In fact we (not the monkeys) kind of did. There is a library of babel for audio named the sound library of babel which contains every 15 seconds audio recording you can imagine. Every single one. Almost all of them are white noise, but still there are recordings of every human saying any words in 15 seconds.

[–] Nibodhika 3 points 4 months ago (2 children)

I call bullshit on that. Every second there are 44100 samples of 8 bit, so every second of sound is 44100 bytes, or 44kB. Even 1 second of audio is impossible to generate all possibilities.

To put this in perspective, there's something called Universally Unique Identifier (UUID for short), one of them is 128 bits, or 16 bytes. Let's imagine these were 1 bit long, on the second attempt at generating an id you would have a 50% chance of generating a repeated one, which means that by the third one you generate the chances that you have already generated a repeated id are 50%; If we extend this to 1 byte (i.e. 256 possibilities) the second time you have 1/256 chance of generating a repeated one, the second time 1/255, so on, and so forth. So from the third one on your chances of having already generated a duplicated id are 1/256 + 1/255 + 1/254 + ... This means that by the 103th id you generate you have a 50% chance to have already generated a repeated one; why did I do those examples? Because a UUID has 16 bytes, this means that if you generated a billion UUID per second, it would take you 100 years to have a 50% chance of having generated a repeated one, and by that time you would need 43 ZB of storage (that's not a typo, it's Zettabytes as in 1024 EB (which is also not a typo, that's Exabytes which is 1024 PB (which is also not a typo, that's Petabytes which is 1024 TB, or Terabytes which is the first measure people are likely to be familiar with))).

Let me again try to put this in perspective, if Google, Amazon, Microsoft and Facebook emptied all of their storage just for this, they wouldhave around 2 Exabytes, so you would need a company 4300x larger than that conclomerate to have enough space to store the amount of unique ids that would be generated from a 16 byte random data (until you have a 50% chance of generating a repeated one).

Another way of thinking about this is that to store all of the possible combinations of 1 bit you need 2 bits of space, for 2 bits is 4, for 3 bits is 8, it goes on exponentially, so that for n bits is 2^n. For the UUID that is 3.4E38, or 3.5E13 YB (again, not a typo, that's 1024 Zettabytes), i.e 35000000000000 YB (I could go up a few more orders of magnitude, but I think I made my point). And this is for 128 bits, every bit doubles that amount.

So again, I call bullshit that they have all possible sounds for even 1 second which is almost 3x that amount.

[–] maengooen 7 points 4 months ago (1 children)

I appreciate the interest in doing all the math, and I am also not specifically familiar with audio or the audio library, but I believe you could use a similar argument against the OG library of babel, and I happen to know(confidently believe?) that they don't actually have a stored copy of every individual text file "in the library", rather each page is algorithmically generated and they have proven that the algorithm will generate every possible text.

I'd wager it's the same thing here, they have just written the code to generate a random audio file from a unique input, and proven that for all possible audio files (within some defined constraints, like exactly 15 seconds long), there exists an input to the algorithm which will produce said audio file.

Determining whether or not an algorithm with infrastructure backing it counts as a library is an exercise left to the reader, I suppose.

[–] Nibodhika 0 points 4 months ago

The claim was it "contains every 15 seconds audio recording you can imagine. Every single one.". Which is bullshit, that's like saying this program contains every single literally work:

import sys

print(sys.argv[1])

It's just adding a layer of encoding on top so it feels less bullshity, something like:

def decode(number: int):
  out = ""
  while number:
    number, letter_index = divmod(number, len(string.printable))
    out += string.printable[letter_index]
  return out

That also does not contain every possible (ASCII) book, it can decode any number into a text, and some numbers happen to contain texts that are readable.

load more comments (1 replies)