this post was submitted on 16 Dec 2024
66 points (95.8% liked)

Ask Lemmy


Thanks Hank Green.

[–] nycki 13 points 1 day ago (2 children)

Almost all web traffic now uses the utf-8 encoding, a clever hack which works because ascii is a seven-bit code but web traffic uses 8-bit bytes.

  • If the first bit is 0, treat the byte as ascii.
  • If the first bit is 1, treat the byte as part of a multi-byte unicode character.
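To make the two rules above concrete, here is a minimal Python sketch. The helper name `classify_bytes` and the sample string are made up for illustration, and this only looks at the top bit; it is not a full utf-8 decoder.

```python
def classify_bytes(data: bytes) -> list[str]:
    """Label each byte by its top bit: 0 means ascii, 1 means part of a multi-byte character."""
    return ["multibyte" if b & 0x80 else "ascii" for b in data]


# "n" is plain ascii; "é" (U+00E9) takes two bytes in utf-8.
sample = "né".encode("utf-8")
labels = classify_bytes(sample)
print(sample, labels)
```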

multi-byte characters in utf-8 can officially be up to four bytes long, with 11 of those 32 bits used for tracking the size of the multi-byte block. That leaves 21 bits for the character itself, so up to 2^21 values, about two million in total (Unicode itself stops at U+10FFFF, roughly 1.1 million code points). Either way, that's easily enough for every alphabet you could need to write on a website, and all without breaking ascii.
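The bit counting can be checked by hand. A four-byte utf-8 character is laid out as `11110xxx 10xxxxxx 10xxxxxx 10xxxxxx`, so the marker bits add up to 5 + 2 + 2 + 2 = 11 and the `x` bits to 21. The sketch below packs a code point into that layout and compares it to Python's built-in encoder; `encode_four_bytes` is a hypothetical helper that only handles the four-byte case.

```python
MARKER_BITS = 5 + 2 + 2 + 2        # "11110" on the lead byte, "10" on each of the other three
PAYLOAD_BITS = 32 - MARKER_BITS    # bits left over for the code point


def encode_four_bytes(code_point: int) -> bytes:
    """Pack a code point into the layout 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("the four-byte form covers U+10000 through U+10FFFF")
    return bytes([
        0xF0 | (code_point >> 18),            # lead byte: top 3 payload bits
        0x80 | ((code_point >> 12) & 0x3F),   # next 6 bits
        0x80 | ((code_point >> 6) & 0x3F),    # next 6 bits
        0x80 | (code_point & 0x3F),           # last 6 bits
    ])


emoji = "\U0001F600"
manual = encode_four_bytes(ord(emoji))
print(manual.hex(" "), manual == emoji.encode("utf-8"))
```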

[–] [email protected] 4 points 20 hours ago (1 children)

Oh, I wondered about why there weren't more characters in the ASCII code set.

[–] nycki 2 points 18 hours ago

yep! the ascii standard was originally invented for teletypewriters, and includes four 'blocks' of 32 codes each, for 128 in total, so it only uses seven bits per code.

the first block, hex 00 - 1F, contains control codes for the typewriter. stuff like "newline", "backspace", and "ring bell" all go in here.

The second block, hex 20 - 3F, has the digits in order, from hex 30 = '0' all the way to hex 39 = '9'.

The uppercase alphabet starts at hex 41 = 'A', and exactly one block later, the lowercase alphabet starts at hex 61 = 'a'. This means their binary codes are 100 0001 and 110 0001, differing only in a single bit! So you can easily convert between upper and lowercase ascii by flipping that bit.
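Here is a small Python sketch of that bit flip. The bit in question has the value hex 20; `flip_case` and `CASE_BIT` are illustrative names, and the function deliberately leaves anything that isn't an ascii letter alone.

```python
CASE_BIT = 0x20  # the single bit that differs between 'A' (100 0001) and 'a' (110 0001)


def flip_case(ch: str) -> str:
    """Swap the case of one ascii letter by toggling a single bit."""
    if not (ch.isascii() and ch.isalpha()):
        return ch  # digits, punctuation, and non-ascii characters are returned unchanged
    return chr(ord(ch) ^ CASE_BIT)


flipped = "".join(flip_case(c) for c in "Hello, World 42")
print(flipped)
```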

The remaining space in the last three blocks is filled with various punctuation marks. I'm not sure if these are in any particular order.
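Since each block holds 32 codes, the block number is just the top two of the seven bits. A quick sketch to see where a few characters land (`ascii_block` is a made-up helper name, and blocks are counted from 0 here):

```python
def ascii_block(ch: str) -> int:
    """Return which 32-code block (0 through 3) a single ascii character falls in."""
    code = ord(ch)
    if code > 0x7F:
        raise ValueError("not an ascii character")
    return code >> 5  # dropping the low five bits is the same as dividing by 32


blocks = {ch: ascii_block(ch) for ch in ["\n", "!", "0", "A", "a", "~"]}
print(blocks)
```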

The final ascii code, 7F, is reserved for "delete", because its binary representation is 111 1111, perfect for "deleting" data on punched paper tape by punching out every hole over it.
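As a rough model, assuming a punched hole corresponds to a 1 bit and that holes can be added but never removed, punching every position is a bitwise OR with 7F. The sketch below (with the hypothetical helper `punch_over`) shows that any code turns into "delete" this way:

```python
DEL = 0x7F  # binary 111 1111: all seven positions punched


def punch_over(code: int) -> int:
    """Model punching all seven holes over an existing code (assumes hole = 1 bit)."""
    return code | DEL


results = {punch_over(code) for code in range(128)}
print(results)
```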

[–] [email protected] 2 points 1 day ago