Python

Welcome to the Python community on the programming.dev Lemmy instance!

Why and How Does Python Use Bloom Filters in String Processing? (codeconfessions.substack.com)

submitted 1 year ago by [email protected] to c/[email protected]

[–] [email protected] 3 points 1 year ago (1 children)

The article says that CPython represents strings as UTF-8 encoded, which is not correct. The details about how the representation works are correct; it's just that the representation isn't UTF-8.

That's just a minor point though, nice article.

[–] abhi9u 3 points 1 year ago (1 children)

Hi @qwop, I am the author. Thank you for reading and for the kind words. I would like to understand the error I made better, so that I don't repeat it in future and can fix it in the article if possible. Could you please clarify?

[–] [email protected] 3 points 1 year ago (1 children)

UTF-8 is an encoding for Unicode, which means it's a way of representing a Unicode string as actual bytes on a computer.

It is variable length and works by using the leading bits of each character's first byte to indicate how many bytes are needed to represent that character.
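
As a rough illustration of that variable-length scheme, here is a minimal sketch using only the built-in `str.encode`. The sample characters and the names `samples`, `utf8_lengths`, and `first_bytes` are chosen just for this example.

```python
# Four sample characters that need 1, 2, 3 and 4 bytes in UTF-8.
samples = ["a", "é", "€", "😀"]

# Number of bytes each character occupies once encoded as UTF-8.
utf8_lengths = {ch: len(ch.encode("utf-8")) for ch in samples}

# First byte of each encoding, as a bit string, to show the length prefix.
first_bytes = {ch: format(ch.encode("utf-8")[0], "08b") for ch in samples}

for ch in samples:
    print(f"U+{ord(ch):04X}: {utf8_lengths[ch]} byte(s), first byte {first_bytes[ch]}")
```

The first byte starts with `0` for a one-byte character, `110` for two bytes, `1110` for three, and `11110` for four.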

Python also uses an encoding, as you describe in the article, but it's different from UTF-8. Unlike in UTF-8, all characters in Python's representation of a Unicode string use the same number of bytes, which is the maximum that any individual character in that string needs.
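
One way to observe this from Python itself is sketched below. It assumes CPython 3.3 or later (the flexible string representation from PEP 393), and the helper `bytes_per_char` is a hypothetical name made up for this example. It estimates the per-character width by measuring how much `sys.getsizeof` grows when 100 more copies of a character are added, so it relies on an implementation detail rather than a documented guarantee.

```python
import sys


def bytes_per_char(ch: str) -> int:
    """Estimate storage per character for a string made only of `ch`.

    Compares two freshly built strings that differ by 100 characters,
    so the fixed object header cancels out. CPython-specific.
    """
    return (sys.getsizeof(ch * 102) - sys.getsizeof(ch * 2)) // 100


# ASCII, Latin-1, BMP, and non-BMP sample characters.
widths = [bytes_per_char(ch) for ch in ["a", "é", "€", "😀"]]
print(widths)

# Appending a single wide character forces every character in the
# new string to use the wider storage, not just the last one.
ascii_only = "a" * 100
mixed = ascii_only + "😀"
growth = sys.getsizeof(mixed) - sys.getsizeof(ascii_only)
print(growth)
```

On CPython the widths should come out as 1, 1, 2 and 4 bytes, and the growth from adding one emoji to 100 ASCII characters should be several hundred bytes rather than 4, since the whole string is stored at the wider width.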

I'd probably mess up a more detailed explanation of UTF-8 or Python's representation, so I'll let you look into how they work in more detail if you're interested.

[–] abhi9u 1 points 1 year ago

Thank you! That's helpful. I spent quite some time trying to understand the difference between UTF-8 and Python's representation and arrived at the same understanding as you wrote. However, most of the external documents simply say that strings in Python are UTF-8, which made me conclude that perhaps I was missing something and that it might be safer to write it as UTF-8.

I will look more into the code, as you suggested.