how to optimize this kind of process (lemmy.eco.br)

submitted 1 year ago by [email protected] to c/[email protected]

Hi. When I'm working with big dataframes, I often need to create columns based on functions, so I have code like this:

def function(row):
    ...  # compute and return the new column's value for this row

And then I run the function on the df as

df['new column'] = df.apply(function, axis=1)

But I do this with 10 or more columns/functions at a time. I don't think this is efficient, because each time a column is created it has to parse the entire dataframe. Is there a way to create all the columns at the same time while parsing the rows only once?

Thanks for any help.

[–] [email protected] 5 points 1 year ago* (last edited 1 year ago) (1 children)

Whatever you do, as long as the dataframe fits in memory it should usually be pretty fast. Depending on the functions you're using, applymap on slices of columns might be faster, but code readability will suffer.
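To make the "rows only once" idea concrete, here is a minimal sketch rather than a definitive answer. The column names (price, qty, total, discounted, unit_tax) and the formulas are made up for illustration, since the original functions were not posted. The first variant calls one row function that returns every new value at once, so apply walks the rows a single time. The second variant drops the row function and uses whole-column arithmetic, which only works if the logic can be expressed that way.

```python
import pandas as pd

# Hypothetical data: the column names and formulas are illustrative only.
df = pd.DataFrame({"price": [10.0, 20.0, 30.0], "qty": [1, 2, 3]})
new_cols = ["total", "discounted", "unit_tax"]


def make_columns(row):
    """Compute all derived values for one row in a single call."""
    total = row["price"] * row["qty"]
    return total, total * 0.9, row["price"] * 0.2


# Variant 1: one row-wise pass. result_type="expand" turns the returned
# tuple into one column per element.
expanded = df.apply(make_columns, axis=1, result_type="expand")
expanded.columns = new_cols
single_pass = pd.concat([df, expanded], axis=1)

# Variant 2: no Python-level row loop at all, only whole-column arithmetic.
total = df["price"] * df["qty"]
vectorized = df.assign(
    total=total,
    discounted=total * 0.9,
    unit_tax=df["price"] * 0.2,
)
```

Variant 1 still runs a Python function once per row, so on millions of rows variant 2 is typically much faster wherever the logic allows it.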

How big is your dataset? If it's huge or your needs are complex, you'll get far more performance by switching from Pandas to Polars dataframes than by trying to optimize Pandas operations.

[–] [email protected] 2 points 1 year ago

6M rows (it grows by approx. 35K rows a month) and 6 columns. After the functions it goes to 17 columns, and then finally down to 9, which is where I start processing. Currently pd.read_csv() takes 8 min and creating the columns takes 20 min. I would like to reduce that 20 min step.