As far as I know, yeah, R always works with copy-on-modify. Some libraries, as you mention (data.table), provide objects/classes that avoid this, but I'm not aware of any of them that work with arrays of more than two dimensions. Maybe parquet or arrow have something like this?
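To see copy-on-modify in action, here's a small base R sketch. The array and its size are arbitrary. `tracemem()` prints a message whenever the traced object gets duplicated. It only works if R was built with memory profiling (CRAN builds usually are), so the calls are guarded with `capabilities("profmem")`.

```r
# An arbitrary 3-D array of doubles: 100^3 values * 8 bytes, about 8 MB
x <- array(runif(100 * 100 * 100), dim = c(100, 100, 100))

# No copy yet: x and y refer to the same underlying memory
y <- x

# Ask R to report when this object gets duplicated (needs memory profiling)
if (capabilities("profmem")) invisible(tracemem(x))

# Changing a single element of y forces R to copy the whole array first
y[1, 1, 1] <- 0

# x is untouched, because y now has its own copy
print(x[1, 1, 1])
print(y[1, 1, 1])

if (capabilities("profmem")) untracemem(x)
```

With a multi-GB array, that one-element assignment briefly needs twice the memory, which is why chunked or out-of-memory approaches become attractive.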
Thank you for the suggestion! Parquet and arrow are indeed worth looking at.
+1 for parquet and arrow. If you're pushing memory limits, it's better to treat it as a completely out-of-memory problem. If you can split the data into multiple parquet files with hive-style or directory partitioning, it will be more efficient. You don't want the parquet files to be too small, though (I've heard people say 1 GB per file is ideal; colleagues at work prefer 512 MB per file, but that's on an AWS setup).
Bonus: once you've learned the packages, the workflow is the same for all out-of-memory big datasets.
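A minimal sketch of that workflow with the R arrow package, assuming arrow and dplyr are installed. The data frame, its columns and the output directory are made up for illustration. `write_dataset()` writes one parquet file per partition value into hive-style directories (`year=2020/`, ...), and `open_dataset()` scans them lazily, so only the filtered, summarised result is pulled into R by `collect()`.

```r
library(arrow)
library(dplyr)

# Hypothetical example data: a grouping column plus a measurement
df <- data.frame(
  year  = rep(2020:2022, each = 1000),
  value = rnorm(3000)
)

# Hypothetical output location
out_dir <- file.path(tempdir(), "my_dataset")

# One parquet file per year, in hive-style directories like year=2020/
write_dataset(df, out_dir, format = "parquet",
              partitioning = "year", hive_style = TRUE)

# Lazy handle: nothing is read into memory at this point
ds <- open_dataset(out_dir)

# Filtering on the partition column lets arrow skip whole directories;
# collect() brings only the small summary back into R
res <- ds |>
  filter(year == 2021) |>
  summarise(n = n(), mean_value = mean(value)) |>
  collect()

print(res)
```

`write_dataset()` also has a `max_rows_per_file` argument if you want to cap file size, but it counts rows rather than bytes, so you'd have to translate a target like 512 MB into a row count for your data.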