When you create that instance, do you immediately need to download and store all the data that has ever been posted to all federated Lemmy instances?
I run my own instance. @[email protected] is right, but there are more details. Federation is not a "sync." When your instance needs to fetch something from another instance it will, but it does not pull in history. You can, however, fetch a specific comment or post from any point in time.
Or perhaps you only need to download and store everything that is posted to the federated Lemmy instances from that point forward?
This does not happen by default either. Only communities that your users subscribe to receive updates, and those updates come from their "origin" instances.
Or better yet, do you only store what the users on that instance do (i.e. their posts, and posts to the communities hosted on that instance)?
This does happen, but your instance also stores what your users do on remote instances, as well as "copies" of what they interact with. Images (currently the only media hosted by Lemmy servers) are linked to their "origin" as well, rather than copied. So you are storing the text of posts and comments.
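
To make the rules above concrete, here is a rough toy model in Python. It is only a sketch of the behavior described in this thread, not Lemmy's actual code or data model, and every name in it (`Instance`, `receive_push`, `fetch_on_demand`, the example community names) is made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    """Toy model of one instance's storage; not Lemmy's real data model."""
    name: str
    subscribed: set = field(default_factory=set)  # remote communities followed by local users
    stored: list = field(default_factory=list)    # text of posts/comments kept locally

    def subscribe(self, community: str) -> None:
        # Subscribing starts future updates; it does not backfill history.
        self.subscribed.add(community)

    def receive_push(self, community: str, post: str) -> bool:
        # Updates are only accepted for communities with local subscribers.
        if community in self.subscribed:
            self.stored.append(post)
            return True
        return False

    def fetch_on_demand(self, post: str) -> None:
        # A user can still pull one specific post or comment of any age.
        if post not in self.stored:
            self.stored.append(post)


home = Instance("my.instance")
home.receive_push("tech@remote.example", "old post")   # ignored: nobody subscribed yet
home.subscribe("tech@remote.example")
home.receive_push("tech@remote.example", "new post")   # stored
home.receive_push("memes@other.example", "unrelated")  # ignored: no subscribers
home.fetch_on_demand("old post")                        # stored because a user asked for it
print(home.stored)  # ['new post', 'old post']
```

The point is that storage grows with what your own users subscribe to and interact with, not with the size of the whole federated network.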