No. That is what data brokers and big AI companies are pushing for, but currently it's considered fair use.
Anything public-facing can be used for ML, and it's been like that for quite a while. It might change depending on the ongoing lawsuits, but I doubt it will: it would be economic suicide, and China doesn't care if it's "theft".
It's better for us, the consumers, in any case, since having to pay for data would kill the open-source scene and give OpenAI and the other 3 companies a de facto monopoly.
Anybody can use the data as long as it's public-facing. Just because websites like Reddit and Getty stomp their feet and want us to pay doesn't mean we have to.
https://en.m.wikipedia.org/wiki/List_of_datasets_for_machine-learning_research
Reddit is already in every LLM. Until the courts clearly say otherwise (and I highly doubt they will), it's fair game, as it should be.