Where are you getting the data from, and do you maintain access to the originals after ingestion?
Is the database used for anything other than Elasticsearch?
If you will not have access to the originals after ingestion, you should keep an exact copy of the data because, as you noted, you lose information otherwise. This matters most when you have to fix a bug in the normalization logic or respond to a requirements change. For example, suppose your normalization logic replaces "-" with "_", and at some point in the future you need to distinguish between "this-phrase" and "this_phrase". If you've lost the original data, you've also lost the ability to fix your normalized data and indexes.
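To make the "-" versus "_" example concrete, here is a minimal sketch in Python. The function names and the record layout (`raw` next to `normalized`) are made up for illustration and are not taken from your pipeline; the point is only that the second pass is possible because `raw` was kept.

```python
def normalize_v1(text: str) -> str:
    """Hypothetical original rule: lowercase and replace '-' with '_'."""
    return text.lower().replace("-", "_")


def normalize_v2(text: str) -> str:
    """Hypothetical revised rule: lowercase only, so '-' and '_' stay distinct."""
    return text.lower()


def build_records(originals, normalize):
    """Store the untouched original next to its normalized form."""
    return [{"raw": text, "normalized": normalize(text)} for text in originals]


originals = ["This-Phrase", "this_phrase"]

# Under the first rule the two inputs collapse to the same normalized value.
v1 = build_records(originals, normalize_v1)

# Because the raw values were kept, the records can be rebuilt under the new rule.
v2 = build_records([record["raw"] for record in v1], normalize_v2)

for record in v2:
    print(record["raw"], "->", record["normalized"])
```

If only the `normalized` values from the first pass had been stored, both records would read "this_phrase" and there would be nothing left to rebuild from.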
Similarly, while the existing normalization logic might be better for Elasticsearch, you may not be using Elasticsearch forever, and you don't know the requirements of the next system.
All that said, I'm also skeptical that there is any real Elasticsearch benefit to modifying your data as described, in particular converting it to lowercase. Ask your data engineer to tell you explicitly what the purported benefits are. If they tell you it's for performance, ask for metrics, and weigh the performance gains and costs against the usability gains and costs. If they can't give you metrics, ask for the documentation supporting their claims. If they can't give you metrics or docs, find a new data engineer.
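Part of the reason for the skepticism about lowercasing: Elasticsearch can lowercase values itself at index time, which leaves the stored `_source` document as you sent it. Below is a rough sketch of index settings that do this for a `keyword` field, written as a Python dict. The field name `title` and the normalizer name `lowercase_normalizer` are placeholders I made up; whether this suits your mappings is something to check against your own index.

```python
import json

# Index body sketch: a custom normalizer that lowercases keyword values
# at index and query time. The original text stays unchanged in _source.
index_body = {
    "settings": {
        "analysis": {
            "normalizer": {
                # Hypothetical name, chosen for this example only.
                "lowercase_normalizer": {
                    "type": "custom",
                    "filter": ["lowercase"],
                }
            }
        }
    },
    "mappings": {
        "properties": {
            # Hypothetical field; replace with your own field names.
            "title": {
                "type": "keyword",
                "normalizer": "lowercase_normalizer",
            }
        }
    },
}

# Print the JSON that would be sent as the body of the create-index request.
print(json.dumps(index_body, indent=2))
```

For `text` fields the default standard analyzer already lowercases tokens, so case-insensitive matching there usually needs no change to the stored data at all.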