this post was submitted on 15 Jun 2023

Not OP. This question is being reposted to preserve technical content removed from elsewhere. Feel free to add your own answers/discussion.

Original question:

I have a dataset of vectors of shape 1xN, where N is the number of features. Each value is a float between -4 and 5. For my project I need to build an autoencoder; however, activation functions like ReLU or tanh will either only let positive values through the layers or restrict them to between -1 and 1. My concern is that after decoding from the latent space the data will not be represented in the same way: I will get vectors with either positive values only or constrained negative values, whereas I want the output to be close to the original.

Should I apply some kind of transformation, like adding a positive constant, applying exp(), or squaring the data, then train the VAE, and if I want the original representation just apply log() or log2() to the output? Or am I missing some configuration of activation functions that can give me an output similar to the original input?


Original answer:

I usually scale the dataset before passing it to the autoencoder. You don't need to rescale afterwards if you are only using the encoder portion (for example, for dimensionality reduction). If you don't scale linearly, i.e. (x - min(x)) / (max(x) - min(x)), and use exp or log instead, be mindful that this will likely affect the loss and optimization behaviour.
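
To make the linear scaling concrete, here is a minimal sketch using NumPy. The function names (`minmax_scale`, `minmax_unscale`) and the small array `X` are made up for illustration and are not from the original answer; the scaling is applied per feature (per column), which is one common choice. The inverse is only needed if you want decoder outputs back in the original range.

```python
import numpy as np

def minmax_scale(x, x_min, x_max):
    """Linearly map values from [x_min, x_max] to [0, 1]."""
    return (x - x_min) / (x_max - x_min)

def minmax_unscale(x_scaled, x_min, x_max):
    """Invert minmax_scale: map [0, 1] back to [x_min, x_max]."""
    return x_scaled * (x_max - x_min) + x_min

# Hypothetical data: 4 samples, 3 features, values between -4 and 5.
X = np.array([[-4.0,  0.5,  2.0],
              [ 1.0,  5.0, -1.0],
              [ 3.5, -2.0,  0.0],
              [ 0.0,  1.0,  4.0]])

# Per-feature minimum and maximum (one value per column).
x_min = X.min(axis=0)
x_max = X.max(axis=0)

X_scaled = minmax_scale(X, x_min, x_max)              # input to the autoencoder
X_restored = minmax_unscale(X_scaled, x_min, x_max)   # applied to decoder output
```

Because the mapping is linear, it is exactly invertible (up to floating-point error) and does not change the relative spacing of the values, which is why it tends to interfere less with the loss than exp or log would.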

Make sure to take the max and min values from the training data and then apply them to the test data. If test values fall outside those bounds, set them to the boundary value; this shouldn't have a big impact if your training dataset is large enough and has enough variance.
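
A rough sketch of that train/test handling, assuming NumPy and randomly generated stand-in data (the array shapes, value ranges, and the helper name `scale_with_train_stats` are all hypothetical): the bounds are computed from the training set only, and test values are clipped to those bounds before scaling.

```python
import numpy as np

# Stand-in data for illustration only; the test set is deliberately drawn
# from a slightly wider range so some values fall outside the training bounds.
rng = np.random.default_rng(0)
X_train = rng.uniform(-4.0, 5.0, size=(1000, 8))
X_test = rng.uniform(-4.5, 5.5, size=(200, 8))

# Bounds come from the training data only (one min/max per feature).
train_min = X_train.min(axis=0)
train_max = X_train.max(axis=0)

def scale_with_train_stats(x, lo, hi):
    """Clip x to [lo, hi] per feature, then map linearly to [0, 1]."""
    x_clipped = np.clip(x, lo, hi)  # out-of-bounds values become the boundary
    return (x_clipped - lo) / (hi - lo)

X_train_scaled = scale_with_train_stats(X_train, train_min, train_max)
X_test_scaled = scale_with_train_stats(X_test, train_min, train_max)
```

Note that clipping loses information for the out-of-bounds test values, so it is worth checking how many values were actually clipped before relying on it.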