this post was submitted on 01 Jul 2023
3 points (80.0% liked)
Machine Learning | Artificial Intelligence
The ‘swish’ activation function is f(x) = x · sigmoid(βx).
β is typically set to 1, but it doesn’t have to be; you can make it a learnable parameter if you want. I’ve played with this and not seen any significant benefit, though. In my experience, varying the learning rate and/or batch size matters more than a learned activation function. Also, you can end up with vanishing or exploding gradients if you don’t constrain β, and even then β might saturate depending on what happens during training. A sketch of both variants is below.
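For concreteness, here’s a minimal sketch, assuming PyTorch (the comment doesn’t name a framework; the `Swish` module name and `trainable` flag are just illustrative):

```python
import torch
import torch.nn as nn

class Swish(nn.Module):
    """Swish activation: f(x) = x * sigmoid(beta * x).

    beta is fixed by default; pass trainable=True to let the
    model learn it (initialised at 1.0 here).
    """
    def __init__(self, beta: float = 1.0, trainable: bool = False):
        super().__init__()
        beta_t = torch.tensor(float(beta))
        if trainable:
            # Learnable beta: updated by the optimizer like any other weight.
            self.beta = nn.Parameter(beta_t)
        else:
            # Fixed beta: registered as a buffer so it moves with .to(device)
            # but receives no gradient updates.
            self.register_buffer("beta", beta_t)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * torch.sigmoid(self.beta * x)
```

If you do make β learnable, clamping it after each optimizer step (e.g. `swish.beta.data.clamp_(0.1, 10.0)`; the bounds here are arbitrary) is one simple way to constrain it, per the note above about vanishing/exploding gradients.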
The choice of activation function itself is more impactful than making it dynamic/learned.
Happy learning!