[R] Unraveling the Mysteries: Why is AdamW Often Superior to Adam+L2 in Practice?
(self.machinelearning)
Please explain like I'm 5 years old.
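Not quite ELI5, but the core of the title question fits in a few lines. Below is a minimal sketch (the function name, defaults, and the `decoupled` flag are my own illustrative choices) of the one place Adam+L2 and AdamW actually differ, following Loshchilov & Hutter's decoupled weight decay:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
              eps=1e-8, wd=1e-2, decoupled=True):
    """One Adam step with either L2 regularization (decoupled=False)
    or AdamW-style decoupled weight decay (decoupled=True)."""
    if not decoupled:
        # Adam + L2: the penalty gradient wd*theta is folded into the gradient,
        # so below it gets rescaled per-parameter by the adaptive 1/sqrt(v) term.
        grad = grad + wd * theta

    # Standard Adam moment estimates with bias correction (t starts at 1).
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad**2
    m_hat = m / (1 - beta1**t)
    v_hat = v / (1 - beta2**t)

    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)

    if decoupled:
        # AdamW: decay acts directly on the weights, independent of the
        # gradient statistics, so every parameter shrinks at the same rate.
        theta = theta - lr * wd * theta
    return theta, m, v
```

With L2, the decay term gets divided by sqrt(v_hat) along with the rest of the gradient, so parameters with large gradient variance end up regularized less than intended; AdamW removes that coupling, which is the usual explanation for its better generalization and easier hyperparameter tuning.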
Maybe I understand the following (my apologies if this is grossly simplified and doesn't help):
1- Better neural networks need to contain more (stacked) layers.
2- The input layer, at one end of the stack, is exposed to messy information from the real world.
3- At the other end, the output layer provides the network's results.
4- The first step in making this work is training the network, during which the learning happens.
5- Instabilities and stagnation in some layers often occur when learning does not proceed optimally, and this problem gets worse as the number of layers grows.
6- Here, learning happens all at once across all the layers. Something called rotation, which I don't understand, is used to stabilize and optimize the learning (see the sketch after this list).
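Point 6 is essentially backpropagation. A minimal sketch, assuming PyTorch (the toy network and random data are made up for illustration): one backward pass computes gradients for every layer at once, and the optimizer then updates them all in a single step.

```python
import torch
import torch.nn as nn

# Toy stacked network: all layers are updated together at each step (point 6).
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(),
                      nn.Linear(32, 32), nn.ReLU(),
                      nn.Linear(32, 1))
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)

x, y = torch.randn(64, 10), torch.randn(64, 1)  # stand-in for messy real-world data (point 2)
loss = nn.functional.mse_loss(model(x), y)       # compare outputs (point 3) to targets
opt.zero_grad()
loss.backward()  # gradients flow back through every layer at once
opt.step()       # one simultaneous update of all layers
```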
I feel this is very different from human learning, which happens in stages: we first learn words, then try to assemble them into simple sentences, then move on to making sense of more complex notions, and so on. I wish this approach could also be applied in artificial intelligence development.
The human brain isn't a blank slate when it comes into existence. There are already structures designed to do certain things. These structures come "pre-trained", and a lot of the learning humans do is more akin to the fine-tuning we do on foundation models.
P.S.: see also:
https://en.m.wikipedia.org/wiki/Stochastic_gradient_descent#Adam