I am reading 'Attention is not all you need: pure attention loses rank doubly exponentially with depth' (https://arxiv.org/abs/2103.03404).
I had read this paper in the past, but felt the need to refresh my memory and look at self-attention through a mildly critical lens. Afaik, this paper studies attention networks stripped of the surrounding structures (MLP blocks, skip connections, layer norm, etc.) and how such "pure attention" stacks behave.
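To make the setup concrete, here is a minimal sketch (my own illustration, not code from the paper) of what a pure self-attention stack looks like once the MLP blocks and skip connections are removed. The dimensions, depth, and random weights are arbitrary choices, and the Frobenius-norm residual at the end is just a crude stand-in for the rank-collapse measure the paper actually analyzes.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # X: (n_tokens, d). Single-head scaled dot-product attention only:
    # no residual connection added back, no feed-forward block, no layer norm.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    A = softmax(Q @ K.T / np.sqrt(X.shape[1]))
    return A @ V

n_tokens, d, depth = 16, 32, 12   # arbitrary toy sizes
X = rng.standard_normal((n_tokens, d))

for _ in range(depth):
    Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    X = self_attention(X, Wq, Wk, Wv)

# Distance from the nearest matrix whose rows are all identical (a rank-1
# matrix of the form 1·x^T). A small relative residual means the token
# representations have collapsed toward a single point.
res = X - X.mean(axis=0, keepdims=True)
print("relative residual after depth", depth, ":",
      np.linalg.norm(res) / np.linalg.norm(X))
```

Running this with and without adding `X` back as a skip connection inside the loop is a quick way to see, informally, why the surrounding structures matter.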