Attention Is All You Need

I thought I’d dive back into history and read the original paper that started it all. It’s somewhat technical about encoder/decoder layouts and matrix multiplications. None of the components are super exciting for somebody who’s been looking at neural networks for the past decade.
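For reference, the core of those matrix multiplications is the scaled dot-product attention the paper defines: softmax(QKᵀ/√d_k)V. Here’s a minimal NumPy sketch of just that piece (the shapes and variable names are mine for illustration, not taken from the paper):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # query-key similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over the keys
    return weights @ V                                # weighted sum of values

# Toy example: 4 tokens with embedding size 8 (arbitrary illustrative shapes).
rng = np.random.default_rng(0)
Q = K = V = rng.standard_normal((4, 8))
print(scaled_dot_product_attention(Q, K, V).shape)    # (4, 8)
```

In the full model this runs in parallel across multiple heads and is stacked inside the encoder/decoder layers, but the heart of it really is just these few matrix multiplications.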

What’s exciting is that such a simplification produces results that are so much better, and how the authors came up with it in the first place. Unfortunately, they don’t explain how they arrived at the idea.

The paper itself is a bit too abstract, so I’m going to look for some of those YouTube videos that explain what is actually going on here and why it’s such a big deal. I’ll update this post later.
