[coursera] Sequence Models: Week 4
Transformer

- Attention + CNN-style parallel computation
- Self-attention: computes attention for every position in the sequence in parallel
- Multi-head attention: runs several attention heads in parallel for richer representations
Self-attention
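A minimal numpy sketch of scaled dot-product self-attention (variable names and sizes are my own, not from the course slides): each position's query is compared against every key, the scores are scaled by the square root of the key dimension and softmaxed, and the resulting weights take a weighted sum of the values.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """Scaled dot-product self-attention over one sequence.

    X: (seq_len, d_model) input embeddings
    W_q, W_k, W_v: (d_model, d_k) projection matrices
    """
    Q, K, V = X @ W_q, X @ W_k, X @ W_v       # queries, keys, values
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # (seq_len, seq_len) similarities
    weights = softmax(scores, axis=-1)        # attention weights per query
    return weights @ V                        # weighted sum of values

# Example: 5 tokens, model dim 8, head dim 4 (arbitrary sizes).
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))
W_q, W_k, W_v = (rng.normal(size=(8, 4)) for _ in range(3))
out = self_attention(X, W_q, W_k, W_v)        # shape (5, 4)
```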

Multi-head attention
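A companion sketch of multi-head attention, again with assumed names and sizes: each head has its own query/key/value projections, all heads attend to the same input in parallel, and their outputs are concatenated and projected back to the model dimension.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Scaled dot-product attention shared by all heads.
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    return softmax(scores, axis=-1) @ V

def multi_head_attention(X, heads, W_o):
    """heads: list of (W_q, W_k, W_v) tuples, one per head.
    W_o: (num_heads * d_k, d_model) output projection."""
    outputs = [attention(X @ W_q, X @ W_k, X @ W_v) for W_q, W_k, W_v in heads]
    return np.concatenate(outputs, axis=-1) @ W_o  # concat heads, project back

# Example: 5 tokens, d_model = 8, 2 heads of size 4.
rng = np.random.default_rng(1)
X = rng.normal(size=(5, 8))
heads = [tuple(rng.normal(size=(8, 4)) for _ in range(3)) for _ in range(2)]
W_o = rng.normal(size=(2 * 4, 8))
out = multi_head_attention(X, heads, W_o)  # shape (5, 8)
```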

Transformer architecture
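For reference, the architecture from the original paper stacks N encoder blocks (multi-head self-attention followed by a position-wise feed-forward network, each wrapped in a residual connection and layer normalization) and N decoder blocks (masked self-attention, cross-attention over the encoder output, then the feed-forward network). Because attention itself ignores word order, sinusoidal positional encodings are added to the input embeddings; below is a small sketch of that encoding with assumed variable names.

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encodings (assumes even d_model): sine on even
    dimensions, cosine on odd, with wavelengths scaled by 10000^(2i/d_model)."""
    pos = np.arange(seq_len)[:, None]                  # (seq_len, 1)
    i = np.arange(d_model // 2)[None, :]               # (1, d_model/2)
    angles = pos / np.power(10000, (2 * i) / d_model)  # (seq_len, d_model/2)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                       # even indices
    pe[:, 1::2] = np.cos(angles)                       # odd indices
    return pe

# Added to the token embeddings before the first encoder block.
pe = positional_encoding(seq_len=10, d_model=8)        # shape (10, 8)
```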
