[coursera] Sequence Models: Week 4
Transformer
- Attention + CNN-style parallel computation: attention-based representations for the whole sequence are computed at once, rather than one step at a time as in an RNN
- Self-attention: compute an attention-based representation for every word in the sequence in parallel
- Multi-head attention: run self-attention several times with different learned projections to get richer representations
Self-attention
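For each word, self-attention computes Attention(Q, K, V) = softmax(QKᵀ/√d_k)·V: every query is compared against every key, and the resulting weights form a weighted sum of the values. A minimal NumPy sketch of this scaled dot-product attention follows; the toy shapes and the use of the raw input as Q, K, and V (instead of learned projections) are simplifying assumptions for illustration, not the course's code.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax over the last axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # (seq, seq): similarity of each query to each key
    weights = softmax(scores, axis=-1)   # each query's attention weights sum to 1
    return weights @ V                   # weighted sum of value vectors

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 8))  # 4 tokens, d_model = 8 (toy sizes)
# in practice Q, K, V are learned linear projections of X; identity here for brevity
out = self_attention(X, X, X)
print(out.shape)  # (4, 8)
```

Because the scores matrix is computed for all positions at once with a single matrix product, every token's representation is produced in parallel, which is the key difference from sequential RNN processing.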
Multi-head attention
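Multi-head attention repeats the self-attention computation h times, each head with its own learned Q/K/V projections, then concatenates the head outputs and applies a final output projection. A sketch under assumed toy dimensions (random untrained weights, 2 heads, d_model = 8):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q, K, V):
    # scaled dot-product attention for one head
    return softmax(Q @ K.T / np.sqrt(K.shape[-1])) @ V

def multi_head_attention(X, Wq, Wk, Wv, Wo):
    # Wq/Wk/Wv: (heads, d_model, d_k) per-head projections; Wo: (heads*d_k, d_model)
    heads = [attention(X @ wq, X @ wk, X @ wv) for wq, wk, wv in zip(Wq, Wk, Wv)]
    return np.concatenate(heads, axis=-1) @ Wo  # concat heads, then project back

rng = np.random.default_rng(1)
seq, d_model, h = 4, 8, 2
d_k = d_model // h  # each head attends in a smaller subspace
Wq, Wk, Wv = (rng.standard_normal((h, d_model, d_k)) for _ in range(3))
Wo = rng.standard_normal((h * d_k, d_model))
X = rng.standard_normal((seq, d_model))
print(multi_head_attention(X, Wq, Wk, Wv, Wo).shape)  # (4, 8)
```

Each head can learn to attend to a different kind of relationship (e.g. "who", "when", "what"), which is why multiple heads give richer representations than a single attention pass.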
Transformer architecture
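In the full architecture, each encoder block wraps a self-attention sub-layer and a position-wise feed-forward network, with a residual connection and layer normalization around each. A minimal sketch of one encoder block, again with toy sizes and identity Q/K/V projections as simplifying assumptions:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def layer_norm(x, eps=1e-6):
    # normalize each token's features to zero mean, unit variance
    return (x - x.mean(axis=-1, keepdims=True)) / (x.std(axis=-1, keepdims=True) + eps)

def self_attention(X):
    # identity projections for brevity; real blocks use learned W_Q, W_K, W_V
    return softmax(X @ X.T / np.sqrt(X.shape[-1])) @ X

def encoder_block(X, W1, b1, W2, b2):
    # sub-layer 1: self-attention with residual connection + layer norm
    X = layer_norm(X + self_attention(X))
    # sub-layer 2: position-wise feed-forward network (ReLU MLP), same pattern
    ffn = np.maximum(0, X @ W1 + b1) @ W2 + b2
    return layer_norm(X + ffn)

rng = np.random.default_rng(2)
d_model, d_ff, seq = 8, 32, 4
W1, b1 = rng.standard_normal((d_model, d_ff)) * 0.1, np.zeros(d_ff)
W2, b2 = rng.standard_normal((d_ff, d_model)) * 0.1, np.zeros(d_model)
X = rng.standard_normal((seq, d_model))
print(encoder_block(X, W1, b1, W2, b2).shape)  # (4, 8)
```

The decoder blocks follow the same pattern, but add a masked self-attention sub-layer and a cross-attention sub-layer that attends over the encoder's output.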