\(\mathbf{B}\text{idirectional}\ \mathbf{E}\text{ncoder}\ \mathbf{R}\text{epresentations}\ \text{from}\ \mathbf{T}\text{ransformers}\)

Key points

\((1)\) Bidirectional pre-training; \((2)\) a single unified model for solving downstream tasks; \((3)\) unsupervised training on unlabeled text.

\[ \begin{cases} \mathbf{MLM} \text{: masked language model}\\ \mathbf{NSP} \text{: next sentence prediction} \end{cases} \]
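The two pre-training objectives can be illustrated with a toy data-preparation sketch. It assumes the corruption scheme described in the BERT paper (about 15% of tokens are selected for prediction; of those, 80% become `[MASK]`, 10% become a random token, and 10% stay unchanged) and a 50/50 split between true and random next sentences. The function names `mask_tokens` and `make_nsp_pair` are hypothetical, and drawing the negative sentence from the same list is a simplification rather than the original pipeline.

```python
import random

MASK = "[MASK]"


def mask_tokens(tokens, vocab, rng, select_prob=0.15):
    """MLM corruption sketch: return (corrupted_tokens, labels).

    labels[i] holds the original token at positions the model must predict,
    and None everywhere else. Assumes `tokens` contains no special tokens.
    """
    corrupted = list(tokens)
    labels = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= select_prob:
            continue  # position not selected for prediction
        labels[i] = tok  # the model must recover the original token here
        r = rng.random()
        if r < 0.8:
            corrupted[i] = MASK  # 80%: replace with the mask token
        elif r < 0.9:
            corrupted[i] = rng.choice(vocab)  # 10%: replace with a random token
        # remaining 10%: leave the token unchanged
    return corrupted, labels


def make_nsp_pair(sentences, idx, rng):
    """NSP pair sketch: return (sentence_a, sentence_b, is_next).

    Assumes `sentences` has at least three entries so that a negative
    candidate always exists.
    """
    a = sentences[idx]
    if rng.random() < 0.5 and idx + 1 < len(sentences):
        return a, sentences[idx + 1], True  # true successor
    # Negative example: any sentence except `a` itself and its true successor.
    candidates = [s for j, s in enumerate(sentences) if j not in (idx, idx + 1)]
    return a, rng.choice(candidates), False
```

Calling `mask_tokens(tokens, vocab, random.Random(0))` yields a corrupted sequence plus per-position labels, and `make_nsp_pair(sentences, 0, random.Random(0))` yields one sentence pair with its binary label.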

BERT achieves state-of-the-art results on \(11\) NLP tasks.

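The experiments rely mainly on comparative studies to show the benefit of combining these key points; the underlying idea is Occam's razor.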

BERT, Pre-training of Deep Bidirectional Transformers for Language Understanding

https://lr-tsinghua11.github.io/2023/01/24/NLP/BERT_Pre-training_of_Deep_Bidirectional_Transformers_for_Language_Understanding/

Author

Learning_rate

Published on

January 24, 2023

License