
Commit

transformer mha chinese translation
vpj committed Jun 27, 2024
1 parent d3f0bd3 commit f6e913e
Showing 13 changed files with 245 additions and 248 deletions.
2 changes: 1 addition & 1 deletion docs/sitemap.xml
@@ -1450,7 +1450,7 @@

<url>
<loc>https://nn.labml.ai/rl/ppo/gae.html</loc>
<lastmod>2023-10-24T16:30:00+00:00</lastmod>
<lastmod>2024-06-24T16:30:00+00:00</lastmod>
<priority>1.00</priority>
</url>

4 changes: 2 additions & 2 deletions docs/zh/index.html
@@ -72,7 +72,7 @@
<h1><a href="index.html">labml.ai 带注释的 PyTorch 版论文实现</a></h1>
<p>这是一个用 PyTorch 实现各种神经网络和相关算法的集合。每个算法的<a href="https://github.com/labmlai/annotated_deep_learning_paper_implementations">代码实现</a>都有详细的解释说明,且在<a href="index.html">网站</a>上与代码逐行对应。我们相信,这些内容将帮助您更好地理解这些算法。</p>
<p><img alt="Screenshot" src="dqn-light.png"></p>
<p>我们正在积极维护这个仓库并添加新的代码实现<a href="https://twitter.com/labmlai"><img alt="Twitter" src="https://img.shields.io/twitter/follow/labmlai?style=social"></a>以获取更新。</p>
<p>我们正在积极维护这个仓库并添加新的代码实现<a href="https://twitter.com/labmlai"><img alt="Twitter" src="https://img.shields.io/twitter/follow/labmlai?style=social"></a>以获取更新。</p>
<h2>翻译</h2>
<h3><strong><a href="https://nn.labml.ai">英语(原版)</a></strong></h3>
</a><h3><strong><a href="https://nn.labml.ai/zh/">中文(翻译)</strong></h3>
@@ -102,7 +102,7 @@ <h4>✨ <a href="transformers/index.html">Transformers</a></h4>
<li><a href="transformers/primer_ez/index.html">Primer</a></li>
<li><a href="transformers/hour_glass/index.html">沙漏网络</a></li></ul>
<h4><a href="neox/index.html">Eleuther GPT-neox</a></h4>
<li><a href="neox/samples/generate.html">在一块 48GB GPU 上生成</a></li> <ul>
<ul><li><a href="neox/samples/generate.html">在一块 48GB GPU 上生成</a></li>
<li><a href="neox/samples/finetune.html">在两块 48GB GPU 上微调</a></li>
<li><a href="neox/utils/llm_int8.html">llm.int8 ()</a></li></ul>
<h4><a href="diffusion/index.html">扩散模型</a></h4>
2 changes: 1 addition & 1 deletion docs/zh/sitemap.xml
@@ -1450,7 +1450,7 @@

<url>
<loc>https://nn.labml.ai/rl/ppo/gae.html</loc>
<lastmod>2023-10-24T16:30:00+00:00</lastmod>
<lastmod>2024-06-24T16:30:00+00:00</lastmod>
<priority>1.00</priority>
</url>

84 changes: 42 additions & 42 deletions docs/zh/transformers/configs.html

Large diffs are not rendered by default.

57 changes: 28 additions & 29 deletions docs/zh/transformers/feed_forward.html

Large diffs are not rendered by default.

94 changes: 47 additions & 47 deletions docs/zh/transformers/index.html
@@ -3,24 +3,24 @@
<head>
<meta http-equiv="content-type" content="text/html;charset=utf-8"/>
<meta name="viewport" content="width=device-width, initial-scale=1.0"/>
<meta name="description" content="这是变压器和相关技术的 PyTorch 实现/教程的集合"/>
<meta name="description" content="这是一个包含 Transformers 及相关技术的 PyTorch 实现和教程的合集"/>

<meta name="twitter:card" content="summary"/>
<meta name="twitter:image:src" content="https://avatars1.githubusercontent.com/u/64068543?s=400&amp;v=4"/>
<meta name="twitter:title" content="变压器"/>
<meta name="twitter:description" content="这是变压器和相关技术的 PyTorch 实现/教程的集合"/>
<meta name="twitter:title" content="Transformers"/>
<meta name="twitter:description" content="这是一个包含 Transformers 及相关技术的 PyTorch 实现和教程的合集"/>
<meta name="twitter:site" content="@labmlai"/>
<meta name="twitter:creator" content="@labmlai"/>

<meta property="og:url" content="https://nn.labml.ai/transformers/index.html"/>
<meta property="og:title" content="变压器"/>
<meta property="og:title" content="Transformers"/>
<meta property="og:image" content="https://avatars1.githubusercontent.com/u/64068543?s=400&amp;v=4"/>
<meta property="og:site_name" content="变压器"/>
<meta property="og:site_name" content="Transformers"/>
<meta property="og:type" content="object"/>
<meta property="og:title" content="变压器"/>
<meta property="og:description" content="这是变压器和相关技术的 PyTorch 实现/教程的集合"/>
<meta property="og:title" content="Transformers"/>
<meta property="og:description" content="这是一个包含 Transformers 及相关技术的 PyTorch 实现和教程的合集"/>

<title>变压器</title>
<title>Transformers</title>
<link rel="shortcut icon" href="/icon.png"/>
<link rel="stylesheet" href="../pylit.css?v=1">
<link rel="canonical" href="https://nn.labml.ai/transformers/index.html"/>
@@ -70,50 +70,50 @@
<div class='section-link'>
<a href='#section-0'>#</a>
</div>
<h1>变压器</h1>
</a><p>本模块包含 <a href="https://pytorch.org/">PyTorch 实现和论文 Attention Is <a href="https://arxiv.org/abs/1706.03762">All You Need</a> 中对原创变压器的解释,以及它的衍生品和增强功能</p>
<ul><li><a href="mha.html">多头关注</a></li>
<li><a href="models.html">变压器编码器和解码器型号</a></li>
<h1>Transformers</h1>
</a><p>本节内容包含对论文<a href="https://arxiv.org/abs/1706.03762">Attention is All You Need 》</a>中原始 Transformer 的解释与<a href="https://pytorch.org/">PyTorch</a> 实现,以及对其衍生和增强版本的解释与实现</p>
<ul><li><a href="mha.html">多头注意力</a></li>
<li><a href="models.html">Transformer 编码器和解码器模型</a></li>
<li><a href="feed_forward.html">位置前馈网络 (FFN)</a></li>
<li><a href="positional_encoding.html">固定位置编码</a></li></ul>
<h2><a href="xl/index.html">变压器 XL</a></h2>
<p>这使用<a href="xl/relative_mha.html">相对的多头注意力</a>实现了变形金刚 XL 模型</p>
<h2><a href="rope/index.html">旋转位置嵌入</a></h2>
<p>这实现了旋转位置嵌入 (roPE)</p>
<h2><a href="alibi/index.html">注意线性偏差</a></h2>
<p>这实现了线性偏差注意力(AliBI)</p>
<h2><a href="retro/index.html">复古</a></h2>
<p>这实现了检索增强型转换器(RETRO</p>
<h2><a href="compressive/index.html">压缩变压器</a></h2>
<p>这是一种压缩变压器的实现,它通过压缩最古老的存储<a href="xl/index.html">器来延长注意力跨度,从而在Transformer XL</a> 上扩展</p>
<h2><a href="xl/index.html">Transformer XL</a></h2>
<p>这是使用<a href="xl/relative_mha.html">相对多头注意力</a>的 Transformer XL 模型的实现。</p>
<h2><a href="rope/index.html">旋转式位置编码</a></h2>
<p>这是旋转式位置编码( ROPE )的实现。</p>
<h2><a href="alibi/index.html">线性偏差注意力</a></h2>
<p>这是线性偏差注意力( ALIBI )的实现</p>
<h2><a href="retro/index.html">RETRO</a></h2>
<p>这是对检索增强 Transformer ( RETRO )的实现</p>
<h2><a href="compressive/index.html">压缩 Transformer</a></h2>
<p>这是一个压缩transformer的实现,它在<a href="xl/index.html">Transformer XL</a> 的基础上,通过压缩最早期的记忆来延长注意力跨度</p>
<h2><a href="gpt/index.html">GPT 架构</a></h2>
<p>这是 GPT-2 体系结构的实现</p>
<p>这是 GPT-2 结构的实现</p>
<h2><a href="glu_variants/simple.html">GLU 变体</a></h2>
<p>这是论文 <a href="https://arxiv.org/abs/2002.05202">GLU 变体改进变压器的</a>实现</p>
<h2><a href="knn/index.html">knn-lm</a></h2>
<p>这是论文<a href="https://arxiv.org/abs/1911.00172">通过记忆推广:最近邻语言模型</a>的实现。</p>
<h2><a href="feedback/index.html">反馈变压器</a></h2>
<p>这是一篇论文《使用<a href="https://arxiv.org/abs/2002.09402">反馈存储器访问顺序变压器中的更高层次表示》的</a>实现</p>
<h2><a href="switch/index.html">开关变压器</a></h2>
<p>这是论文<a href="https://arxiv.org/abs/2101.03961">开关变压器:以简单高效的稀疏度缩放到万亿参数模型</a>》的微型实现。我们的实现只有几百万个参数,不对并行分布式训练进行建模。它进行单个 GPU 训练,但我们实现了白皮书中描述的切换概念</p>
<h2><a href="fast_weights/index.html">快速重量变压器</a></h2>
<p>这是 <a href="https://arxiv.org/abs/2102.11174">PyTorch 中线性变压器是秘密的快速重量存储系统论文的</a>实现</p>
<h2><a href="fnet/index.html">FNet:将令牌与傅里叶变换混合</a></h2>
<p>这是论文<a href="https://arxiv.org/abs/2105.03824">FNet:将令牌与傅里叶变换混合</a>的实现。</p>
<h2><a href="aft/index.html">免注意变压器</a></h2>
<p>这是论文<a href="https://arxiv.org/abs/2105.14103">无注意力变压器》的</a>实现</p>
<h2><a href="mlm/index.html">屏蔽语言模型</a></h2>
<p>这是在论文《B <a href="https://arxiv.org/abs/1810.04805">ERT:用于语言理解的深度双向变换器的预训练》中用于预训练的蒙面语言模型的</a>实现。</p>
<h2><a href="mlp_mixer/index.html">MLP 混音器:面向视觉的全 MLP 架构</a></h2>
<p>这是论文 <a href="https://arxiv.org/abs/2105.01601">MLP-Mixer:视觉的全 MLP 架构的</a>实现</p>
<h2><a href="gmlp/index.html">注意 MLP (gMLP)</a></h2>
<p>这是 “<a href="https://arxiv.org/abs/2105.08050">注意 MLP” 一文的</a>实现</p>
<h2><a href="vit/index.html">视觉变压器 (ViT)</a></h2>
<p>这是论文<a href="https://arxiv.org/abs/2010.11929">图像值得 16x16 Words:大规模图像识别的变形金刚》的</a>实现</p>
<p>这是论文 <a href="https://arxiv.org/abs/2002.05202">GLU Variants Improve Transformer 》</a>的实现</p>
<h2><a href="knn/index.html">kNN-LM</a></h2>
<p>这是论文<a href="https://arxiv.org/abs/1911.00172">《 Generalization through Memorization: Nearest Neighbor Language Models 》</a>的实现。</p>
<h2><a href="feedback/index.html">自反馈 Transformer</a></h2>
<p>这是论文<a href="https://arxiv.org/abs/2002.09402">《 Accessing Higher-level Representations in Sequential Transformers with Feedback Memory 》</a>的实现</p>
<h2><a href="switch/index.html">Switch Transformer</a></h2>
<p>这是论文<a href="https://arxiv.org/abs/2101.03961">《 Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity 》</a>的一个简化实现。我们的实现仅包含几百万个参数,并且只在单 GPU 上进行训练,不涉及并行分布式训练,但我们仍然实现了论文中描述的 Switch 概念</p>
<h2><a href="fast_weights/index.html">快速权重 Transformer</a></h2>
<p>这是论文 <a href="https://arxiv.org/abs/2102.11174">《 Linear Transformers Are Secretly Fast Weight Memory Systems in PyTorch </a>的实现</p>
<h2><a href="fnet/index.html">Fnet:使用傅里叶变换混合 token </a></h2>
<p>这是论文<a href="https://arxiv.org/abs/2105.03824">FNet: Mixing Tokens with Fourier Transforms 》</a>的实现。</p>
<h2><a href="aft/index.html">无注意力 Transformer</a></h2>
<p>这是论文<a href="https://arxiv.org/abs/2105.14103">《 An Attention Free Transformer 》</a>的实现</p>
<h2><a href="mlm/index.html">掩码语言模型</a></h2>
<p>这是论文<a href="https://arxiv.org/abs/1810.04805">《 BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding 》</a>中用于预训练的掩码语言模型的实现</p>
<h2><a href="mlp_mixer/index.html">MLP-Mixer:一种用于视觉的全 MLP 架构</a></h2>
<p>这是论文 <a href="https://arxiv.org/abs/2105.01601">MLP-Mixer: An all-MLP Architecture for Vision 》</a>的实现</p>
<h2><a href="gmlp/index.html">门控多层感知器 (gMLP)</a></h2>
<p>这是论文<a href="https://arxiv.org/abs/2105.08050">《 Pay Attention to MLPs 》</a>的实现</p>
<h2><a href="vit/index.html">视觉 Transformer (ViT)</a></h2>
<p>这是论文<a href="https://arxiv.org/abs/2010.11929">《 An Image Is Worth 16x16 Words: Transformers For Image Recognition At Scale 》</a>的实现</p>
<h2><a href="primer_ez/index.html">Primer</a></h2>
<p>这是论文《入<a href="https://arxiv.org/abs/2109.08668">门:为语言建模寻找高效的变换器》的</a>实现</p>
<h2><a href="hour_glass/index.html">沙漏</a></h2>
<p>这是论文<a href="https://arxiv.org/abs/2110.13711">分层变换器是更有效的语言模型</a>的实现</p>
<p>这是论文<a href="https://arxiv.org/abs/2109.08668">《 Primer: Searching for Efficient Transformers for Language Modeling 》</a>的实现</p>
<h2><a href="hour_glass/index.html">沙漏网络</a></h2>
<p>这是论文<a href="https://arxiv.org/abs/2110.13711">《 Hierarchical Transformers Are More Efficient Language Models 》</a>的实现</p>

</div>
<div class='code'>
10 changes: 5 additions & 5 deletions docs/zh/transformers/label_smoothing_loss.html
@@ -3,12 +3,12 @@
<head>
<meta http-equiv="content-type" content="text/html;charset=utf-8"/>
<meta name="viewport" content="width=device-width, initial-scale=1.0"/>
<meta name="description" content="这是标签平滑损失的实现,可以用作交叉熵损失的替代方案,以提高准确性"/>
<meta name="description" content="这是标签平滑损失的实现,可作为交叉熵损失的替代品以提高准确性"/>

<meta name="twitter:card" content="summary"/>
<meta name="twitter:image:src" content="https://avatars1.githubusercontent.com/u/64068543?s=400&amp;v=4"/>
<meta name="twitter:title" content="标签平滑损失"/>
<meta name="twitter:description" content="这是标签平滑损失的实现,可以用作交叉熵损失的替代方案,以提高准确性"/>
<meta name="twitter:description" content="这是标签平滑损失的实现,可作为交叉熵损失的替代品以提高准确性"/>
<meta name="twitter:site" content="@labmlai"/>
<meta name="twitter:creator" content="@labmlai"/>

@@ -18,7 +18,7 @@
<meta property="og:site_name" content="标签平滑损失"/>
<meta property="og:type" content="object"/>
<meta property="og:title" content="标签平滑损失"/>
<meta property="og:description" content="这是标签平滑损失的实现,可以用作交叉熵损失的替代方案,以提高准确性"/>
<meta property="og:description" content="这是标签平滑损失的实现,可作为交叉熵损失的替代品以提高准确性"/>

<title>标签平滑损失</title>
<link rel="shortcut icon" href="/icon.png"/>
@@ -154,7 +154,7 @@ <h1>标签平滑损失</h1>
<div class='section-link'>
<a href='#section-5'>#</a>
</div>
<p>显示系统预期的目标分布</p>
<p>展示系统期望的目标分布</p>

</div>
<div class='code'>
@@ -183,7 +183,7 @@ <h1>标签平滑损失</h1>
<div class='section-link'>
<a href='#section-7'>#</a>
</div>
<p>打印(预测)</p>
<p>输出(预测)</p>

</div>
<div class='code'>