"<p>No need to train the mlp bias because we are adding it with attention output </p>\n":"<p>\u4e0d\u9700\u8981\u8bad\u7ec3 mlp \u504f\u7f6e\uff0c\u56e0\u4e3a\u6211\u4eec\u5c06\u5b83\u4e0e\u6ce8\u610f\u529b\u8f93\u51fa\u76f8\u52a0</p>\n",
"<p>Set <span translate=no>_^_0_^_</span> to <span translate=no>_^_1_^_</span> for the entire layer. </p>\n":"<p>\u5c06\u6574\u4e2a\u5c42\u7684<span translate=no>_^_0_^_</span>\u8bbe\u7f6e\u4e3a<span translate=no>_^_1_^_</span>\u3002</p>\n",