"<p>No need to train the mlp bias because we are adding it with attention output </p>\n":"<p>\u6ce8\u610f\u51fa\u529b\u3067\u52a0\u7b97\u3059\u308b\u306e\u3067\u3001mlp\u30d0\u30a4\u30a2\u30b9\u3092\u30c8\u30ec\u30fc\u30cb\u30f3\u30b0\u3059\u308b\u5fc5\u8981\u306f\u3042\u308a\u307e\u305b\u3093\u3002</p>\n",
"<p>Set <span translate=no>_^_0_^_</span> to <span translate=no>_^_1_^_</span> for the entire layer. </p>\n":"<p><span translate=no>_^_0_^_</span><span translate=no>_^_1_^_</span>\u30ec\u30a4\u30e4\u30fc\u5168\u4f53\u3067\u306b\u8a2d\u5b9a\u3057\u307e\u3059\u3002</p>\n",