GitHub Repository: labmlai/annotated_deep_learning_paper_implementations
Path: blob/master/translate_cache/distillation/readme.zh.json
{
"<h1><a href=\"https://nn.labml.ai/distillation/index.html\">Distilling the Knowledge in a Neural Network</a></h1>\n<p>This is a <a href=\"https://pytorch.org\">PyTorch</a> implementation/tutorial of the paper <a href=\"https://arxiv.org/abs/1503.02531\">Distilling the Knowledge in a Neural Network</a>.</p>\n<p>It&#x27;s a way of training a small network using the knowledge in a trained larger network; i.e. distilling the knowledge from the large network.</p>\n<p>A large model with regularization or an ensemble of models (using dropout) generalizes better than a small model when trained directly on the data and labels. However, a small model can be trained to generalize better with help of a large model. Smaller models are better in production: faster, less compute, less memory.</p>\n<p>The output probabilities of a trained model give more information than the labels because it assigns non-zero probabilities to incorrect classes as well. These probabilities tell us that a sample has a chance of belonging to certain classes. For instance, when classifying digits, when given an image of digit <em>7</em>, a generalized model will give a high probability to 7 and a small but non-zero probability to 2, while assigning almost zero probability to other digits. Distillation uses this information to train a small model better. </p>\n": "<h1><a href=\"https://nn.labml.ai/distillation/index.html\">\u5728\u795e\u7ecf\u7f51\u7edc\u4e2d\u63d0\u70bc\u77e5\u8bc6</a></h1>\n<p>\u8fd9\u662f\u8bba\u6587\u300a<a href=\"https://arxiv.org/abs/1503.02531\">\u5728\u795e\u7ecf\u7f51\u7edc\u4e2d\u63d0\u70bc\u77e5\u8bc6\u300b\u7684 PyT</a> <a href=\"https://pytorch.org\">orch</a> \u5b9e\u73b0/\u6559\u7a0b\u3002</p>\n<p>\u8fd9\u662f\u4e00\u79cd\u4f7f\u7528\u7ecf\u8fc7\u8bad\u7ec3\u7684\u5927\u578b\u7f51\u7edc\u4e2d\u7684\u77e5\u8bc6\u6765\u8bad\u7ec3\u5c0f\u578b\u7f51\u7edc\u7684\u65b9\u6cd5\uff1b\u5373\u4ece\u5927\u578b\u7f51\u7edc\u4e2d\u63d0\u70bc\u77e5\u8bc6\u3002</p>\n<p>\u76f4\u63a5\u5728\u6570\u636e\u548c\u6807\u7b7e\u4e0a\u8bad\u7ec3\u65f6\uff0c\u5177\u6709\u6b63\u5219\u5316\u6216\u6a21\u578b\u96c6\u5408\uff08\u4f7f\u7528 dropout\uff09\u7684\u5927\u578b\u6a21\u578b\u6bd4\u5c0f\u578b\u6a21\u578b\u7684\u6982\u5316\u6548\u679c\u66f4\u597d\u3002\u4f46\u662f\uff0c\u5728\u5927\u578b\u6a21\u578b\u7684\u5e2e\u52a9\u4e0b\uff0c\u53ef\u4ee5\u8bad\u7ec3\u5c0f\u6a21\u578b\u4ee5\u66f4\u597d\u5730\u8fdb\u884c\u6982\u62ec\u3002\u8f83\u5c0f\u7684\u6a21\u578b\u5728\u751f\u4ea7\u4e2d\u66f4\u597d\uff1a\u901f\u5ea6\u66f4\u5feb\u3001\u8ba1\u7b97\u66f4\u5c11\u3001\u5185\u5b58\u66f4\u5c11\u3002</p>\n<p>\u7ecf\u8fc7\u8bad\u7ec3\u7684\u6a21\u578b\u7684\u8f93\u51fa\u6982\u7387\u6bd4\u6807\u7b7e\u63d0\u4f9b\u7684\u4fe1\u606f\u66f4\u591a\uff0c\u56e0\u4e3a\u5b83\u4e5f\u4f1a\u4e3a\u9519\u8bef\u7684\u7c7b\u5206\u914d\u975e\u96f6\u6982\u7387\u3002\u8fd9\u4e9b\u6982\u7387\u544a\u8bc9\u6211\u4eec\uff0c\u6837\u672c\u6709\u53ef\u80fd\u5c5e\u4e8e\u67d0\u4e9b\u7c7b\u522b\u3002\u4f8b\u5982\uff0c\u5728\u5bf9\u6570\u5b57\u8fdb\u884c\u5206\u7c7b\u65f6\uff0c\u5f53\u7ed9\u5b9a\u6570\u5b57 <em>7</em> \u7684\u56fe\u50cf\u65f6\uff0c\u5e7f\u4e49\u6a21\u578b\u4f1a\u7ed9\u51fa7\u7684\u9ad8\u6982\u7387\uff0c\u7ed92\u7684\u6982\u7387\u5f88\u5c0f\u4f46\u4e0d\u662f\u96f6\uff0c\u800c\u7ed9\u5176\u4ed6\u6570\u5b57\u5206\u914d\u51e0\u4e4e\u4e3a\u96f6\u7684\u6982\u7387\u3002\u84b8\u998f\u5229\u7528\u8fd9\u4e9b\u4fe1\u606f\u6765\u66f4\u597d\u5730\u8bad\u7ec3\u5c0f\u578b\u6a21\u578b\u3002</p>\n",
"Distilling the Knowledge in a Neural Network": "\u5728\u795e\u7ecf\u7f51\u7edc\u4e2d\u63d0\u70bc\u77e5\u8bc6"
}
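
The cached README text above describes distillation only in prose: soft teacher probabilities carry more information than hard labels, and a small model is trained to match them. As an illustrative aside, here is a minimal PyTorch sketch of the kind of loss that idea leads to, a weighted sum of a temperature-softened KL term against the teacher and an ordinary cross-entropy term against the true labels. The name `distillation_loss` and the default values of `temperature` and `alpha` are assumptions for illustration, not identifiers from the labml implementation.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      temperature: float = 5.0,
                      alpha: float = 0.5) -> torch.Tensor:
    """Sketch of a distillation objective: soft-target KL + hard-label cross-entropy."""
    # Soften the teacher's output distribution with the temperature.
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    # Student log-probabilities at the same temperature.
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    # KL divergence between the softened distributions; the T^2 factor keeps
    # gradient magnitudes comparable across temperatures, as argued in the paper.
    soft_loss = F.kl_div(log_student, soft_targets, reduction='batchmean') * temperature ** 2
    # Ordinary cross-entropy against the hard labels.
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1 - alpha) * hard_loss

# Usage with random tensors: a batch of 8 samples, 10 classes (e.g. digits).
student_logits = torch.randn(8, 10)
teacher_logits = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
loss = distillation_loss(student_logits, teacher_logits, labels)
```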