# Kaggle Quora Insincere Questions
Refer to the Kaggle competition description to get up to speed with the goal of this text classification competition. It is a kernels-only competition, meaning we can only use the training/test data and the pre-trained embeddings provided by the competition organizers.
## Documentation
pytorch_quora_insincere.ipynb [nbviewer] [html]

Four main personal learnings from this competition:

- Pre-trained embeddings help with model performance if we leverage them correctly. Specifically, if we plan on using pre-trained embeddings, we should aim to get our corpus vocabulary as close as possible to the embedding's vocabulary.
- Bucketing greatly improves runtime: instead of padding the entire input text data to one fixed length, we pad each batch only to the longest sequence in that batch.
- How to use an attention layer in the context of text classification.
- How to leverage the PyTorch framework for text classification.
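On the first learning, a quick way to see how close the corpus vocabulary is to a pre-trained embedding is to measure coverage: what fraction of unique words (and of total tokens) the embedding knows. A minimal sketch — the function name, whitespace tokenization, and the toy data are illustrative, not the notebook's actual code:

```python
from collections import Counter

def embedding_coverage(texts, embedding_vocab):
    """Return (vocab coverage, token coverage) of a corpus against a
    pre-trained embedding's vocabulary. Uses naive whitespace
    tokenization for illustration."""
    counts = Counter(tok for text in texts for tok in text.split())
    known = {tok: n for tok, n in counts.items() if tok in embedding_vocab}
    vocab_cov = len(known) / len(counts)            # unique words covered
    token_cov = sum(known.values()) / sum(counts.values())  # tokens covered
    return vocab_cov, token_cov

# Toy corpus and embedding vocabulary (hypothetical):
texts = ["why is the sky blue", "why do cats purr loudly"]
emb = {"why", "is", "the", "sky", "do", "cats"}
print(embedding_coverage(texts, emb))  # (0.666..., 0.7)
```

Low token coverage usually points at preprocessing mismatches (casing, punctuation, contractions) that can be fixed to pull the vocabulary closer to the embedding.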
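The bucketing idea can be sketched in a few lines: sort sequences by length so each batch holds similar-length examples, then pad only to the longest sequence in that batch. The helper names and toy data below are illustrative, not the notebook's exact implementation:

```python
def pad_batch(batch, pad_id=0):
    """Pad each sequence only to the longest length in *this* batch,
    rather than to a global fixed length."""
    max_len = max(len(seq) for seq in batch)
    return [seq + [pad_id] * (max_len - len(seq)) for seq in batch]

# Sorting by length before batching keeps lengths similar within a batch,
# so very little padding is wasted.
seqs = [[1], [2, 3, 4], [5, 6], [7, 8, 9, 10]]
buckets = sorted(seqs, key=len)
batches = [pad_batch(buckets[i:i + 2]) for i in range(0, len(buckets), 2)]
print(batches)  # [[[1, 0], [5, 6]], [[2, 3, 4, 0], [7, 8, 9, 10]]]
```

In PyTorch this logic typically lives in a `DataLoader`'s `collate_fn`, so the model never processes more padding tokens than the current batch requires.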
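For the attention learning, a common pattern in text classification is attention pooling: collapse the encoder's per-token hidden states into one vector per example via a learned weighted average, with padded positions masked out. This is a generic sketch of that pattern, not the competition notebook's exact layer:

```python
import torch
import torch.nn as nn

class AttentionPool(nn.Module):
    """Additive attention pooling over a (batch, seq_len, hidden) tensor,
    producing one (batch, hidden) vector per example."""

    def __init__(self, hidden_dim):
        super().__init__()
        self.score = nn.Linear(hidden_dim, 1)

    def forward(self, hidden, mask=None):
        # Raw attention score per position: (batch, seq_len, 1)
        scores = self.score(torch.tanh(hidden))
        if mask is not None:
            # Padded positions get -inf so softmax assigns them zero weight.
            scores = scores.masked_fill(~mask.unsqueeze(-1), float("-inf"))
        weights = torch.softmax(scores, dim=1)
        # Weighted average of the hidden states: (batch, hidden)
        return (weights * hidden).sum(dim=1)

pooled = AttentionPool(8)(torch.randn(4, 10, 8))
print(pooled.shape)  # torch.Size([4, 8])
```

The pooled vector then feeds a small linear head for the binary sincere/insincere prediction.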
## Results
Private leaderboard F1 score:

- this repo: 0.69
- 850th place: 0.69
- 1st place: 0.71
Note that the model here is not extensively tuned and no ensembling/blending/stacking was used.