
Kaggle Quora Insincere Questions

Refer to the Kaggle competition description to get up to speed with the goal of this text classification competition. This was a kernels-only competition, meaning we can only use the training/test data and pre-trained embeddings provided by the competition organizers.

Documentation

  • pytorch_quora_insincere.ipynb [nbviewer][html] The four main personal learnings from this competition are:

    • Pre-trained embeddings help with model performance if we leverage them correctly. Specifically, if we plan on using pre-trained embeddings, we should aim to get our vocabulary as close as possible to the pre-trained embedding's vocabulary (see the coverage-check sketch after this list).

    • Bucketing greatly improves runtime: instead of padding the entire input text to one fixed length, we pad each batch only to the length of its longest sequence (see the collate function sketch after this list).

    • How to use an attention layer in the context of text classification (see the attention sketch after this list).

    • How to leverage the PyTorch framework for text classification.
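
For the embedding point, a minimal vocabulary-coverage check might look like the sketch below. It assumes `embeddings_index` is a word-to-vector dict loaded from one of the provided embedding files and that texts are whitespace-tokenized; both are illustrative assumptions, not the notebook's exact code.

```python
from collections import Counter

def vocab_coverage(texts, embeddings_index):
    """Report how much of the corpus vocabulary the pre-trained
    embeddings cover, and return the most frequent misses."""
    vocab = Counter(token for text in texts for token in text.split())
    known = Counter()
    unknown = Counter()
    for word, count in vocab.items():
        (known if word in embeddings_index else unknown)[word] = count

    print(f"unique word coverage: {len(known) / len(vocab):.2%}")
    print(f"total token coverage: {sum(known.values()) / sum(vocab.values()):.2%}")
    # frequent out-of-vocabulary words suggest which cleaning rules
    # (punctuation splitting, contractions, numbers) to try next
    return unknown.most_common(20)
```

Iterating on text cleaning until both coverage numbers stop improving is what "getting the vocabulary close to the pre-trained embedding" means in practice.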
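
For the bucketing point, per-batch dynamic padding in PyTorch can be implemented with a custom `collate_fn`; the sketch below assumes the dataset yields `(LongTensor of token ids, label)` pairs.

```python
import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader

def pad_collate(batch):
    """Pad each batch only to its own longest sequence,
    rather than to a single global fixed length."""
    token_ids, labels = zip(*batch)
    lengths = torch.tensor([len(seq) for seq in token_ids])
    padded = pad_sequence(token_ids, batch_first=True, padding_value=0)
    return padded, lengths, torch.tensor(labels)

# loader = DataLoader(dataset, batch_size=512, shuffle=True, collate_fn=pad_collate)
```

Sorting examples by length before forming batches (bucketing proper) shrinks the padding inside each batch even further.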
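
For the attention point, one common formulation is additive attention pooling over an RNN's hidden states; the layer below sketches that idea and is not necessarily the exact layer used in the notebook.

```python
import torch
import torch.nn as nn

class AttentionPool(nn.Module):
    """Score each timestep, softmax over the sequence, and return
    the attention-weighted sum as a fixed-size sentence vector."""

    def __init__(self, hidden_dim):
        super().__init__()
        self.score = nn.Linear(hidden_dim, 1)

    def forward(self, hidden_states, mask=None):
        # hidden_states: (batch, seq_len, hidden_dim)
        scores = self.score(hidden_states).squeeze(-1)  # (batch, seq_len)
        if mask is not None:
            # keep padded positions out of the softmax
            scores = scores.masked_fill(~mask, float("-inf"))
        weights = torch.softmax(scores, dim=1)
        # (batch, 1, seq_len) @ (batch, seq_len, hidden_dim) -> (batch, hidden_dim)
        return torch.bmm(weights.unsqueeze(1), hidden_states).squeeze(1)
```

The pooled vector then feeds a final linear layer to produce the sincere/insincere logit.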

Results

Private leaderboard score (F1):

  • this repo: 0.69

  • 850th place: 0.69

  • 1st place: 0.71

Note that the model here was not extensively tuned, and no ensembling/blending/stacking was used.