GitHub Repository: amanchadha/coursera-natural-language-processing-specialization
Path: blob/master/4 - Natural Language Processing with Attention Models/Week 1/data/opus/medical/0.1.0/dataset_info.json
{
  "citation": "@inproceedings{Tiedemann2012ParallelData,\n author = {Tiedemann, J},\n title = {Parallel Data, Tools and Interfaces in OPUS},\n booktitle = {LREC},\n year = {2012}}",
  "description": "OPUS is a collection of translated texts from the web.\n\nCreate your own config to choose which data / language pair to load.\n\n```\nconfig = tfds.translate.opus.OpusConfig(\n version=tfds.core.Version('0.1.0'),\n language_pair=(\"de\", \"en\"),\n subsets=[\"GNOME\", \"EMEA\"]\n)\nbuilder = tfds.builder(\"opus\", config=config)\n```\n\nmedical documents",
  "downloadSize": "35952852",
  "location": {
    "urls": [
      "http://opus.nlpl.eu/"
    ]
  },
  "name": "opus",
  "splits": [
    {
      "name": "train",
      "numBytes": "198021004",
      "shardLengths": [
        "554376",
        "554376"
      ]
    }
  ],
  "supervisedKeys": {
    "input": "de",
    "output": "en"
  },
  "version": "0.1.0"
}
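
The metadata above stores every numeric field as a string, so values need converting before any arithmetic. The following is a minimal Python sketch of reading it, not part of the repository: the `summarize` helper and the abridged `DATASET_INFO` copy are hypothetical names introduced here for illustration, and it assumes that `shardLengths` records the number of examples per shard, as in TensorFlow Datasets split metadata.

```python
import json

# Abridged copy of the metadata above (citation, description and location
# omitted). In practice the text would be read from dataset_info.json.
DATASET_INFO = """
{
  "downloadSize": "35952852",
  "name": "opus",
  "splits": [
    {
      "name": "train",
      "numBytes": "198021004",
      "shardLengths": ["554376", "554376"]
    }
  ],
  "supervisedKeys": {"input": "de", "output": "en"},
  "version": "0.1.0"
}
"""


def summarize(info_text):
    """Return a small summary dict built from dataset_info JSON text.

    Hypothetical helper: assumes shardLengths holds per-shard example counts.
    """
    info = json.loads(info_text)
    examples_per_split = {}
    for split in info["splits"]:
        # Numeric fields are stored as strings, so convert before summing.
        examples_per_split[split["name"]] = sum(
            int(count) for count in split["shardLengths"]
        )
    keys = info["supervisedKeys"]
    return {
        "name": info["name"],
        "version": info["version"],
        "language_pair": (keys["input"], keys["output"]),
        "download_bytes": int(info["downloadSize"]),
        "examples_per_split": examples_per_split,
    }


summary = summarize(DATASET_INFO)
print(summary)
```

Under that assumption, the two shards of the `train` split add up to the total number of de/en sentence pairs, and `language_pair` mirrors the `supervisedKeys` input and output.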