Real-time collaboration for Jupyter Notebooks, Linux Terminals, LaTeX, VS Code, R IDE, and more,
all in one place. Commercial Alternative to JupyterHub.
Real-time collaboration for Jupyter Notebooks, Linux Terminals, LaTeX, VS Code, R IDE, and more,
all in one place. Commercial Alternative to JupyterHub.
Path: blob/main/course/th/chapter6/section8.ipynb
Views: 2549
Kernel: Unknown Kernel
การสร้าง tokenizer ทีละขั้นตอน
Install the Transformers, Datasets, and Evaluate libraries to run this notebook.
In [ ]:
In [ ]:
In [ ]:
In [ ]:
In [ ]:
In [ ]:
In [ ]:
hello how are u?
In [ ]:
In [ ]:
In [ ]:
[('Let', (0, 3)), ("'", (3, 4)), ('s', (4, 5)), ('test', (6, 10)), ('my', (11, 13)), ('pre', (14, 17)),
('-', (17, 18)), ('tokenizer', (18, 27)), ('.', (27, 28))]
In [ ]:
[("Let's", (0, 5)), ('test', (6, 10)), ('my', (11, 13)), ('pre-tokenizer.', (14, 28))]
In [ ]:
[('Let', (0, 3)), ("'", (3, 4)), ('s', (4, 5)), ('test', (6, 10)), ('my', (11, 13)), ('pre', (14, 17)),
('-', (17, 18)), ('tokenizer', (18, 27)), ('.', (27, 28))]
In [ ]:
In [ ]:
In [ ]:
In [ ]:
['let', "'", 's', 'test', 'this', 'tok', '##eni', '##zer', '.']
In [ ]:
(2, 3)
In [ ]:
In [ ]:
['[CLS]', 'let', "'", 's', 'test', 'this', 'tok', '##eni', '##zer', '.', '[SEP]']
In [ ]:
['[CLS]', 'let', "'", 's', 'test', 'this', 'tok', '##eni', '##zer', '...', '[SEP]', 'on', 'a', 'pair', 'of', 'sentences', '.', '[SEP]']
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1]
In [ ]:
In [ ]:
"let's test this tokenizer... on a pair of sentences."
In [ ]:
In [ ]:
In [ ]:
In [ ]:
In [ ]:
In [ ]:
In [ ]:
[('Let', (0, 3)), ("'s", (3, 5)), ('Ġtest', (5, 10)), ('Ġpre', (10, 14)), ('-', (14, 15)),
('tokenization', (15, 27)), ('!', (27, 28))]
In [ ]:
In [ ]:
In [ ]:
['L', 'et', "'", 's', 'Ġtest', 'Ġthis', 'Ġto', 'ken', 'izer', '.']
In [ ]:
In [ ]:
' test'
In [ ]:
In [ ]:
"Let's test this tokenizer."
In [ ]:
In [ ]:
In [ ]:
In [ ]:
In [ ]:
In [ ]:
[("▁Let's", (0, 5)), ('▁test', (5, 10)), ('▁the', (10, 14)), ('▁pre-tokenizer!', (14, 29))]
In [ ]:
In [ ]:
In [ ]:
['▁Let', "'", 's', '▁test', '▁this', '▁to', 'ken', 'izer', '.']
In [ ]:
0 1
In [ ]:
In [ ]:
['▁Let', "'", 's', '▁test', '▁this', '▁to', 'ken', 'izer', '.', '.', '.', '<sep>', '▁', 'on', '▁', 'a', '▁pair',
'▁of', '▁sentence', 's', '!', '<sep>', '<cls>']
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2]
In [ ]:
In [ ]:
In [ ]: