GitHub Repository: codebasics/deep-learning-keras-tf-tutorial
Path: blob/master/43_text_classification_rnn/rnn_text_classification.ipynb
Kernel: Python 3
import numpy as np
import tensorflow_datasets as tfds
import tensorflow as tf

# import os
# os.environ["TF_FORCE_GPU_ALLOW_GROWTH"]="true"

tfds.disable_progress_bar()
tf.__version__
'2.5.0'
devices = tf.config.experimental.list_physical_devices('GPU')
devices
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
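# Let TensorFlow grow GPU memory on demand instead of reserving it all up front.
try:
    tf.config.experimental.set_memory_growth(devices[0], True)
    print("Success")
except (IndexError, RuntimeError, ValueError) as err:
    # IndexError: no GPU found; RuntimeError/ValueError: device already initialized or invalid.
    print("Exception occurred:", err)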
Success

Read more about this dataset here: https://ai.stanford.edu/~amaas/data/sentiment/ The dataset page describes it as follows: "This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. We provide a set of 25,000 highly polar movie reviews for training, and 25,000 for testing. There is additional unlabeled data for use as well."

dataset, info = tfds.load('imdb_reviews', data_dir='.', with_info=True, as_supervised=True)
info
tfds.core.DatasetInfo( name='imdb_reviews', version=1.0.0, description='Large Movie Review Dataset. This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. We provide a set of 25,000 highly polar movie reviews for training, and 25,000 for testing. There is additional unlabeled data for use as well.', homepage='http://ai.stanford.edu/~amaas/data/sentiment/', features=FeaturesDict({ 'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=2), 'text': Text(shape=(), dtype=tf.string), }), total_num_examples=100000, splits={ 'test': 25000, 'train': 25000, 'unsupervised': 50000, }, supervised_keys=('text', 'label'), citation="""@InProceedings{maas-EtAl:2011:ACL-HLT2011, author = {Maas, Andrew L. and Daly, Raymond E. and Pham, Peter T. and Huang, Dan and Ng, Andrew Y. and Potts, Christopher}, title = {Learning Word Vectors for Sentiment Analysis}, booktitle = {Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies}, month = {June}, year = {2011}, address = {Portland, Oregon, USA}, publisher = {Association for Computational Linguistics}, pages = {142--150}, url = {http://www.aclweb.org/anthology/P11-1015} }""", redistribution_info=, )
dataset
{'test': <PrefetchDataset shapes: ((), ()), types: (tf.string, tf.int64)>, 'train': <PrefetchDataset shapes: ((), ()), types: (tf.string, tf.int64)>, 'unsupervised': <PrefetchDataset shapes: ((), ()), types: (tf.string, tf.int64)>}
train_dataset, test_dataset = dataset['train'], dataset['test']
type(train_dataset)
tensorflow.python.data.ops.dataset_ops.PrefetchDataset
len(train_dataset)
25000
len(test_dataset)
25000
for sample in train_dataset:
    print(sample[0].numpy())
    print(sample[1].numpy())
    break
b"This was an absolutely terrible movie. Don't be lured in by Christopher Walken or Michael Ironside. Both are great actors, but this must simply be their worst role in history. Even their great acting could not redeem this movie's ridiculous storyline. This movie is an early nineties US propaganda piece. The most pathetic scenes were those when the Columbian rebels were making their cases for revolutions. Maria Conchita Alonso appeared phony, and her pseudo-love affair with Walken was nothing but a pathetic emotional plug in a movie that was devoid of any real meaning. I am disappointed that there are movies like this, ruining actor's like Christopher Walken's good name. I could barely sit through it." 0
BUFFER_SIZE = 10000
BATCH_SIZE = 64
train_dataset = train_dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE).prefetch(tf.data.AUTOTUNE)
test_dataset = test_dataset.batch(BATCH_SIZE).prefetch(tf.data.AUTOTUNE)
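As a quick sanity check on the batching step, the arithmetic below works out how many batches one epoch should contain. It is a minimal sketch in plain Python, assuming the 25,000 training examples and the batch size of 64 used in this notebook, and assuming `batch()` keeps the final partial batch (its default behavior). The names `NUM_TRAIN_EXAMPLES`, `steps_per_epoch` and `last_batch_size` are illustrative only.

```python
import math

# Values taken from the notebook: 25,000 training reviews, batches of 64.
NUM_TRAIN_EXAMPLES = 25000
BATCH_SIZE = 64

# batch() keeps the last, smaller batch unless drop_remainder=True is passed,
# so the number of batches is the ceiling of the division.
steps_per_epoch = math.ceil(NUM_TRAIN_EXAMPLES / BATCH_SIZE)

# Size of that final partial batch.
last_batch_size = NUM_TRAIN_EXAMPLES - (steps_per_epoch - 1) * BATCH_SIZE

print(steps_per_epoch)   # 391
print(last_batch_size)   # 40
```

The result of 391 agrees with the step count that Keras reports in the training progress bar (`32/391`).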
for example, label in train_dataset.take(1):
    print('texts: ', example.numpy()[:3])
    print()
    print('labels: ', label.numpy()[:3])
texts: [b"This movie was exactly what I expected, not great, but also not that bad either. In my opinion PG13 movies aren't scary enough so that's why I already knew I was going to be bored throughout the entire film. Sure there were scary things going on in the hotel room, but nothing we all haven't already seen. I guess I didn't like it because I thought there were too many twists and turns happening; it got old and repetitive. I also didn't understand if all the things Cusack was experiencing in the room was real or not. There is no explanation for any of the events that occurred. The movie just drags on and when it finally does come to an end you want it to keep going because you are still waiting around for someone to tell you what the whole movie was about. What I did like was the special effects. Other than that there wasn't much enjoyment from it. Maybe its just me but I thought this was below average." b"Sad story of a downed B-17 pilot. Brady is shot down over occupied territory. The local ranchers extended him kindness and protection at the cost of their own lives. I had never heard of this movie and it snagged me for two hours. After the film is over, I'm glad I took the time. It's an entire story told to explain the look on Brady's face at the start of the film." b"There is a scene near the beginning after a shootout where horses are running. If something red catches your eye it is because a white van is parked behind a bush by the trail. I thought I had seen bad but this is it. A white van in a western. Did they not catch this? Oh well, and I paid top dollar at the rental. It will make you want to grab your buddies and have them all put in 10 grand and make a better movie. The talking was so so slow, the acting was mostly OK but couldn't be taken seriously due to the poor nature of the filming. There is a door at the sheriffs that looks like a door today with the particular trimming. 
I say watch this movie, and move Cabin boy into #2 on the worst of all time."] labels: [0 1 0]
e = tf.keras.layers.experimental.preprocessing.TextVectorization()
e.adapt([
    "I love samosas and jalebi",
    "I love biking and yoga",
    "I love tensorflow"
])
e.get_vocabulary()
['', '[UNK]', 'love', 'i', 'and', 'yoga', 'tensorflow', 'samosas', 'jalebi', 'biking']
e(["I love pizza"]).numpy()
array([[3, 2, 1]], dtype=int64)
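To make the lookup above less of a black box, here is a rough pure-Python approximation of what the layer does with its default settings: lowercase the text, strip punctuation, split on whitespace, and map each token to its vocabulary index, falling back to the `[UNK]` index for unseen words. This is only a sketch, not the layer's actual implementation; `toy_vectorize` and `token_to_id` are hypothetical names, and the vocabulary is copied by hand from the `get_vocabulary()` output.

```python
import string

# Vocabulary copied from the get_vocabulary() output in the notebook.
vocab = ['', '[UNK]', 'love', 'i', 'and', 'yoga',
         'tensorflow', 'samosas', 'jalebi', 'biking']
token_to_id = {token: idx for idx, token in enumerate(vocab)}
UNK_ID = token_to_id['[UNK]']  # index 1 is reserved for out-of-vocabulary words

def toy_vectorize(text):
    """Approximate the layer: lowercase, drop punctuation, split, look up ids."""
    cleaned = text.lower().translate(str.maketrans('', '', string.punctuation))
    return [token_to_id.get(token, UNK_ID) for token in cleaned.split()]

print(toy_vectorize("I love pizza"))  # [3, 2, 1]
```

"pizza" never appeared in the adapted sentences, so it maps to index 1 (`[UNK]`), which matches the layer's output. Index 0 (the empty string) is reserved for padding.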
VOCAB_SIZE = 1000
encoder = tf.keras.layers.experimental.preprocessing.TextVectorization(
    max_tokens=VOCAB_SIZE)
encoder.adapt(train_dataset.map(lambda text, label: text))
vocab = np.array(encoder.get_vocabulary())
vocab[:25]
array(['', '[UNK]', 'the', 'and', 'a', 'of', 'to', 'is', 'in', 'it', 'i', 'this', 'that', 'br', 'was', 'as', 'for', 'with', 'movie', 'but', 'film', 'on', 'not', 'you', 'are'], dtype='<U14')
example[:2]
<tf.Tensor: shape=(2,), dtype=string, numpy= array([b"This movie was exactly what I expected, not great, but also not that bad either. In my opinion PG13 movies aren't scary enough so that's why I already knew I was going to be bored throughout the entire film. Sure there were scary things going on in the hotel room, but nothing we all haven't already seen. I guess I didn't like it because I thought there were too many twists and turns happening; it got old and repetitive. I also didn't understand if all the things Cusack was experiencing in the room was real or not. There is no explanation for any of the events that occurred. The movie just drags on and when it finally does come to an end you want it to keep going because you are still waiting around for someone to tell you what the whole movie was about. What I did like was the special effects. Other than that there wasn't much enjoyment from it. Maybe its just me but I thought this was below average.", b"Sad story of a downed B-17 pilot. Brady is shot down over occupied territory. The local ranchers extended him kindness and protection at the cost of their own lives. I had never heard of this movie and it snagged me for two hours. After the film is over, I'm glad I took the time. It's an entire story told to explain the look on Brady's face at the start of the film."], dtype=object)>
encoded_example = encoder(example)[:3].numpy()
encoded_example
array([[ 11, 18, 14, ..., 0, 0, 0], [614, 64, 5, ..., 0, 0, 0], [ 48, 7, 4, ..., 0, 0, 0]], dtype=int64)
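The trailing zeros in the array above are padding: reviews in a batch have different lengths, so shorter ones are filled with index 0 up to the length of the longest review in that batch. The sketch below imitates this with plain Python lists. The token ids in `batch` are made up for illustration, and `pad_batch` is a hypothetical helper, not a TensorFlow function.

```python
def pad_batch(sequences, pad_id=0):
    """Right-pad every sequence with pad_id up to the longest one in the batch."""
    max_len = max(len(seq) for seq in sequences)
    return [seq + [pad_id] * (max_len - len(seq)) for seq in sequences]

# Hypothetical token ids for three short "reviews" of different lengths.
batch = [[11, 18, 14, 1], [614, 64], [48, 7, 4]]
padded = pad_batch(batch)

# A mask that marks real tokens (True) versus padding (False).
mask = [[token != 0 for token in row] for row in padded]

print(padded)  # [[11, 18, 14, 1], [614, 64, 0, 0], [48, 7, 4, 0]]
print(mask[1]) # [True, True, False, False]
```

A mask of this kind is what lets a downstream layer skip padded positions, which is why index 0 is kept out of the real vocabulary.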
for n in range(3):
    print("Original: ", example[n].numpy())
    print("Round-trip: ", " ".join(vocab[encoded_example[n]]))
    print()
Original: b"This movie was exactly what I expected, not great, but also not that bad either. In my opinion PG13 movies aren't scary enough so that's why I already knew I was going to be bored throughout the entire film. Sure there were scary things going on in the hotel room, but nothing we all haven't already seen. I guess I didn't like it because I thought there were too many twists and turns happening; it got old and repetitive. I also didn't understand if all the things Cusack was experiencing in the room was real or not. There is no explanation for any of the events that occurred. The movie just drags on and when it finally does come to an end you want it to keep going because you are still waiting around for someone to tell you what the whole movie was about. What I did like was the special effects. Other than that there wasn't much enjoyment from it. Maybe its just me but I thought this was below average." Round-trip: this movie was exactly what i expected not great but also not that bad either in my opinion [UNK] movies arent scary enough so thats why i already knew i was going to be [UNK] throughout the entire film sure there were scary things going on in the [UNK] room but nothing we all havent already seen i guess i didnt like it because i thought there were too many [UNK] and turns [UNK] it got old and [UNK] i also didnt understand if all the things [UNK] was [UNK] in the room was real or not there is no [UNK] for any of the events that [UNK] the movie just [UNK] on and when it finally does come to an end you want it to keep going because you are still [UNK] around for someone to tell you what the whole movie was about what i did like was the special effects other than that there wasnt much [UNK] from it maybe its just me but i thought this was [UNK] average Original: b"Sad story of a downed B-17 pilot. Brady is shot down over occupied territory. The local ranchers extended him kindness and protection at the cost of their own lives. 
I had never heard of this movie and it snagged me for two hours. After the film is over, I'm glad I took the time. It's an entire story told to explain the look on Brady's face at the start of the film." Round-trip: sad story of a [UNK] [UNK] [UNK] [UNK] is shot down over [UNK] [UNK] the local [UNK] [UNK] him [UNK] and [UNK] at the [UNK] of their own lives i had never heard of this movie and it [UNK] me for two hours after the film is over im [UNK] i took the time its an entire story told to [UNK] the look on [UNK] face at the start of the film Original: b"There is a scene near the beginning after a shootout where horses are running. If something red catches your eye it is because a white van is parked behind a bush by the trail. I thought I had seen bad but this is it. A white van in a western. Did they not catch this? Oh well, and I paid top dollar at the rental. It will make you want to grab your buddies and have them all put in 10 grand and make a better movie. The talking was so so slow, the acting was mostly OK but couldn't be taken seriously due to the poor nature of the filming. There is a door at the sheriffs that looks like a door today with the particular trimming. I say watch this movie, and move Cabin boy into #2 on the worst of all time." Round-trip: there is a scene near the beginning after a [UNK] where [UNK] are running if something red [UNK] your eye it is because a white [UNK] is [UNK] behind a [UNK] by the [UNK] i thought i had seen bad but this is it a white [UNK] in a [UNK] did they not [UNK] this oh well and i [UNK] top [UNK] at the [UNK] it will make you want to [UNK] your [UNK] and have them all put in 10 [UNK] and make a better movie the talking was so so slow the acting was mostly ok but couldnt be taken seriously due to the poor nature of the [UNK] there is a [UNK] at the [UNK] that looks like a [UNK] today with the particular [UNK] i say watch this movie and move [UNK] boy into 2 on the worst of all time
model = tf.keras.Sequential([
    encoder,
    tf.keras.layers.Embedding(
        input_dim=len(encoder.get_vocabulary()),
        output_dim=64,
        # Use masking to handle the variable sequence lengths
        mask_zero=True),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1)
])
sample_text = ('The movie was cool. The animation and the graphics '
               'were out of this world. I would recommend this movie.')
# This reassignment overrides the text above, so the prediction is for the shorter review.
sample_text = ('awesome movie, I loved it so much')
predictions = model.predict(np.array([sample_text]))
print(predictions[0])
[-0.00647295]
model.compile(loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              optimizer=tf.keras.optimizers.Adam(1e-4),
              metrics=['accuracy'])
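Because the loss is configured with `from_logits=True` and the last layer is `Dense(1)` with no activation, the model's output is a raw logit rather than a probability. A minimal sketch of how such a logit could be turned into a probability and a class label is shown below, using the untrained model's prediction from earlier in the notebook. The 0.5 threshold and the helper name `sigmoid` are assumptions made for illustration.

```python
import math

def sigmoid(logit):
    """Map a raw logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-logit))

# Raw output of the untrained model for the sample review, copied from the notebook.
logit = -0.00647295

probability = sigmoid(logit)
# Assumed decision rule: label 1 (positive) when the probability is at least 0.5.
predicted_label = int(probability >= 0.5)

print(round(probability, 4))  # 0.4984
print(predicted_label)        # 0
```

A probability this close to 0.5 is what one would expect before training: the model has no preference for either class yet.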
model.fit(train_dataset, epochs=10,
          validation_data=test_dataset,
          validation_steps=30)
Epoch 1/10 32/391 [=>............................] - ETA: 21s - loss: 0.6929 - accuracy: 0.4927
--------------------------------------------------------------------------- CancelledError Traceback (most recent call last) <ipython-input-26-d9321db1417e> in <module> ----> 1 model.fit(train_dataset, epochs=10, 2 validation_data=test_dataset, 3 validation_steps=30) ~\AppData\Roaming\Python\Python38\site-packages\tensorflow\python\keras\engine\training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_batch_size, validation_freq, max_queue_size, workers, use_multiprocessing) 1181 _r=1): 1182 callbacks.on_train_batch_begin(step) -> 1183 tmp_logs = self.train_function(iterator) 1184 if data_handler.should_sync: 1185 context.async_wait() ~\AppData\Roaming\Python\Python38\site-packages\tensorflow\python\eager\def_function.py in __call__(self, *args, **kwds) 887 888 with OptionalXlaContext(self._jit_compile): --> 889 result = self._call(*args, **kwds) 890 891 new_tracing_count = self.experimental_get_tracing_count() ~\AppData\Roaming\Python\Python38\site-packages\tensorflow\python\eager\def_function.py in _call(self, *args, **kwds) 915 # In this case we have created variables on the first call, so we run the 916 # defunned version which is guaranteed to never create variables. 
--> 917 return self._stateless_fn(*args, **kwds) # pylint: disable=not-callable 918 elif self._stateful_fn is not None: 919 # Release the lock early so that multiple threads can perform the call ~\AppData\Roaming\Python\Python38\site-packages\tensorflow\python\eager\function.py in __call__(self, *args, **kwargs) 3021 (graph_function, 3022 filtered_flat_args) = self._maybe_define_function(args, kwargs) -> 3023 return graph_function._call_flat( 3024 filtered_flat_args, captured_inputs=graph_function.captured_inputs) # pylint: disable=protected-access 3025 ~\AppData\Roaming\Python\Python38\site-packages\tensorflow\python\eager\function.py in _call_flat(self, args, captured_inputs, cancellation_manager) 1958 and executing_eagerly): 1959 # No tape is watching; skip to running the function. -> 1960 return self._build_call_outputs(self._inference_function.call( 1961 ctx, args, cancellation_manager=cancellation_manager)) 1962 forward_backward = self._select_forward_and_backward_functions( ~\AppData\Roaming\Python\Python38\site-packages\tensorflow\python\eager\function.py in call(self, ctx, args, cancellation_manager) 589 with _InterpolateFunctionError(self): 590 if cancellation_manager is None: --> 591 outputs = execute.execute( 592 str(self.signature.name), 593 num_outputs=self._num_outputs, ~\AppData\Roaming\Python\Python38\site-packages\tensorflow\python\eager\execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name) 57 try: 58 ctx.ensure_initialized() ---> 59 tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name, 60 inputs, attrs, num_outputs) 61 except core._NotOkStatusException as e: CancelledError: [_Derived_]RecvAsync is cancelled. [[{{node Adam/Adam/update/AssignSubVariableOp/_57}}]] [[gradient_tape/sequential/embedding/embedding_lookup/Reshape/_54]] [Op:__inference_train_function_22707] Function call stack: train_function
import sys
print(sys.version)
3.8.5 (tags/v3.8.5:580fbb0, Jul 20 2020, 15:57:54) [MSC v.1924 64 bit (AMD64)]