GitHub Repository: codebasics/deep-learning-keras-tf-tutorial
Path: blob/master/46_BERT_intro/bert_intro.ipynb
Kernel: Python 3

Here is the page that lists all the BERT models available on TensorFlow Hub that one can download and use.

https://tfhub.dev/google/collections/bert/1

Here is the information on the basic uncased BERT model:

https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/4

It uses L=12 hidden layers (i.e., Transformer blocks), a hidden size of H=768, and A=12 attention heads. This model has been pre-trained for English on Wikipedia and BooksCorpus.
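As a quick check on what these numbers imply, the sketch below derives the size of each attention head from H and A. The variable names are illustrative, and the even split of the hidden size across heads is the usual Transformer convention, assumed here rather than taken from the model page.

```python
# Hyperparameters quoted above for bert_en_uncased_L-12_H-768_A-12.
num_layers = 12    # L: Transformer blocks
hidden_size = 768  # H: width of each token vector
num_heads = 12     # A: attention heads per block

# Assumption: the hidden size is divided evenly among the heads.
assert hidden_size % num_heads == 0
head_size = hidden_size // num_heads

print(f"{num_layers} layers, {num_heads} heads of size {head_size} each")
```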

import tensorflow_hub as hub
import tensorflow_text as text
preprocess_url = "https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3"
encoder_url = "https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/4"
bert_preprocess_model = hub.KerasLayer(preprocess_url)
text_test = ['nice movie indeed','I love python programming']
text_preprocessed = bert_preprocess_model(text_test)
text_preprocessed.keys()
dict_keys(['input_mask', 'input_type_ids', 'input_word_ids'])
text_preprocessed['input_mask']
<tf.Tensor: shape=(2, 128), dtype=int32, numpy= array([[1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])>
text_preprocessed['input_type_ids']
<tf.Tensor: shape=(2, 128), dtype=int32, numpy= array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])>
text_preprocessed['input_word_ids']
<tf.Tensor: shape=(2, 128), dtype=int32, numpy= array([[ 101, 3835, 3185, 5262, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [ 101, 1045, 2293, 18750, 4730, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])>

101 --> CLS token

102 --> SEP token

BERT uses CLS as a special token at the beginning of each input sequence, and SEP as a special token to separate two sentences or to end a single sentence.
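To make the layout of the three tensors concrete, here is a minimal pure-Python sketch that packs one already-tokenized sentence into the same shape. The helper `pack_single_sentence` is hypothetical and only mimics the output format shown above; it is not how the TF Hub preprocessing model is implemented, it does not tokenize, and it raises an error on overlong input instead of truncating it. The token ids are copied from the `input_word_ids` output for 'nice movie indeed'.

```python
def pack_single_sentence(token_ids, seq_length=128, cls_id=101, sep_id=102, pad_id=0):
    """Hypothetical helper: lay out one sentence like the tensors shown above."""
    # Wrap the sentence in the CLS and SEP special tokens.
    ids = [cls_id] + list(token_ids) + [sep_id]
    if len(ids) > seq_length:
        raise ValueError("sentence does not fit in seq_length")
    padding = seq_length - len(ids)
    return {
        # Real token ids followed by zero padding.
        "input_word_ids": ids + [pad_id] * padding,
        # 1 marks a real token (including CLS and SEP), 0 marks padding.
        "input_mask": [1] * len(ids) + [0] * padding,
        # A single-sentence input belongs entirely to segment 0.
        "input_type_ids": [0] * seq_length,
    }


# Word-piece ids for 'nice movie indeed', taken from the output above.
packed = pack_single_sentence([3835, 3185, 5262])
print(packed["input_word_ids"][:6])
print(sum(packed["input_mask"]))
```

Summing `input_mask` gives the number of real tokens in the row, which is 5 for the first sentence above and 6 for the second.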

bert_model = hub.KerasLayer(encoder_url)
bert_results = bert_model(text_preprocessed)
bert_results.keys()
dict_keys(['default', 'encoder_outputs', 'pooled_output', 'sequence_output'])
bert_results['sequence_output']
<tf.Tensor: shape=(2, 128, 768), dtype=float32, numpy= array([[[ 0.07292037, 0.08567802, 0.14476836, ..., -0.0967709 , 0.08722147, 0.07711098], [ 0.17839421, -0.19006097, 0.5034945 , ..., -0.0586979 , 0.32717168, -0.1557853 ], [ 0.187015 , -0.4338879 , -0.4887509 , ..., -0.15502754, 0.00145202, -0.24470882], ..., [ 0.1208308 , 0.12884231, 0.464535 , ..., 0.07375526, 0.17441988, 0.16522081], [ 0.07967911, -0.01190642, 0.50225425, ..., 0.13777754, 0.21002221, 0.0062458 ], [-0.07212636, -0.28303444, 0.5903337 , ..., 0.47551885, 0.16668531, -0.08920398]], [[-0.0790057 , 0.36335146, -0.21101563, ..., -0.1718371 , 0.1629976 , 0.6724266 ], [ 0.27883556, 0.43716326, -0.35764703, ..., -0.04463666, 0.38315117, 0.588798 ], [ 1.2037678 , 1.0727024 , 0.4840874 , ..., 0.24921025, 0.40730858, 0.40481782], ..., [ 0.08630062, 0.19353887, 0.47540048, ..., 0.1888018 , -0.06474137, 0.3131858 ], [ 0.15887049, 0.28572696, 0.373408 , ..., 0.09309126, -0.04969583, 0.3876115 ], [-0.08079858, -0.09572807, 0.26809806, ..., 0.13979636, -0.06315896, 0.2728837 ]]], dtype=float32)>
bert_results['pooled_output']
<tf.Tensor: shape=(2, 768), dtype=float32, numpy= array([[-0.791774 , -0.21411924, 0.49769488, ..., 0.2446518 , -0.47334483, 0.81758696], [-0.9171231 , -0.4793518 , -0.78656995, ..., -0.6175177 , -0.7102687 , 0.92184293]], dtype=float32)>
len(bert_results['encoder_outputs'])
12

Since we are using the BERT base model, which has 12 encoder layers, the length of encoder_outputs above is 12. Read more about the purpose of each output here: https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/4

You can see below that the last element of encoder_outputs is the same as sequence_output.

bert_results['encoder_outputs'][-1] == bert_results['sequence_output']
<tf.Tensor: shape=(2, 128, 768), dtype=bool, numpy= array([[[ True, True, True, ..., True, True, True], [ True, True, True, ..., True, True, True], [ True, True, True, ..., True, True, True], ..., [ True, True, True, ..., True, True, True], [ True, True, True, ..., True, True, True], [ True, True, True, ..., True, True, True]], [[ True, True, True, ..., True, True, True], [ True, True, True, ..., True, True, True], [ True, True, True, ..., True, True, True], ..., [ True, True, True, ..., True, True, True], [ True, True, True, ..., True, True, True], [ True, True, True, ..., True, True, True]]])>
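The `==` comparison returns one boolean per element, so the printed output is truncated and does not show every entry. A minimal sketch of collapsing it into a single yes/no answer follows. It uses NumPy stand-in arrays (random data, not real BERT outputs) so that it runs without downloading the model; in the notebook itself, the analogous steps would be `tf.reduce_all(...)` on the boolean tensor or calling `.numpy()` on the tensors first.

```python
import numpy as np

# Stand-ins with the same shape as the tensors above (random values, not BERT outputs).
rng = np.random.default_rng(0)
sequence_output = rng.standard_normal((2, 128, 768)).astype(np.float32)
last_encoder_output = sequence_output.copy()

# Elementwise comparison: one boolean per entry, same shape as the inputs.
elementwise = last_encoder_output == sequence_output

# Collapse to a single boolean.
all_equal = bool(elementwise.all())

# Equivalent one-step check that also compares shapes.
same = np.array_equal(last_encoder_output, sequence_output)

print(elementwise.shape, all_equal, same)
```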
bert_results['encoder_outputs']
[<tf.Tensor: shape=(2, 128, 768), dtype=float32, numpy= array([[[ 0.12901425, 0.00644749, -0.03614946, ..., 0.04999629, 0.06149199, -0.02657548], [ 1.1753383 , 1.2140785 , 1.1569982 , ..., 0.11634405, -0.35855353, -0.40490192], [ 0.03859054, 0.53869987, -0.21089786, ..., 0.21858189, 0.72601694, -1.1158607 ], ..., [-0.07587019, -0.25421914, 0.70755106, ..., 0.50541997, -0.18878666, 0.15028352], [-0.16066605, -0.2808969 , 0.57597065, ..., 0.5275858 , -0.11141359, 0.0288756 ], [-0.04428164, -0.2027959 , 0.5909355 , ..., 0.8133832 , -0.39075807, -0.02601732]], [[ 0.1890357 , 0.02752566, -0.06513734, ..., -0.00620209, 0.15053885, 0.03165438], [ 0.59161496, 0.7589142 , -0.07240665, ..., 0.6190399 , 0.8292888 , 0.16161951], [ 1.4460827 , 0.44602665, 0.40990275, ..., 0.48255897, 0.6269115 , 0.13463429], ..., [ 0.15147886, -0.21573871, 0.70329094, ..., -0.12537216, -0.13787249, 0.2772206 ], [ 0.05143798, -0.24052703, 0.53569126, ..., -0.07915019, -0.03307928, 0.1738091 ], [ 0.20934705, -0.1564527 , 0.60395455, ..., 0.3290355 , -0.35827166, 0.0810039 ]]], dtype=float32)>, <tf.Tensor: shape=(2, 128, 768), dtype=float32, numpy= array([[[ 0.01418131, -0.22088248, -0.15028146, ..., 0.11415625, 0.12618116, 0.04843395], [ 1.2033907 , 1.3469863 , 1.7064534 , ..., 0.30610552, -0.5074256 , -0.5514745 ], [ 0.42169097, 0.81102455, -0.25631624, ..., -0.07722519, 0.893724 , -1.447206 ], ..., [-0.19047494, -0.23860876, 0.8141204 , ..., 0.97493625, -0.3477423 , -0.0873356 ], [-0.27150998, -0.31985015, 0.7659389 , ..., 0.9676174 , -0.295119 , -0.15731782], [-0.2130275 , -0.1922971 , 0.7338777 , ..., 1.1040441 , -0.45102894, -0.20683062]], [[ 0.08973329, -0.18419679, -0.1664508 , ..., 0.02761324, 0.11187711, 0.08041722], [ 0.58311343, 0.5957031 , 0.3601955 , ..., 0.41270113, 0.26809165, 0.28400558], [ 2.1166675 , 0.517694 , 0.86377466, ..., 0.71787316, 0.3240508 , 0.09740011], ..., [ 0.24372883, -0.05775562, 0.68428797, ..., 0.4348356 , -0.57660955, -0.11131063], [ 0.16803873, -0.03091818, 
0.5863844 , ..., 0.49625814, -0.5056798 , -0.20782214], [ 0.24831744, 0.00315493, 0.51592547, ..., 0.80502397, -0.6989963 , -0.2418645 ]]], dtype=float32)>, <tf.Tensor: shape=(2, 128, 768), dtype=float32, numpy= array([[[ 0.02275399, -0.27980262, 0.02345765, ..., 0.2786705 , 0.11714804, 0.18175101], [ 1.2574846 , 0.87250787, 1.6266022 , ..., 0.4521088 , -0.80902594, -0.5448983 ], [ 0.75219846, 0.6357363 , -0.20566168, ..., -0.3238182 , 0.7574955 , -1.458792 ], ..., [-0.1510742 , -0.21129102, 0.9689462 , ..., 1.1261966 , -0.03214083, -0.22340278], [-0.2812558 , -0.31140104, 0.8432894 , ..., 1.1342676 , -0.08336546, -0.2516126 ], [-0.24449037, -0.215379 , 0.9480984 , ..., 1.241942 , -0.19873466, -0.33752578]], [[ 0.10617051, -0.27990732, -0.01731809, ..., 0.20060413, 0.08148402, 0.21859062], [ 0.6892589 , 0.31591517, 0.55586636, ..., 0.6903949 , -0.07141593, 0.41407183], [ 2.5758884 , 0.625209 , 1.2503716 , ..., 0.4395773 , -0.18525599, -0.05004852], ..., [ 0.2046436 , -0.01562005, 0.8343147 , ..., 0.8014956 , -0.12853897, -0.35842058], [-0.02875658, 0.05097827, 0.6815922 , ..., 0.9003147 , -0.12434906, -0.4225638 ], [ 0.13327706, 0.02257299, 0.7722171 , ..., 1.0211185 , -0.30834687, -0.45623207]]], dtype=float32)>, <tf.Tensor: shape=(2, 128, 768), dtype=float32, numpy= array([[[ 0.11484912, -0.64441085, -0.14245078, ..., 0.25474218, 0.00378427, 0.6110514 ], [ 1.3035429 , 0.7706043 , 1.3185003 , ..., 0.3582247 , -0.6412159 , -0.32795385], [ 1.189674 , 0.6279441 , -0.67501664, ..., -0.2887061 , 0.4779193 , -1.2805998 ], ..., [-0.22614792, -0.6276296 , 1.0227487 , ..., 0.82923126, -0.4031471 , 0.04389255], [-0.39901423, -0.7561907 , 0.7489382 , ..., 0.7492265 , -0.4507768 , -0.00367952], [-0.37728718, -0.783104 , 0.90572494, ..., 0.9736228 , -0.4828573 , -0.07389825]], [[ 0.15151377, -0.7075228 , -0.27520463, ..., 0.44345918, -0.20920427, 0.49860123], [ 0.88647956, -0.24838187, 0.73537445, ..., 0.74179393, -0.13177216, 0.1015849 ], [ 2.5865812 , 0.6188286 , 0.5279882 , 
..., 0.8487309 , -0.59150153, 0.02349886], ..., [-0.05526871, -0.4339466 , 1.1783332 , ..., 0.91791636, -0.45718166, -0.25014636], [-0.29328704, -0.23091224, 0.9938007 , ..., 1.0353788 , -0.42435738, -0.34947628], [-0.15899107, -0.50098157, 0.9814213 , ..., 1.1373078 , -0.61888385, -0.4445779 ]]], dtype=float32)>, <tf.Tensor: shape=(2, 128, 768), dtype=float32, numpy= array([[[-0.2217483 , -0.4222534 , -0.04924012, ..., -0.26284197, 0.06099971, 0.6379489 ], [ 1.1627907 , 0.6770532 , 1.3072394 , ..., 0.22794655, -0.5432949 , -0.27383307], [ 1.5308232 , 0.7222546 , -0.43608722, ..., 0.2743583 , 0.24980283, -0.9875375 ], ..., [-0.17669769, -0.3126063 , 1.0875314 , ..., 0.59242195, 0.01275288, -0.28171927], [-0.34473717, -0.42430112, 0.879802 , ..., 0.54826146, -0.08151911, -0.34001422], [-0.48139775, -0.31324372, 1.1702341 , ..., 0.87309855, -0.07970213, -0.45325497]], [[-0.2160815 , -0.8986422 , -0.44991562, ..., -0.08286194, -0.17226878, 0.6619982 ], [ 0.66614556, -0.54916966, 0.46460512, ..., 0.17649382, 0.22822574, 0.34962195], [ 2.062962 , 0.6991985 , 0.35604444, ..., 0.5295338 , -0.34809256, 0.00831382], ..., [-0.11594024, -0.17149264, 0.8994401 , ..., 0.62990123, -0.33990195, -0.20049246], [-0.13007557, -0.02739088, 0.7079975 , ..., 0.8059463 , -0.30724132, -0.1995634 ], [-0.26314873, -0.26137084, 0.63385683, ..., 0.8150184 , -0.4540037 , -0.35120666]]], dtype=float32)>, <tf.Tensor: shape=(2, 128, 768), dtype=float32, numpy= array([[[-0.03392694, -0.39431512, 0.04223759, ..., -0.0790556 , 0.01993914, 0.7692066 ], [ 1.4294361 , -0.08699028, 1.5299492 , ..., 0.22512148, -1.0060076 , -0.23702255], [ 1.5031741 , 0.6725791 , -0.5259025 , ..., 0.10906923, 0.2704243 , -1.2567499 ], ..., [-0.30033568, -0.02261979, 1.3187404 , ..., 0.69288784, -0.09798449, -0.17419714], [-0.43754095, -0.21149294, 1.109457 , ..., 0.50066054, -0.17692389, -0.18529117], [-0.6592039 , -0.19607994, 1.3134065 , ..., 0.7320082 , -0.19967368, -0.32469186]], [[-0.27334678, -0.95268494, 
-0.7869662 , ..., -0.12205298, -0.08783774, 0.75811625], [ 0.46757 , -0.16237524, -0.0115591 , ..., 0.18781273, 0.62164736, 0.03009928], [ 1.7947402 , 0.9051077 , 0.08580893, ..., 0.7709361 , -0.6682784 , -0.0675548 ], ..., [-0.18337469, -0.15806566, 1.1826948 , ..., 0.83047485, -0.3940643 , -0.23047718], [-0.2227019 , 0.00975984, 0.9340841 , ..., 1.128668 , -0.34884343, -0.19491431], [-0.30386138, -0.29683134, 0.8247166 , ..., 1.0090816 , -0.46407676, -0.31465578]]], dtype=float32)>, <tf.Tensor: shape=(2, 128, 768), dtype=float32, numpy= array([[[-0.23803425, -0.66554177, 0.2991721 , ..., 0.03415202, 0.2936285 , 0.9653632 ], [ 1.5040764 , 0.00695128, 1.4314141 , ..., 0.24628803, -0.55219656, -0.3221549 ], [ 1.362903 , 0.47741804, -0.6152993 , ..., 0.08586977, 0.43922484, -1.5594147 ], ..., [-0.16863999, 0.00209653, 1.3201766 , ..., 0.94280434, -0.04737326, 0.21490407], [-0.26666343, -0.24881297, 1.2213267 , ..., 0.635281 , -0.04442438, 0.04466298], [-0.5336603 , -0.35796118, 1.344787 , ..., 0.60384506, -0.08427176, -0.157121 ]], [[-0.05967471, -0.84482205, -0.8939023 , ..., -0.10863847, 0.40932408, 0.794086 ], [ 0.54405093, -0.31962943, -0.45491558, ..., 0.48068655, 0.77862567, 0.22774288], [ 1.4479873 , 1.1230614 , 0.09567896, ..., 1.3554295 , 0.06143576, -0.00836849], ..., [ 0.2565228 , -0.14106475, 1.1854348 , ..., 0.9568366 , -0.2160393 , -0.06946394], [ 0.25819156, 0.07949193, 0.9851643 , ..., 1.0518225 , -0.07890907, -0.128415 ], [-0.05268869, -0.3343134 , 0.9637343 , ..., 0.8960704 , -0.10082424, -0.2812959 ]]], dtype=float32)>, <tf.Tensor: shape=(2, 128, 768), dtype=float32, numpy= array([[[-7.0826747e-02, -2.5689974e-01, -3.5962485e-02, ..., -3.5881209e-01, 1.3524806e-01, 1.0136114e+00], [ 9.7471899e-01, -1.9712862e-01, 1.4318084e+00, ..., -1.4052060e-01, -5.0432123e-02, -2.7142027e-02], [ 1.1273400e+00, 2.7122790e-01, -2.8622937e-01, ..., 9.3385458e-02, 2.9908311e-01, -1.3764435e+00], ..., [-9.5786773e-02, 2.9180759e-01, 1.5879148e+00, ..., 
8.5004282e-01, -5.1284507e-02, 2.0340082e-01], [-2.2268736e-01, -8.3895952e-02, 1.7256017e+00, ..., 6.1222303e-01, 1.0976561e-01, 7.0047222e-02], [-5.7829899e-01, -3.5915700e-01, 1.6035969e+00, ..., 3.4189978e-01, 5.2881844e-02, -1.5400952e-01]], [[ 6.1017923e-02, -3.5161677e-01, -8.4891480e-01, ..., -4.8554888e-01, 4.2703065e-01, 6.3828975e-01], [ 6.6376495e-01, 1.8229987e-04, -7.9078859e-01, ..., 3.0686966e-01, 7.5843722e-01, 7.3240143e-01], [ 1.4494702e+00, 1.1265037e+00, 1.7709173e-01, ..., 2.3471612e-01, 3.4929511e-01, 4.3709803e-01], ..., [ 3.2339555e-01, 1.7575914e-01, 8.7672395e-01, ..., 9.8904103e-01, -2.7050799e-01, -3.0809429e-01], [ 4.0404287e-01, 5.1364994e-01, 7.2748107e-01, ..., 9.4942153e-01, -1.0214908e-02, -3.4644267e-01], [-9.1299685e-03, -8.7015972e-02, 7.3031861e-01, ..., 8.5439759e-01, -1.3661277e-01, -4.7493538e-01]]], dtype=float32)>, <tf.Tensor: shape=(2, 128, 768), dtype=float32, numpy= array([[[-0.09403469, -0.05987653, -0.01171679, ..., -0.22167481, -0.07650117, 0.57658136], [ 0.7748063 , 0.1518831 , 1.0920016 , ..., -0.19562437, 0.15173234, -0.01071449], [ 0.8519131 , 0.2584272 , -0.7703552 , ..., 0.03356653, 0.34010494, -1.3918676 ], ..., [-0.19780311, 0.52819705, 0.9023532 , ..., 0.27919745, -0.28764004, 0.75109327], [-0.35067317, 0.09291972, 1.3383352 , ..., 0.2261973 , -0.03588998, 0.27343997], [-0.9044491 , -0.20991278, 1.1977361 , ..., 0.35118693, -0.21743454, 0.01482266]], [[-0.07011057, -0.02392533, -0.6192433 , ..., -0.14541987, 0.34049344, 0.41224727], [ 0.5926157 , 0.19015956, -0.3739956 , ..., 0.37239593, 0.3915293 , 0.42580423], [ 1.1394775 , 0.84398603, 0.38893998, ..., 0.20898293, 0.25176358, 0.2640002 ], ..., [ 0.43306 , 0.42012244, 1.0965106 , ..., 1.1872368 , -0.16672091, 0.00856215], [ 0.57542306, 0.8660947 , 1.1184691 , ..., 1.0475059 , 0.04264343, -0.01540311], [-0.0548792 , -0.07306853, 0.901222 , ..., 0.8004535 , -0.16820708, -0.3550737 ]]], dtype=float32)>, <tf.Tensor: shape=(2, 128, 768), dtype=float32, numpy= 
array([[[-0.22671375, 0.03110819, 0.3218609 , ..., -0.2809014 , -0.75050825, 0.4455798 ], [ 0.77125525, 0.11217973, 0.6603322 , ..., -0.09223831, 0.3482124 , -0.3690516 ], [ 0.5122408 , -0.19928965, -0.8477503 , ..., -0.2881211 , 0.22383787, -1.2124597 ], ..., [ 0.12500736, 0.66633946, 0.9852208 , ..., -0.08709359, -0.5805453 , 0.6127232 ], [ 0.01400474, 0.10471254, 1.2684819 , ..., -0.01833086, -0.35933188, 0.15463208], [-0.8093821 , -0.3606813 , 1.350128 , ..., 0.5893608 , -0.5093217 , -0.09766993]], [[-0.21534939, -0.23962246, -0.30932406, ..., -0.25964907, 0.09695174, 0.41826627], [ 0.50700617, -0.01544808, -0.10519649, ..., 0.38675267, 0.22041193, -0.10293254], [ 1.0366058 , 0.9171315 , 0.26164216, ..., 0.33182117, 0.4559701 , 0.1269788 ], ..., [ 0.16190636, 0.76270264, 1.273768 , ..., 0.57260054, -0.05929537, 0.26224107], [ 0.46124512, 1.1673878 , 0.9594775 , ..., 0.46033052, 0.13521405, 0.4513607 ], [-0.24183989, 0.17352833, 0.8415647 , ..., 0.24766903, -0.14810342, 0.06557457]]], dtype=float32)>, <tf.Tensor: shape=(2, 128, 768), dtype=float32, numpy= array([[[-2.19522044e-02, 2.13035598e-01, 3.11359406e-01, ..., -2.39179075e-01, -2.78708607e-01, 2.04084888e-01], [ 8.30984175e-01, 1.90781176e-01, 7.11346030e-01, ..., -3.00244600e-01, 3.11525255e-01, -2.59340733e-01], [ 3.73089671e-01, -4.22126234e-01, -6.66908681e-01, ..., -4.52976614e-01, 3.20452750e-01, -2.99885243e-01], ..., [ 3.31665605e-01, 7.18357861e-01, 8.99610221e-01, ..., -3.49144906e-01, -2.99868882e-01, 5.23887873e-01], [ 2.38533929e-01, 2.04844743e-01, 1.13064253e+00, ..., -1.51129961e-01, -1.37810916e-01, 7.69993141e-02], [-3.84118825e-01, -3.81587178e-01, 1.33972895e+00, ..., 5.77555180e-01, -1.55398563e-01, -2.92945951e-01]], [[ 1.43630832e-01, 1.69801652e-01, 4.51198295e-02, ..., -6.21673279e-02, -1.57534312e-02, 2.87869871e-01], [ 5.84760249e-01, 2.69933075e-01, -2.85206795e-01, ..., 3.38930964e-01, 1.17773235e-01, 3.69864032e-02], [ 1.25257993e+00, 1.25564480e+00, 3.87543976e-01, ..., 
1.72757044e-01, 4.96662259e-01, 6.13781393e-01], ..., [ 1.95486635e-01, 4.40820843e-01, 1.03893721e+00, ..., 1.45270735e-01, -2.77439207e-01, 1.90322071e-01], [ 4.48164970e-01, 7.64477909e-01, 6.98005378e-01, ..., -8.35609622e-04, -7.96449929e-02, 4.93359417e-01], [-2.33980551e-01, -2.03335062e-01, 3.48247796e-01, ..., -6.43417686e-02, -3.20876807e-01, 2.23351195e-02]]], dtype=float32)>, <tf.Tensor: shape=(2, 128, 768), dtype=float32, numpy= array([[[ 0.07292037, 0.08567802, 0.14476836, ..., -0.0967709 , 0.08722147, 0.07711098], [ 0.17839421, -0.19006097, 0.5034945 , ..., -0.0586979 , 0.32717168, -0.1557853 ], [ 0.187015 , -0.4338879 , -0.4887509 , ..., -0.15502754, 0.00145202, -0.24470882], ..., [ 0.1208308 , 0.12884231, 0.464535 , ..., 0.07375526, 0.17441988, 0.16522081], [ 0.07967911, -0.01190642, 0.50225425, ..., 0.13777754, 0.21002221, 0.0062458 ], [-0.07212636, -0.28303444, 0.5903337 , ..., 0.47551885, 0.16668531, -0.08920398]], [[-0.0790057 , 0.36335146, -0.21101563, ..., -0.1718371 , 0.1629976 , 0.6724266 ], [ 0.27883556, 0.43716326, -0.35764703, ..., -0.04463666, 0.38315117, 0.588798 ], [ 1.2037678 , 1.0727024 , 0.4840874 , ..., 0.24921025, 0.40730858, 0.40481782], ..., [ 0.08630062, 0.19353887, 0.47540048, ..., 0.1888018 , -0.06474137, 0.3131858 ], [ 0.15887049, 0.28572696, 0.373408 , ..., 0.09309126, -0.04969583, 0.3876115 ], [-0.08079858, -0.09572807, 0.26809806, ..., 0.13979636, -0.06315896, 0.2728837 ]]], dtype=float32)>]