



    !pip install --upgrade tensorflow_hub

    import tensorflow_hub as hub
    import numpy as np

Loading the BERT model

    # Load BERT from TensorFlow Hub
    module_url = "https://tfhub.dev/tensorflow/bert_en_uncased_L-24_H-1024_A-16/1"
    bert_layer = hub.KerasLayer(module_url, trainable=False)

trainable = False: We do not want to retrain the BERT layer, so we deliberately freeze the pre-trained BERT layer.

BERT model version: bert_en_uncased_L-24_H-1024_A-16

This model has L = 24 hidden layers (transformer blocks), a hidden size of H = 1024, and A = 16 attention heads.

This model is trained on Wikipedia and the Books Corpus dataset. en_uncased indicates that the model is pre-trained for English and is not case sensitive.
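The naming convention can be unpacked with a few lines of code. The following is only a small sketch: the helper name parse_bert_name is hypothetical, and the per-head size is simply the hidden size divided by the number of attention heads.

```python
import re


def parse_bert_name(name):
    """Extract layer count, hidden size and head count from a model name."""
    match = re.search(r"L-(\d+)_H-(\d+)_A-(\d+)", name)
    if match is None:
        raise ValueError(f"no L/H/A fields found in {name!r}")
    layers, hidden, heads = (int(group) for group in match.groups())
    return {
        "layers": layers,
        "hidden_size": hidden,
        "attention_heads": heads,
        # Each attention head works on an equal slice of the hidden vector.
        "head_size": hidden // heads,
    }


config = parse_bert_name("bert_en_uncased_L-24_H-1024_A-16")
print(config)
```

For this model, each of the 16 heads works on a 64-dimensional slice of the 1024-dimensional hidden state.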

Loading tokenizer

Training requires converting the text dataset into the input format that BERT supports. To do this, first tokenize the dataset and then convert the tokens into features (encode them as numbers).

Dividing a sentence into individual words is called tokenization.

Import the tokenizer file

    !wget --quiet https://raw.githubusercontent.com/tensorflow/models/master/official/nlp/bert/tokenization.py

    import tokenization

Tokenizer settings

    vocab_file = bert_layer.resolved_object.vocab_file.asset_path.numpy()
    do_lower_case = bert_layer.resolved_object.do_lower_case.numpy()
    tokenizer = tokenization.FullTokenizer(vocab_file, do_lower_case)

vocab_file: the vocabulary file used to map the tokens of the dataset to IDs.

do_lower_case: whether to lowercase the text before tokenizing.

The FullTokenizer class takes vocab_file and do_lower_case as input parameters.

Calling the tokenizer:

    tokenizer.tokenize('Where are you going?')
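BERT's tokenizer does not stop at whole words: words missing from the vocabulary are split into smaller WordPiece units by greedy longest-match-first lookup. The sketch below illustrates only that idea. The function wordpiece and the tiny toy_vocab are hypothetical, and the vocabulary deliberately leaves out "going" to force a split; the real vocabulary file is far larger, so the real tokenizer may return different pieces for the same sentence.

```python
def wordpiece(word, vocab, unk="[UNK]"):
    """Split one lowercased word into sub-word pieces, longest match first."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        # Shrink the candidate from the right until it is found in the vocabulary.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # marks a continuation piece
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk]  # no piece matched, so the whole word is unknown
        pieces.append(piece)
        start = end
    return pieces


# Toy vocabulary (an assumption for illustration, not BERT's real vocabulary).
toy_vocab = {"where", "are", "you", "go", "##ing", "?"}
tokens = [p for w in "where are you going ?".split() for p in wordpiece(w, toy_vocab)]
print(tokens)
```

With this toy vocabulary, "going" is split into "go" and "##ing", where the "##" prefix marks a piece that continues the previous one.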

Understanding the input data format

BERT takes a combination of three different embeddings as input:

Token embeddings

Token embeddings hold the content of the dataset. Each unique word token is assigned a number.

The [CLS] token is attached to the beginning of every sentence to mark its start.

The [SEP] token is attached to the end of every sentence to mark its end.

Position embeddings

These indicate the position of each token in the text.

This helps BERT capture the order of the information given in the sentence.

Segment embeddings

The model needs to know whether a particular token belongs to sentence 1 or sentence 2.

In BERT, this is done by generating a fixed token called the segment embedding.
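To make the three inputs concrete, here is a minimal sketch for a pair of already tokenized sentences. The helper build_pair_inputs and the example tokens are hypothetical; the layout ([CLS] first, [SEP] after each sentence, segment 0 for the first sentence and 1 for the second) follows the usual BERT convention.

```python
def build_pair_inputs(tokens_a, tokens_b):
    """Lay out two tokenized sentences the way BERT expects them."""
    tokens = ["[CLS]"] + tokens_a + ["[SEP]"] + tokens_b + ["[SEP]"]
    # Segment 0 covers [CLS], sentence 1 and its [SEP]; segment 1 covers the rest.
    segment_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)
    # Positions are simply the index of each token in the sequence.
    positions = list(range(len(tokens)))
    return tokens, segment_ids, positions


tokens, segment_ids, positions = build_pair_inputs(
    ["where", "are", "you", "going", "?"],
    ["to", "the", "store", "."],
)
for token, segment, position in zip(tokens, segment_ids, positions):
    print(f"{position:2d}  segment={segment}  {token}")
```

For single-sentence classification, as in this article, there is no second sentence, so every segment ID is 0.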

So far, we’ve talked about BERT, its input format, and how to load a BERT model.

Loading the dataset

We use the disaster tweets dataset. Download the dataset using the link.

This dataset contains training and test files.

    import pandas as pd

    train = pd.read_csv("../input/nlp-with-disaster-tweets-cleaning-data/train_data_cleaning.csv", usecols=['text', 'target'])
    test = pd.read_csv("../input/nlp-with-disaster-tweets-cleaning-data/test_data_cleaning.csv", usecols=['text'])

If the target is 1, the tweet is a disaster tweet; otherwise it is a normal tweet.

Preprocessing the dataset into BERT format

As we know, BERT's input is a combination of two or three embeddings. This step therefore prepares the dataset in BERT's input format.

Required libraries:

    import tensorflow as tf
    from tensorflow.keras.layers import Dense, Input
    from tensorflow.keras.optimizers import Adam
    from tensorflow.keras.models import Model

The function bert_encoder takes the text data and the tokenizer and creates the token embeddings, the mask (position) embeddings, and the segment embeddings. BERT supports a maximum length of 512.

    def bert_encoder(texts, tokenizer, max_len=512):
        # BERT needs three inputs for training and fine-tuning
        all_tokens = []
        all_masks = []
        all_segments = []

        for text in texts:
            text = tokenizer.tokenize(text)
            # Keep at most max_len - 2 tokens to leave room for [CLS] and [SEP]
            text_sequence = text[:max_len - 2]
            input_sequences = ["[CLS]"] + text_sequence + ["[SEP]"]
            pad_len = max_len - len(input_sequences)

            tokens = tokenizer.convert_tokens_to_ids(input_sequences)
            tokens += [0] * pad_len  # pad the token IDs with zeros
            pad_masks = [1] * len(input_sequences) + [0] * pad_len  # 1 = real token, 0 = padding
            segment_ids = [0] * max_len  # single sentence, so every segment ID is 0

            all_tokens.append(tokens)
            all_masks.append(pad_masks)
            all_segments.append(segment_ids)

        return np.array(all_tokens), np.array(all_masks), np.array(all_segments)

bert_encoder takes the tokenizer and the text data as input and returns three arrays: the token embeddings, the mask (position) embeddings, and the segment embeddings.

convert_tokens_to_ids looks up each token in the vocabulary file and returns the unique ID assigned to it.

max_len = 512 is the maximum sentence length, in tokens, that BERT supports.

Note: The token embeddings and the mask (position) embeddings are required for BERT training.
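To see what these arrays look like, here is a minimal sketch that applies the same truncate-then-pad steps to a single, already tokenized sentence. The helper encode_one and the toy vocabulary IDs are hypothetical and chosen only for illustration; the real IDs come from BERT's vocabulary file.

```python
def encode_one(word_pieces, vocab, max_len):
    """Truncate, add [CLS]/[SEP], then pad one tokenized sentence to max_len."""
    seq = ["[CLS]"] + word_pieces[:max_len - 2] + ["[SEP]"]
    pad_len = max_len - len(seq)
    token_ids = [vocab[t] for t in seq] + [0] * pad_len
    mask = [1] * len(seq) + [0] * pad_len  # 1 = real token, 0 = padding
    segment_ids = [0] * max_len            # single sentence: all zeros
    return token_ids, mask, segment_ids


# Toy vocabulary with made-up IDs (an assumption for illustration).
toy_vocab = {"[PAD]": 0, "[CLS]": 1, "[SEP]": 2,
             "where": 3, "are": 4, "you": 5, "going": 6, "?": 7}
pieces = ["where", "are", "you", "going", "?"]

ids, mask, segs = encode_one(pieces, toy_vocab, max_len=8)
print(ids)   # seven real tokens followed by one padding ID
print(mask)
print(segs)

# With a small max_len the sentence is cut, but [CLS] and [SEP] are always kept.
short_ids, short_mask, _ = encode_one(pieces, toy_vocab, max_len=5)
print(short_ids)
```

Every row has exactly max_len entries, which is what lets the rows be stacked into the rectangular arrays that bert_encoder returns.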

Call encoding function:

Most tweets are no longer than 150 words, so we set max_len = 160.

    train_input = bert_encoder(train.text.values, tokenizer, max_len=160)

train_input contains three arrays (all_tokens, all_masks, all_segments).

Building a model using the BERT layer

We use the pre-trained BERT model as the base and design the model for our use case by adding a few neural network layers on top that produce the final predictions.

    def build_model(bert_layer, max_len=512, num_class=1):
        # Three input layers, one per BERT input, each of length max_len
        input_word_ids = Input(shape=(max_len,), dtype=tf.int32, name="input_word_ids")
        input_mask = Input(shape=(max_len,), dtype=tf.int32, name="input_mask")
        segment_ids = Input(shape=(max_len,), dtype=tf.int32, name="segment_ids")

        _, sequence_output = bert_layer([input_word_ids, input_mask, segment_ids])
        # Hidden state of the first token, [CLS]
        clf_output = sequence_output[:, 0, :]

        out = Dense(num_class, activation='sigmoid')(clf_output)

        model = Model(inputs=[input_word_ids, input_mask, segment_ids], outputs=out)
        model.compile(Adam(learning_rate=2e-6), loss="binary_crossentropy", metrics=['accuracy'])
        return model

The function build_model takes the Bert layer, max_len, and num_class as inputs and returns the final model.

The default is max_len = 512.

num_class = 1: the last dense layer, with one output, predicts the probability that a tweet is about a disaster.

The BERT layer takes an array of two or three embeddings for training, [input_word_ids, input_mask, segment_ids]. Therefore, we need to create three input layers that are the same size as max_len.

Because this is binary classification, the loss is binary_crossentropy.

sequence_output[:, 0, :] is the hidden state of the first token, [CLS].
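The output layer and the loss can be worked through by hand. The sketch below is a plain-Python illustration of what a sigmoid output and binary cross-entropy compute for a single tweet; it is not the Keras implementation, and the clipping value eps is an assumption made here to keep the logarithm finite.

```python
import math


def sigmoid(z):
    """Squash a raw score (logit) into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def binary_crossentropy(y_true, p, eps=1e-7):
    """Loss for one example: small when p agrees with the label y_true."""
    p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))


for logit in (-2.0, 0.0, 2.0):
    p = sigmoid(logit)
    loss = binary_crossentropy(1, p)
    print(f"logit={logit:+.1f}  p={p:.3f}  loss if the tweet is a disaster={loss:.3f}")
```

The loss shrinks as the predicted probability moves toward the true label, which is what training pushes the dense layer to do.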

final_model will be the final model used for training.

    final_model = build_model(bert_layer, max_len=160, num_class=1)
    final_model.summary()

Training steps

So far, we have built the model and the input embeddings to be passed to training.

It’s time to start training.

    train_labels = train.target.values
    train_history = final_model.fit(train_input, train_labels, validation_split=0.2, epochs=3, batch_size=16)
    final_model.save('model.h5')

validation_split = 0.2 means that 20% of the training data is used as validation data.

train_labels is the target array.

Splendid!

After running 3 epochs, the validation accuracy was 82%.
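It helps to know which rows end up in the validation set. According to the Keras documentation, validation_split takes the validation samples from the end of the arrays, before any shuffling. The sketch below mimics that behaviour on row indices only; the helper holdout_indices is hypothetical and is not part of Keras.

```python
import math


def holdout_indices(num_samples, validation_split):
    """Return (train, validation) row indices, holding out the last rows."""
    split_at = int(math.floor(num_samples * (1.0 - validation_split)))
    train_idx = list(range(split_at))
    val_idx = list(range(split_at, num_samples))
    return train_idx, val_idx


train_idx, val_idx = holdout_indices(10, 0.2)
print("train rows:     ", train_idx)
print("validation rows:", val_idx)
```

If the rows of the CSV file are ordered in some way, for example by label, it may be worth shuffling the data before calling fit so that the held-out rows are representative.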

Testing and validation

For testing and prediction, the test data must be in the same format as the training data.

Calling the bert_encoder function on the test data converts it into the three embeddings, which are then passed to the model's predict method.

    test_input = bert_encoder(test.text.values, tokenizer, max_len=160)
    test_pred = final_model.predict(test_input)
    prediction = np.where(test_pred > .5, 1, 0)

test_pred is an array that contains the probability that each tweet is about a disaster. If the probability is greater than 0.5, the tweet is classified as a disaster tweet and labeled 1.
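The thresholding step is simple enough to write out by hand. This sketch does in plain Python what the np.where call does for a flat list of probabilities; the function to_labels and the example probabilities are made up for illustration.

```python
def to_labels(probabilities, threshold=0.5):
    """Label a tweet 1 only if its probability is strictly above the threshold."""
    return [1 if p > threshold else 0 for p in probabilities]


toy_probs = [0.08, 0.50, 0.93]  # made-up model outputs
default_labels = to_labels(toy_probs)
lenient_labels = to_labels(toy_probs, threshold=0.4)
print(default_labels)
print(lenient_labels)
```

A probability of exactly 0.5 is labeled 0 because the comparison is strict. Lowering the threshold flags more tweets as disasters, at the cost of more false alarms.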

    test['prediction'] = prediction

Filter the tweets by their predicted label to see the prediction results:

    test[test.prediction == 1]


Perfect!

All the tweets that are predicted to be disaster tweets do read as disaster tweets.

Improving the results

The pre-trained model gives good results after only a few epochs. However, you can improve the results further with some fine-tuning:

Use callbacks and a dynamic learning rate for efficient training.

Use a deeper BERT architecture. bert_large has more layers and can learn a relatively large amount of information.

Use stacked BERT layers.

Add multiple CNN layers on top of the BERT layer.

Conclusion

BERT is an advanced and highly powerful language representation model that can be applied to many tasks such as question answering, text classification, and text summarization.

In this article, we learned how to implement BERT for text classification and confirmed that it works.

Implementing BERT using the transformers package is much easier. In the next article, I'll show you how to quickly implement an NLP model using the transformers package.

Download the source code using the link.

