Kaggle Know-How from a Kaggle Medalist: 7.4.6 Training with TPU

6. Load the transformers BERT model, apply the TPU strategy, train, and save the weights

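Declare the BERT model architecture with the transformers library and download the pretrained weights through the from_pretrained() function. Because training runs on a TPU, the model layers must be built inside a with strategy.scope(): block, using the strategy that was created earlier when the TPU was configured.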
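The scope pattern described above can be sketched as follows. This is a minimal illustration, not the book's listing: it assumes no TPU is attached, so tf.distribute.get_strategy() stands in for the TPU strategy created earlier, and a tiny Dense model (hypothetical) stands in for BERT so that nothing needs to be downloaded.

```python
import tensorflow as tf

# Assumption: `strategy` would normally be the TPUStrategy built during TPU setup.
# The default strategy is used here so the sketch also runs on CPU or GPU.
strategy = tf.distribute.get_strategy()

# Variables (layer weights, optimizer state) must be created inside the scope
# so that the strategy can place and replicate them on its devices.
with strategy.scope():
    inputs = tf.keras.Input(shape=(4,), dtype=tf.float32)
    outputs = tf.keras.layers.Dense(1, activation="sigmoid")(inputs)
    toy_model = tf.keras.Model(inputs, outputs)
    toy_model.compile(optimizer="adam", loss="binary_crossentropy")

# A forward pass on two dummy rows yields one probability per row.
toy_preds = toy_model(tf.zeros((2, 4)))
```

With a real TPU, only the line that defines strategy changes; the model-building code inside the with block stays the same.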

import tensorflow as tf
from tensorflow.keras.layers import Dense, Input, Dropout
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.models import Model
from tqdm.auto import tqdm

def build_model(transformer_layer, loss="binary_crossentropy", max_len=220):
    input_ids = Input(shape=(max_len,), dtype=tf.int32, name="input_ids")
    attention_mask = Input(shape=(max_len,), dtype=tf.int32, name="attention_mask")

    sequence_output = transformer_layer([input_ids, attention_mask])
    hidden_state = sequence_output["last_hidden_state"]
    cls_token = hidden_state[:, 0, :]  # the [CLS] token sits at the first position

    x = Dropout(0.35)(cls_token)
    out = Dense(1, activation="sigmoid")(x)
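
The listing above stops at the output layer, and the rest of the function is not shown here. As a hedged sketch of one way it could continue, the version below wraps the inputs and output in a Model and compiles it with Adam, which matches the imports at the top of the listing. The learning_rate parameter and its value of 1e-5 are assumptions, and DummyEncoder is a hypothetical stand-in for the BERT layer so that the example runs without downloading weights. In practice transformer_layer would be the model returned by from_pretrained().

```python
import tensorflow as tf
from tensorflow.keras.layers import Dense, Dropout, Embedding, Input, Layer
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam


class DummyEncoder(Layer):
    """Hypothetical stand-in for BERT: returns a dict with 'last_hidden_state'."""

    def __init__(self, vocab_size=100, hidden_size=8, **kwargs):
        super().__init__(**kwargs)
        self.embedding = Embedding(vocab_size, hidden_size)

    def call(self, inputs):
        input_ids, attention_mask = inputs
        hidden = self.embedding(input_ids)  # (batch, max_len, hidden_size)
        mask = tf.cast(attention_mask, hidden.dtype)[:, :, tf.newaxis]
        return {"last_hidden_state": hidden * mask}


def build_model(transformer_layer, loss="binary_crossentropy", max_len=220,
                learning_rate=1e-5):  # learning_rate is an assumed parameter
    input_ids = Input(shape=(max_len,), dtype=tf.int32, name="input_ids")
    attention_mask = Input(shape=(max_len,), dtype=tf.int32, name="attention_mask")

    sequence_output = transformer_layer([input_ids, attention_mask])
    hidden_state = sequence_output["last_hidden_state"]
    cls_token = hidden_state[:, 0, :]  # first position holds the [CLS] token

    x = Dropout(0.35)(cls_token)
    out = Dense(1, activation="sigmoid")(x)

    # Assumed continuation: assemble and compile the Keras model.
    model = Model(inputs=[input_ids, attention_mask], outputs=out)
    model.compile(optimizer=Adam(learning_rate=learning_rate), loss=loss)
    return model


# Stand-in for the TPU strategy; the default strategy works without a TPU.
strategy = tf.distribute.get_strategy()
with strategy.scope():
    model = build_model(DummyEncoder(), max_len=16)

# Forward pass on a dummy batch of two sequences.
ids = tf.zeros((2, 16), dtype=tf.int32)
mask = tf.ones((2, 16), dtype=tf.int32)
preds = model([ids, mask]).numpy()
```

Calling build_model inside strategy.scope() keeps both the encoder weights and the classification head under the same distribution strategy.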
