Deep Learning TensorFlow Textbook: 9.2.2 Tokenization

The following is the result of running word tokenization on a sentence that contains apostrophes.

['it', "'", 's', 'nothing', 'that', 'you', 'don', "'", 't', 'already', 'know', 'except', 'most', 'people', 'aren', "'", 't', 'aware', 'of', 'how', 'their', 'inner', 'world', 'works', '.']
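
The splitting behavior in the output above (each apostrophe and the final period become separate tokens) can be approximated with a single regular expression. The following is a minimal sketch, not the library's implementation: it assumes the tokenizer treats runs of word characters and runs of non-space punctuation as separate tokens, which is the pattern NLTK documents for WordPunctTokenizer. The helper name split_words_and_punct is hypothetical.

```python
import re

# Assumed pattern: a run of word characters, OR a run of characters
# that are neither word characters nor whitespace (i.e. punctuation).
PATTERN = re.compile(r"\w+|[^\w\s]+")

def split_words_and_punct(text):
    """Hypothetical helper: split text into word and punctuation tokens."""
    return PATTERN.findall(text)

sentence = "it's nothing that you don't already know except most people aren't aware of how their inner world works."
tokens = split_words_and_punct(sentence)
print(tokens)
```

Because the apostrophe is punctuation under this pattern, a contraction such as "don't" is split into three tokens, which is often undesirable for downstream models.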

Finally, let's split the given sentence using Keras instead of NLTK. Keras provides text_to_word_sequence for this purpose.

Code 9-18 Word tokenization using Keras

from tensorflow.keras.preprocessing.text import text_to_word_sequence

# Sample sentence containing three contractions
sentence = "it's nothing that you don't already know except most people aren't aware of how their inner world works."
# Lowercase the text, filter punctuation, and split on spaces
words = text_to_word_sequence(sentence)
print(words)

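The following is the result of running word tokenization with Keras. Apostrophes are preserved, so contractions such as "don't" remain single tokens.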

["it's", 'nothing', 'that', 'you', "don't", 'already', 'know', 'except', 'most', 'people', "aren't", 'aware', 'of', 'how', 'their', 'inner', 'world', 'works']
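
To make the behavior of text_to_word_sequence easier to reason about, here is a rough pure-Python approximation. It is a sketch under stated assumptions, not the Keras source: it assumes the default filter set is common punctuation plus tab and newline (without the apostrophe), that text is lowercased, and that tokens are split on a single space. The name simple_word_sequence is hypothetical.

```python
# Assumed default filter characters: punctuation, tab, and newline.
# The apostrophe is deliberately absent, so contractions survive.
FILTERS = '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n'

def simple_word_sequence(text, filters=FILTERS, lower=True, split=" "):
    """Hypothetical stand-in that mimics the documented behavior."""
    if lower:
        text = text.lower()
    # Replace every filtered character with the split character.
    table = str.maketrans({ch: split for ch in filters})
    text = text.translate(table)
    # Split and drop the empty strings left by consecutive separators.
    return [tok for tok in text.split(split) if tok]

sentence = "it's nothing that you don't already know except most people aren't aware of how their inner world works."
words = simple_word_sequence(sentence)
print(words)
```

Under these assumed defaults the trailing period is replaced by a space and therefore dropped rather than emitted as its own token, while "it's", "don't", and "aren't" are kept whole.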
