Korean Text Analysis for Everyone with Python: LESSON 08 Vectorizing Words

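Printing train_feature_tfidf shows <9131x22377 sparse matrix of type '<class 'numpy.float64'>' with 45928 stored elements in Compressed Sparse Row format>, that is, 9131 rows (documents) by 22377 columns (vocabulary entries). We already printed this result in the previous chapter, so the output is omitted here.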
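To get a feel for how sparse this matrix is, the figures in the summary can be worked through by hand. The snippet below is only a rough sketch: it copies the three numbers from the summary string and does not touch train_feature_tfidf itself. The variable names are chosen for illustration.

```python
# Figures copied from the sparse-matrix summary quoted above.
n_rows, n_cols = 9131, 22377   # documents x vocabulary entries
stored = 45928                 # explicitly stored values

total_cells = n_rows * n_cols      # size of the equivalent dense matrix
density = stored / total_cells     # fraction of cells actually stored
avg_per_row = stored / n_rows      # stored values per document, on average

print(f"total cells: {total_cells:,}")
print(f"density: {density:.4%}")
print(f"average stored values per document: {avg_per_row:.2f}")
```

Only a tiny fraction of the cells hold a value, which is why the result is kept in Compressed Sparse Row format rather than as a dense array.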

Calling the get_feature_names_out() method on tfidf_vect and storing the result in vocab gives us the generated vocabulary as an array. Let's check its length and look at just the first 10 entries with vocab[:10].

vocab = tfidf_vect.get_feature_names_out()
print(len(vocab))
vocab[:10]

Output

22377
array(['aa로', 'abs', 'acl', 'afc', 'afc 챔스리그', 'afc 챔피언십', 'afc 회장', 'ag', 'ag 우승', 'ai'], dtype=object)
