📃 Notes on the GloVe Paper

This paper is one of the suggested readings for lecture 2 of cs224n. It was written at Stanford and is said to be used a lot as a baseline, so I decided to write it up separately! Honestly, this post is a quick summary for my own future reference, so the explanations are sparse.

Introduction

There are two main families of methods for learning word vectors: global matrix factorization methods and local context window methods. LSA is an example of the former, and skip-gram of the latter. Models like LSA make efficient use of the corpus statistics but are terrible at the analogy task. Models like skip-gram, in contrast, do well on analogies but have a hard time properly exploiting the statistical information in the corpus.

So this paper introduces a weighted least squares model trained on global word-word co-occurrence counts, which they claim also makes good use of the corpus statistics. [1]

The GloVe Model

Let the basic quantity, the word-word co-occurrence count, be $X_{ij}$: the number of times word $j$ appears in the context of word $i$. Also let $X_i = \sum_k X_{ik}$. The probability that word $j$ appears in the context of word $i$ is then

$$P_{ij} = P(j \mid i) = \frac{X_{ij}}{X_i}$$

GloVe works with the ratio of these probabilities: for three words $i$, $j$, $k$ it can be written as

$$F(w_i, w_j, \tilde{w}_k) = \frac{P_{ik}}{P_{jk}}$$
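To make the definitions concrete, here is a minimal sketch (my own toy example; the corpus loosely echoes the ice/steam illustration from the paper) that counts co-occurrences with a symmetric window and computes $P_{ij}$ and the ratio $P_{ik}/P_{jk}$ that GloVe builds on:

```python
from collections import defaultdict

def cooccurrence_counts(tokens, window=2):
    """X_ij: how often word j appears within `window` tokens of word i."""
    X = defaultdict(float)
    for center, w_i in enumerate(tokens):
        lo, hi = max(0, center - window), min(len(tokens), center + window + 1)
        for pos in range(lo, hi):
            if pos != center:
                X[(w_i, tokens[pos])] += 1.0
    return X

def prob(X, i, j):
    """P_ij = X_ij / X_i, the probability of seeing word j in the context of word i."""
    X_i = sum(count for (a, _), count in X.items() if a == i)
    return X[(i, j)] / X_i

tokens = "ice is solid and steam is gas and water is liquid".split()
X = cooccurrence_counts(tokens)
# the ratio P_ik / P_jk, e.g. i = ice, j = steam, probe word k = solid
print(prob(X, "ice", "solid") / prob(X, "steam", "solid"))
```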

Here $\tilde{w}$ is a context word vector; I think of it roughly the way word2vec keeps separate input and output vectors for each word. Now, since $F$ in the equation above should depend on the difference between the two target words, rewrite it as

$$F(w_i - w_j, \tilde{w}_k) = \frac{P_{ik}}{P_{jk}}$$

and since the arguments here are vectors while the value being matched is a scalar, take a dot product so that $F$ acts on a scalar:

$$F\big((w_i - w_j)^\top \tilde{w}_k\big) = \frac{P_{ik}}{P_{jk}}$$

๊ทผ๋ฐ ์—ฌ๊ธฐ์„œ ์•Œ์•„์•ผ ํ•  ์ ์ด, word-word co-occurance matrix ๋ž‘, word , context word ๋ž‘ ๊ตฌ๋ถ„์ด ๋ชจํ˜ธํ•˜๋‹ค. ๊ทธ๋ž˜์„œ ๋ฅผ symmetricํ•˜๊ฒŒ, ๋Š” ์™€ ๋ฐ”๊ฟ”์“ธ ์ˆ˜ ์žˆ๊ฒŒ ํ•ด์•ผํ•œ๋‹ค. ๊ทธ๋Ÿฌ๊ธฐ ์œ„ํ•ด์„œ ๊ฐ€ group ์™€ ์— ๋Œ€ํ•ด homomorphismํ•จ์„ ํ•„์š”๋กœ ํ•œ๋‹ค. 2 ๊ทธ๋Ÿฌํ•œ homomorphism์„ ๋ณด์žฅ๋ฐ›์œผ๋ฉด ์ด๋ ‡๊ฒŒ ์ˆ˜์ •์ด ๊ฐ€๋Šฅํ•˜๋‹ค.

Since an $F$ like this is solved by something like $\exp$, it can be unrolled as follows (compare with the very first equation):

$$w_i^\top \tilde{w}_k = \log P_{ik} = \log X_{ik} - \log X_i$$
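As a quick check (my own step, not spelled out in the post), $F = \exp$ really is such a homomorphism: it turns the subtraction on the left-hand side into the division of probabilities on the right-hand side:

$$F\big((w_i - w_j)^\top \tilde{w}_k\big) = \exp\big(w_i^\top \tilde{w}_k - w_j^\top \tilde{w}_k\big) = \frac{\exp(w_i^\top \tilde{w}_k)}{\exp(w_j^\top \tilde{w}_k)} = \frac{F(w_i^\top \tilde{w}_k)}{F(w_j^\top \tilde{w}_k)} = \frac{P_{ik}}{P_{jk}}$$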

๊ทผ๋ฐ ์ด๊ฒŒ ํ•ญ์ด $์— ๋…๋ฆฝ์ ์ด๋ผ ์ด๋Ÿฐ์‹์œผ๋กœ bias๋กœ ์ •๋ฆฌ๊ฐ€๋Šฅํ•˜๋‹ค.

๊ทผ๋ฐ ์—ฌ๊ธฐ์„œ ๋˜ ๋ฌธ์ œ์ ์ด ๊ฐ€ 0์ด ๋‚˜์˜ฌ์ˆ˜ ์žˆ๋‹ค๋Š” ์ ..์ธ๋ฐ, ์ด๊ฑธ ์™€ ๊ฐ™์€ ํ˜•์‹์œผ๋กœ ์˜ sparsity๋ฅผ ์œ ์ง€ํ•˜๋ฉด์„œ divergence๋ฅผ ํ”ผํ•œ๋‹ค๊ณ  ํ•œ๋‹ค. ์ž ์—ฌํŠผ ์—ฌ๊ธฐ์„œ cost function์„ ๋ฝ‘์•„๋‚ด๋Š”๋ฐ, ๊ทธ ์‹์ด ์•„๋ž˜์™€ ๊ฐ™๋‹ค.

The rough idea this equation gives me is: "$X_{ij}$ is the actual co-occurrence count, and $w_i^\top \tilde{w}_j + b_i + \tilde{b}_j$ is the (log) co-occurrence predicted from the word vectors, so square the difference between them, weight each term, and sum; is that the cost function?" Of course, this is just my own reading, so I can't say for sure that it's accurate.
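Here is a minimal numpy sketch of that reading of the cost function (variable names are mine). The weighting $f(x) = (x/x_{max})^{\alpha}$, clipped at 1, with $x_{max} = 100$ and $\alpha = 0.75$, is the form the paper uses but the post doesn't cover; everything else follows the equation above.

```python
import numpy as np

def weight(x, x_max=100.0, alpha=0.75):
    """GloVe weighting f(X_ij): down-weights rare pairs, caps frequent ones at 1."""
    return np.where(x < x_max, (x / x_max) ** alpha, 1.0)

def glove_cost(W, W_ctx, b, b_ctx, X):
    """J = sum over nonzero X_ij of f(X_ij) * (w_i . w~_j + b_i + b~_j - log X_ij)^2."""
    J = 0.0
    for i, j in zip(*np.nonzero(X)):               # only pairs that actually co-occur
        pred = W[i] @ W_ctx[j] + b[i] + b_ctx[j]   # "predicted" log co-occurrence
        J += weight(X[i, j]) * (pred - np.log(X[i, j])) ** 2
    return J

# toy usage: V words, d-dimensional vectors, random counts
V, d = 5, 3
rng = np.random.default_rng(0)
X = rng.integers(0, 4, size=(V, V)).astype(float)
W, W_ctx = rng.normal(size=(V, d)), rng.normal(size=(V, d))
b, b_ctx = np.zeros(V), np.zeros(V)
print(glove_cost(W, W_ctx, b, b_ctx, X))
```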

Relationship to Other Models

For a model like skip-gram, let $Q_{ij}$ be the probability that word $j$ appears in the context of word $i$; it then takes the following form:

$$Q_{ij} = \frac{\exp(w_i^\top \tilde{w}_j)}{\sum_{k=1}^{V} \exp(w_i^\top \tilde{w}_k)}$$

This is a softmax, and the objective function built from it is

$$J = -\sum_{i \in \text{corpus}} \sum_{j \in \text{context}(i)} \log Q_{ij}$$

If the co-occurrence matrix $X$ is precomputed and identical terms are grouped, this can be transformed as below and becomes much faster to evaluate (75%~90% of $X$ is zero, after all):

$$J = -\sum_{i=1}^{V} \sum_{j=1}^{V} X_{ij} \log Q_{ij}$$
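A small sketch of that rewrite (my own toy check, with made-up word/context index pairs): summing $-\log Q_{ij}$ once per corpus position gives the same value as summing $X_{ij} \cdot (-\log Q_{ij})$ over the precomputed counts, but the grouped form only touches the distinct nonzero entries of $X$.

```python
import numpy as np
from collections import Counter

V, d = 4, 3
rng = np.random.default_rng(1)
W, W_ctx = rng.normal(size=(V, d)), rng.normal(size=(V, d))

def log_Q(i, j):
    """log of the softmax Q_ij = exp(w_i . w~_j) / sum_k exp(w_i . w~_k)."""
    scores = W[i] @ W_ctx.T
    return scores[j] - np.log(np.exp(scores).sum())

# (word, context word) pairs in the order they would be seen while scanning a corpus
pairs = [(0, 1), (0, 1), (0, 2), (3, 1), (3, 1), (3, 1), (2, 0)]

J_naive = -sum(log_Q(i, j) for i, j in pairs)                  # one term per corpus position
X = Counter(pairs)                                             # precomputed co-occurrence counts
J_grouped = -sum(n * log_Q(i, j) for (i, j), n in X.items())   # one term per distinct pair

print(np.isclose(J_naive, J_grouped))   # True: same objective, fewer terms to evaluate
```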

Using the earlier definitions ($X_{ij} = X_i P_{ij}$), this can be transformed further:

$$J = -\sum_{i=1}^{V} X_i \sum_{j=1}^{V} P_{ij} \log Q_{ij} = \sum_{i=1}^{V} X_i\, H(P_i, Q_i)$$

Here $H$ is the cross entropy. But cross entropy is only one of many ways to measure the distance between two distributions, and in some cases (long-tailed distributions) it puts too much weight on unlikely events. So switch to a different distance like this:

$$\hat{J} = \sum_{i,j} X_i \big(\hat{P}_{ij} - \hat{Q}_{ij}\big)^2$$

Here $\hat{P}_{ij} = X_{ij}$ and $\hat{Q}_{ij} = \exp(w_i^\top \tilde{w}_j)$ are unnormalized distributions. But this still has a problem: $X_{ij}$ usually takes very large values, so minimize the squared error of the logarithms instead:

$$\hat{J} = \sum_{i,j} X_i \big(\log \hat{P}_{ij} - \log \hat{Q}_{ij}\big)^2 = \sum_{i,j} X_i \big(w_i^\top \tilde{w}_j - \log X_{ij}\big)^2$$

์ž ๊ทผ๋ฐ ์ด๊ฑด ๊ฒฐ๊ตญ GloVe์˜ cost function๊ณผ ๊ฐ™์€ ํ˜•ํƒœ๊ฐ€ ๋˜์—ˆ๋‹ค.


The rest of the paper looked too difficult, so I haven't gotten to it yet ㅠㅠ

  1. http://nlp.stanford.edu/projects/glove/ The source code is available here.

  2. https://en.wikipedia.org/wiki/Group_homomorphism This explains it briefly and well.

Written on April 8, 2019
Tags: machine learning nlp paper