📕 CS224n Lecture 6 Language Models and RNNs

A post summarizing my notes from the sixth CS224n lecture!

Language Modeling

Language Modeling is the task of predicting which word comes next. More precisely, given the words seen so far, it is the task of predicting the probability distribution of the next word.
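Written out, with x^{(t)} denoting the word at position t (this notation is assumed here for illustration), the quantity being predicted is the first expression below. By the chain rule, a model of that conditional also assigns a probability to a whole piece of text, which is the second expression:

```latex
P\left(x^{(t+1)} \mid x^{(t)}, \dots, x^{(1)}\right),
\qquad
P\left(x^{(1)}, \dots, x^{(T)}\right)
  = \prod_{t=1}^{T} P\left(x^{(t)} \mid x^{(t-1)}, \dots, x^{(1)}\right)
```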

n-gram language model

What is an n-gram? A chunk of n consecutive words.

What is an n-gram language model? It collects statistics about how frequent different n-grams are and uses them to predict the next word.

Concretely, the probability of the next word is estimated from counts: how often the full n-gram occurs in the corpus, divided by how often its first n-1 words occur.
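As a formula, this is the usual count-based estimate. The sketch below assumes the notation x^{(t)} for the word at position t and the simplifying assumption that the next word depends only on the preceding n-1 words:

```latex
P\left(x^{(t+1)} \mid x^{(t)}, \dots, x^{(t-n+2)}\right)
  \approx
  \frac{\operatorname{count}\left(x^{(t-n+2)}, \dots, x^{(t)}, x^{(t+1)}\right)}
       {\operatorname{count}\left(x^{(t-n+2)}, \dots, x^{(t)}\right)}
```

For a 4-gram model and the context "students opened their", the estimate for "books" would be count("students opened their books") / count("students opened their").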

But this approach has a few problems.

  • Words outside the N-word window cannot be taken into account.
  • sparsity problem
    • What if the count is 0?
    • What if the denominator is also 0? -> Reduce N by 1 and apply the model again (back off).
    • The n-gram does appear, but so rarely that the estimate cannot be judged reliable.
  • storage problem
    • The counts for every n-gram in the corpus have to be stored.
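The counting and the "reduce N by 1" fallback described above can be sketched in a few lines. This is a minimal illustration, not the lecture's implementation: the function names `count_ngrams` and `next_word_prob` are hypothetical, and the toy corpus is made up.

```python
from collections import Counter


def count_ngrams(tokens, max_n):
    """Count every k-gram for k = 1..max_n; keys are tuples of words."""
    counts = Counter()
    for k in range(1, max_n + 1):
        for i in range(len(tokens) - k + 1):
            counts[tuple(tokens[i:i + k])] += 1
    return counts


def next_word_prob(counts, context, word, total_tokens):
    """Estimate P(word | context) as count(context + word) / count(context).

    If the context was never seen (zero denominator), drop its oldest word
    and retry, ending at the plain unigram frequency.
    """
    context = tuple(context)
    while context:
        denom = counts[context]
        if denom > 0:
            return counts[context + (word,)] / denom
        context = context[1:]  # back off: N -> N - 1
    return counts[(word,)] / total_tokens


# Toy corpus (made up for illustration).
tokens = "the students opened their books the students opened their minds".split()
counts = count_ngrams(tokens, 3)

p_seen = next_word_prob(counts, ["opened", "their"], "books", len(tokens))     # 0.5
p_unseen = next_word_prob(counts, ["opened", "their"], "exams", len(tokens))   # 0.0
p_backoff = next_word_prob(counts, ["closed", "their"], "books", len(tokens))  # 0.5
```

`p_unseen` shows the sparsity problem (a zero count gives probability 0), `p_backoff` shows the fallback to a shorter context, and the size of `counts` shows the storage problem: it grows with every distinct n-gram in the corpus.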

Even so, text generated from this model is more grammatical than you might expect. It is incoherent, though.

Neural Network Language Model

Should we use a fixed-window neural network instead? -> Take the N words preceding the word to predict, embed them, and feed them into the model to predict the next word.

  • There is no sparsity problem.
  • There is no need to store all the counts.
  • What if the fixed window is too small?
    • What about using a large window? -> The weight matrix becomes too large.
    • And if we keep it small? -> Meaningful context is lost.
  • It is not symmetric.
    • If the same word appears at a different position, it is processed differently, because each position in the window has its own weights.

RNN

A single slide from the lecture seems to be enough to explain it.

The core idea is what matters: apply the same weights repeatedly at every time step!!
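A minimal sketch of that idea, assuming the common vanilla-RNN update h_t = tanh(W_h h_{t-1} + W_e x_t + b). The names `rnn_step` and `run_rnn` and the tiny weights are hypothetical, and the output softmax layer is left out to keep it short.

```python
import math


def rnn_step(h_prev, x, W_h, W_e, b):
    """One step: h_t = tanh(W_h h_{t-1} + W_e x_t + b), using plain lists."""
    return [
        math.tanh(
            sum(w * h for w, h in zip(W_h[i], h_prev))
            + sum(w * v for w, v in zip(W_e[i], x))
            + b[i]
        )
        for i in range(len(b))
    ]


def run_rnn(inputs, h0, W_h, W_e, b):
    """Apply the same weights at every position of a sequence of any length."""
    h = h0
    states = []
    for x in inputs:
        h = rnn_step(h, x, W_h, W_e, b)
        states.append(h)
    return states


# Tiny made-up weights: hidden size 2, input size 1.
W_h = [[0.5, 0.0], [0.0, 0.5]]
W_e = [[1.0], [-1.0]]
b = [0.0, 0.0]

short_states = run_rnn([[1.0], [0.0], [1.0]], [0.0, 0.0], W_h, W_e, b)
long_states = run_rnn([[1.0]] * 7, [0.0, 0.0], W_h, W_e, b)
```

The same `W_h`, `W_e`, and `b` handle both the 3-step and the 7-step input, so the parameter count does not depend on sequence length. The loop is also inherently sequential, since each state needs the previous one.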

So what are the key points of using an RNN?

  • It can process text of any length.
  • It can use information from earlier steps.
  • The model size is fixed.
  • Inputs are processed symmetrically.

However,

  • It is slow.
  • In practice, it is hard to make use of information from many steps back.

Training RNN

Training runs the RNN over a large corpus, repeatedly computing the predicted distribution at each step. The loss is said to be cross entropy. Doing this over the whole corpus at once is too expensive, though, so it seems to borrow a mini-batch-like idea, as in SGD.
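A standard way to write this loss, assuming \hat{y}^{(t)} is the model's predicted distribution at step t and x^{(t+1)} is the word that actually comes next (the notation is an assumption for illustration):

```latex
J^{(t)}(\theta) = -\log \hat{y}^{(t)}_{x^{(t+1)}},
\qquad
J(\theta) = \frac{1}{T} \sum_{t=1}^{T} J^{(t)}(\theta)
```

Each step's loss is the negative log-probability the model assigned to the true next word, and the overall loss is the average over the T steps.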

In any case, text generated by an RNN works better than you might expect, but it falls short when it comes to remembering earlier content. See the Medium post for details.

Evaluating

Language models are evaluated by perplexity. Lower values are better.
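Perplexity is commonly defined as the exponential of the average cross-entropy loss. The sketch below assumes we already have the probability the model assigned to each true next word; the function names and the example probabilities are made up.

```python
import math


def cross_entropy(probs_of_true_words):
    """Average negative log-probability assigned to the words that actually came next."""
    return -sum(math.log(p) for p in probs_of_true_words) / len(probs_of_true_words)


def perplexity(probs_of_true_words):
    """Exponential of the average cross-entropy; lower is better."""
    return math.exp(cross_entropy(probs_of_true_words))


uniform_guess = perplexity([0.25, 0.25, 0.25, 0.25])  # like choosing among 4 equally likely words
perfect_model = perplexity([1.0, 1.0, 1.0])
```

A model that always gives the true word probability 0.25 has perplexity about 4, and a model that is always certain and correct reaches the minimum of 1.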

Written on April 21, 2019
Tags: cs224n machine learning nlp