📕 CS224n Lecture 9 Practical Tips for Final Projects

์ด๋ฒˆ ๊ฐ•์˜๋Š” ์œ ํŠœ๋ธŒ๋กœ ์—…๋กœ๋“œ๋˜๋Š” ๊ฐ•์˜๋Š” ์•„๋‹ˆ๊ณ , ์Šฌ๋ผ์ด๋“œ์™€ ๋…ธํŠธ๋งŒ ์˜ฌ๋ผ์™”๋‹ค. ํ•˜์ง€๋งŒ ํ•œ๋ฒˆ ์ถฉ๋ถ„ํžˆ ํ›‘์–ด๋ณผ๋งŒํ•œ ๊ฒƒ ๊ฐ™์•„์„œ ์ •๋ฆฌ๋ฅผ ํ•ด๋ณธ๋‹ค.

final projects

Our study group decided to implement the default final project, so I went through this lecture's slides. The default final project is SQuAD question answering. Teams are 1–3 people, and there is no restriction on language or framework. The provided starter code is in PyTorch.

research topic

The default final project topic is SQuAD, but you still need to choose a project type (topic):

  1. ๋ชจ๋ธ์˜ application์„ ์ฐพ์•„๋ณด๊ณ  ์–ด๋–ป๊ฒŒ ํšจ์œจ์ ์œผ๋กœ ์ ์šฉํ•  ์ง€ ์ฐพ๊ฑฐ๋‚˜
  2. complex neural architecture๋ฅผ ๊ตฌํ˜„ํ•ด๋ณด๊ณ  ํŠน์ • ๋ฐ์ดํ„ฐ์— ๋Œ€ํ•œ performance๋ฅผ ์ธก์ •ํ•ด๋ณด๊ฑฐ๋‚˜
  3. new, variant NN model์„ ๊ตฌํ˜„ํ•ด์„œ ์‹คํ—˜์ ์ธ ๋ฐ์ดํ„ฐ๋กœ ํ–ฅ์ƒ์„ ๋ณด์—ฌ์ฃผ๊ฑฐ๋‚˜
  4. ๋ชจ๋ธ์˜ ๋™์ž‘๋ฒ•์„ ๋ถ„์„ํ•˜๊ฑฐ๋‚˜
  5. rare theoretical project, ๊ทธ๋ƒฅ ๊ฐœ์ฉŒ๋Š” ๊ฑธ ๊ฐ€์ ธ์˜ค๊ฑฐ๋‚˜..?

๋Œ€์ถฉ ์œ„์˜ ๋‹ค์„ฏ๊ฐ€์ง€๋ฅผ ๋ณด์—ฌ์ฃผ์—ˆ๋‹ค.

finding data

๋ฐ์ดํ„ฐ๋Š” ์•Œ์•„์„œ ๋งŒ๋“ค์–ด์„œ, ํšŒ์‚ฌ์—์„œ ์“ฐ๋Š”๊ฑธ ์ƒ˜ํ”Œ๋งŒ ๋“ค๊ณ ์™€์„œ ์“ธ ์ˆ˜๋„ ์žˆ์ง€๋งŒ, ์ด๋ฏธ ์ž˜ ์„ ๋ณ„๋œ ๋ฐ์ดํ„ฐ์…‹์„ ์“ฐ๋Š” ๊ฒƒ๋„ ์ข‹๋‹ค. ์•„๋ž˜์˜ ์‚ฌ์ดํŠธ๋“ค์„ ์ฐธ๊ณ ํ•ด๋ณด์ž.

review of gated neural sequence models

์ด ๋ถ€๋ถ„์€ ์—ญ์‹œ ๊ฐ•์˜๊ฐ€ ์—†์œผ๋‹ˆ ์ดํ•ดํ•˜๊ธฐ ํž˜๋“ค์–ด์„œ ๋ˆˆ์— ๋ณด์ธ ๊ฒƒ๋งŒ ์ •๋ฆฌํ•˜์ž๋ฉด

  • ์ง๊ด€์ ์œผ๋กœ RNN์—์„œ ๋ฌด์Šจ ์ผ์ด ์ผ์–ด๋‚˜๋Š”์ง€ ์ดํ•ดํ•˜๊ณ  ์‚ฌ์šฉํ•˜์ž (์ด๊ฑฐ ์ž˜ ๋ชจ๋ฅด๊ฒ ๋‹ค)
  • vanishing gradient ๋ฌธ์ œ๋ฅผ ์กฐ์‹ฌํ•˜์ž
  • naive transition function (tanh๊ฐ™์€)์„ ์“ฐ๋Š” ๊ฒƒ์ด ๋ฌธ์ œ๊ฐ€ ๋ ๊นŒ..? -> ๋‚ด ์ƒ๊ฐ์—๋Š” ๋  ๊ฒƒ ๊ฐ™์ง€๋งŒ ์•„์ง ํ™•์‹คํ•˜์ง€ ์•Š๋‹ค.
  • backprop์€ ์ž˜ ์•ˆ๋ ์ˆ˜๋„ ์žˆ๋‹ค.
    • shortcut connection์„ ๋งŒ๋“ค ์ˆ˜ ์žˆ๋‹ค.
    • adaptive shortcut connection๋„ ๋งŒ๋“ค ์ˆ˜ ์žˆ๋‹ค. -> ์„ ํƒ์ ์œผ๋กœ unnecessary connection๋“ค์€ ์—†์•ค๋‹ค.
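A toy illustration of the shortcut idea above (all function names and numbers here are mine, not from the slides): carrying part of the old hidden state straight through the transition keeps it from being repeatedly squashed by tanh.

```python
import math

def naive_step(h, x, w=0.5):
    # plain tanh transition: the old state is squashed at every step
    return [math.tanh(w * (hi + xi)) for hi, xi in zip(h, x)]

def gated_step(h, x, w=0.5, g=0.9):
    # shortcut-style transition: the gate g carries the old state straight
    # through, so information (and gradient) has a path around the nonlinearity
    cand = naive_step(h, x, w)
    return [g * hi + (1 - g) * ci for hi, ci in zip(h, cand)]

h = [1.0]
for _ in range(10):          # run many steps with no new input
    h = naive_step(h, [0.0])
print(h)                     # the naive state decays toward 0
```

Running `gated_step` in the same loop instead leaves the state much closer to its initial value, which is the intuition behind shortcut/highway-style connections.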

a couple of MT topics

์ž ๊ทธ๋ž˜์„œ ์ข€ ๋ฌธ์ œ๋กœ ์—ฌ๊ฒจ์ง€๋Š” topic๋“ค์€?

  • softmax computation is too costly
  • the word generation problem -> I couldn't make sense of this slide ㅠㅠㅠ

To address this, hierarchical softmax can be used, and so can noise contrastive estimation. Attention can also be used; for now I just jotted down the keywords.
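As a sketch of why hierarchical softmax helps (my own simplified two-level, class-based variant, not the full binary-tree version): factoring p(word) = p(class) · p(word | class) means each normalization runs over a short list instead of the whole vocabulary.

```python
import math

def softmax(logits):
    m = max(logits)                       # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

def two_level_prob(class_logits, word_logits, c, w):
    # p(word) = p(class c) * p(word w | class c); each softmax normalizes
    # over a small slice rather than the whole vocabulary
    return softmax(class_logits)[c] * softmax(word_logits[c])[w]

# toy vocab of 6 words split into 2 classes of 3 words each
class_logits = [0.2, -0.1]
word_logits = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.0]]
```

With a vocabulary of V words split into about √V classes, each step touches roughly 2√V scores instead of V, which is where the speedup comes from.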

So how can we do evaluation?

We can use automatic metrics such as BLEU, TER, or METEOR. I should revisit these when I actually use them.
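The heart of BLEU is clipped n-gram precision. A minimal unigram-only sketch (my own simplification; real BLEU combines n-grams up to 4 and adds a brevity penalty):

```python
from collections import Counter

def clipped_unigram_precision(candidate, reference):
    # count each candidate word at most as often as it appears in the
    # reference ("clipping"), then divide by the candidate length
    ref_counts = Counter(reference)
    cand_counts = Counter(candidate)
    clipped = sum(min(n, ref_counts[w]) for w, n in cand_counts.items())
    return clipped / len(candidate)

# the classic degenerate candidate "the the the ..." scores low because
# "the" is clipped to its count in the reference (2 out of 7)
print(clipped_unigram_precision(["the"] * 7, "the cat is on the mat".split()))  # → 2/7 ≈ 0.286
```

Without clipping, that degenerate candidate would score a perfect 1.0, which is exactly the failure mode clipping was designed to prevent.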

doing your research

ํ•œ๋ฒˆ ์—์‹œ๋ฅผ ๋“ค์–ด๋ณด์ž๋ฉด,

  1. pick a task, say summarization
  2. pick a dataset
    1. search for academic datasets
      1. use the Newsroom summarization dataset!
    2. define your own data (the harder route)
      1. tweets, blog posts, and news articles can all become datasets
  3. dataset hygiene
    1. separate out things like the test set right at the start
  4. pick a metric too!
    1. for summarization you can use things like ROUGE
  5. set a baseline
    1. if it scores too well, the problem was too easy. redo it
  6. Always be close to your data!
    1. visualize the dataset
    2. collect summary statistics
    3. look at errors
    4. analyze how different hyperparameters affect performance
  7. try lots of other approaches too
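Step 3 above (dataset hygiene) can be as simple as a one-time, fixed-seed split done before any modeling. A sketch, with the function name and split fractions being my own choices:

```python
import random

def split_dataset(examples, dev_frac=0.1, test_frac=0.1, seed=0):
    # shuffle a copy with a fixed seed so the split is reproducible,
    # then carve off test and dev before touching the training data
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    n_dev = int(len(shuffled) * dev_frac)
    test = shuffled[:n_test]
    dev = shuffled[n_test:n_test + n_dev]
    train = shuffled[n_test + n_dev:]
    return train, dev, test

train, dev, test = split_dataset(range(100))
print(len(train), len(dev), len(test))  # → 80 10 10
```

The fixed seed matters: the same split must come back every run, otherwise test examples leak into training across experiments.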

์ด๋Ÿฐ ๋‚ด์šฉ๋„ ๊ฐ™์ด ์žˆ๋‹ค.

  • watch out for overfitting
  • keep validation and test sets separate and monitor them carefully
  • distinguish clearly between training/tuning/dev/test sets

RNN์„ ํ•™์Šตํ•  ๋• ์•„๋ž˜์™€ ๊ฐ™์€ ๋‚ด์šฉ๋„ ์‚ดํŽด๋ณด์ž

  1. Use an LSTM or GRU
  2. Initialize recurrent matrices to be orthogonal
  3. Initialize the other matrices with a sensible scale
  4. Set the forget gate bias to 1 (default to remembering)
  5. Use an adaptive learning rate
  6. Clip the norm of the gradient (1 to 5 is a reasonable threshold)
  7. Apply dropout vertically, or use Bayesian Dropout
  8. Be patient with training
Written on May 26, 2019
Tags: cs224n machine learning nlp