📕 CS224n Lecture 3 Neural Network

A post summarizing my notes from the third CS224n lecture!!

Introduction

Upcoming lectures:

  • Week 2: neural networks (lectures 3 and 4)
  • Week 3: NLP (e.g. dependency parsing) (lectures 5 and 6)

There is also HW2 (gradient derivation of word2vec, implement word2vec with numpy)!

Classification Review

A review of classification. First, assume we have a training set $\{x_i, y_i\}_{i=1}^{N}$ consisting of $N$ samples, where $x_i$ is an input and $y_i$ is its label.

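The traditional ML and statistics approach treats this as the problem of finding a decision boundary with softmax or logistic regression. That gives the following expression in $x$, where $W_c$ is the row of the weight matrix for class $c$ and $C$ is the number of classes:

$$p(y \mid x) = \frac{\exp(W_y \cdot x)}{\sum_{c=1}^{C} \exp(W_c \cdot x)}$$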

The score $W_y \cdot x$ can also be written as $f_y$, so the softmax expression is sometimes written as $p(y \mid x) = \mathrm{softmax}(f_y)$.
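As a rough illustration of the softmax expression, here is a minimal sketch in plain Python (standard library only). The weight matrix `W`, the input `x`, and the helper names `class_scores` and `softmax` are made up for this example and do not come from the lecture.

```python
import math

def class_scores(W, x):
    """Compute the score f_c = W_c . x for every row W_c of the weight matrix."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def softmax(scores):
    """Turn a list of class scores into probabilities that sum to 1."""
    # Subtracting the max score leaves the result unchanged but avoids overflow in exp.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical toy values: 3 classes, 2 input features.
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
x = [2.0, 1.0]

f = class_scores(W, x)  # one score per class
p = softmax(f)          # one probability per class
```

The class with the largest score also gets the largest probability, so the predicted label is simply the index of the maximum entry of `p`.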

Cross Entropy Loss

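We maximize the probability of the correct class $y$, which is the same as minimizing its negative log probability (here $f_c$ is the score of class $c$):

$$-\log p(y \mid x) = -\log \frac{\exp(f_y)}{\sum_{c=1}^{C} \exp(f_c)}$$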

When $p$ is the true probability distribution and $q$ is the probability distribution computed by the model, the cross-entropy error is:

$$H(p, q) = -\sum_{c=1}^{C} p(c) \log q(c)$$

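If $p$ is a one-hot vector (in practice a single label is usually designated as the correct one), only the single $q$ term for the true class is computed. Over the whole dataset, with the scores $f$ computed from each $x_i$, the cross-entropy loss is:

$$J(\theta) = \frac{1}{N} \sum_{i=1}^{N} -\log \frac{\exp(f_{y_i})}{\sum_{c=1}^{C} \exp(f_c)}$$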
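To make the one-hot simplification concrete, here is a small sketch in plain Python. The scores and labels are invented for illustration, and the function names `cross_entropy` and `mean_nll` are my own, not from the lecture.

```python
import math

def softmax(scores):
    """Turn a list of class scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(p, q):
    """H(p, q) = -sum_c p(c) * log q(c); terms with p(c) = 0 contribute nothing."""
    return -sum(pc * math.log(qc) for pc, qc in zip(p, q) if pc > 0)

def mean_nll(score_rows, labels):
    """Average of -log q(y_i) over all examples, i.e. cross-entropy with one-hot targets."""
    losses = [-math.log(softmax(f)[y]) for f, y in zip(score_rows, labels)]
    return sum(losses) / len(losses)

# Hypothetical scores for two examples over three classes, with their true labels.
scores = [[2.0, 1.0, 3.0], [0.5, 2.5, 0.0]]
labels = [2, 1]

q = softmax(scores[0])
one_hot = [0.0, 0.0, 1.0]

full_sum = cross_entropy(one_hot, q)  # sum over all classes
single_term = -math.log(q[2])         # only the true-class term
loss = mean_nll(scores, labels)       # loss over the whole toy dataset
```

Here `full_sum` and `single_term` agree, which is the reason only one $q$ term needs to be computed when the target is one-hot.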

Neural Net Classifier

Softmax on its own only provides a linear decision boundary, so using softmax alone is not very effective. That is why a neural net is used together with it. Classification in NLP learns the word vectors and the weights needed for classification at the same time. (Usually only the weights are learned.)

I skipped the middle part because I already know the material (a general explanation of neural nets).

Non-linearity is something everyone emphasizes. Why? If you expand the expressions of stacked linear layers, the result is equivalent to a single layer, so a non-linearity has to be introduced between them.
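The collapse of stacked linear layers can be checked numerically. The sketch below uses plain Python with made-up weights `W1` and `W2` and no biases. ReLU is used as one example of a non-linearity; the notes do not say which one the lecture uses.

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def matvec(A, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def relu(v):
    """Element-wise max(0, v), used here as an example non-linearity."""
    return [max(0.0, vi) for vi in v]

# Hypothetical weights for two layers without biases.
W1 = [[1.0, -2.0], [0.5, 1.0]]
W2 = [[2.0, 1.0], [-1.0, 3.0]]
x = [1.0, 2.0]

# Two linear layers applied one after the other ...
two_linear = matvec(W2, matvec(W1, x))
# ... match a single linear layer whose weights are the product W2 W1.
one_linear = matvec(matmul(W2, W1), x)

# With a ReLU in between, the output no longer matches that single layer.
with_relu = matvec(W2, relu(matvec(W1, x)))
```

For this input, `two_linear` and `one_linear` are identical, while `with_relu` differs because the negative hidden value is clipped to zero before the second layer.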

NER (Named Entity Recognition)

NER is the task of finding specific words in text and classifying them. It can be split into two broad steps: (1) finding the words, and (2) classifying them. In practice, though, NER runs into a problem. Take the phrase "future school": without the surrounding context there is no way to tell whether it is a school named Future School or literally a school of the future, so it is too ambiguous. In other words, NER depends on context.

(Figure: NER)

Binary Word Window Classification

Since the ambiguity comes from context, the main idea is to classify a word together with its context window.
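One common way to feed a word and its context window to a classifier is to concatenate the word vectors inside the window into a single input vector. The sketch below assumes that approach. The sentence, the 2-dimensional embeddings, the `<pad>` token, and the function name `window_vector` are all hypothetical and chosen only for illustration.

```python
def window_vector(tokens, center, embeddings, size=2, pad="<pad>"):
    """Concatenate the vectors of the center word and `size` neighbors on each side."""
    padded = [pad] * size + tokens + [pad] * size
    # After padding, the window around original index `center` starts at `center`.
    window = padded[center : center + 2 * size + 1]
    vec = []
    for word in window:
        vec.extend(embeddings[word])
    return window, vec

# Hypothetical 2-dimensional embeddings, invented for this example.
embeddings = {
    "<pad>": [0.0, 0.0],
    "museums": [0.1, 0.3],
    "in": [0.2, -0.1],
    "Paris": [0.9, 0.4],
    "are": [-0.2, 0.0],
    "amazing": [0.3, 0.7],
}
tokens = ["museums", "in", "Paris", "are", "amazing"]

# Window of size 2 around "Paris": five words, so a vector of length 5 * 2 = 10.
window, x_window = window_vector(tokens, center=2, embeddings=embeddings)
```

The resulting `x_window` could then be passed to a classifier such as a softmax layer or a small neural net to decide whether the center word is a named entity.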

(Figure: word classification)

For more details, see Collobert & Weston (2008, 2011).

Written on April 9, 2019
Tags: cs224n machine learning nlp