📕 CS224n Lecture 18 Constituency Parsing and Tree Recursive Neural Networks

This is Lecture 18; apart from this one, only about two lectures remain before the whole course is done.

Last Minute Project Tips

These are final-project tips: if training is slow and nothing works, it's best to calmly go back to the beginning and try again, since you'll probably be panicking. For debugging, feed in a very small network and a tiny dataset, and once that works, scale the model size back up.

Motivation: Compositionality and Recursion

  • Can we actually carry semantic information along well?
  • Are word embeddings alone enough?
  • Even if they are, what about really large phrases?

์œ„์˜ ๋ฌธ์ œ ๋•Œ๋ฌธ์— compositionality๋ฅผ ์ƒ๊ฐํ•˜๊ฒŒ ๋˜์—ˆ๊ณ , ๋‹จ์–ด, ๊ตฌ๋“ค์˜ semantic composition์„ ๊ตฌ์„ฑํ•˜์ž๋Š” ์•„์ด๋””์–ด๊ฐ€ ๋‚˜์™”๋‹ค.

Composition, roughly like this??

Whether language is truly recursive, which would justify building it up as a tree, can be debated, but recursion is at least a very natural way to represent language itself.

Anyway, it's natural

Structure prediction with simple Tree RNN: Parsing

We scatter word vectors in a vector space, but then what do we do with phrases? → Just put them straight into the same vector space. So how do we decide a phrase's embedding?

  1. From the meanings of the words, and
  2. from a rule that combines them!
Like this, all nicely combined step by step

์œ„์˜ ๊ฒฝ์šฐ๋Š” recursiveํ•œ ๊ฒฝ์šฐ์ด๊ณ , ์ผ๋ฐ˜์ ์ธ rnn์„ ํ†ตํ•ด์„œ๋„ ๊ฒฐ๊ณผ๋Š” ๋‹ค๋ฅด๊ฒ ์ง€๋งŒ ๊ตฌํ•ด๋ณผ ์ˆ˜ ์žˆ๋‹ค. ํ•˜์ง€๋งŒ recursiveํ•œ ๊ตฌ์กฐ๋Š” tree structure๋ฅผ ๊ตฌ์„ฑํ•ด์ฃผ์–ด์•ผ ํ•œ๋‹ค๋Š” ์–ด๋ ค์›€์ด ๋”ฐ๋ฅด๊ณ , rnn์„ ํ†ตํ•œ ๊ตฌ์กฐ๋Š” phrase๋ฅผ prefix context์—†์ด ์žก์•„๋‚ด์ง€๋„ ๋ชปํ•˜๊ณ , ๋งˆ์ง€๋ง‰ ๊ฒฐ๊ณผ vector๊ฐ€ ๋งˆ์ง€๋ง‰ ๋‹จ์–ด์— ์ขŒ์šฐ๋œ๋‹ค๋Š” ๋‹จ์ ์ด ์žˆ๋‹ค.

So how can a neural net do structure prediction? Feed in the children like this, and the network produces both a semantic representation for the merged node and a score for how plausible it is (roughly, how confident the network is in this node?). Run that network over every candidate pair of children, then run it again on the layer above, and again, and so on. Since I doubt I'll understand this paragraph when I reread it, see the pictures below.

Like this, like this
Feed the children in like this
And again like this
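A minimal numpy sketch of that greedy procedure: score every adjacent pair of children, merge the highest-scoring pair into a parent, and repeat up the tree. The dimensions, initialization, and the scoring vector `u` are illustrative assumptions, not the lecture's actual setup.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4                                        # toy embedding dimension
W = rng.standard_normal((d, 2 * d)) * 0.1    # composition weights (assumed init)
b = np.zeros(d)
u = rng.standard_normal(d) * 0.1             # scoring vector: how plausible is this node?

def compose(c1, c2):
    """Merge two children into a parent representation plus a plausibility score."""
    p = np.tanh(W @ np.concatenate([c1, c2]) + b)
    return p, float(u @ p)

def greedy_parse(vectors):
    """Repeatedly merge the adjacent pair whose parent scores highest."""
    nodes = list(vectors)
    while len(nodes) > 1:
        scored = [compose(nodes[i], nodes[i + 1]) for i in range(len(nodes) - 1)]
        i = max(range(len(scored)), key=lambda k: scored[k][1])
        nodes[i:i + 2] = [scored[i][0]]      # replace the pair with its parent
    return nodes[0]                          # sentence vector at the root

sentence = [rng.standard_normal(d) for _ in range(5)]
root = greedy_parse(sentence)
print(root.shape)                            # (4,)
```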

Backpropagation through Structure

It is not much different from general backprop; in principle it is the same. It was introduced by Goller & Küchler (1996).

In the forward pass, suppose we have children \(c_1\) and \(c_2\). To produce the parent \(p\), we compute \(p = \tanh\left(W \begin{pmatrix} c_1 \\ c_2 \end{pmatrix} + b\right)\). The backward pass then runs the computation in reverse, node by node. About this simple TreeRNN we can note the following points:

  • The results are not as bad as you might expect.
  • It can capture phrases to some degree, but it struggles with higher-order composition and with parsing complex, long sentences.
  • There is no interaction at all between the input words.
  • The composition function is identical everywhere.
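The forward composition \(p = \tanh(W \begin{pmatrix} c_1 \\ c_2 \end{pmatrix} + b)\) and the backward step through a single node can be sketched in numpy as follows (toy dimensions and initialization assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3
W = rng.standard_normal((d, 2 * d)) * 0.1    # shared composition weights (toy init)
b = np.zeros(d)

c1, c2 = rng.standard_normal(d), rng.standard_normal(d)

# Forward: p = tanh(W [c1; c2] + b)
children = np.concatenate([c1, c2])
p = np.tanh(W @ children + b)

# Backward through structure: given dL/dp, push the gradient through tanh,
# then split it between the two children exactly as in ordinary backprop.
grad_p = np.ones(d)                          # pretend upstream gradient
delta = grad_p * (1 - p ** 2)                # tanh'(z) = 1 - tanh(z)^2
grad_W = np.outer(delta, children)
grad_children = W.T @ delta
grad_c1, grad_c2 = grad_children[:d], grad_children[d:]
print(grad_c1.shape, grad_c2.shape)          # (3,) (3,)
```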

More complex TreeRNN units

The second model is the Syntactically-Untied RNN; see Socher, Bauer, Manning & Ng (2013).

For the basic syntactic structure, a symbolic Context-Free Grammar (CFG) backbone worked well, they say. They used discrete syntactic categories, and letting different syntactic environments use different composition matrices gave much better results, including better semantics.

I haven't read the paper yet, so I don't know the details, but following the lecture: the main drawback of the Compositional Vector Grammar is that it is slow, since beam search requires a matrix-vector product for every candidate. So they first score a subset of trees with a simpler PCFG and prune the very unlikely candidates in advance. Hence CVG = PCFG + TreeRNN.
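The prune-then-rescore idea can be sketched as follows; `cheap_pcfg_score` and `expensive_treernn_score` are hypothetical stand-ins for the real PCFG and TreeRNN scorers, and the candidate spans are made-up toy data.

```python
import numpy as np

rng = np.random.default_rng(0)

def cheap_pcfg_score(span):
    """Hypothetical stand-in for a fast PCFG log-probability."""
    return -len(span) * 0.1 + rng.normal(scale=0.01)

def expensive_treernn_score(span):
    """Hypothetical stand-in for the costly matrix-vector TreeRNN scoring."""
    return sum(span) / len(span)

candidates = [list(rng.integers(0, 10, size=n)) for n in (2, 3, 4, 5, 6)]

# Step 1: prune the very unlikely candidates with the cheap PCFG score.
k = 2
survivors = sorted(candidates, key=cheap_pcfg_score, reverse=True)[:k]

# Step 2: only the survivors pay the expensive TreeRNN cost.
best = max(survivors, key=expensive_treernn_score)
print(len(survivors))  # 2
```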

Saved for later! (CVG = Compositional Vector Grammar, I assume?)

Third, a model introduced in the paper "Semantic Compositionality through Recursive Matrix-Vector Spaces" is also presented. Originally the weight matrix \(W\) did almost all the work, but since words mostly behave like operators (like the "very" in "very good"), the idea is to make each word part of the composition function itself.

Compositionality Through Recursive Matrix-Vector Recursive Neural Networks (matrix meaning)

The idea is to split meaning into a matrix meaning and a vector meaning: \(A\) is the matrix meaning and \(a\) is the vector meaning. Each word's matrix is first multiplied with the vector of the word it combines with, and the results are then passed to the composition function. The left side computes the vector meaning of the phrase, and the right side computes the matrix meaning of the phrase.
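A minimal numpy sketch of this MV-RNN composition, under assumed toy dimensions: each word contributes a vector (\(a\), \(b\)) and a matrix (\(A\), \(B\)), the matrices first operate on the other word's vector, and separate weight matrices then produce the phrase's vector and matrix meanings.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3
# Each word carries a vector meaning (a, b) and a matrix meaning (A, B).
a, b = rng.standard_normal(n), rng.standard_normal(n)
A = np.eye(n) + 0.1 * rng.standard_normal((n, n))
B = np.eye(n) + 0.1 * rng.standard_normal((n, n))

W = rng.standard_normal((n, 2 * n)) * 0.1    # vector composition weights (toy init)
W_M = rng.standard_normal((n, 2 * n)) * 0.1  # matrix composition weights (toy init)

# Each word's matrix first operates on the other word's vector...
Ba, Ab = B @ a, A @ b
# ...then the results go through the composition functions:
p = np.tanh(W @ np.concatenate([Ba, Ab]))    # vector meaning of the phrase
P = W_M @ np.vstack([A, B])                  # matrix meaning of the phrase (n x n)
print(p.shape, P.shape)                      # (3,) (3, 3)
```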

MV-RNN results on semantic relationship classification

It is a huge improvement over traditional methods, and it also outperforms a plain RNN. They said adding more features should yield even better performance.

But let's not stop there and point out the problems:

  1. using a matrix and a vector together required an enormous number of parameters, and
  2. there was no good way to capture the matrix meaning of fairly large phrases.

So, fourth, the following year they released a model with fewer parameters than the MV-RNN, in the paper "Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank". It seems to be called RNTN, for Recursive Neural Tensor Network.

Before that, a quick note on sentiment detection: sentiment analysis generally works well just by picking out the right words. Even for long documents, detection accuracy is as high as about 90%. But a sentence like "the movie should have been funnier and more entertaining" has a negative meaning, yet if you only look at words like "funnier" and "entertaining", it could well be classified as positive.

The MV-RNN handled cases like the one above reasonably well, and performed even better when given a good dataset like a treebank, but sentences with hard negation were still classified inaccurately, so a better model was needed.

When computing an attention-style score, you sandwich a matrix between two vectors (usually word vectors): 1 x n, n x n, n x 1. The idea here is to rethink that and sandwich a 3-dimensional tensor instead of a matrix, so that much more information can interact and be captured.
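A minimal numpy sketch of that tensor composition, with toy dimensions and assumed initialization: instead of a single matrix between the two stacked child vectors, each output unit gets its own bilinear slice of a 3-dimensional tensor, so every pair of input dimensions can interact directly (this mirrors the RNTN composition).

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3
c1, c2 = rng.standard_normal(d), rng.standard_normal(d)
h = np.concatenate([c1, c2])                      # stacked children, shape (2d,)

W = rng.standard_normal((d, 2 * d)) * 0.1         # ordinary composition weights
V = rng.standard_normal((d, 2 * d, 2 * d)) * 0.1  # 3-dim tensor: one 2d x 2d slice per output unit

# Each output unit k gets its own bilinear interaction h^T V[k] h.
bilinear = np.array([h @ V[k] @ h for k in range(d)])
p = np.tanh(bilinear + W @ h)
print(p.shape)                                    # (3,)
```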

Like this?? Though I didn't fully understand it.

It didn't perform that well on datasets with only sentence-level labels, but on the treebank it came out ahead of the rest, and negation in particular worked well.

Other uses of tree-recursive neural nets

As a different application, there is also a mention of "QCD-Aware Recursive Neural Networks for Jet Physics", but since that applies the idea to physics, I'll briefly skip it.

The next one, which looks really interesting, is "Tree-to-tree Neural Networks for Program Translation", which builds a model that translates CoffeeScript into JavaScript.

July 14, 2019
Tags: cs224n