📃 DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation (Review)

DialoGPT is a model that trains GPT on conversational data. The paper is from Microsoft; the arXiv link is https://arxiv.org/abs/1911.00536, and the code is available at GitHub microsoft/DialoGPT.

1 Introduction

  • Like GPT-2, DIALOGPT is formulated as an autoregressive (AR) language model, and uses multi-layer transformer as model architecture.

  • Unlike GPT-2, however, DIALOGPT is trained on large-scale dialogue pairs/sessions extracted from Reddit discussion chains.

  • The authors say they expected DialoGPT to learn the joint distribution of P(Target, Source) in a conversational flow (see the factorization below).
  • Evaluation is done on DSTC-7, plus a new 6k multi-reference test set extracted from Reddit postings.
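
Concretely, as formulated in the paper, writing the source sentence as $S = x_1, \dots, x_m$ and the target as $T = x_{m+1}, \dots, x_N$, the conditional is factorized autoregressively:

$$
p(T \mid S) = \prod_{n=m+1}^{N} p(x_n \mid x_1, \dots, x_{n-1})
$$

For a multi-turn session $T_1, \dots, T_K$, optimizing $p(T_2, \dots, T_K \mid T_1)$ amounts to optimizing each per-turn conditional $p(T_i \mid T_1, \dots, T_{i-1})$.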

2 Dataset

  • From a Reddit posting, each path from the root node -> leaf node is extracted and used as a training instance.
  • However, instances meeting any of the conditions below are excluded (a rough sketch of these filters follows the list):
    1. a URL appears in the source or target
    2. the target contains word repetitions of three or more words
    3. the target contains none of the top-50 most frequent English words (a, the, of, ...) -> likely not English
    4. special markers such as [ or ] are present (possible markup)
    5. the source and target sequences exceed 200 words
    6. offensive language is present
    7. the response is overly bland

3 Method

3.1 Model Architecture

  • GPT-2 architecture, configured with 12 to 24 layers depending on model size
  • Uses byte pair encoding (BPE)
  • All SOURCE turns are concatenated into one sequence, and the model is trained to generate the Target sentence from it (see the formatting sketch below)
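
As I read it, all turns in a session are concatenated into one long text terminated by the end-of-text token, and the model is trained autoregressively so the target is generated conditioned on the concatenated source. A tiny sketch of that formatting (the exact separator placement is my assumption; GPT-2's `<|endoftext|>` is used here):

```python
EOS = "<|endoftext|>"  # GPT-2 end-of-text token, reused here as the turn separator

def build_session_text(turns):
    """Concatenate one dialogue session (a root-to-leaf path) into a single
    training string, terminating every turn with the end-of-text token."""
    return EOS.join(turns) + EOS

# The source side is everything up to the last turn; the model learns
# p(target | source) over this joint sequence.
session = [
    "what is your favorite programming language?",
    "probably python, it is easy to read.",
    "why not c++?",
]
print(build_session_text(session))
```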

3.2 Mutual Information Maximization

  • Open-domain text generation models tend to produce a lot of bland, uninformative samples
  • To counter this, an MMI (maximum mutual information) scoring function is used
  • Candidates are generated with top-K sampling and then reranked (sketched below)
  • They also tried optimizing this objective directly with a policy-gradient RL approach, but it collapsed into local optima far too easily
    • They suspect this is due to the transformer's strong representation power
    • Left as future work
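
A rough sketch of the sample-then-rerank procedure, not the authors' implementation: a forward model proposes candidates via top-K sampling, and a backward model scoring P(source | candidate) reranks them. The helper names are mine, and the released forward checkpoint stands in for the backward model just to keep the sketch self-contained; in the paper the backward model is trained separately on reversed (target -> source) pairs.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("microsoft/DialoGPT-medium")
forward_model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-medium")
# Stand-in only: the real MMI setup uses a separate backward model for P(source | target).
backward_model = forward_model

def continuation_log_prob(model, condition: str, continuation: str) -> float:
    """Sum of token log-probabilities of `continuation` given `condition`."""
    cond_ids = tok.encode(condition + tok.eos_token)
    cont_ids = tok.encode(continuation + tok.eos_token)
    input_ids = torch.tensor([cond_ids + cont_ids])
    with torch.no_grad():
        log_probs = model(input_ids).logits.log_softmax(dim=-1)
    # Logits at position t predict token t+1, so score only the continuation span.
    return sum(
        log_probs[0, pos - 1, tid].item()
        for pos, tid in enumerate(cont_ids, start=len(cond_ids))
    )

def mmi_respond(source: str, num_candidates: int = 16) -> str:
    src_ids = tok.encode(source + tok.eos_token, return_tensors="pt")
    outs = forward_model.generate(
        src_ids,
        do_sample=True, top_k=10,              # top-K sampling
        max_length=src_ids.shape[-1] + 40,
        num_return_sequences=num_candidates,
        pad_token_id=tok.eos_token_id,
    )
    candidates = [
        tok.decode(o[src_ids.shape[-1]:], skip_special_tokens=True) for o in outs
    ]
    # Rerank candidates by how well they "predict back" the source.
    return max(candidates, key=lambda c: continuation_log_prob(backward_model, c, source))

print(mmi_respond("Who won the world cup in 2018?"))
```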

4 Results

  • Tested with 117M, 345M, and 762M parameter models; architectural details follow Radford et al. (2018)
  • Compared against PersonalityChat, a baseline built on Azure Cognitive Services
  • Using beam search raises the automatic metrics considerably (a decoding sketch follows this list)
    • But how does it do so well without any grounding information? -> Presumably the model picks up enough information during pretraining that it holds up even without a grounding document.

As for the rest of the paper, a quick read-through should be enough.

The generated samples are genuinely fascinating.

The comparison against human responses is truly surprising: the model produces quality approaching that of human responses. One pity is that, since they show 345M and 762M, it would have been nice to also see where 117M lands. Maybe they compared it this way simply because that is how it appears in Tables 2 and 3?

6 Limitations and risks

This is unavoidable to some extent, but open-ended generation like this does carry some real risk...

Despite our efforts to minimize the amount of overtly offensive data prior to training, DIALOGPT retains the potential to generate output that may trigger offense. Output may reflect gender and other historical biases implicit in the data. Responses generated using this model may exhibit a propensity to express agreement with propositions that are unethical, biased or offensive (or the reverse, disagreeing with otherwise ethical statements).

Written on May 2, 2020
Tags: paper