Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned Review

May 18, 2020

This paper is also about pruning multi-head attention (MHA). On the English-Russian WMT dataset, pruning 38 of the 48 encoder heads reportedly costs only 0.15 BLEU. The code is available at GitHub - lena-voita/the-story-of-heads, and the arXiv link is...

Tags: paper

🤗 The Future of Natural Language Processing - Model Size and Computational Efficiency

May 11, 2020

The Future of Natural Language Processing, a slide deck and video published by HuggingFace, gives a good overview of recent NLP in general. Of the material covered in that session, the parts related to Model Size and Computational Efficiency are briefly...

Tags: nlp

DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation Review

May 2, 2020

This model trains GPT to fit conversational text. The paper is from Microsoft, and the arXiv link is https://arxiv.org/abs/1911.00536. The code is available at GitHub microsoft/DialoGPT.

Tags: paper

ZeRO: Memory Optimization Towards Training A Trillion Parameter Models Review

May 1, 2020

This paper drew attention as a training framework for very large models that outperforms Megatron. The arXiv link is https://arxiv.org/abs/1910.02054, and a PyTorch implementation is available at GitHub - microsoft/DeepSpeed.

Tags: paper

TinyBERT: Distilling BERT For Natural Language Understanding Review

May 1, 2020

TinyBERT is a paper from Huawei Noah's Ark Lab that is currently under review. The code is at GitHub huawei-noah/Pretrained-Language-Model/TinyBERT, and the arXiv link is https://arxiv.org/abs/1909.10351.

Tags: paper

Layer Normalization Review

May 1, 2020

I looked up the Layer Normalization paper because it is used in BERT. The arXiv link is https://arxiv.org/abs/1607.06450. Its main contribution is reducing training time, and as the name suggests, it normalizes the activities of neurons. Batch Normalization also has a similar role...

Tags: paper

Efficient 8-Bit Quantization of Transformer Neural Machine Language Translation Model Review

April 27, 2020

This paper quantizes FP32 to INT8 in TensorFlow. It reportedly achieves a 1.5x performance improvement while losing only 0.5 BLEU in accuracy, and it also optimizes for Intel CPUs. The arXiv link is https://arxiv.org/abs/1906.00532, and the paper is from Intel.

Tags: paper

Patient Knowledge Distillation for BERT Model Compression Review

April 16, 2020

This is a model compression paper from Microsoft, accepted at EMNLP 2019, that uses PKD (Patient Knowledge Distillation). The arXiv link is https://arxiv.org/abs/1908.09355, and the code is at GitHub - intersun/PKD-for-BERT-Model-Compression.

Tags: paper

Improving Multi-Task Deep Neural Networks via Knowledge Distillation for Natural Language Understanding Review

April 16, 2020

This paper applies Knowledge Distillation to MT-DNN (Liu et al., 2019), which Microsoft had released shortly before this paper came out. The arXiv link is https://arxiv.org/abs/1904.09482, and the code is available at GitHub - namisan/mt-dnn. Unusually, other...

Tags: paper

Distilling the Knowledge in a Neural Network Review

April 16, 2020

This paper, written at Google by Geoffrey Hinton, Oriol Vinyals, and Jeff Dean, proposes the concept of distillation. The arXiv link is https://arxiv.org/abs/1503.02531, and the paper appeared at a NIPS 2014 workshop.

Tags: paper

Q8BERT: Quantized 8Bit BERT Review

April 14, 2020

This is the Q8BERT paper from Intel, presented at NeurIPS 2019. The arXiv link is https://arxiv.org/pdf/1910.06188.pdf. It applies quantization-aware training during BERT's fine-tuning phase to compress the model 4x, and accelerates computation using the 8-bit operations of Intel CPUs.

Tags: paper

FastBERT: a Self-distilling BERT with Adaptive Inference Time Review

April 14, 2020

This paper also starts from the problem that BERT is too large to serve, and applies self-distillation during fine-tuning. The work was funded by the 2019 Tencent Rhino-Bird Elite Training Program. The arXiv link is https://arxiv.org/abs/2004.02178.

Tags: paper

DynaBERT: Dynamic BERT with Adaptive Width and Depth Review

April 13, 2020

This paper notes that BERT and RoBERTa perform very well but require too much memory and computing power, and proposes a method for compressing them. The paper is still a work in progress, and the link is https://arxiv.org/abs/2004.04037. It is from Huawei.

Tags: paper

What Is a PEP (Python Enhancement Proposal)?

March 27, 2020

There are countless Python proposals named "PEP" plus a number, but by what criteria should we read and judge them? Which proposals should we read, and which can we skip? Answers to these questions...

Tags: python

Deep Learning Model Serving A-Z, Part 1 - Computation Optimization and Model Compression

March 11, 2020

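This post was published on the Pingpong team blog as Deep Learning Model Serving A-Z, Part 1 - Computation Optimization and Model Compression. Because I took part in writing some of it, for archival purposes...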

Tags: pytorch scatterlab tensorflow