📃 ZeRO: Memory Optimization Towards Training A Trillion Parameter Models (Review)

May 1, 2020

This paper drew a lot of attention for outperforming Megatron-LM as a framework for training very large models. The arXiv link is https://arxiv.org/abs/1910.02054, and a PyTorch implementation is available at GitHub - microsoft/DeepSpeed.
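
The core idea, reproduced here as a back-of-the-envelope sketch rather than anything from DeepSpeed itself, is that mixed-precision Adam keeps roughly 16 bytes of state per parameter, and ZeRO partitions the optimizer states, gradients, and parameters across the data-parallel ranks instead of replicating them on every GPU:

```python
# Rough per-GPU memory accounting behind ZeRO-DP (illustrative only).
# Mixed-precision Adam: 2 bytes fp16 params + 2 bytes fp16 grads
# + 12 bytes fp32 optimizer states (master params, momentum, variance).
def per_gpu_gb(num_params, num_gpus, shard_states=False,
               shard_grads=False, shard_params=False):
    params, grads, states = 2.0, 2.0, 12.0   # bytes per parameter
    if shard_states:  states /= num_gpus     # stage P_os
    if shard_grads:   grads  /= num_gpus     # stage P_os+g
    if shard_params:  params /= num_gpus     # stage P_os+g+p
    return num_params * (params + grads + states) / 1e9

psi, n = 7.5e9, 64  # the 7.5B-parameter, 64-GPU example from the paper
print(per_gpu_gb(psi, n))                    # baseline data parallel: 120.0 GB
print(per_gpu_gb(psi, n, True))              # ~31.4 GB
print(per_gpu_gb(psi, n, True, True))        # ~16.6 GB
print(per_gpu_gb(psi, n, True, True, True))  # ~1.9 GB
```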

Tags: paper
Read More

📃 TinyBERT: Distilling BERT For Natural Language Understanding (Review)

May 1, 2020

TinyBERT is a paper currently under review, from Huawei's Noah's Ark Lab. The code is at GitHub - huawei-noah/Pretrained-Language-Model/TinyBERT, and the arXiv link is https://arxiv.org/abs/1909.10351.

Tags: paper
Read More

📃 Layer Normalization (Review)

May 1, 2020

Layer Normalization์€ BERT์— ์“ฐ์ด๋Š” ๊ฒƒ ๋•Œ๋ฌธ์— ์ฐพ์•„๋ณด๊ฒŒ ๋œ ๋…ผ๋ฌธ์ด๋‹ค. arxiv ๋งํฌ๋Š” https://arxiv.org/abs/1607.06450์ด๋‹ค. training์‹œ๊ฐ„์„ ์ค„์ด๋Š” ๊ฒƒ์ด ํฐ ๊ธฐ์—ฌ์ธ๋ฐ, ์ด๋ฆ„์—์„œ ์•Œ ์ˆ˜ ์žˆ๋“ฏ์ด neuron์˜ activity๋ฅผ normalizeํ•˜๋Š” ๊ฒƒ์ด๋‹ค. Batch Normalization๋„ ๋น„์Šทํ•œ ์—ญํ• ์„...

Tags: paper
Read More

📃 Efficient 8-Bit Quantization of Transformer Neural Machine Language Translation Model (Review)

April 27, 2020

TensorFlow ์ƒ์—์„œ FP32๋ฅผ INT8๋กœ quantization์„ ํ•ด๋ณด๋Š” ๋…ผ๋ฌธ์ด๋‹ค. 1.5๋ฐฐ์˜ ์„ฑ๋Šฅ ํ–ฅ์ƒ์„ ์–ป์œผ๋ฉด์„œ 0.5 BLEU score accuracy๋งŒ ๋–จ์–ด์กŒ๋‹ค๊ณ  ํ•œ๋‹ค. ๋˜ํ•œ intel cpu์— ์ตœ์ ํ™”๋ฅผ ์ง„ํ–‰ํ–ˆ๋‹ค. arxiv ๋งํฌ๋Š” https://arxiv.org/abs/1906.00532์ด๊ณ , intel์—์„œ ๋‚˜์˜จ ๋…ผ๋ฌธ์ด๋‹ค.

Tags: paper
Read More

📃 Patient Knowledge Distillation for BERT Model Compression (Review)

April 16, 2020

A model compression paper from Microsoft, accepted at EMNLP 2019, built on PKD (Patient Knowledge Distillation). The arXiv link is https://arxiv.org/abs/1908.09355, and the code is at GitHub - intersun/PKD-for-BERT-Model-Compression.
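
The "patient" part of PKD is that the student matches not only the teacher's softened logits but also the normalized [CLS] hidden states of selected intermediate teacher layers (PKD-Skip takes every k-th layer, PKD-Last the last ones). A hedged PyTorch-style sketch of that combined loss; the weights and names here are mine, not the paper's code:

```python
import torch
import torch.nn.functional as F

def pkd_loss(s_logits, t_logits, s_cls, t_cls, labels,
             T=4.0, alpha=0.5, beta=10.0):  # illustrative hyperparameters
    ce = F.cross_entropy(s_logits, labels)  # hard-label cross entropy
    kd = F.kl_div(F.log_softmax(s_logits / T, dim=-1),
                  F.softmax(t_logits / T, dim=-1),
                  reduction="batchmean") * T * T   # match softened logits
    # "Patient" term: MSE between normalized intermediate [CLS] vectors,
    # one pair per supervised student layer.
    patience = sum(F.mse_loss(F.normalize(s, dim=-1), F.normalize(t, dim=-1))
                   for s, t in zip(s_cls, t_cls))
    return (1 - alpha) * ce + alpha * kd + beta * patience

s_cls = [torch.randn(8, 768) for _ in range(4)]   # dummy student [CLS] states
t_cls = [torch.randn(8, 768) for _ in range(4)]   # dummy teacher [CLS] states
print(pkd_loss(torch.randn(8, 2), torch.randn(8, 2),
               s_cls, t_cls, torch.randint(0, 2, (8,))))
```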

Tags: paper
Read More

📃 Improving Multi-Task Deep Neural Networks via Knowledge Distillation for Natural Language Understanding (Review)

April 16, 2020

์ด ๋…ผ๋ฌธ์ด ๋‚˜์˜ค๊ธฐ ์–ผ๋งˆ ์ „์— ๋งˆ์ดํฌ๋กœ ์†Œํ”„ํŠธ์—์„œ ๋‚˜์˜จ MT-DNN (Liu et al., 2019)์— Knowledge Distillation์„ ์ ์šฉํ•œ ๋…ผ๋ฌธ์ด๋‹ค. arvix๋งํฌ๋Š” https://arxiv.org/abs/1904.09482์ด๊ณ  ์ฝ”๋“œ๋Š” GitHub - namisan/mt-dnn์—์„œ ํ™•์ธ ๊ฐ€๋Šฅํ•˜๋‹ค. ํŠน์ดํ•˜๊ฒŒ ๋‹ค๋ฅธ...

Tags: paper
Read More

📃 Q8BERT: Quantized 8Bit BERT (Review)

April 14, 2020

intel์—์„œ ๋‚˜์˜จ NeurIPS 2019์— ๋ฐœํ‘œ๋œ Q8BERT ๋…ผ๋ฌธ์ด๋‹ค. arxiv ๋งํฌ๋Š” https://arxiv.org/pdf/1910.06188.pdf์ด๋‹ค. BERT๋ฅผ fine tuning phase๋•Œ quantization aware training์„ ์ ์šฉํ•˜์—ฌ 4๋ฐฐ ์••์ถ•ํ•˜๊ณ , intel CPU์˜ 8bit ์—ฐ์‚ฐ์„ ์‚ฌ์šฉํ•ด ์—ฐ์‚ฐ์„ ๊ฐ€์†ํ–ˆ๋‹ค.

Tags: paper
Read More

📃 FastBERT: a Self-distilling BERT with Adaptive Inference Time (Review)

April 14, 2020

This paper, too, starts from the observation that BERT is too large a model to serve, and tries applying self-distillation during fine-tuning. The work was funded by the 2019 Tencent Rhino-Bird Elite Training Program. The arXiv link is https://arxiv.org/abs/2004.02178.
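
The "adaptive inference time" in the title comes from attaching a small student classifier to every layer: at inference, each layer's classifier makes a prediction, and if it is confident enough (the paper gauges confidence with the entropy of the predicted distribution against a tunable "speed" threshold), the remaining layers are skipped. A toy sketch of that control flow with hypothetical stand-in modules; real FastBERT exits per sample, not per batch:

```python
import torch
import torch.nn.functional as F

def entropy(p):
    return -(p * p.clamp_min(1e-12).log()).sum(dim=-1)

def adaptive_forward(layers, heads, x, speed=0.5):
    # layers/heads are hypothetical stand-ins for the backbone blocks and
    # the per-layer student classifiers; `speed` is the exit threshold.
    for layer, head in zip(layers, heads):
        x = layer(x)
        probs = F.softmax(head(x[:, 0]), dim=-1)   # classify on [CLS]
        if entropy(probs).mean() < speed:          # confident: exit early
            return probs
    return probs                                   # reached the last layer

layers = [torch.nn.Linear(16, 16) for _ in range(4)]  # toy backbone blocks
heads = [torch.nn.Linear(16, 2) for _ in range(4)]
print(adaptive_forward(layers, heads, torch.randn(8, 5, 16)))
```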

Tags: paper
Read More

📃 DynaBERT: Dynamic BERT with Adaptive Width and Depth (Review)

April 13, 2020

์ด ๋…ผ๋ฌธ์—์„œ๋Š” BERT, RoBERTa๊ฐ€ ๋งค์šฐ ์ข‹์€ ์„ฑ๋Šฅ์„ ๋ณด์ด์ง€๋งŒ, memory, computing power๊ฐ€ ๋„ˆ๋ฌด ๋งŽ์ด ํ•„์š”ํ•˜๋ฏ€๋กœ ๊ทธ๋ฅผ ์••์ถ•ํ•ด๋ณด๋Š” ๋ฐฉ๋ฒ•์„ ์ œ์•ˆํ•œ๋‹ค. ์•„์ง WIP์ธ ๋…ผ๋ฌธ์ด๊ณ , https://arxiv.org/abs/2004.04037๊ฐ€ ๋งํฌ์ด๋‹ค. ํ™”์›จ์ด์—์„œ ๋‚˜์˜จ ๋…ผ๋ฌธ์ด๋‹ค.

Tags: paper
Read More