🤖 2019 Machine Learning Study Jam, Intermediate

๋จธ์‹ ๋Ÿฌ๋‹ ์Šคํ„ฐ๋””์žผ ์ค‘๊ธ‰๋ฐ˜!! ๋„ ํ•˜๊ฒŒ ๋˜์–ด์„œ ๊ฐ„๋žตํ•˜๊ฒŒ ์ •๋ฆฌํ•ด๋ณด๋Š” ํฌ์ŠคํŠธ๋ฅผ ์ž‘์„ฑํ•˜๋ ค ํ•œ๋‹ค. ์ด๋ฒˆ์—๋„ coursera์™€ qwiklab์„ ์ด์šฉํ•˜๋Š” ๊ฒƒ์„ ๋ณด์ธ๋‹ค. cousera๋Š” Launching Machine Learning์„ ์ˆ˜๊ฐ•ํ•˜๋Š” ๊ฒƒ์ด ๋ชฉํ‘œ์ด๊ณ , qwiklab์€ Classify Images of Clouds in the Cloud with AutoML Vision์„ ์ˆ˜๊ฐ•ํ•˜๋Š” ๊ฒƒ์ด ๋ชฉํ‘œ์ด๋‹ค.

์•„๋Š” ๋‚ด์šฉ์€ ํ‚ค์›Œ๋“œ๋งŒ ์ ๊ณ  ๋„˜์–ด๊ฐ€๊ณ  ํ—ท๊ฐˆ๋ฆฌ๋Š” ๋ถ€๋ถ„ & ๋ชจ๋ฅด๋Š” ๋‚ด์šฉ๋งŒ ์ž์„ธํžˆ ์ •๋ฆฌํ•œ๋‹ค.

๊ฐ•์˜์—์„œ๋Š” ์•„๋ž˜์™€ ๊ฐ™์€ ๋‚ด์šฉ์„ ๋ฐฐ์šด๋‹ค๊ณ  ํ•œ๋‹ค.

  • ๋”ฅ๋Ÿฌ๋‹์ด ์™œ ๊ทธ๋ ‡๊ฒŒ ์ด์Šˆ๊ฐ€ ๋˜์—ˆ๋Š”๊ฐ€
  • loss function๊ณผ performance metric์„ ์ด์šฉํ•ด ์ตœ์ ํ™”ํ•˜๊ธฐ
  • ml์—์„œ ๋‚˜์˜จ ๋ฌธ์ œ๋“ค์„ ์‰ฝ๊ฒŒ ํ’€์–ด๋ณด๊ธฐ (์–ด๋–ป๊ฒŒ ๋ฒˆ์—ญํ•ด์•ผํ• ์ง€ ๋ชจ๋ฅด๊ฒ ๋‹ค. ์›๋ž˜์˜ ๋ง์€ mitigate common problems that arise in machine learning์ด๋‹ค)
  • test dataset์„ ๋ชจ์œผ๊ณ , ํ›ˆ๋ จํ•˜๊ณ  evaluation๊นŒ์ง€ ํ•ด๋ณด๊ธฐ

์ด๋Ÿฐ ๋‚ด์šฉ์„ ์•„๋ž˜์™€ ๊ฐ™์€ ๋ชจ๋“ˆ๋กœ ๋‚˜๋ˆ„์–ด์„œ ๊ฐ•์˜๋ฅผ ํ•œ๋‹ค.

  1. Practical ML
  2. Optimization
  3. Generalization and Sampling
  4. Summary

Practical ML

์ด ๋ชจ๋“ˆ์—์„œ๋Š” ML์˜ ์ฃผ์š”ํ•œ ๋ฌธ์ œ๋“ค์„ ์‚ดํŽด๋ณด๊ณ  ์™œ ๊ทธํ† ๋ก ์ด์Šˆ๊ฐ€ ๋˜์—ˆ๋Š”์ง€ ์‚ดํŽด๋ณธ๋‹ค.

์•„๋ž˜์™€ ๊ฐ™์€ ํ‚ค์›Œ๋“œ๋ฅผ ๋‹ค๋ฃฌ๋‹ค.

  • Supervised machine learning vs. unsupervised
  • Two types of supervised machine learning
    • classification vs. regression
    • classification predicts a category
    • regression predicts a numeric value

Short History of ML

Linear Regression

The first topic is a short history of linear regression. Linear regression was developed to understand natural phenomena such as the motion of the planets. It multiplies each input feature by a weight and sums the results to produce a prediction, i.e. it uses an equation of the form ŷ = w₁x₁ + w₂x₂ + … + wₙxₙ + b.

๋จธ์‹ ๋Ÿฌ๋‹์—์„œ๋„ ๋งŽ์ด ์‚ฌ์šฉ๋˜๋Š” ์‹์ด๋‹ค. ๊ทผ๋ฐ, weight๋ฅผ ์„ ํƒํ•  ๋ฐฉ๋ฒ•์ด ์—†์–ด ์ ๋‹นํ•œ ๊ฐ’์„ ์ฐพ๊ธฐ ์œ„ํ•ด loss function์„ ๋งŒ๋“ค์—ˆ๋‹ค. ์—ฌ๊ธฐ์„œ ์„ค๋ช…ํ•˜๋Š” loss function์€ mean squared error์˜€๋‹ค. ํ•˜์ง€๋งŒ, ๊ทธ loss function์—์„œ ๋ฐ”๋กœ weight๋ฅผ ๋ฝ‘์•„๋‚ด๊ธฐ์—๋„ ์–ด๋ ค์›€์ด ๋”ฐ๋ฅด๋‹ˆ, optimizing ํ•˜๋Š” ๋‹ค๋ฅธ ๋ฐฉ๋ฒ•์„ ์ƒ๊ฐํ•œ ๊ฒƒ์ด Gradient Descent์ด๋‹ค.

Perceptron

In the late 1950s, Frank Rosenblatt showed that a model loosely based on the human brain could learn simple functions: a single-layer perceptron can learn linear functions (it is a linear classifier). It cannot, however, learn anything non-linear, such as XOR.
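A minimal sketch of Rosenblatt's learning rule (the AND dataset and epoch count are my own illustration): on a linearly separable function like AND it converges, while no setting of the same single-layer weights could fit XOR.

```python
def perceptron_train(samples, epochs=20):
    """Rosenblatt's rule for a single-layer perceptron: on each
    mistake, move the weights toward the misclassified sample."""
    w0, w1, b = 0.0, 0.0, 0.0
    for _ in range(epochs):
        for (x0, x1), target in samples:
            pred = 1 if w0 * x0 + w1 * x1 + b > 0 else 0
            err = target - pred          # 0 when correct, +-1 when wrong
            w0 += err * x0
            w1 += err * x1
            b += err
    return w0, w1, b

# AND is linearly separable, so the perceptron can learn it.
AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w0, w1, b = perceptron_train(AND)
print([1 if w0 * x0 + w1 * x1 + b > 0 else 0
       for (x0, x1), _ in AND])  # [0, 0, 0, 1]
```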

Neural Network

XOR๊ฐ™์€ ๊ฒƒ์„ ํ•™์Šตํ•˜์ง€ ๋ชปํ•˜๋‹ˆ๊นŒ ๊ทธ๋ž˜์„œ ์—ฌ๋Ÿฌ ๋‹จ์˜ layer๋ฅผ ์Œ“๊ธฐ ์‹œ์ž‘ํ–ˆ๋‹ค. ๊ทธ ์‚ฌ์ด์— activation function๋„ ๋„ฃ๊ณ . ํŠนํžˆ ReLU๊นŒ์ง€ ์ ์šฉ๋˜๊ธฐ ์‹œ์ž‘ํ•˜๊ณ  ๋‚˜์„œ๋Š” ๊ต‰์žฅํžˆ ๋น ๋ฅด๊ฒŒ ํ•™์Šต๋„ ๊ฐ€๋Šฅํ•ด์กŒ๋‹ค.

Decision Trees

A decision tree learns a piecewise linear decision boundary, which is both easy to train and easy for people to interpret, and it can be used for both classification and regression. True to the name, each node is a linear classifier on a single feature.
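For example, a learned tree behaves like the hand-written function below (the feature names and thresholds are made up for illustration); each if is one node testing a single feature, so the overall boundary is piecewise linear:

```python
def predict(petal_length, petal_width):
    """A hand-built decision tree: each node thresholds one feature."""
    if petal_length < 2.5:      # root node
        return "class_a"
    if petal_width < 1.8:       # second-level node
        return "class_b"
    return "class_c"

print(predict(1.0, 0.5), predict(4.0, 1.0), predict(6.0, 2.2))
# class_a class_b class_c
```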

Kernel Methods

SVM์€ ํ˜์‹ ์ ์ด์—ˆ๋‹ค..! ํ•˜์ง€๋งŒ, linearํ•˜๊ฒŒ decision boundary๋ฅผ ๊ฒฐ์ •ํ–ˆ์œผ๋ฏ€๋กœ, non linearํ•œ ๊ฒƒ์— ๋Œ€ํ•ด์„œ๋Š” ์ ์ ˆํ•˜๊ฒŒ ๋‚˜๋ˆŒ ์ˆ˜๊ฐ€ ์—†์—ˆ๋‹ค. ๊ทธ๋ž˜์„œ kernel transform์„ ์ ์šฉํ•˜๊ธฐ ์‹œ์ž‘ํ–ˆ๋‹ค. ๊ทธ ๋ฐฉ๋ฒ•์„ ์ ์šฉํ•œ SVM์ด kernelized svm. NN์—์„œ๋Š” layer์— ๋” ๋งŽ์€ neuron์„ ๋„ฃ์–ด์ฃผ๋Š” ๊ฒƒ์ด higher dimension์œผ๋กœ mappingํ•ด์ฃผ๋Š” ์š”์†Œ๋กœ ์ƒ๊ฐํ•˜๋ฉด ๋œ๋‹ค.

Random Forests

๋” ๋งŽ์€ classifier, regressor๋ฅผ ์‚ฌ์šฉํ•ด์„œ ensembleํ•˜๋Š” ๊ฒƒ์ด ๋” ์ข‹์€ ์„ฑ๋Šฅ์„ ๋‚ด๋ฏ€๋กœ, ์ •๋ง ๊ทธ๋ ‡๊ฒŒ ํ•œ ๊ฒƒ์ด๋‹ค. Tree -> Forest๋กœ ์ƒ๊ฐํ•˜๋ฉด..? ํ•˜๋‚˜์˜ tree๊ฐ€ ๋ชจ๋“  ๊ฒƒ์„ ๊ธฐ์–ตํ•  ์ˆ˜ ์—†๊ณ , ๋…๋ฆฝ์ ์œผ๋กœ ๊ณ ๋ คํ•ด์•ผํ•  ์š”์†Œ๊ฐ€ ์žˆ์„ ์ˆ˜ ์žˆ์œผ๋‹ˆ ์ด๋ ‡๊ฒŒ ํ•˜๋Š” ๊ฒƒ ๊ฐ™๋‹ค. random holdout์„ ์‚ฌ์šฉํ•˜๋Š” k-fold validation์™€๋„ ๋น„์Šทํ•˜๋‹ค๊ณ  ํ•œ๋‹ค.

Modern Neural Networks

As hardware improved and many good techniques for neural networks were proposed, DNNs started to become really famous. This video only talked about a few specific models though, so it skipped over a lot.

Optimization

์•„๋ž˜์™€ ๊ฐ™์€ ๊ฒƒ๋“ค์„ ๋ฐฐ์šด๋‹ค๊ณ  ํ•œ๋‹ค.

  • Quantify model performance using loss functions
  • Use loss functions as the basis for an algorithm called gradient descent
  • Optimize gradient descent to be as efficient as possible
  • Use performance metrics to make business decisions

Defining ML Models

ML ๋ชจ๋ธ์€ parameter์™€ hyper parameter๋กœ ์ด๋ฃจ์–ด์ง„ ํ•จ์ˆ˜๋ผ๊ณ  ์ƒ๊ฐํ•  ์ˆ˜ ์žˆ๋‹ค. ํ˜„์žฌ ๋ฐ์ดํ„ฐ์— ๋Œ€ํ•ด์„œ๋งŒ decision boundary๋ฅผ ๊ธ‹๋Š” ๊ฒƒ์ด ์•„๋‹ˆ๋ผ ์ƒˆ๋กœ ๋‚˜ํƒ€๋‚  ๋ฐ์ดํ„ฐ์— ๋Œ€ํ•ด decision boundary๋ฅผ ๊ธ‹๋Š” ๊ฒƒ๋„ ์ค‘์š”ํ•˜๋‹ค.

Introducing Loss Functions

A loss function measures how far off we are from the answer we want. A loss function like RMSE can be applied.

RMSE often works well, but it does not work well for classification. So cross-entropy loss is sometimes applied instead to handle classification problems.
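A quick sketch of both losses (the example targets and predictions are my own): RMSE penalizes large numeric errors, while cross-entropy heavily penalizes confident wrong probabilities, which is why it suits classification.

```python
import math

def rmse(targets, preds):
    """Root mean squared error: the regression loss mentioned above."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(targets, preds))
                     / len(targets))

def cross_entropy(targets, probs):
    """Binary cross-entropy over predicted probabilities in (0, 1)."""
    return -sum(t * math.log(p) + (1 - t) * math.log(1 - p)
                for t, p in zip(targets, probs)) / len(targets)

print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))       # 0.0
print(round(cross_entropy([1, 0], [0.9, 0.1]), 3))  # 0.105
```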

์ƒ๊ฐํ•ด๋ณด๋‹ˆ ์œ„์— ๊ฒƒ๋“ค์„ ํ•„๊ธฐํ•œ ์ดํ›„๋กœ ๋‚ด์šฉ ์ฒดํฌ๋ฅผ ์•ˆํ•˜๊ณ  ๋๋‚ด๋ฒ„๋ ธ๋‹ค.... ใ… ใ… ใ… ใ… ใ… ใ… ใ… ใ… ใ… 

Written May 19, 2019
Tags: studyjam, community