
Lecture 03 Linear Regression

LINEAR REGRESSION

Why a linear mapping?

Evaluating predictions

Measure “closeness” between predictions and labels

Appropriate measure for loss

Ridge regression
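
For reference, the objectives these notes build on (X is the n x d feature matrix, y the label vector, \lambda the regularization parameter):

\min_{w} \|Xw - y\|_2^2 \quad \text{(least squares)}

\min_{w} \|Xw - y\|_2^2 + \lambda \|w\|_2^2 \quad \text{(ridge regression)}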

MILLION SONG REGRESSION PIPELINE

Raw data

Feature extraction: quadratic features

Supervised learning: least squares regression

Evaluation (part 1)

Grid search: exhaustive search through hyper-parameter space
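
A minimal sketch of the last two steps (ridge regression plus grid search over the regularization parameter) in plain NumPy. This is not the course code; the lambda grid and the array names X_train, y_train, X_val, y_val are illustrative placeholders.

import numpy as np

def ridge_closed_form(X, y, lam):
    # Closed-form ridge solution: w = (X^T X + lam * I)^{-1} X^T y
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def rmse(w, X, y):
    return np.sqrt(np.mean((X @ w - y) ** 2))

# Grid search: exhaustively try each candidate value and keep the one
# with the lowest validation error.
best_lam, best_err = None, float('inf')
for lam in [1e-10, 1e-5, 1.0]:
    w = ridge_closed_form(X_train, y_train, lam)
    err = rmse(w, X_val, y_val)
    if err < best_err:
        best_lam, best_err = lam, err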

DISTRIBUTED MACHINE LEARNING: COMPUTATION AND STORAGE

Linear regression is appealing because it has a closed-form solution, but that solution raises computational issues at scale.
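
For reference, the closed-form (normal equation) solution:

w = (X^\top X)^{-1} X^\top y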

In terms of computation: forming X^T X takes O(nd^2) operations and inverting the resulting d x d matrix takes O(d^3).

In terms of storage: X takes O(nd) floats, and X^T X together with its inverse take O(d^2).

Scenario 1: n is large and d is small. We can distribute the data points, i.e., the rows of X, across different machines.

The matrix product X^T X can be computed as a sum of outer products, one per row of X.

The above steps can be summarized in the following Spark code:

trainData.map(computeOuterProduct).reduce(sumAndInverse)
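
A slightly fuller sketch of the same computation (the helper names here and above are illustrative, not the course's exact implementations). Each row contributes an outer product x x^T and a scaled vector y * x; the cluster sums these small pieces and the driver solves the resulting d x d system. Assumes trainData is an RDD of (features, label) NumPy pairs.

import numpy as np

def computeOuterProduct(point):
    # Per-row contribution to the normal equations: (x x^T, y * x)
    x, y = point
    return np.outer(x, x), y * x

def sumPieces(a, b):
    # Element-wise sum of the (d x d, d) pairs across rows/partitions
    return a[0] + b[0], a[1] + b[1]

XtX, Xty = trainData.map(computeOuterProduct).reduce(sumPieces)
w = np.linalg.solve(XtX, Xty)   # small d x d solve on the driver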

Scenario 2: d grows large (and n is also large). Storing and inverting the d x d matrix X^T X is no longer feasible, which motivates gradient descent.

GRADIENT DESCENT
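
For least-squares loss, each iteration applies the standard update (each summand is one training point's contribution, which is what gradientSummand computes in the code below):

w_{i+1} = w_i - \alpha_i \sum_{j=1}^{n} \left( w_i^\top x_j - y_j \right) x_j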

Choosing the step size: a common choice is a decreasing schedule, e.g. alpha_i = alpha / (n * sqrt(i+1)), as used in the code below.

Parallelizing gradient descent:

for i in range(numIters):
  # Decreasing step size: alpha / (n * sqrt(i+1))
  alpha_i = alpha / (n * np.sqrt(i + 1))
  # Each worker computes per-point gradient summands; sum() adds them up
  gradient = train.map(lambda lp: gradientSummand(w, lp)).sum()
  w -= alpha_i * gradient

return w
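
gradientSummand is not defined in these notes; for least-squares loss it is just one point's term from the update above. A sketch, assuming LabeledPoint-like records with .features and .label:

import numpy as np

def gradientSummand(w, lp):
    # One point's contribution to the gradient: (w . x - y) * x
    x = np.asarray(lp.features)
    return (np.dot(w, x) - lp.label) * x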

Gradient descent summary

Pros: easily parallelized; each iteration is cheap, with computation linear in n and d and only the O(d) weight vector to communicate; it also applies to losses beyond least squares.

Cons: convergence is slow, so many passes over the data (and hence many communication rounds) are needed, and the step size is a hyperparameter that must be tuned.

COMMUNICATION HIERARCHY

DISTRIBUTED MACHINE LEARNING: COMMUNICATION PRINCIPLES

2nd rule of thumb: perform parallel and in-memory computation. Persisting data in memory reduces communication, since the training set is not re-read from disk on every iteration.

# Cache the training set so each iteration reads it from memory
# instead of re-reading it from disk (or recomputing it).
train.cache()

for i in range(numIters):
  alpha_i = alpha / (n * np.sqrt(i + 1))
  gradient = train.map(lambda lp: gradientSummand(w, lp)).sum()
  w -= alpha_i * gradient
return w

3rd rule of thumb: minimize network communication. First observation: we need to store and potentially communicate the data, the model, and intermediate objects.

Second observation: ML methods are typically iterative, so this communication happens over and over.

Extreme: divide and conquer

train.mapPartitions(localLinearRegression).reduce(combineLocal)
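
One way to realize the two placeholder helpers above: each partition solves its own least-squares problem locally, and the per-partition weight vectors are averaged, so only a single round of communication is needed (at the cost of an approximate answer). A sketch, assuming train is an RDD of (features, label) NumPy pairs:

import numpy as np

def localLinearRegression(points):
    # Solve least squares on this partition's data only
    pts = list(points)
    X = np.array([x for x, _ in pts])
    y = np.array([label for _, label in pts])
    w_local, *_ = np.linalg.lstsq(X, y, rcond=None)
    yield (w_local, 1)   # carry a count so the driver can average

def combineLocal(a, b):
    return a[0] + b[0], a[1] + b[1]

w_sum, num_parts = train.mapPartitions(localLinearRegression).reduce(combineLocal)
w = w_sum / num_parts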

Less extreme: mini-batch updates

for i in range(fewerIters):
  # Each partition does several local updates; one communication
  # round per outer iteration combines them.
  update = train.mapPartitions(doSomeLocalGradientUpdates).reduce(combineLocalUpdates)
  w += update
return w
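
Again, the helper names are placeholders. A minimal sketch of what the partition-local update might look like: each worker takes a few gradient steps on its own partition's data and returns the resulting change to w, and the driver averages the per-partition changes, so there is one communication round per outer iteration instead of one per gradient step. The step count and names are illustrative; w and alpha are captured from the driver.

import numpy as np

def doSomeLocalGradientUpdates(points, w, alpha, local_steps=10):
    # Take a few gradient steps on this partition's data only,
    # then return the resulting change to w.
    pts = list(points)
    w_local = w.copy()
    for _ in range(local_steps):
        grad = sum((np.dot(w_local, x) - y) * x for x, y in pts)
        w_local -= alpha * grad
    yield w_local - w

for i in range(fewerIters):
    deltas = train.mapPartitions(lambda pts: doSomeLocalGradientUpdates(pts, w, alpha))
    w += deltas.reduce(lambda a, b: a + b) / train.getNumPartitions()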

We can amortize latency: doing more local computation per communication round spreads the fixed per-round network latency over more useful work.

Lecture 04 Logistic Regression and Click-through Rate Pipeline

ONLINE ADVERTISING

$43B in 2013

The players

Efficient Matchmaking

Publishers Get Billions of Impressions Per Day