| Logistic Regression | | | | is susceptible to overfitting, possibly making |
| Logistic regression is used for binary classification | | | | logistic-regression-without-regularization look worse |
| problems -- where you have some examples that | | | | than it is. |
| are "on" and other examples that are "off." You | | | | The Python Codefrom scipy.optimize.optimize |
| get as input a training set; which has some | | | | import fmin_cg, fmin_bfgs, fminimport numpy as |
| examples of each class along with a label saying | | | | npdef sigmoid(x):return 1.0 / (1.0 + |
| whether each example is "on" or "off". The goal is | | | | np.exp(-x))class SyntheticClassifierData():def |
| to learn a model from the training data so that | | | | __init__(self, N, d): |
| you can predict the label of new examples that | | | | """ Create N instances of d dimensional input |
| you haven't seen before and don't know the label | | | | vectors and a 1Dclass label (-1 or 1). """means = |
| of. | | | | .05 * np.random.randn(2, d)self.X_train = |
| | | | np.zeros((N, d))self.Y_train = np.zeros(N)for i in |
| One of the first models that would be worth | | | | range(N):if np.random.random() > .5:y = 1else:y |
| trying is logistic regression. | | | | = 0self.X_train[i, :] = np.random.random(d) + |
| Coding it up | | | | means[y, :]self.Y_train[i] = 2.0 * y - 1self.X_test = |
| I wasn't working on this exact problem, but I was | | | | np.zeros((N, d))self.Y_test = np.zeros(N)for i in |
| working on something close. Being one to practice | | | | range(N):if np.random.randn() > .5:y = 1else:y = |
| what I preach, I started looking for a dead simple | | | | 0self.X_test[i, :] = np.random.random(d) + |
| Python logistic regression class. The only | | | | means[y, :]self.Y_test[i] = 2.0 * y - 1class |
| requirement is that I wanted it to support L2 | | | | LogisticRegression(): |
| regularization (more on this later). I'm also sharing | | | | """ A simple logistic regression model with L2 |
| this code with a bunch of other people on many | | | | regularization (zero-mean |
| platforms, so I wanted as few dependencies on | | | | Gaussian priors on parameters). """def |
| external libraries as possible. | | | | __init__(self, x_train=None, y_train=None, |
| I couldn't find exactly what I wanted, so I decided | | | | x_test=None, y_test=None,alpha=.1, |
| to take a stroll down memory lane and implement | | | | synthetic=False): |
| it myself. I've written it in C++ and Matlab before | | | | # Set L2 regularization strengthself.alpha = alpha |
| but never in Python. | | | | # Set the data.self.set_data(x_train, y_train, |
| I won't do the derivation, but there are plenty of | | | | x_test, y_test) |
| good explanations out there to follow if you're not | | | | # Initialize parameters to zero, for lack of a |
| afraid of a little calculus. Just do a little Googling | | | | better choice.self.betas = |
| for "logistic regression derivation." The big idea is | | | | np.zeros(self.x_train.shape[1])def negative_lik(self, |
| to write down the probability of the data given | | | | betas):return -1 * self.lik(betas)def lik(self, betas): |
| some setting of internal parameters, then to take | | | | """ Likelihood of the data under the current |
| the derivative, which will tell you how to change | | | | settings of parameters. """ |
| the internal parameters to make the data more | | | | # Data likelihoodl = 0for i in range(self.n):l += |
| likely. Got it? Good. | | | | log(sigmoid(self.y_train[i] *np.dot(betas, |
| For those of you out there that know logistic | | | | self.x_train[i,:]))) |
| regression inside and out, take a look at how | | | | # Prior likelihoodfor k in range(1, |
| short the train() method is. I really like how easy | | | | self.x_train.shape[1]):l -= (self.alpha / 2.0) * |
| it is to do in Python. | | | | self.betas[k]**2return ldef train(self): |
| Regularization | | | | """ Define the gradient and hand it off to a scipy |
| I caught a little indirect flak during March madness | | | | gradient-basedoptimizer. """ |
| season for talking about how I regularized the | | | | # Define the derivative of the likelihood with |
| latent vectors in my matrix-factorization model of | | | | respect to beta_k. |
| team offensive and defensive strengths when | | | | # Need to multiply by -1 because we will be |
| predicting outcomes in NCAA basketball. | | | | minimizing.dB_k = lambda B, k : np.sum([-self.alpha |
| Apparently people thought I was talking nonsense | | | | * B[k] +self.y_train[i] * self.x_train[i, k] |
| -- crazy, right? | | | | *sigmoid(-self.y_train[i] *np.dot(B, |
| But seriously, guys -- regularization is a good idea. | | | | self.x_train[i,:]))for i in range(self.n)]) * -1 |
| Let me drive home the point. Take a look at the | | | | # The full gradient is just an array of |
| results of running the code (linked at the bottom). | | | | componentwise derivativesdB = lambda B : |
| Take a look at the top row. | | | | np.array([dB_k(B, k)for k in |
| On the left side, you have the training set. There | | | | range(self.x_train.shape[1])]) |
| are 25 examples laid out along the x axis, and the | | | | # Optimizeself.betas = fmin_bfgs(self.negative_lik, |
| y axis tells you if the example is "on" (1) or "off" | | | | self.betas, fprime=dB)def set_data(self, x_train, |
| (0). For each of these examples, there's a vector | | | | y_train, x_test, y_test): |
| describing its attributes that I'm not showing. | | | | """ Take data that's already been generated. |
| After training the model, I ask the model to | | | | """self.x_train = x_trainself.y_train = |
| ignore the known training set labels and to | | | | y_trainself.x_test = x_testself.y_test = |
| estimate the probability that each label is "on" | | | | y_testself.n = y_train.shape[0]def |
| based only on the examples's description vectors | | | | training_reconstruction(self):p_y1 = |
| and what the model has learned (hopefully things | | | | np.zeros(self.n)for i in range(self.n):p_y1[i] = |
| like stronger earthquakes and older buildings | | | | sigmoid(np.dot(self.betas, self.x_train[i,:]))return |
| increase the likelihood of collapse). The probabilities | | | | p_y1def test_predictions(self):p_y1 = |
| are shown by the red X's. In the top left, the red | | | | np.zeros(self.n)for i in range(self.n):p_y1[i] = |
| X's are right on top of the blue dots, so it is very | | | | sigmoid(np.dot(self.betas, self.x_test[i,:]))return |
| sure about the labels of the examples, and it's | | | | p_y1def |
| always correct. | | | | ), .5 + .5 * self.y_train, 'bo')plot(np.arange(self.n), |
| Now on the right side, we have some new | | | | self.training_reconstruction(), 'rx')ylim([-.1, 1.1])def |
| examples that the model hasn't seen before. This | | | | plot_test_predictions(self):plot(np.arange(self.n), .5 |
| is called the test set. This is essentially the same | | | | + .5 * self.y_test, 'yo')plot(np.arange(self.n), |
| as the left side, but the model knows nothing | | | | self.test_predictions(), 'rx')ylim([-.1, 1.1])if |
| about the test set class labels (yellow dots). What | | | | __name__ == "__main__":from pylab import * |
| you see is that it still does a decent job of | | | | # Create 20 dimensional data set with 25 points |
| predicting the labels, but there are some troubling | | | | -- this will be |
| cases where it is very confident and very wrong. | | | | # susceptible to overfitting.data = |
| This is known as overfitting. | | | | SyntheticClassifierData(25, 20) |
| This is where regularization comes in. As you go | | | | # Run for a variety of regularization |
| down the rows, there is stronger L2 regularization | | | | strengthsalphas = [0, .001, .01, .1]for j, a in |
| -- or equivalently, pressure on the internal | | | | enumerate(alphas): |
| parameters to be zero. This has the effect of | | | | # Create a new learner, but use the same data |
| reducing the model's certainty. Just because it can | | | | for each runlr = |
| perfectly reconstruct the training set doesn't | | | | LogisticRegression(x_train=data.X_train, |
| mean that it has everything figured out. You can | | | | y_train=data.Y_train,x_test=data.X_test, |
| imagine that if you were relying on this model to | | | | y_test=data.Y_test,alpha=a)print "Initial |
| make important decisions, it would be desirable to | | | | likelihood:"print lr.lik(lr.betas) |
| have at least a bit of regularization in there. | | | | # Train the modellr.train() |
| And here's the code. It looks long, but most of it | | | | # Display execution infoprint "Final betas:"print |
| is to generate the data and plot the results. The | | | | lr.betasprint "Final lik:"print lr.lik(lr.betas) |
| bulk of the work is done in the train() method, | | | | # Plot the resultssubplot(len(alphas), 2, 2*j + |
| which is only three (dense) lines. It requires | | | | 1)lr.plot_training_reconstruction()ylabel("Alpha=%s" |
| numpy, scipy, and pylab. | | | | % a)if j == 0:title("Training set |
| * For full disclosure, I should admit that I | | | | reconstructions")subplot(len(alphas), 2, 2*j + 2)lr. |
| generated my random data in a way such that it | | | | |