To complement my linear regression in Google Docs post (and because I keep forgetting how to do it), here is a quick and dirty guide to linear regression using Python and pylab.

First, some notes. One, there is some good info on this online (how else do you think I find this stuff?). Here is a great link:

Second, remember that I do things the 'hard way' sometimes. I am not really a programmer; I am a doer. Python lets me get stuff done even if it is not theoretically the best way. That is what makes Python so great, really.

On to the problem. First, let me start with some data. I am just making this stuff up.

How about a plot? To do this, I am going to put the data into two lists (again, maybe not the best way to do it, but you can't stop me). Here is the code:
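The original data values aren't preserved here, so this sketch uses made-up numbers just to show the structure: two plain lists and a scatter plot.

```python
from pylab import scatter, xlabel, ylabel, show

# hypothetical stand-in data -- the original values were made up
# anyway, and these are NOT them, just placeholders for illustration
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [3.9, 5.1, 5.8, 7.2, 8.1, 8.9, 10.2, 11.0]

scatter(x, y)   # plot the raw data as points
xlabel('x data')
ylabel('y data')
show()
```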

This is the graph it produces:

Now to fit a linear function to that data. Here is the final code.
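A reconstruction of what the final script looks like, again with made-up data (so the slope and intercept it prints will differ from the numbers quoted later, which came from the original data):

```python
from pylab import scatter, plot, polyfit, polyval, xlabel, ylabel, show

# hypothetical stand-in data, not the original values
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [3.9, 5.1, 5.8, 7.2, 8.1, 8.9, 10.2, 11.0]

# fit a degree-1 (linear) polynomial: y = m*x + b
(m, b) = polyfit(x, y, 1)
print('slope =', m, 'intercept =', b)

# evaluate the fitted line at each x value
yp = polyval([m, b], x)

plot(x, yp)      # fitted line
scatter(x, y)    # original data as points
xlabel('x data')
ylabel('y data')
show()
```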

Let me point out a couple of the key lines.
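The first key line is the fit itself. The original code isn't shown here, so this is reconstructed from the description that follows, with made-up data:

```python
from pylab import polyfit

# hypothetical data for illustration (not the original values)
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [3.9, 5.1, 5.8, 7.2, 8.1, 8.9, 10.2, 11.0]

# degree 1 -> linear fit, giving y = m*x + b
(m, b) = polyfit(x, y, 1)
```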

This calls the polyfit function (which is in the pylab module). Polyfit takes the two data lists and a degree. In this case the degree is 1 for a linear function. The results go into the two variables *m* (for the slope) and *b* (for the y-intercept) of the equation *y* = *mx* + *b*.

Once I have the coefficients *m* and *b*, I am really finished. I could just print these and move on. But everyone always likes a nice graph. How do you graph the fitting function? That is where this line comes in:
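Reconstructed from the description that follows (the fit setup repeated here with the same made-up data so the snippet runs on its own):

```python
from pylab import polyfit, polyval

# hypothetical data for illustration (not the original values)
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [3.9, 5.1, 5.8, 7.2, 8.1, 8.9, 10.2, 11.0]
(m, b) = polyfit(x, y, 1)

# evaluate m*x + b at every x value in the data
yp = polyval([m, b], x)
```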

This just evaluates the polynomial with the coefficients [m, b] at each value of *x*. So, for every x data point I have, this calculates a y value from the fitting function. Now I have a new set of values, yp.

To plot this, I want the fitting function as a normal line and the original data as just data points. That is why I call both plot() and scatter(). Here is the graph that it produces:

This has a slope of 1.076 and an intercept of 2.771.

And there you have it. Linear fitting in Python.