Well, welcome back. We're going to keep going, playing around with returns volatility, and start looking at risk-adjusted returns. So let's go back into our Python notebook, and we don't need that space here. Let's start by reading the same data that we read last time. So I'm going to import my Pandas code: import pandas as pd. Then, if you remember, I had prices, and I'm going to use the read_csv method built into Pandas, and the file that I was trying to read is called sample_prices.csv. Then we said once we've got the prices, we can convert them to returns by simply calling the pct_change, or percentage change, method on the prices DataFrame. So let's see if all this works by just printing this out. There you go. You can see that we've got a bunch of returns. Now, this may bother you a little bit: if you remember, this happened because the percentage change is trying to compute the change in prices, but of course on the first day we don't have a change in price, because we don't know the previous price. This happens a lot, where you've got a DataFrame with some entries that are NAs. One simple thing you can do is call the dropna method of the DataFrame: returns.dropna(). If you look at the returns now after that, you will see that row zero is gone, so you've got 12 sets of returns. Then the next thing we said is, well, we can compute the risk, or the volatility, just by calling the standard deviation: returns.std, which is the standard deviation method on the returns DataFrame, will give us a measure of risk. Let's actually compute the same number, the standard deviation, in terms of what we understand as the standard deviation, and let's see if we can get the same number. So if you remember, what the standard deviation is really trying to measure is the spread: how far do these things deviate from the mean? So let's do it from first principles.
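The steps so far can be sketched like this; since sample_prices.csv isn't bundled here, the snippet builds a small made-up price table as a stand-in (the column names and numbers are invented for illustration only):

```python
import pandas as pd

# Made-up prices standing in for sample_prices.csv, which the lecture
# loads with pd.read_csv("sample_prices.csv").
prices = pd.DataFrame({"BLUE": [8.70, 8.91, 8.71, 8.43, 8.73],
                       "ORANGE": [10.66, 11.08, 10.71, 11.59, 12.11]})

returns = prices.pct_change()  # row 0 is NaN: there is no previous price
returns = returns.dropna()     # drop the all-NaN first row
print(returns.std())           # per-column standard deviation as a risk measure
```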
So let's say my deviations are whatever I have in my returns minus the mean return. Does that make sense? I'm subtracting, or demeaning, the returns by removing the mean from them. Then what do I do? I want to measure how far they are away from the mean. I want to really measure the spread; some of these will be positive numbers, some will be negative numbers, and we just want a measure of how far away they are. So I'm going to compute the squared deviations, and that's nothing more than the deviations squared. The simplest way to square is just to raise to that power, and if you remember, the double star is exponentiation. That will give you the squared deviations. Now what we want is the mean of the squared deviations, and that will give us the variance. So let's just do that: variance is the squared deviations, and we want to compute the mean of that. Okay. Then we said the volatility is equal to the square root of the variance. One way of doing that is just raising it to the 0.5 power, but actually I'm going to do it in a slightly different way, just for a little variation. I'm going to import numpy as np, and numpy has a function in there, np.sqrt, that should give you exactly the same thing. Now, let's look at the volatility that we have computed this way and see if we get the same number as we have up there. So Shift Enter, and I wonder why these numbers don't match. But we just said that the standard deviation is basically the same as the square root of the mean squared deviation, so what's the difference? Well, the difference is that the standard deviation uses a denominator which is n minus 1, whereas here we're using a denominator which is n.
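Here's a minimal sketch of the first-principles calculation, on made-up returns so it stands alone. Because .mean() divides by n, this version comes out slightly smaller than returns.std():

```python
import numpy as np
import pandas as pd

# Made-up monthly returns so the snippet stands on its own.
returns = pd.DataFrame({"BLUE": [0.01, -0.02, 0.03, 0.005],
                        "ORANGE": [0.04, -0.03, 0.08, -0.01]})

deviations = returns - returns.mean()   # demean: subtract the mean return
squared_deviations = deviations**2      # ** (double star) is exponentiation
variance = squared_deviations.mean()    # mean divides by n: population variance
volatility = np.sqrt(variance)          # population standard deviation

# Slightly smaller than returns.std(), which divides by n - 1.
print(volatility)
print(returns.std())
```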
In other words, when you compute the mean, you're dividing the sum by the total number of observations, whereas when you're doing the standard deviation, if you remember your statistics, you divide by n minus 1, not n. That's because what we're doing there is computing the sample standard deviation: we're assuming that this is a sample from a broader distribution, whereas what we've done here is measure the population standard deviation. So why don't we go fix that, and the way to fix it is to not divide by n, as we're doing here, but to divide by n minus 1. That will give me an excuse to introduce and play around with a few extra things. So let's start again from scratch. We have the deviations, that's correct, and the squared deviations are the same as well, that's good. So we've got the squared deviations. Now what we have to do is compute the variance, and instead of computing the mean, what we have to do is this: variance is assigned the squared deviations; we want the sum of that, and we want to divide not by n, but by the number of observations, let's call it number_of_obs, minus 1. So now the question becomes, how do I compute the number of observations? I'll come back to that in a second; let's go figure out how to compute the number of observations. So I'm going to cut this, and let's look at the returns data structure that we have here. If you look at returns, you can see that this is a matrix, and it's got 12 rows as you can see here, 1 through 12, and it's got two columns. There is a built-in attribute for every DataFrame called shape. Shape gives you back the number of rows and the number of columns in the form of a tuple. A tuple, think of it as an ordered list that you cannot change.
Unlike a standard list, where you can go in and alter or mutate the contents, a tuple is just a fixed sequence of values. So in this case it's a tuple of two numbers, and this is telling you right here it's 12 rows and two columns. That is what we want, the number of observations. So now let me go back and paste that back. We've got returns.shape, which gives you a tuple, but you want to extract the first element, and that's the element at index zero. So there you go. This will do exactly what we had before, and that's the variance, and now the volatility is the variance to the power 0.5. Number of obs. There you go. So whenever you see ugly messages like that, you go back and ask what's the problem; it says number_of_obs is not defined, and that's because I had typed number_of_ob instead of number_of_obs. I'm sure those of you who are watching this recording were screaming at the screen saying fix that. Well, you got it. So now let's look at the volatility that we've computed this way, that's good. Now, let's look at returns.std; with a little luck they will be exactly the same. That's how you compute volatility, but we aren't quite done, because remember, this is the volatility based on monthly data, and we want to annualize it. If you remember, the way you annualize the volatility is by scaling it, or multiplying it, by what? You multiply it by the square root of the number of periods in a year. In this example, this is monthly data, so what would you multiply it by? The square root of 12. Why? Because there are 12 months in a year and we're trying to get the annual, or yearly, volatility. So again, same thing: we've got np.sqrt, remember that. That's exactly the same as doing this, isn't it? returns.std, good, same numbers. So now we have the annualized volatility. Let's look at some real data now instead of these sample returns. The data we want to look at is here, let's go in here.
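Putting the corrected, divide-by-n-minus-1 version together (again on made-up returns), along with the square-root-of-12 annualization for monthly data:

```python
import numpy as np
import pandas as pd

# Made-up monthly returns so the snippet stands on its own.
returns = pd.DataFrame({"BLUE": [0.01, -0.02, 0.03, 0.005],
                        "ORANGE": [0.04, -0.03, 0.08, -0.01]})

deviations = returns - returns.mean()
squared_deviations = deviations**2
number_of_obs = returns.shape[0]        # shape is a (rows, columns) tuple
variance = squared_deviations.sum() / (number_of_obs - 1)  # divide by n - 1
volatility = variance**0.5              # now matches returns.std()

annualized_vol = volatility * np.sqrt(12)  # monthly data: 12 periods per year
```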
What we want to look at here, let's open this up a little bit so that you can see the names a little better. We want to look at this Portfolios_formed_on_ME_monthly file, so let me open that up, and let me explain what this dataset is before we load it in. This dataset, as you can see, has lots of rows; it's a very big dataset, and it goes all the way down to 201812, and it starts all the way back in 1926. So there's a row for every month: 1926, 07, 08, 09, all the way through December of 2018. The first column is the date, some sort of a date stamp. Then you can see that the other columns are a bunch of portfolios. You can see these numbers here, minus 99.99, which is used to signify an NA, that the data is not available, and then you see that there are all kinds of returns for different portfolios. There's something called the Lo 30, then there's the Med 40, and then there's the Hi 30. What does that mean? What they've done here is divide the set of all stocks into three categories by market cap: the lowest 30 percent, the middle 40 percent, and the highest 30 percent. Then they've taken all of the stocks that fall into the highest 30 percent and equally weighted them; that is a portfolio, and this is the return of that portfolio. The next set of returns here are quintiles. The stocks are broken into five groups: the lowest 20 percent, that's Lo 20, then Quintile 2, Quintile 3, Quintile 4, and then the high 20 percent. Again, just to repeat, what is the high 20 percent? What does that 3.33 there mean?
It means that if you take all the stocks in that universe and you sort them by their market cap, in other words by the size of the company, and then you take the top 20 percent (you take this entire list once it's sorted and draw lines so that you've broken it into five separate buckets, and you take the highest bucket, the one that consists of the highest-cap stocks), and then you equally weight those stocks and treat them as a portfolio, the return of that portfolio in that month is what you're seeing here, 3.33, okay? Then the last set of portfolios here is what happens if, instead of dividing them into quintiles, you divide them into deciles. So what would you do? If you have ten such buckets, that's 10 percent each. This is the returns of the lowest 10 percent by market cap, and that is the returns of the highest 10 percent by market cap, marked as Hi 10. So what is Hi 10? It is the 10 percent largest stocks in the universe that they've selected here, and this is the smallest 10 percent in the universe. What is the universe? This is the universe of US stocks. So what is this data? This is data from Ken French's website, and we've got a link in your materials where you can go and download this data as well as other data. But now that you've seen this, let's try and read this data in. What are the things we have to watch out for? We have to say there's a header row so that we can use that as the columns. You can see that there is a date column, and we can conveniently use that as an index. So let's do that. Now, these are the returns directly, so returns. What do we want to do? pd.read_csv, just like we did before. What is the name of the file? data/Portfolios_formed_on, all that kind of thing. So let me just put a comma here, and let's add in some more stuff. Well, what are the things we want to do? The first thing is to tell it whether there is a header row or not.
This is actually the default, but it's always nice to do this: header equals zero is a way to say that the header is in row zero, in other words the header is in the first row. Then you can say index_col, and that's also column zero, so why don't we give it that. Now in this example we want to treat the index as dates, and in general you can tell it to try and parse dates, so you can say parse_dates is True, and if that index column has dates, it will figure out how to parse them as best it can. It's really smart, because it understands a whole bunch of different date formats. So that's parse_dates is True. Is there anything else I want to do? Yes. Remember we saw those weird NA values, minus 99.99. So we can also tell Pandas that NA values have been encoded as minus 99.99. Let me close this back here, we don't need to see this anymore. So let's do that and see what happens. I'm going to do Shift, Enter, no errors. So let's do returns.head. What does head do? It returns just the head of this data structure, and you can see that it seemed to get everything right. You've got all these columns. It got NaNs instead of the minus 99.99. It's got an index here, and it's got all the rows. If you scroll to the right, you can see that you've got your Lo 10 and the Hi 10. In fact, let's pull out the columns that we really care about, because we don't care about the rest of these columns. I'm going to say columns is, and I'm going to give it a list of the columns that we're interested in. The first one that we care about is Lo 10, and the other one is Hi 10. These are the names of the columns that I actually care about, and everything else I'm going to get rid of, so I'm going to say returns is assigned returns of those columns. Okay, and now if I do returns.head, you'll get just this. Just that.
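The read_csv call can be sketched like this. Since the actual Ken French file isn't bundled here, the snippet reads a couple of made-up rows in the same shape from a string instead of data/Portfolios_formed_on_ME_monthly.csv:

```python
from io import StringIO

import pandas as pd

# Two made-up rows in the same shape as the Ken French file, so the
# read_csv options can be shown without the actual download.
csv_text = """\
,Lo 30,Med 40,Hi 30,Lo 20,Qnt 2,Qnt 3,Qnt 4,Hi 20,Lo 10,Dec 2,Hi 10
192607,-0.57,0.69,3.31,-0.87,1.32,0.06,1.21,3.33,-99.99,0.50,3.29
192608,3.85,4.61,2.33,2.41,4.35,5.39,3.64,2.31,5.12,2.07,2.36
"""

returns = pd.read_csv(StringIO(csv_text),
                      header=0,          # column names are in the first row
                      index_col=0,       # use the date stamp column as the index
                      parse_dates=True,  # try to parse the index as dates; stamps
                                         # pandas can't parse are left as-is
                      na_values=-99.99)  # -99.99 encodes missing data

returns = returns[["Lo 10", "Hi 10"]]    # keep only the columns we care about
```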
There's one last change I want to make before any further computations on these, which is: if you notice, the way the Fama-French data is, it actually uses the number 3.29 to mean a 3.29 percent return. But if you remember, a 3.29 percent return is actually 0.0329. These numbers have already been multiplied by 100, and I like working with just the raw numbers, so what I do in this case is say returns is assigned returns divided by 100. If I look at the head of the returns now, okay, this looks much better, because I actually expected a 3.29 percent return to be shown as 0.0329, and that's how all the math, the 1 plus R stuff, will work. Okay, so that looks good. The only cleanup I want to do is, I don't like these names Lo 10 and Hi 10, because really Lo 10 is small cap and Hi 10 is large cap. The way you can change that is to say returns.columns is assigned, and then you just give it the new list of columns. I'm going to call these SmallCap and LargeCap, okay. If I say returns.head, well, there you go, excellent. Let's try something interesting, let's try plotting it, so returns.plot. The default plot is a line plot, and you can say that explicitly by doing that. You can see that you have this problem every now and then where you may have to either hit that again, and then it will work, or you can do that magic that I gave you last time, which is %matplotlib inline. So if you do Shift, Enter there, it'll do the same thing. This should not be necessary for any modern version of JupyterLab, but if you're using an old version, you might have to do that. Okay, so right away you can see here that the orange line, which is large cap, is much less volatile. Those returns move around a lot less than the small cap stocks, which are in blue. So let's see if we can actually see that in the numbers. Remember how you do that: returns.std will give you the standard deviation, and there you go.
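In code, the rescaling and renaming look like this, with a few made-up percent returns standing in for the real Lo 10 / Hi 10 columns:

```python
import pandas as pd

# Made-up percent returns standing in for the two decile columns.
returns = pd.DataFrame({"Lo 10": [3.29, -1.45, 5.12],
                        "Hi 10": [1.02, 2.36, -0.80]})

returns = returns / 100                     # 3.29 -> 0.0329, so the 1+R math works
returns.columns = ["SmallCap", "LargeCap"]  # friendlier column names

print(returns.std())        # per-column volatility
# returns.plot.line()       # line plot; may need %matplotlib inline in a notebook
```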
You can see that the standard deviation of the returns of small cap is almost twice as much as the standard deviation of large cap. What we can do is annualize this, so remember, the annualized vol is what? It's returns.std, and then what do you want to do? You want to multiply that by the square root of 12. Why 12? Because these are monthly returns. If I look at the annualized vol now, you'll see that it's 36 percent for small cap and 18 percent for large cap, so almost twice as much. How about annualizing the returns? That is just a little bit more tricky; it's not complicated, the principle is exactly the same, so let's do this slowly, in steps. The first thing we want to do is figure out the return per month, so let's do this: return_per_month, that's what we want. What is the return that you have per month? Well, let's compute the total return over all months. What is that? That is returns plus one, and then all you do is compute the product of that, and if you remember, we used prod for that. So that is the total return over all the months, right? But we want the return per month, so how do you do that? To be able to do that, we have to know how many months we have, and if you knew the number of months, all you would do is raise that to one divided by, let's say, n_months. Now the question is, I don't know the number of months of returns. How many months of returns do I have? Well, I can do that in a couple of different ways. Let me just cut that, and I'll come back to it in a second. If you look at the returns and you go all the way to the bottom, you can see that it's got 1,110 rows, so that's certainly one way of doing it. But actually, we've already figured out how to do this, because the number of months is what?
It is the number of rows, and the number of rows we already know is returns.shape; returns.shape will give you a tuple, which is 1,110 and 2, and we want the 1,110, which is the number of rows. So you've got that, okay. Let's just make sure we understood this part here, because that's the part that might confuse some of you. If I didn't do this, this minus one would give you the total return over the entire period. But what I want is the return over one month, not over all n months, just over one month; that's the return per month. You can think of this as almost the inverse of annualization: you've got a return over the entire period, and now you want to look back and say, what is the return per month that would have given me that return over the total period? Then of course, that's the return in 1 plus R format, and you want to subtract one from it. Okay, so let's look at the return per month, and you see that this is giving you a 1.2 percent return per month in small caps and a 0.7 percent return per month in large caps. So obviously small caps have generated far more return per month than large caps. But then, as we saw above, let's actually just do it here: if you look at the annualized volatility, they're also far more volatile, almost twice as volatile. Okay, so before we start looking at this kind of risk-adjusted return, let's convert the return per month to an annualized return, so let's do annualized_return. What is the annualized return? It is the return per month plus 1. You take that, that's the 1 plus R format, then you raise that to the 12th power, because why? You're going to compound that 12 times, and of course you have to subtract one from the result, because we want just the return, not the 1 plus R format. So the annualized return is about 16 percent for small caps and nine percent for large caps, so large caps have much lower returns.
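The per-month and annualized return computation can be sketched like this, on a few made-up monthly returns:

```python
import pandas as pd

# Made-up monthly returns standing in for the real series.
returns = pd.DataFrame({"SmallCap": [0.05, -0.02, 0.03, 0.01],
                        "LargeCap": [0.01, 0.02, -0.01, 0.015]})

n_months = returns.shape[0]                         # number of rows
total_return = (returns + 1).prod()                 # growth over the whole period
return_per_month = total_return**(1/n_months) - 1   # geometric average per month
annualized_return = (return_per_month + 1)**12 - 1  # compound 12 times, drop the 1
```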
Now, I should point out that you could do all of this stuff that we did in one step, so I'm going to just copy that. I'm going to show you another way of doing the annualized return, so let's just do annualized return in a simpler way, and that is all that stuff. Instead of subtracting one, raising to a power, and adding it back, you could just do this: the annualized return is the returns that you saw, compounded, that's basically what that's doing, and then raised to the power 12 divided by n_months, and you should get exactly the same return. Okay, so that's just a shorthand, doing it all in one step. You will see this formula a lot, where this is the number of periods per year, so 12 in this case, because we know these are monthly returns. All right, so now that we've got the return and we've got the risk, let's just measure a return on risk ratio. Let's look at the annualized return divided by the annualized vol, and you can see that large cap gives you a slightly higher return per unit of vol, 0.49 versus 0.45. Okay. This is where I want to make that adjustment that we talked about before, and that is computing the Sharpe ratio. What does the Sharpe ratio really do? Instead of just looking at the annualized return per unit of volatility, it looks at the excess return over the risk-free rate. So let's start by defining the risk-free rate. Let's say the risk-free rate is 0.03, so that's three percent. I'm just assuming that that was on average the risk-free rate; that's not the proper way to do it. What you should really do is take the time series of the risk-free rate, but just to illustrate the point, let me assume it was a flat 3 percent, okay? So now we can compute the excess return. What is the excess return? It is the annualized return minus the risk-free rate. All right, and the Sharpe ratio is what? It is the excess return divided by the annualized_vol.
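The one-step annualization together with the Sharpe ratio can be sketched as follows, again on made-up returns, and using the flat 3 percent risk-free rate that the lecture assumes for simplicity:

```python
import numpy as np
import pandas as pd

# Made-up monthly returns; the flat 3% risk-free rate is the lecture's
# simplifying assumption, not the proper time-series treatment.
returns = pd.DataFrame({"SmallCap": [0.05, -0.02, 0.03, 0.01],
                        "LargeCap": [0.01, 0.02, -0.01, 0.015]})

n_months = returns.shape[0]
annualized_return = (returns + 1).prod()**(12/n_months) - 1  # one-step version
annualized_vol = returns.std() * np.sqrt(12)

riskfree_rate = 0.03
excess_return = annualized_return - riskfree_rate
sharpe_ratio = excess_return / annualized_vol
```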
So let's take a look at this, and if you look at the Sharpe ratio, it appears that small caps did in fact give you a slightly superior risk-adjusted return over large cap stocks. I think I'm going to stop here, so just play around with this data, this return series, and maybe you can see if this observation holds. Remember, what I'm calling small cap here is the smallest 10 percent of stocks, and large cap here is the largest 10 percent of stocks. Maybe repeat this entire exercise just as we did it for deciles, but do it for quintiles. Does the result still hold that the Sharpe ratio over this period, 1926 to 2018, is superior for small caps when you look at the smallest 20 percent versus the largest 20 percent of stocks? That would be a very nice exercise for you to do, and I'll leave that for you. Otherwise, we'll catch you at the next lab, and I'll see you then.