[box style=”rounded”]Warning: some basic statistics concepts ahead. If you have not had any coffee yet today, don’t read on until you have had some. ;-}[/box]
If you’ve been following along with this series, as well as the reading the market studies generally, you’ll know that one of ways I use to examine data is via the histogram. To the analytically bent, histograms – also called bar charts – are beautiful things. They are scary simple, and show the distribution of values in a set of data.
If you’re a volume or TPO profiler, you look at histograms every day, only maybe you didn’t realize it. Profiles themselves are either time or volume histograms, only they are rotated 90 degrees and displayed vertically. Other than that, they are the same.
Why are histograms so powerful? Because they show so clearly show us frequency. Frequency, or how often a certain value appears in a set of data is essentially, conceptually, the same as probability. In other words if we have a set of, say, 10 whole numbers and the number 5 appears 9 times in that set, we can say that the probability of the number 5 appearing in our set of data is 90%.
This can have predictive value as well, assuming our sample is representative of the whole set of possible values. While a sample size of 10 is far too small to have reliable predictive value (unless the total number of possible data points is about 10) and there are mathematical tests for the reliability of a probability calculation, suffice to say the larger your sample size the more reliable the predictive value.
One last example… say the largest possible size of your set of data is 1000 numbers and say your sample contains 900 randomly selected numbers from that set of 1000. Let’s say further that the number 5 occurs in your sample 800 times. This means that the frequency of the value 5 is 800, and also that the probability of the number 5 occurring in your universe of data is 88%, which is pretty high. 88% is calculated frequency/sample size or .88 = 800/900. And since your sample included 90% of the all possible values (900 out of 1000), the predictive value is of this 88% is quite high. As traders, we live in a world of non-absolutes. We live and breathe a future without certainty. Our universe contains only likely or probable outcomes. And that is why performing this kind of analysis is so important.
Now that the, er, probability that you’ve had enough of this background mumbo-jumbo is pretty high, let’s move on to the meat of the matter. That is how to actually create one of these beasties in Microsoft Excel®. You can use other tools, of course, but Excel’s ubiquity makes it the one we’ll choose today.
Step 1 – Install the Data Analysis Toolpak
Install the Excel Data Analysis Toolpak if you do not have in installed already. Instructions from Microsoft on how that’s done are here. Go ahead and have a look. No worries, we’ll wait. Back so soon? OK, let’s continue.
Step 2 – Sample Your Data
Before we can create the histogram, we need some data. In this case, we’ll just use our example above. We’ll create a set of 10 whole numbers. To do this, fire up Excel, create a new worksheet. In column A, type the first 10 whole numbers that pop into your head. Your worksheet should look something like this:
Step 3 – Create a Bin
A histogram needs 2 things as input. First, it needs a set of values to plot (the actual data). But it also needs to know the range of values contained in the data, which is called a bin. The way to create a bin is to find the minimum and maximum values in your set of data. You can do that as follows:
- Click on the header for column A
- Click the Data item on the Excel ribbon
- Click the Filter button
Afterward a small arrow will appear on the column header, and this allows us to sort and filter the data in our worksheet by the values in that column:
Now click on the small arrow in column A and then click the menu item that says “Sort Smallest to Largest.” Afterward, we can see that the top row is the smallest value in our data set and the bottom value is our largest. This range – 1 to 12 – will be our bin. So head over the column G and type in the number 1 as that is our smallest value, then type 2, and 3 in the rows just below 1 (still in column G). Then:
- Highlight the cells containing 1, 2 and 3
- Click the small + sign in the lower right corner of the highlighted cells and drag it downward until we reach the number 12, as shown below
Step 4 – Create the Histogram
We’re almost home free now. So let’s finish this one up (everything shown below):
- Click the Data Analysis button on the ribbon. It will probably be all the way on the right of the Data tab
- Select Histogram from the list, then click OK.
- Select the values in column A as the Input Range and the values in column G as the Bin Range
- Click the Chart Output checkbox and Excel will automatically create a new chart graphic for us in addition to the histogram data (shown below)
- Click OK to generate, and the output will appear on a new worksheet
Now that your data visualization is complete, it’s on to interpretation. But that task, amigos, is up to you. Until next time…. trade ’em well.