Discrete Distribution Fitting to Duke Basketball Scores, in Crystal Ball (1/8)

Author: Eric Torkia, MASc/Thursday, December 16, 2010/Categories: Monte-Carlo Modeling, Analytics Articles


Let us assume we have a batch of historical data in a spreadsheet. Our mission of the moment is to fit probability distributions to this data that describe its past variability (or uncertainty). Consider using either Crystal Ball or ModelRisk for this task; we offer free trials of both to registered users (if you register here, you can get yours too). Try fitting the same data with both packages, and let us know how and why one is better than the other. By demonstrating these capabilities, we gain first-hand experience with the usability and features of the alternatives and learn which of those features matter most. The best way to judge is to try them out for yourself.

The data at hand is listed in the spreadsheet titled "Duke 09_10 Scores." It records the game results of last season's ('09/'10) NCAA Basketball Champions, the Duke Blue Devils. We identify the opponent, date, scores of both teams, and home/away/neutral court status. (We are excluding data from the NCAA tournament itself.) Of course there is variability in the data. Is it possible that we could use data like this to predict probabilities of future Duke Basketball outcomes?

As with any worthy distribution-fitting tool, both Crystal Ball (CB) and ModelRisk (MR) can fit discrete as well as continuous data. Since basketball scores take no fractional point values (as one might see in fantasy-sports scoring schemes), we intend to use the discrete options only. First up is CB:

Open the spreadsheet in Excel with Crystal Ball loaded. The scores of both the '09/'10 Duke team and their opponents are in Columns C and D. Crystal Ball has magical entities called Assumptions that we will place in a particular cell. That cell will then be colored bright green. These cells will display randomly-generated values, based on their PDFs, during Monte Carlo simulation.

  • Select Cell I5.
  • In the Crystal Ball ribbon bar, click on "Define Assumption". (This brings up the Distribution Gallery window.)
  • Select the "Fit…" button (lower right-hand corner).
  • Select the radio button on the left-hand side titled "All discrete".
  • Select the data to be fitted (cells C4:C37) via the "Range:" option at the top.
  • Select "OK". (You will get a warning message stating "No valid fits were found for one or more distributions.")
  • Select "OK" to arrive at the distribution-fitting results window (Fig. 1-1, select View > Split View and View > Goodness of Fit to see the fitting criteria values).
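
Under the hood, the fitting step amounts to maximum-likelihood estimation for each candidate distribution. As a rough sketch of the idea in Python (the scores below are invented stand-ins for the values in C4:C37, not Duke's actual results):

```python
import numpy as np

# Hypothetical game scores -- invented stand-ins for cells C4:C37.
scores = np.array([82, 77, 90, 68, 85, 79, 74, 88, 91, 70,
                   83, 76, 86, 81, 72, 89, 78, 84])

# For a Poisson distribution, the maximum-likelihood estimate of the
# rate parameter (lambda) is simply the sample mean.
lam = scores.mean()
print(f"Fitted Poisson lambda: {lam:.2f}")
```

CB repeats this kind of estimation for every discrete candidate in its gallery before ranking the results.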

This split view shows the best-fitting discrete PDF overlaid on the data on the left side. On the right side is a list of the PDFs fitted to the data, ranked from best to worst by the Chi-Square method. (Note also the p-values associated with hypothesis testing. Each null hypothesis is that the data follow the particular PDF, much like normality testing where the normal distribution is the one of interest.) We can accept the best fit, as per the CB criteria, and also provide a name for this CB Assumption.

  • Select "Accept" in lower right-hand corner. (This brings up the Define Assumption window for the selected distribution.)
  • Type in a name, such as "Duke – Poisson," in Name dialog entry near top.
  • Select "OK." (This places a CB Assumption into Cell I5.)
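
The Chi-Square ranking that drives the list can be mimicked outside CB: bin the data, compute the expected bin counts under each fitted candidate, and compare. A minimal sketch using SciPy (again with invented scores, and coarse bins chosen purely for illustration):

```python
import numpy as np
from scipy import stats

# Hypothetical game scores -- invented stand-ins for cells C4:C37.
scores = np.array([82, 77, 90, 68, 85, 79, 74, 88, 91, 70,
                   83, 76, 86, 81, 72, 89, 78, 84])

lam = scores.mean()  # Poisson MLE of the rate parameter

# Bin the scores coarsely so each bin has a reasonable expected count.
edges = np.array([0, 75, 80, 85, 200])
observed, _ = np.histogram(scores, bins=edges)

# Expected counts under the fitted Poisson: P(edge[i] <= X < edge[i+1])
# for each bin, scaled so the expected total matches the sample size.
probs = np.diff(stats.poisson.cdf(edges - 1, lam))
expected = observed.sum() * probs / probs.sum()

# ddof=1 accounts for the one parameter (lambda) estimated from the data.
chi2, p = stats.chisquare(observed, expected, ddof=1)
print(f"Chi-Square: {chi2:.2f}, p-value: {p:.3f}")
```

A lower Chi-Square statistic (and a higher p-value) indicates a better fit, which is exactly the ordering CB displays in the ranked list.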

Perhaps the CB modeler would like to simulate Duke scores with the next-best-fitting PDFs to our data? Select Cell I7 and repeat the above process with a slight twist. Before clicking on "Accept," use the "Next>>" button in the lower left-hand corner to visualize the second-best-fitting PDF. Once that PDF is highlighted in green in the ranked list, select "Accept" and complete the process to place a different CB Assumption into Cell I7. Do this also for Cell I9, using the third-best-fitting PDF.

The perceptive analyst should also note how sharply the Chi-Square values increase after the first four. This means the lower-ranked PDFs are very likely inappropriate candidates for PDF nomination. Also note that no Chi-Square value was computed for the Hypergeometric PDF (which explains the earlier warning about "no valid fits").

Following the same process used for the first three CB Assumptions, we can fit the variation in the opponents' scores. It should be no real surprise that the four best-fitting distributions are the same as those for Duke's offensive output, but with lower central tendencies; after all, Duke won most of their games. (The CB Assumptions for the three best-fitting distributions should reside in Cells I11, I13, and I15. Repeat the fitting process three times for these cells.)
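
With Assumptions fitted for both Duke's and the opponents' scores, the question posed at the outset (can we predict probabilities of future outcomes?) becomes a short Monte Carlo run. A sketch, assuming both fits came back Poisson and using placeholder rates rather than the actual CB estimates:

```python
import numpy as np

rng = np.random.default_rng(42)

# Placeholder fitted Poisson rates -- NOT the actual CB estimates.
lam_duke, lam_opp = 81.0, 69.0

n_trials = 100_000
duke = rng.poisson(lam_duke, n_trials)
opp = rng.poisson(lam_opp, n_trials)

# Fraction of simulated games in which Duke outscores the opponent.
p_win = (duke > opp).mean()
print(f"Estimated P(Duke wins): {p_win:.3f}")
```

In CB, the equivalent is defining a Forecast cell on the difference between the two Assumption cells and running the simulation.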

Are there other reasons we should go with the top three discrete PDFs that Crystal Ball ranks as best-fitting? Depending on the application, the selection of distributions, as with other statistical choices, can be informed by the SME's (subject-matter expert's) knowledge of the process being modeled. Please follow along as we use our shared knowledge of college basketball scoring to select PDFs.

