Discrete Distribution Fitting to Duke Basketball Scores, in Crystal Ball (1/8)

Author: Eric Torkia, MASc/Thursday, December 16, 2010/Categories: Monte-Carlo Modeling, Analytics Articles


Let us assume we have a batch of historical data in a spreadsheet. Our mission of the moment is to fit probability distributions to this data that describe its past variability (or uncertainty). Consider using either Crystal Ball or ModelRisk for this task; we offer free trials of both to registered users (if you register here, you can get yours too). Try fitting the same data with both packages, and let us know how and why one is better than the other. By demonstrating these capabilities, we gain first-hand experience with the usability and features of the alternatives and learn which of those features matter most. The best way to judge is to try them out for yourself.

The data at hand is listed in the spreadsheet titled "Duke 09_10 Scores." It records the game results of last season's ('09/'10) NCAA Basketball Champions, the Duke Blue Devils. We identify the opponent, date, scores of both teams, and home/away/neutral court status. (We are excluding data from the NCAA tournament itself.) Of course there is variability in the data. Is it possible that we could use data like this to predict probabilities of future Duke Basketball outcomes?

As with any worthy distribution-fitting tool, both Crystal Ball (CB) and ModelRisk (MR) can fit discrete as well as continuous data. Since basketball scores take no fractional point values (as one might see in fantasy-sports scoring schemes), we intend to use the discrete options only. First up is CB:

Open the spreadsheet in Excel with Crystal Ball loaded. The scores of both the '09/'10 Duke team and their opponents are in Columns C and D. Crystal Ball has magical entities called Assumptions that we will place in a particular cell. That cell will then be colored bright green. These cells will display randomly-generated values, based on their PDFs, during Monte Carlo simulation.

  • Select Cell I5.
  • In the Crystal Ball ribbon bar, click on "Define Assumption". (This brings up the Distribution Gallery window.)
  • Select the "Fit…" button (lower right-hand corner).
  • Select the radio button on the left-hand side titled "All discrete".
  • Select the data to be fitted (cells C4:C37) via the "Range:" option at the top.
  • Select "OK". (You will get a warning message stating "No valid fits were found for one or more distributions.")
  • Select "OK" to arrive at the distribution-fitting results window (Fig. 1-1, select View > Split View and View > Goodness of Fit to see the fitting criteria values).
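
Under the hood, the fitting step amounts to maximum-likelihood estimation for each candidate distribution. As a rough sketch of the idea in Python (the scores below are invented stand-ins for the values in C4:C37, not Duke's actual results):

```python
import numpy as np

# Hypothetical game scores -- invented stand-ins for cells C4:C37.
scores = np.array([82, 77, 90, 68, 85, 79, 74, 88, 91, 70,
                   83, 76, 86, 81, 72, 89, 78, 84])

# For a Poisson distribution, the maximum-likelihood estimate of the
# rate parameter (lambda) is simply the sample mean.
lam = scores.mean()
print(f"Fitted Poisson lambda: {lam:.2f}")
```

CB repeats this kind of estimation for every discrete candidate in its gallery before ranking the results.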

This split view shows the best-fitting discrete PDF overlaid on the data on the left side. On the right side is a list of the PDFs fitted to the data, ranked from best to worst by the Chi-Square method. (Note also the p-values associated with hypothesis testing. Each null hypothesis is that the data follow the particular PDF, much like normality testing where the normal distribution is the one of interest.) We can accept the best fit, as per the CB criteria, and also provide a name for this CB Assumption.

  • Select "Accept" in lower right-hand corner. (This brings up the Define Assumption window for the selected distribution.)
  • Type in a name, such as "Duke – Poisson," in Name dialog entry near top.
  • Select "OK." (This places a CB Assumption into Cell I5.)
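
The Chi-Square ranking that drives the list can be mimicked outside CB: bin the data, compute the expected bin counts under each fitted candidate, and compare. A minimal sketch using SciPy (again with invented scores, and coarse bins chosen purely for illustration):

```python
import numpy as np
from scipy import stats

# Hypothetical game scores -- invented stand-ins for cells C4:C37.
scores = np.array([82, 77, 90, 68, 85, 79, 74, 88, 91, 70,
                   83, 76, 86, 81, 72, 89, 78, 84])

lam = scores.mean()  # Poisson MLE of the rate parameter

# Bin the scores coarsely so each bin has a reasonable expected count.
edges = np.array([0, 75, 80, 85, 200])
observed, _ = np.histogram(scores, bins=edges)

# Expected counts under the fitted Poisson: P(edge[i] <= X < edge[i+1])
# for each bin, scaled so the expected total matches the sample size.
probs = np.diff(stats.poisson.cdf(edges - 1, lam))
expected = observed.sum() * probs / probs.sum()

# ddof=1 accounts for the one parameter (lambda) estimated from the data.
chi2, p = stats.chisquare(observed, expected, ddof=1)
print(f"Chi-Square: {chi2:.2f}, p-value: {p:.3f}")
```

A lower Chi-Square statistic (and a higher p-value) indicates a better fit, which is exactly the ordering CB displays in the ranked list.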

Perhaps the CB modeler would like to simulate Duke scores with the next-best-fitting PDFs to our data? Select Cell I7 and repeat the above process with a slight twist. Before clicking on "Accept," use the "Next>>" button in the lower left-hand corner to visualize the second-best-fitting PDF. Once that PDF is highlighted in green in the ranked list, select "Accept" and complete the process to place a different CB Assumption into Cell I7. Do this also for Cell I9, using the third-best-fitting PDF.

The perceptive analyst should also note how sharply the Chi-Square values increase after the first four. This means the lower-ranked PDFs are very likely inappropriate candidates for PDF nomination. Also note that no Chi-Square value was computed for the Hypergeometric PDF (which explains the earlier warning about "no valid fits").

Following the same process used for the first three CB Assumptions, we can fit the variation in the opponents' scores. It should be no real surprise that the four best-fitting distributions are the same as those for Duke's offensive output, but with lower central tendencies; after all, Duke won most of their games. (The CB Assumptions for the three best-fitting distributions should reside in Cells I11, I13, and I15. Repeat the fitting process three times for these cells.)
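
With Assumptions fitted for both Duke's and the opponents' scores, the question posed at the outset (can we predict probabilities of future outcomes?) becomes a short Monte Carlo run. A sketch, assuming both fits came back Poisson and using placeholder rates rather than the actual CB estimates:

```python
import numpy as np

rng = np.random.default_rng(42)

# Placeholder fitted Poisson rates -- NOT the actual CB estimates.
lam_duke, lam_opp = 81.0, 69.0

n_trials = 100_000
duke = rng.poisson(lam_duke, n_trials)
opp = rng.poisson(lam_opp, n_trials)

# Fraction of simulated games in which Duke outscores the opponent.
p_win = (duke > opp).mean()
print(f"Estimated P(Duke wins): {p_win:.3f}")
```

In CB, the equivalent is defining a Forecast cell on the difference between the two Assumption cells and running the simulation.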

Are there other reasons we should go with the top three discrete PDFs that Crystal Ball ranks as best-fitting? Depending on the application, the selection of distributions, as with other statistical choices, can be informed by the SME's (subject-matter expert's) knowledge of the process being modeled. Please follow along as we use our shared knowledge of college basketball scoring to select PDFs.

