Excel Simulation Show-Down Part 2: Distribution Fitting

Eric TorkiaSunday, May 15, 2011

Rate article:

5.0

Rate this article:

5.0

One of the cool things about professional Monte-Carlo Simulation tools is that they offer the ability to fit data. Fitting enables a modeler to condensate large data sets into representative distributions by estimating the parameters and shape of the data as well as suggest which distributions (using these estimated parameters) replicates the data set best.

Fitting data is a delicate and very math intensive process, especially when you get into larger data sets. As usual, the presence of automation has made us drop our guard on the seriousness of the process and the implications of a poorly executed fitting process/decision. The other consequence of automating distribution fitting is that the importance of sound judgment when validating and selecting fit recommendations (using the Goodness-of-fit statistics) is forsaken for blind trust in the results of a fitting tool.

Now that I have given you the caveat emptor regarding fitting, we are going to see how each tools offers the support for modelers to make the right decisions. For this reason, we have created a series of videos showing comparing how each tool is used to fit historical data to a model / spreadsheet. Our focus will be on :

The goal of this comparison is to see how each tool handles this critical modeling feature. We have not concerned ourselves with the relative precision of fitting engines because that would lead us down a rabbit hole very quickly – particularly when you want to be empirically fair.

Things to know about Goodness-of-fit

All four packages use MLE (Maximum Likelihood Estimation Calculations) to estimate distribution parameters which are then ranked using various algorithms. Crystal Ball, @Risk and Risk Solver all use the classic trio of Anderson-Darling, Kolmogorov-Smirnoff or Chi-Square.

Vose on the other-hand use Schwartz, Akaike and Hannan-Quinn information criterion models. According to Vose, the advantage of moving away from the Classic trio (AD, KS ChiSqr) is that

"Although still popular today, the Chi-Squared, Kolmogorov-Smirnoff and Anderson-Darling goodness of fit statistics are technically all inappropriate as a method of comparing fits of distributions to data. They are also limited to having precise observations and cannot incorporate censored, truncated or binned data. Realistically, most of the time we are fitting a continuous distribution to a set of precise observations and then the Anderson-Darling does a reasonable job.

For important work you should instead consider using statistical measures of fit called information criteria.

Let:

n = number of observations (e.g. data values, frequencies)
k = number of parameters to be estimated (e.g. the Normal distribution has 2: mu and sigma)
Lmax = the maximized value of the log-Likelihood for the estimated model (i.e. fit the parameters by MLE and record the natural log of the Likelihood.)

SIC (Schwarz information criterion, aka Bayesian information criterion BIC)

AIC (Akaike information criterion)

HQIC (Hannan-Quinn information criterion)

The aim is to find the model with the lowest value of the selected information criterion. The -2ln[Lmax] term appearing in each formula is an estimate of the deviance of the model fit. The coefficients for k in the first part of each formula shows the degree to which the number of model parameters is being penalised. For n > ~ 20 or so the SIC (Schwarz, 1997) is the strictest in penalizing loss of degree of freedom by having more parameters in the fitted model. For n > ~ 40 the AIC (Akaike, 1974, 1976) is the least strict of the three and the HQIC (Hannan and Quinn, 1979) holds the middle ground, or is the least penalizing for n < ~ 20." - Vose Software, 2011

Schwartz E.S (1997). The Stochastic Behavior of Commodity Prices: Implications for Valuation and Hedging'. J Finance 52(3) Papers and Proceedings Fifty-Seventh Annual Meeting, American Finance Association, New Orleans, Louisiana January 4-6, (July 1997), 923-973.
Akaike, 1974. A new look at the statistical model identification. IEEE Trans. Automat. Control. vAC-19. 716-723.
Hannan, E.J. and Quin, G.G. (1979). The determination of the order of an autoregression. J.R. Statistic. Soc. B, 41, 190-195.

In this comparison, we are not going to worry about the effectiveness of the various ranking algorithms, but more on the usability and effectiveness of each tool to get the job done.

Situations where distribution fitting is very important.

The applications and uses of fitting fall essentially into two categories:

Incorporation of real data into model assumptions/inputs.

When available, modelers should try to use as much historical data as possible. Granted, the data must be analyzed and cleansed first. Examples of historical data you can fit include:

Commodity Prices
Time to repair or Time to failure
Interest rates
Inventory Levels
Supply / Demand
Sales Volumes
Etc.

Model Consolidation and Execution Performance.

According to modern practice, if you need to run a large sub-model to generate an input, you can replicate the entire sub-model as a representative distribution. In sum, a sub-model can be condensed into a distribution, either parametric or non-parametric.

This has great benefits for improving the performance of large scale models because it a.) simplifies the model logic and b.) it removes a lot of moving parts, which would consume resources and increase execution time. In order to extract the parameters or fit the output of the sub-model into a distribution, distribution fitting capabilities are a necessity.
There are 4 different approaches for keeping your model fast and tidy using the best practice listed above.

	Crystal Ball	ModelRisk Pro	@Risk Pro	Risk Solver Pro
Approach for consolidating a model.	Model outputs can be fitted automatically and are inserted as parametric distributions (inputs) Inputs can be generated using the raw data into a custom distribution (ogive)	ModelRisk works with a different framework… they use an object model to consolidate datasets for use by Vose functions.	Model outputs can be fitted automatically and are inserted as parametric distributions (inputs)	RiskSolver Pro is able to fit outputs but there does not seem to be a way to paste them into a separate cell. On the other hand, if you have the data, the entire set of trials is condensed into a DIST function. This is a portable approach to sharing a model output.
Dynamic fitting	Crystal Ball will maintain a link between the data and its non-parametric distributions. Parametric distributions need to be refitted manually (or via macro) every time the data changes.	ModelRisk uses fitting objects that link into a data object (this is where the dataset is and it is represented by a formula in Excel) and refit dynamically. However if the nature of the data changes enough, you may need to refit manually anyways to ensure you are using the right fitting object.	@Risk Pro will automatically refit an entire dataset (range) each time the simulation is run. Very interesting feature to be used with caution for the reasons stated above.	None at this time

PART 1: Introduction and Oracle Crystal Ball 11.1.2.1. Distribution Fitting

Overall, Crystal Ball has a solid and fast data fitting tool. It is easy to use and you can import flat file from systems. We had no trouble running the tool and generating all the fits with a 150 000 points of data. I correctly picked up the data’s distribution; this would be an issue with 150k points.

Potentially, the addition of PP and QQ charts might be interesting as well as the ability to overlay multiple fits. I would also suggest that the addition of additional fitting algorithms would greatly compliment the tool.

Loading the player ...

Key Take Aways:

Fast and effective tool
Uses the conventional AD/KS/C2 ranking methods
Shows all the rankings side by side and uses effective color coding to identify the best empirical fit.
You can fit model outputs and generate distributions right from the output dialog box.
You can maintain a dynamic link with non-parametric data (custom distributions)

PART 2: ModelRisk Professional 4.0 Distribution Fitting

As usual, Vose Software has staked out some unique territory with their distinctive choice of fitting methods. The tool itself is elegant and from an anecdotal perspective, quite precise with data sets that are far less than 150k. However, like in anything too much choice increases processing time. Think about how long it takes to make up your mind when your favorite restaurant has a huge menu. If you show up with a fair idea of what you are going to eat, you spend much less time thinking about what to order. Even though ModelRisk does a good job pre-selecting potential fits, be aware that a little expert judgment upfront to further filter out distributions will save processing time…

Loading the player ...

Key Take Aways:

Very effective tool… can get bogged down fitting a few of the distributions
Very comprehensive and information rich interface
Uses the unconventional AIC/SIC/HQIC ranking methods
Shows all the rankings side by side for fit comparison – color coding top ranked fits would be nice
Objects (both data and fitting) are used to keep dynamic data fits current.

PART 3: Palisade @RISK Professional 5.7 Distribution Fitting

Originally Palisade sold its fitting package as a separate tool altogether known as “Best Fit”. This is a solid tool that offers most of the power and speed you would find with Crystal Ball with some added charts, including P-P and Q-Q. Where the tool would be vastly improved is in its ability to contrast or “triangulate” a proper fit. Another issue is the fact that the tool has a 100k datapoint import limit. This was more than required in the days where you had 256 columns and 64k rows. Given the ever increasing availability and quantity of data, it would make a lot of sense to tear down that barrier.

Loading the player ...

Key Take Aways:

Fast and effective tool
Uses the conventional AD/KS/C2 ranking methods
Does not show the rankings side by side, forces you to try and remember who ranked where.
You can fit model outputs and generate distributions right from the output dialog box.
You can automatically refit a data set at the beginning of each simulation.

PART 4: Frontline Risk Solver Professional 11.0 Distribution Fitting

Frontline’s Risk Solver package gets the job done. They have some really nice graphics and the comparative overlay of proposed fits is cool. They are also quite innovative by incorporating DIST technology into their tool. This has a similar effect to Vose’s distribution objects. Unfortunately this will not update automatically if the data changes, but will definitely make your outputs portable.

Loading the player ...

For those who are curious about DIST functions, here is the code that resides in 1 cell:

< dist ver="1.100000" type="Double" count="39" max="1021.71" min="974.008" avg="1000.37" name="test">ixq2FI7dH8NgyJVJYvmRwP//keyNIK16igznK1ckbNfiluz0q3Jq
Hlngre+jlnbSAADH/Z9Xc8WUmb2RYFSplWb1dzV0uNj+sUpE30/cAA

Key Take Aways:

Effective tool that seems to suffer from performance issues on large data sets. It appears that it dynamically recalcs the charts (i.e. PP or QQ)… so this becomes an issue with large data sets.
The workflow to fit and insert the fitted distribution is a little confusing… A few explicit buttons would make it a little more intuitive.
Uses the conventional AD/KS/C2 ranking methods
Enables comparative overlay of fits… this is a cool feature with lots more potential.
Does not show the rankings side by side, forces you to try and remember who ranked where.
You can fit model outputs and generate distributions parameters right from the output dialog box but you can’t insert them elsewhere in your model.

PART 5: Conclusion and Comparison

With enough data, all the tools will do the job right. It’s hard to ignore Vose’s introduction of new Goodness-of-fit ranking methods – they are unique. Ranking methods notwithstanding, Performance, Speed and Information/Reporting to assess the fit should be primary considerations.

ModelRisk Vs. Crystal Ball Vs. @Risk Vs. Risk Solver

If you would like to try any of these packages, with the exception of Solver (www.solver.com), you can download them from our Downloads sections and give them a whirl. You can also download the test model and compare for yourself.

Key Take Aways:

Crystal Ball Standard is fast, efficient and robust for distribution fitting. Their implementation of method triangulation is a really useful tool. The ability to fit output distributions or export them as non-parametric is a very useful and well implemented feature.
ModelRisk Professional is very fast to load large data sets and has a very comprehensive & sophisticated interface. This tool requires judgment in preselecting the proper distributions, otherwise you may still be testing 50 different distributions! Dynamic fitting with objects is a nice way to keep you model up to date when the underlying data changes.
@RISK Professional’s fitting tool is fast and seems quite robust. The ability to fit output distributions from the dialog box is nice and works well. The only unfortunate part is that it has a theoretical limit of 100k data points. For most applications this should not be an issue. Another cool feature is the ability to completely refit a dataset… as with many other things in the risk analysis world – cool, but handle with care.
Risk Solver Professional works well and has a nice interface. Speed on large data sets is a bit of an issue. Some of the usability processes may need to be revisited but one of the usability features I do appreciate is the call-out that follows the mouse asking you where to insert your fit results.

Next feature on the block will be Correlation. If you have questions or suggestions for future comparisons, please don’t hesitate to call or drop me a line at [email protected]

Comments

Collapse Expand Comments (0)

You don't have permission to post comments.

Perceptions and popularity of analytics technologies over time

The Cutting Edge - Eric Torkia

Oct 18 2017

Oracle Crystal Ball Spreadsheet Functions For Use in Microsoft Excel Models

The Cutting Edge - Eric Torkia

May 20 2014

545

Oracle Crystal Ball has a complete set of functions that allows a modeler to extract information from both inputs (assumptions) and outputs (forecast). Used the right way, these special Crystal Ball functions can enable a whole new level of analytics that can feed other models (or subcomponents of the major model).

Understanding these is a must for anybody who is looking to use the developer kit.

Why are analytics so important for the virtual organization? Read these quotes.

The Cutting Edge - Eric Torkia

Jun 26 2013

Since the mid-1990s academics and business leaders have been striving to focus their businesses on what is profitable and either partnering or outsourcing the rest. I have assembled a long list of quotes that define what a virtual organization is and why it's different than conventional organizations. The point of looking at these quotes is to demonstrate that none of these models or definitions can adequately be achieved without some heavy analytics and integration of both IT (the wire, the boxes and now the cloud's virtual machines) and IS - Information Systems (Applications) with other stakeholder systems and processes. Up till recently it could be argued that these things can and could be done because we had the technology. But the reality is, unless you were an Amazon, e-Bay or Dell, most firms did not necessarily have the money or the know-how to invest in these types of inovations.

With the proliferation of cloud services, we are finding new and cheaper ways to do things that put these strategies in the reach of more managers and smaller organizations. Everything is game... even the phone system can be handled by the cloud. Ok, I digress, Check out the following quotes and imagine being able to pull these off without analytics.

The next posts will treat some of the tools and technologies that are available to make these business strategies viable.

Multi-Dimensional Portfolio Optimization with @RISK

The Cutting Edge - Eric Torkia

Jun 28 2012

Many speak of organizational alignment, but how many tell you how to do it? Others present only the financial aspects of portfolio optimization but abstract from how this enables the organization to meets its business objectives. We are going to present a practical method that enables organizations to quickly build and optimize a portfolio of initiatives based on multiple quantitative and qualitative dimensions: Revenue Potential, Value of Information, Financial & Operational Viability and Strategic Fit.

This webinar is going to present these approaches and how they can be combined to improve both tactical and strategic decision making. We will also cover how this approach can dramatically improve organizational focus and overall business performance.

We will discuss these topics as well as present practical models and applications using @RISK.

Reducing Project Costs and Risks with Oracle Primavera Risk Analysis

The Cutting Edge - Eric Torkia

May 04 2012

It is a well-known fact that many projects fail to meet some or all of their objectives because some risks were either: underestimated, not quantified or unaccounted for. It is the objective of every project manager and risk analysis to ensure that the project that is delivered is the one that was expected. With the right know-how and the right tools, this can easily be achieved on projects of almost any size. We are going to present a quick primer on project risk analysis and how it can positively impact the bottom line. We are also going to show you how Primavera Risk Analysis can quickly identify risks and performance drivers that if managed correctly will enable organizations to meet or exceed project delivery expectations.

Modeling Time-Series Forecasts with @RISK

The Cutting Edge - Eric Torkia

Oct 18 2011

Making decisions for the future is becoming harder and harder because of the ever increasing sources and rate of uncertainty that can impact the final outcome of a project or investment. Several tools have proven instrumental in assisting managers and decision makers tackle this: Time Series Forecasting, Judgmental Forecasting and Simulation.

This webinar is going to present these approaches and how they can be combined to improve both tactical and strategic decision making. We will also cover the role of analytics in the organization and how it has evolved over time to give participants strategies to mobilize analytics talent within the firm.

We will discuss these topics as well as present practical models and applications using @RISK.

The Need for Speed: A performance comparison of Crystal Ball, ModelRisk, @RISK and Risk Solver

The Cutting Edge - Eric Torkia

Sep 20 2011

132

A detailed comparison of the top Monte-Carlo Simulation Tools for Microsoft Excel

There are very few performance comparisons available when considering the acquisition of an Excel-based Monte Carlo solution. It is with this in mind and a bit of intellectual curiosity that we decided to evaluate Oracle Crystal Ball, Palisade @Risk, Vose ModelRisk and Frontline Risk Solver in terms of speed, accuracy and precision. We ran over 20 individual tests and 64 million trials to prepare comprehensive comparison of the top Monte-Carlo Tools.

Excel Simulation Show-Down Part 3: Correlating Distributions

The Cutting Edge - Eric Torkia

Aug 19 2011

196

Escel Simulation Showdown Part 3: Correlating Distributions Modeling in Excel or with any other tool for that matter is defined as the visual and/or mathematical representation of a set of relationships. Correlation is about defining the strength of a relationship. Between a model and correlation analysis, we are able to come much closer in replicating the true behavior and potential outcomes of the problem / question we are analyzing. Correlation is the bread and butter of any serious analyst seeking to analyze risk or gain insight into the future.

Given that correlation has such a big impact on the answers and analysis we are conducting, it therefore makes a lot of sense to cover how to apply correlation in the various simulation tools. Correlation is also a key tenement of time series forecasting…but that is another story.

In this article, we are going to build a simple correlated returns model using our usual suspects (Oracle Crystal Ball, Palisade @RISK , Vose ModelRisk and RiskSolver). The objective of the correlated returns model is to take into account the relationship (correlation) of how the selected asset classes move together. Does asset B go up or down when asset A goes up – and by how much? At the end of the day, correlating variables ensures your model will behave correctly and within the realm of the possible.

Copulas Vs. Correlation

The Cutting Edge - Eric Torkia

Jun 16 2011

1188

Copulas and Rank Order Correlation are two ways to model and/or explain the dependence between 2 or more variables. Historically used in biology and epidemiology, copulas have gained acceptance and prominence in the financial services sector.

In this article we are going to untangle what correlation and copulas are and how they relate to each other. In order to prepare a summary overview, I had to read painfully dry material… but the results is a practical guide to understanding copulas and when you should consider them. I lay no claim to being a stats expert or mathematician… just a risk analysis professional. So my approach to this will be pragmatic. Tools used for the article and demo models are Oracle Crystal Ball 11.1.2.1. and ModelRisk Industrial 4.0