# Data Analysis & Simulation

## Archive for the ‘Excel’ Category

### Using Probability Distributions in Excel VBA

Monday, August 27th, 2012

Some time ago, we covered the use of probability distributions and related Excel worksheet functions available in EasyFitXL. When dealing with probability data in Excel, most of the time, you would use those functions to set up your calculations to be performed directly within your workbooks. This approach works well for applications where you need to perform typical probability analysis based on different input data: you modify the data, and Excel recalculates the entire worksheet and updates the associated results.

However, for more advanced applications, you might need to implement some complex logic requiring the use of IF statements, which will make your worksheets too complicated. Of course, you can still use the IF worksheet function, but in reality, you would want to keep your workbooks as simple as possible, which is a good idea if you want to easily get back to your analysis in a month. And that is where the built-in Visual Basic for Application programming language comes in handy: with little programming knowledge, using the VBA functions available in EasyFitXL as well as in the EasyFit SDK, you can create feature-rich probability analysis and Monte Carlo simulation applications implementing the logic of any degree of complexity.

Even though both EasyFitXL and the SDK include a variety of VBA functions, these software packages differ in the feature sets they offer. Initially, EasyFitXL was designed as an Excel add-in that brings the visual distribution fitting feature of EasyFit to Excel. Of course, we could not ignore the integration and data analysis automation capabilities of Excel, so we came up with the following ideology for EasyFitXL: visually fit distributions to data in Excel, and use the results in the most convenient way – either visually, in your worksheets, or in your VBA applications. That is why the VBA functions offered by EasyFitXL allow you to evaluate most common distribution functions (PDF, CDF etc.), calculate distribution statistics (mean, variance…), and generate random numbers from any probability distribution you choose as the model for your data.

On the other hand, the Simulation & Probabilistic Analysis SDK was designed from the ground up as the package targeting software developers and offering a complete range of functions covering the entire feature set of EasyFit. Apart from evaluating distribution functions, calculating statistics and generating random numbers, you can do distribution fitting, perform goodness of fit tests, and even create distribution graphs – all directly from your VBA applications.

Another huge difference is that technically, the SDK offers its functionality through a set of Objects, enabling you to use the object-oriented approach to software development, making your work with large projects more efficient. On the contrary, EasyFitXL employs the functional programming model, offering a separate VBA function for each kind of distribution function and each probability distribution, which is good for short and simple programs.

Overall, depending on your needs, you can use either EasyFitXL or the SDK to implement any kind of data analysis application, ranging from simple probability calculation programs to complex automated data analysis and Monte Carlo simulation systems.

### EasyFitXL Is Now Compatible With Excel 2010

Monday, July 12th, 2010

EasyFitXL – the distribution fitting add-in for Excel – was first introduced with the release of EasyFit 4.0 back in 2007. When designing EasyFitXL, we did a lot of research as to which Excel versions to support. At that time, the latest version of Excel was Excel 2007, which included some new useful features, such as the support for larger worksheets and multi-threaded worksheet recalculation capability. However, many customers were not rushing to upgrade to Excel 2007 because of it’s controversial Ribbon Interface, so we had to make EasyFitXL compatible with the previous version – Excel 2003.

According to some publicly available data, Excel 2002 and Excel 2000 still had a considerable user base, so we have made a decision to support these two older versions as well. As a result, EasyFitXL initially included support for Excel versions from 2000 through 2007, covering perhaps over 99% of all Excel installations in the world.

Last month Microsoft has released Excel 2010 which does not make a big difference in terms of data analysis, however, with its release we started receiving compatibility complaints from our customers, so we performed an in-depth testing and released an updated version of EasyFit (available for download).

### EasyFit 5.3 Released

Wednesday, January 20th, 2010

Recently a customer has contacted us and noted that the Inverse Cumulative Distribution Function (the Quantile Function) of the Inverse Gaussian distribution implemented in EasyFit works well for lambda=1902.1, mu=41857.0 and P=0.9, but fails for the same lambda & mu and P=0.99. Last week we have released an updated version of EasyFit that fixes the problem, and in this post we would like to elaborate more on the issue.

Evaluating the Inverse CDF of the Inverse Gaussian Model
Since the CDF of the Inverse Gaussian distribution is quite complicated (expressed in terms of the two Laplace Integrals), the Inverse CDF of this model is not available in closed form, and cannot be easily evaluated for a given set of distribution parameters. Initially, we have implemented an iterative approximation algorithm that evaluates the ICDF(P) using the CDF as well as the PDF to speed up the calculation. The algorithm itself works very well over a great range of input parameters, however, we have placed a limitation on how many iterations it is allowed to perform.

Because EasyFit is considered an interactive data analysis tool, we are always looking for a balance between the feature set and the performance, which is especially important when using EasyFit with Excel worksheets calculated in real time. The limitation on the number of iterations is necessary to make sure the algorithm doesn’t fall into an “infinite loop”, meaning the situation when it’s unable to reach the specified accuracy regardless of how long it continues to work. The problem usually happens when we are hitting the precision limitations of the computer’s CPU: in theory, the algorithm must converge in a limited number of steps, but in reality, it will just continue iterating over and over again without any accuracy improvements.

As a solution, we have made some improvements to the algorithm, making it more robust and efficient, so it now works with the same accuracy, but for a larger range of input parameters. For example, considering the parameters that initially caused the problem (lambda=1902.1 and mu=41857.0), the ICDF(P) can be evaluated for values of P up to 0.999925, which is more than enough for most statistical analysis applications.

Since this minor issue does not affect the accuracy of distribution fitting, you only need to upgrade if you are experiencing problems evaluating the Inverse CDF of the Inverse Gaussian distribution for P>0.9, otherwise EasyFit 5.2 will still work well for you.

### Using Distribution Functions in Excel Sheets

Tuesday, January 13th, 2009

There are many probability distributions developed by statisticians to model random data of different kinds, ranging from business data, finance data (stock prices) to engineering data (system failures) and environmental data (max. flood flows). While the standard Excel package includes some basic statistical functions, its support for probability distributions is very limited and almost useless for real world modelling applications. This article discusses the worksheet functions provided by EasyFitXL, the distribution fitting add-in for Excel that can be applied to perform a range of decision-making calculations using a variety of probability distributions… read the full article

### Using StatAssist – The Distribution Viewer Tool

Wednesday, November 19th, 2008

In EasyFit 3.0 – back in 2006 – we introduced StatAssist, the built-in distribution viewer tool that closely integrates with the distribution fitting features of EasyFit. Since then, StatAssist has proven to be quite a useful feature, so we decided to include it into EasyFitXL, our distribution fitting add-in for Excel.

StatAssist displays graphs and other useful properties of all the probability distributions available in EasyFit. Even though it has initially been designed as an essential part of EasyFit, StatAssist can be used as a separate application – for example, to take a quick look at a distribution curve, or to calculate the distribution statistics… read the full article

### Fitting Distributions in Excel

Tuesday, November 11th, 2008

Excel has become the de facto standard application for data analysis and presentation across a variety of industries, so if you deal with random data of any kind, chances are your data is stored in Excel workbooks. However, analyzing probability data in Excel can be tricky as the standard Excel package includes no facilities for fitting probability distributions to data. That is when EasyFitXL, the distribution fitting add-in for Excel, comes in handy… read the full article

### EasyFit 5.0 Coming Soon

Wednesday, September 24th, 2008

It’s been a while since we updated our blog – that’s because we have been working on a new version of EasyFit planned to be released later in October 2008. Below is a quick summary of new features in EasyFit.

(Note: The price for the new version is likely to increase, but since all licenses come with one year of free updates, you can still order EasyFit at the current price this month and upgrade to the upcoming new version for free.)

New Probability Distributions

EasyFit will include 15 new distributions:

The Burr (Singh-Maddala), Dagum (Inverse Burr), Pareto Type II (Lomax), and Log-Gamma distributions are mainly used in economics and actuarial science. We have had numerous requests for these models from some of our customers – actuaries and finance industry professionals.

The Generalized Gamma distribution is an advanced 4-parameter model including the Gamma, Weibull, and Half-Normal distributions as special cases.

The Pearson Type V (Inverse Gamma) and Pearson Type VI (Beta Prime) distributions are a part of the Pearson family of distributions often used in financial applications.

The Levy distribution is one of the few distributions that are stable and that have probability density functions that are analytically expressible, the others being the Normal distribution and the Cauchy distribution.

The Log-Pearson III (often referred to as “LP3”) distribution is the standard model recommended by the U.S. Water Resources Council for use by all U.S. Government agencies for flood frequency analysis.

The Nakagami (Nakagami-m) and Rice (Nakagami-n) distributions are used in communications theory to model signal fading under specific conditions.

The Kumaraswamy distribution is used in electrical, civil, mechanical, and financial engineering applications dealing with variables that are lower and upper bounded.

Some other new models include the Hyperbolic Secant, Error (Exponential Power), and Reciprocal distributions.

New Excel Functions

In EasyFit 4.0, we added the ability to analyze probability data in Excel, which has proven to be quite a useful feature. EasyFit 5.0 (Professional Edition) extends this functionality by introducing several new functions making it even easier to use distributions in your worksheet models.

The DistFit function allows to fit a specified distribution to your data and display the resulting model right in a worksheet. This function can be useful if you already know the underlying distribution (e.g. the Normal model) and only need to estimate the distribution parameters. The return value of this function can be supplied to several other new functions such as DistPdf, DistCdf, DistRand etc. For example, specifying

=DistCdf(DistFit(“Normal”; B1:B100); 2)

in a worksheet will fit the Normal distribution to the data in cells B1:B100, and then evaluate the CDF of the resulting Normal distribution at x=2.

### Random Numbers In Excel

Thursday, February 14th, 2008

A new article about generating random numbers in Excel is now available in the Tutorials section. The article describes how the standard Excel worksheet functions and the inverse transform method can used to generate random numbers from various distributions, discusses the drawbacks of this approach, and explains how the EasyFitXL add-in can be used to overcome the limitations of Excel.

### EasyFit 4.0 Released

Wednesday, November 28th, 2007

The new version of EasyFit introduces a number of features targeted at business users dealing with probability data analysis, modeling, and simulation. For a complete list of new features and fixes in EasyFit 4.0, please click here.

EasyFitXL – EasyFit for Excel

Many businesses use Excel for their data analysis & modeling needs, and even though Excel provides only a basic set of statistical functions, it can be easily extended by installing third party add-ins. An add-in is a piece of software that integrates into Excel and allows you to perform specific tasks which are impossible (or hard, time consuming etc.) to do using Excel alone.

EasyFitXL is a new Excel add-in which ships with EasyFit and allows to use most of the features available in EasyFit right in Excel. With EasyFitXL, you can fit distributions to worksheet data, view distribution graphs, perform probability calculations, and generate random numbers without the need to run EasyFit (the stand-alone application).

The installation process of EasyFitXL is very straightforward: if you have Excel (version 2000 or later) installed on your computer, the EasyFit setup application will automatically install and configure EasyFitXL to properly work with your copy of Excel.

Once EasyFitXL is installed, it can be accessed from the main menu of Excel:

In Excel 2007, it can be found on the Add-Ins page:

Along with the main menu commands, EasyFitXL provides a large number of functions which you can use in Excel worksheets and VBA applications to create advanced models dealing with uncertainty. The worksheet functions are divided into three large categories displayed in the Function Wizard:

The following worksheet and VBA functions are available for more than 40 probability distributions:

• PDF (probability density function)
• CDF (cumulative distribution function)
• inverse CDF (quantile function)
• hazard function (failure rate in reliability analysis)
• distribution mean/variance/standard deviation
• random number generation

These functions can be applied to solve a wide array of problems, ranging from simple probability calculations to Monte Carlo simulation.

EasyFit: select the best fitting distribution and use it to make better decisions. learn more