# Data Analysis & Simulation

## Archive for the ‘EasyFitXL’ Category

### Using Probability Distributions in Excel VBA

Monday, August 27th, 2012

Some time ago, we covered the use of probability distributions and related Excel worksheet functions available in EasyFitXL. When dealing with probability data in Excel, most of the time, you would use those functions to set up your calculations to be performed directly within your workbooks. This approach works well for applications where you need to perform typical probability analysis based on different input data: you modify the data, and Excel recalculates the entire worksheet and updates the associated results.

However, for more advanced applications, you might need to implement some complex logic requiring the use of IF statements, which will make your worksheets too complicated. Of course, you can still use the IF worksheet function, but in reality, you would want to keep your workbooks as simple as possible, which is a good idea if you want to easily get back to your analysis in a month. And that is where the built-in Visual Basic for Application programming language comes in handy: with little programming knowledge, using the VBA functions available in EasyFitXL as well as in the EasyFit SDK, you can create feature-rich probability analysis and Monte Carlo simulation applications implementing the logic of any degree of complexity.

Even though both EasyFitXL and the SDK include a variety of VBA functions, these software packages differ in the feature sets they offer. Initially, EasyFitXL was designed as an Excel add-in that brings the visual distribution fitting feature of EasyFit to Excel. Of course, we could not ignore the integration and data analysis automation capabilities of Excel, so we came up with the following ideology for EasyFitXL: visually fit distributions to data in Excel, and use the results in the most convenient way – either visually, in your worksheets, or in your VBA applications. That is why the VBA functions offered by EasyFitXL allow you to evaluate most common distribution functions (PDF, CDF etc.), calculate distribution statistics (mean, variance…), and generate random numbers from any probability distribution you choose as the model for your data.

On the other hand, the Simulation & Probabilistic Analysis SDK was designed from the ground up as the package targeting software developers and offering a complete range of functions covering the entire feature set of EasyFit. Apart from evaluating distribution functions, calculating statistics and generating random numbers, you can do distribution fitting, perform goodness of fit tests, and even create distribution graphs – all directly from your VBA applications.

Another huge difference is that technically, the SDK offers its functionality through a set of Objects, enabling you to use the object-oriented approach to software development, making your work with large projects more efficient. On the contrary, EasyFitXL employs the functional programming model, offering a separate VBA function for each kind of distribution function and each probability distribution, which is good for short and simple programs.

Overall, depending on your needs, you can use either EasyFitXL or the SDK to implement any kind of data analysis application, ranging from simple probability calculation programs to complex automated data analysis and Monte Carlo simulation systems.

### EasyFit Used for Probabilistic Currency Forecasting

Monday, February 21st, 2011

Because risk and uncertainty are a part of literally all areas of our life, with the finance being one of the most important areas, scientifically based risk management methods are gaining more and more popularity among the finance industry professionals. Currency fluctuations affect all businesses dealing with multiple currencies, so having at least some degree of certainty about the future exchange rates can be a significant success factor for any international enterprise. A wide range of currency forecasting methods have been developed, however, not many of them can pretend to be reliable in the long run: most algorithms only work for a short period of time, and need to be tweaked as the market conditions change.

Brijen Hathi, a Research Fellow at the Planetary & Space Sciences Research Institue, performs his own research in the field and publishes the results in the Currency Forecasting Blog. The forecasting methodology employed by Mr. Hathi is in part based on the same techniques used in probabilistic risk analysis. Like with most modern forecasting methods, in this approach, he uses historical data to predict the future, but the big difference here is that he also assigns specific probabilities to the predictions. For example, for a US-based company doing business in the UK, it doesn’t really matter what the exact GBP/USD exchange rate is going to be during the next 30 days, as long as it stays within a specific interval with a high probability (95% or more). Recently Mr. Hathi has published an article highlighting the use of EasyFit to model pricing probability of the Pound Sterling versus the US Dollar from historical data. It is fascinating to see how EasyFit is being used in (what we believe) a truly scientific approach to data analysis, and we hope to see new developments in this area soon.

### EasyFitXL Is Now Compatible With Excel 2010

Monday, July 12th, 2010

EasyFitXL – the distribution fitting add-in for Excel – was first introduced with the release of EasyFit 4.0 back in 2007. When designing EasyFitXL, we did a lot of research as to which Excel versions to support. At that time, the latest version of Excel was Excel 2007, which included some new useful features, such as the support for larger worksheets and multi-threaded worksheet recalculation capability. However, many customers were not rushing to upgrade to Excel 2007 because of it’s controversial Ribbon Interface, so we had to make EasyFitXL compatible with the previous version – Excel 2003.

According to some publicly available data, Excel 2002 and Excel 2000 still had a considerable user base, so we have made a decision to support these two older versions as well. As a result, EasyFitXL initially included support for Excel versions from 2000 through 2007, covering perhaps over 99% of all Excel installations in the world.

Last month Microsoft has released Excel 2010 which does not make a big difference in terms of data analysis, however, with its release we started receiving compatibility complaints from our customers, so we performed an in-depth testing and released an updated version of EasyFit (available for download).

### EasyFit 5.3 Released

Wednesday, January 20th, 2010

Recently a customer has contacted us and noted that the Inverse Cumulative Distribution Function (the Quantile Function) of the Inverse Gaussian distribution implemented in EasyFit works well for lambda=1902.1, mu=41857.0 and P=0.9, but fails for the same lambda & mu and P=0.99. Last week we have released an updated version of EasyFit that fixes the problem, and in this post we would like to elaborate more on the issue.

Evaluating the Inverse CDF of the Inverse Gaussian Model
Since the CDF of the Inverse Gaussian distribution is quite complicated (expressed in terms of the two Laplace Integrals), the Inverse CDF of this model is not available in closed form, and cannot be easily evaluated for a given set of distribution parameters. Initially, we have implemented an iterative approximation algorithm that evaluates the ICDF(P) using the CDF as well as the PDF to speed up the calculation. The algorithm itself works very well over a great range of input parameters, however, we have placed a limitation on how many iterations it is allowed to perform.

Because EasyFit is considered an interactive data analysis tool, we are always looking for a balance between the feature set and the performance, which is especially important when using EasyFit with Excel worksheets calculated in real time. The limitation on the number of iterations is necessary to make sure the algorithm doesn’t fall into an “infinite loop”, meaning the situation when it’s unable to reach the specified accuracy regardless of how long it continues to work. The problem usually happens when we are hitting the precision limitations of the computer’s CPU: in theory, the algorithm must converge in a limited number of steps, but in reality, it will just continue iterating over and over again without any accuracy improvements.

As a solution, we have made some improvements to the algorithm, making it more robust and efficient, so it now works with the same accuracy, but for a larger range of input parameters. For example, considering the parameters that initially caused the problem (lambda=1902.1 and mu=41857.0), the ICDF(P) can be evaluated for values of P up to 0.999925, which is more than enough for most statistical analysis applications.