# Data Analysis & Simulation

## Archive for the ‘Distributions’ Category

### Understanding the Log-Normal and Log-Gamma Distribution Parameterizations

Tuesday, September 11th, 2012

When a particular probability distribution fits many of your data sets well, sooner or later you will want to learn more about it. What is it about this distribution that makes it work for your data? And how should you interpret its parameters?

The good news is that for many probability distributions, the meaning of the parameters is described in the scientific literature. The classic example is the Normal distribution, which has two parameters: σ (scale) and μ (location). This parameterization is easy to understand: as you change the location parameter, the probability density graph moves along the x-axis, while changing the scale parameter makes the graph wider or narrower.
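The effect of the two Normal parameters can also be checked numerically. The following is a minimal sketch using Python's standard `statistics.NormalDist`; the variable names and parameter values are illustrative assumptions and have nothing to do with EasyFit itself.

```python
from statistics import NormalDist

base = NormalDist(mu=0.0, sigma=1.0)
shifted = NormalDist(mu=2.0, sigma=1.0)  # location moved by +2, same scale
wider = NormalDist(mu=0.0, sigma=3.0)    # same location, scale tripled

# Changing the location slides the curve along the x-axis:
# the density of `shifted` at x + 2 equals the density of `base` at x.
shift_ok = all(
    abs(shifted.pdf(x + 2.0) - base.pdf(x)) < 1e-12
    for x in (-1.0, 0.0, 0.5, 2.0)
)

# Changing the scale stretches the curve: tripling sigma lowers the peak
# by a factor of three, because the total area must stay equal to 1.
peak_ratio = base.pdf(0.0) / wider.pdf(0.0)

print(shift_ok, round(peak_ratio, 6))
```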

However, for quite a few distributions, modifying the parameters and observing how the graphs change does little to help you understand what those parameters mean. One such distribution is the Lognormal model, defined as “a continuous probability distribution of a random variable whose logarithm is normally distributed.” What does that mean exactly? To get a better understanding, compare the CDFs of the Normal and Lognormal distributions:

Normal distribution CDF

Lognormal distribution CDF
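For reference, the two CDFs can be written out in their standard textbook form. This is an assumption about what the original figures showed; the symbol Φ, denoting the standard Normal CDF, is introduced here for convenience:

$$
F_{\text{Normal}}(x) = \Phi\!\left(\frac{x-\mu}{\sigma}\right), \qquad
F_{\text{Lognormal}}(x) = \Phi\!\left(\frac{\ln x-\mu}{\sigma}\right), \quad x > 0
$$

The only difference between the two expressions is that x is replaced by ln(x).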

As you can see, the Normal model is “included” in the Lognormal: in the Lognormal model, ln(x) has the Normal distribution with the same parameters (σ, μ) as the original Lognormal distribution. This is the key point to understand: the parameters of the Lognormal model are not its “pure” scale and location (which are so intuitive in the Normal model), but rather the scale and location of the included Normal distribution.
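One way to make this relationship concrete is a small Python sketch using only the standard library. The helper name `lognormal_cdf` and the parameter values are illustrative assumptions, not part of EasyFit.

```python
import math
from statistics import NormalDist

def lognormal_cdf(x, mu, sigma):
    """Lognormal CDF: the Normal(mu, sigma) CDF evaluated at ln(x)."""
    if x <= 0:
        return 0.0
    return NormalDist(mu, sigma).cdf(math.log(x))

mu, sigma = 1.0, 0.5

# mu is the location of ln(X), not of X itself:
# the median of X is exp(mu) and the mean of X is exp(mu + sigma^2 / 2).
median = math.exp(mu)
mean = math.exp(mu + sigma ** 2 / 2)

# Half of the probability mass lies below exp(mu).
p_median = lognormal_cdf(median, mu, sigma)

print(round(median, 4), round(mean, 4), round(p_median, 4))
```

Note that neither the median nor the mean of the Lognormal variable equals μ, which is why reading μ as a plain location parameter is misleading.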

The same logic applies to the Gamma and Log-Gamma pair of distributions. The classical Gamma distribution has two parameters, α (shape) and β (scale), and the Gamma CDF is as follows:

Gamma distribution CDF

The shape parameter determines the form of the Gamma PDF graph, while the scale parameter affects the spread of the curve. Like the Gamma model, the Log-Gamma distribution has two parameters with the same names (α, β), but its CDF has the form:

Log-Gamma distribution CDF
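Written out in standard textbook form (an assumption about what the original figures showed), with Γ(α) the Gamma function and Γ_z(α) the lower incomplete Gamma function, the two CDFs are:

$$
F_{\text{Gamma}}(x) = \frac{\Gamma_{x/\beta}(\alpha)}{\Gamma(\alpha)}, \qquad
F_{\text{Log-Gamma}}(x) = \frac{\Gamma_{\ln(x)/\beta}(\alpha)}{\Gamma(\alpha)}, \qquad
\Gamma_z(\alpha) = \int_0^z t^{\alpha-1} e^{-t}\,dt
$$

Here the Gamma CDF applies for x > 0. Under the definition that ln(x) is Gamma-distributed, the Log-Gamma CDF applies for x > 1. Again, x is simply replaced by ln(x).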

Just as with the Normal and Lognormal pair, in the Log-Gamma model ln(x) has the Gamma distribution with the same parameters (α, β). These parameters cannot be treated as the “pure” shape and scale of the Log-Gamma distribution; they are the shape and scale of the included Gamma model.
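The same substitution can be sketched in Python with the standard library only. Since the standard library has no incomplete Gamma function, the sketch below evaluates it with a simple power series; the function names are hypothetical, and the code assumes the definition used above (ln(x) is Gamma-distributed with shape α and scale β).

```python
import math

def regularized_lower_gamma(a, z, max_terms=500):
    """P(a, z) = Gamma_z(a) / Gamma(a), evaluated with its power series."""
    if z <= 0:
        return 0.0
    term = 1.0 / a
    total = term
    for n in range(1, max_terms):
        term *= z / (a + n)
        total += term
        if term < total * 1e-16:
            break
    # Multiply the series by z^a * e^(-z) / Gamma(a), computed in log space.
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))

def gamma_cdf(x, alpha, beta):
    """Gamma CDF with shape alpha and scale beta."""
    return regularized_lower_gamma(alpha, x / beta) if x > 0 else 0.0

def loggamma_cdf(x, alpha, beta):
    """Log-Gamma CDF: the Gamma CDF evaluated at ln(x), nonzero only for x > 1."""
    return gamma_cdf(math.log(x), alpha, beta) if x > 1 else 0.0

# With alpha = 1 the Gamma model reduces to the Exponential distribution,
# which gives a closed-form value to compare against.
print(gamma_cdf(2.0, 1.0, 2.0), 1 - math.exp(-1.0))
print(loggamma_cdf(math.exp(2.0), 1.0, 2.0))
```

The power series is chosen here for brevity; it converges slowly for large arguments, so a production implementation would normally rely on a numerical library instead.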

There are dozens of different probability distributions out there, and even if you use only a couple of them on a daily basis, sometimes it can be hard to remember the meaning of all the parameters. That is why we decided to include a little feature in EasyFit that helps you keep your memory fresh: when moving the mouse pointer over a distribution parameter edit box, EasyFit displays a pop-up hint indicating the meaning of that particular parameter:

Distribution parameter hint

Using this feature, you can better focus on your core analysis rather than the technical details like the ones outlined in this article.

### EasyFit Used for Probabilistic Currency Forecasting

Monday, February 21st, 2011

Because risk and uncertainty are a part of virtually every area of our lives, with finance being one of the most important, scientifically based risk management methods are gaining popularity among finance industry professionals. Currency fluctuations affect all businesses dealing with multiple currencies, so having at least some degree of certainty about future exchange rates can be a significant success factor for any international enterprise. A wide range of currency forecasting methods has been developed; however, not many of them can claim to be reliable in the long run: most algorithms only work for a short period of time and need to be tweaked as market conditions change.

Brijen Hathi, a Research Fellow at the Planetary & Space Sciences Research Institute, performs his own research in the field and publishes the results in the Currency Forecasting Blog. The forecasting methodology employed by Mr. Hathi is based in part on the same techniques used in probabilistic risk analysis. As with most modern forecasting methods, this approach uses historical data to predict the future, but the big difference is that it also assigns specific probabilities to the predictions. For example, for a US-based company doing business in the UK, it doesn’t really matter what the exact GBP/USD exchange rate is going to be during the next 30 days, as long as it stays within a specific interval with a high probability (95% or more). Mr. Hathi recently published an article highlighting the use of EasyFit to model pricing probability of the Pound Sterling versus the US Dollar from historical data. It is fascinating to see EasyFit being used in what we believe is a truly scientific approach to data analysis, and we hope to see new developments in this area soon.
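To illustrate what a probability-based interval forecast can look like, here is a minimal Python sketch. It is not Mr. Hathi's methodology: it assumes, purely for illustration, that daily log returns are independent and Normally distributed, and the rate series is made up rather than taken from market data.

```python
import math
from statistics import NormalDist, mean, stdev

# Made-up daily closing rates for illustration only (not market data).
rates = [1.600, 1.604, 1.598, 1.607, 1.611, 1.605, 1.613, 1.609, 1.616, 1.612]

# Daily log returns and their sample mean and standard deviation.
log_returns = [math.log(b / a) for a, b in zip(rates, rates[1:])]
m, s = mean(log_returns), stdev(log_returns)

# Under the independence assumption, the log return over `horizon` days
# is Normal with mean horizon * m and standard deviation s * sqrt(horizon).
horizon = 30
horizon_dist = NormalDist(mu=horizon * m, sigma=s * math.sqrt(horizon))

def rate_interval(level):
    """Central interval that contains the future rate with probability `level`."""
    tail = (1.0 - level) / 2.0
    low = rates[-1] * math.exp(horizon_dist.inv_cdf(tail))
    high = rates[-1] * math.exp(horizon_dist.inv_cdf(1.0 - tail))
    return low, high

lower, upper = rate_interval(0.95)
print(f"95% interval for the rate in {horizon} days: {lower:.4f} to {upper:.4f}")
```

In practice, the Normal assumption would be replaced by whichever distribution actually fits the historical data best, which is exactly the question distribution fitting software is meant to answer.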

### Simulation & Probabilistic Analysis SDK Released

Thursday, December 3rd, 2009

The beta testing of our new product, the Simulation & Probabilistic Analysis SDK, is now over, and we want to thank our beta testers for their effort and valuable feedback. One of the most exciting things is that during the beta testing phase, we did not detect any bugs in the SDK, which indicates the high initial quality of the product.

The production version of the SDK is now available for public download, so if you are a software developer and need to add distribution fitting or simulation features to your software with no hassle, feel free to download the fully functional version of the SDK and try it free for 30 days.

More good news: Christmas is coming, so we decided to offer a gift to software developers who are considering purchasing the SDK. Until the end of December, the SDK Developer License can be ordered at a \$500 discount – click here for details.

### Simulation & Probabilistic Analysis SDK Available for Public Beta Testing

Monday, October 12th, 2009

The new Software Development Kit enabling you to easily add Monte Carlo simulation and distribution analysis features to your applications is now available for public beta testing – please download the free beta version and take a look at the code examples (available in several languages, including C#, VB.NET, C++, and Visual Basic for Applications).

We would be glad to receive any feedback, questions or suggestions from you, so please feel free to drop us a line and let us know what you think about this product and how we can make it better.

### EasyFit Used by NASA to Improve Monte Carlo Risk Simulation Models

Tuesday, February 10th, 2009

On April 17, 2005, the Millstone nuclear generating plant in Connecticut shut down when a circuit board monitoring a steam pressure line short-circuited. “Tin whiskers” – microscopic growths of the metal from soldering points on a circuit board – were blamed for causing the problem. These whiskers are composed of nearly pure tin, and are therefore electrically conductive.

Field failures attributable to tin whiskers have cost individual programs many millions of dollars each. As a result, manufacturers of high-reliability systems are forced to use Monte Carlo simulation models to decide whether the use of tin poses an acceptable risk in a given application.

Recently a group of NASA scientists led by Karim J. Courey, a Principal Engineer with the Orbiter Sustaining Engineering Office, Lyndon B. Johnson Space Center, used our distribution fitting software EasyFit to better understand the underlying process and develop a probability model that can be used to improve existing Monte Carlo risk simulation models… read the full case study

### Using Distribution Functions in Excel Sheets

Tuesday, January 13th, 2009

There are many probability distributions developed by statisticians to model random data of different kinds, ranging from business and finance data (stock prices) to engineering data (system failures) and environmental data (max. flood flows). While the standard Excel package includes some basic statistical functions, its support for probability distributions is very limited and almost useless for real-world modelling applications. This article discusses the worksheet functions provided by EasyFitXL, the distribution fitting add-in for Excel, which can be applied to perform a range of decision-making calculations using a variety of probability distributions… read the full article

### Using StatAssist – The Distribution Viewer Tool

Wednesday, November 19th, 2008

In EasyFit 3.0 – back in 2006 – we introduced StatAssist, the built-in distribution viewer tool that closely integrates with the distribution fitting features of EasyFit. Since then, StatAssist has proven to be quite a useful feature, so we decided to include it in EasyFitXL, our distribution fitting add-in for Excel.

StatAssist displays graphs and other useful properties of all the probability distributions available in EasyFit. Even though it was initially designed as an integral part of EasyFit, StatAssist can be used as a separate application – for example, to take a quick look at a distribution curve, or to calculate the distribution statistics… read the full article

### Fitting Distributions in Excel

Tuesday, November 11th, 2008

Excel has become the de facto standard application for data analysis and presentation across a variety of industries, so if you deal with random data of any kind, chances are your data is stored in Excel workbooks. However, analyzing probability data in Excel can be tricky as the standard Excel package includes no facilities for fitting probability distributions to data. That is when EasyFitXL, the distribution fitting add-in for Excel, comes in handy… read the full article

### EasyFit Used for Environmental Fate and Risk Assessment

Tuesday, November 4th, 2008

Since 1991, the European Union has been promoting the use of numerical models to assess the environmental fate and risk of pesticides. Recently a group of scientists from the Catholic University and the Marche Polytechnic University (Italy) in association with Informatica Ambientale, the Milan-based research and computer science company, developed a tool that integrates one of the pesticide fate models with GIS software. Several distribution fitting software products were tested to introduce distribution functions in the risk assessment study, and EasyFit was selected as the most appropriate tool for analyzing annual mean pesticide concentration and determining the most suitable distribution… read the full case study

### EasyFit 5.0 Released

Monday, October 27th, 2008