Data Analysis & Simulation

Archive for the ‘EasyFit’ Category


Probabilistic Analysis vs Time Series Analysis

Wednesday, December 14th, 2011

Can EasyFit be used to analyze time series data? To answer this question, which we recently received from a customer, we will shed some light on the differences between probabilistic analysis and time series analysis.

When dealing with time series data, your input is usually a set of (time, value) pairs representing consecutive measurements taken at equally spaced time intervals. The goal of time series analysis is to identify the nature of the process represented by your data, and to use it to forecast future values of the time series being analyzed.

A widespread application of such an analysis is weather forecasting: for more than a century, hundreds of weather stations around the world have recorded important parameters such as air temperature, wind speed, precipitation, and snowfall. Based on these data, scientists build models reflecting seasonal weather changes (depending on the time of the year) as well as global trends, such as the temperature change over the last 50 years. These models are used to provide weather forecasts for government and commercial organizations. In a typical forecast, the predicted values are not assigned probabilities: “In May, the maximum daily air temperature is expected to be 22 degrees Celsius.”

In contrast to predictions based on time series analysis, probabilistic analysis gives you not just a single value as a forecast, but a probabilistic model that accounts for uncertainty. In this scenario, you obtain a continuous range of values with probabilities assigned to them. Of course, for real-world applications it is more practical to deal with specific values, so probabilistic models are used to obtain predictions at fixed probability levels. Considering the above example, a forecast might look like: “In May, the maximum daily air temperature will not exceed 22 degrees Celsius with 95% probability.”

So can distribution fitting be useful when analyzing time series data? The answer depends on the goals of your analysis, i.e. what kind of information you want to derive from your data. If you want to understand the connection between the predicted values and their probabilities, you should fit distributions to your data (just keep in mind that in this case the “time” variable will be unused, so the measurements are treated as an unordered sample). On the other hand, if you need to identify seasonal patterns or global trends in your data, you should go with the “classical” time series analysis methods.
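To make the distinction concrete, here is a minimal sketch of the probabilistic approach in Python, using only the standard library. It is not how EasyFit works internally: the temperature values are made up for illustration, and a normal distribution is simply assumed rather than selected by a goodness of fit test. Note that the "time" variable plays no role; only the values are used.

```python
from statistics import NormalDist

# Hypothetical May daily maximum temperatures in degrees Celsius
# (illustrative numbers, not real measurements).
may_max_temps = [18.4, 19.1, 20.3, 17.8, 21.0, 19.6, 20.8, 18.9, 19.9, 20.5]

# Fit a normal distribution to the values; the order of the
# observations (the "time" variable) is ignored.
model = NormalDist.from_samples(may_max_temps)

# Prediction at a fixed probability level: the temperature that
# will not be exceeded with 95% probability under this model.
p95 = model.inv_cdf(0.95)

# The reverse question: how likely is a day hotter than 22 degrees?
prob_above_22 = 1.0 - model.cdf(22.0)

print(f"95% level: {p95:.1f} C, P(T > 22 C) = {prob_above_22:.4f}")
```

With real data you would first compare several candidate distributions, since the quality of such a forecast depends on how well the chosen model fits.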

EasyFit Used for Probabilistic Currency Forecasting

Monday, February 21st, 2011

Because risk and uncertainty are a part of literally all areas of our life, with finance being one of the most important, scientifically based risk management methods are gaining more and more popularity among finance industry professionals. Currency fluctuations affect all businesses dealing with multiple currencies, so having at least some degree of certainty about future exchange rates can be a significant success factor for any international enterprise. A wide range of currency forecasting methods has been developed; however, not many of them can claim to be reliable in the long run: most algorithms only work for a short period of time, and need to be tweaked as market conditions change.

Brijen Hathi, a Research Fellow at the Planetary & Space Sciences Research Institute, performs his own research in the field and publishes the results in the Currency Forecasting Blog. The forecasting methodology employed by Mr. Hathi is in part based on the same techniques used in probabilistic risk analysis. As with most modern forecasting methods, he uses historical data to predict the future, but the big difference here is that he also assigns specific probabilities to the predictions. For example, for a US-based company doing business in the UK, it doesn’t really matter what the exact GBP/USD exchange rate is going to be during the next 30 days, as long as it stays within a specific interval with a high probability (95% or more). Recently Mr. Hathi published an article highlighting the use of EasyFit to model the pricing probability of the Pound Sterling versus the US Dollar from historical data. It is fascinating to see EasyFit being used in what we believe is a truly scientific approach to data analysis, and we hope to see new developments in this area soon.

EasyFit Used to Improve the Forecasting of Software Project Status

Monday, November 29th, 2010

The software development community struggles to find a way to tell whether projects are on schedule, given that constant invention inevitably involves uncertainty and risk. Current practice is for developers to estimate a software project and attempt to consider (up-front) all variations to get a viable estimate of time and cost. This process is laborious, and even with due rigor, projects slip once it becomes clear that the actual times fail to match the estimates. This leads to costly project overruns and a lack of trust in future estimates.

As part of the Agile movement for software development, we think there is a better way and are championing the use of Monte-Carlo simulation as a way of assessing likely progress and dealing with delays as early as possible… read the full case study
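As a rough illustration of the general idea (not the method used in the case study), a Monte Carlo estimate of a project's total duration can be sketched in a few lines of Python. The task list, the three-point estimates, and the choice of a triangular distribution are all assumptions made for this example.

```python
import random

def simulate_totals(tasks, runs=10_000, seed=0):
    """Simulate total project duration `runs` times.

    `tasks` is a list of (low, mode, high) estimates in days.
    Each task duration is drawn from a triangular distribution,
    which is an assumption of this sketch.
    """
    rng = random.Random(seed)
    totals = [
        sum(rng.triangular(low, high, mode) for low, mode, high in tasks)
        for _ in range(runs)
    ]
    totals.sort()
    return totals

def percentile(sorted_values, p):
    """Return the value below which roughly a fraction p of the runs fall."""
    index = min(len(sorted_values) - 1, int(p * len(sorted_values)))
    return sorted_values[index]

# Hypothetical (optimistic, most likely, pessimistic) estimates in days.
tasks = [(2, 3, 6), (1, 2, 4), (3, 5, 10), (2, 4, 9)]
totals = simulate_totals(tasks)
print(f"50% of runs finish within {percentile(totals, 0.50):.1f} days")
print(f"85% of runs finish within {percentile(totals, 0.85):.1f} days")
```

Instead of a single date, the simulation yields a completion time for each probability level, and it can be rerun as actual task times replace the estimates.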

EasyFit in Academia

Tuesday, August 3rd, 2010

Over the last five years, we have been adding new features to EasyFit mostly with business users in mind, but thanks to the nature of the product and the special academic pricing, it has become quite popular among the academic community. A quick Google search reveals numerous research papers referring to EasyFit, to name just a few:
 

  • “Co-evolution of Social and Affiliation Networks” (University of Maryland, USA) [link]
  • “Power laws in top wealth distributions: evidence from Canada” (Brock University, Canada) [link]
  • “Duration of Coherence Intervals in Electrical Brain Activity in Perceptual Organization” (RIKEN Brain Science Institute, Japan) [link]
  • “Resource Management Schemes for Mobile Ad hoc Networks” (National University of Singapore, Singapore) [link]
  • “Modelling the diffusion of innovation management theory using S-curves” (University of London, UK) [link]

(see the larger list of papers using EasyFit)

It is pleasing to see EasyFit helping researchers in such diverse disciplines get their job done in a more efficient way.

EasyFitXL Is Now Compatible With Excel 2010

Monday, July 12th, 2010

EasyFitXL, the distribution fitting add-in for Excel, was first introduced with the release of EasyFit 4.0 back in 2007. When designing EasyFitXL, we did a lot of research into which Excel versions to support. At that time, the latest version was Excel 2007, which included some useful new features, such as support for larger worksheets and multi-threaded worksheet recalculation. However, many customers were not rushing to upgrade to Excel 2007 because of its controversial Ribbon interface, so we had to make EasyFitXL compatible with the previous version, Excel 2003, as well.

According to some publicly available data, Excel 2002 and Excel 2000 still had a considerable user base, so we decided to support these two older versions as well. As a result, EasyFitXL initially included support for Excel versions from 2000 through 2007, covering perhaps over 99% of all Excel installations in the world.

Last month Microsoft released Excel 2010, which does not make a big difference in terms of data analysis. However, after its release we started receiving compatibility complaints from our customers, so we performed in-depth testing and released an updated version of EasyFit (available for download).

EasyFit 5.3 Released

Wednesday, January 20th, 2010

Recently a customer contacted us and noted that the Inverse Cumulative Distribution Function (the Quantile Function) of the Inverse Gaussian distribution implemented in EasyFit works well for lambda=1902.1, mu=41857.0 and P=0.9, but fails for the same lambda and mu with P=0.99. Last week we released an updated version of EasyFit that fixes the problem, and in this post we would like to elaborate on the issue.

Evaluating the Inverse CDF of the Inverse Gaussian Model
Since the CDF of the Inverse Gaussian distribution is quite complicated (it is expressed in terms of two Laplace integrals), the Inverse CDF of this model is not available in closed form and cannot be easily evaluated for a given set of distribution parameters. Initially, we implemented an iterative approximation algorithm that evaluates the ICDF(P) using the CDF as well as the PDF to speed up the calculation. The algorithm itself works very well over a wide range of input parameters; however, we placed a limit on how many iterations it is allowed to perform.

Because EasyFit is considered an interactive data analysis tool, we are always looking for a balance between the feature set and performance, which is especially important when using EasyFit with Excel worksheets calculated in real time. The limit on the number of iterations is necessary to make sure the algorithm doesn’t fall into an “infinite loop”, meaning a situation in which it is unable to reach the specified accuracy regardless of how long it continues to work. The problem usually happens when we hit the limits of the CPU’s floating-point precision: in theory, the algorithm must converge in a limited number of steps, but in reality, it just keeps iterating without any improvement in accuracy.

As a solution, we have made some improvements to the algorithm, making it more robust and efficient, so it now works with the same accuracy, but for a larger range of input parameters. For example, considering the parameters that initially caused the problem (lambda=1902.1 and mu=41857.0), the ICDF(P) can be evaluated for values of P up to 0.999925, which is more than enough for most statistical analysis applications.
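For readers curious about what such an iterative scheme can look like, below is a minimal Python sketch. It is not the algorithm implemented in EasyFit: it assumes the standard parameterization of the Inverse Gaussian distribution (mean mu, shape lambda), uses Newton's method (CDF for the error, PDF for the slope) with a bisection fallback, and caps the number of iterations. The function names, the tolerance and the iteration limit are our own choices for illustration.

```python
import math

def norm_cdf(z):
    """Standard normal CDF (the Laplace integral), via erfc."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def ig_pdf(x, mu, lam):
    """PDF of the Inverse Gaussian distribution (mean mu, shape lam)."""
    if x <= 0.0:
        return 0.0
    return math.sqrt(lam / (2.0 * math.pi * x ** 3)) * math.exp(
        -lam * (x - mu) ** 2 / (2.0 * mu ** 2 * x))

def ig_cdf(x, mu, lam):
    """CDF of the Inverse Gaussian distribution."""
    if x <= 0.0:
        return 0.0
    r = math.sqrt(lam / x)
    # Note: exp(2*lam/mu) overflows for very large lam/mu; a production
    # implementation would need to handle that case separately.
    return (norm_cdf(r * (x / mu - 1.0))
            + math.exp(2.0 * lam / mu) * norm_cdf(-r * (x / mu + 1.0)))

def ig_icdf(p, mu, lam, tol=1e-10, max_iter=100):
    """Approximate the quantile x such that ig_cdf(x) = p."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    # Bracket the root by doubling the upper bound.
    lo, hi = 0.0, mu
    for _ in range(max_iter):
        if ig_cdf(hi, mu, lam) >= p:
            break
        lo, hi = hi, 2.0 * hi
    else:
        raise ArithmeticError("could not bracket the quantile")
    x = 0.5 * (lo + hi)
    for _ in range(max_iter):  # iteration cap prevents an endless loop
        err = ig_cdf(x, mu, lam) - p
        if err == 0.0:
            return x
        if err > 0.0:
            hi = x
        else:
            lo = x
        slope = ig_pdf(x, mu, lam)
        x_new = x - err / slope if slope > 0.0 else lo
        if not lo < x_new < hi:
            x_new = 0.5 * (lo + hi)  # Newton step left the bracket: bisect
        if abs(x_new - x) <= tol * x_new:
            return x_new
        x = x_new
    raise ArithmeticError("ICDF did not converge within max_iter iterations")

x99 = ig_icdf(0.99, mu=41857.0, lam=1902.1)
print(x99, ig_cdf(x99, 41857.0, 1902.1))
```

The bisection fallback keeps every iterate inside an interval known to contain the quantile, which is one common way to make Newton's method more robust. As P approaches 1, the CDF value can no longer be distinguished from 1 in floating-point arithmetic, so any scheme of this kind has an upper limit on P.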

Should You Upgrade?
Since this minor issue does not affect the accuracy of distribution fitting, you only need to upgrade if you are experiencing problems evaluating the Inverse CDF of the Inverse Gaussian distribution for P>0.9. Otherwise, EasyFit 5.2 will still work well for you.

New Version of EasyFit Available

Monday, June 1st, 2009

We have just released EasyFit Version 5.1 – the update that fixes a bug causing an incorrect calculation of the chi-squared GOF statistic for small sample sizes. To upgrade, uninstall EasyFit 5.0 from your computer, then download and install the latest version.

EasyFit Used by NASA to Improve Monte Carlo Risk Simulation Models

Tuesday, February 10th, 2009

On April 17, 2005, the Millstone nuclear generating plant in Connecticut shut down when a circuit board monitoring a steam pressure line short-circuited. “Tin whiskers” – microscopic growths of the metal from soldering points on a circuit board – were blamed for causing the problem. These whiskers are composed of nearly pure tin, and are therefore electrically conductive.

Field failures attributable to tin whiskers have cost individual programs many millions of dollars each. As a result, manufacturers of high-reliability systems are forced to use Monte Carlo simulation models to decide whether the use of tin poses an acceptable risk in a given application.

Recently a group of NASA scientists led by Karim J. Courey, a Principal Engineer with the Orbiter Sustaining Engineering Office, Lyndon B. Johnson Space Center, used our distribution fitting software EasyFit to better understand the underlying process and develop a probability model that can be used to improve existing Monte Carlo risk simulation models… read the full case study

How To Speed Up The Distribution Fitting Process?

Tuesday, December 30th, 2008

Since fitting probability distributions to large data sets can be a time-consuming task, we are currently researching the possibility of using multi-core processors to make EasyFit work faster. During the past several years, major processor manufacturers have been promoting multi-core technology in the desktop processor market. Multiple cores in a single chip allow for a better performance/price ratio on a range of tasks; however, existing software needs to be updated to take full advantage of this type of hardware.

We have modified the original distribution fitting algorithm to utilize all cores available on a system, and used it to fit distributions to a simulated set of 200,000 data points. In a series of tests on an Intel dual-core processor, the new algorithm executed almost twice as fast as the version currently used in EasyFit, yielding up to a 90% performance increase. These are very good results, and we will definitely be including this feature in the next release of EasyFit.
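One natural way to spread distribution fitting across cores is to fit each candidate distribution as a separate task. The Python sketch below illustrates that idea only; it is not the algorithm used in EasyFit. The three candidate models, the use of closed-form maximum likelihood estimates, the ranking by log-likelihood and all function names are assumptions of this example, and the data must be strictly positive for the exponential and lognormal fits.

```python
import math
from concurrent.futures import ProcessPoolExecutor

def fit_normal(data):
    """Maximum likelihood fit of a normal distribution."""
    n = len(data)
    mean = sum(data) / n
    var = sum((x - mean) ** 2 for x in data) / n
    loglik = -0.5 * n * (math.log(2.0 * math.pi * var) + 1.0)
    return "normal", {"mean": mean, "sigma": math.sqrt(var)}, loglik

def fit_exponential(data):
    """Maximum likelihood fit of an exponential distribution."""
    n = len(data)
    rate = n / sum(data)
    loglik = n * math.log(rate) - n
    return "exponential", {"rate": rate}, loglik

def fit_lognormal(data):
    """Maximum likelihood fit of a lognormal distribution."""
    logs = [math.log(x) for x in data]
    n = len(logs)
    mean = sum(logs) / n
    var = sum((v - mean) ** 2 for v in logs) / n
    loglik = -sum(logs) - 0.5 * n * (math.log(2.0 * math.pi * var) + 1.0)
    return "lognormal", {"mu": mean, "sigma": math.sqrt(var)}, loglik

CANDIDATES = (fit_normal, fit_exponential, fit_lognormal)

def fit_all(data, executor_cls=ProcessPoolExecutor):
    """Fit every candidate as a separate task; best log-likelihood first."""
    with executor_cls() as pool:
        futures = [pool.submit(fit, data) for fit in CANDIDATES]
        results = [future.result() for future in futures]
    return sorted(results, key=lambda result: result[2], reverse=True)
```

When using the default process pool, call `fit_all(data)` from under an `if __name__ == "__main__":` guard so that worker processes can import the module safely. The achievable speedup depends on how evenly the work is divided and on the cost of sending the data to each worker, so it will generally be less than the number of cores.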

On a related note, last week we were contacted by a customer regarding our upcoming Simulation & Probabilistic Analysis SDK. They need to analyze large volumes of data, and from their description of the problem we estimated that a typical analysis would take up to 20 hours on a modern PC. With the new distribution fitting algorithm, it can take less than 12 hours on a dual-core CPU, or even less on the quad-core processors popular in the server space. In a decision-making environment where several hours can mean the difference between profit and loss, this is a very important improvement.

Need To Deal With Risk and Uncertainty in Your Software Applications?

Tuesday, December 23rd, 2008

Lately we have received a couple of messages from customers asking if it’s possible to use the Monte Carlo simulation and distribution fitting features of EasyFit in their own software applications. The short answer is yes, but only in a limited way: at the moment you can only calculate some distribution functions from Excel VBA. There’s currently no way to run simulations, fit distributions to data, perform goodness of fit tests, or use distribution functions from C#, C++, VB.NET, and other programming languages.

To fill the gap, we are considering creating a Simulation & Probabilistic Analysis Software Development Kit (SPA SDK) for software developers who need to deal with risk and uncertainty in their applications, but don’t have the time or expertise to design and implement the required features on their own. We already have in place the tried and true technology that is the basis for our distribution fitting products EasyFit and EasyFitXL, so creating an SDK would be possible in a short period of time.

Since we have had only a few requests for an SDK, we would like to know whether you would be interested in such a product. Below is our vision for the SDK, and you are welcome to share any thoughts or specific requirements you might have. Please feel free to contact us; we will take your input seriously.

Update: The free beta version of the SDK is now available for download – please click here for details.

What is a Simulation & Probabilistic Analysis Software Development Kit (SPA SDK) ? (more…)
