Data Analysis & Simulation

Probabilistic Analysis vs Time Series Analysis

December 14th, 2011

Can EasyFit be used to analyze time series data? To answer this question, which we recently received from a customer, we will shed some light on the differences between probabilistic analysis and time series analysis.

When dealing with time series data, you usually have as input a set of (time, value) pairs representing consecutive measurements taken at equally spaced time intervals. The goal of time series analysis is to identify the nature of the process represented by your data and use it to forecast future values of the series being analyzed.

A widespread application of such analysis is weather forecasting: for more than a century, hundreds of weather stations around the world have recorded important parameters such as air temperature, wind speed, precipitation, and snowfall. Based on these data, scientists build models reflecting seasonal weather changes (depending on the time of year) as well as global trends, for example, the temperature change over the last 50 years. These models are used to provide weather forecasts for government and commercial organizations. In a typical forecast, the predicted values are not assigned probabilities: “In May, the maximum daily air temperature is expected to be 22 degrees Celsius.”
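To make the contrast concrete, here is a deliberately simplified Python sketch of the time series approach: it fits a least-squares linear trend, averages the detrended residuals at each position of the seasonal cycle, and extrapolates both to produce single-value forecasts. This is a textbook additive decomposition chosen for illustration, not the method of any particular forecasting service; the function names are hypothetical, and real models (moving-average decomposition, ARIMA, and so on) are considerably more careful.

```python
from statistics import fmean


def linear_trend(values):
    """Least-squares fit of values[t] ~ intercept + slope * t for t = 0, 1, ..."""
    n = len(values)
    t_mean = (n - 1) / 2.0
    v_mean = fmean(values)
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values))
    slope = sxy / sxx
    return slope, v_mean - slope * t_mean


def seasonal_offsets(values, period, slope, intercept):
    """Average detrended residual at each position of the seasonal cycle."""
    residuals = [v - (intercept + slope * t) for t, v in enumerate(values)]
    return [fmean(residuals[k::period]) for k in range(period)]


def forecast(values, period, steps):
    """Point forecasts for the next `steps` time points (no probabilities attached)."""
    if len(values) < max(2, period):
        raise ValueError("need at least one full seasonal cycle and two points")
    slope, intercept = linear_trend(values)
    season = seasonal_offsets(values, period, slope, intercept)
    n = len(values)
    return [intercept + slope * (n + h) + season[(n + h) % period]
            for h in range(steps)]
```

Each forecast is a single number, which is exactly the kind of prediction described above; the order of the observations in time is what drives the model.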

In contrast to predictions based on time series analysis, probabilistic analysis gives you not just a single value as a forecast, but a probabilistic model that accounts for uncertainty: a continuous range of values with assigned probabilities. Of course, for real-world applications it is more practical to deal with specific values, so probabilistic models are used to obtain predictions at fixed probability levels. In the above example, such a forecast might look like: “In May, the maximum daily air temperature will not exceed 22 degrees Celsius with 95% probability.”
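A minimal Python sketch of the probabilistic approach, assuming (for illustration only) that a normal distribution fits the data; EasyFit would instead compare many candidate distributions and let you pick the best one. The temperatures are hypothetical placeholder values, not real measurements. Note that only the values are used: any time stamps are simply ignored.

```python
from statistics import NormalDist, fmean

# Hypothetical May daily maximum temperatures (deg C), for illustration only.
may_max_temps = [18.5, 21.0, 19.8, 23.4, 22.1, 20.7, 24.9, 19.2, 21.6, 22.8, 20.3, 23.0]


def temperature_not_exceeded(samples, probability):
    """Value that the fitted normal model says will not be exceeded with `probability`."""
    model = NormalDist.from_samples(samples)  # sample mean and standard deviation
    return model.inv_cdf(probability)


q95 = temperature_not_exceeded(may_max_temps, 0.95)
print(f"Average daily maximum: {fmean(may_max_temps):.1f} C")
print(f"With 95% probability, the daily maximum will not exceed {q95:.1f} C")
```

The inverse CDF (quantile function) is what turns a fitted model into a statement at a fixed probability level.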

So can distribution fitting be useful when analyzing time series data? The answer depends on the goals of your analysis – i.e. what kind of information you want to derive from your data. If you want to understand the connection between the predicted values and the probability, you should fit distributions to your data (just keep in mind that in this case the “time” variable will be unused). On the other hand, if you need to identify seasonal patterns or global trends in your data, you should go with the “classical” time series analysis methods.

What Parameter Estimation Methods Are Used in EasyFit?

October 28th, 2011

From time to time, we receive emails from customers asking which parameter estimation methods EasyFit uses for distribution fitting. When designing EasyFit, we strove for a good balance between accuracy and speed of calculations. That is why we decided to use the Method of Moments (MOM) for those models that allow for easy use of this method. Examples of such distributions include the Chi-Squared, Exponential, two-parameter Gamma, and Logistic models. However, for many other distributions, the Method of Moments does not yield closed-form expressions for the parameter estimates, and in such cases EasyFit uses the Maximum Likelihood Estimation (MLE) method. In addition, for some distributions used in specific industries, such as the Wakeby model, EasyFit employs the Method of L-Moments (LMOM). A detailed list of the supported distributions and the estimation methods used is available on our website.
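For readers curious what these methods look like in practice, here is a minimal Python sketch of the textbook method-of-moments estimators for the four models named above, plus the first two sample L-moments (computed from probability-weighted moments) on which L-moment fitting is built. These are standard formulas shown for illustration, not EasyFit's source code. Fitting a Wakeby model would additionally require higher-order L-moments, which are omitted, and the sketch assumes the sample variance with the n - 1 denominator, one of several common conventions.

```python
import math
from statistics import fmean, variance


def mom_exponential(data):
    """E[X] = 1/lambda, so lambda = 1/mean."""
    return 1.0 / fmean(data)


def mom_gamma(data):
    """E[X] = k*theta and Var[X] = k*theta^2 give shape k and scale theta."""
    m, v = fmean(data), variance(data)
    return m * m / v, v / m


def mom_logistic(data):
    """E[X] = mu and Var[X] = s^2 * pi^2 / 3 give location mu and scale s."""
    return fmean(data), math.sqrt(3.0 * variance(data)) / math.pi


def mom_chi_squared(data):
    """E[X] = nu (degrees of freedom)."""
    return fmean(data)


def sample_l_moments(data):
    """First two sample L-moments via probability-weighted moments b0 and b1."""
    x = sorted(data)
    n = len(x)
    b0 = fmean(x)
    b1 = sum((j / (n - 1)) * xj for j, xj in enumerate(x)) / n
    return b0, 2.0 * b1 - b0  # l1 (location), l2 (scale)
```

Each method-of-moments estimator is a one-line closed form, which is why it is fast; models without such closed forms need an iterative method such as MLE.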

Simulation & Probabilistic Analysis SDK 1.2 Released

September 5th, 2011

We have recently released a new version of our SDK. This update adds a new property that lets you obtain the current licensing status of the SDK: for instance, you can determine whether the SDK is currently running in trial mode (using the Evaluation License) and, if so, how many days are left until the evaluation period expires.

Consider the following scenario: you are building an application with a modular structure that, apart from its core feature set, provides additional functionality through a number of modules, or add-ins, which can be installed and enabled on an optional basis. Now, suppose one of these modules uses the simulation or distribution fitting features of the SDK, and you want to give your users the ability to evaluate it before making a purchase decision. The new version of the SDK lets you easily integrate this logic into your applications, allowing you to create more flexible solutions that better meet your customers’ needs.
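The SDK's actual property and type names are not given in this post, so the Python sketch below uses a hypothetical `LicenseStatus` record as a stand-in for whatever the SDK reports. It only shows how an optional add-in might decide whether to enable itself and which evaluation notice to display; none of these names belong to the real SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LicenseStatus:
    """Hypothetical stand-in for the SDK's licensing status (NOT the SDK's real type)."""
    is_trial: bool  # True when running under the Evaluation License
    days_left: int  # days until the evaluation period expires (ignored if not a trial)


def module_enabled(status: LicenseStatus) -> bool:
    """Enable the add-in for licensed users, or for trial users with days remaining."""
    return (not status.is_trial) or status.days_left > 0


def module_banner(status: LicenseStatus) -> str:
    """Notice an optional add-in might show next to its features."""
    if not status.is_trial:
        return ""
    if status.days_left > 0:
        return f"Evaluation version: {status.days_left} day(s) left"
    return "Evaluation period has expired; purchase a license to continue"


print(module_banner(LicenseStatus(is_trial=True, days_left=12)))
```

Keeping the decision in one small function means the core application never needs to know which add-ins are in trial mode.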

EasyFit Used for Probabilistic Currency Forecasting

February 21st, 2011

Because risk and uncertainty are part of virtually all areas of our lives, finance being one of the most important, scientifically based risk management methods are gaining popularity among finance industry professionals. Currency fluctuations affect all businesses dealing with multiple currencies, so having at least some degree of certainty about future exchange rates can be a significant success factor for any international enterprise. A wide range of currency forecasting methods have been developed; however, few of them can claim to be reliable in the long run: most algorithms only work for a short period of time and need to be tweaked as market conditions change.

Brijen Hathi, a Research Fellow at the Planetary & Space Sciences Research Institute, performs his own research in the field and publishes the results in the Currency Forecasting Blog. The forecasting methodology employed by Mr. Hathi is in part based on the same techniques used in probabilistic risk analysis. As with most modern forecasting methods, his approach uses historical data to predict the future, but the big difference is that it also assigns specific probabilities to the predictions. For example, for a US-based company doing business in the UK, it doesn’t really matter what the exact GBP/USD exchange rate is going to be during the next 30 days, as long as it stays within a specific interval with a high probability (95% or more). Mr. Hathi recently published an article highlighting the use of EasyFit to model the pricing probability of the Pound Sterling versus the US Dollar from historical data. It is fascinating to see EasyFit being used in what we believe is a truly scientific approach to data analysis, and we hope to see new developments in this area soon.
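The general idea of attaching a probability to a range can be sketched in a few lines of Python. This is not Mr. Hathi's methodology; it simply takes the empirical 2.5th and 97.5th percentiles of historical 30-step relative changes and applies them to the latest rate, which simplifies the "stays within" requirement to the rate at the end of the horizon. Because no real market data is used here, the series is a synthetic, driftless random walk, and the function names and volatility setting are hypothetical. Overlapping windows are not independent, so a real analysis would treat such an interval with more caution.

```python
import math
import random
from statistics import quantiles


def synthetic_rates(n, start=1.60, daily_vol=0.006, seed=42):
    """Hypothetical exchange-rate series (a driftless geometric random walk), NOT real data."""
    rng = random.Random(seed)
    rates = [start]
    for _ in range(n - 1):
        rates.append(rates[-1] * math.exp(rng.gauss(0.0, daily_vol)))
    return rates


def interval_95(rates, horizon):
    """Range for the rate `horizon` steps ahead, from empirical 2.5% and 97.5% quantiles."""
    ratios = [rates[i + horizon] / rates[i] for i in range(len(rates) - horizon)]
    cuts = quantiles(ratios, n=40)  # 39 cut points at 2.5%, 5%, ..., 97.5%
    low_ratio, high_ratio = cuts[0], cuts[-1]
    return rates[-1] * low_ratio, rates[-1] * high_ratio


history = synthetic_rates(1000)
low, high = interval_95(history, horizon=30)
print(f"95% interval for the rate in 30 steps: {low:.4f} to {high:.4f}")
```

In a distribution fitting workflow, the empirical quantiles would be replaced by quantiles of a distribution fitted to the historical changes.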

EasyFit Used to Improve the Forecasting of Software Project Status

November 29th, 2010

The software development community struggles to determine whether projects are on schedule, given that software development involves constant invention, which inevitably brings uncertainty and risk. Current practice is for developers to estimate a software project and attempt to consider, up front, all variations to get a viable estimate of time and cost. This process is laborious, and even with due rigor, projects slip when estimated and actual times fail to match. This leads to costly project overruns and a lack of trust in future estimates.

As part of the Agile movement in software development, we think there is a better way and are championing the use of Monte Carlo simulation as a way of assessing likely progress and dealing with delays as early as possible… read the full case study
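As a rough illustration of the idea (not the case study's actual model), the Python sketch below treats a project as a sequence of tasks, each with a hypothetical three-point estimate, samples every task from a triangular distribution, and reports the probability of finishing by a deadline together with an 85th-percentile commitment. The task list, the deadline, and the choice of the triangular distribution are all assumptions made for the example.

```python
import random

# Hypothetical three-point estimates in days: (optimistic, most likely, pessimistic).
TASKS = [(2, 3, 6), (4, 5, 10), (1, 2, 4), (3, 5, 9)]


def simulate_totals(tasks, trials=10_000, seed=1):
    """Total duration of sequential tasks for each Monte Carlo trial."""
    rng = random.Random(seed)
    # random.triangular takes (low, high, mode), so reorder each estimate.
    return [sum(rng.triangular(low, high, mode) for low, mode, high in tasks)
            for _ in range(trials)]


def probability_on_time(totals, deadline):
    """Fraction of trials that finish on or before the deadline."""
    return sum(total <= deadline for total in totals) / len(totals)


def percentile(totals, q):
    """Duration not exceeded in a fraction q of the trials (simple empirical estimate)."""
    ordered = sorted(totals)
    return ordered[min(len(ordered) - 1, int(q * len(ordered)))]


totals = simulate_totals(TASKS)
print(f"P(done within 18 days) = {probability_on_time(totals, 18):.2f}")
print(f"85% commitment: {percentile(totals, 0.85):.1f} days")
```

Re-running the simulation as tasks complete, with actual durations substituted for finished tasks, is what allows delays to be spotted early.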

Will Cloud Computing Make Risk Analysis More Economically Efficient?

November 1st, 2010

What is Cloud Computing?

For some time now, there has been a lot of buzz around cloud computing, a relatively new computing paradigm in which resources, software, and information are shared across computer clusters and delivered to users on demand through the Internet. The idea behind cluster computing is not new: if your applications require a lot of computing resources or impose very strict reliability requirements that cannot be met by a single personal computer or server, you can link a group of computers into a cluster that will provide much better performance.

Why Not Build a Cluster Yourself?

Building and maintaining a computer cluster in your organization may have some downsides, such as large upfront investments in technology infrastructure and high running costs. Of course, there are companies that will build and manage a computer cluster for you, but the bottom line is that, depending on how heavily your cluster will be loaded, it may or may not be economically feasible for your company to run it on-site. For instance, if you need to quickly perform a very CPU-intensive calculation (e.g. render a complex 3D scene) only once a day, chances are the cluster will not pay off.
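The trade-off above comes down to simple arithmetic. The Python sketch below, using purely hypothetical cost figures (not real prices from any provider), computes the monthly usage at which an on-site cluster's fixed costs are paid back by its lower per-hour cost. It deliberately ignores factors such as hardware refresh cycles, data transfer, and staff time.

```python
import math


def break_even_hours(onsite_fixed_per_month, onsite_per_hour, cloud_per_hour):
    """Monthly compute hours above which an on-site cluster is cheaper than the cloud."""
    margin = cloud_per_hour - onsite_per_hour
    if margin <= 0:
        return math.inf  # the cloud is never more expensive per hour
    return onsite_fixed_per_month / margin


def cheaper_option(hours_per_month, onsite_fixed_per_month, onsite_per_hour, cloud_per_hour):
    """Compare the total monthly cost of both options for a given workload."""
    onsite = onsite_fixed_per_month + onsite_per_hour * hours_per_month
    cloud = cloud_per_hour * hours_per_month
    return "on-site" if onsite < cloud else "cloud"


# Hypothetical figures: 3000/month fixed on-site, 2/hour to run it, 12/hour in the cloud.
print(break_even_hours(3000, 2, 12))    # 300.0 hours per month
print(cheaper_option(30, 3000, 2, 12))  # about one hour a day: cloud
```

With these placeholder numbers, a job that runs about an hour a day stays far below the break-even point, which is the "once a day" situation described above.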

And here is where cloud computing comes into play: you can access vast computing resources, pay only for what you use, and not worry about the underlying technology infrastructure. Combined, these factors can provide a great economic benefit, and some major Internet players, including Amazon and Google, already offer cloud computing platforms for those who want to make their businesses more efficient.

Is Cloud Computing a Good Fit for Risk Analysis?

As one might guess, not every kind of application can run efficiently on the cloud. Because at the core of a cloud is a number of computers linked into a cluster, it is very good at processing a large number of independent tasks, such as requests to a web server. That might be why the cloud computing platforms offered by Amazon and Google are mostly used to run websites.

If you consider risk analysis, it looks like an ideal application to run on the cloud: an input model of several megabytes that can easily be sent to the cloud; a need for huge computational resources to quickly perform Monte Carlo simulation and distribution fitting; sometimes a need for a lot of storage to hold intermediate results; and relatively small analysis results that can be sent back to users as text and graphics. Moreover, Monte Carlo trials are independent of one another, so a simulation splits naturally into the kind of independent tasks a cluster handles well. Add to that the ever-increasing complexity of risk models used across various industries, which makes analysts wait for hours while their simulations run, and you have a potentially good opportunity to make risk analysis more economically efficient.
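The reason Monte Carlo simulation maps so well onto a cluster can be seen in a short Python sketch: the simulation is split into batches that each depend only on their own seed and inputs, so each batch could be sent to a different machine and only a small count needs to come back. The two-risk lognormal loss model and all parameters are hypothetical, and the batches run sequentially here; dispatching them to real cloud nodes is outside the scope of the sketch.

```python
import random


def run_batch(seed, size, threshold):
    """One self-contained batch: count trials whose total loss exceeds the threshold."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(size):
        # Hypothetical model: total loss is the sum of two lognormal risks.
        loss = rng.lognormvariate(0.0, 0.5) + rng.lognormvariate(0.0, 0.5)
        hits += loss > threshold
    return hits


def exceedance_probability(batches, size, threshold, base_seed=2010):
    """Combine independent batch results; each batch could run on a separate node."""
    counts = [run_batch(base_seed + k, size, threshold) for k in range(batches)]
    return sum(counts) / (batches * size)


print(exceedance_probability(batches=8, size=5_000, threshold=4.0))
```

Only the input parameters travel to each batch and only one integer travels back, which mirrors the "small input, heavy computation, small result" profile described above.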

To research this further, we have partnered with Supportex, a technology services company based in the Czech Republic. Supportex has solid experience providing a cloud computing platform for problems much more complex than processing requests to a web server, which is why we decided to rely on their hardware infrastructure and domain knowledge to run some test applications and see whether cloud computing can be of real help in risk analysis. Once we have run the tests, we plan to publish the results in this blog, so stay tuned!

EasyFit in Academia

August 3rd, 2010

Over the last five years, we have been adding new features to EasyFit mostly with business users in mind, but thanks to the nature of the product and our special academic pricing, it has become quite popular in the academic community: a quick Google search reveals numerous research papers referring to EasyFit. To name just a few:

  • “Co-evolution of Social and Affiliation Networks” (University of Maryland, USA)
  • “Power laws in top wealth distributions: evidence from Canada” (Brock University, Canada)
  • “Duration of Coherence Intervals in Electrical Brain Activity in Perceptual Organization” (RIKEN Brain Science Institute, Japan)
  • “Resource Management Schemes for Mobile Ad hoc Networks” (National University of Singapore, Singapore)
  • “Modelling the diffusion of innovation management theory using S-curves” (University of London, UK)

(see the larger list of papers using EasyFit)

It is pleasing to see EasyFit helping researchers in such diverse disciplines get their job done in a more efficient way.

EasyFitXL Is Now Compatible With Excel 2010

July 12th, 2010

EasyFitXL, the distribution fitting add-in for Excel, was first introduced with the release of EasyFit 4.0 back in 2007. When designing EasyFitXL, we did a lot of research into which Excel versions to support. At that time, the latest version was Excel 2007, which included some useful new features, such as support for larger worksheets and multi-threaded recalculation. However, many customers were not rushing to upgrade to Excel 2007 because of its controversial Ribbon interface, so we had to make EasyFitXL compatible with the previous version, Excel 2003.

According to some publicly available data, Excel 2002 and Excel 2000 still had a considerable user base, so we decided to support these two older versions as well. As a result, EasyFitXL initially supported Excel versions 2000 through 2007, covering perhaps over 99% of all Excel installations in the world.

Last month Microsoft released Excel 2010. Although it does not make a big difference in terms of data analysis, after its release we started receiving compatibility complaints from our customers, so we performed in-depth testing and released an updated version of EasyFit (available for download).

Building The International Distributors Network

June 7th, 2010

Over the past five years, we have been selling our distribution fitting software through Plimus Inc., a U.S.-based company responsible for processing credit card orders and sending license keys to customers who purchase our products. What we like about Plimus is that, apart from making the ordering process secure and smooth for our users, the company is also very responsive to queries, which is especially important when it comes to dealing with people’s money.

However, in some countries state regulations prevent customers (mostly government organizations and academic institutions) from ordering software online. To make our distribution fitting products available to users in those countries, we are in the process of creating a network of international distributors. Specifically, this year we have partnered with one Chinese, one Mexican, and three Taiwanese software resellers, who now offer our products through their local distribution channels.

EasyFit 5.3 Released

January 20th, 2010

A customer recently contacted us and noted that the Inverse Cumulative Distribution Function (the Quantile Function) of the Inverse Gaussian distribution implemented in EasyFit works well for lambda=1902.1, mu=41857.0 and P=0.9, but fails for the same lambda and mu with P=0.99. Last week we released an updated version of EasyFit that fixes the problem, and in this post we would like to elaborate on the issue.

Evaluating the Inverse CDF of the Inverse Gaussian Model
Since the CDF of the Inverse Gaussian distribution is quite complicated (it is expressed in terms of two Laplace integrals, i.e. standard normal CDFs), the Inverse CDF of this model is not available in closed form and cannot be easily evaluated for a given set of distribution parameters. Initially, we implemented an iterative approximation algorithm that evaluates ICDF(P) using the CDF as well as the PDF to speed up the calculation. The algorithm itself works very well over a wide range of input parameters; however, we placed a limit on the number of iterations it is allowed to perform.

Because EasyFit is an interactive data analysis tool, we are always looking for a balance between features and performance, which is especially important when EasyFit is used with Excel worksheets calculated in real time. The limit on the number of iterations is necessary to make sure the algorithm does not fall into an “infinite loop”, that is, a situation in which it cannot reach the specified accuracy no matter how long it continues to work. This usually happens when the calculation hits the precision limits of the CPU’s floating-point arithmetic: in theory, the algorithm should converge in a finite number of steps, but in practice it just keeps iterating without any improvement in accuracy.
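To illustrate the kind of algorithm described above, here is a minimal Python sketch of a safeguarded Newton iteration for the Inverse Gaussian quantile. It uses the standard closed form of the CDF, F(x) = Φ(√(λ/x)(x/μ − 1)) + exp(2λ/μ) Φ(−√(λ/x)(x/μ + 1)), where Φ is the standard normal CDF, takes Newton steps with the PDF as the derivative, falls back to bisection whenever a step would leave the current bracket, and gives up after a fixed number of iterations. This is a generic textbook technique, not EasyFit's actual implementation. It also hints at why precision matters: in double precision, numbers just below 1 are spaced about 1.1e-16 apart, so as P approaches 1 the equation F(x) = P eventually cannot be resolved at all.

```python
import math


def norm_cdf(z):
    """Standard normal CDF, computed via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def ig_cdf(x, mu, lam):
    """Inverse Gaussian CDF: a sum of two normal-CDF terms."""
    if x <= 0.0:
        return 0.0
    r = math.sqrt(lam / x)
    return (norm_cdf(r * (x / mu - 1.0))
            + math.exp(2.0 * lam / mu) * norm_cdf(-r * (x / mu + 1.0)))


def ig_pdf(x, mu, lam):
    """Inverse Gaussian PDF, used as the derivative in Newton steps."""
    if x <= 0.0:
        return 0.0
    return math.sqrt(lam / (2.0 * math.pi * x ** 3)) * math.exp(
        -lam * (x - mu) ** 2 / (2.0 * mu ** 2 * x))


def ig_icdf(p, mu, lam, tol=1e-12, max_iter=200):
    """Quantile by Newton iteration, safeguarded by bisection and an iteration limit."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be in (0, 1)")
    lo, hi = 0.0, mu
    for _ in range(200):  # double the upper bound until it brackets the root
        if ig_cdf(hi, mu, lam) >= p:
            break
        lo, hi = hi, 2.0 * hi
    else:
        raise ArithmeticError("could not bracket the quantile")
    x = 0.5 * (lo + hi)
    for _ in range(max_iter):
        f = ig_cdf(x, mu, lam) - p
        if abs(f) <= tol:
            return x
        if f < 0.0:
            lo = x
        else:
            hi = x
        d = ig_pdf(x, mu, lam)
        step = x - f / d if d > 0.0 else lo  # lo is outside (lo, hi): forces bisection
        x = step if lo < step < hi else 0.5 * (lo + hi)
        if hi - lo <= 1e-15 * hi:  # bracket collapsed to floating-point resolution
            return x
    raise ArithmeticError("iteration limit reached without convergence")
```

In this sketch the bracket check, rather than the iteration count alone, is what stops the loop when floating-point resolution is exhausted; the iteration limit remains as a last line of defense.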

As a solution, we have made some improvements to the algorithm, making it more robust and efficient, so it now works with the same accuracy over a larger range of input parameters. For example, for the parameters that initially caused the problem (lambda=1902.1 and mu=41857.0), ICDF(P) can now be evaluated for values of P up to 0.999925, which is more than enough for most statistical analysis applications.

Should You Upgrade?
Since this minor issue does not affect the accuracy of distribution fitting, you only need to upgrade if you are experiencing problems evaluating the Inverse CDF of the Inverse Gaussian distribution for P>0.9; otherwise, EasyFit 5.2 will continue to work well for you.
