Tuesday, June 12, 2012

Building Aggregation Frameworks - Time Series, Statistical Time Analysis

A while back I became interested in the statistical analysis of time series data. As with every project and prototype at our company, I started by creating a directory in our "project prototypes" folder and writing a readme.txt file and an implementation.txt file.

Here are the contents of those two files:
readme.txt: http://pastie.org/4075679
implementation.txt (implementation notes): http://pastie.org/4075673

There is still a huge need for aggregation API systems that specialize in these types of tasks. They are extremely common and form the cornerstones of data analysis and retrieval. Data like this often needs to be broken down along relational dimensions in addition to being analyzed efficiently over time.
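
To make the idea concrete, here is a minimal Python sketch (not the implementation described in the linked notes) that breaks measurements down by one relational dimension and by fixed-width time buckets, then computes simple statistics for each group. The `Event` type, the `region` dimension, the one-hour bucket size, and the sample values are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timezone
from statistics import mean


@dataclass(frozen=True)
class Event:
    """One measurement: when it happened, its dimension value, and the metric."""
    timestamp: datetime  # must be timezone-aware (UTC in this sketch)
    region: str          # hypothetical relational dimension
    value: float


def bucket_start(ts: datetime, bucket_seconds: int) -> datetime:
    """Truncate a timestamp to the start of its fixed-width time bucket (UTC)."""
    epoch = int(ts.timestamp())
    return datetime.fromtimestamp(epoch - epoch % bucket_seconds, tz=timezone.utc)


def aggregate(events, bucket_seconds=3600):
    """Group events by (region, bucket start) and summarize each group."""
    groups = defaultdict(list)
    for e in events:
        groups[(e.region, bucket_start(e.timestamp, bucket_seconds))].append(e.value)
    # Sort by (region, bucket start) so the output reads as one time series per region.
    return {
        key: {"count": len(v), "mean": mean(v), "min": min(v), "max": max(v)}
        for key, v in sorted(groups.items())
    }


if __name__ == "__main__":
    utc = timezone.utc
    sample = [
        Event(datetime(2012, 6, 12, 9, 5, tzinfo=utc), "us-east", 120.0),
        Event(datetime(2012, 6, 12, 9, 40, tzinfo=utc), "us-east", 80.0),
        Event(datetime(2012, 6, 12, 10, 15, tzinfo=utc), "us-east", 50.0),
        Event(datetime(2012, 6, 12, 9, 30, tzinfo=utc), "eu-west", 30.0),
    ]
    for (region, start), stats in aggregate(sample).items():
        print(region, start.isoformat(), stats)
```

Keeping the grouping key as a (dimension, bucket) tuple means adding another dimension is just a matter of widening the tuple; a real system would also need to handle buckets that contain no data.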

I'd like to mention MongoDB here because I am eagerly awaiting their aggregation API.
It serves slightly different needs than map-reduce and, in many cases, produces the same results without requiring any JavaScript to be written. This is absolutely a step in the right direction toward sophisticated NoSQL aggregation systems. I applaud MongoDB for their efforts in this area and for recognizing this niche (it won't be a niche for long).

I believe that, although the fundamental concepts of aggregation have been around in database systems for some time (think SQL GROUP BY), we are still in the very early stages of building more intuitive, dynamic, and useful tools for understanding the ever-growing amounts of data we collect.
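
As an illustration of that classic approach, here is a small sketch that breaks data down by a dimension and by hour using nothing more than SQL GROUP BY. It uses Python's built-in `sqlite3` module and SQLite's `strftime()` to truncate timestamps to the start of each hour; the `page_views` table, its columns, and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical sample data: (timestamp, page, views)
SAMPLE = [
    ("2012-06-12 09:05:00", "/home", 3),
    ("2012-06-12 09:40:00", "/home", 5),
    ("2012-06-12 10:15:00", "/home", 2),
    ("2012-06-12 09:30:00", "/about", 1),
]

# GROUP BY a relational dimension (page) and an hourly time bucket;
# strftime() truncates each timestamp to the start of its hour.
HOURLY_QUERY = """
    SELECT page,
           strftime('%Y-%m-%d %H:00:00', ts) AS hour,
           COUNT(*)   AS samples,
           SUM(views) AS total_views,
           AVG(views) AS mean_views
    FROM page_views
    GROUP BY page, hour
    ORDER BY page, hour
"""


def hourly_stats(rows):
    """Load rows into an in-memory SQLite table and run the hourly GROUP BY."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE page_views (ts TEXT, page TEXT, views INTEGER)")
        conn.executemany("INSERT INTO page_views VALUES (?, ?, ?)", rows)
        return conn.execute(HOURLY_QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in hourly_stats(SAMPLE):
        print(row)
```

This works well for fixed buckets and a handful of dimensions, but every new breakdown means another hand-written query, which is the gap that more dynamic aggregation tools aim to close.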
