

Quandl: A Wikipedia for Time Series Data

This guest post is by Tammer Kamel, Founder of Quandl.

Finding and formatting numerical data for analysis in R, Excel, or indeed any application is a pain that all real-world data analysts know all too well. In aggregate I have probably spent weeks of my life trying to find data on the web, and several more weeks validating, formatting and cleaning it. Analysis offers data scientists interesting, intellectually stimulating problems. But data acquisition, the necessary precursor, offers only tedium and pain. It's a time vampire.

The solution to this problem is conceptually obvious: one site with all the world’s data, nicely formatted and documented; an omni-platform. Platforms aspiring to this objective keep appearing and disappearing. They appear because they are great ideas. They disappear because they demand that publishers upload and maintain data on an external site. Publishers don’t comply because they have enough work maintaining the data in their own databases, let alone someone else’s.

So, if the data won't come to the platform, the only alternative is for the platform to come to the data. What does that mean? It means that to succeed in building a truly comprehensive data platform, you must ask nothing of data publishers. You have to create a solution that feeds off whatever the publisher is spitting out, regardless of how absurdly the data might be published.

That’s what we're doing at Quandl. We've built a sort of "universal data parser" which has thus far parsed about 2.8 million datasets. We've asked nothing of any data publisher. As long as they spit out data somehow (Excel, text file, blog post, XML, API, etc.), the "Q-bot" can slurp it up. The result is www.quandl.com, a sort of "search engine" for numerical data.

The idea with Quandl is that you can find data fast. And more importantly, once you find it, it is ready to use. This is because Quandl's bot returns data in a totally standard format, which means we can then translate it to any format a user wants.

Quandl is rich in financial, economic and sociological time series data. The data is easy to find. It is transparent to source. Datasets can easily be merged with one another. They can be visualized and shared. It is all open. It is all free. There's much more about our vision on our about page.

From the start, Quandl delivered data in all the standard formats (Excel, CSV, XML, JSON). We're now moving on to deliver data to applications in the exact format those apps demand. We're starting with R. We've done something simple to start; the next step for us is to complete an R package to be made available on CRAN.

In the near future we will be inviting (and indeed encouraging) Quandl users to "drive" the Quandl-bot themselves so that Quandl has the data they personally need. We're working towards building a sort of Wikipedia of numerical data. In the long term we hope to do to certain "closed data dinosaurs" what Jimmy Wales did to Britannica. In the short term, I would be very pleased if we could make Quandl a valuable resource for the R community.
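The "one standard format, translated to any format a user wants" idea can be illustrated with a small sketch. This is not Quandl's implementation: the `TimeSeries` class, its fields, the JSON key names and the sample values are all hypothetical, chosen only to show a single canonical representation being rendered to two output formats (CSV and JSON) with Python's standard library.

```python
import csv
import io
import json
from dataclasses import dataclass, field
from datetime import date


@dataclass
class TimeSeries:
    """Hypothetical canonical form: a name, column headers, and dated rows."""
    name: str
    columns: list[str]                      # e.g. ["Date", "Value"]
    rows: list[tuple] = field(default_factory=list)  # (date, value, ...) tuples

    def to_csv(self) -> str:
        # Every output format is derived from the same canonical rows,
        # so adding a new format never touches the parsing side.
        buf = io.StringIO()
        writer = csv.writer(buf, lineterminator="\n")
        writer.writerow(self.columns)
        for d, *values in self.rows:
            writer.writerow([d.isoformat(), *values])
        return buf.getvalue()

    def to_json(self) -> str:
        # Key names here are illustrative only, not a documented API.
        return json.dumps({
            "name": self.name,
            "column_names": self.columns,
            "data": [[d.isoformat(), *values] for d, *values in self.rows],
        })


if __name__ == "__main__":
    # Placeholder values, not real data.
    ts = TimeSeries("Example series", ["Date", "Value"],
                    [(date(2012, 1, 1), 1.0), (date(2012, 2, 1), 2.5)])
    print(ts.to_csv())
    print(ts.to_json())
```

The design choice being illustrated is that the hard work (parsing whatever a publisher emits) happens once, into the canonical form; each additional output format is then a small, independent serializer.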


More Stories By David Smith

David Smith is Vice President of Marketing and Community at Revolution Analytics. He has a long history with the R and statistics communities. After graduating with a degree in Statistics from the University of Adelaide, South Australia, he spent four years researching statistical methodology at Lancaster University in the United Kingdom, where he also developed a number of packages for the S-PLUS statistical modeling environment. He continued his association with S-PLUS at Insightful (now TIBCO Spotfire), overseeing the product management of S-PLUS and other statistical and data mining products.

David Smith is the co-author (with Bill Venables) of the popular tutorial manual, An Introduction to R, and one of the originating developers of the ESS: Emacs Speaks Statistics project. Today, he leads marketing for REvolution R, supports R communities worldwide, and is responsible for the Revolutions blog. Prior to joining Revolution Analytics, he served as vice president of product management at Zynchros, Inc. Follow him on Twitter at @RevoDavid.