Sean O'Leary, industry thought leader, to be a featured member on the cloud computing panel at the Raleigh Convention Center April 27-28
Click here to continue to the article
I attended the Data 2.0 Conference this week in San Francisco. There is a lot to be excited about in this emerging, growing and quickly-accelerating industry. However there are still some significant obstacles that have to be overcome for the vision of the data-driven world and the “great data highway in the sky” to truly be realized.
First, there is the exciting stuff. New companies continue to emerge and grow in the space in multiple categories, including broad data sharing sites (FluidInfo, InfoChimps), purveyors of proprietary and hard-to-capture data (Navteq, Metamarkets), API infrastructure providers (WebServius, Apigee, Mashery), specific data category providers (SimpleGEO, Rapleaf, Socrata, DataSift), providers of API-based solutions (StrikeIron, Xignite) and slick data visualization tools (too numerous to list).
Also, the companies that have been in this space for five or more years are becoming larger, more sophisticated and are in many cases continuing to raise significant amounts of capital from investors looking to capitalize on the data megatrend. Even the stalwarts such as Microsoft with their DataMarket are eyeing future fruitful harvests in this space. Twitter also announced the commercial licensing of its entire “fire hose” of Tweets at the event, a move that the providers of analytics tools are hailing.
More and more public, government-sourced data is coming online everyday as fodder for this machine-to-machine information feeding frenzy. This data is coming from every level of government too; cities such as San Francisco (datasf.org), state governments such as Oregon (data.oregon.gov announced two weeks ago), and the federal government's data.gov initiative(rumors that funding might be cut are false according to keynote speaker and the charismatic, sometimes controversial industry-insider Vivek Wadhwa).
All of these government-sourced data assets are being made available to the general public in the hopes of civically engaging the creativity among us to innovate and create public value without the traditional budgetary costs. This has already led to a proliferation of applications such as online live maps of San Francisco municipal transportation schedules (including for the iPhone and Android platforms), as well as municipal vendor contracts available online for public discussion (it's amazing how many vendors consistently come in late and over-budget get rewarded new contracts over and over) like Chicago’s citypayments.org site.
Finally, the platforms (the fertile Cloud and all that is happening there with the Google AppEngine, Microsoft’s Azure, and Amazon’s various cloud offerings) and especially the devices (like smartphones and the iPad) that can make use of this data are also marching forward at a breathtaking pace.
This is all very exciting for those of us whom data represents a livelihood.
However, the significant challenge around the accessibility and usability of these vast seas of data is that it is still largely a complex, IT-oriented developer's world. Most of the access to these data sources is either via API, or available in structures and formats as varied as the data itself. This limits the applicability of these valuable data sources to a very small group of dedicated engineers and leaves us all with only a modicum of the true potential of this space.
Sure, there are API protocol standards such as REST and SOAP, but these only scratch the service. Most single-API vendors introduce a new set of behaviors, data structures, response codes and a new business model with each new API. This adds greatly to the complexity for anyone looking to put these data sources to use, both initially and ongoing. Until we can find a way to normalize the great data highway, it's going to be a bumpy road. Those that know me know I’ve been preaching this for years and have applied much of it to StrikeIron’s various data and API offerings; however it can be a difficult proposition to get adopted across the industry.
The consumption complexity issue is demonstrated by the term "mashup". Several years ago, this was the term-du-jour by an industry claiming that non-developers could combine datasets in interesting and exciting new ways without the assistance of their IT organizations. This term however has all but shriveled and died. Why? Because non-developer's couldn’t do it. The tools were cumbersome, complex, and represented whole new learning curves that most people simply don't have the time or patience for, as well as the lack of standards surrounding the datasets themselves. In fact, I never heard the term mentioned a single time at the event. A few years back, it would have been in nearly every discussion. May the term “mashup” rest in peace.
Until we surpass this hurdle of data consumption complexity and the vendors in the space only pay lip service to these challenges (prevalent on several of the panels at the event), the data-driven world will only be a shell of what it could be.