Data Warehousing 2013: A Changing Landscape
The general premise of data warehousing hasn't changed much over the years. The idea is still to aggregate as much relevant data as possible from multiple sources, centralize it in a repository of some kind, catalog it, and then utilize it for reporting and analytics to make better business decisions. An effective data warehousing strategy seamlessly enables trend analysis, predictive analytics, forecasting, decision support, and just about anything else we now categorize under the umbrella of "data science."
What is different these days is not the premise but the shifting nature of the data sources that the warehouse must draw from to capture as much useful information as possible. It's the data that's changed, not the goal.
First, there is the rapid proliferation of social-generated data in all of its unstructured forms, making the data extraction and transformation components of loading data to the warehouse more difficult than they have been in the past. But this isn't really groundbreaking for 2013, as social data and the Big Data technologies its growth has spawned, such as Hadoop, have been emerging for several years now.
Instead, what will likely be significantly different in 2013 is the accelerating deployment of a multitude of SaaS applications within the enterprise, especially in the larger, often slower-to-adopt companies that populate the Fortune 2000. As deal sizes grow, the SaaS footprint is clearly becoming significantly bigger.
This is where it becomes interesting. It's not simply that an organization has several different SaaS applications such as Salesforce, Workday, and SuccessFactors in place across the enterprise, with a single instance of each shared by all. Instead, because these SaaS applications are easier to adopt, many of them have come in through the back door, department by department and at different times, rather than through a centralized, IT-controlled rollout. This means that multiple instances of the same application are popping up everywhere.
For example, there are large enterprises that now have 10, 20, or even 50+ instances of Salesforce running across the entire organization. Each instance has its own customizations of data collection and storage, separate add-on applications installed, different data feeding these applications, and a unique implementation approach. This could be a case of the old adage: solving old problems while creating new ones.
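To make the integration problem concrete, here is a minimal sketch in Python of what consolidating records from differently customized instances of the same application might involve. Everything in it is an assumption made for illustration: the instance names, the custom field names, and the target column names are hypothetical, and no real vendor API is called. It only shows that each instance needs its own field mapping before its rows can land in one shared warehouse table.

```python
from typing import Any

# Hypothetical per-instance field mappings: source field name -> warehouse column.
# The instance names and custom field names are invented for illustration only.
FIELD_MAPS: dict[str, dict[str, str]] = {
    "sales_emea": {"Id": "account_id", "Name": "account_name", "Region__c": "region"},
    "sales_na": {"Id": "account_id", "AccountName": "account_name", "Territory__c": "region"},
}


def normalize(instance: str, record: dict[str, Any]) -> dict[str, Any]:
    """Map one source record onto the shared warehouse schema."""
    mapping = FIELD_MAPS[instance]
    # Fields missing from the source record become None rather than raising.
    row = {target: record.get(source) for source, target in mapping.items()}
    # Keep the originating instance so rows remain traceable after merging.
    row["source_instance"] = instance
    return row


def consolidate(extracts: dict[str, list[dict[str, Any]]]) -> list[dict[str, Any]]:
    """Flatten per-instance extracts into one list of uniformly shaped rows."""
    return [
        normalize(instance, record)
        for instance, records in extracts.items()
        for record in records
    ]
```

With 50 instances, a mapping table like this needs 50 entries, each maintained as its instance's customizations drift, which is one way the sprawl turns into ongoing ETL cost.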
Several questions follow. What kind of data collection and ETL challenges will this proliferation of instances cause for those wishing to leverage a data warehousing strategy? Does the fact that the operational data from these various SaaS applications is stored and maintained by different vendors, each of which is incentivized to keep it that way, make things easier or more difficult for data warehousing and the analysis it enables? Will data integration strategies scale across all of these fragmented instances of SaaS applications? It will be interesting to see how organizations meet the "SaaS sprawl" challenge, especially as it relates to cross-enterprise data collection strategy.
Furthermore, SaaS applications have taken an ever-increasing hold of the enterprise of late, with larger and larger deals. Oracle, SAP, IBM, and the other more traditional software vendors have taken notice, making the cloud and SaaS applications a major part of their 2013 strategies. SAP's Business ByDesign, Oracle's Fusion Applications, and recent SaaS acquisitions will surely add to what could become a hodgepodge of SaaS applications across the enterprise.
To meet these challenges, cloud data warehousing offerings such as BitYota and Amazon's Redshift are beginning to emerge, with a core theme of the cloud as the centralized data storage repository. ETL and data integration solutions such as Informatica Cloud and Dell's Boomi are racing to meet traditional data warehousing requirements in the cloud paradigm. Likewise, the traditional data cleansing requirements of data warehousing are being met by cloud-based counterparts, producing better, more usable data in these new-age warehouses. One thing that will never change is that bad data will always equal bad analysis, and the need to invest in data quality strategies will continue to exist.
As the landscape of SaaS continues its rapid expansion, and the data within these applications continues to burgeon, 2013 will definitely be a pivotal year in the dawn of a new class of data warehousing technologies.