StrikeIron Blog


Data Warehousing 2013: A Changing Landscape

The general premise of data warehousing hasn't changed much over the years. The idea is still to aggregate as much relevant data as possible from multiple sources, centralize it in a repository of some kind, catalog it, and then utilize it for reporting and analytics to make better business decisions. An effective data warehousing strategy seamlessly enables trend analysis, predictive analytics, forecasting, decision support, and just about anything else we now categorize under the umbrella of "data science."

The premise is no different these days. What has shifted is the nature of the data sources that the warehouse must draw from to capture as much useful information as possible. It's the data that's changed, not the goal.

First, there is the rapid proliferation of social-generated data in all of its unstructured forms, which makes the data extraction and transformation steps of loading the warehouse more difficult than they have been in the past. But this isn't really groundbreaking for 2013, as social data, and the Big Data technologies such as Hadoop that its growth has spawned, have been emerging for several years now.

Instead, what will likely be significantly different in 2013 is the accelerating deployment of a multitude of SaaS applications within the enterprise, especially in the larger, often slower-to-adopt companies that populate the Fortune 2000. As deal sizes grow, the SaaS footprint is clearly becoming significantly bigger.

This is where it becomes interesting. It's not just that an organization has several different SaaS applications such as Salesforce, Workday, and SuccessFactors in place across the enterprise, with a single instance of each in use by all. Instead, because these SaaS applications are so easy to adopt, many of them have come in through the back door, department by department and at different times, rather than through a centralized, IT-controlled rollout. This means that multiple instances of the same application are popping up everywhere.

For example, there are large enterprises that now have 10, 20, or even 50+ instances of Salesforce running across the entire organization. Each instance has its own customizations for data collection and storage, separate add-on applications installed, different data feeding these applications, and unique implementation approaches. This could prove the old adage about solving old problems while creating new ones.

This raises some questions. What kind of data collection and ETL challenges will this cause for those wishing to leverage a data warehousing strategy? Does the fact that the operational data from these various SaaS applications is stored and maintained by different vendors, each of which is incentivized to keep it that way, make things easier or more difficult for data warehousing and the analysis it enables? Will data integration strategies scale across all of these fragmented instances of SaaS applications? It will be interesting to see how organizations meet the "SaaS sprawl" challenge, especially as it relates to cross-enterprise data collection strategy.
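To make the ETL challenge concrete, here is a minimal sketch of what consolidating extracts from several instances of the same application might involve. It assumes each instance has customized its field names differently; the instance names, field names, and the `source_instance` column are all hypothetical and do not come from any vendor's API.

```python
from typing import Dict, List

# Hypothetical mappings: each instance's local field name -> shared warehouse column.
FIELD_MAPS: Dict[str, Dict[str, str]] = {
    "sales_emea": {"AccountName": "account_name", "Email__c": "email"},
    "sales_na": {"Name": "account_name", "ContactEmail": "email"},
}


def to_canonical(instance: str, record: Dict[str, str]) -> Dict[str, str]:
    """Rename one record's fields to the shared schema and tag where it came from."""
    mapping = FIELD_MAPS[instance]
    # Fields without a mapping are dropped rather than guessed at.
    row = {shared: record[local] for local, shared in mapping.items() if local in record}
    row["source_instance"] = instance
    return row


def consolidate(extracts: Dict[str, List[Dict[str, str]]]) -> List[Dict[str, str]]:
    """Flatten per-instance extracts into one list of rows in the shared schema."""
    rows: List[Dict[str, str]] = []
    for instance, records in extracts.items():
        rows.extend(to_canonical(instance, record) for record in records)
    return rows
```

Even in this toy form, every new instance needs its own mapping, which is where the scaling question starts to bite at 50+ instances.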

Furthermore, SaaS applications have taken an ever-increasing hold of the enterprise of late, with larger and larger deals. With the Cloud and SaaS applications a major part of their 2013 strategies, Oracle, SAP, IBM, and the more traditional software vendors have taken notice. SAP's Business ByDesign, Oracle's Fusion Applications, and recent SaaS acquisitions will surely add to what could become a hodgepodge of SaaS applications across the enterprise.

To meet these challenges, cloud data warehousing offerings such as BitYota and Amazon's Redshift are beginning to emerge with a core theme of the cloud as the centralized data storage repository. ETL and data integration solutions such as Informatica Cloud and Dell's Boomi are racing to meet these traditional data warehousing requirements in the cloud paradigm. Also, the traditional data cleansing requirements of data warehousing are being met by cloud-based counterparts, for better, more usable data in these new age warehouses. One thing that will never change is that bad data will always equal bad analysis, and the need for making investments in data quality strategies will continue to exist.
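As a rough illustration of the "bad data equals bad analysis" point, the sketch below shows a simple quality gate that separates loadable records from rejects before anything reaches the warehouse. The field names (`account_name`, `email`) and the rules are assumptions made for the example, and the email test checks shape only; real verification would need reference data behind it.

```python
import re
from typing import Dict, List, Tuple

# Shape check only: something@something.something with no spaces.
EMAIL_SHAPE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def check_record(record: Dict[str, str]) -> List[str]:
    """Return a list of problems found in one record (empty if none)."""
    problems: List[str] = []
    if not record.get("account_name", "").strip():
        problems.append("missing account_name")
    if not EMAIL_SHAPE.match(record.get("email", "")):
        problems.append("malformed email")
    return problems


def split_for_load(
    records: List[Dict[str, str]],
) -> Tuple[List[Dict[str, str]], List[Tuple[Dict[str, str], List[str]]]]:
    """Split records into those safe to load and those rejected, with reasons."""
    clean: List[Dict[str, str]] = []
    rejected: List[Tuple[Dict[str, str], List[str]]] = []
    for record in records:
        problems = check_record(record)
        if problems:
            rejected.append((record, problems))
        else:
            clean.append(record)
    return clean, rejected
```

Keeping the rejects along with their reasons, rather than silently dropping them, gives whoever owns the source system something to act on.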

As the landscape of SaaS continues its rapid expansion, and the data within these applications continues to burgeon, 2013 will definitely be a pivotal year in the dawn of a new class of data warehousing technologies.

OpenStack - Open Cloud Operating System Gaining Momentum

As the "Cloud" has evolved and matured from its roots over the past few years, the alternatives for deploying a cloud-based solution have been almost entirely proprietary and commercial. They have typically required at least a credit card just to get started "renting" servers and storage that might be needed for only short periods of time, or to achieve more flexible scalability models. With the success and momentum of OpenStack, an open source cloud operating system for deploying, hosting, and managing public and private clouds within a data center, this appears to be changing.

The OpenStack project, launched initially with code contributions from Rackspace and NASA, provides the software components for making cloud management functionality available from within any data center, including one's own, similar to what Amazon, VMware, Microsoft, and other cloud vendors are now offering commercially. Deploying OpenStack enables cloud-based applications and systems utilizing virtual capacity to be launched without the associated run-time fees the current slate of vendors requires, as all of the software is freely distributable and accessible.

At first glance, this seems to be an ideal solution for larger enterprise IT organizations to offer up traditional cloud functionality, such as virtual servers and storage availability, to their constituents within the organization, without the fear of vendor lock-in and ever-increasing vendor costs. This approach also provides access to implementation details and the ability to customize based on specialized needs - also important in many scenarios and something not typically or easily offered by the larger commercial vendors. So the benefits in the private cloud space, for those who find it appropriate to build and manage their own cloud environments, are clear.

However, Rackspace itself just announced that it is making public cloud services available using OpenStack, and others are likely to follow in the not-too-distant future, leveraging community-developed innovation in the areas of scalability, performance, and high availability that might ultimately be difficult for any single proprietary vendor to match. This should enable public service providers, especially in niche markets, to proliferate as well.

Major high tech vendors are also backing and aligning with OpenStack. In addition to Rackspace and NASA, Deutsche Telekom, AT&T, IBM, Dell, Cisco, and Red Hat all have much to gain from the success of OpenStack and have signed on as partners, code contributors, and sources of funding. Commercial distributions such as StackOps have already emerged. Funding for OpenStack-oriented companies has begun to flow from the venture community, and events such as the OpenStack Design Summit and Conference this week in San Francisco are getting larger and selling out quickly.

All of the foundational pieces are in place for OpenStack to have quite a run towards achieving its goal of becoming the universal cloud platform of the future and the leader of the "open era" of the Cloud. This is an exciting development for companies like StrikeIron and our cloud-based data-as-a-service and real-time customer data validation offerings, as the data layer of the Cloud will become even more promising and fertile as OpenStack continues to accelerate organizations towards easier adoption of cloud computing models and all of their benefits.


Oracle OpenWorld 2011: Data Analytics at the "Speed of Thought"

Oracle's OpenWorld kicked off this week in San Francisco. CEO Larry Ellison gave the keynote last night, where he introduced and highlighted the Exalytics Intelligence Machine, a new in-memory appliance that uses data compression and stores data in DRAM (Dynamic Random Access Memory) to substantially increase business data analytics performance. It analyzes data "several times faster" according to Larry, scanning 200 GB/sec of data and enabling "questions to be answered before they are even asked."

This and other announcements highlight what should be a great conference and show. It will be interesting to see if Oracle provides increased clarity around some of the following questions this week, such as:

What are Oracle's long-term Java plans? Will Java remain open after the acquisition of Sun, or can we expect to see a flurry of Java-licensing lawsuits, such as the current one with Google, that may cause some to doubt Oracle's interpretation of "openness"?

Will Oracle continue to move into larger-scale systems, including more hardware offerings, to compete with IBM and SAP? Oracle has sold over one thousand of its Exadata machines to date, so it is already heading in this direction.

Will Oracle continue to push the "Public Cloud", or will it steer customers and the industry more towards its "Private Cloud" solutions that are "more secure" but also require individual software purchases rather than time-shared subscriptions?

Now that Fusion Applications have been made available on Cloud, on-premise, and mobile platforms, where is the collection of offerings, and its integration with other Oracle platforms, headed from here?

Will it continue to add more features and capabilities to its On Demand applications, such as CRM On Demand? Its acquisition of Market2Lead last year demonstrates its advance into On Demand marketing automation platforms.

StrikeIron, as an Oracle Gold Partner, is expressly interested in the future direction of Oracle and where the company is headed. Currently, our collaboration with another Oracle partner, ActivePrime, enables us to deliver an integrated customer data quality solution to the Oracle CRM On Demand platform. Also, offerings such as our customer data quality solutions and mobile messaging solutions are available for integration into Oracle's broad stack of applications, products, and platforms. All of the great database innovation in the world won't help if it's not running on top of high quality, complete, and accurate data.


Public Cloud Versus Private Cloud - Public Comes With Experience and Expertise

The debates rage on about "Public Clouds" and "Private Clouds" and which is more appropriate for serious computing efforts, including in business systems and all across the universe of applications.

Most vendors, not surprisingly, line up behind the approach that best suits their product offerings.

For example, SaaS vendors (Salesforce, NetSuite, SuccessFactors) say that multi-tenant applications are the Cloud, arguing that a business solution with shared, multi-tenant software resources, including databases, is needed to truly make the Cloud useful. Yet many of these vendors are often criticized for not providing "open" models, so some long-term questions remain. Yes, these Clouds are easy to get into, but how do you get out of them if necessary?

The infrastructure-as-a-service crowd (Amazon's EC2, Google App Engine, Rackspace) will suggest that only infrastructure is the "true" Cloud, meaning that essentially renting clean servers by the minute and storage by the byte represents the original "open" Cloud vision, enabling applications to be moved from Cloud to Cloud without difficulty. However, this is just servers and storage in the end (at least for now), so users still have to build everything themselves. OK for some, not entirely useful for most.

And of course the enterprise software folks (Oracle, SAP, IBM) often claim that the Cloud can and should be "Private" because it's a better security model and enables you to manage it within the organization. This enables them to capitalize on the hype of the Cloud without having to change too much of their actual offerings. Of course, the challenge with this model is that without sharing licenses or hardware across organizations, it becomes quite expensive, and quite frankly we have had this model before under other names such as "mainframe", "client-server" and other "in-house" architectures. Sure, there is some incremental innovation and usefulness, but it's not too much different from what has always been offered, just another iteration.

So while there are valid use cases for each of the above scenarios, there is one thing I want to point out with Public versus Private Cloud discussions when businesses are unsure which route to go. It goes all the way back to the birth of the Cloud as a concept itself.

The reason we even have the Cloud in the first place is that heavily-trafficked Web sites such as Google and Amazon found they had to build massive, high performance, scalable systems to be able to handle the processing load at peak times (Amazon at Christmas for example). This meant that during non-peak times, they found themselves with lots of excess, unused computing capacity.
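A quick back-of-the-envelope sketch shows why peak-sized systems leave so much idle. The load figures below are made up purely for illustration and are not from Amazon, Google, or anyone else.

```python
from typing import Sequence


def idle_fraction(load: Sequence[float], provisioned: float) -> float:
    """Share of provisioned capacity left unused across the sampled periods."""
    used = sum(min(point, provisioned) for point in load)
    return 1 - used / (provisioned * len(load))


# Hypothetical load samples in arbitrary units; the last one is the seasonal peak.
samples = [20, 20, 40, 100]

# Capacity is sized for the peak, so everything below it sits idle.
capacity = max(samples)
print(f"Idle capacity: {idle_fraction(samples, capacity):.0%}")
```

With these invented numbers, more than half of the capacity goes unused on average, which is the kind of gap that makes renting it out attractive.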

This of course spawned the idea that they could leverage this excess capacity, as well as their expertise in managing high-performance, distributed, "Web scale" computing technology as an additional line of revenue, and possibly launching a brand new industry of opportunities. Hence, the Cloud was born.

The one key piece of this Cloud concept is "expertise". This is something that you get in Public Cloud environments that you don't get in Private Clouds. With Private Clouds, you get all of the hardware and software (and the corresponding purchased licenses) that you need, but you don't have a team of experts that have been running that platform for years monitoring, managing, and supporting that platform in real-time while you use it, including having visibility into it as it runs. By definition you therefore don't have engineers supporting the success of your application systems on a minute-by-minute basis.

This real-time team of experts, and their associated expertise developed over time, is something you get inherently in the Public Cloud scenario. The folks who run these systems have as their core mission in life to keep the platform up and running, battle test it over time, improve it, enhance it, test it, analyze operational data, review performance charts, improve and enhance it again, and on and on, day after day.

Although a bit overused, the electric generator is a good example for demonstrating the difference. If you have your own electrical generator powering your home, it doesn't matter that thousands of other people have one just like it in their homes. If it goes down, you are on your own, and it's your responsibility to keep the electricity flowing from room to room. But if you plug into the electric grid run by your local power company, and there is an outage while you are having dinner somewhere, it will likely be fixed before you even get home from the restaurant. And you might not even notice there was a problem since you weren't at home (you were out dining in the "Dinner Cloud" and outsourcing the washing of dishes). This is because the system was monitored, a problem was detected, and a team was ready to spring into action once the outage occurred.

How long would it have taken to get a generator repairman scheduled to come out if the outage had been in your own generator? There's a reason electricity grids have evolved the way they have.

Oh, and all of the innovation occurring behind the scenes at the power company on a day-to-day basis? It comes to you automatically, often while you sleep, as opposed to a new giant chunk of hardware arriving every 18-24 months that you have to figure out how to configure and get up and running again.

So how is this relevant to StrikeIron?

Well, the same is also true in our case. While we are more the Software-as-a-Service variety of Cloud Computing (and in our case "data-as-a-service"), we recognize that users have a choice in how they obtain the type of functionality we offer. A lot of the powerful capabilities in our Cloud-managed Contact Record Verification Suite, such as real-time telephone, address, and email verification, could also be purchased and brought in-house as software applications and raw data sources, and a similar result could be achieved in terms of better, more usable customer data assets. The approach would just be a heck of a lot different.

In the latter scenario, all of the verification reference data would have to be managed and maintained internally. One would have to acquire the software and data files, and then get the functionality up and running. It would then have to be designed and delivered in such a way to be able to handle the various loads of data verification that might appear from different applications at different times, and often in high volume scenarios. Also, all of the other expertise around availability, testing, updating, and the usual effort associated with in-house solutions would have to be developed internally.

With us, all we do day in and day out is focus on verifying and delivering our real-time data verification capabilities to thousands of applications simultaneously with a very high level of performance at all times, delivering 24x7x365. All you need to do, just like the electric company, is plug into us. All of the data management, updating, software maintenance, and performance testing and improving is done by us, with all of the heavy lifting abstracted from you.

Since we launched our system in 2005, we have constantly improved our finely-tuned delivery and fault-tolerant capabilities, including load-balancing, high speed data I/O, redundancy, external monitoring, and everything else we have to provide to be able to support our customers and their production applications. And we are getting smarter and better about how we go about it every day. This expertise is something that each and every one of our customers gets to leverage with every single call to our system. This is why we have only had minutes of downtime over the last four years.

So could in-house solutions provide the same end result? Maybe in the sense that yes you could end up with good clean customer data somehow on your own. But at what cost, effort, and with what missed opportunities? Focus on your core business, and leave the external data verification effort to us. We will keep the lights on. Guaranteed.

A Void in the Enterprise Software Space Equals A Perfect Storm of Opportunity?

Acquisitions of companies such as Sun, BEA, PeopleSoft, Cognos, Siebel, Business Objects, and countless others over the past few years have created a competition vacuum in the enterprise software space. For example, in the last five years or so, Oracle has spent over thirty billion USD purchasing nearly sixty companies. Microsoft has gobbled up eighty or so, IBM sixty, EMC forty, and Hewlett-Packard approximately thirty-five. And these are just the giants.

The next tier of enterprise software companies also has pretty long lists of recent acquisitions. So one can imagine quite easily that this collective buying spree has created a deep void in the landscape of enterprise software, and with it a tremendous opportunity.

After all, not much has happened in terms of new products and innovation in the space in the past several years, save for a handful of companies such as Salesforce.com and NetSuite and some of the various SaaS and open source models that have emerged. But even much of this is nearing the ten year mark.

Interestingly, some of the Fortune 500 have annual I.T. budgets north of a billion dollars. And those that don't still have budgets that are quite large. This, combined with the fact that many of their primary systems were built and deployed in the 1990s (yes, that's ten to twenty years ago) and are getting a bit "long in the tooth," as they say of aging horses, creates an interesting set of dynamics.

In addition, Cloud infrastructure is maturing and getting more firmly in place with more efficient computing resource and data storage models. It is quickly becoming the seedbed for future enterprise software innovation, not only in new software categories, but also in the traditional categories of business intelligence, analytics, data management, and employee and customer-facing applications.

All of these trends point to a "perfect storm" of opportunity. Their alignment ought to be attractive to a new wave of entrepreneurs that can take advantage of the emerging Cloud Computing trend in new and exciting ways. This will enable a great deal of new innovation in the enterprise/corporate information technology space.

So while much of the technology press is caught up in all of the Android-iPhone rage, Facebook privacy issues, and the Groupons and Foursquares of the world, many technology veterans are quietly taking notice of this enterprise software void and recognizing the opportunity for what it is.

As one example, Marc Andreessen of Netscape fame has recently indicated that his venture capital firm is investing in a "new wave" of enterprise software companies. Others, both entrepreneurs and investors, are sure to follow with a similar focus.

In other words, I don't subscribe to the opinion held by some that enterprise software is dead. So over the next couple of years, I do expect a wave of new enterprise software companies to emerge, setting off another arms race in the corporate I.T. space as organizations battle it out to stay a step ahead of their competition.

Fortunately, companies like StrikeIron with our data-as-a-service external data and data verification components can benefit extensively from this trend, providing important pieces to these emerging applications with ease.

It should be exciting times ahead.

 

Private Clouds More Likely Option in the Enterprise?

Cloud computing is growing at a fast pace and will continue to do so for quite some time. The Gartner Group for example has projected a tripling of the market in the next five years, and almost everyone else is projecting some level of super-charged growth in the space. Now of course, this all depends on what you include or don't include in your definition of cloud computing (Google Apps for example). As long as you are consistent in your personal definition, the growth ought to be of a similar magnitude.

The reasons for this growth are the advantages that cloud computing provides, including faster deployment, smoother scalability, pay-for-what-you-use business models, and no capital expenditure on the hardware and software that comprise the architecture. Amazon, Microsoft, IBM, Google, OpSource, and Rackspace are all companies offering public cloud infrastructure for rent. A myriad of vendors, such as RightScale, have lined up to add layers of capabilities on top of these offerings, and the ecosystems that can take advantage of these architectures, such as StrikeIron's, are continuing to invest in the space as well. Unfortunately, Sun's promising efforts in this space have been discontinued by Oracle for one reason or another.

This public computing resource trend has been great for startups because new companies can launch on cloud infrastructure "virtually" overnight, without the traditional costs tied to software, hardware, and the management of those resources, costs which have traditionally required them to spend time seeking private funding. Reducing startup "start friction" has in turn created a bubbling sea of innovation of late.

However, there has been more reluctance in the enterprise space to move to the "Cloud" because of worries about security and losing control when utilizing these public resources. There are just some highly-valued sets of data and mission-critical business processes that many organizations just don't want to put in the hands of a third party.

As a result, many of these companies are now building out their own "private cloud" infrastructure that mirrors the public clouds in functionality. This "member-only" infrastructure can then be shared across business units and geographies in an effort to eliminate IT redundancy, reduce costs, and increase efficiency, just as public clouds do for the masses.

Because of this trend, many of the cloud infrastructure providers are now offering virtual private capabilities. For example, Amazon's Virtual Private Cloud (Amazon VPC) is an effort to provide a "hybrid" solution for enterprises building out a private cloud, where some public computing resources can be utilized where it makes sense to do so.

What's still not clear, though, is how much separation of data really occurs on the actual public cloud servers, which has led some to regard the concept as an exercise in marketing, at least so far. However, the enterprise market for cloud computing is potentially huge, so I am expecting a lot more to occur in this space.

There definitely are solid cases to be made for both public and private clouds (as well as hybrid solutions), so my guess is these two will co-exist for quite some time, and the line as to what separates the two will be somewhat blurred (as usual). The end result will be that whatever route or combination of routes companies employ in the new age of the Cloud, these efforts will leave more resources available for actual innovation rather than infrastructure management and repetitive IT exercises, and that can only be good for us all, right?

 
