StrikeIron Blog

Amazon's NoSQL and Database Evolution: What Can Be Learned

Late last week, Amazon released an update to its DynamoDB service, a fully managed NoSQL offering for efficiently handling extremely large amounts of data in Web-scale (generally meaning very high user volume) application environments. The DynamoDB offering was originally launched in beta back in January, so this is its first update since then.

The update is a "batch write/update" capability, enabling multiple data items to be written or updated in a single API call. The idea is to reduce Internet latency by minimizing round trips between the calling application and Amazon's various physical data stores. According to Amazon, the feature was added in response to requests posted in its developer forums.
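To make the round-trip saving concrete, here is a minimal sketch in Python. It does not call DynamoDB at all: `FakeStore`, `put_item`, and `batch_write` are hypothetical stand-ins that only count simulated network calls, and the batch size of 25 is an assumption chosen for illustration.

```python
from typing import Dict, Iterable, List, Tuple


class FakeStore:
    """Hypothetical in-memory stand-in for a remote key-value store."""

    def __init__(self) -> None:
        self.items: Dict[str, dict] = {}
        self.round_trips = 0  # every API call counts as one network round trip

    def put_item(self, key: str, value: dict) -> None:
        """Write a single item: one round trip per item."""
        self.round_trips += 1
        self.items[key] = value

    def batch_write(self, batch: Dict[str, dict]) -> None:
        """Write many items at once: one round trip per batch."""
        self.round_trips += 1
        self.items.update(batch)


def chunked(pairs: List[Tuple[str, dict]], size: int) -> Iterable[List[Tuple[str, dict]]]:
    """Yield consecutive slices of at most `size` pairs."""
    for start in range(0, len(pairs), size):
        yield pairs[start:start + size]


def write_one_by_one(store: FakeStore, records: Dict[str, dict]) -> int:
    for key, value in records.items():
        store.put_item(key, value)
    return store.round_trips


def write_in_batches(store: FakeStore, records: Dict[str, dict], batch_size: int = 25) -> int:
    # batch_size is an illustrative assumption, not a documented limit here
    for chunk in chunked(list(records.items()), batch_size):
        store.batch_write(dict(chunk))
    return store.round_trips


records = {f"user-{i}": {"visits": i} for i in range(100)}
single_calls = write_one_by_one(FakeStore(), records)
batched_calls = write_in_batches(FakeStore(), records)
print(single_calls, batched_calls)  # 100 4
```

With 100 items, the per-item approach pays for 100 round trips while the batched approach pays for 4, which is where the latency reduction comes from when each trip crosses the Internet.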

That the first update targets what was already a key selling point of DynamoDB tells us that latency is still a significant challenge for cloud-based storage. After all, one of the key attributes of DynamoDB at launch was speed and consistent performance, something that SimpleDB, Amazon's NoSQL precursor to DynamoDB, was unable to deliver, at least according to some developers and users who claimed that data retrieval response times ran unacceptably into the minutes. This could also have been a primary reason for SimpleDB's lower adoption rates. Amazon is well aware of these performance challenges, hence the significance of its first DynamoDB update.

Another key tenet of DynamoDB is that it is a managed offering, meaning the details of data management requirements, such as moving data from one distributed data store to another, are completely abstracted away from the developer. This is great news, as the complexity of cloud environments was proving too challenging for many developers trying to leverage cloud storage capabilities. The masses were scratching their heads over how to overcome storage performance bottlenecks, attain replication, achieve consistent response latency, and handle other operations-related data management tasks when it was in their purview to do so. By the way, management complexity will likely remain a major challenge for other NoSQL vendors that do not offer the same level of abstraction as DynamoDB, and there are many "big data" startups offering products in this category. It will be interesting to see if the launch of DynamoDB becomes a significant threat to many of these startups.

We learned this lesson about reducing complexity at StrikeIron within our own niche offerings as well. We saw much greater uptake of our simpler, more granular Web services APIs, such as email verification, address verification, and reverse address and telephone lookups, when offered as single, individual services rather than as complex services with many different methods and capabilities. This proved true even when the more complex services provided more advanced power within a single API. In other words, simplified remote controls for television sets are probably still the best idea for maximum television adoption, as initial confusion and frustration tend to be inversely proportional to the adoption of any technology.

Another interesting point is that this is the fifth class of database product offerings in Amazon's portfolio. Along with DynamoDB, there is still the aforementioned SimpleDB, a schemaless NoSQL offering for "smaller" datasets. There is also the original S3 offering, with a simple Web service based interface for storing, retrieving, and deleting data objects in a straightforward key/value pair format. Next, there is Amazon RDS, which provides managed relational database capabilities that use traditional SQL for manipulating data and is more applicable to traditional applications. Finally, there are the various Amazon Machine Image (AMI) offerings on EC2 (Oracle, MySQL, etc.) for those who don't want a managed relational database and would rather have complete control over their instances, and the RDBMSs that run on them, without having to use their own hardware.

This tells us that the world is far from one-size-fits-all cloud database management systems, and we can all expect to be operating in hybrid storage environments that will vary from application to application for quite some time to come. I suppose that's good news for those who make a living on the operations teams of information technology.

Along with each new database offering from Amazon comes a different business model. In the case of DynamoDB, for example, Amazon has introduced the concept of "read and write capacity units", where charges are based on a combination of frequency of usage and physical data size. This suggests that the business models are still somewhat far from optimal and will likely change again in the future. Clearly the major vendors have not yet figured it all out, as business model adjustments in the Cloud are not limited to Amazon.
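As a rough illustration of how a charge can depend on both request frequency and item size, the sketch below estimates how many capacity units a workload might need. The rule that one unit covers one request per second for an item up to `unit_size_kb`, and the 1 KB default, are assumptions made for this example and should be checked against Amazon's pricing documentation; `capacity_units` is a hypothetical helper, not part of any AWS SDK.

```python
import math


def capacity_units(requests_per_second: float,
                   item_size_kb: float,
                   unit_size_kb: float = 1.0) -> int:
    """Estimate capacity units for a steady workload.

    Assumption: each request consumes one unit per started block of
    `unit_size_kb`, so larger items cost proportionally more units.
    """
    units_per_request = math.ceil(item_size_kb / unit_size_kb)
    return math.ceil(requests_per_second * units_per_request)


small_items = capacity_units(requests_per_second=50, item_size_kb=0.5)
large_items = capacity_units(requests_per_second=50, item_size_kb=3.2)
print(small_items, large_items)  # 50 200
```

Under these assumptions, the same request rate costs four times as many units when each item spills into a fourth block, which is the "frequency plus data size" coupling described above.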

In summary, following the Amazon database release timeline over the years yields some interesting information, namely that speed/latency, reduction of complexity, the likelihood of hybrid compute and storage environments for some time to come, and ever-changing cloud business models are the primary focus of cloud vendors responding to the needs of their users. And as any innovator knows, the challenges are where the opportunities are.


Cloud Landscape: Cloud Databases Emerging Everywhere


2011 has been the year of the Cloud database. The idea of shared database resources and the abstraction of underlying hardware seems to be catching on. Just as with Web and application servers, paying as you go and eliminating unused database resources, licenses, hardware, and all of the associated cost is proving attractive enough as a business model that the major vendors are betting on it in significant ways.

The recent excitement has not been limited to the fanfare around "big data" technologies. Lately, most of the major announcements have centered on the traditional relational, table-driven SQL environments that Web applications use much more widely than the key-value pair data storage mechanisms that "NoSQL" technology provides for Web-scale, data-intensive applications such as Facebook, Netflix, etc.

Here are some of the new Cloud database offerings for 2011:

Salesforce.com has launched Database.com, enabling developers in other Cloud server environments such as Amazon's EC2 and the Google App Engine to utilize its database resources, not just users of Salesforce's CRM and Force.com platforms. You can also build applications in PHP or on the Android platform and utilize Database.com resources. The idea is to reach a broader set of developers and application types than just CRM-centric applications.

At Oracle Open World a couple of weeks ago, Oracle announced the Oracle Database Cloud Service, a hosted database offering running Oracle's 11gR2 database platform available in a monthly subscription model, accessible either via JDBC or its own REST API.

Earlier this month, Google announced Google Cloud SQL, a database service that will be available as part of its App Engine offering based on MySQL, complete with a Web-based administration panel.

Amazon, to complement its other Cloud services and highly used EC2 infrastructure, has made the Amazon Relational Database Service (RDS) available to enable SQL capabilities from Cloud applications, giving you a choice of underlying database technology to use such as MySQL or Oracle. It is currently in beta.

Microsoft also has its SQL Azure Cloud Database offering available, generally positioned for developers whose applications use the Microsoft stack and who want to leverage some of the benefits of the Cloud.

"Marketecture?"

Some of the above offerings have only been announced so far and not actually launched, or they have only limited preview access available now. In some of these cases, the business models have not been completely divulged either, or if they have, they are very likely to change.

Clearly there is a considerable market share land grab under way. All of the major vendors recognize that traditional-SQL Cloud storage infrastructure will be an important technology going forward. Adding a solid database layer to the Cloud architecture story seems like an important step in the continuing enterprise and commercial software move to the Cloud, and these new vendor offerings should in turn accelerate this move.

Latency?

So, is this really the wave of the future? Some of the major questions that will have to be answered are those around latency. When data requests have to hop from a client application to the application server, then to the database, and then back to the server and client, sometimes multiple times within a single request, the result can be quite a performance hit. These machines are likely to sit far from each other geographically, which can really slow things down and annoy end users with slow page loads. This is probably why most infrastructure providers realize that they need corresponding database capabilities that are available and accessed natively to reduce this latency. However, performance, along with security issues (perceived or otherwise), could still be a significant barrier to mainstream adoption.
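A back-of-the-envelope model shows why the location of the database matters. The sketch below assumes the queries within one page request run sequentially and ignores processing time; all latency figures are hypothetical numbers picked for illustration, and `page_latency_ms` is a made-up helper name.

```python
def page_latency_ms(client_to_app_rtt_ms: float,
                    app_to_db_rtt_ms: float,
                    db_queries: int) -> float:
    """Network time for one page: one client round trip plus one
    app-to-database round trip per sequential query."""
    return client_to_app_rtt_ms + db_queries * app_to_db_rtt_ms


# Hypothetical page that issues 10 sequential queries.
remote_db = page_latency_ms(client_to_app_rtt_ms=80, app_to_db_rtt_ms=40, db_queries=10)
colocated_db = page_latency_ms(client_to_app_rtt_ms=80, app_to_db_rtt_ms=1, db_queries=10)
print(remote_db, colocated_db)  # 480 90
```

In this toy model the database hops, not the client hop, dominate once several queries are chained, which is the argument for keeping the database next to the application servers.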

Also, most of the relational database environments that exist in the Cloud only have a subset of SQL capabilities available and in some cases can be quite limited. For example, many of these Cloud SQL platforms don't support cross-table joins, at least not yet. This is a very common requirement for SQL applications. The lack of support is primarily because joins can consume a lot of resources, another performance-killer in shared environments.
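When a platform lacks joins, the work typically moves into the application, which fetches each table separately and combines the rows itself. Here is a minimal sketch of such a client-side hash join in Python; the `customers` and `orders` tables, their columns, and the `hash_join` helper are all hypothetical and stand in for the results of two separate queries.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical results of two separate single-table queries.
customers = [
    {"id": 1, "name": "Ada"},
    {"id": 2, "name": "Grace"},
]
orders = [
    {"order_id": 10, "customer_id": 1, "total": 25.0},
    {"order_id": 11, "customer_id": 1, "total": 40.0},
    {"order_id": 12, "customer_id": 2, "total": 15.0},
]


def hash_join(left: List[dict], right: List[dict],
              left_key: str, right_key: str) -> List[dict]:
    """Inner join: index the left rows by key, then probe with each right row."""
    index: Dict[object, List[dict]] = defaultdict(list)
    for row in left:
        index[row[left_key]].append(row)

    joined = []
    for row in right:
        # .get avoids creating empty entries for unmatched keys
        for match in index.get(row[right_key], []):
            joined.append({**match, **row})
    return joined


rows = hash_join(customers, orders, "id", "customer_id")
for row in rows:
    print(row["name"], row["order_id"], row["total"])
```

The trade-off is that both tables must be pulled across the network and held in application memory, so the resource cost the provider avoided is simply shifted to the caller.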

Next?

Once most of this storage and Cloud database infrastructure is in place, however, incorporating more content-oriented data services such as customer data verification will become commonplace and easy to leverage. We may even see them incorporated into the database offerings themselves as vendors look to differentiate their products from one another. Cloud-based database offerings have the advantage of making much larger libraries of data-oriented add-on capabilities available right out of the box, so the story here is about much more than just cost.

While SQL Cloud offering announcements are all the rage in 2011, 2012 will undoubtedly tell the adoption tale. No doubt these offerings will be ideal and cost-effective for many use cases out there. But will demand be large enough quickly enough to support all of these vendors and drive the innovation at a speed that will make these platforms viable in the near future for enterprise and commercial applications? The answer is likely yes, but the next twelve months or so will give us a lot of the supporting data to measure the extent of the trend.
