As you may know, StrikeIron is an Informatica Cloud partner. We recently won another customer account that will be using the StrikeIron Contact Record Verification suite to clean their records as they move between Salesforce.com, a proprietary marketing database, and Eloqua via Informatica Cloud. To help this customer get started, we wanted to be able to run Informatica Cloud on a Mac as well as have a test platform that was remotely accessible from anywhere.
Running Informatica Cloud on AWS accomplished both of these goals. We could run the secure agent on the EC2 instance and then access the Informatica Cloud web front end from a Mac or any of our customer's computers without worrying about firewalls, etc.
This tutorial will go step-by-step through how to create an AWS EC2 Windows Server instance and install the Informatica Cloud Secure agent.
The first step is to create your Amazon AWS account on this page by clicking the “Sign Up” button in the top right corner. The instance created in this tutorial will run in the free tier so if you are a new user, it should not cost you anything. Once your account is created and approved we are ready to start.
Create the instance:
1) Log into your AWS account at: https://console.aws.amazon.com/console/home
2) You should be on the AWS Management Console screen. Click the EC2 icon. This will take you to the EC2 Console Dashboard.
3) Click the “Launch Instance” button to display the Create New Instance Dialog.
4) Make sure the Quick Launch Wizard radio button is selected. There are three key pieces of information you will enter on this screen:
In the “Name your Instance” field type "InfaCloudTest” or whatever you would like to call this instance.
In the “Choose Your Key Pair” section, select the "Create New" radio button and name your security key pair “InfaCloudTest”. The key pair is used to create a secure password for your remote desktop. Click “Download” to download your PEM file to your computer. Note the location as you will need it later.
Finally, you will select the instance configuration. Choose “Microsoft Windows Server 2008 Base” with the 64-bit option selected.
Your "Create New Instance" dialog box should now look like this:
5) Click “Continue” to see the next step in the "Create a New Instance" process.
6) The next dialog should look like the following. You should not need to change anything, but there are two important settings to note. First, make sure the Shutdown Behavior is set to “Stop”. “Stop” means that if you shut down the instance, all of your data will persist – just like a normal PC. If this option is set to “Terminate”, your instance will effectively be formatted and will also disappear from your instance table the next time Amazon does a cleanup sweep.
The next important item is the Security Group. Amazon creates a default security group for you. Depending on what endpoints you connect to, you may need to open up ports in the security group later.
7) Click “Launch” to continue. You will receive a confirmation box saying that your instance is launching. Click “Close”.
8) You will be taken back to the EC2 Management Console. On the top-right hand side, you will see a section called “My Resources”. It should now show that you have 1 running instance (you may need to wait up to 2 minutes then click refresh for it to show up).
9) Click “1 Running Instance” and you will be taken to the “My Instances” page as seen below. Click the check box to the left of your instance name (InfaCloudTest) to display the instance information in the bottom pane. Take a look at this information which includes the full domain name, security groups, and elastic IP if you have linked one (note: we do not need an elastic IP for running Informatica Cloud).
10) Right click on the instance and select “Connect” as seen below:
11) You will see a dialog box like below which contains the remote desktop login details for your instance.
12) Click the “Retrieve Password” link. You may get a warning saying “Not Available Yet”. If so, you will need to wait up to 15 minutes.
13) Click “Choose File” and find the PEM file you downloaded in step 4.
14) Click “Decrypt Password”. This will display a dialog box with the login information.
15) Note the Public DNS, username, and password as you will need this information to Remote Desktop into the machine. You can download a shortcut file to a Remote Desktop Instance as well.
16) Now open your Microsoft Remote Desktop application. On a Mac, it will be in the Applications folder (the Remote Desktop Connection client comes with Office for Mac, or you can download it from: http://www.microsoft.com/mac/remote-desktop-client); on a PC, access it via "Start | All Programs | Accessories | Remote Desktop Connection".
17) For the computer name, enter the Public DNS entry (note: this will change each time you stop and restart the instance).
18) Remote Desktop will pop up a login box. Enter “Administrator” as the User Name and the password you copied from step 15 above. Leave the domain field blank. Click the “Add this information in your keychain” if you are on a Mac to remember your password.
19) You may receive a warning that the server name on the certificate is invalid. Click “Connect”.
20) You should now be logged into your AWS Windows instance and see a Windows desktop.
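The console steps above can also be scripted. The sketch below uses the AWS CLI, which is assumed to be installed and configured (it postdates this tutorial); the AMI ID is a placeholder you would look up for your region, and the commands are guarded so they only run where the CLI is present.

```shell
# Scripted version of the instance-creation steps, assuming the AWS CLI
# is installed and configured with your credentials.
KEY_NAME="InfaCloudTest"
AMI_ID="ami-xxxxxxxx"   # placeholder: look up a current Windows Server AMI

if command -v aws >/dev/null 2>&1; then
  # Create the key pair and save the private key; it is needed later to
  # decrypt the Windows Administrator password.
  aws ec2 create-key-pair --key-name "$KEY_NAME" \
    --query 'KeyMaterial' --output text > "$KEY_NAME.pem"
  chmod 400 "$KEY_NAME.pem"

  # Launch the instance with shutdown behavior "stop" so data persists.
  aws ec2 run-instances --image-id "$AMI_ID" --instance-type t1.micro \
    --key-name "$KEY_NAME" \
    --instance-initiated-shutdown-behavior stop

  # Once the instance is running, retrieve and decrypt the Administrator
  # password with the PEM file:
  # aws ec2 get-password-data --instance-id i-xxxxxxxx \
  #   --priv-launch-key "$KEY_NAME.pem"
fi
```

Note the `--instance-initiated-shutdown-behavior stop` flag, which corresponds to the “Stop” setting called out in step 6.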
Installing Informatica Cloud:
21) Start up Internet Explorer. Select “Don’t use recommended settings” if prompted. Internet Explorer comes with very tight security settings on Windows Server, so I suggest you navigate to http://google.com/chrome and download Chrome to save yourself some time and frustration. You will likely have to add several Google domains to the Trusted Sites list when prompted in order to download.
22) Navigate to www.informaticacloud.com and click “Login Here” in the top right corner.
23) Login using your Informatica Cloud credentials.
24) Click “Configuration”. Click “Agents”.
25) Click the yellow “Download Agent” button.
26) Select “Windows” as the platform and click “Download”.
27) When the agent_install.exe download is complete, click agent_install.exe and then “Run” in the Windows security prompt.
28) Accept the default values in the Informatica Cloud Agent install wizard and click “Done” when it completes.
29) Enter your Informatica Cloud credentials and click “Register” in the setup box.
30) After approximately 30 seconds, you should see that the Secure Agent is up and running on the Windows Server.
31) You should see the Agent populate on the Informatica Cloud site in the Configuration | Agents section.
32) If you are going to use files or a database on the AWS Windows Server, you will also need to add a connection to the EC2 instance. For example, to read/write flat files on the Windows Server: in the Informatica Cloud web app, click “Configuration”, then “Connections”, then the yellow “New” button:
33) Create a target directory on the Windows Server, "c:\infacloud" in this case, and fill out the new connection information as seen below:
Your Informatica Cloud instance is now ready. You can create Contact Validation, Data Synchronization, and other tasks.
I hope you found this tutorial helpful. Please leave any questions or comments below or feel free to drop us an email at email@example.com
Late last week, Amazon released an update to its DynamoDB service, a fully managed NoSQL offering for efficiently handling extremely large amounts of data in Web-scale (generally meaning very high user volume) application environments. The DynamoDB offering was originally launched in beta back in January, so this is its first update since then.
The update is a "batch write/update" capability, enabling multiple data items to be written or updated in a single API call. The idea is to reduce Internet latency by minimizing trips back and forth to Amazon's various physical data storage entities from the calling application. According to Amazon, this was in response to developer forum feedback requests.
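To make the batching concrete, here is a small Python sketch of the request shape BatchWriteItem expects (up to 25 put/delete requests per call). The table name "Contacts" and the item attributes are hypothetical; the actual network call via boto3 is shown in a comment.

```python
# Build BatchWriteItem payloads: group items into chunks of at most 25
# PutRequests, so many writes travel in a single API round trip.
BATCH_LIMIT = 25  # documented per-call limit for BatchWriteItem

def build_batches(table_name, items):
    """Group items into BatchWriteItem payloads of up to 25 PutRequests each."""
    batches = []
    for start in range(0, len(items), BATCH_LIMIT):
        chunk = items[start:start + BATCH_LIMIT]
        batches.append({
            table_name: [{"PutRequest": {"Item": item}} for item in chunk]
        })
    return batches

# Hypothetical items in DynamoDB's typed-attribute format.
items = [{"email": {"S": f"user{i}@example.com"}} for i in range(60)]
batches = build_batches("Contacts", items)

# Each payload would then be sent in one call, e.g. with boto3:
#   import boto3
#   client = boto3.client("dynamodb")
#   for request_items in batches:
#       client.batch_write_item(RequestItems=request_items)
```

Sixty items become three calls instead of sixty, which is exactly the round-trip reduction the update is after.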
This update, which helps address what was already an initial key selling point of DynamoDB, tells us that latency is still a significant challenge for cloud-based storage. After all, one of the key attributes of DynamoDB when first launched was speed and performance consistency, something that its NoSQL precursor, SimpleDB, was unable to deliver, at least according to some developers and users who claimed data retrieval response times ran unacceptably into the minutes. This also could have been a primary reason for SimpleDB's lower adoption rates. Amazon is well aware of these performance challenges, hence the significance of its first DynamoDB update.
Another key tenet of DynamoDB is that it is a managed offering, meaning the details of data management requirements, such as moving data from one distributed data store to another, are completely abstracted away from the developer. This is great news, as the complexity of cloud environments was proving too challenging for many developers trying to leverage cloud storage capabilities. The masses were scratching their heads over how to overcome storage performance bottlenecks, attain replication, achieve response latency consistency, and handle other operations-related data management challenges when it was in their purview to do so. Management complexity will likely remain a major challenge for other NoSQL vendors, including the many "big data" startups offering products in this category, who do not offer the same level of abstraction that DynamoDB does. It will be interesting to see whether DynamoDB becomes a significant threat to many of these startups.
We learned this reduction-of-complexity lesson at StrikeIron within our own niche offerings as well. We saw much bigger uptake of our simpler, more granular Web services APIs, such as email verification, address verification, and other products such as reverse address and telephone lookups, as single, individual services, rather than complex services with many different methods and capabilities. This proved true even when the more complex services provided more advanced power within a single API. In other words, simplified remote controls are probably still the best idea for maximum television adoption, as initial confusion and frustration tend to be inversely proportional to the adoption of any technology.
Another interesting point is that this is the fifth class of database product offering in Amazon's portfolio. Along with DynamoDB, there is still the aforementioned SimpleDB, a schemaless NoSQL offering for "smaller" datasets. There is also the original S3 offering, with a simple Web service interface for storing, retrieving, and deleting data objects in a straightforward key/value format. Next, there is Amazon RDS for managed, relational database capabilities that use traditional SQL for manipulating data and are more applicable to traditional applications. Finally, there are the various Amazon Machine Image (AMI) offerings on EC2 (Oracle, MySQL, etc.) for those who don't want a managed relational database and would rather have complete control over their instances (without having to use their own hardware) and the RDBMSes that run on them.
This tells us that the world is far from one-size-fits-all cloud database management systems, and we can all expect to be operating in hybrid storage environments that will vary from application to application for quite some time to come. I suppose that's good news for those who make a living on the operations teams of information technology.
And along with each new database offering from Amazon comes a different business model. In the case of DynamoDB, for example, Amazon has introduced the concept of "read and write capacity units", where charges are based on the combination of frequency of usage and physical data size. This suggests the business models are still somewhat far from optimal and will likely change again in the future; the major vendors are still figuring it all out, and business model adjustments in the cloud are not limited to Amazon.
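The capacity-unit arithmetic is worth a quick illustration. The sketch below is based on the rule DynamoDB documented at launch: one write unit covers one write per second of an item up to 1 KB, larger items consume one unit per 1 KB rounded up, and eventually consistent reads cost half as much as strongly consistent ones; treat the helper functions as illustrative, not a pricing calculator.

```python
# Illustrative arithmetic for DynamoDB "capacity units", per the
# launch-era rule: 1 unit = one operation/sec on an item up to 1 KB,
# with item size rounded up to the nearest whole KB.
from math import ceil

def write_capacity_units(item_size_kb, writes_per_second):
    """Write units needed to sustain a given write rate."""
    return ceil(item_size_kb) * writes_per_second

def read_capacity_units(item_size_kb, reads_per_second, eventually_consistent=False):
    """Read units needed; eventually consistent reads cost half."""
    units = ceil(item_size_kb) * reads_per_second
    return units / 2 if eventually_consistent else units

# 100 writes/sec of 1.5 KB items -> 2 units per write -> 200 write units
print(write_capacity_units(1.5, 100))  # → 200
```

Since charges scale with provisioned units, item size and access pattern both feed directly into the bill, which is the "frequency of usage plus physical data size" combination described above.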
In summary, following the Amazon database release timeline over the years yields some interesting information, namely that speed/latency, reduction of complexity, the likelihood of hybrid compute and storage environments for some time to come, and ever-changing cloud business models are the primary focus of cloud vendors responding to the needs of their users. And as any innovator knows, the challenges are where the opportunities are.
Much of cloud computing terminology is based on the notion of ‘as a Service’ (or ‘aaS’).
The ‘as a Service’ tag has migrated to several new uses. Here is my attempt at a set of definitions (and please comment if you disagree):
- SaaS (Software as a Service) – I mainly see this as an application that runs in the cloud and requires the user to download little or no software (maybe a browser plugin) to use the application. (e.g. SalesForce, Cisco WebEx, Google Apps)
- DaaS (Data as a Service)* – This is providing data over the cloud either as the result of a query (is the email address firstname.lastname@example.org valid) or involving a data transformation (correct the address 101 First Ave, Mytown, NC 2513). (e.g. StrikeIron!)
- PaaS (Platform as a Service) – Providing a platform for running applications, data storage abstraction, etc. – one step up the software stack from IaaS. (e.g. Google App Engine, Force.com/Heroku, PHP Fog)
- IaaS (Infrastructure as a Service) – Providing a virtual machine and storage mechanisms that can be loaded with operating systems and software (custom, open source, commercial, etc). (e.g. Rackspace, Amazon AWS, GoGrid)
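The DaaS definition above can be sketched in a few lines: a query goes out as a simple HTTP GET and a verdict comes back. The endpoint and parameter name below are hypothetical, not a real StrikeIron API.

```python
# Sketch of a Data-as-a-Service call: the query is just URL parameters
# on an HTTP GET. Endpoint and parameter names are made up for illustration.
from urllib.parse import urlencode

def build_verification_url(base_url, email):
    """Encode an email-verification query as a GET URL."""
    return f"{base_url}?{urlencode({'email': email})}"

url = build_verification_url(
    "https://api.example.com/email-verification", "jane.doe@example.com"
)
print(url)
# The service would respond with a verdict (e.g. valid/invalid), fetched
# with something like urllib.request.urlopen(url).
```

The point of the sketch is how thin the client side is: no schema, no storage, just a question and an answer, which is what distinguishes DaaS from the other layers.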
There are some proprietary aaS’s as well. My favorite is HP’s Everything as a Service. I am not sure what this really is but it sounds impressive.
Clear as mud? There is certainly some overlap between the different technologies, but in the end the trend is clear: leverage economies of scale, lower the barrier to entry, and speed up implementation.
*DaaS can also refer to “Desktop as a Service” and “Database as a Service” in several sources.