To SQL (relational) or not to SQL (NoSQL), that is the question

March 19, 2012

This document is a work in progress. I have been asked to review this issue, and I am providing this information in draft form.
This post focuses on MongoDB; you will find similar issues with other NoSQL databases, but the particulars may differ.

As always, the answer to the question “do we go with a relational DBMS or NoSQL?” is: ‘it depends’. And it depends on a number of issues. I will try to enumerate and address those issues in this post. I do not assume that I have covered all of the issues; you should not either.

Before we begin, I am going to ask a question of the reader: “do you have a data description document?” Something in writing (written down, not just in your head) that describes your data requirements and how you will be using your data. I know some people are thinking that I want a fully developed E-R diagram; that is not the case. I just want the reader to have some sense of what data they will be working with, how that data is related (or not), what they need to do with the data, and so on. If you know those things, then you will be able to determine which issues are relevant to you and how important they are.

Schema
All data is related. If your data is not related, then you may or may not need a database to store it. Generally, in all non-trivial, large-scale, production deployments of data stores, there is some piece of data (an entity or attribute) that is related to another. Therefore, the idea that NoSQL databases are schema-less may not be the best way of thinking about your data. I believe that you will want/need to develop some sort of schema – a description of the data, how it is organized, how it is related, and how it will be used.
Note: I do not suggest creating an E-R diagram, as it will drive you down the ‘relational’ path. However, you will want that description to define your collections and documents if you decide to go down the ‘NoSQL’ path.

Relationships Between Entities
This is a very tricky question, because the answer depends on what data entities you have and how they are related. Again, having a schema makes addressing this issue easier. If you find that your data (read: entities) has a large number of relationships, then NoSQL may not be the best solution.
I suggest that you create a very high-level E-R diagram and then take the entities and see if you can ‘easily’ refactor the schema into a MongoDB schema – how efficiently and effectively can you embed the related entities into objects and arrays inside a BSON document? This ‘embedding’ can also be supplemented by client-side linking … more on that later.
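To make that concrete, here is a minimal sketch of embedding related entities in a single document, written against the pymongo driver; the database, collection, and field names are hypothetical.

    from pymongo import MongoClient

    client = MongoClient("localhost", 27017)   # assumes a local mongod
    db = client["shop"]                        # hypothetical database name

    order = {
        "customer": {"name": "Jane Doe", "email": "jane@example.com"},   # embedded entity
        "items": [                                                       # embedded array of entities
            {"sku": "A-100", "qty": 2, "price": 9.99},
            {"sku": "B-200", "qty": 1, "price": 24.50},
        ],
        "status": "new",
    }
    db.orders.insert_one(order)   # one document carries what might be several relational tables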

Atomicity
While MongoDB does support some [built-in] atomic operations (e.g. findAndModify), it currently guarantees atomicity only at the level of a single document – there are no multi-document transactions. If you have a schema where a number of entities need to be updated atomically at the same time (read: transactions), then MongoDB is not right for you.
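For illustration, a sketch of a single-document atomic update (findAndModify is exposed in recent pymongo drivers as find_one_and_update); the collection and field names are hypothetical.

    from pymongo import MongoClient, ReturnDocument

    db = MongoClient()["shop"]                       # hypothetical database name
    doc = db.inventory.find_one_and_update(
        {"sku": "A-100", "qty": {"$gte": 2}},        # match only while enough stock remains
        {"$inc": {"qty": -2}},                       # decrement atomically within this one document
        return_document=ReturnDocument.AFTER,
    )
    # Touching two documents at once (e.g. debit one account, credit another) is not atomic.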

Consistency – Single DB
RDBMSs are strongly consistent by design; most support table- and/or row-level locking.
MongoDB (in its current version) is strongly consistent because it has one global lock for read/write operations. There is some talk of moving from the global lock to collection-level locking. Bottom line: if your database is going to be subject to a significant number of concurrent read/write operations, then MongoDB may not be the best solution for you.

WYOW Consistency (single server)
One proposed workaround is to leverage MongoDB’s atomic update semantics in an optimistic concurrency scheme. This comprises four basic steps: 1. read a document; 2. modify the document (i.e. present it to the user in a web form); 3. validate that the document hasn’t changed; 4. commit or abandon the user’s update.
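A minimal sketch of that optimistic-concurrency loop, assuming a hypothetical _version field that every document carries:

    from pymongo import MongoClient

    db = MongoClient()["app"]                                    # hypothetical database name

    def save_profile(user_id, new_bio):
        """Read, modify, validate, then commit-or-abandon (steps 1-4 above)."""
        doc = db.profiles.find_one({"_id": user_id})             # 1. read
        result = db.profiles.update_one(
            {"_id": user_id, "_version": doc["_version"]},       # 3. matches only if unchanged
            {"$set": {"bio": new_bio},                           # 2. the user's edit
             "$inc": {"_version": 1}},
        )
        return result.modified_count == 1                        # 4. False -> abandon or retry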
Note: There are a number of posts regarding read-your-own-write consistency that would be good to review if this is a significant issue for you.

Consistency – Distributed DBMS
For ‘industrial strength’ DBMSs this is a solved problem. For example, Oracle has RAC. If you really need it, then it may be worth the money, but be very sure you need it, as it is a very expensive solution.
MongoDB does not offer master-master replication or multi-version concurrency. In other words, writes always go to the primary server in a replica set. By default, even reads from secondaries are disabled, so the default behavior is that you communicate with only one server at a time. This may need to be its own posting, as it is a complicated issue. More on this later.
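If you do want to read from secondaries, you have to opt in explicitly. Here is a sketch using pymongo read preferences; the hosts, replica set, and collection names are hypothetical.

    from pymongo import MongoClient, ReadPreference

    # Connect to a replica set; by default all reads and writes go to the primary.
    client = MongoClient("mongodb://host1,host2,host3/?replicaSet=rs0")
    orders = client.shop.orders.with_options(
        read_preference=ReadPreference.SECONDARY_PREFERRED)   # explicitly opt in to secondary reads
    stale_ok = orders.find({"status": "shipped"})              # may return slightly stale data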

Querying
Most RDBMSs have standardized on (ANSI) SQL and are generally consistent with one another; however, stored procedures are not portable between them.
MongoDB has a relatively rich set of data access and manipulation commands. The find (select) command returns cursors. However, the query language is particular to MongoDB.
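For a flavor of that query interface, a small sketch of a filtered, projected, sorted query from pymongo; the collection and field names are hypothetical.

    from pymongo import MongoClient, DESCENDING

    db = MongoClient()["shop"]                                      # hypothetical names throughout
    cursor = (db.orders
              .find({"status": "new"}, {"items": 1, "status": 1})   # filter + projection
              .sort("created_at", DESCENDING)
              .limit(10))
    for order in cursor:                                            # the cursor is iterated lazily
        print(order["_id"], len(order.get("items", [])))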

Indexing
Both RDBMSs and MongoDB support the declaration and use of indexes.
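For example, a sketch of declaring indexes from pymongo (roughly the counterpart of CREATE INDEX in SQL); the collection and field names are hypothetical.

    from pymongo import MongoClient, ASCENDING, DESCENDING

    db = MongoClient()["shop"]
    db.orders.create_index([("customer.email", ASCENDING)])                      # single-field index
    db.orders.create_index([("status", ASCENDING), ("created_at", DESCENDING)])  # compound index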

Scalability
Scale-out is relatively easy. Scale reads by using replica sets; scale writes by using sharding (with automatic balancing). There are issues with sharding that need to be understood. More on that later.
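For reference, a sketch of the admin commands that shard a collection; it assumes an already-configured sharded cluster reached through a mongos router, and the database, collection, and shard-key names are hypothetical.

    from pymongo import MongoClient

    admin = MongoClient("mongos-host", 27017).admin            # must connect to a mongos router
    admin.command("enableSharding", "shop")                    # allow the database to be sharded
    admin.command("shardCollection", "shop.orders", key={"customer_id": 1})   # choose the shard key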

Cost
MongoDB is ‘free’. However, there are also a number of ‘free’ RDBMSs. But, as always, you need to factor in the costs of development and production support – which are non-trivial.

Maintainability
This is a challenge for a new player like MongoDB. The administrative tools are pretty immature when compared with a product like MySQL.


AT&T launches Synaptic Compute with support for hybrid clouds

March 7, 2012

AT&T has launched Synaptic Compute with support for hybrid clouds.

AT&T’s goal is to ease the transition for enterprises upgrading from a private cloud to a hybrid cloud system, using AT&T’s network to provide additional storage and compute power. The service targets VMware customers that are using vCloud Datacenter.

AT&T claims that the system supports bursting, data center extensions, disaster recovery, and mobile application development and deployment. New features include virtual machine cloning, scalable computing and memory resources, multiple user interfaces, a multi-layer firewall and open standard software.


Selection of Hybrid Cloud Vendors

February 22, 2012

How do you evaluate and select a hybrid cloud vendor? It really depends on the problem(s) that you need to address and solve. If data transfer and storage are critical, then the most important issues are bandwidth and data transfer. If the system needs to support bursting, or spikes in web traffic and/or computational load, then price may be more important.

What follows is a description of our evaluation and comparison of vendors based on our needs for a hybrid cloud infrastructure that provides a private cloud (more like managed hosting) and a public cloud (that provides elastic/on-demand computing resources).
Please bear in mind that your needs are likely to be different.

Cloud computing vendors were evaluated using the following criteria:
Completeness of Hybrid Offering – By vendor or in combination with 3rd party.
Maturity of Offering(s) – Relative length of time vendor has been providing hybrid offerings.
Cost – Total costs.
Reliability – SLAs for the private and public clouds
Bandwidth and Data Transfer – Maximum bandwidths for data transfers between clouds.
Self-service Support –
Developer Support – How ‘developer friendly’ is the infrastructure (and vendor).
Portability of Deployments – How easy is it to move deployment from one vendor to another.
Integration Support – Support for open standards or public APIs for integration.
Security – Tools and capabilities.
Management – Management tools for both public and private clouds.

Notes
1) As portability of deployment is a critical requirement, no PaaS solutions were considered as those solutions, by their design and implementation, are not portable.
2) Computing and storage costs are highly dependent on configurations.
3) Portability of IaaS cloud implementation can be heavily dependent on how the systems are configured and deployed.

The following vendors were considered, as we believe that their current offerings could address most of our evaluation criteria: AWS, AT&T, Datapipe, Go Grid, IBM, Rackspace, Terremark.

Potential candidates
Datapipe – Has strong managed hosting and the ability to hybridize Amazon’s solutions with its own. Claims seamless integration between AWS and Datapipe environments, high I/O performance, and integrated support and management.
Go Grid – Smaller, independent provider of public and private clouds. Very high SLAs. Competitive pricing. All APIs are proprietary, so portability may be an issue.
Rackspace – Strong managed hosting. Open source development via the OpenStack project. Offers some hybrid configurations. Is moving quickly to provide fully featured hybrid offerings.

Vendors that are lacking in one/more critical areas
AT&T – Very strong in managed hosting, and Synaptic Compute is an ambitious offering. However, the service appears to still be in beta (not fully released).
AWS – Amazon does not provide a native hybrid cloud offering, and they do not provide non-virtualized servers. They do provide a hybrid offering in partnership with 3rd party vendors (e.g. Equinix) via Direct Connect. However, that would require us to provision two separate clouds with two different vendors.
CSC – Nascent hybrid solutions.
IBM – Strong managed offerings. Complex contracts and pricing structures. Focused on large enterprises. Level of commitment to full set of hybrid offerings is unclear at this time.
Terremark – Moving quickly into the hybrid cloud space with the acquisition of CloudSwitch. However, their hybrid offerings are relatively new.


The Future of Hybrid Cloud Computing

January 25, 2012

Currently there are few standards for interoperability between public and private clouds.

There are a number of ‘forces’ that are shaping the future of cloud computing:

Amazon – AWS is still the dominant force in cloud computing.  They are the largest and, at the same time, the most innovative vendor in the space – no mean feat.  Amazon is working to support interoperability between its offerings and enterprise private clouds via a published API.

Rackspace – OpenStack – The community has over 150 members that are dedicated to creating an interoperability model for a variety of cloud configurations.  However, OpenStack is still nascent; its future as an industry standard is not certain.

VMware – VMware claims 80% market share in private cloud deployments.  vCloud – The service is based on VMware’s vSphere and vCloud Director (vCD) and exposes the vCloud API.  vCD is a key part of VMware’s strategy for driving adoption of hybrid clouds.  It provides interoperability between VMware-virtualized infrastructures and 3rd party service providers.  These service providers are part of VMware’s service provider partner program.  Note: it can be challenging to integrate public cloud services from vendors that are not VMware based.

The efforts of the major players are shaping the future of hybrid cloud computing.

– AWS Direct Connect to private cloud vendors such as Equinix
– AT&T’s Synaptic Compute as a Service makes the company’s IaaS public cloud compatible with VMware’s vCloud Datacenter offering.
– CSC Cloud Services
– Datapipe
– Go Grid
– IBM enhanced its SmartCloud offering with the acquisition of Cast Iron
– Rackspace’s commitment to OpenStack
– Terremark


How Cloud Computing Will (already has) Transformed Enterprise Computing

October 25, 2010

There is no shortage of definitions of cloud computing.  See the article in Cloud Computing Journal 21 Experts Define Cloud Computing.  And yes, there are 21 different definitions and many of them have significant differences.
Needless to say, the definition is subject to a variety of interpretations.  The latest Gartner report on cloud computing systems did not include Google (its App Engine was seen as application infrastructure) or Microsoft (Azure was seen as a services platform).  You have to take these things with a ‘grain of salt’ – Gartner’s report did not have Amazon in the ‘Leaders’ quadrant.
One general description that I like is that cloud computing involves the delivery of hosted services over the Internet that are sold on demand (by time, amount of service and/or amount of resources), that are elastic (users can have as much or as little of a service or resource as they need), and that are managed by the [service] provider.

I attended a recent TAG Enterprise 2.0 Society meeting (un-conference).  During the discussions one of the participants asked, “how do we go about starting to use cloud computing?”  The first thought that came to mind was ‘you already are’.  If you socialize on Facebook or LinkedIn, if you collaborate/network using Ning or Google Groups, if you use Twitter, if you get your email via Gmail or Hotmail, or if you use Salesforce.com, then you are already using cloud computing – using applications/services that, in some form, run in the cloud.

A recent Gartner Research newsroom release predicted that by 2012 (just two or three years hence), cloud computing will become so pervasive that “20 percent of businesses will own no IT assets”. No matter how you slice it, that is a pretty bold statement to make (even for Gartner).
I don’t know if I believe that 20 percent of businesses will have no IT assets (by 2012).  I believe that there are significant issues that will preclude businesses from putting 100% of their IT assets in the cloud.  These include security of data (that is stored in the cloud), control and management of resources, and the risks of lock-in to cloud platform vendors.
What seems more plausible are reports by ZDNet and Datamonitor which predict that within the next few years up to 80% of Fortune 500 companies will utilize cloud computing application services (i.e. SaaS applications), and up to 30% will purchase cloud computing system infrastructure services.
In the near term, I see cloud computing as more of an implementation strategy.  Enterprise computing assets and resources (including social computing software and social media) that are currently implemented within enterprise datacenters will migrate into the cloud.
The shift toward cloud services hosted outside the enterprise’s firewall will cause a major shift in how enterprises develop and implement their overall IT strategies and, in particular, their Enterprise Social Computing strategies.
This shift toward, and eventual widespread adoption of, cloud computing by the enterprise will be driven by a number of factors:

Cost (computing resources)
Late last year (2009) Amazon, Google and Microsoft (Azure) lowered their published pricing for reserved computing instances (computing cores).  Amazon’s rate for a single-CPU, continuously available cloud computing instance was as little as 4 cents an hour (effective hourly rate based on 7×24 usage) for customers that sign up for a three-year contract.
Single-year contract rates were about 20% higher.  Pricing for on-demand instances (no upfront payments or long-term commitments) was about two and a half to three times the three-year contract rates.
A rough calculation says that a cloud data center of 10 single-core servers (at the three-year contract rate) could be operated around the clock for under $0.50 an hour, or just under $3,500 a year (about $350 per server per year).  And that includes data center facilities, power, cooling, and basic operations.  Pretty impressive numbers!
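The arithmetic behind those numbers, assuming the 4-cents-an-hour contract rate above:

    servers, rate_per_hour, hours_per_year = 10, 0.04, 24 * 365
    fleet_per_hour = servers * rate_per_hour          # $0.40 per hour for all 10 servers
    annual_cost    = fleet_per_hour * hours_per_year  # ~$3,504 per year
    per_server     = annual_cost / servers            # ~$350 per server per year
    print(fleet_per_hour, round(annual_cost), round(per_server))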

Commoditization of Cloud Computing
And if the costs of cloud computing weren’t low enough, Amazon announced pricing for EC2 ‘spot instances’.  This pricing model will usher in the beginnings of a trading market for many types of cloud computing resources: support services, storage, computing power, and data management.
Under the old model you had to pay a fixed price that you negotiated with a bulk vendor or a private supplier.  Now, in the new spot market, you can look at the latest price of available cloud capacity and place a bid for it.  If your bid is the highest, then the capacity is yours. Currently this is available from Amazon’s EC2 Cloud Exchange.

Leveling the playing field for startups and SMBs
One of the most important aspects of cloud computing is that SMBs can afford to do things they could not have afforded to do before;  they can do new, exciting, innovative things – not just the same old things for less money.
In the past, when SMBs needed to build a new IT infrastructure (or significantly upgrade the current one) they often could not afford to buy large amounts of hardware and the latest/greatest enterprise software.
In the cloud you pay for the hardware and software that you need in bite-sized chunks. Now SMBs can afford clustered, production-ready databases and application servers, and world-class enterprise software (via SaaS).  Having equivalent technology can help ‘level the playing field’ when competing against large enterprises.

New Products and Services
The availability of large amounts of computer processing power and data storage will allow innovative companies to create products and services that either weren’t possible before or were not economically feasible to deploy and scale.
In the past, business ideas that required prohibitive amounts of computing power and data storage may not have been implemented due to technical restrictions or cost-effectiveness.  Many of these ideas can now be realized in the cloud.

Reliability
Most cloud computing vendors offer three and a half nines of service level availability – an annual uptime percentage of 99.95% (or about 4½ hours of downtime per year).  If applications can be deployed to clusters of servers, then downtimes will be greatly reduced.
Note:  ‘Five nines’ of SLA is said to be available from a few vendors.  However, upon closer reading of their offerings you may find wording such as “we are committed to using all commercially reasonable efforts to achieve at least 99.999 percent availability for each user every month.”
As always, read the SLAs very carefully.
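The arithmetic behind the downtime figures, for both availability levels mentioned above:

    hours_per_year = 24 * 365
    for nines, availability in (("99.95%", 0.9995), ("99.999%", 0.99999)):
        downtime_hours = (1 - availability) * hours_per_year
        print(nines, round(downtime_hours, 2), "hours/year")   # ~4.38 vs ~0.09 (about 5 minutes)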

Agility
Cloud computing enables two types of ‘agility’.  The first is time to realization: how fast you can see that an idea is working or not working.  Cloud computing supports the rapid acquisition, provisioning, and deployment of supporting resources (potentially much faster than in traditional IT environments).
The second type of agility is flexibility (aka elasticity) of computing and service resources.  Elasticity can reduce the need to over-provision.  The enterprise can start small, and then scale up when demand goes up.  And, if they have been prudent with their contractual obligations, they can scale down when resources are no longer needed.

Cloud Vendors – The New and the Old
The early leaders Amazon, Google and Microsoft have been joined by big names like HP, IBM, Dell, and Cisco; even Oracle has gotten into the game. They are utilizing existing strengths to create successful cloud computing products and services for their customers and partners.
There is a new generation of companies that are developing cloud offerings – see The Top 150 Players in Cloud Computing.  These new companies are likely to be more nimble and move more quickly than the current leaders.  We are already seeing a number of new, innovative approaches (technologies, business models, and openness) to cloud-based services.

It is not an exaggeration to say that ‘the IT industry landscape will be remade by cloud computing’.


The Cost of Cloud Computing

May 30, 2010

Yes, I know, price isn’t everything, and it’s not the only thing, but enterprise computing costs do matter!  If you don’t think so, then just imagine the conversation you will have with the C-level/VP/Director/Manager when you go in and ask for X percent (say 10, 20, 30 or more) of this year’s CapEx budget to fund the hardware for your latest and greatest project (and don’t forget to include the support costs).  Oh yeah, that will be a ‘fun’ conversation.

Cost of Cloud Computing Resources
Late last year (2009) Amazon, Google and Microsoft (Azure) lowered their published pricing for reserved computing instances (computing cores).  Amazon’s rate for a single-CPU, continuously available cloud computing instance was as little as 4 cents an hour (effective hourly rate based on 7×24 usage) for customers that sign up for a three-year contract.
Single-year contract rates were about 20% higher.  Pricing for on-demand instances (no upfront payments or long-term commitments) was about two and a half to three times the three-year contract rate.
A rough calculation says that a cloud data center of 10 single-core servers (at the three-year contract rate) could be operated around the clock for under $0.50 an hour, or just under $3,500 a year (that includes servers, data center facilities, power, cooling, and basic operations). That’s about $350 per server per year – pretty impressive!

Commoditization of Cloud Computing
And if the costs of cloud computing weren’t low enough, Amazon announced pricing for EC2 ‘spot instances’.  This pricing model will usher in the beginnings of a trading market for many types of cloud computing resources: support services, storage, computing power, and data management.
Under the old model you had to pay a fixed price that you negotiated with a bulk vendor or a private supplier.  Now, in the new spot market, you can look at the latest price of available cloud capacity and place a bid for it.  If your bid is the highest, then the capacity is yours. Currently this is available from Amazon’s EC2 Cloud Exchange.


AWSome (ATL Cloud Computing) March 2010 Meetup

February 25, 2010

AWSome March 2010 Meetup

“Cloud and Virtualization Security”
Taylor Banks
Owner at KnowThreat

“More Cloud Security”
Special Guest

Who should attend?
Anyone working for a startup that is thinking about using the cloud should attend.
The cloud has removed the barrier to entry of owning a data center.  However, it has not removed the need for control and management of resources, or the need to secure enterprise data.