Pramak, A High-Tech Consulting and Software Services Company
Knowledge and Experience working for you
Scan the horizons with Pramak
                                  Musings on technologies that surround us

Tuesday, Oct 23, 2012

High-tech space observations 

There is much going on in the high tech industry.  We mention three observations that should be obvious to people who are following the industry closely. These relate to different aspects of computing.  Each is significant in its own right and independent of the others.

  • Software and hardware under one control

A few big high-tech companies seem to be realizing the benefit of following the Apple model of computing - of tightly controlling the hardware on which the company’s personal computing software runs.  

Apple, a pioneer in controlling the hardware on which its operating system software and applications run, has earned kudos for the quality of its devices in regard to their displays, battery life, sleek designs, finish, components used, reliability, and other aspects.  Because Apple has tightly controlled and customized the hardware to its exact needs, it has been able to reduce the cost of building its devices, thus wresting a handy profit from their sales. We now see Microsoft dipping its toes into the same model with its Surface RT tablet for Windows 8. This tablet has received a lot of press. Though Microsoft is still relying on its rich ecosystem of hardware partners to build and market their devices running Windows 8, it has made itself a player in a field that it had earlier left completely to its partners. Incidentally, if one goes by Microsoft's 10-K filing, it is going to be getting more and more into hardware in the future. Another recent news item is about Amazon negotiating a buyout of TI's mobile chip business. The speculation is that such an acquisition would help Amazon lower the cost of the hardware for its Kindle Fire tablets and e-readers and build more power-efficient custom hardware for its sprawling AWS data centers.

So, there you have it. Apple, Microsoft, and Amazon, three of the biggest names in consumer computing, are all at various stages - from the “firmly experienced” to the “recent entrant” to the “new convert” - of the same model, that of having tight control of the devices on which their operating system and/or applications run.  Who will be next? 

  • Advertising models evolving

The Advertising world is likely to see significant changes in the next few years in the way advertisements are targeted at the user. With Google expected to get in the lead in display advertising, Microsoft trying all it can to get a bigger slice of the extremely enticing advertising market, Facebook under pressure to increase its mobile ad business, and Yahoo continuing its focus on display advertising, its traditionally strong business, we can expect old models to evolve and new models to emerge. 

One can expect ample creativity in how more data about consumers is acquired and used by advertisers and publishers to do better targeting and retargeting of advertisements and get more click-throughs.  We already see an increasing use of data that indicates users’ likes/dislikes and their visits to various places in the virtual and physical worlds.  With Facebook having a billion-odd active users now, and facing pressure from Wall Street to generate more revenue, one can expect it to leverage its social platform and the data it has on its users in novel ways. Facebook’s various methodologies and schemes (mostly experimental at this point) such as Offers, Exchanges, Custom Audiences, Collections, Sponsored Stories, and sharing of insights about its subscribers with its premier advertisers, are doing exactly this.  

We expect the advertising landscape to evolve significantly in the next few years. Google, a leader on all three platforms (search, mobile, and display), Microsoft, with its vast array of Internet properties, and Yahoo, a leader in display advertising until last year and continuing to focus on it, will, like Facebook, have their bag of new and improved offerings to lure marketers.  It promises to be a good four-horse race with much creativity and innovation thrown in.

  • Data access and manipulation becoming easier

Data is becoming easier to access and manage. Big Data platforms that once required programming languages too hard for non-programmers, such as data scientists, can now be accessed through SQL-like languages or even SQL itself. Also, relational databases that earlier couldn't scale as well as NoSQL databases are evolving and being used in such platforms.

There is a trend toward having a single platform for both analytic and transactional processing over large volumes of data. We see this in offerings from companies providing NewSQL databases, from companies like Hadapt that use actual SQL to access Hadoop data stored in relational databases on the multitude of Hadoop machines, and from cloud vendors like SAP that use real-time databases such as HANA for both OLAP and OLTP processing. Not only is the manipulation of data becoming easier, with SQL and NoSQL processing handled efficiently by a single database system as noted above, but the movement of data between on-premise and cloud is also becoming easier with improvements to database offerings (Oracle 12c, SQL Azure/SQL Server) and the use of storage gateways. Amazon has been providing a storage gateway solution for a few months now, while Microsoft has recently acquired StorSimple to do the same. 

There are also companies like Nasuni that provide “storage as a service” which makes it easy for on-premise services to transparently use a cloud provider such as Amazon, Microsoft, Rackspace, and others for storage.

For the end customer, handling data is becoming easier and easier by the day.

The above are observations of just three changes that we see happening. As is evident from the goings on in the high-tech industry, we are living in an era where much is changing around us. Changes that seem small at first can become big over time and impact our lives and experiences significantly. There are numerous smart, ambitious companies, with new ones entering the sector every day, all innovating in their chosen technology and business areas so as to differentiate their offerings from those of their competitors. Because of this, we will, in the days ahead, continue to see significant improvements in technologies, in the processes that leverage them, and in the experiences we get as end customers. We are certainly living in very interesting times.

Tuesday, Oct 9, 2012

Backup Blues 

Have you noticed that there have been several serious service disruptions across the globe in the last few months? These were in most cases due to faults in the services' high-tech infrastructure. The disruptions could have been short-lived in many cases and perhaps not even noticed had it not been for the backup systems failing to engage when needed. One might wonder why this has been the case so many times.

Organizations providing the services affected by such disruptions will no doubt make some changes to their policies and procedures, or to other aspects of their operations, as a result of what they learn after investigating the cause of the disruptions. Their customers might do the same. However, it is unclear how much of what these organizations learn or do to prevent such incidents, or to be better prepared for them in the future, would be shared with others. In the absence of a detailed analysis and a blueprint for preventing or at worst minimizing such incidents, whether from these organizations or from others that have analyzed the disruptions, let us see if we can review some of these incidents ourselves and learn something from them. We can then combine any insights we get out of such a review with our knowledge of fault-tolerant, reliable, and resilient systems to arrive at a few best practices that we can follow to minimize such incidents in the future.

Given below are some major disruptions that have happened in the recent past due to the backup systems in place failing to keep the service up.

  • Amazon loses power in North Virginia datacenter

Amazon lost both its primary and backup power generators after a cable fault in its North Virginia datacenter in June 2012. This affected AWS-hosted services of companies such as Quora, Hipchat, Heroku, and others. For companies whose service infrastructure was not spread across multiple availability zones, it was total lights out for several hours.

Cause of backup failing: Primary backup power generator failed due to a defective cooling fan. The backup power generator failed due to a configuration error where a circuit breaker of the backup generator had erroneously been set to a low power threshold.

  • United Airlines reservation system halted

United Airlines’ reservation system stopped working for 2 hours due to a communication equipment failure in its datacenter that cut off all communication with its airports and website. The backup systems did not kick in.

Cause of backup failing: Under investigation

  • Traffic tunnels closed in Melbourne causing chaos

A core communication switch failure caused the neon speed and lane-changing signals to stop working. The backup systems also failed. Because it was not clear in the incident control room whether the incident detection and safety systems, such as the sprinklers and ventilation system, would work in case of an accident, the tunnels were closed for 12 hours.

Cause of backup failing: Under investigation

  • Tokyo Stock Exchange’s (TSE) derivative trading stopped

A router issue affected derivative trading. The backup system also failed to work, and all derivatives trading was halted. A failure of the backup system had also caused the worst trading disruption at TSE six months earlier. A Tokyo-based financial services company attributed the second disruption to a lack of good testing by TSE.

Cause of backup failing: Switch to activate backup system did not work

  • Upgrade to iOS 6 disrupts Internet connectivity over Wi-Fi

Devices like the iPad and iPhone that had been upgraded to iOS 6 could not connect to the Internet over Wi-Fi. This was due to iOS 6 failing to reach a file on the Internet, which made it conclude that there was no Internet connectivity over Wi-Fi. The problem was possibly due to a human error of deleting the required file. Lack of a procedure to catch this kind of error caused the disruption. Because there was a single point of failure for detecting Internet connectivity, people had to suffer loss of connectivity over Wi-Fi. So here the disruption was not due to a backup system not kicking in but instead due to the absence of a check to validate a conclusion.

Cause of backup failing: There was no backup scheme for validating a conclusion
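
The idea of validating a conclusion before acting on it can be sketched in a few lines. The following is a minimal illustration in Python, not a description of how iOS actually checks connectivity. The probe functions are hypothetical stand-ins for independent checks (for example, fetching files from two unrelated hosts); the point is only that "no connectivity" is declared when every independent probe fails, rather than when a single one does.

```python
from typing import Callable, Iterable


def has_connectivity(probes: Iterable[Callable[[], bool]]) -> bool:
    """Return True if at least one independent probe succeeds."""
    for probe in probes:
        try:
            if probe():
                return True
        except OSError:
            # A probe that raises counts as one failed probe,
            # not as proof that there is no connectivity.
            continue
    return False


# Hypothetical probes, for illustration only.
def missing_file_probe() -> bool:
    return False  # e.g. the expected file was deleted on the server


def second_host_probe() -> bool:
    return True  # an unrelated host still answers


def unreachable_probe() -> bool:
    raise OSError("network unreachable")
```

With a second, independent probe in place, a single missing file no longer leads to the wrong conclusion: `has_connectivity([missing_file_probe, second_host_probe])` reports connectivity even though the first probe fails.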

On reviewing the above incidents, it seems that in all the above cases real life simulation of various failures could have tested the failover mechanisms and the operations of the backup systems. It is not known whether such testing was done by the affected organizations and if it was done then how frequently it was done and whether it was of high quality and comprehensive enough to cover the kind of failures that occurred later. Perhaps, we won’t ever know this or maybe would learn of this in the future when more details pertaining to the incidents are revealed.

Though doing good testing periodically is a basic necessity for ensuring reliability of the systems in production use, such testing might not find all issues. Some issues can always slip by even with good testing because simulations, no matter how close they are to real-life scenarios, are not exactly the same as what one might encounter at a later point. Keeping this in mind, what one needs to do is to build good resiliency into the production system so that it continues to operate, albeit at a lower performance level, despite different types of failures occurring individually or together.

Looking at the causes of the above incidents and using our understanding of reliable and resilient systems, here are some best practices that could minimize the possibility of backup systems failing when needed.  This list is not exhaustive.

  • Measure “disruption risk”

An organization should periodically check for vulnerabilities in the fail-over mechanism and in the operations of the backup systems for the services it is providing. The learning from this should get translated into a quantifiable metric indicating the risk level of the service to disruptions. Such a metric and the details behind its calculation when reported to the chief risk or operations officer or others responsible for the health of the production systems of the organization would raise awareness of any issues with the services. This awareness would be a trigger for the actions that need to be taken to fix the issues and make the overall system more robust.

As an example of the values that the metric can take, the value can either be a numeric in a range, say 1 to 10, or a non-numeric textual value ranging from say “very risky” at one end to “very robust” at the other, or it could be of a different type, one that the organization feels comfortable with.
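
As a rough sketch of how such a metric could be computed, the Python snippet below turns the results of a set of weighted checks into a score from 1 to 10 and a textual label. Everything in it is an assumption made for illustration: the check names, the weights, the linear formula, and the intermediate labels "robust" and "risky" are ours, and an organization would substitute whatever scheme it is comfortable with.

```python
from typing import Mapping, Tuple


def disruption_risk_score(checks: Mapping[str, Tuple[bool, float]]) -> int:
    """Map the weighted share of failed checks to a 1..10 score (10 = riskiest).

    `checks` maps a check name to (passed, weight).
    """
    total = sum(weight for _, weight in checks.values())
    if total <= 0:
        raise ValueError("checks must carry a positive total weight")
    failed = sum(weight for passed, weight in checks.values() if not passed)
    return 1 + round(9 * failed / total)


def risk_label(score: int) -> str:
    """Translate the numeric score into a textual value (cut-offs are assumed)."""
    if score <= 2:
        return "very robust"
    if score <= 5:
        return "robust"
    if score <= 8:
        return "risky"
    return "very risky"


# Hypothetical results of a periodic review.
example_checks = {
    "failover_drill_passed": (True, 3.0),
    "backup_generator_load_test": (False, 2.0),
    "configuration_audit": (True, 1.0),
}
```

Here one check out of three failed, carrying a third of the total weight, so `disruption_risk_score(example_checks)` gives 4, which `risk_label` reports as "robust".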

  • Test fault-tolerance regularly

An organization should test the fault-tolerance of the system to different kinds and combinations of disruptions. Since the disruptions can be due to one or more components like machines, infrastructure, network, or others hosting and supporting the service failing individually or together, the testing needs to be comprehensive and high quality. Such testing is likely to find most if not all problems in the failover mechanisms and in the backup systems, which can then be addressed.
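
One way to think about testing combinations of failures is to enumerate them against a model of the system. The Python sketch below is a toy model under our own assumptions, not a substitute for real failure drills: the service is described as a list of independent paths (primary and backup), each path is a tuple of hypothetical component names, and the service is considered up as long as one path has no failed component.

```python
from itertools import combinations


def service_is_up(failed, paths):
    """The service is up if at least one path has no failed component."""
    return any(not (set(path) & failed) for path in paths)


def find_fatal_failures(paths, max_failures=2):
    """List every combination of up to `max_failures` components
    whose simultaneous failure takes the service down."""
    components = sorted({c for path in paths for c in path})
    fatal = []
    for k in range(1, max_failures + 1):
        for combo in combinations(components, k):
            if not service_is_up(set(combo), paths):
                fatal.append(combo)
    return fatal


# Hypothetical topology: both paths depend on the same cooling subsystem.
shared_cooling = [
    ("primary_generator", "cooling", "switch_a"),
    ("backup_generator", "cooling", "switch_b"),
]
```

For this topology, `find_fatal_failures(shared_cooling, max_failures=1)` returns `[("cooling",)]`: a single component whose failure defeats both the primary and the backup path.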

  • Use simple models

Though not always possible, one should strive to keep things as simple as possible. The more complex the system architecture and components, the more one has to learn to understand them well and the more complicated their operations become. There are also then more aspects of the system that can go bad. Simple systems are always easier to understand, control and test.

  • Diversify

It is very important to have diversification across the primary and backup systems. Though this best practice is a bit in conflict with the one above of having simple systems, this is a prudent thing to do despite the extra complexity it might bring to the operations. The backup system should be as different as possible from the primary system. Diversifying across vendors and platforms can help in case a single vendor or platform is affected by an attack or a bug. For example, a virus or malware attacking all hosts of a single vendor or a single platform could cripple the service unless the backup system for the service is from a different vendor or using a different platform than the main production system.

  • Remove single points of failure

Reliability and resiliency of a system can go to naught if there is a single point of failure in the system. As an example, if the primary and backup power distribution systems for a datacenter share the same cooling subsystem or some parts of the distribution systems are in the same room, then failure of the cooling subsystem or fire in the common room will bring both the power distribution systems down. So the best practice here is to not have a backup system share anything with the primary system that it is a backup for.

  • Watch out for inadvertent disruptions

Sometimes unknown side effects of actions taken to keep the backup system robust and healthy can lead to serious disruptions. For example, an upgrade to a new version of some software could block a firewall port by default, which could then cause a loss of communication leading to disruption of service when the backup system kicks in. This is what happened with an upgrade to XP SP2 at a site where some processes lost communication because XP SP2 turned on the firewall by default. It is therefore prudent to ensure that backup systems don’t get afflicted inadvertently by what one perceives to be an innocuous or beneficial act.

  • Take the human out of the equation

Humans can make errors in configuring, testing, or monitoring backup systems due to oversight, distractions, or being sloppy or lazy in not following the correct procedures. One should use well tested automation as much as possible for these activities. Also one should have checks and balances in place to validate the actions of humans and automated systems.

  • Do analytics to catch outliers and get insights

Analyzing logs and operational data about a service’s operations, as well as about the infrastructure software and hardware supporting it, can help find lurking issues. The logs and other data pertinent to a service’s operation should be saved over time and analytics run on them to get good insights about the operations. This can help in ferreting out potential issues that develop over time and in gaining a deep understanding of the overall system. Big Data analytics on all the data collected over time, as against a sample, can help find outliers that can point to potential issues that might get missed otherwise.
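
As a small illustration of catching outliers in operational data, the Python sketch below flags values that sit far from the median using the modified z-score (based on the median absolute deviation). The choice of this particular statistic, the 3.5 threshold, and the sample latency figures are our assumptions for the example; real analytics over saved logs would be considerably richer.

```python
from statistics import median


def find_outliers(values, threshold=3.5):
    """Return (index, value) pairs whose modified z-score exceeds `threshold`."""
    if not values:
        return []
    med = median(values)
    # Median absolute deviation: a spread measure that outliers barely affect.
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no spread to measure against
    return [
        (i, v)
        for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]


# Hypothetical response times in milliseconds pulled from a service log.
latencies_ms = [102, 98, 101, 97, 100, 99, 480, 103]
```

Running `find_outliers(latencies_ms)` returns `[(6, 480)]`, singling out the one request that took far longer than the rest.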

  • Provide insurance for backup

Providing insurance for backup means providing a backup for the backup system. This is to ensure that a service doesn’t come to a complete halt when the backup to the primary system fails. The more backups there are the better it is. As an example, Okta, a leading provider of identity and access services, has the philosophy of using all availability zones provided by Amazon.

Consideration of the economics and technical feasibility of providing insurance for a service under different situations along with an assessment of the risk to the reputation of the organization and loss of business that can result in not providing such insurance can help an organization decide on whether it wants to provide the insurance and if it provides it then how sophisticated it is.

Some examples of insurance for certain kinds of backup system failures are given below.

    1. Backup has no Wi-Fi connectivity

Insurance - switch to cellular. This is what iOS 6 did when it determined erroneously that there was no Wi-Fi connectivity in the disruptive incident mentioned above.

    2. Backup switch fails

Insurance - use a different network. For this, the machine connected to the switch needs to be on at least two different networks.

    3. Backup power for systems fails in a region

Insurance - use a different region. This means that a standby should be maintained in a different region than the backup.

    4. Backup availability zone down

Insurance - switch to another availability zone in a different area. Amazon customers like Okta, Netflix, and others that were utilizing more than two availability zones had their operations stay up when multiple availability zones got affected due to an outage in late June 2012, while others suffered a complete outage and loss of valuable business.
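
The common thread in the examples above is an ordered chain of fallbacks: try the primary, then the backup, then the backup's backup. A minimal Python sketch of that pattern follows. The target names and handler functions are hypothetical, and a real implementation would also need timeouts, health checks, and alerting, all of which are left out here.

```python
def call_with_failover(targets, request):
    """Try each (name, handler) in order; return (name, result) from the first that works."""
    failed = []
    for name, handler in targets:
        try:
            return name, handler(request)
        except Exception:
            # Record the failure and move on to the next line of defense.
            failed.append(name)
    raise RuntimeError(f"all targets failed: {failed}")


# Hypothetical handlers, for illustration only.
def primary_zone(request):
    raise ConnectionError("primary zone down")


def backup_zone(request):
    raise ConnectionError("backup zone down")


def standby_region(request):
    return f"ok: {request}"


chain = [
    ("primary-zone", primary_zone),
    ("backup-zone", backup_zone),
    ("standby-region", standby_region),
]
```

With both the primary and the backup down, `call_with_failover(chain, "ping")` still succeeds by falling through to the standby in a different region.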

  • Do regular auditing and spot checks

It is important to have periodic auditing of policies and procedures, of whether they are being followed, and of operations. This serves to ensure that any slip-ups are caught in time. Also, spot checks help to keep people responsible for the health of the backup systems on their toes, thus helping maintain an alertness, an aspect that is very important in ensuring top-notch reliability and resiliency of production systems.

It makes excellent business sense for an organization to be extra careful when it comes to reliability and resiliency of its service. Many customers have little patience for repeated failures. If their tolerance level to disruptions is breached too often, it can result in them taking flight. Devising and following best practices such as the above can ensure good fault-tolerance, reliability, and resiliency of systems. If organizations don’t institute best practices or don’t ensure that these are followed, then they risk losing their goodwill, reputation, and business.

Tuesday, Oct 2, 2012

Scalability hot buttons 

Servicing large volumes of traffic for an online property can require thousands or even hundreds of thousands of machines in one or more datacenters. To do efficient scaling for such high volumes of traffic, one must have the right architecture and design of the service infrastructure in place. One also must use the right software and hardware components and employ the correct operational methodologies. All this requires special skills that one can learn over time as one tries out what one thinks is best and then revises that based on newly acquired learning. This process might repeat one or more times before one gets scalability right. All this takes precious time and the missteps along the way can lead to anxiety and headaches. One can do oneself a favor by shortening the learning curve by learning from the experiences of those that have travelled this path before. This can help one sidestep many potholes on the way and make the task of developing the requisite skills and expertise for building highly scalable systems much easier and quicker.

Companies like Amazon, Google, Facebook, Microsoft, Yahoo, and many others have been managing  their very popular, highly scalable online properties for many years. Due to this, they have developed a deep understanding and gained valuable insights into the various issues involved in scaling online properties for handling large volumes of traffic and high volatility. Some of these companies have over time shared their knowledge and even made the code for the tools and frameworks they use for handling large traffic volumes open source. A deliberate approach to learning from the experiences of such companies and analyzing what worked and did not work for them can help a company jumpstart its effort to build a highly scalable system.  

So, what learnings can one get? There are many. For building scalable systems, one has to take care of many aspects. These fall under the realm of engineering and IT operations. In the engineering realm, to be able to deal with traffic at a massive scale without incurring astronomical costs, one needs to implement the right architecture and design of the service in question and its supporting software and hardware. This includes having a multi-tier architecture, doing context free processing, having latency sensitive systems in place, using caches, CDN, and fast storage, and many other such aspects. This writeup is however not about them. It is about operations related tasks and tools.

The following operational aspects play a big part in developing highly scalable services. 
  • Provisioning

For online services handling massive traffic volumes, one may need to bring tens of thousands of machines into service quickly. Facebook engineers have experienced this. Through the development of certain tools and processes, they have cut down the time to bring machines into production from several weeks to several days.  When Facebook needed to bring its first owned datacenter into production, it had a lot to deal with. To handle the daunting task, it started Project Triforce. The work done under this project involved simulating a datacenter, instantiating and tearing down clusters quickly, doing synthetic load and power tests on machines, and auditing the results. For automating these tasks, Facebook engineers built a software suite called Kobold. In doing this, sets of machines simulating the datacenter were tested with actual production traffic without Facebook users noticing anything. There is a lot of good learning one can get from how Facebook engineers went about doing all this.

  • Instrumentation, monitoring, and reporting

Instrumentation, monitoring, and reporting are extremely important for discovering what is going on in the multitude of machines and infrastructure devices involved in providing a service. Unless one has good instrumentation and monitoring through controllers and agents, utilization of in-memory databases for high speed, and customized visual reporting, it can be extremely hard to determine the source of problems when they occur. It is informative to look at tools such as Scuba and Claspin built at Facebook to help monitor their systems.

  • Automation

Automation done the right way is extremely important for saving time. When one has to add thousands of machines quickly, ferret out the source of a problem expeditiously, or monitor thirty different parameters to keep a check on the pulse of a system, one had better have automation. There are tools available for IT automation and system administration such as Puppet for Unix-like operating systems, System Center for Windows, BSM from BMC, etc. For elastic handling of increasing and decreasing traffic on AWS, criteria-based auto-scaling works well and so is a good example to study and learn from.
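
To make the idea of criteria-based auto-scaling concrete, here is a minimal Python sketch of a threshold rule. It is a generic illustration and not the AWS API: the function name, the CPU thresholds, the step size, and the fleet limits are all assumptions chosen for the example.

```python
def desired_instances(current, avg_cpu, scale_out_at=70.0, scale_in_at=30.0,
                      step=2, min_n=2, max_n=20):
    """Decide the fleet size from average CPU utilization (percent).

    Add `step` machines above `scale_out_at`, remove `step` below
    `scale_in_at`, and always stay within [min_n, max_n].
    """
    if avg_cpu > scale_out_at:
        target = current + step
    elif avg_cpu < scale_in_at:
        target = current - step
    else:
        target = current
    return max(min_n, min(max_n, target))
```

For example, a fleet of 4 machines running at 85% average CPU would grow to 6, while the same fleet at 20% would shrink to the floor of 2. Keeping a gap between the two thresholds avoids the fleet flapping up and down when utilization hovers around a single cut-off.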

  • Analytics

Analytics help you detect and adapt to new developments quickly. They can discover useful patterns, trends, outliers, relationships, and connections between objects of interest. These can provide an early indication of spikes and dips in traffic and insights about the health of the system, thus enabling one to be proactive in handling a developing situation appropriately. As an example, analytics can detect a pattern of traffic development on a site by monitoring relevant parameters and raise an alert about it when appropriate so that any action that can be taken to alleviate the situation gets taken. Amazon’s auto-scaling makes use of simple analytics for scaling services. One can develop much more sophisticated analytics to gain a deeper understanding of traffic patterns and their timing and so be prepared to handle them when they occur.

  • Clustering and Sharding

Clustering involves automatic distribution of data as seen in services like Cassandra, HBase, and others. Sharding involves manually (mostly) deciding how to divide the data across several servers for scalability. Both mechanisms provide scalability by distributing data across a set of servers, thus spreading the load. While clustering leads to a more balanced distribution, sharding being mostly order preserving can lead to better locality and faster range based operations. Different products have differing support for each. Sharding done manually can add complexity to operations and to the application since the application has to decide which server to go to and so might need some modification. However, it provides more control over the data’s placement and so can lead to good customized scalability solutions. One needs to determine what works best for one’s particular situation. As an example, Pinterest found it better when using MySQL to shard the data as against doing clustering. It wanted more control over how the data was distributed. It also wanted to avoid any possibility of a bug in the cluster management software from affecting its operations.  
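
The trade-off between balanced distribution and locality can be seen in a small Python sketch. Hash-based placement stands in here for automatic distribution and range-based lookup for manual, order-preserving sharding; this is our simplification for illustration and not a description of how Cassandra, HBase, MySQL, or Pinterest place data. The server names and range boundaries are hypothetical.

```python
import bisect
import hashlib


def hash_placement(key, servers):
    """Automatic placement: a hash of the key picks the server.

    Spreads keys evenly, but neighbouring keys land on unrelated servers.
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return servers[int(digest, 16) % len(servers)]


def range_shard(key, boundaries, servers):
    """Manual, order-preserving sharding.

    boundaries[i] is the first key stored on servers[i + 1], so keys that
    sort next to each other usually live on the same server.
    """
    return servers[bisect.bisect_right(boundaries, key)]


servers = ["db0", "db1", "db2"]
boundaries = ["h", "q"]  # keys < "h" on db0, "h".."p..." on db1, the rest on db2
```

With these boundaries, `range_shard("harry", boundaries, servers)` and `range_shard("helen", boundaries, servers)` both resolve to `db1`, so a range scan over names starting with "h" touches one server. The price is that the application must know and maintain the boundaries, which is the added operational complexity mentioned above.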

  • Simple models

This aspect cannot be over-emphasized. Keeping things simple helps in being able to scale quickly. The more types of components you use and the more complex they are, the more difficult it becomes to scale quickly. In designing a system, one should use mature components with a good track record in operations and in scaling.

Online properties that fail to scale quickly to high volumes of traffic can lose customers. It is therefore very important to treat the matter of scalability seriously. To do a good job, one needs to handle scalability holistically. As mentioned earlier, good scaling requires good engineering too. Engineering comes before operations and so must be done well and in support of operations for the operational tasks identified above to be maximally successful. A comprehensive awareness of architectures, designs, infrastructures, deployments, operations, and tools is very important for engineering and operating highly scalable systems. Anything less will likely fail to keep your customers happy for long.

Tuesday, Sept 26, 2012

Datacenter integration made easy 

As cloud computing matures, it is becoming increasingly clear that all three models of computing - traditional, private cloud, and public cloud, will coexist and complement each other in the foreseeable future. Each has its pros and cons, its place, and its followers. There will be a few companies that will continue to stay with just one of these models of computing but there will be many, their numbers growing non-linearly over time, whose computing will cover multiple models.

As we have seen and will likely continue to see, many companies are uncomfortable trusting their crown jewels in regard to certain applications, services, and data to a public cloud. Security and compliance considerations as well as the cost of making major changes to a tried and tested existing IT infrastructure are all important considerations. Many times such considerations tilt the scale in favor of staying completely with the traditional infrastructure one has been using or at best moving to a private cloud. If a company concerned about the above aspects moves to a public cloud, it might do so in dribbles where it keeps its sensitive data and critical business logic on-premise in a traditional datacenter or in a private cloud while moving non mission critical computing to the public cloud.  

Many established companies have invested much money in their traditional datacenters and trained their IT staff on them. They have many legacy applications that would need to be re-engineered to take full advantage of a cloud deployment. These companies are likely to be loath to uproot all this infrastructure and ignore all their investments in it to move en masse to a cloud infrastructure. It is unlikely to happen until the resultant benefits from the transition in terms of cost, scalability, elasticity, or other notable advantages touted about the cloud are very clear and just too compelling for these companies to ignore. What these companies are much more likely to do, as can be seen from what is happening, is to dabble in the private cloud or public cloud arena until they are comfortable enough with it to make a bigger transition from their existing model of computing to the cloud. These companies are likely to make any transition from their existing hosting solution in very well measured steps and only to a certain extent. On the other hand, startups and companies that don’t have much investment in traditional datacenters are much more likely to adopt a new hosting paradigm such as the cloud. There will also be companies among those that adopt public cloud computing that over time, as they understand their traffic patterns and the costs involved better and feel the need to have more control and flexibility over their deployment, would, like Zynga, move part of their IT operations from a public to a private cloud.

What the above tells us is that cloud providers would be helping their cause by enabling companies that are hesitant to make a change from one computing model to another to transition between them with the least amount of disruption. They need to make hybrid computing as easy as possible for such companies so that they can start dabbling in it sooner rather than later. The evidence is around us. Hybrid or mixed model computing is becoming increasingly more important and popular by the day. For a cloud vendor to tap into the lucrative market generated by such computing, it needs to provide smooth, seamless integration between various datacenters - traditional and those hosting private and public clouds. Cloud vendors big and small are realizing this and creating the technologies and techniques to make it easy to do this.   

The following are a few recent developments that are indicative of how vendors are making it convenient for companies to transition from one model of computing to another and to integrate these models with each other.

  • Eucalyptus-Amazon private public-cloud compatibility
Amazon offers AWS, a public cloud computing platform. It is the clear winner by a large margin in the infrastructure as a service public cloud space. Eucalyptus offers a private cloud solution and is a big, if not the biggest, private cloud platform vendor. There is an agreement between these two companies, each a leader in its space, to ease the transition of software from private to public clouds and vice versa. Eucalyptus is doing so by providing APIs that are compatible with the AWS APIs.

  • Microsoft’s integration technologies for datacenters
Microsoft is making it easier and easier by the day to connect traditional data centers and private clouds with Windows Azure, its public cloud offering. For example, System Center 2012 can be used to move VMs seamlessly between private and public clouds. VMM2012 of System Center 2012 lets you drag and drop VM instances from private to public clouds and vice versa.  

Microsoft also enables easy linking of SQL Server to SQL Azure  to support distributed queries that span on-premise and cloud databases. This is a new database hybrid solution for spanning on-premise corporate networks and the Windows Azure cloud.

Windows Azure Active Directory enables enterprises to sync their on-premise directory information with their public directory allowing easy integration between the on-premise and the Azure public cloud.

  • Oracle eases movement of workloads across public and private clouds 
Oracle is the latest big software company to jump into the IaaS cloud platform space and is going to provide software for customers to build "identical services" in their own data centers, allowing them to move workloads back and forth between their public and private deployments. The public-private cloud aspects of Oracle's service will rival services and products from companies like IBM, Hewlett-Packard and others.

  • Rackspace integrates on-premise with cloud
Microsoft, Amazon, and many others let you do hybrid hosting already. Now Rackspace, a vendor of open source clouds and a company that helped create OpenStack, which has garnered large industry support, has joined the group. RackConnect from Rackspace helps integrate on-premise infrastructure with the cloud. The RackConnect solution uses encrypted VPN tunnels to make cloud resources an extension of on-premise resources. It helps build bridges between traditional and cloud based solutions.

The above are just a few examples of how providers are easing the transition between different models of computing and integrating the traditional with the new. As cloud computing matures further, we will see more and more of such integration improvements and solutions come about. Companies will steadily and in increasing numbers move to hybrid computing that covers one or more of the traditional form of computing and the cloud varieties. Such hybrid computing will be secure, cohesive, and seamless with smooth, effortless, and instantaneous flow of data, applications, and VMs from one point to another as and when needed. The day when this is commonplace is not too far away.

Tuesday,  Sept 18, 2012

Importance of “comprehensive” analytics 

“Comprehensive” analytics are smart analytics done over data that is directly or indirectly associated with an organization’s vision, goals, and strategy. These are done as part of a top-down approach where an organization defines its vision, goals, and strategy, and the Key Performance Indicators (KPIs) for monitoring these, and runs analytics on the data that is relevant to them in order to derive business intelligence for enhanced performance. Such analytics yield useful results fairly quickly enabling a company to be alert and agile in discovering and addressing problem areas and leveraging short windows of opportunities that turn up thereby improving its return on investment and growth prospects substantially.

Businesses are used to doing analytics over structured data stored in their relational databases for deriving business intelligence and making better business decisions. For example, a retail company selling products might do analytics over data pertaining to its customers, employees, suppliers, partners, and other people that it deals with. Restricting the scope of this blog to just customer data - a company might review information about customers such as purchase history, recency, frequency, and monetary value (RFM) of purchases, coupon use, and other such aspects. Such analysis might get done at an individual, group, region, country, or global level. The company might also take into account demographics, gender, economic level and other attributes in its analysis. Other aspects than those mentioned above that are directly or indirectly relevant to customers might also get reviewed at an individual or at an aggregate level. Slicing and dicing of different pieces of data in different ways to analyze, assess, and report findings certainly provides valuable insights for driving actions. However in this day and age it is the minimum that a company should do. Given the pace of technology advancement and the large amounts of customer data being generated in various places, internal and external to the company, the above type and scope of analytics done over limited amounts of structured data is not good enough. There is a lot more data available that should be collected and comprehensively analyzed in order to maximize benefit.
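To make the RFM idea above concrete, here is a minimal sketch of how recency, frequency, and monetary value might be computed per customer. The purchase records, the `rfm` function name, and the "as of" date are all made up for illustration; a real retailer would pull this data from its own database and would likely bucket the results into scores.

```python
from collections import defaultdict
from datetime import date

# Hypothetical purchase records: (customer_id, purchase_date, amount).
purchases = [
    ("alice", date(2012, 9, 1), 40.0),
    ("alice", date(2012, 9, 15), 60.0),
    ("bob", date(2012, 6, 10), 200.0),
]


def rfm(purchases, as_of):
    """Return {customer: (recency_days, frequency, monetary)}."""
    by_customer = defaultdict(list)
    for customer, day, amount in purchases:
        by_customer[customer].append((day, amount))

    result = {}
    for customer, rows in by_customer.items():
        last_purchase = max(day for day, _ in rows)
        result[customer] = (
            (as_of - last_purchase).days,       # recency: days since last purchase
            len(rows),                          # frequency: number of purchases
            sum(amount for _, amount in rows),  # monetary: total amount spent
        )
    return result


scores = rfm(purchases, as_of=date(2012, 9, 18))
```

The same grouping could be repeated at a group, region, or country level by swapping the customer id for the relevant key.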

Besides structured data of the kind mentioned above, there are mounds and mounds of various types of unstructured (and semi-structured) data about customers spread across call center records, email communication, captured behavior patterns on company websites, social networking sites, blogs, micro-blogs, and other platforms and forums that can provide valuable insights about customers. There is also a variety of other customer data such as from GPS, mobile use, click-stream behavior, sensors, and other sources. The more data one harnesses and uses for analysis, the more knowledge and insights one can glean from it. Companies are waking up to the availability and importance of all this data and to the fact that we now have the technologies and techniques to extract, store, and process it quickly and cost effectively. The question is how to do it comprehensively in order to leave no stone unturned in our quest to derive maximum value.

To do “comprehensive” analytics i.e. intelligent analysis of the vast amounts of structured and unstructured data available from various sources in order to derive valuable insights, a company needs to adopt a formalized top-down approach. It needs to have a vision, goals, and a strategy to achieve its goals. To track progress towards its goals and discover areas where it needs to improve, it needs to determine its Key Performance Indicators. These will then tell it what information and insights are needed. As the company defines its strategy, draws up plans for reaching them, and executes on them, it should constantly monitor its KPIs. For this, it needs to analyze the results of its execution and the data generated as a consequence of it - for example, customer purchase patterns, tweets, comments, blogs, or emails lauding or critical of one or more aspect of the company’s execution, the data impacting its execution such as maintenance schedule of its websites involved in the execution, and other data that is independent of its execution but relevant to it. Examples of this are discussions in the blogosphere and social networks about a competitor’s strategy or about the usefulness of some technology that is relevant to the company’s business. Such analysis can help the company fine-tune its strategy, plans, and execution and in some atypical cases, even its vision and goals. The above cycle of strategy, planning, execution, and analysis should form a virtuous circle for continual improvement.  

The relationships between the various aspects discussed above for the typical case are shown below.

As mentioned above, for comprehensive insightful analysis, the determination of KPIs is very important. Once you have defined the KPIs, you can determine the data and the insights you need in order to investigate, measure, and improve them. For this, you would very likely need to get both structured and unstructured data - geospatial and temporal, and reams and reams of it from various sources. You should do your analysis over the entire data set available to you across various dimensions of interest, as against using just a sample: a sample does not yield as good an understanding as the full data set and will not help you find outliers that can provide valuable insights. Having KPIs enables you to be like Sherlock Holmes, who investigates a particular area to answer specific questions, as against Christopher Columbus, who set out to see what he could find. You want to get clues and hints and decipher patterns, trends, and relations through your analysis to help you make intelligent inferences and reach useful conclusions.

Below is a pictorial depiction of various kinds of data that can help measure and improve the Customer Satisfaction KPI.

When you have so much data from various sources as is available these days, much of it being unstructured, the simple analytics one does with structured data are not enough for being competitive in the market. Analytics that get done over vast and varied data sets are much more complex in nature. When done well, they can help determine probabilities of specific events happening and for making reasonably accurate predictions. The above work involves slicing and dicing the data in different ways and at different granularities and analyzing it statistically, inferentially, relatively, and holistically along various dimensions using different criteria with appropriate weights given to them. Examples of dimensions that may be used are product category, revenue, industry, and time while examples of criteria used are relevancy, timeliness, skewness, importance, and assumption validity. You can change the dimensions, criteria and weights and make other tweaks relevant to your analysis to see how that impacts the results and whether that leads to new insights.

The formalized approach of determining KPIs enables one to be comprehensive, thorough, and smart in one’s analysis. It tells one what data to collect and what to determine from it. As an example, in this world of instant communication through social network and micro-blogging sites like Facebook and Twitter respectively, to do a good job on the “customer satisfaction” KPI shown above, it is increasingly important for an organization to gauge the emotions, sentiments, moods, propensities, and attitudes of its customers. Companies are realizing the importance of this. For gauging these, you need the data available from public and private social networks, blogging and micro-blogging sites, call center records, email communication, surveys, and other platforms. After collecting this data, you need to analyze it to determine the various aspects mentioned above. The technology for determining emotions is there - the other aspects of the customer mindset mentioned above derive from that. An example is the Crane engine from Kanjoya. It claims to determine various kinds of emotions to a fine degree of accuracy. A good reading of various emotions across a large data set can lead to a much better assessment of the sentiments, moods, propensities, and attitudes of customers than would be possible from existing mechanisms of slicing and dicing structured customer data from a much smaller set of sources. It enables an organization to take quick proactive action where needed. This can have a significant impact in improving the perception, sales, growth, and revenue of the organization or, in the case of governments, in helping them defuse or protect against a fast developing volatile situation.

Important aspects that might go unnoticed in a non-formalized approach get the attention they deserve in a formalized approach. This can have a big impact on the bottom line of a company. As an example, looking at the “customer satisfaction” KPI, it should be clear that measuring the performance of a company’s websites - this aspect would come under the “behavior pattern on a website” bubble in the figure above - would be important since it can have a significant influence on whether a person browsing the site stays with it or comes back to it later. Evidence of this aspect's importance is all around us. A study done by Amazon a few years back determined that there could be a 1% loss in sales due to a 100 msec delay in page load time. A similar study by Google found that a 500 msec delay in search results could reduce revenues by 20%. These are significant losses.

In all of the above, the importance of collecting data and analyzing it in real time or close to it cannot be overemphasized. The sooner one analyzes data and gets insights from it, the sooner the required action can be taken and the window of opportunity to prevent loss, make profit, retain brand value, avert a mishap, or benefit in some other way can be exploited. Real time streaming and collection of data, fast in-memory storage of applications and data, and parallel processing of data over a large number of commodity machines or on a fast high end mainframe make real time analysis possible.

As should be evident from the above, volume and variety of data, and doing smart analytics on it at a high velocity, are all very important aspects of collecting, processing, and analyzing data for maximum benefit. Since much of the data is unstructured, comes from various platforms, and is geospatial and temporal in nature, various kinds of analytics - textual, contextual, social, predictive, and others - can be done. This involves use of sophisticated natural language processing and keyword analysis, figuring out relations and correlations between different objects of interest, discovering patterns and trends, determining emotions, attitudes, and propensities, deciphering future behavior, and so on. Doing all this over large amounts of structured and unstructured data available from a variety of sources in a timely manner is what “comprehensiveness” is all about. The good news is that we now have the capability to do all this and at affordable prices. Technologies such as cloud computing, Map-Reduce programming using large compute clusters of commodity machines, fast, highly scalable and increasingly powerful NoSQL storage, emerging NewSQL databases that claim to do both transactional and analytic processing well, a variety of high level languages that enable data scientists to model and analyze vast amounts of data across relational and non-relational stores, sophisticated natural language processing that helps decipher relationships, emotions, and intentions, fast and increasingly powerful traditional databases, and a lot more are all there to help. If you don’t want to deal with developing the required analytic software and using the right infrastructure for it, there are ready made solutions available from vendors that specialize in analytics and business intelligence.

The area of comprehensive analytics is vast. The above description of it just scratched its surface. Companies like Google, Microsoft, Amazon, IBM, and others are realizing the importance of doing comprehensive analytics. They are investing heavily in research and development in this area and in productizing solutions that can store and analyze large amounts of data quickly so that quick action can be taken when needed to improve customer experience and loyalty thus generating more revenue and growth for the company. Many startups are getting into the business of providing analytics. Big Data is getting a lot of press. There is a sea change going on. Many companies realize the importance of comprehensive analytics but are hesitant to plunge into it because of one or more reasons such as inertia, misgivings about the urgency of it, lack of understanding of technologies and methodologies that make it feasible, or misunderstanding of the costs involved. The companies that make the effort to understand this area well and take the step to do the needful will reap rich dividends. Those that don’t will likely over time lose market share to those that do and will eventually have to play catch up. Whether they succeed in that is anybody’s guess.

Tuesday, Sept 11, 2012

From a centralized to a distributed world or is it the other way around? 

In this day and age of distributed processing with commodity machines doing the work of big mainframe computers as in cloud computing, so much hype about parallel, distributed computing for Big Data, and the prevalent use of distributed websites, caches, and storage as computing infrastructure, is there anything getting centralized? Answer: Yes, plenty!  Is the high tech computing world moving towards distributed or centralized models? Answer: Both!  What should we expect in the future - distributed computing to rule the roost or centralization to be all powerful? Answer: Neither and Both!  They will be complementary. 

We attempt to scratch the surface of the above topic by providing some examples of what is happening in the world of computing where rapid innovation, technology breakthroughs, new exciting offerings, creative differentiation, and an intense competitive environment are changing our lives continuously.  We will let you figure out the answers to the above questions.  If nothing else, the examples we provide should bring into focus certain aspects of emerging models of computing and make you understand and appreciate better our answers to the above questions.

What is becoming distributed is in clear view - cloud providers, infrastructure vendors, the press, and numerous others have been blowing their horns about cloud computing for several years now. Cloud computing is about using relatively cheap commodity computers in datacenters to do the work that was traditionally done on-premise by expensive mainframes and other powerful computers. These commodity computers enable scalability, elasticity, reliability, and disaster recovery by doing distributed processing smartly. Hadoop takes this concept to the next level - rustle up hundreds or thousands of relatively cheap commodity machines quickly and make them do distributed crunching in parallel at a tiny fraction of the cost that we used to incur for such processing before. What about storage at the backend? The situation is not much different there - many new NoSQL databases powering large online services like email, search, and maps use distributed hashing to determine which of the several available storage servers to use and keep multiple replicated copies of data, sometimes across datacenters. Relational databases are distributed in many respects too - they have to be in order to serve a flat world where data is often gathered from numerous globally spread sources and customers come from all over. How about memory? Same story - CDN caches are distributed at hundreds of POPs close to the customers to serve them quickly with data and information. We can point to other examples of distributed computing but let us stop here. The above examples should have indicated to you that the computing world is becoming a distributed one in several respects because of the prevalent use of a multiplicity of computing machines, caches, storage, access, etc.
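The distributed hashing mentioned above can be sketched with a small hash ring. This is only an illustration under our own assumptions, not the scheme of any particular NoSQL product: the class name `HashRing`, the server names, the number of virtual nodes, and the choice of SHA-256 are all hypothetical.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # Any stable hash works for the illustration; SHA-256 is an arbitrary choice.
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Maps keys to storage servers by position on a hash ring."""

    def __init__(self, servers, vnodes=100):
        # Each server gets several points on the ring to even out the load.
        self._ring = sorted(
            (_hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]
        self._server_count = len(set(servers))

    def servers_for(self, key, replicas=2):
        """Return up to `replicas` distinct servers, walking clockwise from the key."""
        replicas = min(replicas, self._server_count)
        if replicas <= 0:
            return []
        start = bisect.bisect(self._points, _hash(key))
        chosen = []
        for offset in range(len(self._ring)):
            server = self._ring[(start + offset) % len(self._ring)][1]
            if server not in chosen:
                chosen.append(server)
                if len(chosen) == replicas:
                    break
        return chosen


ring = HashRing(["storage-a", "storage-b", "storage-c"])
owners = ring.servers_for("user:42:inbox", replicas=2)
```

The first server in `owners` would hold the primary copy and the second a replica. Adding or removing a server moves only the keys near its points on the ring, which is why this style of hashing suits clusters that grow and shrink.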

So, what about this "moving from distributed to centralized" aspect?  Is there any such thing happening that is worth discussing? You bet, there is, and plenty. The above examples demonstrated just one aspect of the computing world. The following should show you another.  One area where the transition from distributed to centralized is happening is with regard to the control and the management of the distributed computing resources, or in other words, with respect to the brain or intelligence that makes efficient distributed computing possible and flourish. This exceedingly important aspect is getting centralized in several important areas of computing where before it was not.  Let us start by looking at networking first.  What is it moving to?  Software Defined Networks (SDNs)!  In an SDN, the control plane for a network that used to be in the individual switches and routers of the network is now in a centralized controller or application with the network gear such as switches and routers, and  in more evolved forms of SDNs, load-balancers, firewalls, and other such equipment relegated to handling the data plane under instructions from the controller. What will this centralization of the control plane result in?  It will lead to lower costs, faster configuration, better control & provisioning, and easier management of the network.  It will lead to management software treating the network as a single entity instead of a loose coalition of individually intelligent pieces of hardware. This centralized control is extendible to controlling of the hypervisors with hypervisors becoming commodity software over time.
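A toy model may help picture the split between control plane and data plane described above. It is a sketch under our own assumptions and does not follow OpenFlow or any real controller's API: the `Switch` and `Controller` classes, the switch names, and the policy format are all invented for illustration.

```python
class Switch:
    """Data-plane element: forwards only by the rules it has been given."""

    def __init__(self, name):
        self.name = name
        self.flow_table = {}  # destination prefix -> output port

    def install_rule(self, destination, port):
        self.flow_table[destination] = port

    def forward(self, destination):
        # No local routing decision is made; unknown destinations yield None.
        return self.flow_table.get(destination)


class Controller:
    """Control plane: holds the network-wide view and programs every switch."""

    def __init__(self):
        self.switches = {}

    def register(self, switch):
        self.switches[switch.name] = switch

    def apply_policy(self, policy):
        # policy: {switch_name: {destination: port}}
        for switch_name, rules in policy.items():
            for destination, port in rules.items():
                self.switches[switch_name].install_rule(destination, port)


controller = Controller()
edge, core = Switch("edge-1"), Switch("core-1")
controller.register(edge)
controller.register(core)
controller.apply_policy({
    "edge-1": {"10.0.2.0/24": 1},
    "core-1": {"10.0.2.0/24": 3},
})
```

The point of the sketch is that a change in policy is made in one place, the controller, and pushed out to every switch, rather than being configured box by box.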

Let us look at another example which some would argue is a trend - Virtual Desktops (VDs). With virtualization and cloud computing making deep inroads into computing, and the advent of increasingly powerful, intelligent hypervisors, VDs are becoming much more practical and popular. Here a user’s devices are dumb endpoints served by virtual machines running desktops in datacenters, all under centralized control. We talked about centralization of the network above. This one is centralization of computing. Another example of the same is the running of hundreds or even thousands of virtual machines on a single powerful server machine in a datacenter. This is centralization of computing done for effective capacity utilization and cost savings. So, while on one hand cloud computing generally uses cheap, commodity machines as against big powerful servers, there are situations like those above where it centralizes many virtual machines on a single server. Two sides of the same coin.

Let us look at what could conceivably be the beginning of another trend - IBM introduces a mainframe for private clouds. Here, instead of many relatively inexpensive commodity computers providing cloud computing, we have a big mainframe computer in their place. Hmm... wait a minute! So after moving to a model where multiple commodity machines replace big mainframes in cloud computing, are we now beginning to move back to mainframes replacing commodity machines? Ignoring IBM for a second, we also have smaller players like ProfitBricks touting scale-up of a virtual machine instead of scale-out across virtual machines, stating that they scale better with scale-up than AWS does with scale-out. The rationale is that scale-up suits high power databases and other high-end applications more than scale-out since such applications have traditionally scaled up on big on-premise server machines. A single big VM, as against a bunch of smaller VMs, is more in line with what vendors of such applications have been using on-premise. So what ProfitBricks is offering does have the potential to become popular over time and, with IBM now bringing mainframes to serve the cloud, even more so. Do two examples like the above make a trend? No they do not, but they catch your attention, especially when the companies making a splash are as varied as in this case - one being one of the biggest, if not the biggest, names in computing and the other being a nimble startup with fresh ideas, agility, and the drive to do things differently.

At the risk of seeming to beat this "moving to centralization" horse to death, how about CDNs, whose example we gave earlier when discussing distributed systems? It is true that the POPs of a CDN are distributed but where is the redirect to the browser that takes you to a CDN POP coming from? It is coming from a central website (or set of websites - still a far cry from the extensively distributed nature of the POPs) hosting the page. The website could be that of a cloud blob storage service or that of an ad server responsible for serving advertisements or that of some other service. The web server providing the re-targeting URL can be thought of as the brain here and the CDN as the brawn. In wrapping this up - where are peer to peer and mesh networking, which were making headline news a few years back? These are alive and moving along for sure but are nowhere near as much in the news now as SDN and Virtual Desktops are, are they? 

So getting back to a question we started off with - is the computing world becoming more distributed or centralized? With the above examples in focus, what would your answer be? Our answer stays the same - Both! We can see that the model of a “centralized head controlling numerous, replicated, replaceable body parts” is taking root and hold. It encompasses both centralized and distributed aspects. We also see centralization of processing on big machines even though the work being done is distributed over virtual machines. We also have models of computing where there is minimal to zero centralization such as in peer to peer networking, mesh networking, certain distributed multi-master databases, etc. Whether ultimately you call a world with such diversity of models more distributed or centralized is up to you. It depends on what areas and examples you focus on or what you are more affected by or maybe the time of the day, your mood, or whatever. Whatever the case might be, when explaining your rationale you could possibly wing it either way and perhaps get away with it with an accepting audience that takes things at face value. However, with an informed, educated, observant, and discriminating audience looking for balanced and accurate insights you would need to present both sides of the argument pretty well in order to make headway with it. The few words we used to answer the questions at the beginning of this blog won't cut it.

Tuesday, Sept 4, 2012

Hosting choices aplenty... how to choose? 

There are many choices for hosting available to customers that want to partially or completely rid themselves of the burden of hosting their own computing hardware or software or both. The spectrum is wide. At one end of the spectrum is the basic colocation service where one outsources (rents) datacenter space comprised of one or more racks and/or containers, cooling, power, internet connectivity, network bandwidth, physical security, and other such datacenter specific entities from a provider.  In this, the computing hardware and software used is one’s own. At the other end of the spectrum is Software as a Service or SaaS where one rents the use of application(s) provided by the provider.  This in effect is outsourcing of everything needed to make such use possible - physical space for running the required machines, datacenter infrastructure, physical and virtual machines, the software running on the machines, storage, network, caches including content delivery network POPs, software services, and the application(s) themselves - stuff that one would need in one’s own datacenter and beyond if one were hosting the application(s) oneself. In-between the above two extremes are several other offerings.

At a level above the very basic colocation service is an offering where the provider provides the basic colocation services described above as well as infrastructure services such as load balancing, firewall protection, disaster recovery, and monitoring & management of computing hardware and software. A level above this is one where a customer can rent physical and virtual machines, OS software for the same, storage services, and other such supporting system software services. As we proceed further up the set of offerings, we get Infrastructure as a Service or IaaS which provides much touted features of cloud computing such as scalability, elasticity, agility, reliability, fault-tolerance, and disaster recovery in a more programmable, administratively controllable, and automated manner as well as other IT services such as local & distributed caching, content delivery network, map-reduce clusters, big data processing, hybrid clouds, search integration, etc. In an IaaS offering, customers have to do the setup and maintenance of the system software themselves. Continuing up the list, we get Platform as a Service or PaaS that takes care of providing all needed software and infrastructure services, and doing their setup, maintenance, and management, letting one focus just on developing and hosting one’s applications. And then, as mentioned above, there is SaaS at the top of the list where the provider provides even the applications. 

The spectrum of offerings available from various hosting providers is shown in the figure below.  Different providers provide different sets of the listed offerings, usually a contiguous set of one or more starting from some point on the spectrum line and proceeding right or left from it. Offerings of the same type from different providers can be at different levels of completeness, comprehensiveness, and sophistication. Also, hosting providers that provide IaaS or PaaS services can either be public or private cloud providers or might cover both kinds of computing. Though the offerings described above (and shown below) are the main ones, there are variations of the same available from providers seeking to differentiate their offerings from those of others.

The providers offering various types of hosting services range from small ones that have one or two datacenters to the big ones that have many globally dispersed datacenters.   A few examples of cloud providers covering the above spectrum of offerings are: DigitalFortress, FiberCloud, WoWrack, Amazon, Eucalyptus, Rackspace, Microsoft, and Google. There are many more providers - the list is too big and growing to give here. 

To determine which hosting provider to use when contemplating the cloud for oneself, one should determine which of the above offering types would best fit one's needs and, if relevant, as in the case of the IaaS offering, whether one wants private or public hosting or a mix of the two kinds. Other scoping criteria may also be used. Once this has been done, one should evaluate the providers in contention using various criteria to determine which provider to go with. The criteria used for evaluation of providers can include but are not limited to: provider’s track record and priorities, location of the facility or facilities to be used, customer service and whether the provider can personalize it if needed, provider’s handling of one’s security and compliance requirements, transparency and ease of access of outsourced resources, ease of monitoring and managing of the same, aspects such as elasticity, scale-up, and scale-out (if relevant to the offering type), infrastructure and software services provided, disaster recovery preparedness and services, bandwidth and capacity available, carrier neutrality, provider’s fit for the long term, lock-in quotient i.e. how locked-in one would be with the provider, pricing plans offered, cost of service, and so on.

Not all the above criteria will apply to every provider though most should. What applies depends on the provider’s type of offering. Different providers will rate differently on various criteria. After comparing providers on the relevant criteria, if one is still unable to decide which provider to use, then one should expand the criteria set and go deeper into one or more of them until the differences between providers become starker and the selection easier. As an example, when evaluating a provider offering the basic colocation service, one should review aspects such as disaster preparedness at the datacenter one would be using, direct and remote accessibility of one’s hardware, services like a site attendant power cycling one’s hardware if needed, track record of the provider, physical security provided at the facility, compliance certification attained by the provider, facility design with regard to power, cooling, floors, redundancy of equipment and infrastructure, fuel supply levels, type of network infrastructure, availability of other sites for disaster preparedness, bandwidth to the Internet and between sites, customer service, future fit, etc. As another example, when evaluating a provider of IaaS service, one should review criteria such as the kind of computing provided - public, private, or both, infrastructure and software services provided (these should be much more than those provided by a colocation provider), the number, locations, and interconnectivity of datacenters, pricing plans available, the provider’s track record, growth prospects (for meeting your growth necessitated requirements in the future), and other such aspects. Some of these aspects such as location of datacenters would be very important for a rapidly growing global corporation serving customers in different continents where, in the case of some countries, the data cannot leave country boundaries.

Though evaluation of providers using various criteria is good, the task is not as simple as it might seem.  For example, one should get data for assessing each criterion from a variety of sources instead of from just the provider. The other sources could include past and current customers, official filings about the provider, and information obtained through research on the net.  Also, each criterion being used should be evaluated along several dimensions in order for it to be assessed well. As an example, in the case of the disaster preparedness criterion, one should evaluate it along dimensions such as “vertical disruptions” - disruption due to acts of nature, human error, or accident, “horizontal disruptions” - wide disruptions across multiple datacenters due to virus, malware, or APT attacks, “disruption proclivity” - probability of disruptions happening in the future, “redundancy strength” - how redundant the physical equipment and infrastructure at a site is, “infrastructure strength” - how strong and impervious to problems the physical infrastructure and equipment is, and “skewness factor” - skewness of the above aspects across sites and time. Also, when looking at a specific dimension, one should consider all factors pertinent to it.  These factors could be mutually exclusive, overlapping, or compensating in nature with respect to each other.  As an example, the “infrastructure strength” dimension of the “disaster preparedness” criterion can be reviewed based on the following factors among others - “failure points of facility, physical equipment, and network”, “quality, age, and reliability of equipment”, “use of watermark alerts”, and “availability and readiness of trigger-action procedures”. It is important to determine which dimensions to consider for each criterion and which factors to use for each dimension when doing the evaluations, and then to do them well.

In evaluating a provider, appropriate weights may be assigned to each criterion, to each dimension of a criterion, and to each factor of a dimension based on how important these are to the evaluation process.  One can also use the notion of critical, noncritical, inconsequential (or don't care), etc. for a factor, dimension, or criterion. There are other aspects and their nuances that one might take into account in the analysis. The deeper, more comprehensive, and more sophisticated one goes in the analysis the more accurate one’s assessment would be.
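
The weighting idea above can be sketched as a small roll-up calculation. What follows is only a minimal Python sketch under our own assumptions: factor ratings are on a 0 to 10 scale, a weight of zero stands for "don't care", and a critical item that scores below a floor of 4.0 gets flagged. The criterion, dimension, and factor names come from the discussion above, while the class and function names, the weights, the ratings, and the floor are made up purely for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional

CRITICAL_FLOOR = 4.0  # assumed: a critical item scoring below this is flagged


@dataclass
class Node:
    """A criterion, dimension, or factor in the evaluation tree."""
    name: str
    weight: float = 1.0             # importance relative to siblings; 0 = don't care
    rating: Optional[float] = None  # 0-10, set on leaf factors only
    critical: bool = False
    children: List["Node"] = field(default_factory=list)


def score(node: Node) -> float:
    """Leaf: its own rating. Otherwise: weighted average of child scores."""
    if not node.children:
        if node.rating is None:
            raise ValueError(f"factor {node.name!r} has no rating")
        return node.rating
    weighted = [(c.weight, score(c)) for c in node.children if c.weight > 0]
    total = sum(w for w, _ in weighted)
    if total == 0:
        raise ValueError(f"{node.name!r} has no children that count")
    return sum(w * s for w, s in weighted) / total


def failed_critical(node: Node) -> List[str]:
    """Names of critical items whose rolled-up score is below the floor."""
    failed = [node.name] if node.critical and score(node) < CRITICAL_FLOOR else []
    for child in node.children:
        failed.extend(failed_critical(child))
    return failed


# Illustrative numbers only.
infra = Node("infrastructure strength", weight=2.0, children=[
    Node("failure points", weight=3.0, rating=6.0),
    Node("quality, age, and reliability of equipment", weight=2.0, rating=8.0),
    Node("use of watermark alerts", weight=1.0, rating=5.0),
    Node("trigger-action procedures", weight=0.0, rating=2.0),  # don't care here
])
redundancy = Node("redundancy strength", weight=1.0, critical=True, children=[
    Node("power redundancy", rating=3.0),
    Node("network redundancy", rating=4.0),
])
disaster = Node("disaster preparedness", children=[infra, redundancy])

print(score(disaster))            # rolled-up score for the criterion
print(failed_critical(disaster))  # critical items that need a closer look
```

A single rolled-up number hides weak spots, which is why the sketch reports failing critical items separately instead of letting a strong dimension average them away.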

The following schematic shows the relationship between criteria, dimensions, and factors. The list of criteria, dimensions, and factors shown is not complete.

While it is important to rate a provider based on various criteria, each criterion based on specific dimensions, and each dimension based on pertinent factors to determine a good fit, it is equally important to use the selected provider’s offerings most effectively. For example, in the case of simple colocation service, one should ensure that the rack and container housing one’s hardware is in a “dense” environment, i.e. there is enough power, bandwidth, and space available to support future growth without increasing cost, and that the hardware is secured properly to avoid accidents due to human error (such as an accidental power down). One also needs to make appropriate use of multiple datacenters, if available, to ensure business continuity in the event of disasters or multiple failures occurring simultaneously. As another example, in the case of IaaS service from a provider such as Amazon, one needs to do the following: use the right software and network architecture for one’s services (Amazon offers a number of system and network architectures to select from), spread one’s services across availability zones appropriately to ensure business continuity in case of faults or disasters (so as not to pay the price that others have paid), and negotiate the required security and compliance services from the provider.

At the end of the day, how well you do in selecting a provider and using it effectively for your needs is based on how much importance you give to this task and how comprehensively you do it. The buck stops squarely with you. Regardless of whether you go with one of the hosting providers in the market or end up doing your own hosting and management, you need to protect yourself well. Knowledge, awareness, and the correct application of concepts discussed above are extremely important.  There is a scientific, algorithmic method of evaluating hosting providers. One should use such a method.  However, determining the right provider from the results of such a method is more an art than science. Also, to use a provider most effectively requires domain expertise and experience. You should go with the experts for doing all the above or alternatively do whatever it takes to become an expert yourself. Make the right choice and you will be pleased with the results.

Tuesday, Aug 28, 2012

Network-centric datacenter innovations 

With cloud computing taking off there is much innovative work that has happened in the last few years and that continues to get done for improving datacenter operations for the cloud. The work covers various aspects of datacenters - physical, environmental, mechanical, electrical, and technological, to name the main ones.  While it is tempting for us to touch at least briefly on each of these aspects when talking about datacenter innovations, we will restrict ourselves to just the technology aspect for now, and there too, just talk about one of the four foundational technology pillars of cloud computing - Networking.  This is a vast area in itself and a critical one for improving and evolving cloud computing in significant ways.

As is known to anyone who understands cloud computing, the four foundational technology pillars of the “pay for what you use” model of cloud computing are compute, memory, storage, and networking. To improve cloud computing, we need to improve all of these, and in ways where they complement each other to provide synergistic, holistic solutions to pressing issues and create evolved forms of computing. These pillars of cloud computing can be thought of as the four wheels of a car, where the car is the cloud.  If any wheel of a car is misaligned or weaker than the others, you will be in trouble or at least hobbled in your progress - if not right away then very soon in the future.  Thankfully, because of the rapid pace of innovation that we have seen and continue to see in each of the above areas, any temporary weakness in one or more of them with respect to supporting the others gets removed quickly enough, before it becomes an Achilles' heel for cloud computing. So for now we see no cause for concern about misaligned wheels or, in other words, a skewed pace of improvements across these areas that could hobble progress.

With the above said, let us dive a bit into the networking (including protocols) aspect of datacenter operations for cloud computing.

When we think of innovative work and other important goings on in datacenter networking, the following innovations or significant improvements, whatever you want to call them, come to mind in rapid succession - Virtual networking, Flattened networks, Software Defined Networking (SDN), Service centric networking (Serval), Deadline-aware data center networking, Early congestion monitoring & handling, Layer 2 connectivity improvements (TRILL), and so on.  The above list is indicative of some big changes that have occurred in datacenter networking in the last few years and those that have emerged recently.  Ideas encompassed in some of the above areas of innovation have been implemented and deployed in various measures. They have become standards or are on their way to becoming standards. Ideas in some others have reached a stage where they should get widely implemented and adopted soon, while some others are still in the experimental and prototyping stage.  Discussing the specifics of each of the above areas would require a book.  Presenting even a brief overview of each would require a much larger blog than what we intend this one to be.  Given that some of the above areas and the technologies and methodologies they encompass have been discussed in our previous blogs and in our datacenter networking paper, we won’t explain these here other than giving a one line description for them.  We will however indicate the important cloud datacenter operations that are being addressed by them. The associated links for each and your individual research on the Internet can be used to get more information about them if so desired.

Given below is the list of cloud datacenter operations (or aspects) that should be done well by a cloud provider in order for it to provide a good platform for cloud computing and be competitive in the marketplace. The technologies and methodologies that help do these operations are listed. The lists are not comprehensive by any means but cover the important work that has been done and continues to get done for enabling these operations.

  • Optimized East-West traffic  - to reduce network hops and latencies across servers in the datacenter

  • Quick, non-interruptive VM migrations  - for power savings, reliability, and fault-tolerance of services and operations

  • Minimization of communication latencies - for meeting SLAs of online data intensive applications and improved quality of responses

  • Quick detection and handling of congestion - for partition-aggregate architectures used for handling large computations and data with high amounts of traffic between layers

  • Quick, automated network configuration - for elasticity, scalability, fault tolerance, programmability, and virtualization of the datacenter network as an entity

  • Virtualized networking - for scalability, elasticity, flexibility, security, control, and better capacity utilization of the datacenter network

  • Isolation and security  - for protection and data leak prevention of each tenant’s traffic

    • Virtual Switches, Virtual Networks

The above aspects, though relevant to any datacenter, are more relevant and important to get right for an elastic, scalable, fast, reliable, fault-tolerant, multi-tenant cloud computing datacenter.  A cloud provider must stay on top of the above areas and have a plan to incorporate solutions for each as they get standardized, or even before if need be, in order to stay competitive in the marketplace.  Cloud providers and infrastructure vendors such as Amazon, Microsoft, Google, Cisco, IBM, Nicira (now part of VMware), Intel and others, as well as various research labs, educational institutions, government agencies, and other organizations are all deeply invested in the success of cloud computing. These organizations are committed to improving and optimizing datacenter operations and are therefore very much involved with the above datacenter aspects in some way, shape, or form. One can follow some of the above and other such organizations doing hi-tech work for the cloud to stay abreast of the latest innovations in the datacenter networking space.

We know from history that the pace of high tech innovations is unrelenting - that there are no finish lines or rest areas for an organization, whether commercial, government, or educational, that wants to reach and stay at the top. New scenarios and business models keep emerging at a rapid pace because of new innovations and ambient developments in related areas fueled by an intensely competitive industry.  This leads to more innovations.  It is a non-stop virtuous cycle. Datacenter networking is a very important area of focus for improving cloud computing. One needs to leverage the various innovations and improvements occurring in this space and in related areas in the most optimal and holistic manner for providing an enhanced cloud computing experience to both users and providers.  Smart cloud providers know this and do what is necessary for achieving the same. In doing so they increase their brand value and market share. All power to them!

Tuesday, Aug 21, 2012

Cloud - the great “coordinator and collaborator” 

In life some things are more obvious than others. It depends on how much you have been exposed to them, how much attention you are paying to them, and how much insight you possess of the subject matter in question. This fact applies as much to the cloud as to anything else.

Let us examine a trend about the cloud that is becoming increasingly obvious to those who are paying attention - that of the cloud becoming a great coordinator and collaborator for us.  We won’t talk about the really obvious stuff here such as online email, instant messaging, voice and conference calls, social networking, etc. that are being used widely and have made our lives more connected with each other, but instead will discuss how our profile and the context of our activities in the cloud can affect our future activities in interesting ways. More specifically, we will look at how context established by a user of cloud services and her profile (hereafter considered part of the context) can be used by the cloud to tie together the user’s activities across devices, applications, time, and locations for providing enhanced experiences and improved productivity.

Let us start with the basics - we know that the context of a user and her application’s activities are known to cloud services that serve her. How else would the cloud be able to provide acceptable service to her?  All the services mentioned above such as email, social networking, voice calling, etc. use this context in some way, shape, or form. For example, our profiles, which are part of the overall context, enable the cloud to authenticate us (through username, password), show us data that we are interested in (contacts, friends, presence) or might be interested in (advertisements, coupons, events), and do personalized filtering for us (firewall, spam filtering). Beyond this, what is becoming popular now is the use of other context, such as that established by our actions, across instances of the same application or different applications that we use. It does not matter what device we are using and when or from where. Context aware computing does the magic of coordinating and improving the efficacy of our activities and workflows across various devices, applications, time, and geographic locations while providing us with a seamless and gratifying experience.

The following are examples of some popular applications that use context awareness for making our everyday computing more interesting and productive for us.  

  • Kindle

Kindle enables one to preview books before buying them and to read them on multiple Kindle-supporting devices.  This is due to the app's Whispersync technology, which transfers bookmarks and other annotations between devices.

  • Netflix

The app syncs across devices and browsers so one can start watching a movie on a desktop and then pick up where one left off on another device such as a laptop, tablet or a smartphone.

  • Dropbox

Dropbox, a cloud-based storage service, enables easy sharing of  content between Web-connected devices. One can load up content on one’s Dropbox from one device, launch the Dropbox app on another device and everything within the Dropbox is available on the other device.

  • Office 2013

When one opens a document, it gets positioned at the place it was at when it was closed.

  • Xbox SmartGlass

One can control another device as well as get information pertinent to the program running on it as is the case with a smartphone or a tablet and a TV.

The contextual tie-up as done by the above specialized applications is due to the cloud being the anchor and the storehouse. It certainly makes our experiences more seamless, interesting, and productive. Such tie-ups can be done even through general purpose applications such as browsers. The common denominator and the workhorse across applications, time, devices, and locations is the cloud. The context used for tie-ups comprises, but is not restricted to, information about the user, user’s activities and workflows, user’s device characteristics, user’s location, and the data the user creates or uses in the cloud. When the user accesses the same service again or a different one that is contextually tied to the former, whether from the same application or a different one, the same device or a different one, concurrently with another such access or at a different time, from the same location or a different one, the cloud service can use the stored, shared context to tie the user’s new session to the previous one in smart ways that can take into account the new environment the user is in, thereby enhancing the user's experience, productivity, and even scope of activity as in the SmartGlass case.

Consider the possibilities! Tie-ups of activities across time and devices can lead to workflows that pick up from where they left off or that work in tandem with other workflows to provide enhanced coordinated services - evidence of these can be seen in the workings of the applications above. Use of shared context leads to possibilities like showing you advertisements that are relevant to what you were browsing earlier (something Facebook is trying to do or might already be doing) or to what you like doing and are now in the vicinity of. In such tie-ups, appropriate consideration can be given to factors such as time elapsed between user sessions, characteristics of devices accessing the service, user's location, and external data from third parties as well as internal data that is relevant to the user and her workflows. This can lead to a “calculated context” that can be used for guiding the user and the workflows. Such consideration can be optional, driven by user profile and choices. The above tie-ups are most effective when the applications running on devices are smart in taking advantage of shared context, as is the case with the applications mentioned above, but are good even with general purpose, non-specialized applications such as browsers. This is evident from examples such as a user leaving forms and surveys half done that she continues later, or initiating an activity, logging off, and then going back later from a different device to check its progress and other aspects - all done mostly through browsers.
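
To make the idea of stored and calculated context a bit more concrete, here is a minimal Python sketch of a cloud-side store that lets a movie started on one device resume on another. It is our own toy model, not how Netflix, Kindle, or any other service mentioned above actually works: the class names, the user and item identifiers, and the rule that a viewer returning after more than a day gets rewound by 30 seconds are all assumptions chosen only to show time elapsed between sessions feeding into a calculated context.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class SavedContext:
    position_s: int   # where the user stopped, in seconds
    device: str       # device that saved the context
    saved_at: float   # when it was saved, in seconds


class ContextStore:
    """In-memory stand-in for a cloud-side context store (hypothetical)."""

    STALE_AFTER_S = 24 * 3600  # assumed: sessions older than a day are stale
    REWIND_BY_S = 30           # assumed: rewind a little to jog the memory

    def __init__(self) -> None:
        self._saved: Dict[Tuple[str, str], SavedContext] = {}

    def save(self, user: str, item: str, position_s: int,
             device: str, now: float) -> None:
        self._saved[(user, item)] = SavedContext(position_s, device, now)

    def resume_position(self, user: str, item: str, now: float) -> int:
        """Position to resume from, whichever device is asking."""
        ctx = self._saved.get((user, item))
        if ctx is None:
            return 0  # nothing stored: start from the beginning
        if now - ctx.saved_at >= self.STALE_AFTER_S:
            # Calculated context: adjust the stored value for elapsed time.
            return max(0, ctx.position_s - self.REWIND_BY_S)
        return ctx.position_s


store = ContextStore()
store.save("asha", "movie-42", position_s=1800, device="desktop", now=0.0)
same_evening = store.resume_position("asha", "movie-42", now=3600.0)      # from a tablet
two_days_later = store.resume_position("asha", "movie-42", now=172800.0)  # from a phone
never_started = store.resume_position("asha", "movie-7", now=3600.0)
```

The store is keyed by user and item, not by device, which is the whole point: the device is just another attribute of the context that the cloud can take into account.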

Other examples of coordination that we could see in the future -  two musical instruments both signed onto the cloud playing music together while controlled by a conductor (software service) in the cloud. Or wearable computing on your person controlled by the cloud to make your life much different from today’s. Your location transmitted by your phone could be used to show you on your smart glasses information that is appropriate for that location and aligned to your tastes - for example sales of certain shopping items or availability of certain kinds of foods in the vicinity.

Coordination and collaboration such as the above between devices and services are all based on shared and in some cases calculated context, all driven by a smart cloud engine. This engine could run analytics using your online history and other relevant data (welcome Big Data!) and utilize the tremendous horsepower of a large number of machines (hello Hadoop!) to make its predictions more prescient and its actions smarter. This would lead us to a “sentient world” where the machines that serve us, whether they are the devices we use to access cloud services or those serving us in the cloud, become more aware of our activities, more in tune with our likes and dislikes, smarter in their analysis of data and information, and more proactive in helping us - a constant companion that understands the context of our day to day activities and helps us accordingly. Of course, for all of this to bear fruit, privacy issues would need to be tackled well. These issues will likely be tractable if the cloud's actions stay within the confines of what is acceptable to users.

There is really no visible end to what can be achieved through sharing, analyzing, and predicting context. The cloud is showing us the way by its ability to achieve coordination and collaboration across devices, applications, time, and locations. There is likely going to be a lot of innovation in this area in the near future. However things shape up, one thing is assured and that is that we are on an exciting, promising journey.

Tuesday, Aug 14, 2012

Rethinking networking, protocols, and connections…. 

There is no end to the improvements we can wrest in technology if we apply our minds to it, have deep knowledge of the space we are dealing with, are innovative in our approach, and are motivated to make a difference. Oftentimes a confluence of factors provides an extra impetus, pushing us to strive harder for such improvements or to rejuvenate and expedite efforts that are already underway. When such improvements do appear in everyday use, they appear very natural, to the point that some people might question why it took as long as it did to make them happen. What these people forget is that it takes much sweat equity, a lot of ingenuity, deep knowledge, dogged dedication, a lot of practical experience, and special smarts to research, discover, formulate, develop, test, and bring into mass adoption such ideas. These ideas more often than not solve hard problems faced by many - that is why they get adopted.  The solutions incorporating these ideas often carry the "wow" factor and make companies scramble to incorporate them or use them, as the case may be, in some way, shape, or form, in order to stay competitive and relevant.

Let us take a brief look at two improvements in the networking and protocol space that are worth a deeper dive due to the significant impact they can have in our day to day experience in using our devices and the services they connect to.
One is what Princeton researchers are working on - a layer 3.5 in the OS stack. Called Serval, an end host stack for service centric networking, it enables applications to not be affected by movement of target end points. For example, when a virtual machine (VM) moves and changes its network address (IP address), as happens a lot in cloud computing, applications communicating with the VM would continue to communicate with it without any interruption. They would stay connected and unaffected, oblivious to the network address changes of their endpoint services.  The network also wouldn’t need to adapt itself to ensure continuity of traffic i.e. it wouldn’t have to do any triangle routing or other kinds of forwarding. Layer 3.5, called so because it is a layer between the network and transport layers, layers 3 and 4 respectively, of the stack, enables an application or a service to bind to a “service id” instead of a network address.  Internally the stack binds the service id to network address(es). When an address change occurs, the service id gets automatically mapped to the new address. The transport connections using the service id stay oblivious to the remapping that happens under the covers. This work is funded by DARPA, NSF, Cisco and others. Cisco is now looking to test it in-house. This work, if successfully completed and adopted by vendors, would be very useful in our world of BYOD and cloud computing. What exists today for handling migrations and mobility is clunky in comparison and so such an improvement is welcome.  Whether layer 3.5 gets standardized and gains mass adoption or stays a nice idea living on as a proprietary extension of some vendors is yet to be seen. Regardless, it is a good development, one solving the problems of today that have been brought into sharper focus due to the dynamics of cloud computing.
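
The indirection described above, binding to a service id and resolving it to an address under the covers, can be illustrated with a toy model. This Python sketch is not Serval's API or wire behavior; the `ServiceTable` and `Connection` classes, the "billing" service id, and the IP addresses are all hypothetical and exist only to show why a connection that names a service instead of an address survives a remapping.

```python
from typing import Dict, List, Tuple


class ServiceTable:
    """Toy stand-in for the stack's service id -> network address mapping."""

    def __init__(self) -> None:
        self._addresses: Dict[str, str] = {}

    def register(self, service_id: str, address: str) -> None:
        # Called when a service starts, and again whenever its address changes.
        self._addresses[service_id] = address

    def resolve(self, service_id: str) -> str:
        return self._addresses[service_id]


class Connection:
    """A connection bound to a service id instead of an IP address."""

    def __init__(self, table: ServiceTable, service_id: str) -> None:
        self._table = table
        self._service_id = service_id
        self.sent: List[Tuple[str, str]] = []  # (address used, payload)

    def send(self, payload: str) -> str:
        # The address is looked up at send time, so the application
        # never holds on to an address that might go stale.
        address = self._table.resolve(self._service_id)
        self.sent.append((address, payload))
        return address


table = ServiceTable()
table.register("billing", "10.0.0.5")
conn = Connection(table, "billing")

before = conn.send("invoice 1")
table.register("billing", "10.0.1.9")  # the VM migrated and got a new address
after = conn.send("invoice 2")         # same connection object, no reconnect
```

In this model the application code between the two sends does nothing special; only the table entry changed.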

The second improvement, further along than the above and likely to be standardized and adopted en masse, is an application level protocol called SPDY (pronounced Speedy) created by Google. It improves web page load time on a client by incorporating in a smart way a set of improvements that have hitherto been used elsewhere, though not all together, in various protocols and technologies. These improvements include using parallelism, concurrency, compression, predictive analysis, collaboration, active pushes, server hints, etc. to speed up web page load times. Specifically, in this new protocol, multiple HTTP streams can use the same TCP connection concurrently, HTTP headers are compressed while content can be optionally compressed, requests can be prioritized, and web servers provide hints to the client to help it along and also predict web pages that the client might need, proactively pushing them to the client.  Google has demonstrated through tests that this new protocol results in significant performance improvements in loading web pages. This technology has been incorporated in the Chrome and Firefox browsers and is in use on numerous websites including those for various Google services and Twitter. All indications are that this protocol will be adopted by other browser vendors including Microsoft.  And why not?  The improvement is too compelling to ignore, and not supporting it soon enough would likely be too risky from a market share and bragging rights standpoint.
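
Two of the ideas above, several streams sharing one connection and requests being prioritized, can be shown with a small scheduling sketch. This is plain Python, not the SPDY framing format and not any browser's or server's implementation. The `multiplex` and `reassemble` functions, the stream ids, and the convention that a lower priority number means more urgent are our assumptions for illustration.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple

Frame = Tuple[int, bytes]  # (stream id, chunk of that stream's data)


def multiplex(streams: Dict[int, Tuple[int, List[bytes]]]) -> List[Frame]:
    """Interleave chunks from many streams onto one ordered connection.

    `streams` maps stream id -> (priority, chunks). More urgent priorities
    (lower numbers, by our convention) go first; streams that share a
    priority take turns, one chunk at a time.
    """
    by_priority: Dict[int, Deque[Tuple[int, Deque[bytes]]]] = {}
    for stream_id, (priority, chunks) in sorted(streams.items()):
        by_priority.setdefault(priority, deque()).append((stream_id, deque(chunks)))

    wire: List[Frame] = []
    for priority in sorted(by_priority):
        queue = by_priority[priority]
        while queue:
            stream_id, chunks = queue.popleft()
            if not chunks:
                continue  # stream had no data at all
            wire.append((stream_id, chunks.popleft()))
            if chunks:
                queue.append((stream_id, chunks))  # back of the line
    return wire


def reassemble(wire: List[Frame]) -> Dict[int, bytes]:
    """Receiver side: put each stream's chunks back together in order."""
    out: Dict[int, bytes] = {}
    for stream_id, chunk in wire:
        out[stream_id] = out.get(stream_id, b"") + chunk
    return out


wire = multiplex({
    1: (0, [b"html-1", b"html-2"]),  # the page itself, most urgent
    3: (1, [b"css-1", b"css-2"]),
    5: (1, [b"img-1"]),
})
pages = reassemble(wire)
```

Because every frame carries its stream id, the receiver can untangle the interleaved chunks without needing a separate TCP connection per resource.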

Another recent development to note - Swiss scientists have developed an algorithm that by investigating  network connections and propagation aspects between a subset of nodes in a network can determine the source of spam or malware and its first victim. There can be other applications of this algorithm, even in non-technology areas.  For example, the algorithm can help find the source of terrorist plots/attacks or source of medical infections or rumors. The above is yet another example of how network or protocol centric innovations can lead to resolution of challenging problems and result in significant improvements in our use of devices and services.
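
To give a feel for how observing only a subset of nodes can point back to a source, here is a deliberately simplified Python sketch. It is not the researchers' algorithm. It is a toy heuristic of our own that assumes every hop takes exactly one time unit, that the network is known, and that a few observer nodes recorded when the infection reached them. For the true source, arrival time minus hop distance gives the same start time at every observer, so the sketch picks the candidate for which those implied start times agree best. The graph, node names, and function names are hypothetical.

```python
from collections import deque
from statistics import pvariance
from typing import Dict, Hashable, List, Optional

Graph = Dict[Hashable, List[Hashable]]


def hop_distances(graph: Graph, start: Hashable) -> Dict[Hashable, int]:
    """Breadth-first search: number of hops from start to each reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def estimate_source(graph: Graph,
                    arrival_times: Dict[Hashable, float]) -> Optional[Hashable]:
    """Candidate whose implied start times are most consistent across observers."""
    best_node, best_spread = None, float("inf")
    for candidate in graph:
        dist = hop_distances(graph, candidate)
        if any(observer not in dist for observer in arrival_times):
            continue  # candidate cannot reach every observer
        # Assumed unit delay per hop: start time = arrival time - hops.
        implied_starts = [t - dist[o] for o, t in arrival_times.items()]
        spread = pvariance(implied_starts)
        if spread < best_spread:
            best_node, best_spread = candidate, spread
    return best_node


# A small tree: a - b - c - d, with a branch c - e - f.
graph: Graph = {
    "a": ["b"], "b": ["a", "c"], "c": ["b", "d", "e"],
    "d": ["c"], "e": ["c", "f"], "f": ["e"],
}
# Three observers and the times at which the malware reached them.
observed = {"a": 2.0, "d": 1.0, "f": 2.0}
source = estimate_source(graph, observed)
```

Real networks have variable delays and cycles, which is where the statistical machinery of the actual research comes in; the sketch only conveys the underlying intuition.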

So, is there a confluence of factors helping these innovations along and expediting their day in the sun? Absolutely!  Rapid increase in use of mobile devices, increasing use of cloud computing for daily activities, heavy use of virtualization in data centers, persistent security and attack concerns, and intense competition between vendors to outperform each other in the foundational technologies that define our experiences are all contributing to fueling the above innovations and many more. The biggest driver however is the plain old human desire to solve hard problems and improve our experiences in this exciting world that we live in. Amen to that!


Tuesday, Aug 7, 2012

Intellectual property is big business 

We might not realize it but intellectual property, or IP for short, largely in the form of patents, is big business for many companies.  In fact most, if not all, companies, small and big, aspire to get onto the "patent bandwagon" i.e. beefing up their intellectual property to realize the benefits of innovation, creativity, and original thought.  These benefits are too alluring to resist.  Not only does a patent enable a company to have a lock-in on a process, method, flow, recipe, behavior, look and feel, action, or whatever it covers that is innovative and original, it also provides the company with protection against lawsuits over its use of another company’s IP, typically through cross-licensing.  Patents can lead to many years of significant revenue streams if the patents can be monetized in the form of high-value product(s), service(s), or license agreements.  We see this all around us in various industries such as high-tech, pharmaceuticals, chemical, bio-tech, and others where patents are big business.

As is evident to those knowledgeable about IP, patents are used by a company for various purposes relevant to its strategy and unique situation. A company can use patents for being one-up on the competition in the market by having highly valued products or services that leverage the patents, for creating a recurring revenue stream like an annuity through licensing of the patent ideas to others, for protection from lawsuits when using other companies’ ideas through cross-licensing agreements with these companies, and for improving the company’s reputation and net asset value in the market. Patents can result in supra-normal profits due to their lock-in protection of novel ideas and the recurring revenue streams they can create. No wonder, we see examples all around us of companies going all out to get, leverage, and defend patents. You only have to look at the current events and at the history in any industry where patents have been important - industries where R&D for innovative work is costly while imitation is cheap and where competitors do not gain much of an advantage by knowing a patent's specifics - and you will realize the weight and truth of the above statement.  

Let us look at some recent news about intellectual property in the high-tech industry. The latest is that Apple and Samsung are going at each other around the world in the biggest court battle in decades to convince the judges that they should be afforded protection for their invention (Apple) or that there has been no copying of the other company’s ideas (Samsung). Big bucks in the form of a large “patent infringement” penalty ($2.5B in claims by Apple) and being shut out of lucrative markets are at stake. There was also recent news that SAP agreed to pay Oracle north of $300M to prevent continuation of a lawsuit and the risk of a much larger penalty for copying Oracle’s software.  In this case, Oracle is protected by copyright laws (where patents protect inventions, copyright protects original authorship and artistic work). Now let us look at the monetizing aspect of intellectual property - Microsoft is getting a few dollars for each Android device sold by hardware manufacturers such as HTC, Samsung, and others because of Android’s use of Microsoft patents.  And as many know, IBM makes north of a billion dollars per year from its patents. Then there are companies like Intellectual Ventures whose entire business model is based on IP - creating, acquiring, and monetizing it. Also, many times a company or part(s) of a company get acquired by other companies simply for the patents. Recently a few companies including Microsoft, Apple, and EMC teamed together to beat Google in acquiring Nortel patents, paying $4.5B in cash. That was a coup for them and put Google in peril of being hobbled by them in the future in what it could or could not do in the mobility and communication space. Next we hear - Google acquires Motorola Mobility for $12.5B to beef up its patents in the mobile space (at the risk of antagonizing its hardware partners) for its protection and possibly as a future revenue source. Now there is speculation that RIM or Nokia, both currently down and struggling, could be potential targets for acquisition by the likes of Microsoft because of, in significant measure, their patent portfolios and copyrights.

Yes sir, intellectual property is indeed big business in terms of dollars spent and saved. Companies like IBM and Microsoft, as mentioned above, earn significant chunks of money through licensing of their patents and protect themselves and their markets through cross-licensing agreements. Large companies like these are all too willing to pay serious money to beef up their patent portfolios in lucrative markets.  A decade or so back, IBM used to file more patents than Microsoft, Cisco, Intel, and Oracle combined - it still files the most; however, its reign is now under threat from fast rising Asian rivals. Also, in the past Microsoft used to get threatened with lawsuits or sued left, right, and center for large sums of money for claimed patent infringements. Microsoft realized that it was at constant peril of being shut out of lucrative markets and so started encouraging its employees to file patents, with rewards given for each filing, and started to actively acquire patents from other companies. This strategy has worked very well for Microsoft. Now, Microsoft has a very healthy patent portfolio. It continues to beef it up in emerging areas of computing. This constantly growing patent portfolio is a good revenue stream for Microsoft while affording it strong protection from lawsuits through cross-licensing agreements with other companies. As a result, Microsoft is now much less impeded in building cool products and providing services that it might otherwise have been blocked from due to patent infringement fears. Cisco, Intel, EMC, Google, Facebook, and every other successful company have used and continue to use the same strategy.

So, what does all the above indicate? To us, it indicates that for a company to survive and prevail in a cut-throat, intensely competitive industry such as high-tech, or any other where having a lock on innovation and original thought can reap many rewards, it is critical to be proactive in creating and/or acquiring intellectual property and using it effectively in the ways discussed above.  Any company that does not understand this and does not take steps to do so is putting itself in great peril.


Tuesday, July 31, 2012

Watch out for the potential pain points 

OK, so cloud is the way to go. We have heard time and again from various sources why it can be good for an organization that wants to scale, be elastic, be agile, and save money while freeing itself from mundane IT tasks so that it can do what it was meant to do, i.e. grow its business and net asset value. As time passes, we find that issues with cloud computing that were serious yesterday are not so serious anymore due to advancements made in software, infrastructure, offerings, and general awareness. There is now much more flexibility in how you might outsource IT. There are many more providers than before, each with wares that are becoming progressively better by the day. The provider newbies of yesterday are maturing in their offerings, infrastructure, tools, and governance; their offerings are becoming more comprehensive and tuned to customers’ needs. The providers that have been there for a while are providing a larger and more advanced set of services. We are slowly but surely moving towards a world where cloud computing will be used as pervasively as a utility such as electricity is used today. There is no doubt about that.

So, how do CIOs look at cloud computing these days? The answer depends on whom you ask. For some, cloud is still something to watch from afar; for others, it is the best thing to have happened to this world since sliced bread. One thing is for sure: each customer’s situation is different, and one customer's tolerance for disruptive change with its potential pain points differs from the next guy's.  Also, some customers need more guidance and help than others with what they see as pitfalls of the cloud, and so can take longer in moving to it.

So, what are the main issues today that can be potential pain points tomorrow if not addressed adequately now?  We discuss them below. These issues can be intimidating for some; for others who are more knowledgeable about the available solutions, not so.  The good news is that there are good solutions available for each issue. You just need to choose the solution(s) most appropriate for your particular situation, be agile in adapting them as the dynamics of your situation change, and execute well, always!

Given below, in no particular order, are some important issues that CIOs are concerned with these days. If they are not, whether out of ignorance or oversight, then perhaps they ought to start looking at them seriously if they want to be successful in their company's use of cloud computing.

1)  Lack of transparency: Virtualization is spreading its wings from compute and storage to networking. There was recent news about Oracle stating its intention to acquire Xsigo and VMware stating the same about Nicira. This is good news from a capacity utilization, isolation, flexibility, and performance standpoint, but it is creating more complexity all around. Customers want to know where exactly their applications, data, and network are hosted in this increasingly virtual world. The lack of transparency that can come with a highly virtual environment, and the consequent feeling of lack of control, can be disconcerting and a deterrent for some when contemplating a move to the cloud.

2)  Security: Are my crown jewels in the cloud safe and secure? I seem to have little control over their security. Who all have access to them? How protected am I from leaks and tampering? Are there good control mechanisms and auditing in place? Security continues to stay at the top of CIOs' minds.

3)  Compliance: Am I compliant with regulation? Am I able to meet audit requirements if I am hosted in the cloud? Compliance is becoming increasingly important and penalties can be very high. Lack of good answers to the above questions can be a big deterrent to cloud adoption for some.

4)   Outages: Am I safe from outages, or can I be held hostage by them and be unable to do anything to recover in the time frame I want? How protected am I from esoteric faults and disasters? The recent outages with AWS, Salesforce, and Azure are cases in point.

5)   Loss of access to data:  Can I lose access to my data? Do I have it protected enough and readily accessible under all situations, or can I get caught in a quagmire like the Megaupload customers who cannot get at their data stored on the Megaupload platform because of a U.S. court ordered shutdown of the vendor’s domain name?

As mentioned before, there are good answers and adequate solutions available for each of the above concerns. Determining which solution is best for a particular customer requires experience and expertise with cloud technologies, architectures, methodologies, offerings, processes, and vendors, and an understanding of the customer’s business, its dynamics, needs, and constraints. Various criteria need to be considered when doing the evaluation. Otherwise it can be a dizzying world to deal with, in which the above areas of concern might not get addressed adequately, causing them to become the sore pain points of tomorrow. While determining the right set of solutions for each issue and applying them, a CIO and his staff also have to ensure proper governance in the areas covered by the above issues. This can be done through the application of a well formulated set of policies and rules within the company and by the cloud providers used by the company.

The last thing you want is to be caught in a bad situation like that experienced by those that hitched their wagon to Megaupload, where you lose all access to your data hosted with a provider because of actions of the provider, such as violating the law, that are beyond your control. Nor do you want to find yourself held hostage to frequent outages, as in the case of the AWS customer that was hit with two outages in one month, much to the detriment of its bread and butter business, or to get caught in some other bad situation. You can avoid such situations by being diligent in designing the right cloud architecture for your business, choosing the best platforms and providers for it, ensuring adequate protection in each of the above areas and others that concern you, and being agile in adopting even better solutions as and when they become available.

Knowing the potential pain points and designing well-crafted solutions and governance to address them from the get-go can go a long way in making you comfortable and cozy with the cloud, and in helping you rest easier about your investments in cloud computing.


Tuesday, July 24, 2012

SSDs on the rise

Solid State Disks (SSDs) are gaining in popularity and finding increased usage in devices, in the enterprise, and in the cloud. There is a clear trend of increasing SSD usage.  As per a recent IDC report, the market is growing fast, with 2011 seeing an increase of over 100% in industry revenue from SSDs compared to 2010.  As per the same report, the market is expected to continue to grow at more than 50% per year until 2015. But did we really need such a report to show us that SSD usage is rising, and quickly?  The evidence is around us in the increasing number of devices and storage solutions that now incorporate SSDs.  This rise is no doubt due in large measure to SSD prices falling rapidly in the last 12 months, and there is no end in sight to this downward trajectory.  Just last September, when Intel introduced 300 GB SSDs, the price was north of $6/GB.  Now, you can shop for a 300 GB SSD and get it for even less than $1/GB.  Quite a quick fall by any measure!   See the historical price trending of HDDs (hard disk drives) and SSDs as of Dec, 2011 down below, courtesy of Tom’s Hardware --,14336.html).

Low prices of around a dollar per GB make SSDs affordable to the techies and innovators as well as to some pragmatists (using terminology from Geoffrey Moore’s technology adoption model). Such low prices are resulting in the incorporation of SSDs in mobile devices, PCs, and servers.  Hybrid disks combining HDDs with smaller capacity SSDs, a relatively new entrant to storage, are paving the way to increased SSD use and will likely continue to lead the charge by keeping the overall prices of compute devices incorporating SSDs low, near the sweet spot for good sales.  Crossing the chasm with regard to SSD adoption does not seem too far off on the horizon anymore.
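
To put the price drop quoted above in numbers, here is a small sketch. It treats $6/GB and $1/GB as round figures (the quoted prices were "north of" and "less than" these, so they are bounds rather than exact prices), and the function names are ours, purely for illustration.

```python
# Round figures taken from the text; actual street prices varied by model and vendor.
CAPACITY_GB = 300
PRICE_PER_GB_2011 = 6.00  # "north of $6/GB", so a lower bound
PRICE_PER_GB_2012 = 1.00  # "less than $1/GB", so an upper bound


def drive_cost(capacity_gb: float, price_per_gb: float) -> float:
    """Total price of a drive at a given price per gigabyte."""
    return capacity_gb * price_per_gb


def percent_drop(old: float, new: float) -> float:
    """Relative decrease from old to new, in percent."""
    return (old - new) / old * 100.0


old_cost = drive_cost(CAPACITY_GB, PRICE_PER_GB_2011)
new_cost = drive_cost(CAPACITY_GB, PRICE_PER_GB_2012)
drop = percent_drop(PRICE_PER_GB_2011, PRICE_PER_GB_2012)

print(f"300 GB drive then: ${old_cost:,.0f}")
print(f"300 GB drive now:  ${new_cost:,.0f}")
print(f"Price per GB fell by about {drop:.0f}%")
```

With these round figures, a 300 GB drive goes from about $1,800 to about $300, a fall of at least 83% in price per GB.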

But wait!  A discussion of just the price point is not enough here. Lower price by itself is never a good enough basis for a technology or a product to continually gain market share. Let us therefore take a look at the advantages SSD offers over a hard disk drive.  These are significant.  For one, the speed improvement over spinning hard drives is phenomenal.  The read and write I/O rates are substantially higher, more than an order of magnitude above what you get with HDDs, resulting in significant improvements in the performance of applications and services such as high speed games, high end databases, and real-time high bandwidth video/audio streaming. The improved performance provides a much more pleasing experience to a user and consequently improves the growth prospects for the company providing it.  Take the boot time of a device using SSD technology as an example: it is significantly shorter than that of a non-SSD device.  Just ask any MacBook Air user (MacBook Air uses flash memory, the same technology that an SSD uses)!  Ask her whether she is delighted by it, and tell us if she disagrees. Like boot time, application launch time is much faster too because there is no spin-up or seek time.  As SSDs become more affordable, vendors are increasingly adopting them for high speed transactions.  IBM’s DB2 made news earlier with its configurable feature of being able to use SSD for selective transactions. Second, SSDs use much less energy and are more reliable than spinning hard drives because there are no moving mechanical parts in them, though some might dispute the overall reliability of SSDs given their relative newness and the resulting lack of a long track record compared to HDDs. Third, SSDs take up less space than hard disks, which leads to lighter and more compact devices. Fourth, there is much less noise from them since there are no mechanical spinning parts.  There are other advantages too, but the above state the case for SSDs well for now.
Needless to say, the above benefits, when available at affordable prices, do make the use of SSD based storage quite compelling.

So, what is the adoption like so far?  The two words that describe it well are “very promising”. More and more devices (ultrabooks, notebooks, tablets, laptops, desktops, and servers), often equipped with SSD in hybrid configurations with HDD, are hitting the market as we speak. This onslaught is only going to increase as prices keep falling. SSDs keep making news. Just a few days back Amazon announced a high I/O capacity AWS virtual server with 8 virtual cores, 60.5 GB of RAM, and 2 TB of local SSD storage. Companies like Netflix that do high speed streaming of content to their subscribers, and gaming sites that need very high I/O speeds for their games, seem to have welcomed this new member of the AWS EC2 portfolio.  These companies do not seem put off by the price of $3.10 per hour of usage of such a high I/O capacity server and are, from all indications, planning to use one or more of these for hosting their services.  As SSD prices fall further, or even otherwise, Amazon is likely to drop the price of the above server over time to attract the more price sensitive customers out there. Interestingly, introducing a powerful high I/O capacity server like the one above is likely to result in less scale-out by the AWS customers that make use of it than they would do otherwise.  This is more so since high end databases and other such high speed services/applications needing extremely high I/O are better suited to scale-up than scale-out (at least until they get rewritten for the cloud) and so will benefit from this new server.
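
To get a feel for what $3.10 per hour adds up to when such a server runs around the clock, here is a quick back-of-the-envelope sketch. It assumes a single always-on server, a 30-day month, and no discounts or other charges (storage, bandwidth); these are our simplifying assumptions, not Amazon's pricing rules.

```python
HOURLY_RATE_USD = 3.10  # hourly price quoted in the text


def usage_cost(hourly_rate: float, hours: float) -> float:
    """Cost of running one server for the given number of hours."""
    return hourly_rate * hours


day_cost = usage_cost(HOURLY_RATE_USD, 24)
month_cost = usage_cost(HOURLY_RATE_USD, 24 * 30)   # assumes a 30-day month
year_cost = usage_cost(HOURLY_RATE_USD, 24 * 365)

print(f"per day:   ${day_cost:,.2f}")
print(f"per month: ${month_cost:,.2f}")
print(f"per year:  ${year_cost:,.2f}")
```

That works out to roughly $74 a day, $2,232 a month, and a little over $27,000 a year per server, which explains why the price sensitive customers may wait for a price drop.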

So, given the above, is SSD a game changer?  Well, let us see. The world of SSDs will slowly but surely replace the world of hard drives that we have relied on for an eternity and continue to rely on.  No seeking or spinning leads to much improved performance, increased reliability (the above mentioned caveat notwithstanding), and less noise. Lower power usage leads to lower energy costs. Much higher speeds result in increased throughput and a better user experience. Less space and lighter weight lead to compactness of design and easier portability.  These are all compelling selling points for SSDs. The big deterrent, a cost per GB much higher than that of HDDs, is eroding steadily as SSD prices continue their downward spiral.  We are not too far from experiencing a new era of fast storage and significantly improved user experience with every device and service we use, on premise or in the cloud. The new world that is being ushered in is very appealing.  So, given all this, is SSD a game changer?  You decide.

Tuesday, July 17, 2012

Enterprise ready cloud 

Cloud platforms are evolving as we speak.  Almost every other day, if not every day, we hear of an improvement to some existing aspect of cloud computing or the introduction of a new way of providing some cloud computing feature. There are many cloud providers in business, from the big to the small, from the ones who have been at it for many years to those just entering, from those that have made it a multi-billion dollar business to those that hope to disrupt the run of the mill apple-cart with game changing improvements in order to lure existing and new customers into their fold.  

Given the above, it is no surprise that cloud computing is becoming better with time for all: enterprises, small businesses, and the small guy.  This phenomenon is no different from what we have seen in the past with other emerging disruptive technologies.  Over time, the technology, methodology, and processes improve enough to address the last few remaining critical needs of those that have not yet crossed the chasm, the so called “pragmatists”.  We see this happening in the cloud space with some recent noteworthy developments. These developments promise to make it more compelling for the pragmatists to reconsider their decision to hold off. They bring the pragmatists closer to giving in to the constant drum roll of cost savings, elasticity, scalability, and the other enticing features that cloud providers, and the press in some measure, have been playing out for a while now, and to jumping in.

So, what are some of these improvements and additions?  Well, there are several, but a few that caught our attention lately are the ones from ProfitBricks and Oracle, new entrants to cloud computing. Their cloud propositions don’t follow the familiar trail trodden by others but instead blaze their own, with significant differences from the existing crowd, getting us closer to what enterprises are used to with on-premise software. For example, ProfitBricks, an IaaS provider that launched recently, is providing a cloud platform that scales up vertically as against scaling out horizontally over commodity hardware like the present public cloud providers do. In other words, you get more cores and CPU power for a single virtual machine on the same physical machine when needed, as against scaling out across multiple virtual/physical machines. As per reports, the vertical scale-up can go up to the equivalent of 192 EC2 compute units (ECUs) on a single instance, as against a maximum of 8 ECUs that the extra-large server provides on AWS. Many existing applications and services such as large database systems and business software scale up more easily than they scale out. Also, enterprises are more used to scaling up than scaling out with their on-premise deployments. This is therefore a welcome change to the existing cloud platform methodology. Because of the above, ProfitBricks claims that its IaaS scales better than Amazon’s. This is probably not an idle boast that can be dismissed lightly.  As a side note, ProfitBricks also claims to be better at avoiding the “noisy neighbor” problem, where an overactive tenant on the same machine causes other VMs sharing the core to slow down. The claim rests on the fact that it does not subdivide cores, a comforting aspect for enterprises. On the Oracle front, the recently unveiled “Oracle cloud” purports to provide a single instance platform for every customer, unlike the multi-tenant cloud platforms of existing vendors. Oracle can therefore work within a customer’s schedule with regard to updates and the introduction of newer versions of the software to its cloud platform. This is in line with customers wanting more control and customization of the deployed software, and again it is more like what they are used to with on-premise deployments. Both of the above changes in how clouds operate are sure to find their way into other providers’ platforms over time if they get lapped up by customers.

Besides the above significant changes to the way cloud platforms operate, there are other important improvements creeping in, slowly but surely, in areas such as security, a major concern for many that have been holding off on adopting the cloud. For example, as per a recent announcement, Google is going to encrypt all “for fee” applications deployed from Google Play. This is to address “piracy due to ease of dis-assembly” concerns. It will likely extend at some point to cover “free” apps too. This feature is sure to help the application store model of deploying applications that every cloud and mobile vendor seems to be embracing and espousing. Also, there have been other announcements of better isolation and protection through micro-virtualization from startups such as Bromium, and through support of Data Execution Prevention (DEP) and Address Space Layout Randomization (ASLR) in version 4.1 of Android (Jelly Bean). All of these changes are sure to increase the comfort quotient of the fence sitters and of those on the other side of the fence.  Aside from security, we also had recent major announcements from industry behemoths Google and Microsoft about extending their cloud platforms to provide IaaS functionality, to cater to customers that want more control, flexibility, and choice. All such shifts, improvements, additions, and new paradigms that we are seeing in the cloud space are whittling away the remaining unaddressed needs of customers: to have more control over how software functions and is managed in the cloud, to make it secure, and to keep it as close as possible to the kind of deployments they are used to.

The barrage of improvements, incremental, evolutionary, and in some cases disruptive to the regular way of doing things, will surely continue, and these improvements are very likely to become commodity features of the cloud over time. They are inching us closer to a world where cloud platforms will be truly enterprise ready and cloud computing will be used as pervasively as utilities such as electricity and water.  It is only a matter of time.

Tuesday, July 10, 2012

Are you prepared for a disaster? 

The recent outage experienced by AWS customers because of an act of nature has soured at least one of them, Whatsyourprice, so much that it has decided to pack its bags, jump off the AWS bandwagon, and go with its own homegrown disaster recovery solution. It just so happened that this particular customer got hit twice in the last month by AWS outages that lasted as long as 2 hours each time. That was it for this customer! In its line of business, that of setting up online dates between people, it couldn’t afford to have its clients miss out on meeting their possible soul mates because of the 99.95% reliability offered by AWS. It needs 100%, or at least five nines (99.999%), reliability if it is to stay in business in the tough world of dating. So, it decided to leave AWS and do something different for disaster recovery (DR).
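
To see what these reliability percentages mean in practice, here is a small sketch that converts an availability figure into the downtime it permits, and the two outages described above into the availability actually experienced. It assumes a 30-day month and treats "reliability" as simple uptime percentage; the function names are ours.

```python
HOURS_PER_MONTH = 30 * 24  # assumes a 30-day month


def allowed_downtime_minutes(availability_pct: float,
                             period_hours: float = HOURS_PER_MONTH) -> float:
    """Downtime (in minutes) permitted by an availability percentage over a period."""
    return (1 - availability_pct / 100) * period_hours * 60


def achieved_availability_pct(downtime_hours: float,
                              period_hours: float = HOURS_PER_MONTH) -> float:
    """Availability percentage actually achieved given total downtime in a period."""
    return (1 - downtime_hours / period_hours) * 100


for pct in (99.9, 99.95, 99.999):
    print(f"{pct}% allows {allowed_downtime_minutes(pct):.2f} minutes of downtime per month")

# Two outages of up to 2 hours each in one month, as described in the text.
observed = achieved_availability_pct(2 * 2)
print(f"Two 2-hour outages in a month: {observed:.2f}% availability")
```

Under these assumptions, 99.95% allows about 21.6 minutes of downtime a month and five nines allows under half a minute, while four hours of outage in a month works out to roughly 99.44%.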

So, what is this ex-AWS customer doing?  It is setting up its own servers at a provider’s data center using what is called colocation, where the data center vendor provides the power and cooling and does mundane tasks such as rebooting the servers when needed, monitoring their health, and other such odds and ends. It is now looking for a second colocation site for disaster recovery. Whatsyourprice believes that this architecture will give it more control over its destiny, in terms of a better disaster recovery solution than it got with AWS. On AWS, its internal IT staff was powerless to bring up backup virtual machines because the backup generator supplying power to these machines failed too. When bad things happen, they tend to happen in droves - ask Amazon. Interestingly, some bigger AWS customers such as Netflix, Okta, Heroku, and others were not as badly affected as Whatsyourprice. Was it because they designed their DR architecture better? Maybe. If indeed so, did it cost them more than Whatsyourprice spent on its DR, the one that didn’t work for it? Maybe. Or was it just bad luck that Whatsyourprice got hit twice in a month and, despite trying desperately, couldn’t do much to address it expeditiously the last time around because the backup generator’s failure affected it particularly? Maybe. Would Whatsyourprice be better off on its own as it expects? Maybe.

There are a lot of maybes above. This is because DR is not foolproof. This is especially so when the DR architecture and solutions are not designed well, thereby leaving much to luck. It is also true if you are unwilling to spend gobs of money to have your backup services in every data center around the world and do frequent replication of data to get a good Recovery Time Objective or RTO (the time within which service must be restored) and Recovery Point Objective or RPO (the point in time up to which data can be recovered). Notwithstanding the above, there are some basic dos and don'ts that should be followed to maximize your chances of escaping with just a scratch as against a gaping wound or, worse, being felled outright. The good news is that you can follow these dos and don'ts at a reasonable, affordable cost if done right. You can’t afford to ignore these, no pun intended.

So, in a nutshell, what are these dos and don'ts and how do you follow them smartly? Well, for one, don’t put all your eggs in one basket. In other words, don’t put all your compute and storage in one data center. Spread these across geographical areas that differ historically in the manner and timing of their battering by nature. Second, determine what your threshold RTO and RPO are and ensure that your backup policies and their execution are good enough to achieve them. Third, ensure that you choose vendor(s) that provide you much better than 3 nines reliability. The 99.9% or 99.95% reliability offered by many vendors shouldn’t cut it for you if your business can’t afford even 2 hours of downtime in a month. There are vendors, techniques, and DR architectures that provide better than 3 nines reliability at an affordable cost. Fourth, design your cloud operations in a way that minimizes costs while maximizing disaster recovery and business continuity. This can be done by using smart algorithms, methodologies, and techniques for instantiating, provisioning, replicating, and recovering data and images in case of a disaster. Look up technologies such as data deduplication, continuous replication, colocation, hybrid networks using on-premise, private, and public clouds, etc. Fifth, ensure that you do some good disaster recovery testing, not just in the lab but on production servers in a data center, extending the disaster scenario to an entire data center, and even beyond if it makes sense - for example, extend it to more than one data center if these are all in the same disaster prone area. You will be surprised at what you might discover when you go through the paces of simulating a disaster. Lastly, you need to pray for good luck. However, the good news with this last “do” is that by doing the other things indicated above, your prayers are much more likely to be answered than would be the case otherwise.
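
The second "do" above, checking your backup policy against your threshold RTO and RPO, can be sketched as a simple test. This is a minimal illustration under our own simplifying assumptions: the worst-case data loss equals the replication interval, and the restore time is whatever you measured in your last DR drill. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DrPlan:
    """A hypothetical DR setup described by two measured numbers."""
    replication_interval_min: float  # how often data is copied to the backup site
    restore_time_min: float          # measured time to bring service up at the backup site


def worst_case_data_loss_min(plan: DrPlan) -> float:
    """Simplifying assumption: everything since the last replication is lost."""
    return plan.replication_interval_min


def meets_objectives(plan: DrPlan, rpo_min: float, rto_min: float) -> bool:
    """True if the plan satisfies both the RPO and the RTO thresholds."""
    return (worst_case_data_loss_min(plan) <= rpo_min
            and plan.restore_time_min <= rto_min)


# Illustrative thresholds: lose at most 15 minutes of data, be back within 30 minutes.
RPO_MIN, RTO_MIN = 15, 30

nightly_backup = DrPlan(replication_interval_min=24 * 60, restore_time_min=180)
continuous_replication = DrPlan(replication_interval_min=5, restore_time_min=20)

print("nightly backup meets objectives:", meets_objectives(nightly_backup, RPO_MIN, RTO_MIN))
print("continuous replication meets objectives:",
      meets_objectives(continuous_replication, RPO_MIN, RTO_MIN))
```

The numbers are made up, but the exercise is the point: write down your RPO and RTO, measure what your current setup actually delivers, and compare.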

Tuesday, July 3, 2012

Google’s foray into IaaS space

The appetite for public infrastructure as a service (IaaS) among cloud providers is growing. Google’s (and Microsoft’s) recent foray into this hallowed field, led for years by Amazon, the big kahuna of IaaS cloud computing, with smaller players such as Rackspace, Terremark, and others bringing up the long tail, promises to be good for customers. As per early reports, it seems that Google’s public IaaS service, Google Compute Engine (GCE), provides much more capacity or performance for the same price than one gets from the other IaaS providers. As per a recent New York Times article, Google is offering computing at half the price of Amazon. The regular compute unit from Google is one thread, i.e. half of a Sandy Bridge core, and is roughly equivalent to Amazon’s basic core unit, a single thread of a Xeon server.  However, Sandy Bridge is a more powerful 2011 chip. Early results show that Google Compute Engine delivers impressive results in terms of CPU power and reduced connection failures, better than other IaaS providers.  GCE does seem to have come in with a bang in terms of its offerings of capacity, performance, and price vis-à-vis its competitors.

Besides the improvements in capacity, performance, or price, what else does Google bring to the public IaaS world?  A number of things. For one, GCE has many data centers with excellent connectivity between them, and all traffic between virtual machines, regardless of their locations, is encrypted. This should be comforting to enterprises concerned about security and reliability. For a customer to get the same kind of capability from another public IaaS cloud provider requires a more involved setup and complex pricing. Second, launching a server in GCE is quite fast, relatively speaking of course.  For those that have twiddled their thumbs in frustration for several minutes waiting for a server to launch on their current IaaS platform, this is good news. Launching a server takes just around a minute on GCE, at least twice as fast as on other IaaS platforms. There is also the promise of other goodies to come. Google brings a wealth of experience from running its search engine and its other public service platforms such as Gmail, Google Docs, Google Maps, etc.  Its ground breaking ideas such as MapReduce and the Google File System have been learnt by others through technical papers written by Google engineers. Google has so far been using its software and tools internally for its various services. Now that it is in the public IaaS space, its innovation and advanced tools are likely to be offered to its IaaS customers to help them architect and manage their services on GCE.  It is also likely that GCE will be integrated with Google's service offerings such as Search, Gmail, and others. This will increase GCE's value further. Also, the fact that Google is a software company is apparent in the quality of the APIs in its SDK for GCE. The APIs have earned kudos for being designed well, making it easier to provision the security aspects of a customer’s deployment on GCE. The same cannot be said with a straight face about the other IaaS cloud providers.

The above aspects do not imply that Google has it easy; far from it.  Google is far behind Amazon in several respects. Amazon has been a leader in the public IaaS space for many years now. Over these years, it has learnt how to service all types of customers, big and small, from new startups to mature enterprises, each with its own set of requirements.  Amazon’s IaaS, Amazon Web Services or AWS for short, provides a large array of infrastructure services, around 25 and counting, to cater to the diverse needs of its large set of customers. One example of an area where GCE lags AWS is disaster recovery: GCE's handling of it leaves much to be desired when compared with the AWS services that minimize downtime for customers, the recent outage at AWS notwithstanding.  Also, the granularity at which GCE’s CPU/RAM/disk can be provisioned works well at the low end but becomes more expensive for customers at the high end.  GCE is also limited in its support of OSs and related technologies and applications.  It supports only CentOS and Ubuntu distributions currently; no RedHat, no Debian, no Fedora, and no Windows. Amazon is far ahead in this area, supporting nine OSs including Windows.  It also supports a large set of databases and application servers, thus catering well to a large clientele.

Microsoft’s recent IaaS announcement is equally interesting for the market.  Microsoft has created significant ripples by providing a compelling IaaS platform that supports open source Linux distributions and applications like WordPress, Apache, MySQL, PHP, etc. All of a sudden Azure has started looking more interesting to a large set of customers that want to run legacy applications in the cloud or that like open source software.  We will analyze Microsoft’s foray into IaaS in another blog.

So, where is the customer with all this?  In a good position in many ways, no question about it. Why?  Because the competition in the IaaS space has all of a sudden jumped up several notches. Google and Microsoft are giants with deep pockets. They have rich experience in building software and providing public cloud services. They can bring their experience and software ecosystems to bear on the IaaS market.  Their foray into the IaaS area will create more choices for the consumer and bring forth advancements that until now were the exclusive domain and privilege of Amazon and, to a lesser extent, of some smaller players.  Who won when Chrome, Firefox, and other browsers came along many years back to compete with IE, the undisputed leader in browsers at the time? The customers! The arena of public IaaS cloud platforms seems similarly positioned to deliver benefits to its customers.
Watch out folks!  The public IaaS cloud space has just become a lot more interesting.

Tuesday, June 26, 2012

The Programmable Network 

The world is changing.  We have made rapid progress in hardware innovation and software power.  Hardware is shrinking in size and software is becoming increasingly capable of manipulating it efficiently to achieve maximal gains.  Just look at the state of hardware a decade ago and compare it with what it is now.  Now we have data centers with thousands of racks, each rack comprising a large number of machines. This was not possible earlier.  The machines were bigger and less powerful then. They needed more power and cooling than they do now. Having thousands of such machines in a data center was the exception rather than the rule.  With much progress having been made in shrinking machine size, in improving cooling systems, and in generating and using low cost power efficiently, we can now pack a lot more machines into a much tighter space than we could before. The progress continues. Microservers are on their way. These are much smaller machines that take up a lot less space and energy while providing much more power and performance in the aggregate than the machines in data centers today. HP made a recent announcement about providing microservers using Intel's Centerton chip. The train with respect to such machines has left the station, so to speak. With microservers, as HP said, one can save 94% of the space on a rack. This translates to being able to pack roughly 17 times as many machines in a rack as one can with standard 1U systems.
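
The relationship between a space saving and the number of machines that fit in a rack is simple arithmetic, sketched below. It assumes that rack density scales inversely with the space each machine occupies, which ignores power, cooling, and cabling limits; the function names are ours.

```python
def density_multiplier(space_saving: float) -> float:
    """How many times more machines fit if each one takes (1 - space_saving) of the space."""
    return 1.0 / (1.0 - space_saving)


def saving_for_multiplier(multiplier: float) -> float:
    """Space saving per machine needed to fit `multiplier` times as many machines."""
    return 1.0 - 1.0 / multiplier


print(f"94% space saving -> {density_multiplier(0.94):.1f}x machines per rack")
print(f"20x machines per rack -> {saving_for_multiplier(20):.0%} space saving needed")
```

Taking the 94% figure at face value gives a density gain of about 16.7 times; a 20 times gain, by comparison, would need a 95% saving.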

How do tens of thousands of machines, err... excuse us, an order of magnitude more machines if you think "virtual", packed in a data center communicate with each other and the outside world? You would say - over networks stupid! and you would be right. However, are these networks and their administration efficient or even capable when stressed, of handling such a large number of machines running different kinds of services and applications and serving hundreds of millions of customers in as agile a manner as customers have come to expect from the cloud? Clearly, the cloud data centers of today are very demanding of their networks and they will be even more so tomorrow. These networks need to support rapid virtual machine movement from one physical machine to another when needed for ensuring reliability, performance, and energy savings, and they need to support a large amount of traffic while protecting themselves and their hosted software adequately.  With the use of various kinds of services and applications such as email, web, commerce, collaboration, social network, analytics, etc. on the network, and the volatile traffic due to them that drives the elastic behavior of the data center, the demands on the network to be efficient and optimal in serving such services and applications on hundreds of thousands of machines are severe.

The above situation mandates that the network in a data center can’t be static anymore.  It needs to be dynamic, agile, and adaptive. Networks where each router, switch, access point, and other piece of network gear needs to be programmed individually and in isolation, without a broad context and an overall policy governing its functioning and that of the network as a whole, are going to be slow to adapt to quick changes. Such networks will be obsolete soon.  Their continued use over time is very likely to make a network administrator pull his hair out in frustration at being stuck in the dark ages in an era of rapid progress all around. Clearly we can't have this unless we want the network to be a bottleneck to agility.  In cloud data centers of today, there is a need for moving large amounts of traffic east-west to handle the demands of the hosted services and applications, for doing rapid provisioning and migration of VMs on physical machines, for ensuring bandwidth efficient movement of data to thousands of machines for big data processing, for supporting various kinds of services and applications with differing requirements in terms of bandwidth, QoS, and performance, and last but certainly not least, for ensuring tight security, isolation, privacy, and compliance of software and data. To do all this, we need rapid programmability of the network “as an entity”, as against the slow, piecemeal, and limited programming of each switch, router, and other network gear using proprietary APIs and configuration tools. We need smart software that uses this programmability to adapt the network to the need of the hour or, more like, the need of the second. What we need is software defined networking (SDN).

So what exactly is SDN? SDN is an emerging area within networking that is rapidly taking hold with network vendors. In its expansive form, it covers within its scope the various features championed by several industry efforts currently underway such as OpenFlow, SDN as defined separately from OpenFlow by some, Cisco Open Network Environment (Cisco ONE), etc.  All network vendors worth their salt are moving to SDN in some way, shape, form, or manner. With SDN, you get open protocols and APIs supported by switches, routers, virtual network devices, edge devices, and access points that enable us to program them easily, thus allowing for quick adaptation and morphing of a network on the fly. Over time, firewalls, load balancers, gateways, and other network gear will come under its purview. SDN ushers in an era of white box networking that allows for intelligent manipulation of a network by smart applications. Such ability allows sub-networks to be formed and terminated quickly when needed, sub-networks that are isolated from other networks, well protected from attackers, provisioned with the necessary bandwidth, and adaptive to the changing needs of the services and applications hosted on them. In some ways it is like different social or conferencing groups getting formed and terminated quickly within the context of a larger social networking group with assistance from the underlying platform. With SDN, the programming of networks becomes much more powerful, quicker, and easier, leading to the emergence of intelligent networks.
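
To make the idea of programming the network "as an entity" a little more concrete, here is a toy sketch of a central controller pushing one policy to every switch it manages. It is not OpenFlow or any vendor's API; the class names (FlowRule, Switch, Controller), the header fields, and the action strings are all made up for illustration. The only idea borrowed from SDN is the split between a controller that decides and switches that match packets against installed rules.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FlowRule:
    priority: int            # higher priority rules are checked first
    match: Dict[str, str]    # header fields that must all match
    action: str              # illustrative action string, e.g. "drop"


@dataclass
class Switch:
    name: str
    table: List[FlowRule] = field(default_factory=list)

    def install(self, rule: FlowRule) -> None:
        # Keep the table ordered so lookup can stop at the first match.
        self.table.append(rule)
        self.table.sort(key=lambda r: r.priority, reverse=True)

    def handle(self, packet: Dict[str, str]) -> str:
        for rule in self.table:
            if all(packet.get(k) == v for k, v in rule.match.items()):
                return rule.action
        # No rule matched: ask the controller what to do.
        return "send_to_controller"


class Controller:
    """Holds the network-wide view and programs every switch at once."""

    def __init__(self, switches):
        self.switches = list(switches)

    def apply_policy(self, rule: FlowRule) -> None:
        for switch in self.switches:
            switch.install(rule)


edge, core = Switch("edge-1"), Switch("core-1")
controller = Controller([edge, core])
controller.apply_policy(FlowRule(10, {"tenant": "a"}, "forward:2"))
controller.apply_policy(FlowRule(100, {"tenant": "a", "dst_port": "23"}, "drop"))
```

One call to apply_policy changes the behavior of every switch, which is the contrast with configuring each device by hand.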

Whether you have a public, private, or hybrid cloud or an on-premise network, you need it to be dynamic and agile in meeting the needs of the applications and services running on it with regard to QoS, security, isolation, flows, reliability, and fault-tolerance. You need to have the power to control every aspect of it individually and in concert with other aspects to make this possible. You need software defined networking.

Tuesday, June 19, 2012

Doing big things with Big Data  

What exactly is Big Data and what big things can be done with it?

The above question is very relevant nowadays when there is so much talk of analyzing and benefiting from the huge amounts of data that are available to us about products, services, people, resources, and whatever else impacts us in our daily lives. Big Data is the name given to highly scalable distributed parallel processing of large amounts of structured and unstructured data using a vast array of compute resources, generally commodity hardware provided by the cloud, through a “map and reduce” paradigm of software programming. The data that impacts us and needs to be processed for valuable insights is all around us.  It can be collected from multiple sources and stored in fast key-value storehouses that go by the name of NoSQL databases, and then processed rapidly by a large number of machines.  In short, with Big Data, you achieve high velocity of processing of a large volume of varied data gathered from a variety of sources. And the great thing about all this is that it can be done at a reasonable cost to make it practical for the average company out there.  Because you now have the ability to scale hugely and process stuff quickly at a reasonable cost, you can tackle large data sets and not just samples as was the case before. You are therefore able to see trends, patterns, connections, and relationships in data that were not visible earlier. Also, outliers in huge data sets provide significant insights that are not possible to get from samples. With Big Data, 2 + 2 = 3.9 is good enough because absolute precision is not as important as a general understanding of the dynamics, behaviors, trends, and undercurrents that affect us. Such an understanding leads to 2 + 2 = 5 more easily.  Processing vast data sets assimilated from different sources comprehensively and quickly provides much more meaningful understanding of vital business aspects and touch-points. It transforms the area of business analytics dramatically, making it much more effective. 
Being able to do huge amounts of processing of large amounts of data in real-time or close to it makes a business alert and nimble in responding to rapidly changing situations enabling it to avail of opportunities for growth that such situations present.
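
For readers who have not seen the "map and reduce" paradigm in action, here is a minimal single-machine sketch of it using the classic word count. The function names are ours, and nothing here is Hadoop's API. A real framework would run the map and reduce steps on many machines in parallel and do the grouping step over the network, but the shape of the computation is the same.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    # Emit a (key, value) pair for every word in one document.
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    # Group all values by key, as the framework would do between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    # Collapse each key's list of values into a single result.
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents):
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    return reduce_phase(shuffle(mapped))


counts = word_count(["big data big insights", "data everywhere"])
```

Because each document is mapped independently and each key is reduced independently, both phases can be spread across as many machines as are available.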

So, what are some examples of the power of Big Data?  Here are a few - you can sort 1 terabyte of data in 1 minute using around 1000 machines as Yahoo showed a few years back, or do even better than this as demonstrated by Microsoft recently. If you are Google or Microsoft, your search engines can do indexing and run large scale instant analytics by doing distributed processing on tens of thousands of machines moving tens of petabytes of data every day.  If you are a small business, you can do mainframe work with just a credit card at a tiny fraction of the cost of a mainframe using Hadoop offerings from any of a variety of vendors such as Amazon, Microsoft, Cloudera, Hortonworks, and EMC, to name a few. So on and so forth.  The above is just a tiny sample of the tremendous power made available to you through Big Data.

Could a company have done the above easily a few years back? And at as reasonable a cost as is the case these days? The answer to both questions is a resounding NO. The world has changed much from a few years back. Nowadays, the ability to corral up a large number of commodity machines in the cloud to process a huge amount of varied structured and unstructured data collected from various sources and stored in fast databases quickly and at a reasonable cost is very practical.  The average person has the power of a mainframe in his hand to use as he sees fit without losing his shirt in the process.  This tremendous power is opening up new vistas for advancement and growth for businesses in ways not imagined before.

Unless you have been visiting another planet for the past few years and just got back to Earth, you have very likely been impacted in some ways by Big Data. It is all around us and gaining ground by the day.  Big companies use it.  Small and medium sized businesses are getting into it.  This world of Big Data is transforming us from an "information economy" to an "intelligent economy".  If you are new to this world, you have to ask yourself the question - Do I want to be in with the times and avail of the exciting prospects it offers?  If the answer is yes, you need to get on the Big Data bandwagon.  Check it out, understand it well, and use it to advance your understanding of your clients, products, suppliers, partners, or whoever and whatever your business deals with in a much more comprehensive, insightful, and impactful manner.  This is sure to help you grow your business much more quickly than before.

Tuesday, June 12, 2012

The enthrallment and travails of ubiquity

Google’s Android has reached 900,000 activations daily.  Awesome achievement!  However, is it good news or bad?  The answer to this depends on who you are asking and what their leanings are.  Do they look at stuff in isolation or do they look at it holistically?  Do they work for Google or for Google's competitors such as Microsoft and Apple?  Are they iOS fans that keep Android a mile away or Android app developers that love the ease with which they can publish their apps?  Are they consumers that want to be extremely productive in their use of their Android devices or security vendors salivating at the opportunities that a large installed base brings to those that know the platform better than others?

While 900,000 activations daily is a stupendous achievement by any measure, one for Google to be proud of and for competitors to envy, what does it portend?  One aspect of course is that, given the already large installed base of Android, such a high adoption rate will soon, if it has not already, bring the advantages of ubiquity to Android, one that lets you standardize and innovate on the platform regardless of whether you are developing, supporting, or using the OS.  The other aspect is the disadvantage to the same people, if the current security situation with Android continues, of being taken back in time to when security issues were rife with another ubiquitous platform - Windows.  There are other advantages and disadvantages too based on your particular context vis-a-vis Android but this blog is not about them.

Let us take a closer look at just the security aspect.  With so many Android versions out there, it is a fragmented market.  With Google just learning the ropes on securing a platform, relative to Microsoft’s long experience with and in some ways deep understanding of the problem, what should we expect?  One fear clearly visible to people well versed in security is that unless Google takes some concrete steps with respect to securing its platform and the apps that run on it, we could again experience the wild wild west situation that we only recently got over with Windows. The buzz indicates this - 100 malicious apps yanked from Android store last year, Android - the most malware targeted smartphone OS, no good vetting of apps prior to installing on Google Play, dramatic growth in Android malware from roughly 400 samples in June to over 13,000 samples by the end of 2011, smartphone manufacturers such as HTC and ZTE introducing vulnerabilities and backdoors into the OS as they specialize it for their phones, pronouncements from Neal O’Farrell, executive director of the non-profit Identity Theft Council, stating that 80% of mobile banking apps have security flaws, and so on, so forth.

Because the huge installed base of Android is growing rapidly by the day, it is already in the rear view mirror of Windows.  One can therefore be sure that malware proponents will increasingly target it over time as they shift their attention to it. If the platform and its apps are not secured better, we are going to have numerous security incidents in the future which will cause much heartache for many.  Currently, the use of smartphones and tablets for business and financial activity is limited but it is likely to grow as these devices become more powerful and capable, rivaling desktops and laptops in everyday use. Unless Google takes concrete and pointed action to address the various security issues with Android and the apps that run on it, the security industry is going to try to fill the wide void left by it. Such security from outside is not likely to be as good as one provided organically by the architects of the OS and those directly vested in the platform's growth. This would of course benefit both iOS and Windows Phone. However, their market shares, especially the latter’s, pale in comparison to Android’s. Because of its large installed base, Android's users are going to suffer pain until Google sets it and its apps’ security on the right course. Google needs to start thinking about them more.

Tuesday, June 5, 2012

Secure cloud software

What does it take to build secure cloud software?   This is a question that can either take a thick book to answer or it can be answered relatively quickly.  It all depends upon the level of detail you are seeking and your depth of understanding of security.  We will give you the short of it - there are some basic principles one starts from, ends at, and uses in between, during the life-cycle of software development and its operations to build strong, secure foundations;  and on top of these, one uses some smart ways of creating hugely secure and compliant systems.  Knowing these principles and using them intelligently to build the foundations and evolved designs on top will get you tight security and compliance in the cloud.  Gone would be your anxiety, stress, and concerns about cloud security and consequently your missing out on the rich opportunities that cloud offers. 

So what are these principles and smart ways of building secure cloud software?   Put simply, the answer is – knowledge, understanding, and application of fundamentals of secure software development, designing of secure cloud architectures and management techniques through components, methodologies, and tools available today, and building of an integrated security framework through intelligent collaboration between various security entities that are involved.  To realize the above, all principals involved in building and managing cloud software should be made accountable for the same.  These principals include planners, architects, designers, implementers, cloud service brokers, and cloud vendors handling the data centers involved.  There should be good collaboration between these people at every stage in the development, deployment, and management of cloud software in order for you to realize air-tight security and compliance.  

You should set a  goal for yourself to build an intelligent, dynamic, and active protection system that is highly protective and resilient in all situations with minimal human intervention.  Is it a tall order?  No, it is not!   What is required to achieve it is to do the “basics” well and then follow that up with the “advanced and evolved”.   Both are an absolute must. 

So what are the basics?  The basics include handling well-known security aspects that have been discussed and thrashed out in detail in the security industry over the last decade.  These include security vulnerabilities and attacks such as buffer overflows, cross-site scripting, various kinds of injections, denial of service, man-in-the-middle, etc.; protection techniques like authentication, authorization, accounting, encryption, obfuscation, service hardening, privilege pruning, white lists, black lists, roles, isolation, DEP, etc.; protection mechanisms such as filters, firewalls, intrusion detection, anti-malware processing, honeypots, network access protection, and the like; and relatively newer threats due to social networking, coordinated command and control centers, data leaks, bring your own devices to work, and so on, so forth.  And what are the advanced and evolved?  The advanced and evolved include devising smart integration between various software and hardware protection mechanisms, using appropriate networking protocols, services, and models of operation and devising smart solutions through them for providing enhanced security, building resiliency in the software and infrastructure supporting it, and doing risk based assessments and dynamic tuning continually.  
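
As one small illustration of doing the basics well, here is a sketch of the standard defense against SQL injection, one of the "various kinds of injections" mentioned above. It uses Python's built-in sqlite3 module with an in-memory database. The table, the data, and the function names are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "admin"), ("bob", "user")],
)


def find_user_unsafe(name):
    # Vulnerable: attacker-controlled text becomes part of the SQL itself.
    query = f"SELECT name, role FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_user_safe(name):
    # Parameterized query: the driver passes the value separately,
    # so it is always treated as data and never as SQL.
    query = "SELECT name, role FROM users WHERE name = ?"
    return conn.execute(query, (name,)).fetchall()


payload = "nobody' OR '1'='1"
leaked = find_user_unsafe(payload)   # the injected OR clause matches every row
blocked = find_user_safe(payload)    # no user has this literal name
```

The same habit, keeping untrusted input strictly separate from code, carries over to the other injection and scripting attacks in the list.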

Do the above seem a handful and sound intimidating to you?  Relax!  They are not as much of a handful or as intimidating as they look.  You can achieve the above without breaking a sweat by taking the right steps early on in your software development and deployment, in your data center infrastructure designs, and in selecting and negotiating with your cloud vendor(s).  These steps include:  providing good education and awareness on secure software development and operations to the people involved with the same, devising strict prescriptive policies and ensuring that these people adhere to them, ensuring that there is a culture of smooth cooperation and collaboration between all parties involved, and providing the proper development and deployment processes, methodologies, and tools for use by all. If you do the above, you will be amply rewarded.  You will be able to take advantage of the cloud and all it offers without any trepidation or fear about its security.  New vistas for growth will open up for you.

The time to gird up and start working towards secure cloud software is NOW.


Tuesday, May 29, 2012

Security - what does the cloud provider give you?

With regard to moving their IT operations to the cloud, many companies rate security as their top-most concern.  This is due to good reason.  Cloud computing is a new phenomenon and so some teething issues are to be expected.  It will take a bit of time before security concerns with cloud computing get addressed appropriately.  However, if statistics are to be believed, many companies might just be better off even now in terms of security if they used a reputed PaaS provider’s cloud as against hosting their software on-premise.  This is because a reputed PaaS provider is far more likely to address security issues more expeditiously than a company hosting its software on-premise.  This includes upgrading to the latest released versions of security packages, enhancements, and bug fixes as and when these come out.
Here is a statistic worth noting (courtesy -
According to Trustworthy Internet Movement's SSL Pulse project, fewer than two percent of the Internet's top 200,000 HTTPS-enabled websites support TLS 1.1 or 1.2, the latest versions of the protocol.

The vast majority of websites still support SSL 3.0, the precursor of TLS, and TLS version 1.0, which was designed in 1999. Over 30 percent of them still support SSL 2.0, the first publicly available and most insecure version of the protocol.
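
To show what "supporting only the newer protocol versions" looks like in practice, here is a small sketch using Python's standard ssl module (assuming Python 3.7 or later, where SSLContext.minimum_version is available). It builds a client-side context that refuses anything older than TLS 1.2, the newest version named in the statistic above. The helper name is ours, and a server-side setup would need its own certificate configuration, which is not shown.

```python
import ssl


def make_client_context() -> ssl.SSLContext:
    # Start from the library defaults, which already verify certificates
    # and check host names.
    context = ssl.create_default_context()
    # Refuse SSL 2.0, SSL 3.0, TLS 1.0, and TLS 1.1 outright.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


context = make_client_context()
```

A change like this is a one-liner in code. The slow part, as the statistic suggests, is getting it rolled out across real deployments.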

A pair of security researchers has recently proposed TACK, short for Trust Assertions for Certificate Keys, for consideration to the Internet Engineering Task Force (IETF), the body in charge of TLS.  TACK is an extension to TLS that enables websites to assert the authenticity of the certificate used in TLS.  TACK tries to resolve the trust-related problems with the public key infrastructure that arise due to the use of fraudulent certificates.  This serious issue was highlighted by last year's security breaches at certificate authorities (CAs) Comodo and Diginotar.

Given companies' excruciatingly slow rate of upgrading the existing SSL/TLS infrastructure, how soon do you think such companies will adopt TACK once it is approved and available?  If we have to bet, it won't be quick. In fact, going by past track record of the large majority of companies, it could be very long.  On the other hand, how long do you think a reputed PaaS cloud provider would take to upgrade its SSL/TLS infrastructure?   We believe it would be much quicker than companies upgrading their on-premise software.  It after all has its reputation at stake and competitors breathing down its neck to take business away from it.

So, going by the above example, who fares better at providing security for protecting your resources – internal data center staff or a reputed cloud provider vigilant about providing a secure platform for its customers?  But wait, is it really that simple to answer this question?  What about aspects of security such as confidentiality, integrity, and availability?  How does a PaaS cloud provider fare in comparison to on-premise staff in these other areas?  How do we analyze security holistically to get to a good answer to the above question?  Food for thought!

Tuesday, May 22, 2012

Improving of the foundational technologies….

We are fast moving to a world where what we have today will soon seem outdated.  Just look at what is on the horizon for everyday use - hugely powerful machines with a large number of cores that are commodity hardware, increasingly capable smartphones that pack the punch of today’s powerful desktops, super high bandwidth networks, for example gigabit wireless and 1000-gigabit Ethernet, that provide blazing fast speeds, high speed storage using SSDs that provide an order of magnitude higher performance and substantially greater reliability than the fastest disks available today, and if this is not enough, the ability for an ordinary person to get a supercomputer at his/her fingertips quickly and cheaply through the cloud. Some of these improvements are available today, albeit at a cost that makes them more an exception than the rule.  Some of these are expected in the near future.
The world is changing, and rapidly.  Can you imagine how much more we will be able to accomplish in our everyday lives because of the significant advancements in the above fundamental enabling technologies?  Food for thought!  Of course, for us to be able to leverage these significant improvements, the software that can make use of these needs to advance in concert with them. If past history is any indication, it will. We can already see this  happening with the onset and increasing popularity of map-reduce enabled Big Data business intelligence that makes use of large clusters of commodity hardware to put the power of mainframes in your hands, in the increasingly powerful hypervisors and management tools that go hand in hand to enable hosting and management of a large number of virtual machines on a single physical machine to wrest the last ounce of capacity from it, in the de-duplication  and compression techniques that reduce storage and network costs significantly which otherwise would delay the adoption of these improved technologies, in databases that use fast expensive storage preferentially for providing faster performance for critical data, so on, and so forth.  In fact, if one digs deep into the aforementioned improvements in foundational technologies, one would find that software is a contributory factor for some – for example improved protocols, frame-formats, and channel bonding contribute to increasing speeds of wireless networking, and map-reduce programming is responsible for taking advantage of clusters of machines for providing supercomputing at an affordable cost.

Given the above advancements, who do you think is well placed to adopt them expeditiously and bring them to you?  We believe it is the cloud providers of the public clouds and the smart, knowledgeable, tech savvy cloud managers of your private clouds. These people because of their expertise in the above foundational technologies that form the bulwark of cloud computing, their ability to wrest economies of scale from the large volumes of everything they deal with, and their need to excel at providing the best resources and services to their customers in order to outmatch their competitors, are uniquely positioned and motivated to leverage the above advancements maximally and in the most cost efficient manner.  By making use of these advancements, these providers and managers can improve their services in terms of performance, scalability, reliability, and efficiency and therefore grow their customer base and businesses. Because they are motivated, it bodes well for us, their customers.

The new world where these advancements will be in commonplace use will be upon us soon.  It is going to be breathtaking!

Tuesday, May 15, 2012

Cloud world

We are living in a cloud world.  There are clouds galore with each cloud having something different to offer.   Let us look at the various types of clouds that exist currently.  

There are the regular, by now probably well-known, types of cloud that are based on the number of levels of the software stack that are provided as a service.  These are - IaaS (Infrastructure as a Service), PaaS (Platform as a Service), and SaaS (Software as a Service) clouds. Then there are the cloud types that indicate the level of sharing (or lack of) with other tenants such as public, private, hybrid, and community clouds.  Then there are the technology related cloud types.  These include services such as DaaS (Desktop as a Service), SECaaS (Security as a Service, sometimes also abbreviated SaaS), StaaS (Storage as a Service), NaaS (Network as a Service), and so on, so forth.  Then there are also what one might call, for lack of a better phrase, "Cloud covering clouds" such as "Virtualized" clouds and "Converged" clouds.  We can go on but let us just stop here.  This is a big enough list of basic cloud types to grok for now.  

When looking at cloud types, one should note that various cloud types can fit under (or span) other types of clouds, sometimes several of them.  For instance, the StaaS or DaaS clouds could be virtualized public clouds, a hybrid cloud could be a mix of IaaS, SaaS, PaaS, public, and private clouds, a virtualized cloud could use IaaS and PaaS clouds, and a converged cloud could have various cloud types under the covers.  There are more such examples. The question is - Should each such cloud that spans multiple basic cloud types be given a more descriptive name?  Also what more categories can we expect in the future?  This is food for thought!  We won't be surprised if more names and types start cropping up in the future.

Like their namesakes up in the sky, no two clouds are alike. Each cloud type is provided by multiple vendors and the list is growing by the day.  The offering(s) of each vendor differs from the next guy in some way, shape, or form.  Finding the right cloud type or the right mix of cloud types, and the right providers for the same, for the short and long term, in order to maximize one's return on investment is a hard and complex task.  Some considerations to keep in mind would be - how to avoid vendor lock-in, how to get the best service across a cross section of clouds, and how to be cost efficient while maximizing growth, agility, and value to one's customers.  Determining the right strategy and road-map is of paramount importance.  It can spell the difference between those that just run and those that leap ahead and leave others behind.

Tuesday, May 8, 2012

Dawn of a new era

Cloud computing, mobility, social networking, the large continuing influx of small form factor personal devices, and big data based business intelligence are together changing the landscape of computing. These are opening new vistas for growth and progress. Those that leverage these unshakable trends will lead and succeed in the new era of computing that is emerging and taking over the world.  Spread your wings and soar onto the new horizons that the above changes are heralding.  You will never look back.

Tuesday, May 1, 2012

Avoid the latency hit by using the right network gear and practices

When moving your operations to the cloud, are you prepared for a latency hit?  Unless you ensure that the network pipes to the cloud data center are sufficiently provisioned to handle the traffic that your cloud applications generate, that the cloud provider has the proper network gear that uses these pipes efficiently, and that there is a good sync up strategy between the data centers that host your applications, you could be facing significant performance and consistency issues.  These issues could be due to the pipes getting swamped with data or because they are flaky enough to be unable to support long data intensive transactions in an uninterrupted manner.  These could also be because you cannot get to the latest data on time due to slow replication between the provider’s data centers.  Remote offices with small pipes to the ISP as well as those that are geographically far from the nearest data center hosting their applications and data would be especially vulnerable.
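
A back-of-the-envelope estimate helps show why small pipes and distant data centers hurt. The sketch below uses a deliberately simplified model that we are assuming for illustration: transfer time is the payload size divided by link bandwidth, plus one round-trip delay per request. It ignores protocol overhead, congestion, and retries, so real numbers will be worse. The function name and the sample figures are ours.

```python
def transfer_seconds(size_mb: float, link_mbps: float,
                     rtt_ms: float, round_trips: int = 1) -> float:
    """Rough time to move `size_mb` megabytes over a `link_mbps` link.

    Simplified model: serialization time plus round-trip delays only.
    """
    if link_mbps <= 0:
        raise ValueError("link speed must be positive")
    serialization = (size_mb * 8) / link_mbps      # megabits / (megabits per second)
    waiting = round_trips * rtt_ms / 1000.0        # milliseconds to seconds
    return serialization + waiting


# A remote office on a 10 Mbps line, far from the data center.
remote_office = transfer_seconds(size_mb=100, link_mbps=10, rtt_ms=80)

# A well-connected office on a 100 Mbps line, close to the data center.
main_office = transfer_seconds(size_mb=100, link_mbps=100, rtt_ms=20)
```

Under these assumed figures the remote office waits roughly ten times longer for the same 100 MB, and a chatty application that needs many round trips per transaction makes the distance penalty grow further.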

When planning for the cloud, you should design the network support infrastructure for your offices properly and should ensure that the provider has done the same.  Also, the provider should establish  good sync up and backup practices between its data centers. The above can be done by ensuring that the provider and your on-premise network have devices such as WAN optimization controllers for compressing WAN traffic, proxy caches for caching information for repeated local retrievals, de-duplication devices for reducing volume of data that needs to be copied, and devices for other such optimizations.  Also, replication between data centers that serve you and the backup processes should be demonstrably quick.  Smart use of the proper network gear and establishment of good operational practices will not only boost performance for your applications and services but also reduce your costs because you would be sending less data across the pipes.
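
The effect of de-duplication and compression on the volume of data crossing the pipe can be sketched in a few lines. This is a toy model, not how any particular WAN optimization product works. It assumes the data is already split into chunks, identifies chunks by their SHA-256 hash, skips any chunk the remote side already holds, and compresses the rest with zlib. The function name and sample data are ours.

```python
import hashlib
import zlib


def plan_transfer(chunks, already_at_remote=()):
    """Return (to_send, raw_bytes, sent_bytes) for a list of byte chunks.

    to_send holds (digest, compressed_chunk) pairs for chunks the remote
    side does not have yet. Duplicates are sent only once.
    """
    seen = set(already_at_remote)
    to_send = []
    raw_bytes = 0
    for chunk in chunks:
        raw_bytes += len(chunk)
        digest = hashlib.sha256(chunk).hexdigest()
        if digest in seen:
            continue                      # de-duplication: skip known content
        seen.add(digest)
        to_send.append((digest, zlib.compress(chunk)))   # compression
    sent_bytes = sum(len(data) for _, data in to_send)
    return to_send, raw_bytes, sent_bytes


chunks = [
    b"report header " * 200,
    b"report header " * 200,       # exact duplicate of the first chunk
    b"quarterly numbers " * 150,
]
to_send, raw_bytes, sent_bytes = plan_transfer(chunks)
```

Here only two of the three chunks need to travel, and each travels in compressed form, which is the cost saving described above.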

Good network design and smart operations at both the client and provider ends are very important.  Ignoring these will cause grief when slow, clogged connections and retries of some application/service task due to flaky, short-lived, or oversubscribed connections cause you to pull your hair out.  Don’t let that happen to you.