A familiar blinking cursor at the MATLAB command prompt greets me as I look away from the grey clouds overhead - the kind that carries rain/snow. It has been nice to receive our fair share of precipitation in California this year.
A more interesting cloud is the buzzword that the software world frequently uses as a catch-all term, a fuzzy inkblot, that means different things to different people based on their personal experience, context, skills, etc.
In this post, I offer my perspective in an attempt to demystify the technology and view it through the eyes of a MATLAB developer trying to leverage advances in cloud-based technology.
To build up our understanding of the cloud, it is useful to discuss the technology that powers it by broadly classifying it into private and public cloud offerings. A private cloud is a software stack that is installed within your computer network, either on-premises or within a private datacenter. The public cloud, on the other hand, consists of resources that are managed by vendors who offer them to the paying public as a service.
This raises the question - What does this software stack, private or public, actually do? That is an astute question and one that can only be answered as... "hmmm, it depends".
Offering X as a Service (where X = it doesn't matter)
At the most basic level, virtualization software allows users to provision computers and network them together to build the necessary computational, storage and networking infrastructure to solve problems. Not unlike tinker toys or building bricks - just a software version of it. Even at this level, the analogy to a MATLAB user should be evident.
MathWorks' large portfolio of toolboxes, blocksets and technologies offers a strong analogy. We are the hardware store. We don't sell kitchens but offer all the lumber, tools, services and practically everything you need, under one roof, to build the kitchen of your dreams.
The real value of leveraging cloud offerings is that it enables IT organizations to support engineers and scientists to do whatever they usually do, often on the cutting edge of science and engineering. These virtualized resources are available in a large variety of configurations, power and specialized capabilities (like GPUs).
This access is most frequently referred to as offering the infrastructure as a service (IaaS). If these resources come with pre-installed software then they are typically ready to use out of the box with a few clicks. This is often referred to as offering the platform as a service (PaaS). And finally, it is possible to hide the infrastructure and offer only the software installed therein as a service (SaaS).
The implications for a MATLAB user
Now, from MATLAB's perspective, all of these virtualized resources are indistinguishable from the computers and networks that currently run MathWorks products like MATLAB, MATLAB Distributed Computing Server, MATLAB Production Server, etc. Or more plainly, if you are looking to leverage cloud infrastructure (IaaS) then our products can and will fit.
One ramification of having all these compute, storage and networking resources available at a click is that it is possible to scale upwards or downwards quickly and easily.
Alas, the word scale is often as fuzzy, overused and ambiguous as the word cloud. From a MATLAB developer's perspective, most problems scale in one or more of 3 dimensions: concurrent access, compute and data.
Most common problems can be broken down along these axes, and tackled with one or more software stack(s) of tools.
Scaling MATLAB for Concurrent access
MATLAB Production Server is designed to handle concurrent loads. It is benchmarked and tested to handle many thousands of requests a second. This enables access to MATLAB algorithms as low-latency transactions that run on a highly performant platform that can be scaled elegantly both deep (with bigger, more powerful hardware) and wide (with load balancers across multiple machines if required).
Scaling MATLAB Compute
Some problems will require several minutes, hours or more of computation. MATLAB code that starts as a single process on a single core of a computer can be scaled up to leverage all CPU and GPU resources of that machine with the Parallel Computing Toolbox. From there, it can scale upwards to the cloud with offerings such as the MATLAB Parallel Cloud, and onwards with offerings like the MATLAB Distributed Computing Server for Amazon EC2. The MathWorks Cloud Center is an example of a platform offering easy configuration and access to cloud-based resources (PaaS).
Are you using a different cloud vendor? At the heart of the solution is a MathWorks product that you can provision on the infrastructure provided by the cloud vendor of your choice.
MATLAB Distributed Computing Server (MDCS) accepts interactively developed MATLAB code and parallelizes the computation over many nodes. You keep all the nice interactive desktop features, such as the Live Editor, while harnessing the power of parallel execution in the form of batches, jobs and access to distributed memory across a cluster of machines.
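To make the idea of scaling compute a little more concrete, here is a minimal sketch of a loop whose iterations are independent and can therefore be spread across workers. The Monte Carlo estimate of pi and the batch sizes are purely illustrative assumptions on my part, not a benchmark. Without the Parallel Computing Toolbox the parfor loop simply runs serially.

```matlab
% Monte Carlo estimate of pi, split into independent batches.
% Assumption: the batch count and batch size are illustrative only.
numBatches      = 8;
samplesPerBatch = 1e5;
hits            = zeros(1, numBatches);

parfor k = 1:numBatches
    % Each iteration is independent, so workers can run them in any order.
    xy      = rand(samplesPerBatch, 2);         % random points in the unit square
    hits(k) = sum(sum(xy.^2, 2) <= 1);          % points inside the quarter circle
end

% Combine the per-batch counts on the client.
piEstimate = 4 * sum(hits) / (numBatches * samplesPerBatch);
fprintf('Estimated pi: %.4f\n', piEstimate);
```

The appeal of this shape is that the loop body does not need to know where it runs. Opening the pool against a different cluster profile (for example with parpool and a profile name) should, in principle, be enough to move the same code from a laptop to a cluster.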
Scaling MATLAB to tackle Big Data problems
Anyone following the public discourse on Big Data will recognize the befuddling alphabet soup of technologies that are emerging to handle problems that involve massive amounts of data.
At the heart of this revolution is a rather simple idea. As data scales upwards, it first becomes "too big to fit in a spreadsheet". Most seasoned MATLAB users will know that MATLAB continues to handle these problems swimmingly.
On further growth, the data becomes too big to fit into a single computer's processor(s) and memory. At this point, a compute cluster gives access to the combined power of many computers (hint: MDCS). Eventually, the data grows to a size where, irrespective of the size of the cluster, it becomes too big to fit through a network. At this point, it is easier to move the program to the data than to slice the data and attempt to move it to compute resources. There are software frameworks to help with these slice, dice and store operations. Say hello to Hadoop, MapReduce and Spark.
The simple idea is that data can be correctly distributed and algorithms can move to the data, which stays at rest. Or as Francis Bacon put it, "If the mountain will not come to MATLAB, then MATLAB must go to the mountain". 🙂
I can go on demystifying what these software stacks do but for now it is sufficient to note that MathWorks products play well with them. Both our distributed compute products and compiler products embrace these frameworks with out-of-box shrink-wrapped functionality.
Support from Language Semantics
This support in handling big data problems extends down into the language semantics with constructs such as datastores and tall arrays, which are built to harness underlying technology such as Apache Spark.
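As a rough illustration of these constructs, the sketch below writes a tiny CSV file, wraps it in a datastore and computes a mean through a tall array. The file name, column names and values are made up for the example, and a file this small obviously does not need tall arrays. The point is only the shape of the code.

```matlab
% Write a small CSV file so the sketch is self-contained.
% Assumption: the file name and column names are hypothetical.
csvFile = fullfile(tempdir, 'sensor_demo.csv');
T = table((1:1000)', (1:1000)' * 0.5, 'VariableNames', {'Id', 'Reading'});
writetable(T, csvFile);

% A datastore describes the data without loading it all into memory.
ds = datastore(csvFile);

% A tall array defers evaluation until gather is called.
tt          = tall(ds);
meanReading = gather(mean(tt.Reading));

fprintf('Mean reading: %.2f\n', meanReading);
```

The calculation is expressed against the tall array rather than against a particular file or machine, so the hope is that mostly the location handed to datastore, and the execution environment, change as the data grows.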
Cloud based storage products such as MATLAB Drive offer access to the data as a hosted solution that works across multiple devices. Alternatively, if you have your data in offerings like the Simple Storage Service (S3), then language constructs such as the datastore offer connectivity to the data across MATLAB, distributed compute and application deployment products.
If you are a developer who is looking for the MATLAB desktop experience as a service, you should definitely check out MATLAB Online, which offers the MATLAB flagship software as a service (SaaS).
What about Simulink?
As of our latest R2017a release, Simulink users can leverage features such as parsim that can bring the power of parallelized cloud-based simulation to model-based design workflows. These approaches leverage products such as the MATLAB Distributed Computing Server to scale Simulink simulations.
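A minimal sketch of what a parsim call can look like is shown below. It assumes Simulink is installed and uses the shipped vdp example model. The choice of model and the stop times being swept are my own illustrative assumptions. Without a parallel pool, the simulations are expected to run one after another.

```matlab
% Assumption: the shipped 'vdp' example model is on the path.
model = 'vdp';
load_system(model);

% Hypothetical sweep: run the same model with three different stop times.
stopTimes = [10 20 30];

% One SimulationInput object per simulation to run.
in(1:numel(stopTimes)) = Simulink.SimulationInput(model);
for k = 1:numel(stopTimes)
    in(k) = in(k).setModelParameter('StopTime', num2str(stopTimes(k)));
end

% parsim distributes the runs across available workers.
out = parsim(in);
fprintf('Completed %d simulations\n', numel(out));
```

Each element of out corresponds to one element of in, so results can be matched back to the settings that produced them.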
Vendor-agnostic capabilities with vendor-specific convenience
The most recognizable of the public cloud vendors are Amazon with their Web Services (AWS), Microsoft with Azure and the Google Cloud Platform. Clients of ours are already leveraging the cloud to scale their workflows both for research and production, with success.
If there is one thing you should take away from this post, it should be that if your future looks cloudy (pun intended), MathWorks can help!
Please use the comments and let us know if any of this resonates and what part of this vast array of technologies is of deeper interest.
Published with MATLAB® R2017a