Developer Zone

Advanced Software Development with MATLAB

A quick dip in the lake

Posted by Arvind Hosagrahara

As summer vacation comes to an end and schools get back into session, I hope that you, the reader, enjoyed the fine weather and great outdoors.

Tenaya Lake

Cloud - check! Lake - check! Any guesses as to where I took this picture?

The Midwest US, where I lived for more than a decade, borders the Great Lakes, home to nearly one-fifth of the surface freshwater on the planet. It was easy to take lakes for granted in beautiful Michigan. It seems like a lifetime ago, especially when contrasted with the beautiful Bay Area. In March of this year, California declared the end of a 7+ year drought. Surviving the austerity measures, and experiencing a glimpse of a dry future, gave me a new respect for our beautiful planet and its limited resources.

The lake in this post is a place of great beauty too. Azure™ Data Lake™ Storage enables developers, data scientists, and analysts to store and analyze data of any shape, size, and speed. Such managed services allow engineers and domain experts to focus on the math, algorithms, and processing while delegating the actual maintenance and operation of the infrastructure to experts.

MATLAB developers can leverage cloud services from their desktop MATLAB environments (in End-user mode) as well as from our compiled, production, and parallel computing products (in Service-to-service mode). Each of these modes requires a configuration, provided in the form of a JSON file.

These capabilities are more significant when viewed through the lens of our reference architectures for the Azure cloud. A MATLAB interface for the Azure Data Lake Storage is available on github.com.

The rest of this post will assume that you have a configured Azure Data Lake Storage service. A skeletal set of instructions on how to set up the service can be found in our documentation.

The interface consists of MATLAB classes that wrap an underlying SDK. This provides the greatest flexibility to our developers, since the underlying artifacts can be rebuilt to point to updated versions of the SDK. Additionally, unit tests built on the MATLAB unit testing framework ensure that the functionality works with newer versions of the SDK.

For example, for interactive (end-user) usage, create a file called azuredatalakestore.json:

{
    "AccountFQDN": "mydatalakename.azuredatalakestore.net",
    "NativeAppId": "1d184e4a-62c0-4244-8b68-4cffe757131c"
}
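One way to avoid JSON syntax slips is to generate the file from a MATLAB struct. The following is a minimal sketch using the built-in jsonencode and jsondecode functions. The temporary output location is an assumption for illustration only; the file still has to be placed wherever the interface expects its configuration.

```matlab
% Sketch: write azuredatalakestore.json from a struct, then read it back.
% The field values are the placeholder values from the example above.
cfg = struct( ...
    'AccountFQDN', 'mydatalakename.azuredatalakestore.net', ...
    'NativeAppId', '1d184e4a-62c0-4244-8b68-4cffe757131c');

% Assumption: written to tempdir here; copy it to the interface's config folder.
configFile = fullfile(tempdir, 'azuredatalakestore.json');
fid = fopen(configFile, 'w');
assert(fid ~= -1, 'Could not open %s for writing.', configFile);
fprintf(fid, '%s', jsonencode(cfg));
fclose(fid);

% Reading the file back confirms that it parses as valid JSON.
roundTrip = jsondecode(fileread(configFile));
disp(roundTrip.AccountFQDN)
```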

Running the startup.m file will make the tooling available to MATLAB.

>> startup
Adding Interface for Azure Data Lake Storage Paths
Adding /home/username/mydir/Azure-Data-Lake-Storage/Software/MATLAB/app
Adding /home/username/mydir/Azure-Data-Lake-Storage/Software/MATLAB/app/functions
Adding /home/username/mydir/Azure-Data-Lake-Storage/Software/MATLAB/app/system
Adding /home/username/mydir/Azure-Data-Lake-Storage/Software/MATLAB/lib
Adding /home/username/mydir/Azure-Data-Lake-Storage/Software/MATLAB/config
Running post setup operations
Adding: /home/username/mydir/Azure-Data-Lake-Storage/Software/MATLAB/lib/jar/target/azure-dl-sdk-0.1.0.jar

Setup

Once configured, users can connect to the data lake via a MATLAB client.

dlClient = azure.datalake.store.ADLStoreClient;
dlClient.initialize();

Listing stored files

Data Lake uses Unix-style forward slash path separators (i.e., '/'), and names are case sensitive. Drive letters are not used, and the root directory is indicated by a leading forward slash.

dirTable = dlClient.enumerateDirectory('/')
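The path rules described above can be captured in a couple of small helpers. This is only a sketch: toLakePath and the Windows path conversion are hypothetical conveniences written for this post, not functions provided by the interface.

```matlab
% Hypothetical helpers (not part of the interface) for building Data Lake paths.

% Join name parts into an absolute path with a leading forward slash.
toLakePath = @(parts) ['/' strjoin(parts, '/')];
lakePath = toLakePath({'MyTestDirectory', 'results', 'run1.mat'});
disp(lakePath)   % /MyTestDirectory/results/run1.mat

% Convert a Windows-style path: drop the drive letter, flip the separators.
winPath   = 'C:\data\results\run1.mat';
converted = strrep(regexprep(winPath, '^[A-Za-z]:', ''), '\', '/');
disp(converted)  % /data/results/run1.mat

% Names are case sensitive, so compare with strcmp rather than strcmpi.
sameName = strcmp('/README.md', '/Readme.md');   % false
```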

By default, the user and group fields in the listing are returned as 36-character ID strings. These are handy for programmatic use but not very readable. With a simple option, the tooling can instead return human-readable user principal names (UPNs).

rep = azure.datalake.store.UserGroupRepresentation.UPN;
dirTable = dlClient.enumerateDirectory('/', 'UserGroupRepresentation', rep)

The results of the listing show up as easy-to-handle MATLAB tables.

dirTable =

  4x13 table

    Name                 FullName              Length    Group                      User                       LastAccessTime            LastModified              Type           Blocksize      ReplicationFactor    Permission    AclBit    ExpiryTime
    _________________    __________________    ______    _______________________    _______________________    ______________________    ______________________    ___________    ___________    _________________    __________    ______    __________

    'MyTestDirectory'    '/MyTestDirectory'    [   0]    'myemail@mycompany.com'    'MATLAB'                   [10-Jan-2018 16:17:46]    [10-Jan-2018 16:17:46]    [DIRECTORY]    [        0]    [0]                  '770'         [0]       [NaT]
    'README.md'          '/README.md'          [1707]    'myemail@mycompany.com'    'myemail@mycompany.com'    [21-Dec-2017 14:50:14]    [21-Dec-2017 14:50:14]    [FILE     ]    [268435456]    [1]                  '770'         [0]       [NaT]
    'RELEASENOTES.md'    '/RELEASENOTES.md'    [ 651]    'myemail@mycompany.com'    'myemail@mycompany.com'    [21-Dec-2017 14:50:08]    [21-Dec-2017 14:50:08]    [FILE     ]    [268435456]    [1]                  '770'         [0]       [NaT]
    'Readme.txt'         '/Readme.txt'         [2962]    'myemail@mycompany.com'    'myemail@mycompany.com'    [08-Dec-2017 18:16:02]    [08-Dec-2017 18:16:02]    [FILE     ]    [268435456]    [1]                  '770'         [1]       [NaT]
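Because the listing is an ordinary table, the usual table indexing applies. The sketch below builds a small stand-in table from three of the columns shown above, so it runs without a Data Lake connection. The column types are assumptions: values are held in cells, as the bracketed display suggests, and Type is stored as plain text here, whereas the real interface may return a different type that needs converting first.

```matlab
% Stand-in for the listing above (assumption: cell-wrapped values, Type as text).
Name   = {'MyTestDirectory'; 'README.md'; 'RELEASENOTES.md'; 'Readme.txt'};
Length = {0; 1707; 651; 2962};
Type   = {'DIRECTORY'; 'FILE'; 'FILE'; 'FILE'};
dirTable = table(Name, Length, Type);

% Keep only the rows that describe files, then total their sizes in bytes.
isFile     = strcmp(dirTable.Type, 'FILE');
files      = dirTable(isFile, :);
totalBytes = sum(cell2mat(files.Length));
fprintf('%d files, %d bytes\n', height(files), totalBytes);
```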

Create, Retrieve, Update, Delete (CRUD)

Creating a directory/folder on the service is accomplished via the client.

dlClient.createDirectory('myDirName')

It is also possible to create directories with specific access permissions using Unix-style octal permissions:

dlClient.createDirectory('myMatFiles', '744')
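For readers less familiar with octal modes: each digit encodes the read (4), write (2), and execute (1) bits for the owner, the group, and everyone else, in that order. The following sketch is independent of the interface and simply expands an octal string into the familiar rwx notation.

```matlab
% Sketch: expand a Unix-style octal permission string into rwx notation.
octalStr = '744';
flags    = 'rwx';
symbolic = '';
for k = 1:numel(octalStr)
    % Each octal digit maps to three bits: read, write, execute.
    bits = dec2bin(str2double(octalStr(k)), 3) == '1';
    part = repmat('-', 1, 3);
    part(bits) = flags(bits);
    symbolic = [symbolic part]; %#ok<AGROW>
end
disp(symbolic)   % rwxr--r--
```

So '744' gives the owner full access and read-only access to the group and others, while the '770' seen in the listing above corresponds to rwxrwx---.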

Uploading content to Data Lake storage is straightforward. The first argument specifies the name of the file as visible on the Data Lake; the second is the local file to upload.

dlClient.upload('myadlfilename.csv', 'mylocalfilename.csv');

Downloading content from the Data Lake is equally simple, with the arguments in the same order.

dlClient.download('myadlfilename.csv', 'mylocalfilename.csv');

Deleting content uses the full path of the file. It is also possible to delete an entire directory tree in one operation.

tf = dlClient.delete('/my/path/to/file1.mat');
tf = dlClient.deleteRecursive('/myMatFiles');

The tooling offers other features, such as saving variables directly to the service, retrieving them again, and connecting to appendable streams. All of this is described in the documentation.

In closing, client-based access extends the capabilities of MATLAB to connect to nearly unlimited amounts of storage. Easy-to-use algorithms provided by MATLAB and its toolboxes meet the power of cloud storage and infrastructure optimized to handle your most demanding data storage and analysis scenarios. These interfaces serve as building blocks to tackle your application-specific problems at scale. It really is a tale that transcends time...

Lady in the Lake



Published with MATLAB® R2019a
