{"id":2190,"date":"2019-08-20T17:26:41","date_gmt":"2019-08-20T21:26:41","guid":{"rendered":"https:\/\/blogs.mathworks.com\/developer\/?p=2190"},"modified":"2019-08-20T17:26:41","modified_gmt":"2019-08-20T21:26:41","slug":"a-quick-dip-in-the-lake","status":"publish","type":"post","link":"https:\/\/blogs.mathworks.com\/developer\/2019\/08\/20\/a-quick-dip-in-the-lake\/","title":{"rendered":"A quick dip in the lake"},"content":{"rendered":"<p>As summer vacation comes to an end and schools get back into session, I hope that you, the reader, enjoyed the fine weather and great outdoors. <\/p>\r\n\r\n<p><a href=\"https:\/\/blogs.mathworks.com\/developer\/files\/DataLakeScreencap.jpg\"><img decoding=\"async\" loading=\"lazy\" width=\"1024\" height=\"498\" src=\"https:\/\/blogs.mathworks.com\/developer\/files\/DataLakeScreencap-1024x498.jpg\" alt=\"Tenaya Lake\" class=\"alignleft size-large wp-image-2192\" \/><\/a><\/p>\r\n<p><i>Cloud - check! Lake - check! Any guesses as to where I took this picture?<\/i><\/p>\r\n\r\n<p>The Midwestern US, where I lived for more than a decade, is home to roughly one-fifth of the surface freshwater on the planet. It was easy to take lakes for granted in beautiful Michigan.  It seems like a lifetime ago, especially when contrasted with the beautiful Bay Area.  In March of this year, California declared the end of a 7+ year drought. Surviving the austerity measures, and experiencing a glimpse of a dry future, gave me a new respect for our beautiful planet and its limited resources. <\/p>\r\n\r\n<p>The lake in this post is a place of great beauty too. <a href=\"https:\/\/azure.microsoft.com\/en-us\/solutions\/data-lake\/\" rel=\"noopener\" target=\"_blank\">Azure&#x2122; Data Lake&#x2122; Storage<\/a> enables developers, data scientists, and analysts to store and analyze data of any shape, size, and speed.  
Such managed services allow engineers and domain experts to focus on the math, algorithms, and processing while delegating the actual maintenance and operation of the infrastructure to experts.<\/p>\r\n\r\n<p>MATLAB developers can leverage cloud services from their desktop MATLAB environments (in <a href=\"https:\/\/github.com\/mathworks-ref-arch\/matlab-azure-data-lake\/blob\/master\/Documentation\/GettingStarted.md#end-user-authentication\" rel=\"noopener\" target=\"_blank\">End-user<\/a> mode) as well as from our <a href=\"https:\/\/www.mathworks.com\/products\/compiler.html\" rel=\"noopener\" target=\"_blank\">compiled<\/a>, <a href=\"https:\/\/www.mathworks.com\/products\/matlab-production-server.html\" rel=\"noopener\" target=\"_blank\">production<\/a>, and <a href=\"https:\/\/www.mathworks.com\/products\/matlab-parallel-server.html\" rel=\"noopener\" target=\"_blank\">parallel<\/a> computing products (in <a href=\"https:\/\/github.com\/mathworks-ref-arch\/matlab-azure-data-lake\/blob\/master\/Documentation\/GettingStarted.md#service-to-service-authentication-using-azure-active-directory\" rel=\"noopener\" target=\"_blank\">Service-to-service<\/a> mode). Each of these modes requires a configuration supplied as a JSON file.<\/p>\r\n\r\n<p>These capabilities are more significant when viewed through the lens of our <a href=\"https:\/\/blogs.mathworks.com\/developer\/2018\/06\/29\/on-cloud-reference-architectures\/\" rel=\"noopener\" target=\"_blank\">reference architectures<\/a> for the Azure cloud. <a href=\"https:\/\/github.com\/mathworks-ref-arch\/matlab-azure-data-lake\" rel=\"noopener\" target=\"_blank\">A MATLAB interface for Azure Data Lake Storage is available on GitHub<\/a>.<\/p>\r\n\r\n<p>The rest of this post assumes that you have a configured Azure Data Lake Storage service. 
A skeletal set of instructions on <a href=\"https:\/\/github.com\/mathworks-ref-arch\/matlab-azure-data-lake\/blob\/master\/Documentation\/GettingStarted.md\" rel=\"noopener\" target=\"_blank\">how to set up the service can be found in our documentation<\/a>. <\/p>\r\n\r\n<p>The interface consists of MATLAB classes that wrap an underlying SDK. This gives developers considerable flexibility, since the artifacts can be <a href=\"https:\/\/github.com\/mathworks-ref-arch\/matlab-azure-data-lake\/blob\/master\/Documentation\/Rebuild.md\" rel=\"noopener\" target=\"_blank\">rebuilt<\/a> to point to updated versions of the underlying SDK. Additionally, unit tests built on the MATLAB unit testing framework help ensure that the functionality works with newer versions of the SDK.<\/p> \r\n\r\n<p>For example, for interactive (end-user) usage, create a file called <i>azuredatalakestore.json<\/i>:<\/p>\r\n\r\n<code>{\r\n    \"AccountFQDN\": \"mydatalakename.azuredatalakestore.net\",\r\n    \"NativeAppId\": \"1d184e4a-62c0-4244-8b68-4cffe757131c\"\r\n}\r\n<\/code>\r\n\r\n<p>Running the startup.m file makes the tooling available to MATLAB.<\/p>\r\n\r\n<code>>> startup\r\nAdding Interface for Azure Data Lake Storage Paths\r\nAdding \/home\/username\/mydir\/Azure-Data-Lake-Storage\/Software\/MATLAB\/app\r\nAdding \/home\/username\/mydir\/Azure-Data-Lake-Storage\/Software\/MATLAB\/app\/functions\r\nAdding \/home\/username\/mydir\/Azure-Data-Lake-Storage\/Software\/MATLAB\/app\/system\r\nAdding \/home\/username\/mydir\/Azure-Data-Lake-Storage\/Software\/MATLAB\/lib\r\nAdding \/home\/username\/mydir\/Azure-Data-Lake-Storage\/Software\/MATLAB\/config\r\nRunning post setup operations\r\nAdding: \/home\/username\/mydir\/Azure-Data-Lake-Storage\/Software\/MATLAB\/lib\/jar\/target\/azure-dl-sdk-0.1.0.jar\r\n<\/code>\r\n\r\n\r\n\r\n\r\n\r\n\r\n<div class=\"content\"><p><b>Setup<\/b><\/p><p>Once configured, users can connect to the data lake via a MATLAB 
client.<\/p><pre class=\"codeinput\">dlClient = azure.datalake.store.ADLStoreClient;\r\ndlClient.initialize();\r\n<\/pre><p><b>Listing stored files<\/b><\/p><p>Data Lake uses Unix-style forward slash path separators (i.e., '\/'), and names are case sensitive. Drive letters are not used, and the root directory is indicated by a leading forward slash.<\/p><pre class=\"codeinput\">dirTable = dlClient.enumerateDirectory(<span class=\"string\">'\/'<\/span>)\r\n<\/pre><p>By default, the user and group fields in the listing are returned as 36-character ID strings. These can be handy for programmatic use but are not very readable. A simple option instructs the tooling to return human-readable names instead.<\/p><pre class=\"codeinput\">rep = azure.datalake.store.UserGroupRepresentation.UPN;\r\ndirTable = dlClient.enumerateDirectory(<span class=\"string\">'\/'<\/span>, <span class=\"string\">'UserGroupRepresentation'<\/span>, rep)\r\n<\/pre>\r\n\r\n<p>The results of the listing show up as easy-to-handle MATLAB tables.<\/p>\r\n<code>dirTable =\r\n\r\n  4x13 table\r\n\r\n          Name                FullName         Length           Group                    User                LastAccessTime             LastModified            Type         Blocksize     ReplicationFactor    Permission    AclBit    ExpiryTime\r\n    _________________    __________________    ______    ____________________    ____________________    ______________________    ______________________    ___________    ___________    _________________    __________    ______    __________\r\n\r\n    'MyTestDirectory'    '\/MyTestDirectory'    [   0]    'myemail@mycompany.com'    'MATLAB'                [10-Jan-2018 16:17:46]    [10-Jan-2018 16:17:46]    [DIRECTORY]    [        0]    [0]                  '770'         [0]       [NaT]     \r\n    'README.md'          '\/README.md'          [1707]    'myemail@mycompany.com'    'myemail@mycompany.com'    [21-Dec-2017 14:50:14]    [21-Dec-2017 14:50:14]    [FILE     ]    
[268435456]    [1]                  '770'         [0]       [NaT]     \r\n    'RELEASENOTES.md'    '\/RELEASENOTES.md'    [ 651]    'myemail@mycompany.com'    'myemail@mycompany.com'    [21-Dec-2017 14:50:08]    [21-Dec-2017 14:50:08]    [FILE     ]    [268435456]    [1]                  '770'         [0]       [NaT]     \r\n    'Readme.txt'         '\/Readme.txt'         [2962]    'myemail@mycompany.com'    'myemail@mycompany.com'    [08-Dec-2017 18:16:02]    [08-Dec-2017 18:16:02]    [FILE     ]    [268435456]    [1]                  '770'         [1]       [NaT]    \r\n<\/code>\r\n\r\n<p><b>Create, Retrieve, Update, Delete (CRUD)<\/b><\/p><p>Creating a directory\/folder on the service is accomplished via the client.<\/p><pre class=\"codeinput\">dlClient.createDirectory(<span class=\"string\">'myDirName'<\/span>)\r\n<\/pre><p>It is also possible to create directories with specific access permissions using Unix-style octal permissions:<\/p><pre class=\"codeinput\">dlClient.createDirectory(<span class=\"string\">'myMatFiles'<\/span>, <span class=\"string\">'744'<\/span>)\r\n<\/pre><p>Uploading content to Data Lake storage is straightforward. The first argument specifies the name of the file as visible on the Data Lake.<\/p><pre class=\"codeinput\">dlClient.upload(<span class=\"string\">'myadlfilename.csv'<\/span>, <span class=\"string\">'mylocalfilename.csv'<\/span>);\r\n<\/pre><p>Downloading content from the Data Lake is equally simple.<\/p><pre class=\"codeinput\">dlClient.download(<span class=\"string\">'myadlfilename.csv'<\/span>, <span class=\"string\">'mylocalfilename.csv'<\/span>);\r\n<\/pre><p>Deleting content uses the full path of the file. 
It is also possible to delete an entire directory tree in one operation.<\/p><pre class=\"codeinput\">tf = dlClient.delete(<span class=\"string\">'\/my\/path\/to\/file1.mat'<\/span>);\r\ntf = dlClient.deleteRecursive(<span class=\"string\">'\/myMatFiles'<\/span>);\r\n<\/pre><p>The tooling offers other features, such as saving and retrieving variables directly from the service and connecting to appendable streams. All of this is described in the <a href=\"https:\/\/github.com\/mathworks-ref-arch\/matlab-azure-data-lake\/blob\/master\/Documentation\/BasicUsageADL.md\" rel=\"noopener\" target=\"_blank\">documentation<\/a>.<\/p>\r\n\r\n<p>In closing, client-based access extends MATLAB to connect to nearly unlimited amounts of storage. Easy-to-use algorithms provided by MATLAB and its toolboxes meet the power of cloud storage and infrastructure optimized to handle your most demanding data storage and analysis scenarios. These interfaces serve as building blocks to tackle your application-specific problems at scale. 
It really is a tale that transcends time...<\/p>  \r\n\r\n<a href=\"https:\/\/blogs.mathworks.com\/developer\/files\/ladyinthelake.png\"><img decoding=\"async\" loading=\"lazy\" width=\"1024\" height=\"784\" src=\"https:\/\/blogs.mathworks.com\/developer\/files\/ladyinthelake-1024x784.png\" alt=\"Lady in the Lake\" class=\"alignleft size-large wp-image-2232\" \/><\/a>\r\n\r\n<p style=\"text-align: right; font-size: xx-small; font-weight:lighter;   font-style: italic; color: gray\"><br>Published with MATLAB&reg; R2019a<br><\/p><\/div>","protected":false},"excerpt":{"rendered":"<div class=\"overview-image\"><img src=\"https:\/\/blogs.mathworks.com\/developer\/files\/DataLakeScreencap.jpg\" class=\"img-responsive attachment-post-thumbnail size-post-thumbnail wp-post-image\" alt=\"Tenaya Lake\" decoding=\"async\" loading=\"lazy\" \/><\/div><p>As summer vacation comes to an end and schools get back into session, I hope that you, the reader, enjoyed the fine weather and great outdoors. \r\n\r\n\r\nCloud - check! Lake - check! Any guesses as to... 
<a class=\"read-more\" href=\"https:\/\/blogs.mathworks.com\/developer\/2019\/08\/20\/a-quick-dip-in-the-lake\/\">read more >><\/a><\/p>","protected":false},"author":135,"featured_media":2192,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[22,30,10],"tags":[],"_links":{"self":[{"href":"https:\/\/blogs.mathworks.com\/developer\/wp-json\/wp\/v2\/posts\/2190"}],"collection":[{"href":"https:\/\/blogs.mathworks.com\/developer\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.mathworks.com\/developer\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/developer\/wp-json\/wp\/v2\/users\/135"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/developer\/wp-json\/wp\/v2\/comments?post=2190"}],"version-history":[{"count":22,"href":"https:\/\/blogs.mathworks.com\/developer\/wp-json\/wp\/v2\/posts\/2190\/revisions"}],"predecessor-version":[{"id":2238,"href":"https:\/\/blogs.mathworks.com\/developer\/wp-json\/wp\/v2\/posts\/2190\/revisions\/2238"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/developer\/wp-json\/wp\/v2\/media\/2192"}],"wp:attachment":[{"href":"https:\/\/blogs.mathworks.com\/developer\/wp-json\/wp\/v2\/media?parent=2190"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/developer\/wp-json\/wp\/v2\/categories?post=2190"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/developer\/wp-json\/wp\/v2\/tags?post=2190"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}