{"id":3444,"date":"2018-03-16T14:09:33","date_gmt":"2018-03-16T19:09:33","guid":{"rendered":"https:\/\/blogs.mathworks.com\/videos\/?p=3444"},"modified":"2018-03-17T10:32:42","modified_gmt":"2018-03-17T15:32:42","slug":"using-the-mapreduce-technique-to-process-500gb-of-server-logs","status":"publish","type":"post","link":"https:\/\/blogs.mathworks.com\/videos\/2018\/03\/16\/using-the-mapreduce-technique-to-process-500gb-of-server-logs\/","title":{"rendered":"Using the MapReduce Technique to Process 500GB of Server Logs"},"content":{"rendered":"<p>Here I&#8217;m using the MapReduce functionality in Parallel Computing Toolbox to process several hundred GB of server logs from our website. I want to be able to visualize the counts per minute of certain quantities and also filter the data to look for certain special requests to our website. I start small, getting my algorithm to work on one file first, without parallel processing. But MapReduce lets you write the algorithm so that it works on data of any size and with parallel processing.<\/p>\n<p>It eventually took 50 min to process one day&#8217;s worth of data (72GB) and about 14 hrs to process 8 days (562GB).  
I think I&#8217;ll profile the small dataset problem to see where it&#8217;s spending the time, but I suspect it is all file I\/O.<br \/>\n<\/p>\n<pre>\r\n>> sum(minuteResults.totalRequests)\r\nans =\r\n   1.3388e+09\r\n>> bar(minuteResults.timeMinute, minuteResults.totalRequests)\r\n\r\n<\/pre>\n<p><a href=\"https:\/\/blogs.mathworks.com\/videos\/files\/barRequests.png\"><img decoding=\"async\" loading=\"lazy\" width=\"733\" height=\"420\" src=\"https:\/\/blogs.mathworks.com\/videos\/files\/barRequests.png\" alt=\"\" class=\"alignnone size-full wp-image-3462\" \/><\/a><br \/>\nFeatures covered in this video include:<\/p>\n<ul>\n<li><tt><a href=\"https:\/\/www.mathworks.com\/help\/matlab\/mapreduce.html\">mapreduce<\/a><\/tt><\/li>\n<li><tt>varfun<\/tt><\/li>\n<\/ul>\n<p><div class=\"row\"><div class=\"col-xs-12 containing-block\"><div class=\"bc-outer-container add_margin_20\"><videoplayer><div class=\"video-js-container\"><video data-video-id=\"5752971512001\" data-video-category=\"blog\" data-autostart=\"false\" data-account=\"62009828001\" data-omniture-account=\"mathwgbl\" data-player=\"rJ9XCz2Sx\" data-embed=\"default\" id=\"mathworks-brightcove-player\" class=\"video-js\" controls><\/video><script src=\"\/\/players.brightcove.net\/62009828001\/rJ9XCz2Sx_default\/index.min.js\"><\/script><script>if (typeof(playerLoaded) === 'undefined') {var playerLoaded = false;}(function isVideojsDefined() {if (typeof(videojs) !== 'undefined') {videojs(\"mathworks-brightcove-player\").on('loadedmetadata', function() {playerLoaded = true;});} else {setTimeout(isVideojsDefined, 10);}})();<\/script><\/div><\/videoplayer><\/div><\/div><\/div><br \/>\nPlay the video in full screen mode for a better viewing experience.\u00a0<\/p>\n","protected":false},"excerpt":{"rendered":"<div class=\"thumbnail thumbnail_asset asset_overlay video\"><a href=\"https:\/\/blogs.mathworks.com\/videos\/2018\/03\/16\/using-the-mapreduce-technique-to-process-500gb-of-server-logs\/?dir=autoplay\"><img 
decoding=\"async\" src=\"https:\/\/cf-images.us-east-1.prod.boltdns.net\/v1\/static\/62009828001\/abe99f11-4862-478f-b8d8-8bb9241853a7\/e0d041eb-60b6-4ce9-a9d1-1318779dab62\/1280x720\/match\/image.jpg\" onError=\"this.style.display ='none';\"\/><\/p>\n<div class=\"overlay_container\">\n      <span class=\"icon-video icon_color_null\"><time class=\"video_length\">41:48<\/time><\/span>\n      <\/div>\n<p>      <\/a><\/div>\n<p>Here I&#8217;m using the MapReduce functionality in Parallel Computing Toolbox to process several hundred GB of server logs from our website. I want to be able to visualize the counts per minute&#8230; <a class=\"read-more\" href=\"https:\/\/blogs.mathworks.com\/videos\/2018\/03\/16\/using-the-mapreduce-technique-to-process-500gb-of-server-logs\/\">read more >><\/a><\/p>\n","protected":false},"author":133,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[4],"tags":[],"_links":{"self":[{"href":"https:\/\/blogs.mathworks.com\/videos\/wp-json\/wp\/v2\/posts\/3444"}],"collection":[{"href":"https:\/\/blogs.mathworks.com\/videos\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.mathworks.com\/videos\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/videos\/wp-json\/wp\/v2\/users\/133"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/videos\/wp-json\/wp\/v2\/comments?post=3444"}],"version-history":[{"count":7,"href":"https:\/\/blogs.mathworks.com\/videos\/wp-json\/wp\/v2\/posts\/3444\/revisions"}],"predecessor-version":[{"id":3464,"href":"https:\/\/blogs.mathworks.com\/videos\/wp-json\/wp\/v2\/posts\/3444\/revisions\/3464"}],"wp:attachment":[{"href":"https:\/\/blogs.mathworks.com\/videos\/wp-json\/wp\/v2\/media?parent=3444"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/videos\/wp-json\/wp\/v2\/categories?post=3444"},{"taxonomy":"post
_tag","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/videos\/wp-json\/wp\/v2\/tags?post=3444"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}