{"id":8671,"date":"2017-06-30T09:00:42","date_gmt":"2017-06-30T13:00:42","guid":{"rendered":"https:\/\/blogs.mathworks.com\/pick\/?p=8671"},"modified":"2018-09-14T07:02:24","modified_gmt":"2018-09-14T11:02:24","slug":"classifying-old-japanese-characters-using-cnn","status":"publish","type":"post","link":"https:\/\/blogs.mathworks.com\/pick\/2017\/06\/30\/classifying-old-japanese-characters-using-cnn\/","title":{"rendered":"Classifying old Japanese characters using CNN"},"content":{"rendered":"<div class=\"content\"><p><a href=\"https:\/\/www.mathworks.com\/matlabcentral\/profile\/authors\/869871\">Jiro<\/a>&#8217;s pick this week is <a href=\"https:\/\/www.mathworks.com\/matlabcentral\/fileexchange\/62682-cnn-for-old-japanese-character-classification\">CNN for Old Japanese Character Classification<\/a> by one of my colleagues, <a href=\"https:\/\/www.mathworks.com\/matlabcentral\/profile\/authors\/8859523\">Akira Agata<\/a>.<\/p><p>Nowadays, I probably go many days without seeing a handwritten document. From computers and smartphones to TVs and books, almost every character I see is a printed character. So it&#8217;s refreshing to see a handwritten document from time to time.<\/p><p>This demo by Akira uses deep learning (convolutional neural networks) to classify various handwritten Japanese characters. In old writings, these Japanese characters are quite difficult to decipher because of their cursive nature. For example, here are 100 samples of such characters.<\/p><p><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/pick\/jiro\/potw_character_classification\/jp_char_sample.png\" alt=\"\"> <\/p><p>At first glance, they just look like scribbles. Perhaps if they were in sentences, you might be able to identify the characters through context. But can we train a network to identify the characters purely on their own?
Akira shows how.<\/p><p>He uses the large Japanese Classics Character Dataset from the Center for Open Data in the Humanities. Just to show you how difficult it is even for a human, here are some of the samples from the dataset.<\/p><p><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/pick\/jiro\/potw_character_classification\/jp_char_label.png\" alt=\"\"> <\/p><p>I&#8217;ve color-coded them so that instances of the same character are highlighted with the same color. Some look similar, but others look quite different even for the same character.<\/p><p>Akira trains a convolutional neural network (from the <a href=\"https:\/\/www.mathworks.com\/products\/deep-learning.html\">Deep Learning Toolbox<\/a>) on the character dataset. Typically, when training a network from scratch you need <b>a lot<\/b> of labeled images. No worries there. The dataset he&#8217;s using has over 20,000 images, with over 1,000 images for each character he would like to classify. Training such a network is computationally expensive, so you typically want to use <a href=\"https:\/\/www.mathworks.com\/help\/deeplearning\/ug\/neural-networks-with-parallel-and-gpu-computing.html\">a GPU, or multiple GPUs<\/a> if you have them. However, as of R2017a you can also train a convolutional neural network on a CPU. It took a little over 10 minutes, but I was able to train the network on my CPU-only consumer laptop.<\/p><p>Once the network was trained, Akira tested it against a test character set (different from the training set). His network achieved over 90% accuracy.
Here are a few samples of the correctly classified characters.<\/p><p><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/pick\/jiro\/potw_character_classification\/jp_char_correct.png\" alt=\"\"> <\/p><p>Here are some of the incorrectly classified characters.<\/p><p><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/pick\/jiro\/potw_character_classification\/jp_char_incorrect.png\" alt=\"\"> <\/p><p>To learn more about the process, take a look through Akira&#8217;s demo.<\/p><p><b>Comments<\/b><\/p><p>Give it a try and let us know what you think <a href=\"https:\/\/blogs.mathworks.com\/pick\/?p=8671#respond\">here<\/a> or leave a <a href=\"https:\/\/www.mathworks.com\/matlabcentral\/fileexchange\/62682-cnn-for-old-japanese-character-classification#comment\">comment<\/a> for Akira.<\/p><p class=\"footer\"><br>\r\n      Published with MATLAB&reg; R2017a<br><\/p><\/div>","protected":false},"excerpt":{"rendered":"<div class=\"overview-image\"><img decoding=\"async\"  class=\"img-responsive\" src=\"https:\/\/blogs.mathworks.com\/images\/pick\/jiro\/potw_character_classification\/jp_char_sample.png\" onError=\"this.style.display ='none';\" \/><\/div><p>Jiro&#8217;s pick this week is CNN for Old Japanese Character Classification by one of my colleagues, Akira Agata. Nowadays, I probably go many days without seeing a handwritten document.
From&#8230; <a class=\"read-more\" href=\"https:\/\/blogs.mathworks.com\/pick\/2017\/06\/30\/classifying-old-japanese-characters-using-cnn\/\">read more >><\/a><\/p>","protected":false},"author":35,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[16],"tags":[],"_links":{"self":[{"href":"https:\/\/blogs.mathworks.com\/pick\/wp-json\/wp\/v2\/posts\/8671"}],"collection":[{"href":"https:\/\/blogs.mathworks.com\/pick\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.mathworks.com\/pick\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/pick\/wp-json\/wp\/v2\/users\/35"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/pick\/wp-json\/wp\/v2\/comments?post=8671"}],"version-history":[{"count":6,"href":"https:\/\/blogs.mathworks.com\/pick\/wp-json\/wp\/v2\/posts\/8671\/revisions"}],"predecessor-version":[{"id":10141,"href":"https:\/\/blogs.mathworks.com\/pick\/wp-json\/wp\/v2\/posts\/8671\/revisions\/10141"}],"wp:attachment":[{"href":"https:\/\/blogs.mathworks.com\/pick\/wp-json\/wp\/v2\/media?parent=8671"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/pick\/wp-json\/wp\/v2\/categories?post=8671"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/pick\/wp-json\/wp\/v2\/tags?post=8671"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}