File Exchange Pick of the Week

Our best user submissions

New MathWorks Tools

Sean’s pick this week is to revisit three prior Picks of the Week. While reading through the 2017 review and the reviews for previous years, I noticed a few picks whose functionality MathWorks has since incorporated into its products.

Extract Text from PDF Documents

Jiro’s original pick is here: https://blogs.mathworks.com/pick/2017/07/21/extract-text-from-pdf-documents/.

This functionality was added in the Text Analytics Toolbox, released in R2017b. The function to use is extractFileText. Note that this is a generic text reading function that can read from PDF, Microsoft Word, or text files.

Here, I’ll read the third page of the 2017 Pick of the Week index, exported to PDF.

txt = extractFileText('2017review.pdf', 'Pages', 3);
lines = splitlines(txt);
lines(strlength(lines) > 0)
ans = 
  46×1 string array
    "1/3/2018 Looking back: 2017 in review » File Exchange Pick of the Week"
    "https://blogs.mathworks.com/pick/2017/12/29/looking-back-2017-in-review/ 3/5"
    "Deep Learning Tutorial Series"
    "Johanna Pingel"
    "Download code and watch video series to learn and implement deep learning techniques"
    "__________________________________________________________________________"
    "Process Manager"
    "Brian Lau"
    "Matlab class for launching and managing asynchronous processes"
    "__________________________________________________________________________"
    "CatStruct"
    "Jos (10584)"
    "Concatenate/merge structures (v4.1, feb 2015)."
    "__________________________________________________________________________"
    "Source Control Information Block"
    "Gavin Walker"
    "Display Simulink project source control information in the Simulink editor"
    "__________________________________________________________________________"
    "CNN for Old Japanese Character Classification"
    "Akira Agata"
    "Create Simple Deep Learning Network for Old Japanese Character Classification"
    "__________________________________________________________________________"
    "Fidget Spinner (Simscape Multibody)"
    "Pavel Roslovets"
    "3DOF gyro psysical model of fidger spinner"
    "__________________________________________________________________________"
    "Signature Tool"
    "McSCert"
    "The Signature Tool extracts the interface of a Simulink subsystem."
    "__________________________________________________________________________"
    "“Read text from a PDF document”"
    "Derek Wood"
    "Read the text from a simple PDF document into MATLAB as a string"
    "__________________________________________________________________________"
    "Real-Time Pacer for Simulink"
    "Gautam Vallabha"
    "Simulink block for forcing a simulation to run in real (wall clock) time"
    "__________________________________________________________________________"
    "impressionism"
    "David Mills"
    "impressionism takes an RGB image and “paints” it as though it were an impressionist painting."
    "__________________________________________________________________________"
    "OOP example"
    "per isakson"
    "tracer4m traces calls to methods and functions."
    "__________________________________________________________________________"
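
The output above follows a regular pattern: title, author, description, then a separator line. As a rough sketch (not part of the original pick), the lines could be regrouped into a table. The sample lines are copied from the output above; the variable names `isSep`, `entries`, and `picks` are illustrative, and the code assumes the two page-header lines at the top have already been dropped and that every entry has exactly three lines.

```matlab
% Sample lines copied from the output above. In practice these would be
% lines(strlength(lines) > 0) with the two page-header lines removed.
lines = [ ...
    "Deep Learning Tutorial Series"
    "Johanna Pingel"
    "Download code and watch video series to learn and implement deep learning techniques"
    "__________________________________________________________________________"
    "Process Manager"
    "Brian Lau"
    "Matlab class for launching and managing asynchronous processes"
    "__________________________________________________________________________"];

% Drop the separator rows, which consist of underscores.
isSep = startsWith(lines, "____");
entries = lines(~isSep);

% Assumption: each pick occupies exactly three lines (title, author, description).
entries = reshape(entries, 3, []).';

% One row per pick.
picks = table(entries(:,1), entries(:,2), entries(:,3), ...
    'VariableNames', {'Title', 'Author', 'Description'});
disp(picks)
```

If an entry is missing a line, the reshape will fail or misalign the rows, so a real script would want to split on the separators rather than rely on a fixed count.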

Base 64 Encoding

Jiro’s original pick is here: https://blogs.mathworks.com/pick/2016/12/23/encode-images-as-base64/

At the time of that pick, we were unaware that this capability had been added as part of the HTTP interface in R2016b. The functions to use are matlab.net.base64encode and matlab.net.base64decode, which work on byte data in general and so can encode and decode images.

Here I will encode and decode an image of two cars on my desk.

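import matlab.net.*
I = imread('lambos.jpg');
base64 = base64encode(I(:));      % encode the pixel bytes as Base 64 text
cars = base64decode(base64);      % decode back to a vector of bytes
imshow(reshape(cars, size(I)))    % restore the original dimensions and display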
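
To check that the round trip is lossless without needing an image file, here is a small sketch that uses a short character vector and a random uint8 array as stand-ins. The variable names and the random data are illustrative assumptions, not part of the original example.

```matlab
% Round trip a short character vector.
encoded = matlab.net.base64encode('Hello');
decoded = char(matlab.net.base64decode(encoded));   % decode returns bytes, so convert back to char
assert(strcmp(decoded, 'Hello'))

% Round trip image-like data. The decoded bytes come back as a vector,
% so the original size has to be stored and restored separately.
I = uint8(randi([0 255], 4, 5, 3));                 % stand-in for an RGB image
b64 = matlab.net.base64encode(I(:));
J = reshape(matlab.net.base64decode(b64), size(I));
assert(isequal(I, J))
```

Note that the encoded text carries only the raw bytes, not the array size or data type, so anything sent elsewhere needs that information alongside it.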

Word Cloud

My original pick is here: https://blogs.mathworks.com/pick/2015/10/09/word-data-visualization/.

This capability was added to MATLAB in R2017b as the wordcloud function. The Text Analytics Toolbox further enhances it and provides other ways to display text as well. Let’s see the word cloud of the Pick of the Week review.

txt = extractFileText('2017review.pdf');
wordcloud(txt);

It makes me happy to see “Cannibals” in there!
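
Without the Text Analytics Toolbox, wordcloud does not take raw text directly, but it does accept a categorical array and sizes each word by its frequency. A minimal sketch, assuming a short made-up string in place of the extracted PDF text and an arbitrary length filter for dropping short words:

```matlab
% Made-up sample text standing in for the extracted document text.
txt = "deep learning simulink deep learning matlab simulink deep";

% Split on whitespace into a column of lowercase words.
words = split(lower(txt));

% Arbitrary filter (an assumption for illustration): keep words longer than 3 characters.
words = words(strlength(words) > 3);

% wordcloud sizes each unique category by how often it occurs.
figure
wordcloud(categorical(words));
```

This does none of the preprocessing the toolbox version performs, so a real document would need its own handling of punctuation and common words.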

Comments

Give the MathWorks versions a try and let us know what you think here.

Published with MATLAB® R2018a
