Extract text from PDF documents
Jiro’s pick this week is “Read text from a PDF document” by Derek Wood.
Ah, this is a nice entry. I was hoping for something like this. I keep track of my household expenses using MATLAB. I know, I know. Online banking now makes it easy to manage your expenses, but I like using MATLAB to give me various views into my finances. One of the tasks I currently do by hand is entering the expenses into my program. Some bank statements can be downloaded as CSV files, but one of my financial institutions only provides PDF files for its statements. For those statements, I have been entering everything manually.
Derek’s pdfRead lets me automate this! His function reads in any text information found in the PDF file. For a structured PDF file, like a bank statement, it’s fairly easy to extract the necessary information from that text.
Just to show you how it works, I saved our MathWorks Blogs top page as a PDF file.
Then, I simply called pdfRead.
p = pdfRead('blogs.pdf');
p{1}
ans =
'Get the inside view on MATLAB & Simulink!
Cleve’s Corner: Cleve Moler
on Mathematics and
Computing
Scientific computing, math & more
Loren on the Art of MATLAB
Turn ideas into MATLAB
Guy on Simulink
Simulink & Model-Based Design
Steve on Image Processing
Concepts, algorithms & MATLAB
File Exchange Pick of the
Week
Our best user submissions
Stuart’s MATLAB Videos
Watch and Learn
Developer Zone
Advanced Software Development with
MATLAB
Behind the Headlines
MATLAB and Simulink behind today’s
news and trends
Hans on IoT
ThingSpeak, MATLAB, and the
Internet of Things
Racing Lounge
Best practices and teamwork for
student competitions
MATLAB Community
MATLAB, community & more
Recent Posts
JUL 20 Send Bulk Sensor Data to ThingSpeak for Analysis by Hans Scharler
JUL 18 MIT’s new robot can 3D print a building... by Lisa Harvey
JUL 17 What is the Condition Number of a Matrix? by Cleve Moler
JUL 14 Juno Delivers by Steve Eddins (1)
JUL 14 What are the functional inputs and outputs of... by Guest Picker
JUL 12 Developing a Function that Replicates an Excel Worksheet... by Stuart McGarrity
JUL 10 Web Scraping and Mining Unstructured Data with MATLAB by Loren Shure
JUL 7 Watering my Plants with Simscape Fluids by Guy Rouleau
JUL 6 Don’t Mock Me! by Andy Campbell
JUL 5 Building practical skills through student competitions by Christoph Hahn
JUN 30 Cody Turns One Million by Ned Gulley (2)'
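Once the text is in MATLAB, pulling structured fields out of it is mostly a job for regular expressions. Here is a minimal sketch of that idea, using the “Recent Posts” lines from the output above. It is only an illustration, not part of pdfRead: the variable names (txt, expr, posts) are my own, a few lines are hard-coded so the snippet runs without the PDF, and it assumes each post sits on its own line ending in a newline character.

```matlab
% Sketch: parse "MON DD Title by Author" lines out of extracted PDF text.
% Assumption: with the real file, txt would be p{1} as returned by pdfRead.
% A few lines from the output above are hard-coded so this runs on its own.
txt = sprintf('%s\n', ...
    'Recent Posts', ...
    'JUL 20 Send Bulk Sensor Data to ThingSpeak for Analysis by Hans Scharler', ...
    'JUL 14 Juno Delivers by Steve Eddins (1)', ...
    'JUL 7 Watering my Plants with Simscape Fluids by Guy Rouleau');

% Named tokens: month, day, title, author. The trailing "(n)" comment
% count is optional and is left out of the author token.
expr = ['^(?<month>[A-Z]{3}) (?<day>\d{1,2}) ' ...
        '(?<title>.+) by (?<author>.+?)(?: \(\d+\))?$'];

% 'lineanchors' makes ^ and $ match at line boundaries, and
% 'dotexceptnewline' keeps "." from running across lines.
posts = regexp(txt, expr, 'names', 'lineanchors', 'dotexceptnewline');

% Print one row per post that matched the pattern.
for k = 1:numel(posts)
    fprintf('%-6s | %-50s | %s\n', ...
        [posts(k).month ' ' posts(k).day], posts(k).title, posts(k).author);
end
```

The heading line “Recent Posts” does not match the pattern, so it is skipped. A bank statement would need its own pattern (date, description, amount), but the approach should be the same.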
Comments
Give it a try and let us know what you think here or leave a comment for Derek.