Stuart’s MATLAB Videos

Watch and Learn

Handling Multiple Match Tokens in My Web App

Now I want to modify the MATLAB app I recently made for scraping web pages so that it can handle multi-part patterns, that is, regular expression match patterns that contain more than one token.
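To see why more than one token changes things, it helps to compare the shape of what MATLAB's `regexp` returns with the `'tokens'` option. The output is a cell array with one cell per match, and each of those cells holds one element per token. The short HTML string below is a made-up example, not taken from the app:

```matlab
% Compare the shape of regexp 'tokens' output for one token vs. two tokens.
html = '<a href="/a.html">A</a> <a href="/b.html">B</a>';

% One token: each match yields a 1x1 cell holding the captured text
oneTok = regexp(html, '<a href="([^"]*)">', 'tokens');
% oneTok is a 1x2 cell array (two matches); oneTok{1} is {'/a.html'}

% Two tokens: each match yields a 1x2 cell, one element per token
twoTok = regexp(html, '<a href="([^"]*)">([^<]*)</a>', 'tokens');
% twoTok{2} is {'/b.html', 'B'}

% Stack the per-match cells into an N-by-2 cell array (one row per match)
rows = vertcat(twoTok{:});
disp(rows)
```

With a single token, code that reads `tok{k}{1}` works; with two tokens, each match must be unpacked into its separate parts, which is the change the app needs.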

I’ll use an example in which I want to find the href attribute and the link text of every link on a web page (across multiple pages).
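A minimal sketch of the multi-page part, using the same two-token pattern this post arrives at below. It assumes the HTML of each page has already been downloaded (for example with `webread`); the two short strings in `pages` are hypothetical stand-ins, and the field names `Page`, `Href` and `Text` are illustrative choices, not part of the app:

```matlab
% Sketch: collect href/text pairs from several pages into one struct array.
% ASSUMPTION: pages holds already-downloaded HTML; these strings are made up.
pages = { ...
    '<a href="/academia.html?s_tid=gn_acad">Academia</a>', ...
    '<a href="/x.html">X</a><a href="/y.html">Y</a>'};
pattern = '<a href="([^"]*)">([^<]*)</a>';

allRows = cell(0, 3);            % columns: page index, href, link text
for k = 1:numel(pages)
    tok = regexp(pages{k}, pattern, 'tokens');
    if isempty(tok)
        continue                 % no links found on this page
    end
    rows = vertcat(tok{:});      % N-by-2 cell: {href, text} per match
    pageIdx = num2cell(repmat(k, size(rows, 1), 1));
    allRows = [allRows; pageIdx, rows]; %#ok<AGROW>
end

% One struct element per link, with fields Page, Href and Text
links = cell2struct(allRows, {'Page', 'Href', 'Text'}, 2);
disp(links)
```

Skipping pages with no matches avoids concatenating an empty result, and keeping the page index makes it possible to trace each link back to its source page. In MATLAB, `cell2table` could be used instead of `cell2struct` to get a table.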

So, for the example string:

 <a href="/academia.html?s_tid=gn_acad">Academia</a>

I want to extract:

  1. /academia.html?s_tid=gn_acad
  2. Academia

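So I will use this regex pattern, in which the first token, ([^"]*), captures the href value and the second, ([^<]*), captures the link text: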

<a href="([^"]*)">([^<]*)</a>
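Applied to the example string, the pattern produces one match with two tokens. The sketch below shows one way to unpack them with `regexp(..., 'tokens')`, and also shows named tokens with the `'names'` option as an alternative technique that returns a struct instead of nested cells:

```matlab
% Apply the two-token pattern to the example string.
str = '<a href="/academia.html?s_tid=gn_acad">Academia</a>';
pattern = '<a href="([^"]*)">([^<]*)</a>';

matchTokens = regexp(str, pattern, 'tokens');  % 1x1 cell: one match
href = matchTokens{1}{1};      % first token:  '/academia.html?s_tid=gn_acad'
linkText = matchTokens{1}{2};  % second token: 'Academia'
fprintf('href: %s\ntext: %s\n', href, linkText);

% Alternative: named tokens, returned as a struct via the 'names' option
linkInfo = regexp(str, '<a href="(?<href>[^"]*)">(?<text>[^<]*)</a>', 'names');
% linkInfo.href and linkInfo.text hold the same two values
```

Named tokens make the code self-describing when a pattern has several parts, at the cost of a longer pattern.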

Features covered in this code-along style video include:

Play the video in full screen mode for a better viewing experience. 
