Stuart’s MATLAB Videos
Watch and Learn

Creating a MATLAB Function to Split a URL into Component Parts, Part 1

Posted by Stuart McGarrity

I often need to break up a URL string into components such as protocol, hostname, file path, query parameters, etc. Here, I create a function to do this, similar to the fileparts function in MATLAB.

Topics I cover in this code-along style video include:

• Writing a function
• Writing tests for functions using assert
• Matching regular expressions using regexp
• Using data types including char arrays and structures (accessed via dynamic field names)
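
The following is a minimal sketch of the core steps, not the function from the video. It uses regexp with the 'tokens' option to capture each part of one URL, copies the parts into a structure using dynamic field names, and checks the result with assert. The example URL, the pattern, and the field names in partNames are assumptions chosen for illustration, and the pattern only targets URLs shaped like protocol://host/path?query#fragment.

```matlab
% Hypothetical example URL and part names (assumptions for this sketch)
url = 'https://www.mathworks.com/matlabcentral/fileexchange/?term=example#results';
partNames = {'protocol', 'host', 'path', 'query', 'fragment'};

% One capture group per part: protocol, host (may include :port),
% path, query (after '?') and fragment (after '#')
pattern = '^(?:([^:/?#]+)://)?([^/?#]*)(/[^?#]*)?(?:\?([^#]*))?(?:#(.*))?$';

% With 'once', tok is a cell array holding one char vector per token
tok = regexp(url, pattern, 'tokens', 'once');

% Copy each token into a structure field, using a dynamic field name
parts = struct();
for k = 1:numel(partNames)
    parts.(partNames{k}) = tok{k};
end

% Simple tests: assert throws an error if a check fails
assert(strcmp(parts.protocol, 'https'))
assert(strcmp(parts.host, 'www.mathworks.com'))
assert(strcmp(parts.query, 'term=example'))
```

Wrapping these lines in a function that takes url as its input and returns parts would give a fileparts-like helper for URLs.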

Play the video in full screen mode for a better viewing experience.

Jan replied (1 of 3):
Hello Stuart, thanks for the vlog. It made me check the documentation for regexp again and go back to the 'names' option, which I have not been using recently. It returns a structure of the named tokens captured from the searched text, so it's perfect for the task presented. A single regular expression may also be more transparent. Take a URL scheme like this: "protocol://host/filename?query#fragment". The parts are separated by the characters ":/?#", so all you need to do is create an expression for each part of the URL, using character classes that exclude the delimiters that follow it, and then join them:
%% example url
>> url1 = 'https://www.mathworks.com/matlabcentral/fileexchange/?term=example#total_results_az_footer';

%% URL parts expressions
>> protocol = '^(?<protocol>[^:/?#]+(?=:))?:?(?://)?';  % incl. phrase start point and "://" at the end
>> host = '(?<host>[^:/?#]+)?(?::\d+)?/?';               % incl. ":port/" at the end
>> filename = '(?<filename>[^?#]+)?';
>> query = '\??(?<query>[^#]+)?';                         % incl. "?" at the beginning
>> fragment = '#?(?<fragment>.*)$';                       % incl. "#" at the beginning and phrase end point

>> urlScheme = strcat(protocol, host, filename, query, fragment);
>> results1 = regexp(url1, urlScheme, 'names')

results1 =

protocol: 'https'
host: 'www.mathworks.com'
filename: 'matlabcentral/fileexchange/'
query: 'term=example'
fragment: 'total_results_az_footer'

>> url2 = 'blogs.mathworks.com/videos/2016/05/26/creating-a-matlab-function-to-split-a-url-into-component-parts-part-1/?dir=autoplay';
>> results2 = regexp(url2, urlScheme, 'names')

results2 =

protocol: ''
host: 'blogs.mathworks.com'
filename: 'videos/2016/05/26/creating-a-matlab-function-to-split-a-url-into-component-parts-part-1/'
query: 'dir=autoplay'
fragment: ''

The "results1.host" field can then easily be split into domain parts with strsplit, or a more sophisticated URL scheme can be used.
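
For instance, a minimal sketch of splitting the host on "." might look like this. The host value is copied from results1 above, and labels and domain are illustrative variable names. Treating the last two labels as the domain is an assumption that holds for this example but not for hosts such as those under ".co.uk".

```matlab
% Host value copied from results1.host above
host = 'www.mathworks.com';

% Split on '.' (treated as a literal delimiter) into domain labels
labels = strsplit(host, '.');

% Assumption: the last two labels form the domain (not true for e.g. .co.uk)
domain = strjoin(labels(end-1:end), '.');
```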
Jan replied (2 of 3):
* Wherever "&lt" appears, it should be "<". It seems the comment preview doesn't render everything correctly.
Stuart McGarrity replied (3 of 3):
This is great, Jan. I was not aware of the 'names' feature. You're right, it's perfect. I'm also impressed by your regular expression Kung Fu. I have recorded one or two more parts of this video series, which I will still post, but I may follow up with another one about your version. You could put yours on the File Exchange.