MATLAB Speaks Python

Posted by Loren Shure, March 3, 2020


MATLAB is a great computing environment for engineers and scientists. MATLAB also provides access to general-purpose languages including C/C++, Java, Fortran, .NET, and Python. Today's guest blogger, Toshi Takeuchi, would like to talk about using MATLAB with Python.

Why Not Use Both?
Setting up Python in MATLAB
Karate Club Dataset
To Import or Not to Import
Extracting Data from a Python Object
Handling a Python List and Tuple
Handling a Python Dict
Visualizing the Graph in MATLAB
Passing Data from MATLAB to Python
Community Detection with NetworkX
Streamlining the Code
Summary

Why Not Use Both?

When we discuss languages, we often encounter a false choice where you feel you must choose one or the other. In reality, you can often use both. Most of us don't work alone. As part of a larger team, your work is often part of a larger workflow that involves multiple languages. That's why MATLAB provides interoperability with other languages including Python. Your colleagues may want to take advantage of your MATLAB code, or you may need to access Python-based functionality from your IT systems. MATLAB supports your workflow in both directions.

Today I would like to focus on calling Python from MATLAB to take advantage of some existing Python functionality within a MATLAB-based workflow.

In this post, we will see:

How to import data from Python into MATLAB
How to pass data from MATLAB to Python
How to use a Python package in MATLAB

Setting up Python in MATLAB

MATLAB supports Python 2.7, 3.6, and 3.7 as of this writing (R2019b).

I assume you already know how to install and manage Python environments and dependencies on your platform of choice, and I will not discuss it here because it is a complicated topic of its own.

Let's enable access to Python in MATLAB. You need to find the full path to your Python executable. Here is an example for Windows. On Mac and Linux, the operating system command is different (typically which python).

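pe = pyenv;
if pe.Status == "NotLoaded"
    % "where" can list several executables, one per line, so keep the first
    [~,exepath] = system("where python");
    exepath = splitlines(string(strtrim(exepath)));
    pe = pyenv('Version',exepath(1));
end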

If that doesn't work, you can also just pass the path to your Python executable as a string.

pe = pyenv('Version','C:\Users\username\AppData\Local\your\python\path\python.exe')

myPythonVersion = pe.Version
py.print("Hello, Python!")

myPythonVersion = 
    "3.7"
Hello, Python!
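
If you are not sure which path to hand to pyenv, one option is to ask the Python interpreter you intend to use. The following is a minimal sketch to run in that interpreter, not in MATLAB; the function name describe_interpreter is my own and not part of any library.

```python
import sys

def describe_interpreter():
    """Return (path, 'major.minor') for the running Python interpreter."""
    version = "{}.{}".format(sys.version_info[0], sys.version_info[1])
    return sys.executable, version

exe_path, version = describe_interpreter()
print(exe_path)   # full path that can be passed as the 'Version' argument of pyenv
print(version)    # compare against the versions your MATLAB release supports
```

The printed path can then be pasted into the pyenv call shown above.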

Karate Club Dataset

Wayne Zachary published a dataset that contains a social network of friendships between 34 members of a karate club at a US university in the 1970s. A dispute that erupted in this club eventually caused it to break up into two factions. We want to see if we can algorithmically predict how the club would break up based on its interpersonal relationships.

This dataset is included in NetworkX, a complex networks package for Python. We can easily get started by importing the dataset using this package.

I am using NetworkX 2.2. To check the package version in Python, you would typically use the __version__ package attribute like this:

>>> networkx.__version__

MATLAB doesn't support class names or other identifiers starting with an underscore (_) character. Instead, use the following to get the help content on the package, including its current version.

>> py.help('networkx')
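
On newer Python versions there is another route that avoids the underscore problem. The sketch below assumes Python 3.8 or later (newer than the versions listed above), where the standard library includes importlib.metadata; the helper name installed_version is hypothetical.

```python
from importlib import metadata

def installed_version(package_name):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None

print(installed_version("networkx"))  # a version string if installed, else None
```

Because the function name has no leading underscore, the same call could in principle be made from MATLAB through the py prefix, though that is untested here.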

To Import or Not to Import

Typically, you do this at the start of your Python script.

import networkx as nx
G = nx.karate_club_graph()

However, this is not recommended in MATLAB because the behavior of the import function in MATLAB is different from Python's.

The MATLAB way to call Python is to use py, followed by a package or method like this:

nxG = py.networkx.karate_club_graph();

If you must use import, you can do it as follows:

import py.networkx.*
nxG = karate_club_graph();

As you can see, it is hard to remember that we are calling a Python method when you omit py, which can be confusing when you start mixing MATLAB code and Python code within the same script.

Extracting Data from a Python Object

The call above returns the karate club dataset as a NetworkX graph object.

myDataType = class(nxG)

myDataType =
    'py.networkx.classes.graph.Graph'

You can see the methods available on this object like this:

methods(nxG)

You can also see the properties of this object.

properties(nxG)

A NetworkX graph contains an edges property that returns an object called EdgeView.

edgeL = nxG.edges;
myDataType = class(edgeL)

myDataType =
    'py.networkx.classes.reportviews.EdgeView'

To use this Python object in MATLAB, the first step is to convert the object into a core Python data type such as a Python list.

edgeL = py.list(edgeL);
myDataType = class(edgeL)

myDataType =
    'py.list'

Now edgeL contains a Python list of node pairs stored as Python tuple elements. Each node pair represents an edge in the graph. Let's see the first 5 tuple values.

listContent = edgeL(1:5)

listContent = 
  Python list with no properties.

    [(0, 1), (0, 2), (0, 3), (0, 4), (0, 5)]

Handling a Python List and Tuple

The Python way of handling a list or tuple typically looks like this, where you process individual elements in a loop.

for i in l: print(i)            # l is a list
for u, v in t: print((u, v))    # t is a tuple of pairs

The MATLAB way is to use arrays instead. The Python list can be converted into a cell array.

edgeC = cell(edgeL);
myDataType = class(edgeC)

myDataType =
    'cell'

This cell array contains Python tuple elements.

myDataType = class(edgeC{1})

myDataType =
    'py.tuple'

The Python tuple can also be converted to a cell array. To convert the inner tuple elements, we can use cellfun.

edgeC = cellfun(@cell, edgeC, 'UniformOutput', false);
myDataType = class(edgeC{1})

myDataType =
    'cell'

The resulting nested cell array contains Python int values.

myDataType = class(edgeC{1}{1})

myDataType =
    'py.int'

Handling a Python Dict

Now let's also extract the nodes from the dataset. We can follow the same steps as we did for the edges.

nodeL = py.list(nxG.nodes.data);
nodeC = cell(nodeL);
nodeC = cellfun(@cell, nodeC, 'UniformOutput', false);

An inner cell array contains both Python int and dict elements.

cellContent = nodeC{1}

cellContent =
  1×2 cell array
    {1×1 py.int}    {1×1 py.dict}

Python dict is a data type based on key-value pairs. In this case, the key is 'club' and the value is 'Mr. Hi'.

cellContent = nodeC{1}{2}

cellContent = 
  Python dict with no properties.

    {'club': 'Mr. Hi'}

Mr. Hi was the karate instructor at the club. The other possible value of the 'club' attribute is 'Officer', referring to an officer who was a leader of the club. These two were the key individuals of the respective factions. The node attribute indicates which faction an individual node belongs to. In this case, Node 1 (node 0 in Python's 0-based numbering) belonged to Mr. Hi's faction.

The Python way of handling a dict typically looks like this, where you process individual elements in a loop.

for k, v in d.items():
    print(k, v)

Again, the MATLAB way is to use an array. The Python dict can be converted to a struct array.

nodeAttrs = cellfun(@(x) struct(x{2}), nodeC);
myDataType = class(nodeAttrs)

myDataType =
    'struct'

We can extract the individual values into a string array. The club was evidently evenly divided between the factions.

nodeAttrs = arrayfun(@(x) string(x.club), nodeAttrs);
tabulate(nodeAttrs)

    Value    Count   Percent
   Mr. Hi       17     50.00%
  Officer       17     50.00%

Let's extract the nodes that belong to Mr. Hi's faction.

group_hi = 1:length(nodeAttrs);
group_hi = group_hi(nodeAttrs == 'Mr. Hi');
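
For readers more at home in Python, the tabulation and selection above correspond roughly to the following sketch. The node list is toy data in the same (node, attribute dict) shape, not the real dataset.

```python
from collections import Counter

# Toy node list shaped like the (node, attribute dict) pairs shown above.
# The values are made up for illustration.
nodes = [(0, {"club": "Mr. Hi"}), (1, {"club": "Mr. Hi"}),
         (2, {"club": "Officer"}), (3, {"club": "Mr. Hi"})]

clubs = [attrs["club"] for _, attrs in nodes]
counts = Counter(clubs)  # plays the role of tabulate(nodeAttrs)

# 1-based positions of Mr. Hi's members, matching MATLAB's group_hi
group_hi = [i + 1 for i, club in enumerate(clubs) if club == "Mr. Hi"]

print(counts)
print(group_hi)
```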

Visualizing the Graph in MATLAB

MATLAB also provides graph and network capabilities and we can use them to visualize the graph.

Let's convert Python int values in the edge list to double and extract the nodes in the edges into separate vectors.

s = cellfun(@(x) double(x{1}), edgeC);
t = cellfun(@(x) double(x{2}), edgeC);

MATLAB graph expects column vectors of nodes. Let's transpose them.

s = s';
t = t';

The node indices in Python start at 0, but node indices in a MATLAB graph must be positive integers. Let's fix this issue.

s = s + 1;
t = t + 1;
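
Seen from the Python side, the extraction and index shift amount to the following sketch, which uses the first five edges printed earlier as input.

```python
# First five edges of the karate club graph, as printed earlier (0-based)
edges = [(0, 1), (0, 2), (0, 3), (0, 4), (0, 5)]

# Split each (u, v) pair into source and target lists, shifting to 1-based
s = [u + 1 for u, _ in edges]
t = [v + 1 for _, v in edges]

print(s)
print(t)
```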

Now, we are ready to create a MATLAB graph object and plot it, with Mr. Hi's faction highlighted.

G = graph(s,t);
G.Nodes.club = nodeAttrs';
figure
P1 = plot(G);
highlight(P1, group_hi,'NodeColor', '#D95319', 'EdgeColor', '#D95319')
title({'Zachary''s Karate Club','Orange represents Mr. Hi''s faction'})

Passing Data from MATLAB to Python

In this case, we already have the NetworkX graph object, but for the sake of completeness, let's see how we could create this Python object within MATLAB.

Let's create an empty NetworkX graph.

nxG2 = py.networkx.Graph();

You can add edges to this graph with the add_edges_from method. It accepts a Python list of tuple elements like this:

[(1,2),(2,3),(3,4)]

This is not valid syntax in MATLAB. Instead we can use a 1xN cell array of node pairs like this:

myListofTuples = {{1,2},{2,3},{3,4}};

When we pass this nested cell array to py.list, MATLAB automatically converts it to a Python list of tuple elements.

myListofTuples = py.list(myListofTuples);
myDataType = class(myListofTuples{1})

myDataType =
    'py.tuple'

Let's extract the edge list from the MATLAB graph. It is a 78x2 matrix of double values. In MATLAB, double is the default numeric data type.

edgeL = G.Edges.EndNodes;
myDataType = class(edgeL)

myDataType =
    'double'

If we convert an array of double values to a Python list, the values are converted to Python float, whereas the nodes in the original NetworkX graph are Python int values. To stay consistent with the original graph, we should not pass double values.

listContent = py.list(edgeL(1,:))

listContent = 
  Python list with no properties.

    [1.0, 2.0]

Also, Python indexing is 0-based while MATLAB indexing is 1-based. We need to convert the array of double values to an integer type such as int8 (whose maximum of 127 comfortably covers the 34 nodes) and shift the node indices to be 0-based.

edgeL = int8(edgeL) - 1;
myDataType = class(edgeL)

myDataType =
    'int8'
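
In plain Python terms, the conversion above amounts to casting each value to int and subtracting one. A minimal sketch, using the two float values printed earlier as input:

```python
# One row of the MATLAB edge list after py.list conversion: floats, 1-based
row = [1.0, 2.0]

# Cast to int and shift to 0-based, mirroring int8(edgeL) - 1
row_zero_based = [int(x) - 1 for x in row]

print([type(x).__name__ for x in row])
print(row_zero_based)
```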

We can use num2cell to convert the matrix of int8 values to a 78x2 cell array, where each element is in a separate cell.

edgeL = num2cell(edgeL);
myDataType = class(edgeL)

myDataType =
    'cell'

We can place the node pairs in the same cell by converting the 78x2 cell array to a 78x1 cell array using num2cell.

edgeL = num2cell(edgeL,2);
[rows,cols] = size(edgeL)

rows =
    78
cols =
     1

The add_edges_from method expects a 1xN Python list. Now let's turn this into a 1xN cell array by transposing the Nx1 cell array, converting it to a Python list and adding it to the empty NetworkX graph object.

nxG2.add_edges_from(py.list(edgeL'));

The edges were added to the NetworkX graph object. Let's check the first 5 tuple values.

edgeL = py.list(nxG2.edges);
listContent = edgeL(1:5)

listContent = 
  Python list with no properties.

    [(0, 1), (0, 2), (0, 3), (0, 4), (0, 5)]

The nodes were also added in the graph, but they currently don't have any attributes, as you can see below in the first 3 elements of the node list.

nodeL = py.list(nxG2.nodes.data);
listContent = nodeL(1:3)

listContent = 
  Python list with no properties.

    [(0, {}), (1, {}), (2, {})]

To add attributes, we need to use the set_node_attributes method. This method expects a nested Python dict. Here is how to create a dict in MATLAB.

myDict = py.dict(pyargs('key', 'value'))

myDict = 
  Python dict with no properties.

    {'key': 'value'}

The set_node_attributes method expects a nested dict. The keys of the outer dict are the nodes, and the values are dict objects holding the attribute key-value pairs, like this:

{0: {'club': 'Mr. Hi'}, 1: {'club': 'Officer'}}

Unfortunately, this won't work, because pyargs expects only a string or char value as the key.

>> py.dict(pyargs(0, py.dict(pyargs('club', 'Mr. Hi'))))
Error using pyargs
Field names must be string scalars or character vectors.

Instead, we can create an empty dict and add each inner dict as a (key, value) tuple, using 0-based node indices as keys, with the update method like this:

attrsD = py.dict;
for ii = 1:length(nodeAttrs)
    attrD = py.dict(pyargs('club', G.Nodes.club(ii)));
    attrsD.update(py.tuple({{int8(ii - 1), attrD}}))
end
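
For comparison, here is roughly what this loop does on the Python side. dict.update accepts an iterable of (key, value) pairs, which is why a tuple wrapping a single pair works with integer keys. The clubs list is toy data made up for illustration.

```python
# Toy club labels, indexed by 0-based node id (made up for illustration)
clubs = ["Mr. Hi", "Mr. Hi", "Officer"]

attrs_by_node = {}
for node, club in enumerate(clubs):
    inner = {"club": club}
    # update() takes an iterable of (key, value) pairs, so integer keys are fine
    attrs_by_node.update(((node, inner),))

print(attrs_by_node)
```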

Then we can use the set_node_attributes to add attributes to the nodes.

py.networkx.set_node_attributes(nxG2, attrsD);
nodeL = py.list(nxG2.nodes.data);
listContent = nodeL(1:3)

listContent = 
  Python list with no properties.

    [(0, {'club': 'Mr. Hi'}), (1, {'club': 'Mr. Hi'}), (2, {'club': 'Mr. Hi'})]

Community Detection with NetworkX

NetworkX provides the greedy_modularity_communities method to find communities within a graph. Let's try this algorithm to see how well it can detect the factions!

Since this club split into two groups, we expect to see 2 communities.

communitiesL = py.networkx.algorithms.community.greedy_modularity_communities(nxG2);
myDataType = class(communitiesL)

myDataType =
    'py.list'

The returned Python list contains 3 elements. That means the algorithm detected 3 communities within this graph.

num_communities = length(communitiesL)

num_communities =
     3

Each element of the list is a frozenset. A Python frozenset is the same as a Python set, except that it is immutable: elements cannot be added or removed after it is created. And a Python set is similar to a Python list, except that its elements are unique and unordered, whereas a list can contain the same element multiple times.

listContent = communitiesL{1}

listContent = 
  Python frozenset with no properties.

    frozenset({32, 33, 8, 14, 15, 18, 20, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31})
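
A short Python sketch of how a frozenset behaves, using a handful of made-up members rather than the full community:

```python
members = frozenset([32, 33, 8, 8, 14])  # the duplicate 8 is dropped

n_members = len(members)
ordered = sorted(members)  # sets are unordered, so sort for a stable view

# A frozenset has no add/remove methods, so it cannot change after creation
try:
    members.add(15)
    was_mutated = True
except AttributeError:
    was_mutated = False

print(n_members, ordered, was_mutated)
```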

Let's convert it into nested cells.

communitiesC = cell(communitiesL);
communitiesC = cellfun(@(x) cell(py.list(x)), communitiesC, 'UniformOutput', false);
myDataType = class(communitiesC{1}{1})

myDataType =
    'py.int'

The innermost cells contain Python int values. Let's convert them to double.

for ii = 1:length(communitiesC)
    communitiesC{ii} = cellfun(@double, communitiesC{ii});
end
myDataType = class(communitiesC{1}(1))

myDataType =
    'double'

Since the nodes are 0-based indexed in Python, we need to change them to 1-based indexed in MATLAB.

communitiesC = cellfun(@(x) x + 1, communitiesC, 'UniformOutput', false);
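
The whole conversion, from a list of frozensets to 1-based index lists, can be sketched in Python as follows. The three small communities are toy data, not the algorithm's actual output.

```python
# Toy communities with 0-based node ids (made up for illustration)
communities = [frozenset({32, 33, 8}), frozenset({1, 2, 3}), frozenset({0, 4, 5})]

# Shift every node id to 1-based and sort each community for readability
one_based = [sorted(node + 1 for node in community) for community in communities]

print(one_based)
```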

Let's plot the communities within the graph.

tiledlayout(1,2)
nexttile
P1 = plot(G);
highlight(P1, group_hi,'NodeColor', '#D95319', 'EdgeColor', '#D95319')
title({'Zachary''s Karate Club','Orange represents Mr. Hi''s faction'})
nexttile
P2 = plot(G);
highlight(P2, communitiesC{1},'NodeColor', '#0072BD', 'EdgeColor', '#0072BD')
highlight(P2, communitiesC{2},'NodeColor', '#D95319', 'EdgeColor', '#D95319')
highlight(P2, communitiesC{3},'NodeColor', '#77AC30', 'EdgeColor', '#77AC30')
title({'Zachary''s Karate Club','Modularity-based Communities'})

If you compare these plots, you can see that the two communities on the right in orange and green, when combined, roughly overlap with Mr. Hi's faction.

We can also see that:

Community 1 represents the 'Officer' faction
Community 3 represents the devoted 'Mr. Hi' faction
Community 2 represents the people who had connections with both factions

Interestingly, Community 2 ultimately ended up siding with Mr. Hi's faction.

Let's see if there is any difference between the output of the algorithm and the actual faction.

diff_elements = setdiff(group_hi, [communitiesC{2} communitiesC{3}]);
diff_elements = [diff_elements setdiff([communitiesC{2} communitiesC{3}], group_hi)]

diff_elements =
     9    10

The community detection algorithm came very close to identifying the actual faction: only nodes 9 and 10 were assigned differently.
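
The two setdiff calls together compute a symmetric difference, which MATLAB's setxor also returns in a single call. In Python the same idea is the ^ operator on sets. The memberships below are toy values chosen only to illustrate the operation.

```python
faction = {1, 2, 3, 9}     # toy "actual" membership (1-based)
detected = {1, 2, 3, 10}   # toy "detected" membership (1-based)

# Elements that appear in exactly one of the two sets
diff_elements = sorted(faction ^ detected)

print(diff_elements)  # [9, 10]
```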

Streamlining the Code

Up to this point we have been examining what data type is returned in each step. If you already know the data types, you can combine many of these steps into a few lines of code.

To get the karate club data and create a MATLAB graph, you can just do this:

nxG = py.networkx.karate_club_graph();
edgeC = cellfun(@cell, cell(py.list(nxG.edges)), 'UniformOutput', false);
nodeC = cellfun(@cell, cell(py.list(nxG.nodes.data)), 'UniformOutput', false);
nodeAttrs = cellfun(@(x) struct(x{2}), nodeC);
nodeAttrs = arrayfun(@(x) string(x.club), nodeAttrs);
s = cellfun(@(x) double(x{1}), edgeC)' + 1;
t = cellfun(@(x) double(x{2}), edgeC)' + 1;
G = graph(s,t);
G.Nodes.club = nodeAttrs';

To create a Python graph from the MATLAB data, you can do this:

nxG2 = py.networkx.Graph();
edgeL = num2cell(int8(G.Edges.EndNodes) - 1);
nxG2.add_edges_from(py.list(num2cell(edgeL, 2)'));
attrsD = py.dict;
for ii = 1:length(G.Nodes.club)
    attrD = py.dict(pyargs('club', G.Nodes.club(ii)));
    attrsD.update(py.tuple({{int8(ii - 1), attrD}}))
end
py.networkx.set_node_attributes(nxG2, attrsD);

And to detect the communities, you can do this:

communitiesC = cell(py.networkx.algorithms.community.greedy_modularity_communities(nxG2));
communitiesC = cellfun(@(x) cell(py.list(x)), communitiesC, 'UniformOutput', false);
for ii = 1:length(communitiesC)
    communitiesC{ii} = cellfun(@double, communitiesC{ii});
end
communitiesC = cellfun(@(x) x + 1, communitiesC, 'UniformOutput', false);

Summary

In this example, we saw how we can use Python within MATLAB. It is fairly straightforward once you understand how the data type conversion works. Things to remember:

Python uses 0-based indexing, whereas MATLAB uses 1-based indexing
Whole numbers in Python are int by default, whereas numbers in MATLAB are double by default
Instead of loops, convert Python data into suitable types of MATLAB arrays
Use cell arrays for Python list and tuple
Use struct arrays for Python dict

In this example, we used a Python library in our MATLAB workflow to get the data and detect communities. I could have coded everything in MATLAB, but it was easier to leverage existing Python code and I was able to complete my tasks within the familiar MATLAB environment where I can be most productive.

Are you a coding polyglot? Share how you use MATLAB and Python together here.

Published with MATLAB® R2019b