Leading by Example: How lively examples help MATLAB community toolboxes grow their capabilities & communities
Today's post is from Vijay Iyer, a principal academic discipline manager (Neuroscience) who is also leading the MATLAB Community Toolbox program.
Over my past few years at MathWorks, I’ve been incubating the MATLAB Community Toolbox program. This is the first of a series of blog posts sharing stories about what we’re seeing and learning from working with open-source toolboxes (over 50 to date) built by MATLAB users for other MATLAB users.
In my two decades as a research software engineer, various “driver” paradigms for software development have appeared on the horizon. Back in the aughts, I learned about domain-driven design. More recently, in the teens, I learned about test-driven development.
While working with several open-source toolboxes to plan short collaborative development cycles, I noticed a pattern that strikes me as a new “driver” paradigm. Let’s call it example-driven development.
As with test-driven development, the name itself sounds a bit like putting the cart before the horse. But in its first few years, the program has often found that open-source MATLAB codebases of all ages & sizes benefit from making one or more rich examples a driving goal of their development cycles.
The beauty of putting examples front & center is that they can serve three purposes at once:
- Documentation, via the example’s narrative text that motivates & lightly explains the code
- Smoke testing, i.e., exercising the core functionality of the toolbox
- Dissemination, via engaging images, structure, & figure outputs to ‘hook’ potential new users
Let’s put each of these aspects in context, while exploring a few early…err…examples of example-driven development.
Key ingredient for examples: computational notebooks
Credit where credit is due. Example-driven development is really an outgrowth of another recent trend: the growing use of computational notebooks, such as Jupyter notebooks and, more recently, MATLAB live scripts. Computational notebooks have all the elements (narrative text, code, equations, graphical figure outputs, interactive controls, and more) needed to author rich examples, i.e., examples that walk through a common use case or typical workflow while explaining the motivation both for the workflow as a whole and for each step.
While teachers and researchers have been the lead adopters (for teaching technical concepts and conveying code underlying published research, respectively), software tool builders have also embraced computational notebooks for software documentation. Our development community at MathWorks is no exception: examples based on live scripts have rapidly become a cornerstone for documentation across the MATLAB platform.
Left: Gallery of live script examples in the Wavelet Toolbox documentation. Right: Web tutorial for EEGLAB, a MATLAB community toolbox (credit: Swartz Center for Computational Neuroscience, UCSD).
In the early days of our program, we found that most MATLAB community toolboxes built by researchers for researchers were lagging behind this trend. Rich examples authored as live scripts thus became an early focus for the program. But it’s worth noting that the core idea (software documentation that actively teaches its new and existing users) has deep roots in the MATLAB ecosystem. For instance, the widely used EEGLAB community toolbox has an extensive library of web tutorials, recently reimplemented in Markdown, a format well suited to its predominantly app-based workflows. The program is proud to have helped support this upgrade of the EEGLAB web documentation.
Developing the recipe: example-driven development for research software
Many MATLAB community toolboxes, such as PPML and Homer3, offer their users command-line interfaces for scripting and programming. These two tools historically relied on other approaches to teach their users: script-based examples and wiki-based documentation, respectively. As the program worked with each to sponsor discrete projects around feature-enhancement goals that their lead authors identified as important to their research communities, we asked them to add some live script examples along the way.
Rather than being just an additional task, creating live script examples proved to complement the feature work:
- Homer3 supports users of functional near-infrared spectroscopy (fNIRS), a neuroimaging modality. Their project focused on adding support for the new SNIRF data standard. One of the new live script examples helped to test and document this new capability.
- PPML applies electromagnetic (EM) modeling to a class of layered photonic nanostructures. Their development cycle focused on a new diagram to visualize parametrically designed nanostructures and on improved reflectance plots. Their live scripts incorporated live controls enabling domain experts to assess the new visualizations across a range of input parameters.
Another community toolbox sponsored by the program was DeepInterpolation with MATLAB, a framework for denoising raw neuroscience data of various modalities using deep learning. The reference implementation of the DeepInterpolation principle is written in Python and includes Jupyter notebook examples for different modalities and use cases. DeepInterpolation with MATLAB was developed from the ground up to make the principle readily available to MATLAB users. As part of this, live scripts analogous to the original’s Jupyter notebooks were a central requirement. They enabled both scientific reproducibility (users can compare results on the same sample data) and tailoring to each language’s idioms (e.g., the MATLAB version uses the datastore workflow central to Deep Learning Toolbox).
Live script examples created for Homer3 (left), PPML (top right), and DeepInterpolation with MATLAB (bottom right) as part of program development cycles
Taste testing the example: a ‘smoke testing’ tool
In the lingo of software testing methodology, examples like these, which helped verify key new capabilities under development, can be considered smoke tests. Whereas unit testing comprehensively exercises the functions in a library through many small-scale test functions, smoke testing focuses on exercising the core functionality of a software package.
Alongside the program’s first projects, a new development team here began interviewing research software builders to better understand their MATLAB requirements. It quickly became clear to them that good community toolbox examples often make good smoke tests. From this insight came the Examples Driven Tester, a utility that connects a library’s live script examples to the MATLAB unit testing framework under the hood. Tool builders can thus benefit from automated testing without becoming full-blown testing experts. The utility is freely available on GitHub, and early users and feedback are most welcome.
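The "examples as smoke tests" idea can be sketched as a hand-written MATLAB test. It runs every live script in a folder and fails if any of them throws an error. This is only a minimal sketch and not the Examples Driven Tester or its API. The file name `exampleSmokeTest.m`, the `examples` folder next to it, and the helper `runExample` are all hypothetical.

```matlab
function tests = exampleSmokeTest
% exampleSmokeTest  Hypothetical function-based smoke test (a sketch, not the
% Examples Driven Tester) that runs each live script example end to end.
tests = functiontests(localfunctions);
end

function testExamplesRunWithoutError(testCase)
% Assumed layout: live script examples (*.mlx) sit in an "examples" folder
% next to this test file.
exampleDir = fullfile(fileparts(mfilename('fullpath')), "examples");
exampleFiles = dir(fullfile(exampleDir, "*.mlx"));
testCase.assumeNotEmpty(exampleFiles, "No live script examples found.");

for k = 1:numel(exampleFiles)
    scriptPath = fullfile(exampleFiles(k).folder, exampleFiles(k).name);
    try
        % Smoke-test criterion: the example runs to completion without error.
        runExample(scriptPath);
    catch err
        testCase.verifyFail("Example " + exampleFiles(k).name + ...
            " errored: " + err.message);
    end
end
end

function runExample(scriptPath)
% Hypothetical helper: run the example in this function's own workspace so a
% "clear" or stray variables in the script cannot disturb the test loop,
% then close any figures the example opened.
run(scriptPath);
close all
end
```

With the file on the MATLAB path, `results = runtests("exampleSmokeTest")` runs it and returns the test results. A fuller setup would likely parameterize over the examples so that each one is reported as its own test. This sketch keeps a single loop for brevity.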
The example is served: Open in MATLAB Online & science gateways
Last but not least, centering rich examples for research software tools can aid dissemination, i.e., attracting new users. A key enabler for this, first announced on this blog, is Open in MATLAB Online from GitHub. This connector lets end users run MATLAB code from a GitHub repository (where most research tools are hosted today) in their web browser, without installing MATLAB or navigating GitHub source control. Authors can add an Open in MATLAB Online badge to their GitHub repository, inviting curious browsers to quickly give their tool a try. Similarly, File Exchange highlights this fast web-based access for all GitHub repositories linked to File Exchange with a new Open in MATLAB Online button:
Open in MATLAB Online badge in a GitHub repository (left) and Open in MATLAB Online button in its linked File Exchange entry (right). Both buttons clone the latest repository code and open it in the browser-based MATLAB Online.
MATLAB Online comes in two versions: a basic version available to anyone worldwide and a full version available to all academic users. Thus, for many community toolboxes, MATLAB Online enables authors to serve their examples to a global audience.
Some toolboxes have multiple examples. For these cases, our program has begun recommending that projects list their top examples in a table placed prominently in their GitHub README. For instance, the README for DeepInterpolation with MATLAB shows individual Open in MATLAB Online links (▶️ buttons) for three examples that apply denoising models to different kinds of neuroscience data:
Snippet from the README file of the DeepInterpolation with MATLAB community toolbox, showing individual view and run (▶️) links for a library of lightweight live script examples, each corresponding to a specific use case appealing to a different audience.
In this way, each potential user can find a runnable example tailored to their interests! Note that the examples are also viewable by anyone worldwide, via the view links, through a File Exchange rendering service for live scripts in GitHub repositories.
To show their tool in its best light, authors must ensure that their (beautiful!) examples run to completion and generate the expected outputs with each release. In other words: testing, documentation, and dissemination are intertwined. That’s example-driven development.
Some MATLAB community toolboxes also have more compute- or data-intensive examples, or depend on third-party software that is not (yet!) registered as MATLAB add-ons. In these cases, research software tools may instead point their users to runnable examples in domain-focused compute environments, sometimes called science gateways. More on this topic in future posts.
Whether it’s on MATLAB Online or beyond, we’re excited to see the impact that rich, runnable examples will have in helping software tool builders grow their user communities.