Reproducible research in signal processing

Posted by Steve Eddins, May 5, 2009


Some blog readers may know that I've been giving a presentation called "Take Control of Your Code" for a couple of years now, since the 2006 International Conference on Image Processing in Atlanta. Since ICIP, I've given the presentation a few times a year at various universities. The intended audience is engineering and scientific researchers who have to write software in order to do their work. I discuss a few software tools and tips that I think will help the audience be more productive and efficient.

In the introduction to my talk I mention several typical software problems that bedevil researchers, including the issue of reproducibility: "Last year's results can't be reproduced because the software has changed." And then I try to convince my audience to try using a version control system for all of their software development projects.

But version control is just one aspect of the overall problem of research reproducibility. In fact, there is a growing body of literature on reproducibility in computational science and engineering: what it means, why it's important, and how to do it. There's a fascinating paper on the subject in this month's IEEE Signal Processing Magazine (one of my favorite technical publications):

Patrick Vandewalle, Jelena Kovačević, and Martin Vetterli, "Reproducible Research in Signal Processing," IEEE Signal Processing Magazine, vol. 26, no. 3, May 2009, pp. 37-47.

Here's how they introduce the topic:

Have you ever tried to reproduce the results presented in a research paper? For many of our current publications, this would unfortunately be a challenging task. For a computational algorithm, details such as the exact data set, initialization or termination procedures, and precise parameter values are often omitted in the publication for various reasons, such as a lack of space, a lack of self-discipline, or an apparent lack of interest to the readers, to name a few. This makes it difficult, if not impossible, for someone else to obtain the same results. In our experience, it is often even worse as even we are not always able to reproduce our own experiments, making it difficult to answer questions from colleagues about details. Following are some examples of e-mails we have received:

“I just read your paper X. It is very completely described, however I am confused by Y. Could you provide the implementation code to me for reference if possible?”

“Hi! I am also working on a project related to X. I have implemented your algorithm but cannot get the same results as described in your paper. Which values should I use for parameters Y and Z?”

Similar frustrations arose from time to time in our labs when changes had to be made to a figure for a revision of a paper or reuse in another work, and we are sure we are not alone.

To address the problem, we have started making our research reproducible.

The authors go on to define reproducible research (including degrees of reproducibility), compare it with research practices in other domains, and summarize the history of reproducible research. Then they present the results of a reproducibility study based on all the papers published in IEEE Transactions on Image Processing during 2004. They finish with an extensive section on doing reproducible research, including case studies and discussions about archiving publications, web page lifetimes, software licensing, platform choices, and data availability.

The authors believe that doing reproducible research increases the impact of their work: it brings more citations and wider use of their work in other research, in teaching, and in commercial applications. I haven't been involved in research publications and teaching for fifteen years now, but I can definitely confirm the value of reproducible research practices in encouraging commercial application.