Why should you share code?
Mike Croucher
on 26 Jun 2025
In a discussion on LinkedIn about my recent blog post, Do these 3 things to increase the reach of your open source MATLAB toolbox, I was asked: "Could you elaborate on why someone might consider opening/sharing their code? Thinking of early-career researchers, what might be in it for them?"
I'll give my answer here but I'm more interested in yours. How would you have answered this?
This is what I said:
- It's the right thing to do scientifically. A computational paper is essentially just an advertisement of what you've done. The code contains vital details about how you actually did it. A computational paper is incomplete without the code.
- If you only describe your algorithm in a paper, I have to implement it before I can apply your research to my problem. If you share the code, I can get started with your research much more quickly. This means I publish faster and, since I am a good scientist who cites what I use, you get cited faster.
- Other scientists start off as users of your code. This leads to citations. Over time, some of them start using and modifying your code more deeply, and this leads to collaborators.
- Once you decide to share code via something like GitHub, you quickly start adopting good software engineering practices without initially realizing it. This improves the quality of your research since adopting good software practices makes it more likely that your software will give the right answers.
That last point can be a little hard to get your head around sometimes. Even if all you do is use file upload to get your stuff onto GitHub (i.e. you're not using git properly yet) you will start to naturally converge towards better code.
Why? Because as soon as you share code, you have to solve the problem of getting it to run on someone else's machine.
A trivial example is hard-coded paths. If you only ever run the code on your machine, then a line like datafile = "C:\Mystuff\data.csv" always works, but it breaks as soon as I try to run it on mine. You'll look at this and think "Maybe there's a better way to do that".
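One possible "better way" is to build the path relative to the code's own location rather than to a drive on one machine. The following is a minimal sketch, assuming the data file sits in a folder called data next to the function file; the helper name locateDataFile and that folder layout are hypothetical and not taken from the post.

```matlab
function datafile = locateDataFile(name)
%locateDataFile Full path to a file in the "data" folder beside this function.
%   Hypothetical helper for illustration; the "data" folder name is an assumption.

    % Folder that contains this .m file, wherever the code was copied or cloned.
    here = fileparts(mfilename('fullpath'));

    % fullfile inserts the correct file separator for the current platform.
    datafile = fullfile(here, 'data', name);
end
```

With this in place, a call such as datafile = locateDataFile('data.csv') gives a path that follows the code around instead of pointing at C:\Mystuff. Note that mfilename only returns a location when it is called from inside a file, which is why the sketch is written as a function.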
Dependencies pose a similar problem. Your MATLAB path may be full of stuff that isn't present on my machine. As soon as I try to run your code, it won't work, and you'll have to figure out how to handle dependencies in a reproducible way.
Documentation! An empty README.md is no good if you expect me to know how to use your code. You at least have to say something like "To run this, type runme(N) into MATLAB where N is the size of the model...etc etc".
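One simple way to make dependency problems visible is to check for required products up front and fail with a clear message. This is only a sketch under stated assumptions: the helper name missingProducts is hypothetical, and it only covers MathWorks products reported by ver, not third-party code on the path.

```matlab
function missing = missingProducts(required)
%missingProducts Names in REQUIRED that are not installed on this machine.
%   Hypothetical helper for illustration. REQUIRED is a cell array of
%   product names as reported by the ver command, for example {'MATLAB'}.

    v = ver;                 % struct array, one element per installed product
    installed = {v.Name};    % collect the product names into a cell array

    % Anything requested but not installed is reported back to the caller.
    missing = setdiff(required, installed);
end
```

A script could call missing = missingProducts({'MATLAB', 'Image Processing Toolbox'}) at the top and raise an error listing strjoin(missing, ', ') if the result is not empty. To find out what a piece of code depends on in the first place, matlab.codetools.requiredFilesAndProducts may be a useful starting point.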
The act of sharing, and dealing with the consequences, leads to much better code than if you keep it to yourself.
8 Comments
@Mike Croucher, when you started this thread, back in June, it immediately caught my eye and my interest. I wanted to share my thoughts then, but I had just returned from a long trip, and I was about to embark on another long trip.
Anyway, I really appreciate your insights about this topic, and I also appreciate the thoughtful comments of your readers.
I would like to expand upon a couple of your points, based on my experience as a MathWorks software developer, as well as my experience in engineering algorithms research before that.
First, it has long been my experience that a research paper based on a significant software component is unlikely to be reliably reproducible unless the code is provided. Specifically, two independent researchers who attempt to reproduce the paper’s results, using only the information in the published paper, are unlikely to end up seeing exactly the same results. And sometimes those differences can be significant.
Here’s an example from Image Processing Toolbox history. In the late 1990s and early 2000s, I and others on the development team found that all the publicly available implementations of the famous Canny edge detector produced significantly different results from each other. With further investigation, we concluded that the variations were caused by some vague wording and missing details in the 1986 IEEE TrPAMI paper that everyone was citing. We eventually worked out exactly what the toolbox implementation should do, but it took a lot of time, and we had to consult Canny’s master’s thesis for clarification about certain aspects of the TrPAMI paper.
I saw this sort of thing happen so often with published research that I eventually added something about it to my interviewing questions when hiring someone at the Ph.D. level.
Second, successfully sharing your code for use by others greatly improves the likelihood that you will be able to reproduce your own work, should it become necessary. Inexperienced software developers, and sometimes even experienced ones, tend to:
- Discount the need to reproduce one’s own work, sometimes years later, possibly in a different location and with different computer equipment
- Overlook or forget about computer and OS dependencies, or data dependencies, or certain pieces of code, that are necessary for complete and accurate reproduction
I was lucky to learn this lesson early in my career, while I was still in graduate school. When I was near the end of writing my doctoral thesis, I found that I needed to modify and rerun some of my experiments from a couple of years earlier. I managed it, but it was unexpectedly challenging, and I was a bit lucky. This is one of the reasons that I started using software version control so early, several years before software development became my career.
Thanks, Mike, for prompting this discussion.
In recent years, I have open-sourced a large number of computer vision (CV) algorithms, covering many small application areas (some of which can be seen on my personal homepage). I am very interested in using MATLAB for algorithm research and exploration, and I then use Python/C++ to develop concrete practical applications. Sharing my work and code gives me more opportunities to connect with peers, showcase my abilities, and find like-minded partners. Even more, I hope to attract employers who recognize my value! On the other hand, due to work confidentiality requirements, there is a lot of excellent work and code that I have not open-sourced.
However, considering the job market in China, age (35+) seems to be a stumbling block, and the employment environment is very unfriendly to countless people like me😢. I eagerly look forward to some positive changes.
If the MathWorks community platform could offer some online job listings, I would be very willing to join.
Since 2012, I have been engaged in research in the field of signal processing and time-frequency analysis. To date, I have published over 40 journal papers and shared more than 20 code packages, accumulating over 2,700 citations on Google Scholar. I have deep insights into this issue. Beyond the reasons you mentioned earlier, I believe there is another factor. I do believe that my work can advance research in fields such as mathematics, physics, engineering, chemistry, and medicine. Whenever I see my research being applied across different disciplines, it reaffirms the original aspiration that led me to become a scientist or engineer.
There is a segment of the population that looks for proof of skills. Having a non-trivial amount of open-source code is often counted as proof of skills (or at least of dedication), and can lead to employment offers.
On the other hand...
- industries that depend upon material being kept confidential may worry that you might not respect confidentiality, so you might lose some opportunities
- there might be a feeling that it might not be necessary to pay you, as you "give away your code for free"