Bozhidar Bozhanov

About Bozhidar Bozhanov

Senior Java developer, one of the top stackoverflow users, fluent with Java and Java technology stacks - Spring, JPA, JavaEE. Founder and creator of Computoser and Welshare. Worked on Ericsson projects, Bulgarian e-government projects and large-scale online recruitment platforms.

The Low Quality of Scientific Code

Recently I’ve been trying to get a bit into music theory, machine learning, computational linguistics, so I ended up looking at libraries and tools written by the scientific community – examples include the Stanford Core NLP library, GATE, Weka, jMusic, and several more.

The general feeling is that scientific libraries have mostly bad code. I will not point fingers, but there are too many freshman mistakes – not considering thread-safety, cryptic, ugly and/or stringly-typed APIs, lack of type-safety, poorly named variables and methods, choosing bad/slow serialization formats, writing debug messages to System.err (or out), lack of documentation, lack of tests.

 
Thus using these libraries becomes time consuming and error prone. Every 10 minutes you see some horribly written code that you don’t have the time to fix. And it’s not just one or two things, that you would report in a normal open-source project – it’s an overall low level of quality. On the other hand these libraries have a lot of value, because the low-level algorithms will take even more time and especially know-how to implement, so just reusing them is obviously the right approach. Some libraries are even original research and so you just can’t write them yourself, without spending 3 years on a PhD thesis.

I cannot but mention Heartbleed here – OpenSSL is written by scientific people, and much has been written on topic that even OpenSSL does not meet modern software engineering standards.

But that’s only the surface. Scientists in general can’t write good code. They write code simply to achieve their immediate goal, and then either throw it away, or keep using it for themselves. They are not software engineers, and they don’t seem to be concerned with code quality, code coverage, API design. Not to mention scientific infrastructure, deployment on multiple servers, managing environment. These things are rarely done properly in the scientific community.

And that’s not only in computer science and related fields like computational linguistics – it’s everywhere, because every science now requires at least computer simulations. Biology, bioinformatics, astronomy, physics, chemistry, medicine, etc – almost every scientists has to write code. And they aren’t good at it.

And that’s OK – we are software engineers and we dedicate our time and effort to these things; they are scientists, and they have vast knowledge in their domain. Scientists use programming the way software engineers use public transport – just as a means to get to what they have to do. And scientists should not be distracted from their domain by becoming software engineers.

But the problem is still there. Not only there are bad libraries, but the code scientists write may yield wrong results, work slowly, or regularly crash, which directly slows down or even invisibly hampers their work.

For the libraries, we, software engineers can contribute, or companies using them can dedicate an engineer to improving the library. Refactor, cleanup, document, test. The authors of the libraries will be more than glad to have someone prettify their hairy code.

The other problem is tougher – science needs funding for dedicated software engineers, and they prefer to use that funding for actual scientists. And maybe that’s a better investment, maybe not. I can say for myself that I’ll be glad to join a research team and help with the software part, while at the same time gaining knowledge in the field. And that would be fascinating, and way more exciting than writing boring business software. Unfortunately that doesn’t happen too often now (I tried once, a couple of years ago, and got rejected, because I lacked formal education in biology).

Maybe software engineers can help in the world of science. But money is a factor.

 

Reference: The Low Quality of Scientific Code from our JCG partner Bozhidar Bozhanov at the Bozho’s tech blog blog.

Do you want to know how to develop your skillset to become a Java Rockstar?

Subscribe to our newsletter to start Rocking right now!

To get you started we give you two of our best selling eBooks for FREE!

JPA Mini Book

Learn how to leverage the power of JPA in order to create robust and flexible Java applications. With this Mini Book, you will get introduced to JPA and smoothly transition to more advanced concepts.

JVM Troubleshooting Guide

The Java virtual machine is really the foundation of any Java EE platform. Learn how to master it with this advanced guide!

Given email address is already subscribed, thank you!
Oops. Something went wrong. Please try again later.
Please provide a valid email address.
Thank you, your sign-up request was successful! Please check your e-mail inbox.
Please complete the CAPTCHA.
Please fill in the required fields.

4 Responses to "The Low Quality of Scientific Code"

  1. Andy Graham says:

    I’ve used a lot of scientific libraries and I’d have to agree that the code quality is not always as good as it could be. However, if we just criticise, we run the risk that some library of useful tools will go unshared because the author daren’t risk the ridicule.

    Maybe the way to go is to do a kind of Apache Commons thing, and take on the development of a library once it’s out there. In fact, a lot of the complicated maths I’ve done (including some of the stuff I used in my PhD), I learned by looking at someone else’s source code and trying to understand what it was doing. So the effort on improving libraries might work both ways…

  2. Daniel Painter says:

    Good article, I agree with many of your points.

    I also think that a part of the problem is the model of funding for many science research communities (grants) and how it doesn’t often lend itself to paying big money for well designed software. Hence some scientists end up writing it themselves, or cheap outsourcing, and the quality is reduced in many cases (as you said, they are scientists, not software engineers).

    For a senior project in University I was on a team that implemented a motion-detection tool for biologists, and this was something our client was really excited about, because she previously had no way to get this tool developed, as grant money was very tight in her department.

    We have since released it as open source on Github, but another part of the problem (as you alluded to) is that scientific software is usually quite complex, and also sometimes limited to select use cases, and that makes good open source contributors sometimes hard to come by, and affects the features of the product as well as the quality.

  3. Leo (@shikida) says:

    this is so true.

  4. David says:

    Are you only referring to open source software? If so, what quality requirement do you expect, and would you be willing to contribute then yourself to open source developments?

Leave a Reply


five × 1 =



Java Code Geeks and all content copyright © 2010-2014, Exelixis Media Ltd | Terms of Use | Privacy Policy
All trademarks and registered trademarks appearing on Java Code Geeks are the property of their respective owners.
Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries.
Java Code Geeks is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.
Do you want to know how to develop your skillset and become a ...
Java Rockstar?

Subscribe to our newsletter to start Rocking right now!

To get you started we give you two of our best selling eBooks for FREE!

Get ready to Rock!
You can download the complementary eBooks using the links below:
Close