What are the most popular libraries Java developers use? 2017 edition

It feels like only yesterday we were scraping data from GitHub to discover the top Java libraries of 2016, and all of a sudden another year has passed. This year, we’re kicking this data crunch up a notch and introducing Google BigQuery into the mix to retrieve the most accurate results.

For this year’s data crunch, we’ve changed the methodology a bit, thanks to Google BigQuery. First, we pulled the top 1,000 Java repositories from GitHub by stars. Once we had the most popular Java projects on GitHub, we filtered out the Android projects and focused only on the 477 pure Java projects.

After filtering the projects, we counted the unique imports within each of them and summed the counts across all projects. A deeper walkthrough of the research process is available at the bottom of this post.

Without further ado, it’s time to see the winners and rising stars among 2017’s most popular Java libraries. Who will sit on the Java throne?

The top 20 Java libraries

Holding the same position as last year, JUnit is the most popular Java library on GitHub. It also holds 2nd place with the extended JUnit abstract Runner class, and you can even find it in 3rd place, with the older junit.framework.

Mockito, the open source testing framework, is now the 4th most popular Java library. Rounding out the top 5 is slf4j, the logging facade for Java. Its popularity emphasizes developers’ dependence on logging and shows the low usage of the standard java.util.logging library. We’ve recently taken a deeper look at the most common logging habits among Java developers and published the research as an extensive eBook; you can check it out right here.

The rise of Hamcrest, a framework that assists writing tests within JUnit and jMock, is another sign that developers need better testing environments.

Creating a better debugging environment

The top positions held by libraries aimed at producing better code underscore the importance of testing. They also highlight the fact that production errors are one of the biggest pains developers have to face, so it’s no wonder they try to avoid them as much as they can. The need to solve this pain is one of the main reasons that led us to build OverOps.

Debugging in production usually consists of sifting through log files, trying to reproduce the variable state that caused the error. OverOps provides engineers with the exact variable state behind any exception, logged error or warning. It lets you see the complete source code and variable state across the entire call stack of the error, even if it wasn’t printed to the log file.

We’d be happy to show you how it works; click here to schedule some time for us to meet.

Top trends and noticeable libraries

Within the top 20 libraries we can see a representation for the popular Google Guava libraries, more uses of the JUnit framework and an increased use of javax libraries. We can also see that the most popular JSON library is Jackson.

At #20 we can see a new name popping up that we didn’t notice on last year’s top 20: org.w3c.dom, which provides the interfaces for manipulating the DOM (Document Object Model). Also, taking a broader look at the top 100 list, we can see that Spring has a wide representation, with the following 8 libraries:

#57 – org.springframework.beans.factory.annotation
#60 – org.springframework.context
#65 – org.springframework.context.annotation
#66 – org.springframework.stereotype
#68 – org.springframework.util
#81 – org.springframework.test.context.junit4
#85 – org.springframework.beans.factory
#91 – org.springframework.web.bind.annotation

Another trend we were able to detect is the wide use of Apache libraries:

#16 – org.apache.commons.io
#22 – org.apache.http
#24 – org.apache.commons.lang
#25 – org.apache.http.impl.client
#30 – org.apache.http.client
#33 – org.apache.http.client.methods
#34 – org.apache.log4j
#35 – org.apache.commons.codec.binary
#45 – org.apache.commons.lang3
#53 – org.apache.http.entity
#61 – org.apache.http.util
#64 – org.apache.commons.logging
#75 – org.apache.http.message
#88 – org.apache.zookeeper
#95 – org.apache.hadoop.conf
#98 – org.apache.http.client.config
#100 – org.apache.http.client.utils

One of the notable changes in the chart is the rise of AssertJ, a library that provides a fluent interface for writing assertions. This year it climbed to #50, which suggests that the most popular projects put a big emphasis on best practices, such as testing. At the bottom of the spreadsheet we can find the scripting API javax.script and org.apache.http.client.utils, a builder for URI instances.

Feel free to explore the full top 100 Java libraries list right here.

How did we do it?

As we mentioned at the beginning of the post, this year we used Google BigQuery to crunch data from GitHub. We used GitHub’s API to pull the top 1,000 repositories, and extracted the Java libraries these repos use.

After filtering out Android, Arduino and deprecated repos, we were left with 259,885 Java source files. Then, we removed duplicate uses of the same library in the same repo, and ended up with 25,788 unique libraries.

How did we actually do it? With the kind help of Guy Castel from the OverOps R&D team, and some SQL queries. First, we wanted to create the top repositories table, called java_top_repos_filtered:

SELECT
  full_name
FROM
  java_top_repos_1000
WHERE NOT ((LOWER(full_name) CONTAINS 'android') OR
           (LOWER(full_name) CONTAINS 'arduino'))
      AND ((description IS null) OR
           (NOT ((LOWER(description) CONTAINS 'android') OR
                 (LOWER(description) CONTAINS 'arduino') OR
                 (LOWER(description) CONTAINS 'deprecated'))));
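
For readers who prefer to see the filter outside SQL, here is a minimal Python sketch of the same rule. It assumes each repository is a (full_name, description) pair; the function name keep_repo and the sample rows are hypothetical and only serve as illustration.

```python
from typing import Optional

BLOCKED_IN_NAME = ("android", "arduino")
BLOCKED_IN_DESCRIPTION = ("android", "arduino", "deprecated")


def keep_repo(full_name: str, description: Optional[str]) -> bool:
    """Mirror the WHERE clause: drop a repo if a blocked word appears in
    its name or, when a description exists, in its description."""
    name = full_name.lower()
    if any(word in name for word in BLOCKED_IN_NAME):
        return False
    if description is None:
        return True  # same as "description IS null" in the query
    text = description.lower()
    return not any(word in text for word in BLOCKED_IN_DESCRIPTION)


# Hypothetical sample rows, not taken from the real data set.
sample = [
    ("acme/json-tools", "Fast JSON parsing"),
    ("acme/android-widgets", "UI widgets"),
    ("acme/old-lib", "DEPRECATED: use new-lib"),
    ("acme/no-description", None),
]
kept = [name for name, desc in sample if keep_repo(name, desc)]
print(kept)
```

Note that a repository without a description is kept as long as its name is clean, which matches the null check in the query.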

Now that we had the names of the top repositories, we pulled the contents of all their .java files (the next query reads this result as java_relevant_data):

SELECT
  repo_name,
  content
FROM
  [bigquery-public-data:github_repos.contents] AS contents
INNER JOIN
(
  SELECT
    id,
    repo_name
  FROM
    [bigquery-public-data:github_repos.files] AS files
  INNER JOIN
    java_top_repos_filtered AS top_repos
  ON
    files.repo_name = top_repos.full_name
  WHERE
    path LIKE '%.java'
) AS files_filtered
ON
  contents.id = files_filtered.id;

Once we had the source files for each project, we wanted to pull all of their unique import statements. In the following query, we extracted the package name and made sure it was counted just once per project:

SELECT
  package,
  COUNT(*) count
FROM
( // extract the package name (dropping the trailing segment, typically the class name) and group by repo name so each package is counted once per repo
  SELECT
    REGEXP_EXTRACT(import_line, r' ([a-z0-9\._]*)\.') package,
    repo_name
  FROM
  ( // keep only the lines of each *.java file that start with 'import'
    SELECT
      SPLIT(content, '\n') import_line,
      repo_name
    FROM
      java_relevant_data
    HAVING
      LEFT(import_line, 6) = 'import'
  )
  GROUP BY
    package,
    repo_name
)
GROUP BY
  package
ORDER BY
  count DESC;
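
To make the extraction step easier to follow, here is a rough Python sketch of the same idea. It reuses the regular expression from the query, and assumes the input is a list of (repo_name, content) rows; the helper names and the sample rows are hypothetical.

```python
import re
from collections import Counter

# Same pattern as the query's REGEXP_EXTRACT call.
PACKAGE_PATTERN = re.compile(r" ([a-z0-9\._]*)\.")


def packages_in_source(content: str) -> set:
    """Return the set of packages imported by one Java source file."""
    packages = set()
    for line in content.split("\n"):
        if line[:6] != "import":  # LEFT(import_line, 6) = 'import'
            continue
        match = PACKAGE_PATTERN.search(line)
        if match:
            packages.add(match.group(1))
    return packages


def count_packages(files) -> Counter:
    """Count, for every package, the number of repos that import it."""
    per_repo = {}
    for repo_name, content in files:
        per_repo.setdefault(repo_name, set()).update(packages_in_source(content))
    counts = Counter()
    for packages in per_repo.values():
        counts.update(packages)  # each package adds 1 per repo
    return counts


# Hypothetical input rows (repo_name, content); not real data.
files = [
    ("acme/a", "package a;\nimport org.junit.Test;\nimport java.util.List;\n"),
    ("acme/a", "import org.junit.Test;\nimport static org.junit.Assert.assertEquals;\n"),
    ("acme/b", "import org.junit.Before;\nimport org.slf4j.Logger;\n"),
]
counts = count_packages(files)
print(counts.most_common())
```

Because the character class only allows lowercase letters, digits, dots and underscores, the match stops before the first capitalized segment, so org.junit.Test and the static import of org.junit.Assert.assertEquals both collapse to org.junit. Collecting packages in a set per repo plays the role of the inner GROUP BY package, repo_name.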

The final step was filtering the results (read here as java_top_package_count) once more, making sure that no Android, Arduino, deprecated or standard Java libraries had slipped through our query-cracks:

SELECT
  *
FROM
  java_top_package_count
WHERE
  NOT ((LEFT(package, 5) = 'java.') OR
       (LOWER(package) CONTAINS 'android'))
ORDER BY
  count DESC;
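
As a small illustration, the same final filter can be sketched in Python. The function name keep_package and the (package, count) rows below are hypothetical; the counts are made up and are not results from the study.

```python
def keep_package(package: str) -> bool:
    """Mirror the final WHERE clause: drop packages that start with
    'java.' or that mention 'android' anywhere in their name."""
    return not (package[:5] == "java." or "android" in package.lower())


# Hypothetical (package, count) rows with made-up counts.
rows = [
    ("javax.annotation", 80),
    ("java.util", 450),
    ("org.junit", 300),
    ("android.os", 12),
]
filtered = sorted(
    (row for row in rows if keep_package(row[0])),
    key=lambda row: row[1],
    reverse=True,  # ORDER BY count DESC
)
print(filtered)
```

The prefix check is five characters long and includes the dot, so javax packages pass through. That is consistent with javax libraries appearing in the top 20. The query as printed checks only for the java. prefix and for android.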

And there you have it, the top Java libraries of 2017.

Final thoughts

Using Google BigQuery paid off, and we got a much more detailed overview of the libraries being used within the top GitHub projects.

The main conclusion is that most of the libraries that were popular in 2016 are still on top in 2017. The way we see it, this means that the developers, teams and/or companies behind these libraries are working hard at keeping them relevant and up-to-date.

It also means that if you’re planning on starting your own Java project, our spreadsheet could offer some good references to the libraries you should use.

Henn Idan

Henn works at OverOps, helping developers know when and why code breaks in production. She writes about Java, Scala and everything in between. Lover of gadgets, apps, technology and tea.