One of the biggest dilemmas developers face every day is which software libraries to use. Go with the hot new framework or the “boring” tried-and-tested one that’s been around for 10 years? One of the main things that make frameworks successful is their communities of users and contributors. While it can be easy to know how many people contribute to a project (especially if it’s open source), it’s pretty hard to know how many are actually using it. We decided to take a data-driven approach to answer these questions.
GitHub hosts more than a million projects today. Projects range from small utilities and test apps all the way to massive infrastructure projects with hundreds of contributors. As such, it provides a fairly diverse and up-to-date dataset to explore, one which is also indicative of the trends in closed-source and enterprise software.
We identified the 100 most commonly used components and grouped them into categories (e.g. Testing, DB, UI). It’s pretty interesting to see how these differ between languages.
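To make the counting idea concrete, here is a minimal sketch of how such a ranking could be computed. It is not the code we ran: the sample projects, the `CATEGORIES` mapping, and the function names are all hypothetical, and it assumes each project's dependency list has already been extracted from its manifest file.

```python
from collections import Counter

# Hypothetical sample data: project name -> libraries declared in its manifest.
projects = {
    "app-a": ["rails", "sqlite3", "rspec"],
    "app-b": ["rails", "pg", "rspec", "rails"],  # duplicate entry on purpose
    "app-c": ["sinatra", "sqlite3"],
}

# Hypothetical category mapping, maintained by hand.
CATEGORIES = {
    "rails": "Web",
    "sinatra": "Web",
    "sqlite3": "DB",
    "pg": "DB",
    "rspec": "Testing",
}


def count_usage(projects):
    """Count how many projects use each library (once per project)."""
    counts = Counter()
    for deps in projects.values():
        counts.update(set(deps))  # set() so a repeated entry counts once
    return counts


def top_libraries(counts, n):
    """Return the n most used libraries, ties broken alphabetically."""
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


def group_by_category(ranked, categories):
    """Group ranked (library, count) pairs by category, keeping rank order."""
    grouped = {}
    for lib, count in ranked:
        grouped.setdefault(categories.get(lib, "Other"), []).append((lib, count))
    return grouped


counts = count_usage(projects)
ranked = top_libraries(counts, 3)
by_category = group_by_category(ranked, CATEGORIES)
```

Counting each library once per project is a deliberate choice in this sketch: it measures how many projects use a library rather than how often it is mentioned.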
Here are some notable findings and the top 10 libraries for each language (you can find the full list at the bottom of this post):
- SQL still dominates. While NoSQL databases are all the rage these days, relational (SQL) databases still dominate the Ruby world – SQLite, PostgreSQL, and MySQL are used in 25% of the projects, while Redis and MongoDB appear in only 3% of the projects.
- MongoDB is, however, still popular in Ruby with 185 entries, which is twice as many projects as in Java.
- In web development, new frameworks have gained traction in the last few years (such as Sinatra with 570 entries), but Ruby is still centered around Rails, with over 7,000 projects. For web servers, Thin (with 487 entries) is used by twice as many projects as Unicorn.
- Twitter has also made a big impact in Ruby, with 3 libraries in the top 100 and 382 projects using them. While that’s pretty big, it’s still not quite as big as Google’s influence on Java, as we’ll see in a second.
- Grunt is huge. The Grunt automation framework plays a very big role in JS development (especially for Node.js), with 23% of the top 100 libraries plugging into it. Grunt seems to be filling the gap in the build, testing and deployment cycle in JS. In languages such as Java, this is handled outside the project by other prominent tools such as Maven or Jenkins.
- For server-side web development – the Express framework for Node.js leads the chart with 631 entries.
- It’s Guava season – Google code has gone mainstream. Spring and Apache libraries are so prevalent they’re practically a part of the language, with over 25% of the top 100 libraries split fairly evenly between the two. Something a bit surprising is the prevalence in Java of Google-made libraries, such as GWT and Guava, which account for 7% of the top 100. Seems like there’s one more area in our lives in which Google plays a big part.
- It’s interesting to see that Hadoop is living up to its promise as the leading big data technology with 168 entries. To put that in perspective, MySQL, one of the most well-known and common SQL DBs, has 225 entries. PostgreSQL, another well-known relational DB, has 121.
- ElasticSearch, a new technology for searching across large data sets, is also doing quite well on GitHub with over 100 projects using it.
Click here to see the complete top 100 libraries list.