Fastest Growing New Languages on Github are R, Rust and TypeScript (and Swift)

While researching TypeScript’s popularity I ran across a post by Adam Bard listing the most popular languages on Github (as of August 30, 2013). Adam used the Google BigQuery interface to mine Github’s repository statistics.

What really interested me was not absolute popularity but which languages are gaining adoption. So I decided to use the same approach to measure growth in language popularity, by comparing statistics for two different time periods. I used exactly the same query as Adam and ran it for the first half of 2013 (January 1st through June 30th) and then for the first half of 2014 (more details about the exact methodology at the end of this post).

Results

Based on this analysis, the twenty fastest growing languages on Github in the past year are:

At the risk of jeopardizing my (non-existent) reputation as a programming language guru, I’ll admit that several of these are unfamiliar to me. Eliminating languages with fewer than 1,000 repos to weed out the truly obscure ones yields this revised ranking:

We are assuming that growth in Github repository count serves as a proxy for increasing popularity, but it seems unlikely that Pascal, CSS and TeX are experiencing a sudden renaissance. Some proportion of this change is due to increasing use of Github itself, and this effect is probably more marked for older, more established languages that are only now moving onto Github. If we focus on languages that have started to attract attention more recently, the biggest winners over the past year appear to be R, Rust and TypeScript.

Random thoughts

What the hell is R?

The fastest growing newish language is one that was unfamiliar to me. According to Wikipedia, R is “a free software programming language and software environment for statistical computing and graphics.” Most of the developers around the office said they had heard of it but never used it. This is a great illustration of how specialized languages can gain traction without making much of an impact on the broader developer community.

Getting Rusty

Of the newer languages with C-like syntax, both Rust and Go are gaining adoption. Go has a head start, but a lot of the commentary I’ve seen suggests that Rust is a better language. Its impressive 220% annual growth rate on Github is at least consistent with that view.

Building a better JavaScript

Two transpile-to-JavaScript languages made it onto the list: TypeScript and CoffeeScript. Since JavaScript is the only language that runs in the browser, a lot of developers are forced to use it. But that doesn’t mean we have to like it. While CoffeeScript is still ahead, TypeScript has the advantage of static typing (something many developers feel passionate about) in addition to a prettier syntax than JavaScript. If it keeps up its 100% year-on-year growth, it may catch up soon.

Dys-functional

According to an old saw, everyone always talks about the weather but no one ever does anything about it. The same could be said about functional languages. Programming geeks love them and insist that they lead to better quality code. But they have yet to break into mainstream usage, and not a single functional language figures in our top-20 list (although R and Rust have some characteristics of functional languages).

Swift kick

The language with the highest growth of all didn’t even show up on the list because it had no repositories at all in the first half of 2013. Only a few months after it was publicly announced, Swift already had nearly 2000 repos. While it is unlikely to keep up its infinite annual growth rate for long, it is a safe bet that Swift is destined to be very popular indeed.

Methodology

I saved the BigQuery results for 2013 and 2014 as two CSV files and merged them into a single consolidated file using Bash:

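# Sort each export by language name (the first comma-separated field)
$ sort -t , -k 1,1 results-20140723-094327.csv > results1.csv
$ sort -t , -k 1,1 results-20140723-094423.csv > results2.csv
# Outer-join on language name, then collapse the two name columns into one
$ join -o '1.1,2.1,1.2,2.2' -a 1 -a 2 -t, results1.csv results2.csv | awk -F ',' '{ if ($1 != "") printf "%s", $1; else printf "%s", $2; print "," $3 "," $4 }'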

The first two commands sort the CSV files by language name (the options -t , and -k 1,1 are needed to ensure that only the language name and not the comma delimiter or subsequent text is used for sorting). The join command takes the sorted output and merges it into a single consolidated file with the format:

Language1,Language2,RepoCount1,RepoCount2

If the language is present in both datasets then Language1 and Language2 are identical. If it isn’t, then one of them is empty. Either way we really want to merge these into one field, which is what the awk command does. (A colleague suggested using sed -r 's/^([^,]*),\1?/\1/', but I decided that awk—or pretty much anything—is easier to read and understand.)
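For what it's worth, join can probably do the merging on its own: field 0 in the -o list stands for the join field itself, so it is filled in whichever file the language came from. Here is a minimal sketch of that variant. It is not what I actually ran, and the file names and counts below are made up purely for illustration:

```shell
export LC_ALL=C                 # sort and join must agree on collation order
workdir="$(mktemp -d)"

# Hypothetical per-period exports in the form Language,RepoCount
printf 'LangB,50\nLangA,100\n' > "$workdir/period1.csv"
printf 'LangC,20\nLangA,150\n' > "$workdir/period2.csv"

merge_counts() {
  # Sort both inputs on the language name, then outer-join them.
  # "-o 0,1.2,2.2" prints the join field followed by each file's count.
  sort -t, -k 1,1 "$1" > "$workdir/sorted1.csv"
  sort -t, -k 1,1 "$2" > "$workdir/sorted2.csv"
  join -t, -a 1 -a 2 -o 0,1.2,2.2 "$workdir/sorted1.csv" "$workdir/sorted2.csv"
}

merge_counts "$workdir/period1.csv" "$workdir/period2.csv"
# LangA,100,150
# LangB,50,
# LangC,,20
```

A language missing from one period simply gets an empty count for that period, which is the same shape the awk step produces.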

I then imported the entire dataset into Google Spreadsheet. The “2014 Projected” column is the 2013 value increased by the overall growth rate in Github repository count for the top 100 languages. This serves as a baseline against which the actual 2014 figure is compared to calculate the growth rate, since it is most interesting to measure how fast a language is gaining adoption relative to the growth of Github itself.
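The same calculation can be sketched outside the spreadsheet. The snippet below is only an approximation of what I did: it assumes the merged Language,RepoCount1,RepoCount2 format from above, takes the overall growth rate from every row in the file rather than just the top 100 languages, and assumes the growth rate is the actual 2014 count divided by the projected count, minus one. The language names and counts are invented sample data.

```shell
export LC_ALL=C                 # keep the decimal point locale-independent
sample="$(mktemp)"

# Hypothetical merged data: Language,RepoCount2013,RepoCount2014
cat > "$sample" <<'EOF'
LangA,1000,1500
LangB,200,600
LangC,800,900
EOF

relative_growth() {
  # The file is read twice: the first pass totals both count columns,
  # the second prints language, projected 2014 count and relative growth.
  # Rows without a 2013 count are skipped, since no baseline exists for them.
  awk -F ',' '
    NR == FNR { total1 += $2; total2 += $3; next }
    $2 + 0 > 0 {
      projected = $2 * total2 / total1
      printf "%s,%.0f,%.1f%%\n", $1, projected, ($3 / projected - 1) * 100
    }
  ' "$1" "$1"
}

relative_growth "$sample"
# LangA,1500,0.0%
# LangB,300,100.0%
# LangC,1200,-25.0%
```

In this toy data the overall repository count grows by 50%, so LangA merely keeps pace, LangB doubles relative to the baseline, and LangC loses ground even though its raw count went up.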

Matthew Gertner
