Using Tags on Stack Overflow Careers to Analyze Demand for Software Development Skills (With Cool Interactive Chart!)
Ever wonder which programming languages are most in demand in the current software development market? I do, and it occurred to me that Stack Overflow Careers would be a great source of information since it has a relatively large number of job listings (generally around 1500) and each one is tagged with relevant keywords, often including the programming language.
Since Stack Overflow Careers does not appear to have an API, I wrote a simple script to scrape the listings and extract the tags. As a bonus, this yielded data on a broad range of software development skills, not just programming languages. I dumped the raw data into a Google Spreadsheet (you can see it in the "Raw Data" tab). I then did a bit of manual and automated preprocessing to massage the data into a convenient form in the "Processed Data" tab:
- Merged tags that were obviously the same (e.g. combining "angular.js" and "angularjs" into a single row).
- Eliminated all tags with only one occurrence since the data is not that interesting and there is a lot of noise.
- Classified the tags into the totally arbitrary categories listed in the legend in the upper right. You will almost certainly disagree with my choice of categories and with my classification of specific tags. Nonetheless I thought that for the interactive chart (see below) it would be useful to make some high-level distinction between e.g. a programming language and an operating system. If you feel like I totally messed this up then feel free to flame me in the comments (or just ignore the categories).
- Calculated confidence intervals for the occurrences using the Adjusted Wald method, since the dataset is still relatively small. Loosely speaking, we can be 95% confident that the "real" value for a given tag falls between the lower and upper bound. (More precisely, the interval is computed at the 95% level for the tag's percentage in the overall population, giving a lower % and an upper %, and these are then scaled back to the size of the actual dataset.)
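For the curious, here is a rough sketch of what the merging, filtering and confidence-interval steps could look like in Python. This is not the actual spreadsheet logic. The alias table, the function names, the lowercasing of tags and the choice of the number of listings as the denominator `n` are all my assumptions for illustration. The interval itself is the standard Adjusted Wald (Agresti-Coull) formula.

```python
import math
from collections import Counter

Z_95 = 1.959963984540054  # two-sided 95% quantile of the standard normal

# Hypothetical alias table: variant spelling -> canonical tag.
ALIASES = {"angular.js": "angularjs"}


def preprocess(raw_tags, min_count=2):
    """Merge aliased tags, count them, and drop tags seen fewer than min_count times."""
    # Lowercasing is an assumption; the original data may already be normalized.
    counts = Counter(ALIASES.get(tag.lower(), tag.lower()) for tag in raw_tags)
    return {tag: c for tag, c in counts.items() if c >= min_count}


def adjusted_wald(x, n, z=Z_95):
    """Adjusted Wald (Agresti-Coull) interval for x occurrences out of n.

    Returns (lower, upper) as proportions, clamped to [0, 1].
    """
    n_adj = n + z * z                 # add z^2 pseudo-observations
    p_adj = (x + z * z / 2) / n_adj   # half of them counted as occurrences
    half_width = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    return max(0.0, p_adj - half_width), min(1.0, p_adj + half_width)


def count_bounds(x, n, z=Z_95):
    """Scale the proportion bounds back to counts in a dataset of size n."""
    lower, upper = adjusted_wald(x, n, z)
    return lower * n, upper * n
```

Calling `count_bounds(occurrences, total_listings)` for each row that survives `preprocess` would give the lower and upper bounds in the same units as the occurrence counts.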
My awesome colleagues Tomas and Igor were kind enough to whip up this simple interactive chart using D3. You can see the occurrence values for each tag with confidence intervals. Use the checkboxes to choose which categories you want to view (assuming you don't hate the way I classified them). Use the slider to filter the results to a specific range of occurrences (e.g. only tags with more than 100 occurrences). You can also zoom into a specific section of the chart by selecting it with the mouse. Fun, isn't it?
Of course, this is a relatively small sample size, and there is no evidence that Stack Overflow Careers is representative of the broader job market other than common sense (and we know how unreliable that is). It may be that the vast majority of employers are looking for developers to code in Brainfuck for BeOS, but for whatever reason they choose to list their job postings elsewhere. With that caveat, here are some random observations:
Java is still king
Did you know that people still code in Java? Amazing, eh? When you do web development, you tend to think of Java as just one of a number of plausible server-side languages (and by no means the most popular). In reality it is the most sought-after programming language by a lot, probably because its usage is the aggregate of a zillion pre-web systems that are still actively developed (in the finance industry, for example) as well as a fair number of web backends. And it certainly doesn't hurt that it is used for Android development as well.
Angular.js is in a league of its own
Python, PHP, C and C++ are going strong
No surprise here. These languages aren't in the same league as Java(Script), but they are significantly ahead of the rest of the pack.
Android and iOS are neck and neck
Android has a few more occurrences but the confidence intervals overlap almost perfectly. It seems inevitable that one or the other will pull ahead, especially since the trade-offs of developing for one or the other are so stark. Without more historical data it is impossible to say where the momentum is, however, so we'll have to wait a few months and run the analysis again.
Where the #%!@$ is Windows?
Linux has 102 occurrences, Windows has 8. This is such a stark difference that I'm not sure whether employers truly aren't looking for Windows devs, or it's such a standard skill that it is assumed if no other operating system is specified. There might be a little bit of both at play here but it is hard to shake the feeling that there is trouble ahead for Microsoft.
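To put a number on how stark that is, here is a small sketch that applies the Adjusted Wald interval to the two counts above and checks whether the intervals overlap. The total of 1500 listings is an assumption taken from the "generally around 1500" figure, not an exact count from the dataset, and the function names are made up for this example.

```python
import math


def adjusted_wald_counts(x, n, z=1.96):
    """Adjusted Wald (Agresti-Coull) bounds for x out of n, scaled back to counts."""
    n_adj = n + z * z
    p_adj = (x + z * z / 2) / n_adj
    half_width = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    lower = max(0.0, p_adj - half_width)
    upper = min(1.0, p_adj + half_width)
    return lower * n, upper * n


def intervals_overlap(a, b):
    """True if the closed intervals a = (lo, hi) and b = (lo, hi) share any point."""
    return a[0] <= b[1] and b[0] <= a[1]


N_LISTINGS = 1500  # assumption: approximate number of listings, not an exact count

linux = adjusted_wald_counts(102, N_LISTINGS)
windows = adjusted_wald_counts(8, N_LISTINGS)

print("Linux bounds:  ", linux)
print("Windows bounds:", windows)
print("Overlap:", intervals_overlap(linux, windows))
```

Under that assumed total, the upper bound for Windows sits far below the lower bound for Linux, so whatever explains the gap, it is not sampling noise.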