O’Reilly Strata Survey Looks at Salaries for Big Data Devs

A recent survey by O'Reilly found devs who are versed in open source or commercial big data tools and languages can make a pretty penny. Devs can expect to earn up to $150,000 annually – the biggest earners are those who can work with multiple “open” tools. IDN speaks with O’Reilly survey co-author John King.



The survey, entitled 2013 Data Science Salary Survey: Tools, Trends, What Pays (and What Doesn’t) for Data Professionals, was co-authored by John King. In researching big data dev salaries, King first uncovered and defined two main groups or “clusters” of tooling skills. 
 
One cluster consists of next-gen “open” big data technologies and languages. The survey called this the Hadoop (“open”) cluster.

The other cluster consists of a blend of commercial and more traditional data tools, such as Excel, Windows and SQL. The survey called this the Excel/SQL (“commercial”) cluster.

Devs with expertise in either cluster of big data / analytics tools will do well, King found. But devs who work with “open” tools and languages will do up to 40%-50% better, according to King’s analysis of the survey results. “The direct finding was that people using open-source or script-based tools, their salaries are higher on average than those in the other group. It’s not double, but it is a significant difference,” he said.

One of the survey’s main written conclusions put it plainly:

“Respondents selecting tools from the open source cluster had higher salaries than respondents selecting commercial tools. For example, respondents who selected 6 of the 19 open source tools had a median salary of $130k, while those using 5 of the 13 commercial cluster tools earned a median salary of $90k.”

Another interesting finding is that devs don’t need to know all the tools in a cluster to be highly valued. “No respondent reported using all tools in either cluster, but many gravitated toward one or the other,” King added.
 
The survey also spelled out with dramatic numbers the delta between those devs who can work with tools in the “open” Hadoop cluster, versus those who stay in the “commercial” Excel/SQL cluster. “Median base salary generally rises with the number of tools used from the [open] Hadoop cluster, from $85k for those who do not use any such tools to $125k for those who use at least six,” the survey said.
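As a rough check on the 40%-50% figure, the quoted medians can be compared directly. The sketch below is illustrative arithmetic only and is not part of the survey; the function name percent_gap is made up for this example, and the inputs are just the dollar figures quoted in this article.

```python
def percent_gap(lower_k, higher_k):
    """Relative increase from lower_k to higher_k, in percent."""
    return (higher_k - lower_k) / lower_k * 100.0


# Median salaries in thousands of dollars, as quoted from the survey.
comparisons = {
    "commercial (5 of 13 tools) vs. open (6 of 19 tools)": (90, 130),
    "Hadoop cluster: no tools vs. at least six": (85, 125),
}

gaps = {
    label: percent_gap(lower, higher)
    for label, (lower, higher) in comparisons.items()
}

for label, gap in gaps.items():
    print(f"{label}: {gap:.1f}% higher")
```

Both comparisons land between 40% and 50%, which is consistent with King's description of the gap as significant but "not double."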

Hadoop (or ‘open source’) Cluster Tools

Amazon EMR
Apache Hadoop
Cassandra
Cloudera
D3
Graph Processing
HBase
Hive
IBM SystemML
Java
LIBSVM
Linux
Mahout
MapR
MongoDB
Nimble
Networks/Social
Pentaho
Pig
Python
R

Excel / SQL (or ‘commercial’) Cluster Tools

BusinessObjects
Cognos
IBM DB2
IBM Netezza
Microsoft Excel
Microsoft SQL Server
Microsoft Windows
Oracle RDB
SAS
SQL
Visual Basic/VBA
Tableau
Teradata

Not surprisingly, the more tools a dev knew, the more he or she could expect to be paid. “Salaries positively correlated with the number of tools used by respondents. The average respondent selected 10 tools and had a median income of $100k; those using 15 or more tools had a median salary of $130k,” the survey found.

 

The survey summarized how the breadth and type of tools used relate to salary this way:

Using a wider variety of tools – programming languages, visualization tools, and relational database/Hadoop platforms – correlates with higher salary.

Using more tools tailored to working with big data, such as MapR, Cassandra, Hive, MongoDB, Apache Hadoop, and Cloudera, also correlates with higher salary.

So, why the big salary gap between big data devs who know “open” big data technologies and those who don’t? While the survey did not directly ask that question, King thinks he has discovered some of the reasons.

  1. Once a dev picks a tools cluster to specialize in, he or she will most likely stay in that cluster when it comes to learning new tools. “We found the more you used multiple tools from one of these clusters, the less likely you were to use tools from the other cluster. We’re not saying that one [cluster] is superior to the other, it’s just the way the statistics broke down,” King said.
  2. Devs who focus on tools from the Hadoop (open) cluster may earn more money because learning one open-source tool or language may more naturally lead to the need to learn others – and the Hadoop cluster has more tools and languages in it.


“I thought it was interesting that people were using so many open-source tools. You have all these tools under the Apache umbrella [and] there is a real considered effort to make them compatible with others and link together,” King said. For instance, a dev in the “open” cluster may start with Python, but because of the interrelationships of Apache technologies the dev can end up acquiring a whole set of new skills from that simple start, he added.

The 2013 Data Science Salary Survey: Tools, Trends, What Pays (and What Doesn’t) for Data Professionals was conducted during O’Reilly’s Strata Conference: Making Data Work in Santa Clara, Calif. and Strata + Hadoop World in New York. Respondents included attendees from 37 US states and 33 countries.

The 2013 Data Science Salary Survey is here.



