Originally posted on the Jobr blog. This post uses data from Jobr’s job postings.
One great thing about working at Jobr is the amount of data we have to work with, which helps us build a better product and experience for our end users. Data science in particular gives us insight into how to surface more relevant recommendations and match job postings to users on Jobr.
As Jobr grows, we’ve found ourselves tackling more and more interesting problems. One project we started was estimating how much individual skills are worth in terms of the salary offered. Because our jobs span many industries and verticals, we have a diverse dataset for testing which skills contribute the most to higher incomes.
The data nitty gritty
We took over 100,000 jobs with specified salaries, extracted the top 500 skills and relevant keywords, and used them as features with TF-IDF weights. TF-IDF stands for term frequency-inverse document frequency, a numerical statistic intended to reflect how important a word is to a document within a collection of documents. As a result, a skill like CPA carries more weight than a common keyword like marketing. We also included position titles and years of experience to normalize the model.
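To make the weighting concrete, here is a minimal sketch of one textbook TF-IDF variant in plain Python. It is not the pipeline used for this post: the `tfidf` helper and the three toy postings are hypothetical, and real libraries often use smoothed variants of the formula. It only illustrates why a keyword that appears in every posting ends up with less weight than a rare one.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {term: weight} dict per document.

    Each document is a non-empty list of tokens. Uses the plain variant
    tf = count / doc_length and idf = log(n_docs / doc_frequency).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical, hand-made postings for illustration; not Jobr data.
postings = [
    ["marketing", "cpa", "audit"],
    ["marketing", "sales", "excel"],
    ["marketing", "python", "etl"],
]
weights = tfidf(postings)
print(weights[0])
```

In this toy set, "marketing" appears in every posting, so its weight is zero, while "cpa" appears in only one and gets a positive weight.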
After fitting the model, we extracted the coefficients and plotted them on a bar graph. The intercept of the model was a base of 75K, so each skill’s coefficient adds to or subtracts from that annual salary amount. We plotted the top keyword coefficients below.
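As a rough illustration of how an intercept and coefficients combine, here is a short sketch that assumes a linear model, which the mention of coefficients and an intercept suggests. Only the 75K intercept comes from the post. The coefficient values, the feature weights, and the `predict_salary` helper are invented for the example.

```python
INTERCEPT = 75_000  # base salary reported in the post

# Hypothetical coefficients, invented for illustration only.
coefficients = {"mba": 12_000.0, "python": 8_000.0, "excel": -6_000.0}

def predict_salary(feature_weights, coefficients, intercept=INTERCEPT):
    """Linear prediction: intercept plus coefficient times feature weight."""
    return intercept + sum(
        coefficients.get(term, 0.0) * weight
        for term, weight in feature_weights.items()
    )

# A hypothetical posting, described by made-up TF-IDF-style weights.
job = {"python": 1.0, "excel": 0.5}
estimate = predict_salary(job, coefficients)
print(estimate)  # 80000.0
```

Here the positive "python" term adds 8,000 and the negative "excel" term subtracts 3,000, moving the estimate from the 75K base to 80K.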
Unsurprisingly, the top keywords and skills mostly reference the medical, banking, and finance fields, with keywords like acquisitions, trading, inpatient, and mba. Hiring an MBA candidate will certainly make a job more expensive because of the high salaries that people coming out of business school attract. It is also worth noting that these are all fairly general keywords rather than hard skills. They show up most prominently in management roles, where salaries tend to be high.
Tech skills that rule them all
Looking into tech skills more specifically, we plotted the most valued skills related to tech and programming.
The keywords python, ETL, and analytics may reflect the high value placed on data engineers and data scientists, roles that have become popular in the past few years.
Stay away from these skills
A less common and somewhat counterintuitive thing to look at is the set of skills that could detract from a job’s salary. Because the model starts from a neutral base of 75K, and many jobs pay less than 75K a year, some skills must contribute negative values to the overall salary.
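One simple way to surface such skills is to keep only the negative coefficients and sort them from most to least negative. The sketch below uses a hypothetical `most_negative` helper and made-up coefficient values, not the fitted ones from this analysis.

```python
def most_negative(coefficients, k=3):
    """Return up to k (term, coefficient) pairs, most negative first."""
    negative = [(term, c) for term, c in coefficients.items() if c < 0]
    return sorted(negative, key=lambda pair: pair[1])[:k]

# Hypothetical coefficients, invented for illustration only.
coefficients = {
    "mba": 12_000.0,
    "excel": -6_000.0,
    "cleaning": -9_000.0,
    "python": 8_000.0,
    "bookkeeping": -4_000.0,
}
lowest = most_negative(coefficients, k=2)
print(lowest)  # [('cleaning', -9000.0), ('excel', -6000.0)]
```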
There is a clear difference between the management-like keywords in the first graph and the more entry-level keywords in the negative-facing bar graph above.
Surprisingly, Microsoft Excel has a very high negative impact on a job’s salary. Almost everyone working in an office probably has to know Microsoft Excel, but the jobs that list it as a requirement in their descriptions are generally entry-level office roles that involve data entry tasks or simple formulas.
Other software that people use mainly through its interface shows up in the graph above as well, such as Adobe, Quickbooks, Autocad, and Facebook. Alongside those skills are more entry-level, non-office keywords like cleaning, bookkeeping, underwriting, and organizing, which point to more manual-labor-like jobs.
A quick look at some of the lowest-coefficient stemmed position titles also shows very junior terms like associate or analyst, while the lowest coefficient of all seems to be restaurant.