ATD Blog
Tue May 29 2012
On a conceptual level, job keywords make sense to everyone. However, when you dig into the details of how computers identify keywords, many challenges arise. I am going to explain the logic to clear up the confusion and, I hope, start an ongoing discussion about keywords. I will use layman's terms to avoid the complexity of computational linguistics.
Computers Look for Different Keywords than Humans
As for job keywords, much of the available advice suggests focusing on industry keywords and functional keywords. While this advice is still valid, a new era has emerged in which computers look for keywords in a résumé before a hiring manager ever reviews it.
Unlike humans, computers do not try to decipher meaning from individual words (e.g., does “manage” mean managing people or managing products). Instead, they apply complex mathematical formulas to determine the words and phrases that can precisely and compactly represent the content of the job description. Then, these phrases are searched for in the résumé. Based on the search, a complex ranking system is used to compare one candidate’s résumé to another’s. Complex? Yes!
A good way to see how a computer identifies keywords is to walk through a sample job description. Let’s focus on just three lines of the Requirements section:
Requirements:
Bachelors degree in a relevant scientific discipline or equivalent.
At least 2 years of relevant experience as a CRA in the biotech / pharmaceutical industry.
3+ years CRA experience is preferred; Knowledge of GCP and ICH guidelines.
Computers identify keywords by determining how often phrases are used across other job descriptions. The computer then looks for those phrases in a candidate’s résumé and ranks the candidate based on the findings. To understand the process, we can break it into four parts.
1. Identifying the Keyword Phrases
The computer begins by analyzing the job description to identify every candidate phrase in it. What is a “phrase”? A “phrase” is one or more words in succession from the job description. Phrases can be single words like “CRA,” from our example, or longer strings of words like “Bachelors degree in a relevant scientific discipline.”
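To make the idea of a “phrase” concrete, here is a minimal Python sketch that lists every run of consecutive words in the first two requirement lines. The function name `extract_phrases`, the `max_words` cap, and the rule that punctuation is simply dropped are my own assumptions for illustration; real screening systems may split text differently.

```python
import re

def extract_phrases(text, max_words=7):
    """Return every run of 1..max_words consecutive words, line by line.

    max_words is an assumed cap so the list stays manageable.
    """
    phrases = []
    for line in text.splitlines():
        # Assumption: keep letters, digits and '+'; drop punctuation like '/' and '.'
        words = re.findall(r"[A-Za-z0-9+]+", line)
        for start in range(len(words)):
            for length in range(1, max_words + 1):
                if start + length <= len(words):
                    phrases.append(" ".join(words[start:start + length]))
    return phrases

requirements = (
    "Bachelors degree in a relevant scientific discipline or equivalent.\n"
    "At least 2 years of relevant experience as a CRA in the biotech / pharmaceutical industry."
)
phrases = extract_phrases(requirements)
```

The resulting list contains single words such as “CRA” alongside longer strings such as “Bachelors degree in a relevant scientific discipline,” and phrases never run across a line break.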
2. Determining the Frequency of the Phrases
With the phrases identified, the computer next counts how many of the other job descriptions contain each phrase. The more often a phrase is found, the higher its frequency count; the less often it is found, the lower the count. For instance, let’s look at the first two lines of our sample job description.
Requirements:
Bachelors degree in a relevant scientific discipline or equivalent.
At least 2 years of relevant experience as a CRA in the biotech / pharmaceutical industry.
If we had 10 other job descriptions and counted the frequency of the phrases, we might end up with something like this:
Phrase | Frequency |
Bachelors | 10 |
Bachelors degree | 10 |
Bachelors degree in | 10 |
Bachelors degree in a | 8 |
relevant | 10 |
relevant scientific | 7 |
relevant scientific discipline | 5 |
relevant scientific discipline or equivalent | 4 |
At | 10 |
At least | 8 |
At least 2 | 3 |
At least 2 years | 3 |
At least 2 years of | 3 |
relevant | 10 |
relevant experience | 10 |
relevant experience as | 10 |
relevant experience as a | 10 |
CRA | 2 |
CRA in | 1 |
CRA in the | 1 |
CRA in the biotech | 1 |
CRA in the biotech pharmaceutical | 1 |
The computer starts with a single word and counts its frequency (i.e., how many of the job descriptions contain it). Then it adds the next word to the phrase and counts again, repeating this as the phrase grows.
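The counting step can be sketched in a few lines of Python. The three job descriptions below are invented for illustration, and the helper names `document_frequency` and `prefix_counts` are hypothetical; the sketch only shows the idea of growing a phrase one word at a time and recounting.

```python
import re

def tokenize(text):
    # Assumption: lowercase everything and drop punctuation
    return re.findall(r"[a-z0-9+]+", text.lower())

def document_frequency(phrase, job_descriptions):
    """Count how many job descriptions contain the phrase as whole words."""
    needle = tokenize(phrase)
    n = len(needle)
    count = 0
    for description in job_descriptions:
        words = tokenize(description)
        if any(words[i:i + n] == needle for i in range(len(words) - n + 1)):
            count += 1
    return count

def prefix_counts(phrase, job_descriptions):
    """Grow the phrase one word at a time and record the count at each step."""
    words = phrase.split()
    return [
        (" ".join(words[:k]), document_frequency(" ".join(words[:k]), job_descriptions))
        for k in range(1, len(words) + 1)
    ]

# Invented sample corpus, not real postings
other_jobs = [
    "Bachelors degree in nursing. At least 5 years of relevant experience as a nurse.",
    "Bachelors degree in accounting required. Relevant experience as a CPA preferred.",
    "At least 2 years of relevant experience as a CRA in the biotech industry.",
]
counts = prefix_counts("At least 2 years", other_jobs)
```

With this tiny corpus, “At” and “At least” each appear in two descriptions, while “At least 2” and “At least 2 years” appear in only one, which mirrors how counts fall as words are added.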
Once the frequencies are determined, the computer decides which phrases are the keywords for the job. With the information we have above, we could claim that the phrases that show up less frequently are the most important for this job, and the phrases that show up more frequently are too generic. For instance, if a phrase appears in all 10 job descriptions, it is probably not important (this is the case with “Bachelors degree”). However, the phrase “CRA in the biotech pharmaceutical” is nearly unique to this job. Therefore, we could assert that any phrase with a count of three or less is a keyword phrase.
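Applied to a handful of rows from the table, the “three or less” rule looks like the Python sketch below. The function name `select_keywords` and the `max_count` parameter are assumptions of mine; the counts are copied from the table, and the cutoff of three is just the example threshold used in this post, not a universal setting.

```python
# A few phrase counts taken from the example table
frequencies = {
    "Bachelors degree": 10,
    "relevant scientific discipline": 5,
    "At least": 8,
    "At least 2 years of": 3,
    "relevant experience as a": 10,
    "CRA": 2,
    "CRA in the biotech pharmaceutical": 1,
}

def select_keywords(frequencies, max_count=3):
    """Keep only the phrases that appear in max_count or fewer job descriptions."""
    return [phrase for phrase, count in frequencies.items() if count <= max_count]

keywords = select_keywords(frequencies)
# keywords == ['At least 2 years of', 'CRA', 'CRA in the biotech pharmaceutical']
```

Generic phrases such as “Bachelors degree” drop out, while the rarer, job-specific phrases remain.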
But these don’t look like keywords. They read like incomplete phrases or gibberish because of the extra words that make each phrase less frequent. The computer is not looking for grammatical or commonly used phrases. For example, you may believe “2 years of experience” is the keyword, but a computer may say “At least 2 years of” is the keyword, because “2 years of experience” shows up in too many job descriptions.
3. Searching Résumés for Keyword Phrases
Once the computer has a set of keyword phrases, it searches the résumé for them. If it finds a match or a partial match, it gives that match a score. Let’s use “at least 2 years experience” as the keyword phrase. If the résumé had “I have more than 2 years experience in…”, we would get a partial match, with “2 years experience” being the overlap. Changing the tense of a word, or adding or removing a plural or possessive, will also result in a partial match. Partial matches are not bad. It is unlikely that any résumé will match the job description exactly without copying it word for word. Therefore, the goal is to eliminate any missing keywords and fill your résumé with matched and partially matched keywords.
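One simple way to picture a partial match is to find the longest run of consecutive words that the keyword phrase and the résumé share. The Python sketch below does that with the standard library’s `difflib.SequenceMatcher`; the function name `match_score` and the scoring rule (shared words divided by keyword length) are my own assumptions. It does not handle changes in tense or plurals, which a real system would likely treat as partial matches too.

```python
import re
from difflib import SequenceMatcher

def tokenize(text):
    # Assumption: lowercase everything and drop punctuation
    return re.findall(r"[a-z0-9+]+", text.lower())

def match_score(keyword, resume_text):
    """Return the longest shared run of words and the fraction of the keyword it covers."""
    kw = tokenize(keyword)
    resume = tokenize(resume_text)
    if not kw:
        return "", 0.0
    matcher = SequenceMatcher(None, kw, resume, autojunk=False)
    m = matcher.find_longest_match(0, len(kw), 0, len(resume))
    overlap = " ".join(kw[m.a:m.a + m.size])
    return overlap, m.size / len(kw)

overlap, score = match_score(
    "at least 2 years experience",
    "I have more than 2 years experience in clinical research",
)
# overlap == '2 years experience', score == 0.6
```

An exact match would score 1.0, a résumé with no shared words would score 0.0, and everything in between is a partial match.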
4. Assigning a Résumé Rank
Once the computer has a list of all the matched and partially matched keywords, the computer assigns a rank or value. The rank is weighted based on the matches and the frequency of the keyword phrase. A keyword phrase that is less frequently found will get a higher weight than a keyword phrase that is more frequently found. An exact match will get a higher weight than a partial match. Within the partial match, the closer to the exact phrase you can get, the higher the rank. The computer takes all of these into account and assigns a weighting.
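The weighting can be sketched as follows in Python. This is only an illustration under my own assumptions: each keyword contributes its match fraction (1.0 for an exact match, less for a partial one) divided by its frequency count, so rarer phrases weigh more. The function names, the two sample résumés, and the formula itself are hypothetical; actual systems use their own, more complex rankings.

```python
import re
from difflib import SequenceMatcher

def tokenize(text):
    # Assumption: lowercase everything and drop punctuation
    return re.findall(r"[a-z0-9+]+", text.lower())

def match_fraction(keyword, resume_words):
    """Fraction of the keyword covered by its longest shared run of words."""
    kw = tokenize(keyword)
    if not kw:
        return 0.0
    matcher = SequenceMatcher(None, kw, resume_words, autojunk=False)
    m = matcher.find_longest_match(0, len(kw), 0, len(resume_words))
    return m.size / len(kw)

def score_resume(resume_text, keyword_frequencies):
    """Sum of match fraction / frequency, so rare phrases count for more.

    Assumes every frequency is at least 1.
    """
    words = tokenize(resume_text)
    return sum(
        match_fraction(keyword, words) / frequency
        for keyword, frequency in keyword_frequencies.items()
    )

# Keyword phrases with their counts from the example table
keyword_frequencies = {"CRA in the biotech": 1, "At least 2 years of": 3}

# Invented résumé snippets
resumes = {
    "A": "Worked as a CRA in the biotech sector for at least 2 years of my career",
    "B": "More than 2 years of retail management experience",
}
ranking = sorted(
    resumes,
    key=lambda name: score_resume(resumes[name], keyword_frequencies),
    reverse=True,
)
```

Résumé A matches both phrases exactly and ranks first; résumé B earns only a small score from its partial match on the more common phrase.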
For every résumé that comes in, a rating can be assigned.
Does it work?
Using our example, let’s do a simple test to see if the process works. Let’s assume we get hundreds of résumés. If 10 résumés have “CRA” or “CRA in the biotech” versus 100 résumés that have “Bachelor’s Degree,” a hiring manager could quickly narrow the applicant list to just 10. While the hiring manager may miss out on a strong candidate who does not have this term, they do avoid having to read through hundreds of résumés.
There are plenty of arguments on why this may not result in the best hiring decisions, but in today’s economy where employees are required to do more with less, these systems are here to stay.