The H-index cannot be relied on for recruitment

The H-index is one of my favourite publication statistics. It's really simple to define: a person's H-index is the largest number H such that they have H publications cited at least H times each. It's robust to outliers: if you have a million publications with no citations, or one publication with a million citations, this doesn't influence the outcome - it's the "core" of your H most-cited publications that matters. This makes it quite a nice heuristic for the academic impact of a body of work. A common source of the H-index is Google Scholar, which automatically calculates it for each scholar who has an account, and influential academics with long publication records typically have a high H-index.
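
To make the definition concrete, here's a minimal sketch in Python (purely illustrative - not how Google Scholar computes it) that calculates the H-index from a list of per-paper citation counts, and demonstrates the robustness to outliers described above:

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for i, c in enumerate(ranked, start=1):
        if c >= i:  # the i-th best paper has at least i citations
            h = i
        else:
            break
    return h

print(h_index([10, 8, 5, 4, 3]))  # 4: four papers with >= 4 citations each
print(h_index([1000000]))         # 1: one massively-cited paper still gives H = 1
print(h_index([0] * 1000000))     # 0: a million uncited papers give H = 0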

However, the H-index should not be used as a primary measure for evaluating academics, e.g. for recruitment or promotion.

Why?

The main reason is that it's straightforward, in fact almost trivial, to manipulate your own H-index and make it artificially high.

Google Scholar doesn't exclude self-citations from its counting. It even counts self-citations in preprints, so the citations might not even be peer-reviewed: you could chuck a handful of hastily-written preprints onto arXiv just before you apply for a job. (Should Google exclude self-citations? Yes, in my opinion: it would be trivially easy, given that they have ground truth about which academic "owns" which paper. However, that wouldn't remove the vulnerability, because pairs of authors could go one level beyond and conspire to cross-cite each other, etc.) Self-citation is often a valid thing to do, but it's also often used by academics to promote their own previous papers, so it's a grey area.

Google Scholar often automatically adds papers to a person's profile, using text matching to guess whether the author matches. I've seen real examples in which an academic's profile included extremely highly-cited papers... that were not by them. In fact they were from completely different research topics! Google's text-matching isn't perfect, and like most text-matching it often struggles to work out which names actually refer to the same author.

You can further manipulate your H-index by choosing how to publish: you can divide research outputs into multiple smaller publications rather than single integrated papers.

Or you can do that after the fact, by tweaking your options in Google Scholar about whether two particular publications should be merged into one record or not. (Google offers this option because it often picks up two slightly different versions of the same publication.)
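
As a toy illustration of why these bookkeeping choices matter (the citation counts below are invented, not real Google Scholar data), here's how merging or splitting duplicate records changes the value, using the same simple H-index calculation as before:

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for i, c in enumerate(ranked, start=1):
        if c >= i:
            h = i
        else:
            break
    return h

# Hypothetical profile: the two records with 4 citations each are really
# two versions of the same paper, indexed separately.
separate = [6, 6, 5, 5, 4, 4]
merged   = [6, 6, 5, 5, 4 + 4]   # merging pools the citation counts

print(h_index(separate))  # 4
print(h_index(merged))    # 5 - merging raises the H-index here

# In another hypothetical profile, the opposite choice wins:
print(h_index([3, 3]))    # 2 - two versions kept separate
print(h_index([6]))       # 1 - the same work as one merged record
```

Neither choice is dishonest in itself; the point is that the resulting number depends on clerical decisions that the person being evaluated controls.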

Most of the vulnerabilities I've listed relate to Google's chosen way of implementing the H-index; however, at least some of them apply no matter how it is counted.

The H-index is a heuristic. It's OK to look at it as a quick superficial statistic, or even to use it as part of a general assessment alongside other stats and other evidence. But I'm increasingly seeing academic job adverts that say "please submit your Google Scholar H-index". This should not be done: it sends a public signal that this number is considered potentially decisive for recruitment (which it shouldn't be), creating a strong incentive to game the value. It also entrenches a monopoly position for a private company, demanding that academics create Google accounts in order to be eligible for a job. Academia is too important to have single points of failure centred on single companies (witness the recent debates around Elsevier!).

When trying to sift a large pile of applications, people like to have simple heuristics to help them make a start. That's understandable. It's naive to think that one's opinion isn't influenced by first-pass heuristics - and so it's vital to use heuristics that aren't so trivially gameable.
