Thomson Scientific has posted a response (1) to our editorial on the reliability of their impact factor data (2). In it, they claim that our interpretation of the communication between our office and their Research Services Group was “misleading and inaccurate.” We have already published some excerpts from these communications in our previous editorial. For propriety's sake, however, we have refrained from publishing internal Thomson Scientific e-mails, sent to us accidentally, which substantiate our claim that they could not provide us with the original data underlying the published 2006 impact factor calculations.
Although Thomson Scientific's assertion that they do not have two separate databases may be correct, it is clear from their response that different groups within the corporation apply different filters to the data in their database, one of which removes erroneous records. Why this filter is not used for the published impact factors is still unclear.
Impact factors are determined from a dataset produced by searching the Thomson Scientific database using specific parameters. As previously stated, our aim was to purchase that dataset for a few journals. Even if those results were for some reason not stored by Thomson Scientific, it is inconceivable to us that they cannot run the same search over the same database to produce the same dataset. The citation data for a given year should be static. In essence, Thomson Scientific is saying that they cannot repeat the experiment, which would be grounds for rejection of a manuscript submitted to any scientific journal.
Thomson Scientific argues that we did not inform them of the methodologies we would apply to the data when we purchased it. This is like asking someone who is buying a dictionary what words they intend to look up. In fact, our methodology was the same as theirs: the sum of the citation numbers divided by the number of citable articles.
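As a minimal sketch of that arithmetic, the calculation can be written in a few lines of Python. The function name and the citation counts below are hypothetical and chosen only for illustration; they are not Thomson Scientific's code or data.

```python
def impact_factor(citations_per_item, citable_items):
    """Sum of citations divided by the number of citable items."""
    if citable_items <= 0:
        raise ValueError("citable_items must be positive")
    return sum(citations_per_item) / citable_items


# Hypothetical citation counts for six articles (not real journal data).
example_citations = [12, 0, 3, 7, 1, 5]
example_if = impact_factor(example_citations, citable_items=len(example_citations))
print(round(example_if, 2))
```

Given the same citation records and the same count of citable articles, this calculation returns the same number every time it is run.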
We will not refute other points made by Thomson Scientific in their rebuttal, as others have already done so to some extent (see box). Instead we close this discussion with a plea to our fellow publishers to make their citation data available in a publicly accessible database, and thus free this important information from Thomson Scientific's (and other companies') proprietary stranglehold.
The text below was posted on the ELDnet-l listserv of the Engineering Libraries Division (ELD) of the American Society for Engineering Education (http://mailman1.u.washington.edu/mailman/listinfo/eldnet-l), January 3, 2008. Reprinted with permission of the author and the listserv monitor. These comments are the opinion of the author and do not necessarily reflect the position of Stanford University.
Having read the Thomson reply, it seems to me that they do not negate most of the charges against them. For example:
1) “The impact factor calculation contains citation values in the numerator for which there is no corresponding value in the denominator.” To which Thomson replies: “more than 98% of the citations in the numerator of the Impact Factor are to items considered ‘citable’ and counted in the denominator.” So… they agree with the point, but defend themselves by saying that the degree of misrepresentation is small??? (Combine this with issue 4 below, and the impact of the 2% error *that Thomson admits* might be much more significant than 2%!).
2) “Some publishers negotiate with Thomson Scientific to change these designations in their favor. The specifics of these negotiations are not available to the public, but one can't help but wonder what has occurred when a journal experiences a sudden jump in impact factor.” Thomson flatly denies doing so, but goes on to say: “It is not uncommon for a publisher or editor to request a review of the indexing of their content and how past changes to that content could have affected the determination of ‘citable items.’ Thomson staff will analyze and review up to three years of content to arrive at a fully informed determination of the proper indexing. Any required changes are then applied–most often from the current year onward rather than retroactively.” This sounds like some of the rhetoric coming out of the presidential race to me.
3) “Citations to retracted articles are counted in the impact factor calculation. In a particularly egregious example, Woo Suk Hwang's stem cell papers in Science from 2004 and 2005, both subsequently retracted, have been cited a total of 419 times (as of November 20, 2007). We won't cite them again here to prevent the creation of even more citations to this work.” Thomson agrees that it does not adjust for such problems and claims it isn't a bug … it's a feature!
4) “Because the impact factor calculation is a mean, it can be badly skewed by a ‘blockbuster’ paper.” In a response that will certainly be included in the next edition of “How to Lie With Statistics,” Thomson basically admits that this is true, but again tries to pass it off as a virtue.
For me some of this is irrelevant. Even Thomson admits that the “Impact Factor” is an imperfect instrument for reflecting global impact. My point is that even a PERFECT global impact factor might be a very poor indicator of the value of a title for a particular university or corporation. If one is using these data to determine which titles should be retained in a serials cut, great harm could be done to local programs which deviate from average. Since it is exactly these areas of specialization that tend to bring in the big bucks from grant and contract funding, these are exactly the kinds of selection errors that are the most harmful to the institutions we serve. When we build a collection our first obligation is to serve the researchers, faculty and students we represent. Let's be honest, the appeal in using Thomson's impact factors is that they are a quick and easy metric that has the appearance of being “scientific” since they are represented as numeric expressions. For me, the JCB article only fuels a fire that has been burning for a long time.
Dr. Robert Schwarzwalder, Associate University Librarian for the Science and Engineering Libraries, Stanford University
Cecil H. Green Library, 557 Escondido Mall, Room 102, Stanford, CA 94305, T. 650-723-5553, F. 650-725-4902, firstname.lastname@example.org
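Points 1 and 4 of the letter above can be illustrated with a rough numerical sketch. All figures below are hypothetical and chosen only to show the direction of each effect; they are not drawn from Thomson Scientific's database or from any real journal.

```python
from statistics import mean, median

# Point 1: citations to items that are not counted as "citable" still
# enter the numerator, while those items are absent from the denominator.
cites_to_citable = 98      # hypothetical
cites_to_other = 2         # hypothetical (e.g. citations to front matter)
n_citable = 50             # hypothetical

ratio_as_published = (cites_to_citable + cites_to_other) / n_citable
ratio_matched = cites_to_citable / n_citable

# Point 4: a mean can be dominated by a single "blockbuster" paper.
typical = [2, 3, 1, 4, 2, 3, 0, 5, 2, 3]   # hypothetical per-article citations
with_blockbuster = typical + [400]          # one heavily cited paper added

mean_typical = mean(typical)
mean_blockbuster = mean(with_blockbuster)
median_blockbuster = median(with_blockbuster)

print(ratio_as_published, ratio_matched)
print(mean_typical, round(mean_blockbuster, 2), median_blockbuster)
```

In this made-up case the unmatched citations shift the ratio only slightly, whereas a single blockbuster paper raises the mean more than fifteenfold while the median barely moves.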