On February 3, 2005, the National Institutes of Health (NIH) announced its policy on “enhancing public access to archived publications resulting from NIH-funded research.” Through this policy, the NIH requests that publications resulting from NIH-funded research be deposited by the authors in an archive at the National Library of Medicine (NLM). Authors can elect to have their publications released to the public immediately or up to 12 months after publication.
There are three stated reasons for this policy:
(1) to provide public access to the results of NIH-funded research.
(2) to create an archive of NIH-funded research.
(3) to make the full text of that archive searchable.
The NIH policy is in part a response to the refusal of commercial publishers to release their archival content from behind subscription controls, which denies the public access to the results of research that it funded. At the Journal of Cell Biology, we have tried to balance our obligation to the public, which funds the research we publish, with our need to recoup the costs of peer review and journal production. To do this, we wait six months before releasing our content to the public for free, and we sell subscriptions to institutions and individuals who want to see that content in the first six months.
We have offered (through HighWire Press) to provide the NLM with all of the NIH grant information in our publications, which they can use to create records in their new database of NIH-funded publications. We have thus offered to provide the NLM automatically with information that they have requested only from authors, thereby enhancing the content of their database.
We are strongly in favor of the establishment of an archive of NIH-funded research; in fact, we would prefer to see a truly complete, electronic archive of all the scientific literature established, with limited access controls that allow publishers to recoup their costs. This is where we believe the NLM should direct their efforts.
To ensure that the final, published version of a paper is what is included in such an archive, we are willing to give the NLM all of our content as PDF files. This would prevent any problems of quality control related to HTML interpretation across platforms. We have been told by the NLM, however, that they want our complete HTML content, because they want to build a full-text search engine.
It is a useless duplication of effort for the NLM to host HTML (or SGML, or XML, or whatever comes next) simply for the purpose of full-text searching. Google and other search engines are currently indexing our full text, and already far more users arrive at our content via Google than via PubMed. If, despite the duplication, the NLM goes ahead and develops a full-text search engine, we have offered to allow them to index our text by crawling our website. In addition, the text content of PDF files can be indexed for searching, which is how full-text searches of our content from before 1997 are done on our website.
The current NIH policy is a misguided attempt to achieve laudable goals. We hope the NIH can be convinced to reconsider how to achieve those goals.