As an Associate Fellow at the National Library of Medicine, I learned about the intellectual infrastructure that goes into indexing PubMed. Coders (indexers) label the key concepts of every article "indexed for MEDLINE." This indexing is the backbone of how people search for articles, even when they do keyword searches and never search for vocabulary terms directly.
Indexers choose from a controlled vocabulary maintained by the Medical Subject Headings (MeSH) section. That vocabulary has grown considerably to accommodate changes in health care, but any term is added only after careful consideration; MeSH is both conservative and expandable. An indexer can suggest terms to add to MeSH, but such suggestions are taken under advisement, not guaranteed.
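To make that difference concrete, here is a minimal sketch in Python against NCBI's public E-utilities esearch endpoint. The helper function name and the example queries are mine, not anything from NLM; the point is only to contrast a free-text keyword search with one scoped to a MeSH heading.

```python
# A minimal sketch (standard library only) contrasting a raw keyword
# search with a search scoped to a MeSH heading via NCBI E-utilities.
# The helper name pubmed_count is my own invention for illustration.
import urllib.parse
import urllib.request

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def pubmed_count(term: str) -> str:
    """Return the raw XML count result for a PubMed query."""
    query = urllib.parse.urlencode(
        {"db": "pubmed", "term": term, "rettype": "count"}
    )
    with urllib.request.urlopen(f"{EUTILS}?{query}") as response:
        return response.read().decode("utf-8")

# A free-text keyword search matches words anywhere in the record...
print(pubmed_count("heart attack"))
# ...while the [MeSH Terms] field tag retrieves only articles that an
# indexer deliberately labeled with the controlled heading.
print(pubmed_count("myocardial infarction[MeSH Terms]"))
```

The two counts differ because the second query rides on human indexing decisions rather than on whatever words the authors happened to use.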
At the time I focused on the competing interests between the indexers and the vocabulary maintainers: the indexers wanted very expansive terminology, while the MeSH team felt it had to guard the integrity of the vocabulary. I am sure this tension still exists. But today what interests me is what this intellectual endeavor symbolizes. All this bother exists because the peer-reviewed scientific literature is deemed especially valuable, and thus worthy of discovery tools more sophisticated than a Google search.
In his brilliant post about why scientific communication has not yet been disrupted, Michael Clarke points to four functions of the scholarly communications system: dissemination of a new idea; registration of that idea in the scholarly record; validation of that idea via peer review; and filtration of those ideas into recognizable buckets (journals), so that the research process remains manageable.
Within Clarke's framework, NLM's indexers and the MeSH section provide the tools for registration of new ideas.
Clarke's essential insight is that the Internet only solves the dissemination problem, which is why scientific publishing has yet to change fundamentally. Just posting something online does not register it in the same way as having a citation in PubMed. Posting without peer review does not offer validation, and proliferating online posts make the filtration of ideas ever more difficult.
Changing scientific publishing at the root will require something more than the Web alone. First we need to get beyond the metaphor of the "paper" as the delivery mechanism for a piece of research, as this is tied to pre-Web times. Events like this year's "Beyond the PDF" conference are a good place to start. And we need ways for scholars to utilize the unique capabilities of the Web to conduct and report research. Peter Sefton's innovative work on scholarly HTML looks to be a way forward there.
As to peer review, NYU's innovation in open peer review is an exciting experiment. It is time to lift the lid of secrecy from the often suspect peer review process and make it transparent.
In a world of open peer review and scholarly HTML that has gone beyond the PDF, the dissemination function of scholarly communication will evolve and the validation function will remain intact. That leaves filtration and registration.
Filtration first needs to be decoupled from the constraining idea of the journal, which is no mean feat. But remember: in this universe we've gone beyond the PDF and are utilizing the native capabilities of the Web. That should mean greater focus on ideas and their merits, and less on the container in which those ideas appear. Built-in filtration would be a natural byproduct of this evolution.
That leaves registration. Here is where librarians come in. We should build an index for born-digital scholarly HTML that is every bit as robust and sophisticated as the one we have built for journal articles. Google won't cut it in this case, because it yields too much noise. We shouldn't wait for someone else to build this index or rely upon market demand to make it happen. Instead we should take the Steve Jobs approach and build something people didn't even know they wanted, as the sketch below suggests.
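What might the core of such an index look like? Here is a toy sketch only, in Python: the documents and subject terms below are invented, and a real system would rest on a genuine controlled vocabulary and far richer records. The shape of the thing is what matters, namely retrieval that goes through terms a human indexer chose rather than through raw text.

```python
# A toy sketch of a controlled-vocabulary index for born-digital documents.
# The records and subject terms here are invented for illustration.
from collections import defaultdict

# Hypothetical scholarly HTML documents, each hand-indexed with
# controlled vocabulary terms (the "subjects" lists).
documents = {
    "doc-001": {"title": "Scholarly HTML in practice",
                "subjects": ["Scholarly Communication", "Hypertext"]},
    "doc-002": {"title": "Open peer review at scale",
                "subjects": ["Peer Review", "Scholarly Communication"]},
}

# Build an inverted index: controlled term -> set of document identifiers.
index = defaultdict(set)
for doc_id, record in documents.items():
    for term in record["subjects"]:
        index[term].add(doc_id)

# Lookup by vocabulary term returns only deliberately indexed documents,
# not everything that happens to mention the words.
print(sorted(index["Scholarly Communication"]))  # ['doc-001', 'doc-002']
```

The deliberate, human-assigned term is exactly the signal a general-purpose crawler lacks, and it is why such an index would yield less noise than a Google search.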