Malamud’s General Index: Research Gist, No Slap On The Wrist

Tired of that unsettling feeling you get from looking for paywalled papers on that one site that shall not be named? Yeah, us too. But now there’s an alternative that should feel a little less illegal: this new index of the world’s research papers over on the Internet Archive.

It’s an index of words and short phrases (up to five words) culled from approximately 107 million research papers. The point is to make it easier for scientists to gain insights from papers that they might not otherwise have access to. The Index should also make computerized analysis of the world’s research easier. Call it a gist machine.
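To get a feel for what an index of "words and short phrases (up to five words)" means, here is a minimal Python sketch that counts every run of one to five consecutive words in a string. This is only an illustration: the `ngrams` function, its whitespace-based word splitting, and the sample sentence are our own assumptions, not Malamud's actual pipeline or file format.

```python
from collections import Counter


def ngrams(text, max_n=5):
    """Count every run of 1..max_n consecutive words in text.

    Hypothetical helper for illustration; splitting on whitespace is a
    simplification, not how the General Index tokenizes papers.
    """
    words = text.lower().split()
    counts = Counter()
    for n in range(1, max_n + 1):
        # Slide a window of n words across the text.
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    return counts


sample = "the cell membrane regulates what enters the cell"
counts = ngrams(sample)
print(counts.most_common(3))
```

Note that the counts alone don't let you reassemble the original sentence in any practical way, which is roughly the idea behind an index that doesn't contain the full text of any paper.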

Technologist Carl Malamud created this index, which doesn’t contain the full text of any paper. Some of the researchers with early access to the Index said that it is quite helpful for text mining. The only real barrier to entry is that there is no web search portal for it: you have to download 5TB of compressed files and roll your own program. In addition to sentence fragments, the files contain 20 billion keywords and tables with the papers’ titles, authors, and DOI numbers, which will help users locate the full paper if necessary.

Nature’s write-up makes a salient point: how could Malamud have made this index without access to all of those papers, paywalled and otherwise? Malamud admits that he had to get copies of all 107 million articles in order to build the thing, and that they are safe inside an undisclosed location somewhere in the US. And he released the files under Public Resource, a non-profit he founded in Sebastopol, CA. But we have to wonder how different this really is from, say, the Google Books N-Gram Viewer, or Google Scholar. Is the difference that Google is big enough to get away with it?

If this whole thing reminds you of another defender of free information, remember that you can (and should) remove the DRM from his e-book of collected writings.

Via r/technology

Sci-Hub: Breaking Down The Paywalls

There’s a battle going on in academia between the scientific journal publishing companies that have long served as the main platform for peer review and spreading information, and scientists themselves who just want to share and have access to the work of their fellows. arxiv.org launched the first salvo, allowing researchers in physics to self-publish their own papers, and has gained some traction in mathematics and computer science. The Public Library of Science journals focus on biology and medicine and offer peer review services. There are many others, and even the big firms have been forced to recognize the importance of open science publication.

But for many, that’s still not enough. The high-prestige journals, and most past works, are stuck behind paywalls. Since 2011, Sci-Hub has taken science publishing open by force, illegally obtaining papers and publishing them in violation of copyright, but at the same time facilitating scientific research and providing researchers in poorer countries with access that their rich-world colleagues take for granted. The big publishing firms naturally fought back in court and won, and with roughly $20 million in damages, drove Sci-Hub’s founder underground.
