We have been no fans of the Google Books project, from its tortured use of sovereign immunity to take books out the back door of state libraries for commercial gain to the crappy deal for authors, to Google’s attack on the class representatives, to Lessig’s bizarre defense of the project (conveniently forgotten as it sank through one litigation catastrophe after another).
This started when I read the main European critique of the project, Google and the Myth of Universal Knowledge by Jean-Noël Jeanneney (former president of the Bibliothèque nationale de France), followed by Professor Nunberg’s insightful article, “Google’s Book Search: A Disaster for Scholars,” in which he revealed the shockingly awful state of Google’s book metadata:
[Y]ou need reliable metadata about dates and categories, which is why it’s so disappointing that the book search’s metadata are a train wreck: a mishmash wrapped in a muddle wrapped in a mess.
Start with publication dates. To take Google’s word for it, 1899 was a literary annus mirabilis, which saw the publication of Raymond Chandler’s Killer in the Rain, The Portable Dorothy Parker, André Malraux’s La Condition Humaine, Stephen King’s Christine, The Complete Shorter Fiction of Virginia Woolf, Raymond Williams’s Culture and Society 1780-1950, and Robert Shelton’s biography of Bob Dylan, to name just a few. And while there may be particular reasons why 1899 comes up so often, such misdatings are spread out across the centuries. A book on Peter F. Drucker is dated 1905, four years before the management consultant was even born; a book of Virginia Woolf’s letters is dated 1900, when she would have been 8 years old. Tom Wolfe’s Bonfire of the Vanities is dated 1888, and an edition of Henry James’s What Maisie Knew is dated 1848.
Of course, there are bound to be occasional howlers in a corpus as extensive as Google’s book search, but these errors are endemic. A search on “Internet” in books published before 1950 produces 527 results; “Medicare” for the same period gets almost 1,600. Or you can simply enter the names of famous writers or public figures and restrict your search to works published before the year of their birth. “Charles Dickens” turns up 182 results for publications before 1812, the vast majority of them referring to the writer. The same type of search turns up 81 hits for Rudyard Kipling, 115 for Greta Garbo, 325 for Woody Allen, and 29 for Barack Obama. (Or maybe that was another Barack Obama.)
How frequent are such errors? A search on books published before 1920 mentioning “candy bar” turns up 66 hits, of which 46—70 percent—are misdated. I don’t think that’s representative of the overall proportion of metadata errors, though they are much more common in older works than for the recent titles Google received directly from publishers. But even if the proportion of misdatings is only 5 percent, the corpus is riddled with hundreds of thousands of erroneous publication dates.
Now we have “Google and the World Brain,” a movie that tells the story of the disaster of the Google Books litigation: “[A]uthors across the world launched a campaign to stop Google, which climaxed in a New York courtroom in 2011.”
I suppose if you set out to find a way to anger massive numbers of creators on a worldwide basis, it is theoretically possible that you could have come up with something more complete than Google Books, but I doubt it. “Google and the World Brain” should be an interesting look at just how Google managed to screw up what could have been a good thing by doing an incompetent job of scanning, an incompetent job of litigating, but a blisteringly awesome job of alienating.
But never fear: the FTC would never dream of interpreting this high-handed rough riding as evidence of monopolistic behavior targeting all the authors in the history of the world, past and future.