Google Books: How bad is the metadata? Let me count the ways….

Professor Nunberg provides some excellent slides illustrating his complaints about the abysmal quality of Google Book Search and its metadata. Recall that Jean-Noël Jeanneney, former president of the Bibliothèque Nationale of France, warned of the same kind of problems and gave similar examples of botched scans of French cultural works in his 2007 work “Google and the Myth of Universal Knowledge”.

Jean-Noël Jeanneney will be particularly galled (no pun intended) by yet another variety of metadata screwup in Google Books: authorship attributed to the writer of a foreword. Madame Bovary by HENRY JAMES, for example.

Now remember–this is The Google we are talking about. The Google who only hires the best and the brightest. The Google who only hires from the best schools. The Google who would have you believe that they are the second coming. The Google who seems to employ people who don’t know who wrote Madame Bovary and who don’t know that Tom Wolfe wasn’t born in 1888.

The librarians who trusted The Google to scan their works thought they would get something back that was going to further their mission. I feel very, very certain that the metadata that was delivered by the best libraries in the world along with the books to be scanned was correct. What these librarians got back was gibberish.

Given the right machine, you could train a reasonably intelligent pet to scan books. In fact, I have dogs that could do a bang-up job with a little training. If, that is, all they had to do was hit the “scan” button.

I would not expect a dog to know who wrote Madame Bovary.

What is valuable about a registry of intellectual property is not the digitized assets. Any fool with some money and time can digitize books. What is valuable is the name, rank and serial number that are connected to the books. Not to mention how they are organized, which has all kinds of cultural overtones.

But if you can’t know with any reliability who the authors are or when a book was published (that is, who to pay and how much), then you’ll never be able to associate the payee information (such as a W-9) with the titles.

That’s if you actually ever intended to pay anyone anything.

I wonder how these librarians feel now. Looks like Marian got googled.