Professor Nunberg provides some excellent slides illustrating his complaints about the abysmal quality of Google Book search and its metadata. Recall that Jean-Noël Jeanneney, former president of the Bibliothèque Nationale of France, warned of the same kind of problems and gave similar examples of botched scans of French cultural works in his 2007 work “Google and the Myth of Universal Knowledge”.
Jean-Noël Jeanneney will be particularly galled (no pun intended) by yet another variety of metadata screwup in Google Books: authorship attributed to the writer of a foreword, as in Madame Bovary by HENRY JAMES.
Now remember–this is The Google we are talking about. The Google who only hires the best and the brightest. The Google who only hires from the best schools. The Google who would have you believe that they are the second coming. The Google who seems to employ people who don’t know who wrote Madame Bovary and who don’t know that Tom Wolfe wasn’t born in 1888.
The librarians who trusted The Google to scan their works thought they would get something back that was going to further their mission. I feel very, very certain that the metadata that was delivered by the best libraries in the world along with the books to be scanned was correct. What these librarians got back was gibberish.
Given the right machine, you could train a reasonably intelligent pet to scan books. In fact, I have dogs that could do a bang-up job with a little training, if all they had to do was hit the “scan” button.
I would not expect a dog to know who wrote Madame Bovary.
What is valuable about a registry of intellectual property is not the digitized assets. Any fool with some money and time can digitize books. What is valuable is the name, rank and serial number that are connected to the books. Not to mention how they are organized, which has all kinds of cultural overtones.
But if you can’t even know who the authors are with any reliability or if you can’t know when a book was published (that is–who to pay and how much), then you’ll never be able to associate the payee information (such as W-9) with the titles.
That’s if you actually ever intended to pay anyone anything.
I wonder how these librarians feel now. Looks like Marian got googled.
The Copyright Alliance has launched a letter writing campaign that I would encourage you to sign up for if you believe this excerpt:
“[W]e [artists] are under assault. Our rights to control the distribution, use, and reproduction of our works in our vibrant digital age are dismissed by many who do not understand the value we bring to society. They tell us to work harder, create better, and give our works away. Some think that they should control our works and that they should be able to appropriate, perform, and copy them how they please, without our consent, benefit, or participation.”
If that’s you, click here.
Google’s Book Search: A Disaster for Scholars. Now that title caught my eye, not least because it appeared in the Chronicle of Higher Education. The article is extraordinarily honest and well written, with solid research and supporting evidence.
We’ve become accustomed to librarians and academics uncritically fawning over the disaster that is Google Books (especially those privileged librarians among the sovereignly immune), but give this one by Professor Geoffrey Nunberg a read, particularly regarding the “metadata,” the fields in the Google Book Search that are supposed to contain information like year of publication, title, author, etc.
“Start with publication dates. To take Google’s word for it, 1899 was a literary annus mirabilis, which saw the publication of Raymond Chandler’s Killer in the Rain, The Portable Dorothy Parker, André Malraux’s La Condition Humaine, Stephen King’s Christine, The Complete Shorter Fiction of Virginia Woolf, Raymond Williams’s Culture and Society 1780-1950, and Robert Shelton’s biography of Bob Dylan, to name just a few. And while there may be particular reasons why 1899 comes up so often, such misdatings are spread out across the centuries. A book on Peter F. Drucker is dated 1905, four years before the management consultant was even born; a book of Virginia Woolf’s letters is dated 1900, when she would have been 8 years old. Tom Wolfe’s Bonfire of the Vanities is dated 1888, and an edition of Henry James’s What Maisie Knew is dated 1848….Google acknowledges the incorrect dates but says they came from the providers.”
Of course, Google claims that these mistakes are the copyright owners’ fault per usual–but what is interesting about this catalog of boneheaded errors is that the mistakes always seem to make the works OLDER than they actually are. Therefore more likely to be out of copyright and non-infringing, as opposed to NEWER than they actually are, therefore more likely to be in copyright and infringing. In fact–“to take Google’s word for it”–it seems a safe guess that all the listed works would be in the public domain according to the incorrect dates that Google has placed in their metadata. Which benefits Google. Of course, not even Google can make a book into a public domain work when it isn’t, but it does suggest that Google could say that an unlicensed work in copyright got into the claws of Google Books “by mistake”–an “innocent” infringement because the metadata said the work was in the public domain.
As one commenter on Professor Nunberg’s article notes: “Here’s another one: a recurring problem with date of publication is that all volumes of a journal are assigned the date of volume 1.” That is, the oldest possible date. God knows what they did with the sheet music.
The applications running the Google Books registry will need to distinguish between works in copyright and works out of copyright. That date is very important under the settlement agreement. Where do you think it is going to come from? It is starting to look like it will come from incorrect data: data that makes the publication dates MUCH OLDER than they actually are.
Who’s going to check to see that the millions of copyright dates are correct? Nobody. And it’s yet one more thing for the overburdened copyright owner to sort out as Google continues its “cultural rape.”
“Innocent” infringement versus “intentional” infringement makes a rather large difference in how the infringement is punished on judgment day, which would be the later of the date the plaintiff gets a final, non-appealable judgment against Google for copyright infringement or the day the author dies penniless. It is also likely to foreclose criminal prosecution.
Perhaps this all has something to do with the mysteries of advertising placement? Professor Nunberg says he was told that “[t]he ad placement on Google’s book search right now is often comical, as when a search for Leaves of Grass brings up ads for plant and sod retailers.”
Hmmm. I think we noted that possibility two years ago in my review of “Google and the Myth of Universal Knowledge” where the absurdity of selling advertising in books was well argued by the prescient Jean-Noël Jeanneney, former president of the Bibliothèque Nationale of France:
“Recall Google CEO Eric Schmidt’s statements to the Wall Street Journal on the eve of the Viacom lawsuit: When asked to respond to the idea that “content” has intrinsic value, he said “prove it”. Which has to be one of the dumber, yet illuminating, remarks to come from a Silicon Valley CEO on the subject of art and culture.
No wonder M. Jeanneney tells us that ‘[t]he visit I received from several Google executives after the beginning of my campaign [against Google Books] didn’t do much to reassure me.’
These statements echo and confirm one of the most important points raised in Google the Myth: M. Jeanneney writes, ‘What pays for the digitization of materials are linked advertisements from companies that have an interest in associating their image with old or recent works likely to promote that image. As a result, books will necessarily be hierarchized in favor of those best suited to satisfy the demands of advertisers, again, chosen according to the principle of the highest bidder [as is Google AdWords]. I wouldn’t want to see—although I’m amused by the thought—the text of Saint-Exupéry’s Le Petit prince accompanied by an ad for a sheep merchant.’”
Right on cue, Professor Nunberg tells us:
“Google’s fine algorithmic hand is also evident in a lot of classifications of recent works….[Google assigns a] Religion tag…to a 2001 biography of Mae West that’s subtitled An Icon in Black and White [and] the Health & Fitness label on a 1962 number of the medievalist journal Speculum….
But even when it gets the [bookseller’s standard] categories roughly right, the more important question is why Google would want to use those headings in the first place. People from Google have told me they weren’t included at the publishers’ request, and it may be that someone thought they’d be helpful for ad placement.”
So before you write off “cultural rape” as mere French “yankee go home” hyperbole, think again.
Professor Nunberg sums it up: “[Y]ou need reliable metadata about dates and categories, which is why it’s so disappointing that the book search’s metadata are a train wreck: a mishmash wrapped in a muddle wrapped in a mess.”
Maybe it’s not hyperbole, and maybe it is cultural rape for real. All those statements about what a great idea Google Books is, how it will make “millions” of titles available to the public–maybe that is yet more evidence of Google’s charm offensive to mask what an unmitigated disaster this project is from a cultural, copyright, antitrust and now scholarly perspective.
It’s nice to find an academic being honest about Google’s screw-ups. Since Professor Nunberg is at Berkeley, Google might actually listen to him.
But it’s unlikely. The do-over to fix the metadata and cataloging system will take a very long time at vast cost. “Fixing” Google Books is not in Google’s interest and who can make them? As the scanning keeps going every minute of every day, it is becoming increasingly clear that Google thinks of Google Books as Google’s books–the entire intellectual capital of the world.
It shows again what happens when you put the sole retailer in charge of the metadata: the monopsonist buyer has no incentive to act to the benefit of its sellers, particularly when the sellers’ only enforcement mechanism is costly litigation against the monopsonist, and especially when the monopsonist has access to the public financial markets to raise its defense funds. (Those defense costs are evidently deemed not “material” by its public accounting firms, so the true litigation exposure is not reflected in its public financial filings.)
Swiss they ain’t.
Good story on the Veoh case at CNET (and since it quotes MTP it’s a great story!)
Billboard has the story:
“‘We the undersigned wish to express our support for Lily Allen in her campaign to alert music lovers to the threat that illegal downloading presents to our industry and to condemn the vitriol that has been directed at her in recent days,’ said the statement, which went on to call for a three-strikes law that would result in restrictions to persistent offenders’ bandwidth levels to prevent P2P activity….[Radiohead’s] Ed O’Brien told BBC News the meeting was “quite emotional” and said Allen was “extremely brave” to turn up. “She’s taken a lot of flak for what she’s said. What she’s done has been brilliant because she started the process where artists have stood up and said, you know what, there is a consequence to illegal file-sharing,” he said. “In the meeting, we didn’t always agree but we came to an agreement that we thought was good for everyone.”
“Extremely brave” is right. It’s funny how one brave woman can help others find their courage. Or something else they lost. What once was lost now is found by whatever path, and the Brits have passed an important threshold standing together. This development arguably unites the country behind Lord Mandelson.
The statement of members of the Featured Artist Coalition (although not formally issued on behalf of the organization, which is weird, but I’ll take it) sensibly suggests what will likely be an economic sanction for the third strike (which MTP also favors; see “Return of the Hadopi”) and is a 180-degree turn from the capitulationists, who seem to have evaporated.
“The current settlement agreement raises significant issues as demonstrated not only by the number of objections, but also by the fact that the objectors include countries, states, non-profit organizations, and prominent authors and law professors.”
Now why was this result so difficult to anticipate? Or maybe it wasn’t–Google is still scanning.
Somebody asked me if I knew The Bald Guy from the music business. Apparently someone out there is sending out newsletters, acting like he is/was/might have been in the music business, and saying some very nasty and misogynistic things about Lily Allen’s recent public statements against file “sharing,” or what we call “bartering” around here. Not to mention gratuitous and homophobic things about Elton John.
I have never run into The Bald Guy on a deal or at a company, and I have never known anyone by that name who actually sold a record, humped a trap case, nurtured an artist’s career, worked a hit record, or worked a stiff record, for that matter. No one ever said, “Wait! I have to check with The Bald Guy.” In short, I have never run into anyone by that name in the music business. Nor on the tech side of the house. Or indeed, anywhere.
Now I have heard of someone by that name who is in the email business. That’s not a business I’m very familiar with, so it’s entirely possible that the guy is an email rock star. I actually sat next to a bald guy at the music awards during Canadian Music Week this year. He seemed to be getting bad vibes from Gene Simmons. Of course, that’s kind of like “my brother’s in the army, maybe you know him.” I got the impression that The Bald Guy is kind of like the Glenn Beck of the email business or something.
The thrust of the email that I heard about from The Bald Guy is that Lily Allen isn’t pretty enough for his standards (???) and that she’s not a good enough singer (given his superstar A&R track record) and that artists don’t know anything about the record business. And the most damning fact in the email business—she hasn’t made it in New York. And then there were some things said about Sir Elton that just aren’t worth repeating.
Now—is there a connection between The Bald Guy and Sir Elton’s letter to Lord Mandelson against file bartering? Maybe, maybe not. But the timing is curious. Which do you think will get more weight from Lord Mandelson? The Bald Guy’s email or the views of one of the greatest songwriters of all time?
Would Sir Elton have written his letter were it not for Lily Allen? Maybe, maybe not. But the timing is curious.
I have learned that the very best person to ask about what to do with a record is the artist. They may not know all the answers, but they usually have some pretty good ideas. And it is, after all—their record.
I wouldn’t ask someone in the email business what they think about selling records, and I wouldn’t expect them to ask me about the email business. I’d be more likely to ask them what they think about giving email away for free, and they’d probably tell me.
The email business must be a tough business. It sounds like it must be like the music business was 30 or 50 years ago, a bunch of wannabe Svengalis (speaking of wannabes) telling girls that they aren’t pretty enough to get a ride in the big black car, or their record wasn’t good enough to be worth paying off 100 jocks to play it instead of something they wanted to play. And then of course, there was that world outside of New York—New Jersey. If you ain’t made it in Jersey, baby….
That’s the problem with the Internet—everyone’s a critic. Even people in the email business. And it is very important to some that they tear down anyone who stands up. Particularly women. We have a name for that.
Us Baldry alums have to stick together.