October 21, 2015

As the major non-state actor funding copyright irredentist movements across the globe, Google put another notch in its lawfare belt by taking down authors, illustrators and photographers.  Yes, the bizarre Google Books ruling was upheld on appeal by the Second Circuit.

That’s right: appointed-for-life judges think that it’s OK for Google to scan and exploit millions of books without regard to what creators’ rights are implicated.

If there’s not something that wrenches your guts out about the very idea of this Nixonian-level violation of the human rights of artists by a multinational crony capitalist, you probably can stop reading now.  On the other hand, if it does bother you deeply, as it did the governments of Australia, Canada, France and Germany, who complained about Google to the Obama Administration, then read on.  (I know that sounds like a joke, but these countries actually thought that the U.S. would support its own authors over Obama pal Eric Schmidt.)

Let’s look at some of the less obvious implications that the Court didn’t trouble itself with. While the strange geeks at your State university library awake from dreaming the dreams of the sovereignly immune and catch a breath after spewing the sanctimony of the “digital library of Alexandria,” ask them how they feel about scans of their books being used to help the National Security Agency with its “chatter” problem.  These are typically called “nondisplay uses” of the corpse…sorry, the corpus…of the books included in Google Books.

Example:  How can you teach a machine to recognize language and translate it into dozens of languages?  Realize that machines don’t learn languages the way humans do.  We conjugate verbs, learn vocabulary, study how to read and write.   Then we do the same in a different language than our own.

Machines?  Not so much.  Machines don’t really have a “native” language.  Machines are usually asked to translate a particular word or phrase, also called a text string.  The machine requires a very large number of works to refer to, preferably works that have been translated into many languages.  Or more preferably still, works that someone else has laboriously translated into many languages at someone else’s expense.

Catching on yet?  Then you’re miles ahead of the Second Circuit.

This way, the machine can look for words in a text string that appear in a certain sequence in these things called “books” and then look for the corresponding words in various languages.

This is called “corpus machine translation” and Google actually has a handy video on the subject posted on YouTube.
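
For the curious, the idea can be sketched in a few lines of Python. This is a toy illustration only, not Google's system: the five English-French sentence pairs, the function names and the choice of a Dice-coefficient score are all assumptions made up for this example. It counts how often each English word turns up in the same sentence pair as each French word, then picks the French word that keeps the closest company with it.

```python
from collections import Counter
from itertools import product

# Hypothetical toy parallel corpus: (English, French) sentence pairs.
# A real system would need millions of professionally translated sentences.
PARALLEL_CORPUS = [
    ("the house", "la maison"),
    ("the book", "le livre"),
    ("a house", "une maison"),
    ("a book", "un livre"),
    ("the blue house", "la maison bleue"),
]


def build_counts(pairs):
    """Count word occurrences and cross-language co-occurrences per sentence pair."""
    src_counts, tgt_counts, pair_counts = Counter(), Counter(), Counter()
    for src_sentence, tgt_sentence in pairs:
        src_words = set(src_sentence.split())
        tgt_words = set(tgt_sentence.split())
        src_counts.update(src_words)
        tgt_counts.update(tgt_words)
        # Every source word co-occurs with every target word in the same pair.
        pair_counts.update(product(src_words, tgt_words))
    return src_counts, tgt_counts, pair_counts


def best_translation(word, src_counts, tgt_counts, pair_counts):
    """Return the target word with the highest Dice score, or None if unseen."""
    if word not in src_counts:
        return None

    def dice(candidate):
        together = pair_counts[(word, candidate)]
        return 2 * together / (src_counts[word] + tgt_counts[candidate])

    candidates = sorted(t for (s, t) in pair_counts if s == word)
    return max(candidates, key=dice)


if __name__ == "__main__":
    counts = build_counts(PARALLEL_CORPUS)
    for english in ("house", "book", "blue"):
        print(english, "->", best_translation(english, *counts))
```

Note what the machine never does: it never learns what a house is. It only needs enough sentences that somebody else already translated, which is why the size of the corpus matters so much.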

So corpus machine translation works if you already have some text in a language you or your client might be interested in, oh say, Pashto for example.  Or Russian, for all you new cold warriors out there.  But how would you use corpus machine translation if you were interested in “chatter,” meaning lots and lots of conversation, you know, like telephone conversations you might have to listen to with some guys who were hanging around at the time over there Across the River?

Fortunately, Google has another handy product that solves that problem: Google Voice, the descendant of GOOG-411.  As the ever helpful Marissa Mayer told InfoWorld in 2007:

You may have heard about our [directory assistance] 1-800-GOOG-411 service. Whether or not free-411 is a profitable business unto itself is yet to be seen. I myself am somewhat skeptical. The reason we really did it is because we need to build a great speech-to-text model … that we can use for all kinds of different things, including video search.

The speech recognition experts that we have say: If you want us to build a really robust speech model, we need a lot of phonemes, which is a syllable as spoken by a particular voice with a particular intonation. So we need a lot of people talking, saying things so that we can ultimately train off of that. … So 1-800-GOOG-411 is about that: Getting a bunch of different speech samples so that when you call up or we’re trying to get the voice out of video [such as from YouTube], we can do it with high accuracy.

Marissa Mayer, interviewed in InfoWorld, October 23, 2007, nearly 8 years ago.

And more recently, the Google Now product.

Google is not just listening to your searches, but the search engine is also recording and storing every single voice search you make.

Google is incredibly accurate at understanding your voice. The company secretly stores its users’ searches from its voice-activated assistant Google’s Voice Search and search feature Google Now to turn up relevant advertisements as well as improve the feature.

But what many of you do not realize is that after every voice search you make, Google makes a recording of it and stores it in a remote part of your account.

So…Google Books provides the corpus for the corpus machine translation.  Google Voice provides the ability to turn voice into text through speech recognition, and Google Now provides the ability to recognize the voice as yours.

And who might want to have that capability?  Oh, just some guys who were hanging around at the time over there Across the River.  Perhaps clients of Google the defense contractor.

Thanks, Second Circuit.  Major twofer for The Man 2.0—slam the crap out of artists and also give the NSA a nice new toy.  Illuminating discussion in the “fair use” opinion…oh, snap.  It’s not there.

