You may have heard about our [directory assistance] 1-800-GOOG-411 service. Whether or not free-411 is a profitable business unto itself is yet to be seen. I myself am somewhat skeptical. The reason we really did it is because we need to build a great speech-to-text model … that we can use for all kinds of different things, including video search.
The speech recognition experts that we have say: If you want us to build a really robust speech model, we need a lot of phonemes, which is a syllable as spoken by a particular voice with a particular intonation. So we need a lot of people talking, saying things so that we can ultimately train off of that. … So 1-800-GOOG-411 is about that: Getting a bunch of different speech samples so that when you call up or we’re trying to get the voice out of video [such as from YouTube], we can do it with high accuracy.
Marissa Mayer, interviewed in InfoWorld, October 23, 2007–nearly 8 years ago.
Do you remember Google’s 411 service? Did anyone ever tell you that Google was using your call to GOOG411 to train their voice to text bots? No? Don’t remember that part? How about this part (from Dan Froomkin in The Intercept):
Most people realize that emails and other digital communications they once considered private can now become part of their permanent record.
But even as they increasingly use apps that understand what they say, most people don’t realize that the words they speak are not so private anymore, either.
Top-secret documents from the archive of former NSA contractor Edward Snowden show the National Security Agency can now automatically recognize the content within phone calls by creating rough transcripts and phonetic representations that can be easily searched and stored.
The documents show NSA analysts celebrating the development of what they called “Google for Voice” nearly a decade ago.
“I think people don’t understand that the economics of surveillance have totally changed,” Jennifer Granick, civil liberties director at the Stanford Center for Internet and Society, told The Intercept.
“Once you have this capability, then the question is: How will it be deployed? Can you temporarily cache all American phone calls, transcribe all the phone calls, and do text searching of the content of the calls?” she said.
Of course the “you” in that last sentence is the key, right? If “you” is the NSA, most people would flip out. If “you” is Google, many millions–at least so far–allow it to happen in a multitude of Google products hundreds of millions of times a day. What’s to worry, right?
The Second Circuit’s decision this week in ACLU et al v. Clapper et al should tell you all you need to know. This is the decision in which the Court ruled that the government’s bulk collection of phone records, ostensibly authorized by § 215 of the Patriot Act, exceeds the scope of that statute and is therefore illegal. As the Court ruled:
The interpretation [of § 215] that the government asks us to adopt defies any limiting principle. The same rationale that it proffers for the “relevance” of telephone metadata cannot be cabined to such data, and applies equally well to other sets of records. If the government is correct, it could use § 215 to collect and store in bulk any other existing metadata available anywhere in the private sector, including metadata associated with financial records, medical records, and electronic communications (including e‐mail and social media information) relating to all Americans.
So when we say that YouTube is actually a data mining honeypot that happens to be a video service…
And Google Books? Why do you think Google fought many countries to keep hold of the corpus of the world’s books?
Perhaps you have never heard of machine translation, but if you use one of the many translation tools available for “free” online, you have experienced a translation produced by a machine. The video above is a good summary of how Google uses a particular kind of machine translation–“corpus machine translation.”
Simply put, “corpus machine translation” is an offshoot of speech recognition, often studied under the name “language technologies” or something similar. (Carnegie Mellon University, for example, has a Language Technologies Institute, chaired by Dr. Jaime Carbonell, faculty advisor to a number of Googlers and formerly Chief Scientist at a company called Meaningful Machines.)
The way this works with text-based translations is that machines are taught to recognize written speech patterns in a language. If the machine wants to translate a sentence from English to French, for example, rather than teaching the machine a language the way humans would study it (conjugating verbs, for example), the machine learns phrases, sentences and expressions–or “strings” of text. (Similar to the “phonemes” with voice recognition.)
When a machine “translates” a sentence (or string of text) from English to French, it first tries to match the text string in its English “memory” of text strings (a large database). Then it will try to compare that string to what might be called a “known known”–a corresponding text string in its French database that the machine has been told is an exact or good enough match to the English string.
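The matching step described above can be sketched in a few lines. This is a toy illustration only: the “phrase table” below is invented for the example, and a real system would hold millions of statistically scored string pairs rather than a hand-written dictionary.

```python
# Toy sketch of corpus-based string matching: look up the longest known
# English string and emit its stored French counterpart ("known known").
# All table entries are invented for illustration.
PHRASE_TABLE = {
    "good morning": "bonjour",
    "thank you": "merci",
    "the book": "le livre",
}

def translate(sentence: str) -> str:
    """Greedily match the longest English string found in the table;
    words with no known match pass through unchanged."""
    words = sentence.lower().split()
    out = []
    i = 0
    while i < len(words):
        # Try the longest span starting at i, shrinking until a match.
        for j in range(len(words), i, -1):
            span = " ".join(words[i:j])
            if span in PHRASE_TABLE:
                out.append(PHRASE_TABLE[span])
                i = j
                break
        else:
            out.append(words[i])  # no known string: leave it as-is
            i += 1
    return " ".join(out)

print(translate("good morning thank you"))  # bonjour merci
```

Note that nothing here “understands” French: the machine only knows which string in one database has been paired with which string in the other.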
A good way to accomplish this is with books in translation. You know, a lot of books. Like Google Books.
How would the machine know a translation of a particular string? One way would be if the machine had scanned into its English database a book in English, say Bonfire of the Vanities by Tom Wolfe, that had been translated into French and that had the French translation mapped to the English version so the machine could compare the two. Or if it had a book in French, say L’Être et le néant by Jean-Paul Sartre that had been translated into English (Being and Nothingness) and mapped to the French so the machine could compare the two.
If the machine had more than one translation of Bonfire of the Vanities and Being and Nothingness and could compare a number of examples, then the machine could take advantage of all the work done by the translators (and the publishers that paid the translators for their work) and reliably compare strings of text. The user of the machine could then have a high degree of confidence that the two strings really did mean what they should mean based on their sequential location in the two books, fifty books, or in as many languages as the publishers had the books translated into.
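The idea that multiple independent translations raise confidence in a string pair can be sketched as a simple vote count. This is a schematic, assuming the texts are already aligned sentence by sentence (real aligners must handle sentences that translators split or merge), and every sentence pair below is invented for the example.

```python
from collections import Counter

# Sketch: several independent translations of the same (invented) book,
# each represented as aligned (English, French) sentence pairs.
translations = [
    [("the night fell", "la nuit tomba")],
    [("the night fell", "la nuit tomba")],
    [("the night fell", "la nuit est tombée")],
]

# Count how often each English/French pairing occurs across translators.
votes = Counter()
for version in translations:
    for en, fr in version:
        votes[(en, fr)] += 1

# The pairing most translators agree on gets the highest confidence.
best_pair, count = votes.most_common(1)[0]
print(best_pair, count)  # ('the night fell', 'la nuit tomba') 2
```

The machine is, in effect, harvesting the translators’ judgment: agreement between two of the three translators is what makes the first pairing the “good enough” match.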
And who might the user be, do you think?
So if GOOG-411 uses voice recognition to translate voice into text, and if Google Books allows translation to or from another language into the user’s language…English, for example…then you can pretty much store text renderings of voice intercepts…sorry…voice recordings made voluntarily, of course…translate them into English and store them. For something.
And who might have a lot of voice intercepts hanging about in need of translating?
And in any event–how would you ever know?
So if you’re the kind of person who goes online without a thought in your head and says, “OK Google, show me some ads for stuff I want to buy”, it won’t bother you that not only are you being tracked but every communication you make can be tracked, sliced, diced, cataloged and stored. Because after all, you haven’t got a thought in your head that’s not put there by advertising. Or do you?
On the other hand…maybe you actually would prefer not to be tracked, sliced, diced, cataloged and stored. So think about that the next time you use Google search, Google Voice, Google Apps for Education, Google Docs, Gmail, YouTube, Google Books or any other Google product.