The Value Gap is Bigger Than You Thought: Member of EU Parliament Calls Out Google’s Data Harvesting

According to MusicAlly, Christian Ehler, a Member of the European Parliament from Germany, has called out Google's non-display uses of music, which are pure profit for Google. Ehler has his eye on the right ball:

“The American platforms have been very successful as it’s a liar’s poker that suggested an alliance between the consumer and their commercial interests. We have heard the notion that it is free and for consumers. This is a pretension as [YouTube is] not for free. [YouTube] gets access to you and you are bombarded with advertisements. We are living now in the time of the second level of revenues – this is the data the consumers are giving to these platforms […] Consumer data becomes more and more important and it’s not well understood that this is not for free […] We are selling our future. Creativity is the USP of Europe. They [the digital companies] accumulate money. Why is Netflix producing TV series? Why is YouTube creating YouTube stars? They do understand that their business is content, not distribution […] We are simply selling our economic future if we are going to lose this battle.”

I have been banging the table for years about Google's non-display uses of music and of the fans we drive to their various platforms, so MEP Ehler's view is very welcome. "Non-display uses" include data scraping, but the term could mean virtually anything. Google cannot be trusted to disclose what they are really doing with any of their products, given their long history of not telling the truth about their business practices.

Google's business practices raise several important questions for artists that no one is asking. The first question is: do you want your music and your fans to be used in this way in the first place?

And since this is all a byproduct of what Mr. Ehler correctly describes as being "bombarded with advertisements," it is important to understand that even if you use YouTube's tools to block YouTube from selling advertising against your work, Google's exploitation of your fans doesn't stop there.

Google routinely captures data from every conceivable contact with your fans, and they do it surreptitiously, in the background. How they do it is not easy to discover, but a significant number of their techniques, and the technology implementing them, were disclosed in a recent class action brought against Google by consumers over privacy violations in Gmail.

As Jeff Gould wrote in his highly recommended article "The Natural History of Gmail Data Mining," Google's plan is to scrape as much information as possible in return for the "free" use of Gmail:

The most striking thing about the early Gmail patents is how exhaustive they were in attempting to anticipate every conceivable attribute of an email message that might one day be exploited for ad targeting purposes. In many cases it would be years before Google was actually able to make these ideas operational in Gmail. The first version of ad serving in Gmail exploited only concepts directly extracted from message texts and did little or no user profiling — this method would only be put into practice much later. Some attributes have still not been implemented today and perhaps never will be. For example, as far as I know, Google does not reach into your PC’s file system to examine other files residing in the same directory as the file you attach to a Gmail message, even though the patents explicitly describe this possibility.

Are you willing to bet that Google doesn’t scrape the same kind of behavioral data about your fans on YouTube?  And what is stopping Google from scraping the same data from children attracted to YouTube?

As Mr. Gould reports, the data mining is what makes the real money for Google:

When Gmail was finally released to the public in April 2004, its ad serving system used a sophisticated data mining algorithm known as PHIL, the subject of another Google patent filed by Georges Harik and a colleague. Already implemented the previous year in Google’s AdSense program that serves ads to web sites operated by third party publishers, PHIL stands for Probabilistic Hierarchical Inferential Learner. Despite the forbidding name, the basic idea is straightforward.

Words in documents such as emails [or lyrics] occur not randomly but in certain clusters. When allowed to crunch through a vast number of such documents, simple software algorithms can identify clusters that are more or less likely to occur and group them together as "concepts". For example, PHIL can learn to distinguish the entirely different meanings of two concepts such as "ski resort" and "lender of last resort" without being tripped up by the fact that the term "resort" occurs in both. [But Google can't distinguish between "Fragile" and "Fragile (Live)" in its address-unknown NOIs.]

In AdSense, PHIL matched concepts derived from sets of keywords provided by advertisers with concepts extracted from the web pages where publishers wanted Google to place ads. The idea was that the better the match, the more likely a visitor to the publisher’s site would be to click on the ad, which was the revenue generating event for Google.
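To make the matching idea in Gould's description concrete, here is a toy sketch. It is not PHIL, whose internals are described only in Google's patent. Instead it swaps in plain bag-of-words cosine similarity, and the advertiser names, keywords and page text are hypothetical. It only illustrates the general principle: the closer an advertiser's keywords match a page's text, the higher the score, and the ad with the best score is the one served.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase the text and count each word (a crude stand-in for PHIL's 'concepts')."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two word-count vectors (0.0 means no shared words)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_ad(page_text: str, ads: dict[str, str]) -> str:
    """Return the name of the ad whose keywords best match the page text."""
    page = bag_of_words(page_text)
    return max(ads, key=lambda name: cosine_similarity(page, bag_of_words(ads[name])))


# Hypothetical advertisers and page text, for illustration only.
ads = {
    "ski_holidays": "ski resort snow slopes lift pass",
    "bank_loans": "lender loan credit bank interest",
}
page = "Fresh snow fell overnight, and every ski resort reopened its slopes."
print(best_ad(page, ads))  # prints "ski_holidays"
```

Because this sketch counts single words, a page about a "lender of last resort" would still share the word "resort" with the ski ad. According to Gould, grouping co-occurring words into concepts is what PHIL adds on top of simple word matching.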

MEP Ehler has put his finger right on one of the implicit issues in the value gap, and it's a value that usually isn't measured in these discussions. The fact is that the gap is so wide it's hard even to estimate the size of the income transfer.
