Should Google Ask Authors Before Using Their Books for AI Research?
September 30, 2016
Google is using over 11,000 books—mostly fiction—to fuel its artificial intelligence research, reports the Guardian. The problem, authors say, is that they didn’t authorize Google’s use of their books for this purpose.
In their paper, the researchers explain that the books used were taken from “the Books Corpus,” a collection of free books from the web “written by [as] yet unpublished authors” for Google Brain research. (This corpus is supposedly available for download from the University of Toronto, but at the time of writing it was unavailable via the Guardian’s link.)
A Google spokesperson, who asked not to be named, said the research is just a “proof of concept” to “help Google understand and produce a broader, more nuanced range of text for any given task.”
“We could have used many different sets of data for this kind of training, and we have used many different ones for different research projects,” he added. “But in this case, it was particularly useful to have language that frequently repeated the same ideas, so the model could learn many ways to say the same thing—the language, phrasing and grammar in fiction books tend to be much more varied and rich than in most nonfiction books.”
But this is “blatantly commercial use of expressive authorship,” said Mary Rasenberger, executive director of the Authors Guild. “We’ve seen this movie before,” she added.
The Guild and Google have been in dispute since 2005 over Google’s project to digitize library books; Google prevailed in 2013, when the district court ruled that “all society benefits” from the project.
“Why shouldn’t authors be asked permission, or even informed—not to mention compensated—before their work is used in this manner?” Rasenberger asked. “There’s no doubt the company has the means to do so.”
According to the Guardian, Google has not responded to questions about whether it plans to compensate the authors, or whether notifying them of the research was beyond its means.