Tuesday, November 22, 2005 
how google print could help fight plagiarism
there's been a lot of chatter in the past couple months about google print.

the idea is this: google wants to scan millions of books using OCR technology and create a massive index of book content. users could search this index, and google print would return abstracts of books that fit the search query, along with short excerpts from the books. if users like what they see in the excerpt, they can go to amazon or their favorite brick-and-mortar store to buy the book. publishers could mandate how short or long the excerpts from their books would be, or could opt their books out of the whole thing.

this would be a fantastic tool for helping readers find books. struggling creators know that the biggest threat to their livelihood is not piracy but obscurity. thousands of books are published every year, with countless older books in the back catalog (we call 'em "back list") waiting for new readers. the majority of these books vanish into obscurity, to be read only by a tiny minority. google print would help readers find books they would like, and thus it would sell more books.

but a lot of big publishers and celebrity authors don't see it that way. in fact, they want to sue google print for using their copyrighted material without permission. really they just want to be the ones to control any indexes of their content, in the event that they someday get off their asses and implement something similar (amazon's search-in-a-book feature is similar, but is opt-in and contains a fraction of the number of books google print would contain). but that's what search engines do: they index content, without asking permission first. if google print is illegal then all search engines are illegal.

the publishers' arguments are somewhat disingenuous and require some logical contortions. in effect, to believe the publishers' arguments, you must accept that google is lying. you might hear the canard that google wants to "give away our books for free", which is ridiculous since google makes it clear that only short excerpts will be offered to readers; you won't be able to read the da vinci code on google print. or maybe you'll hear the "they want to make money off our content" line, despite the fact that google says google print will not feature advertisements.

i hadn't posted about this to date, despite it being a hot IP story, despite working in the publishing industry as i do, because i didn't have much to add that, say, the folks at boingboing or the eff hadn't already said better. but recently i had a revelation about how google print could actually help me, as an editor, do my job better. it would actually be a very powerful tool for tracking down plagiarism.

it might seem a bit odd for me to blog about plagiarism, as i strongly believe in fair use rights, sampling rights, and the like. but there is a world of difference between sampling or parodying a work and taking the whole thing and passing it off as your own work. the former is a fragmentary, transformative use (and a creative one), whereas the latter isn't. samplers and remixers are generally pretty honest about what they have taken. the literary equivalent of sampling/remixing is called quoting. quoting is perfectly acceptable as long as sources are cited; in some situations quoting is even strongly encouraged. in contrast, the musical equivalent of plagiarism would be stealing someone else's song and claiming you wrote it. besides, as an editor for a multinational publishing/entertainment company, it's my job to be vigilant for plagiarism issues. so i hope we're clear on the distinction.

i don't actively check for plagiarism too often. generally i give my authors the benefit of the doubt unless i spot something suspicious in the text. if an author's text is usually awful but i come across a passage that is quite well-written, that's suspicious. or if an author has been consistently spelling things one way and suddenly switches to a different spelling, or somehow changes voice in mid-chapter, these are red flags.
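
the spelling red flag is the easiest of these to check mechanically. what follows is only a minimal python sketch, assuming a tiny hand-made list of spelling pairs; the list and the name mixed_spellings are made up for illustration, and a real check would need a far larger word list. it reports the pairs for which both spellings turn up in the same text.

```python
import re
from collections import Counter

# hypothetical, deliberately tiny list of spelling pairs (an assumption for
# this sketch; a real check would need a much larger list)
VARIANT_PAIRS = [
    ("color", "colour"),
    ("center", "centre"),
    ("organize", "organise"),
    ("analyze", "analyse"),
]


def mixed_spellings(text):
    """return the variant pairs for which BOTH spellings appear in text."""
    # count every lowercase alphabetic word in the text
    words = Counter(re.findall(r"[a-z]+", text.lower()))
    # a Counter returns 0 for missing words, so this keeps only pairs
    # where each spelling occurs at least once
    return [(a, b) for a, b in VARIANT_PAIRS if words[a] and words[b]]


if __name__ == "__main__":
    sample = "the colour was odd. later the color changed."
    print(mixed_spellings(sample))
```

a hit is only a prompt to look closer, not proof of anything: quoted material or a co-author can mix spellings quite innocently.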

when i do decide to start looking for plagiarized content, my first stop is naturally google. i start plugging phrases into google and see what turns up. this technique is remarkably effective. i have even found instances of seeming plagiarism by accident: i came across something a little confusing, went to google to verify the information, and the first page i found contained the exact text and figures from the chapter. oops.
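
the phrase-plugging step can be sketched in a few lines of python. this is an illustration under assumptions, not a description of any real tool: the function name phrase_queries and the window sizes are invented here. it chops a suspicious passage into runs of consecutive words and wraps each run in double quotes, which asks a search engine for an exact-phrase match.

```python
import re


def phrase_queries(passage, words_per_query=8, step=8):
    """split a passage into quoted exact-phrase queries.

    words_per_query and step are arbitrary defaults chosen for this sketch.
    """
    # keep word characters, apostrophes and hyphens; drop other punctuation
    words = re.findall(r"[\w'-]+", passage)
    if not words:
        return []
    queries = []
    # slide a window over the words; a passage shorter than one window
    # still yields a single query containing all of its words
    last_start = max(len(words) - words_per_query + 1, 1)
    for start in range(0, last_start, step):
        chunk = words[start:start + words_per_query]
        queries.append('"' + " ".join(chunk) + '"')
    return queries


if __name__ == "__main__":
    text = "one two three four five six seven eight nine ten"
    for query in phrase_queries(text, words_per_query=4, step=3):
        print(query)
```

each printed line can be pasted into a search box as-is. shorter windows catch lightly reworded copies but return more false hits; longer windows do the opposite.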

but as powerful as google's web search is, it can only search content that is online. obviously. the internet is a very popular place to plagiarize from (just ask high school teachers), perhaps the most popular place to do so, but it's not the only place. a smart plagiarist, one who doesn't want to be caught, will realize that maybe copying text from the web isn't wise. "if i was able to find this website in 15 seconds," the plagiarist might think, "then my teacher/editor might be able to find it too."

so a smart plagiarist will want to copy from sources that are not indexed online, like printed materials. like books.

some books are online, but most aren't. or excerpts of them, articles adapted from them, and so on exist online but the bulk of the book doesn't. and i'm pretty sure google's web search doesn't index ebooks. so catching such plagiarism is not really possible online. teachers can still use the old trick of tracking down any books listed in a paper's bibliography and manually searching for copied content, but man is that tedious. and the trick relies on the writer including the source of their plagiarized content in the bibliography. a smart plagiarist probably would not want to cite the source that he's plagiarizing from. and most books don't have bibliographies.

google print could change all that. if google is successful (and isn't forced to stop by short-sighted legal challenges), google print could be a remarkable tool for catching plagiarists. if i came across suspicious text, i could paste it into google's web search, check there, and then with a couple more clicks, switch to google print and check there. if i got no results from either search, i could be fairly confident that the phrase in question was not plagiarized.

thus google print, rather than infringing on publishers' copyrights, would be a powerful tool for protecting copyright. and it would increase book sales by helping readers find books they want to read.

as wonderful as this would be for the publishing industry, i suspect it would be even more useful for teachers, who could almost instantly determine whether students' papers are plagiarized.

do it for the children! save our kids by saving google print!

