Opting Out of TurnItIn

by @edent | , , | 4 comments | Read ~409 times.

The web service TurnItIn is a "plagiarism detector". Lots of universities use it to assess whether their students are copy-n-pasting content which they haven't written.

I'm not a big fan of it. First, I'll explain how to opt-out your websites. Then I'll explain why I don't like the service.

Block Their Robot

TurnItIn scans the web and records everything on your website. It then uses that to tell Universities whether a student has plagiarised from you. While I don't condone cheating, I haven't given TurnItIn permission to store my content and profit from it.

Their website gives you details on how to prevent your site being slurped up in the first place.

It's as simple as adding the following to robots.txt:

User-agent: TurnitinBot
Disallow: /  

That should stop TurnItIn from crawling your site in future. But how do you remove all the content they've stored?

Remove your website

I fundamentally disagree that a private company should be able to profit from my works by storing and selling access to it. I wrote to their legal department, and got this in response:

In order to delete content from our database, please provide proof that the content you are requesting to remove belongs to you. Acceptable proof would be:
* A confirmation from an instructor at the university that the work is yours
* A confirmation from the admin at the university that the work is yours
* Draft papers, proof of communication with the university, proof of upload
Please also reconfirm the URL of the content that needs to be deleted.

I emailed them to correct their misunderstanding. My blog is not an academic work (LOL!) and was not written for a University. After a couple of weeks - and several further emails from me to escalate - I got this back:

We have encountered an issue when trying to remove your content. The problem is that the website has been around for some time and was initially crawled by our old crawler.
Please rest assured that we take the request to remove your content seriously, and our legal and content teams are already currently working on populating a complete crawl set from our old indexed content. Once this tool is created, we will be able to fully unindex your website, including any links crawled by our old crawler.

A little later, they updated me with this:

Just to further add, our crawler engineers have stated that they’ve unindexed and deleted all of your pages that they are aware of, until we have the ability to perform a granular search on our database for the older indexed links. If you find any other content still indexed, please let us know, and we will remove it.

Testing

I submitted a draft paper which had a selection of paragraphs drawn from my blog posts. Essentially trying to see if they'd accuse me of plagiarising myself. It came back saying that it detected plagiarism and then displayed the full contents of my blog post.

Screenshot of the website. On the right is the full text of my blog post.

The left shows the paragraph I uploaded. The right shows the full text of my original blog post.

There were over 60 URLs of mine that TurnItIn said it had records for:

Long list of URls.

After 5 months of complaining to TurnItIn, I discovered that they cannot remove content which was stored prior to 2015. So I've submitted DMCA complaints to their hosting company and registrar.

OK, but why?

There are a couple of arguments, that I have. Firstly, I have rights over my blog posts. There are Fair Dealing exceptions to UK copyright law but I don't think TurnItIn can take advantage of any of them.

Regarding copyright of my essays, here's what TurnItIn say:

Students who submit papers to Turnitin retain the copyright to the work they created. A copy of submitted papers is retained in a Turnitin database archive to be compared with future submissions—a practice that helps protect and strengthen copyright ownership.

I mean... Yes. I haven't assigned my copyright to them. I just don't think they have the right to store and profit from my essays in perpetuity. TurnItIn are a bit defensive about the subject of copyright, arguing:

A U.S. District Court judge ruled that archiving student papers to assess the originality of newly-submitted papers constitutes a fair use under the U.S. Copyright Act, provides “a substantial public benefit" and helps protect the papers from being exploited by others.
The summary judgment was unanimously affirmed by a U.S. Court of Appeals.
TurnItIn

My argument is "Fuck You. I don't want you to hoover up my content and then sell it without my permission."

Frankly, TurnItIn can either pay me for my content, or remove it. They sell my content to universities - so they obviously think it has value. I'd like to be compensated for it.

If TurnItIn were a not-for-profit, or a community resource, I'd have some sympathy with them using the web to detect academic misconduct. But, as it is, I don't derive any benefit from them reselling access to my content. So they can get in the bin.

I'm sympathetic to the argument that having students steal my content as pass it off as their own is damaging to me. But I've no evidence to suggest that's a problem. It isn't like TurnItIn sends me an alert whenever it detects plagiarism.

So, I'm opting out. And I hope this guide is useful if you also want to do so.

4 thoughts on “Opting Out of TurnItIn

  1. One my former university/library colleagues might be interested in. Especially if they care about copyright.

  2. S. F. Griffin says:

    I wonder what the rules for 'substantial public benefit' are. There are some very expensive academic works.

  3. Funny how an anti-plagiarism service relies on plagiarism to function.


  4. There's a bit of conflation of various items going on here: UK (and EU) and US copyright law, and the difference between essays you've submitted as a student and blog posts you've made and they've crawled.

    The court case dealt with essays submitted as a student and US copyright law (which also applies to the DMCA copyright act). For student essays, the students agreed to the terms and conditions of the website, which allows Turnitin to archive the essay. The court found that the alternative is to not agree to the T&Cs, and come to a different arrangement with the school or teacher that requests it. Furthermore, it found that Turnitin's use of the work, while commercial, was sufficiently transformative as it was used for preventing plagiarism, and thus (together with the three other factors) satisfied the criteria for fair use in the US. I think the fair use argument could reasonably apply to your blog posts made, and crawled, before you signed the T&Cs of Turnitin (assuming you did as part of your course). Their legal department's approach of agreeing to take them down, then saying it was impossible because they were introduced before 2015 is strange.

    UK (and EU) law is much more restrictive when it comes to fair use. As you've mentioned, there's no transformation exception. I think you're in the right there: you've informed them that all works from your blog infringe your copyright, giving examples of specific pages. They should "act expeditiously" to remove the infringing pages, but are not.

Leave a Reply

Your email address will not be published. Required fields are marked *