The web service TurnItIn is a "plagiarism detector". Lots of universities use it to assess whether their students are copy-n-pasting content which they haven't written.
I'm not a big fan of it. First, I'll explain how to opt your websites out. Then I'll explain why I don't like the service.
Block Their Robot
TurnItIn scans the web and records everything on your website. It then uses that to tell universities whether a student has plagiarised from you. While I don't condone cheating, I haven't given TurnItIn permission to store my content and profit from it.
Their website gives you details on how to prevent your site being slurped up in the first place.
It's as simple as adding the following to your robots.txt file:
User-agent: TurnitinBot
Disallow: /
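If you want to sanity-check the rule before uploading it, Python's standard urllib.robotparser module can parse the same two robots.txt lines and tell you whether a given user agent may fetch a URL. This is a minimal sketch: the helper name is_blocked and the example.com URL are hypothetical, and it assumes TurnItIn's crawler interprets robots.txt the same way Python's parser does. Bear in mind that robots.txt is a request that well-behaved crawlers honour voluntarily, not an enforcement mechanism.

```python
from urllib.robotparser import RobotFileParser

# The two robots.txt directives, one per line.
ROBOTS_TXT = """\
User-agent: TurnitinBot
Disallow: /
"""

def is_blocked(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Hypothetical helper: True if robots_txt forbids user_agent from fetching url."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)

if __name__ == "__main__":
    example = "https://example.com/blog/some-post/"  # hypothetical URL
    print(is_blocked("TurnitinBot", example))   # True
    print(is_blocked("SomeOtherBot", example))  # False
```

Other crawlers are unaffected because the rule only names TurnitinBot; Python's parser also matches versioned agent strings such as "TurnitinBot/3.0", since it ignores everything after the first slash.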
That should stop TurnItIn from crawling your site in future. But how do you remove all the content they've stored?
Remove your website
I fundamentally disagree that a private company should be able to profit from my works by storing and selling access to them. I wrote to their legal department, and got this in response:
In order to delete content from our database, please provide proof that the content you are requesting to remove belongs to you. Acceptable proof would be:
* A confirmation from an instructor at the university that the work is yours
* A confirmation from the admin at the university that the work is yours
* Draft papers, proof of communication with the university, proof of upload
Please also reconfirm the URL of the content that needs to be deleted.
I emailed them to correct their misunderstanding. My blog is not an academic work (LOL!) and was not written for a University. After a couple of weeks - and several further emails from me to escalate - I got this back:
We have encountered an issue when trying to remove your content. The problem is that the website has been around for some time and was initially crawled by our old crawler.
Please rest assured that we take the request to remove your content seriously, and our legal and content teams are already currently working on populating a complete crawl set from our old indexed content. Once this tool is created, we will be able to fully unindex your website, including any links crawled by our old crawler.
A little later, they updated me with this:
Just to further add, our crawler engineers have stated that they’ve unindexed and deleted all of your pages that they are aware of, until we have the ability to perform a granular search on our database for the older indexed links. If you find any other content still indexed, please let us know, and we will remove it.
I submitted a draft paper containing a selection of paragraphs drawn from my blog posts, essentially to see whether they'd accuse me of plagiarising myself. It came back saying that it had detected plagiarism, and then displayed the full contents of my blog post.
The left shows the paragraph I uploaded. The right shows the full text of my original blog post.
There were over 60 URLs of mine that TurnItIn said it had records for:
After 5 months of complaining to TurnItIn, I discovered that they cannot remove content which was stored prior to 2015. So I've submitted DMCA complaints to their hosting company and registrar.
OK, but why?
I have a couple of arguments. Firstly, I have rights over my blog posts. There are Fair Dealing exceptions to UK copyright law, but I don't think TurnItIn can take advantage of any of them.
Regarding copyright of my essays, here's what TurnItIn say:
Students who submit papers to Turnitin retain the copyright to the work they created. A copy of submitted papers is retained in a Turnitin database archive to be compared with future submissions—a practice that helps protect and strengthen copyright ownership.
I mean... Yes. I haven't assigned my copyright to them. I just don't think they have the right to store and profit from my essays in perpetuity. TurnItIn are a bit defensive about the subject of copyright, arguing:
A U.S. District Court judge ruled that archiving student papers to assess the originality of newly-submitted papers constitutes a fair use under the U.S. Copyright Act, provides “a substantial public benefit" and helps protect the papers from being exploited by others.
The summary judgment was unanimously affirmed by a U.S. Court of Appeals.
My argument is "Fuck You. I don't want you to hoover up my content and then sell it without my permission."
Frankly, TurnItIn can either pay me for my content, or remove it. They sell my content to universities - so they obviously think it has value. I'd like to be compensated for it.
If TurnItIn were a not-for-profit, or a community resource, I'd have some sympathy with them using the web to detect academic misconduct. But, as it is, I don't derive any benefit from them reselling access to my content. So they can get in the bin.
I'm sympathetic to the argument that having students steal my content and pass it off as their own is damaging to me. But I've no evidence to suggest that's a problem. It isn't like TurnItIn sends me an alert whenever it detects plagiarism.
So, I'm opting out. And I hope this guide is useful if you also want to do so.