I got a DOI from arXiv for my MSc!


The DOI logo.

Welcome to acronym city! I recently published my Master's Dissertation. I say "published" - I just shoved it up on a website. But real academic publications should have a DOI - it's an identifier which is supposed to make it easier for people to find and cite papers. You know how books have a unique […]

Page numbers aren't the answer


The PDF file icon with a big red line through it.

There's a new pre-print paper called Pinpointing the problem: Providing page numbers for citations as a crucial part of open science by Leon Y. Xiao and Nick Ballou. It's a short, easily understandable paper, and well worth a read. I think I disagree with nearly all of its conclusions! The main point, I agree with. […]

Why is there no Semantic Ontology of Sentiment in Academic Citations?


Screenshot from Google Scholar. The book On farting: Language and laughter in the middle ages by V Allen has been cited by 106 other authors.

About a million years ago, I was discussing the FOAF (Friend of a Friend) ontology with its early proponents. It allowed you to define a machine-readable semantic relationship like "Alice is married to Bill" and "Bill is Carol's child" and "Carol works for David". That sort of thing. At the time, all the FOAF relationships […]

Check if your code is cited in academic works


List of citations, including one of mine.

I am a vain man. For a few years, I've been tracking academic papers which cite my blog posts. Recently, someone let me know that they'd found one of my GitHub repos in a paper they'd read. It hadn't even occurred to me to search for those! So, shove your GitHub URL into Google Scholar - […]

Opting Out of TurnItIn


Screenshot of TurnItIn displaying a list of my blog posts.

The web service TurnItIn is a "plagiarism detector". Lots of universities use it to assess whether their students are copy-n-pasting content which they haven't written. I'm not a big fan of it. First, I'll explain how to opt your websites out. Then I'll explain why I don't like the service. Block Their Robot TurnItIn scans the […]

Where is the original "Overview of SHARD" paper?


A citation in a modern paper.

One thing I'm finding extremely frustrating in academia is the number of people citing papers which don't seem to actually exist. As part of a data analytics class, I'm learning about "database sharding". That is, the process of splitting data between multiple machines. But where does the term come from? Wikipedia - the source of […]

What's the origin of the phrase "Big Data Doesn't Fit In Excel"?


Slide saying "It Doesn't fit in Excel".

Welcome to Yak Shaving School! As part of my MSc I'm reading a book about Data Analytics. So I've been chasing down quotes to find their origin. One paper had this popular quote in it (emphasis added): As with many rapidly emerging concepts, Big Data has been variously defined and operationalized, ranging from trite proclamations […]

If you don't sell it, I can't buy it


A book towering above some flames.

I don't understand the world of academic publishing. Incredibly niche books, some barely longer than a novella, are sold for ridiculously high prices. Or, worse than that, they're not sold at all. Let me explain. A friend of mine recommended an obscure book, published a couple of years ago. The blurb made it look right […]
