How to Dismantle Knowledge of an Atomic Bomb


A confused little cardboard robot is lost amongst the daisies

The fallout from Meta's extensive use of pirated eBooks continues. Recent court filings appear to show the company grappling with the legality of training their AI on stolen data. Evidence shows an employee asking whether what they're doing is legal. Will it undermine their lobbying efforts? Will it lead to more regulation? Will they be fined? And, almost as an afterthought, is this fascinating snippet: If we were to use models trained on LibGen for a purpose other than internal evaluation, we…

Continue reading →

GitHub's Copilot lies about its own documentation. So why would I trust it with my code?


Me asking Copilot how I switch it off. Copilot responds with a link.

In the early part of the 20th Century, there was a fad for "Radium". The magical, radioactive substance that glowed in the dark. The market had decided that Radium was The Next Big Thing and tried to shove it into every product. There were radioactive toys, radioactive medicines, radioactive chocolate bars, and a hundred other products. The results weren't pretty. In the early part of the 21st Century, there was a fad for "AI". The magical, Artificial Intelligence which provided all the…

Continue reading →

LLMs are good for coding because your documentation is shit


A pet cat typing on a computer keyboard.

That's it. That's the post. Fine! I'll expand a little more. Large Language Models are a type of Artificial Intelligence. They can read text, parse it, process it using the known rules of English, and then regurgitate parts of it on demand. This means they can read and parse a question like "In Python, how do I add two numbers together?" and then read and parse the Python documentation. They will produce an output like: What happens if you search the official Python documentation for…
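
For the record, the answer that question is fishing for really is trivial; here's a minimal sketch (the variable names are mine, not from the post):

```python
# Adding two numbers in Python: the + operator does it directly.
a = 2
b = 3
print(a + b)  # 5

# The standard library's operator module exposes the same
# operation as a function, which the docs also cover.
import operator
print(operator.add(a, b))  # 5
```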

Continue reading →