Importing IntenseDebate Comment XML into Commentics
This is ridiculously niche. If this is of help to anyone other than to me... please shout!
The IntenseDebate comment system is slowly dying. It hasn't received any updates from Automattic for years. Recently it stopped allowing users to submit new comments at all.
So I've switched to Commentics which is a self-hosted PHP / MySQL comment system. It's lightweight, pretty good at respecting users' privacy, and very customisable. But it doesn't let you easily import comments. Here's how I fixed that.
Export From IntenseDebate
Go to your site's dashboard and click export. They'll email you a link when the process has finished. It's an XML file which looks something like this:
<?xml version="1.0" encoding="UTF-8"?>
<output>
<blogpost>
<url>https%3A%2F%2Fopenbenches.org%2Fbench%2F123</url>
<title>The Page Title</title>
<guid>https://openbenches.org/bench/123</guid>
<comments>
<comment id='123456798' parentid='0'>
<isAnon>0</isAnon>
<name><![CDATA[Terence Eden]]></name>
<email>terence.eden@example.com</email>
<url>https://example.com</url>
<ip>198.51.100.123</ip>
<text><![CDATA[You can read more about the song at Wikipedia <a href="https://en.wikipedia.org/wiki/Pokarekare_Ana" target="_blank">https://en.wikipedia.org/wiki/Pokarekare_Ana</a> ]]></text>
<date>2017-08-01 14:51:55</date>
<gmt>2017-08-01 14:51:55</gmt>
<score>1</score>
</comment>
</comments>
</blogpost>
...
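One quirk worth noting: the url element is percent-encoded, while the guid is a plain URL. If you need the readable version of the url field, Python's standard library can decode it - a minimal sketch:

```python
from urllib.parse import unquote

# The <url> field in the IntenseDebate export is percent-encoded.
encoded = "https%3A%2F%2Fopenbenches.org%2Fbench%2F123"
decoded = unquote(encoded)
print(decoded)  # https://openbenches.org/bench/123
```

In practice I used the guid instead, since it's already decoded.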
Understand the Commentics Table Structure
Once you've installed and configured Commentics, you will be able to replace the database with your old comments. To do that, you'll need to convert your exported XML file into three CSV files.
There are three tables you need to understand.
Users Table
This is the easiest one to understand. The columns are:
id, avatar_id, avatar_pending_id, avatar_selected, avatar_login, name, email, moderate, token, to_all, to_admin, to_reply, to_approve, format, ip_address, date_modified, date_added
Hopefully they're self-explanatory. The moderate column is always set to default. The token can be any random string. A typical user will look like this:
2, 0, 0, , , Terence Eden, terence.eden@example.com, default, abc123, 1, 1, 1, 1, html, 127.0.0.1, 2017-08-01 14:51:55, 2017-08-01 14:51:55
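As a sketch of how you might build one such row yourself (make_token is my own helper, not part of Commentics - the token just needs to be a random string):

```python
import csv
import random
import string

def make_token(length: int = 20) -> str:
    """Return a random lowercase-alphanumeric string to use as a token."""
    alphabet = string.ascii_lowercase + string.digits
    return "".join(random.choices(alphabet, k=length))

# One row matching the users table column order described above.
user_row = [2, 0, 0, "", "", "Terence Eden", "terence.eden@example.com",
            "default", make_token(), 1, 1, 1, 1, "html", "127.0.0.1",
            "2017-08-01 14:51:55", "2017-08-01 14:51:55"]

with open("users_example.csv", "w", newline="") as f:
    csv.writer(f).writerow(user_row)
```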
Note: the user's URL is not part of this table! That confused me at first. It is part of the comments table.
Pages Table
Every comment is associated with a page. Therefore every page needs a row in this table.
id, site_id, identifier, reference, url, moderate, is_form_enabled, date_modified, date_added
Again, pretty straightforward. A typical page looks like:
123, 1, openbenches.org/bench/123, OpenBenches - Bench 123, https://openbenches.org/bench/123, default, 1, 2023-07-16 21:29:52, 2023-07-16 21:29:52
The id is a unique number. I've set it to be the same as my page's actual ID - but it doesn't need to be.
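For reference, my import script derives that id by keeping only the digits in the guid - a crude trick that works here because nothing else in my URLs contains a digit:

```python
# Derive a numeric page ID from the guid by keeping only its digits.
# Only safe if the rest of the URL contains no digits!
guid = "https://openbenches.org/bench/123"
page_id = int("".join(filter(str.isdigit, guid)))
print(page_id)  # 123
```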
Comments Table
This is a slightly cumbersome table:
id, user_id, page_id, website, town, state_id, country_id, rating, reply_to, headline, original_comment, comment, reply, ip_address, is_approved, notes, is_admin, is_sent, sent_to, likes, dislikes, reports, is_sticky, is_locked, is_verified, number_edits, session_id, date_modified, date_added
You can ignore most of them - unless you really want to record someone's home town - and a typical comment looks like this:
4, 17, 123, , , , , , , , Wow! That's a great bench. And I imagine the view must be special too. , Wow! That's a great bench. And I imagine the view must be special too., , 127.0.0.1, 1, Moderating all comments., 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, abc123, 2017-08-10 17:56:50, 2017-08-10 17:56:50
The id is unique per comment. The user_id matches the id in the users table, and the page_id matches the id in the pages table. notes appears to be hardcoded to Moderating all comments.
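To make those relationships concrete, here's a tiny sketch using made-up in-memory dicts (not the real database) to resolve a comment row back to its user and page:

```python
# Hypothetical in-memory stand-ins for the three tables.
users = {17: {"name": "Terence Eden"}}
pages = {123: {"url": "https://openbenches.org/bench/123"}}
comments = [{"id": 4, "user_id": 17, "page_id": 123,
             "comment": "Wow! That's a great bench."}]

for c in comments:
    author = users[c["user_id"]]["name"]   # comments.user_id -> users.id
    page = pages[c["page_id"]]["url"]      # comments.page_id -> pages.id
    print(f"{author} commented on {page}")
```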
Horrible evil no-good Python
Here's a Python script I made after a few beers. It ingests the XML file and spits out 3 CSV files.
import csv
import random
import string
import xml.etree.ElementTree as ET

tree = ET.parse('IntenseDebate.xml')
root = tree.getroot()

comment_header = ["id","user_id","page_id","website","town","state_id","country_id","rating","reply_to","headline","original_comment","comment","reply","ip_address","is_approved","notes","is_admin","is_sent","sent_to","likes","dislikes","reports","is_sticky","is_locked","is_verified","number_edits","session_id","date_modified","date_added"]

i = 1
with open('comments_to_import.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    # writer.writerow(comment_header)
    for child in root:
        for children in child:
            if children.tag == "guid":
                # Derive the numeric page ID from the digits in the guid
                guid = children.text
                page_id = int(''.join(filter(str.isdigit, guid)))
            if children.tag == "comments":
                for comments in children:
                    commentID = comments.attrib["id"]
                    for commentData in comments:
                        if commentData.tag == "name":
                            username = commentData.text
                        if commentData.tag == "text":
                            commentHTML = commentData.text
                        if commentData.tag == "url":
                            userurl = commentData.text
                        if commentData.tag == "date":
                            commentDate = commentData.text
                    i = i + 1
                    # Generate a random 20-character session ID
                    digits = random.choices(string.digits, k=10)
                    letters = random.choices(string.ascii_lowercase, k=10)
                    sample = random.sample(digits + letters, 20)
                    session_id = "".join(sample)
                    row = [i,i,page_id,userurl,"","","","","","",commentHTML,commentHTML,"","127.0.0.1","1","Moderating all comments.","0","1","0","0","0","0","0","0","0","0",session_id,commentDate,commentDate]
                    writer.writerow(row)
pages_header = ["id","site_id","identifier","reference","url","moderate","is_form_enabled","date_modified","date_added"]

with open('pages_to_import.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    # writer.writerow(pages_header)
    for i in range(30000):
        row = [i,"1","openbenches.org/bench/" + str(i),"OpenBenches - Bench " + str(i),"https://openbenches.org/bench/" + str(i),"default","1","2023-07-16 21:29:52","2023-07-16 21:29:52"]
        writer.writerow(row)
users_header = ["id","avatar_id","avatar_pending_id","avatar_selected","avatar_login","name","email","moderate","token","to_all","to_admin","to_reply","to_approve","format","ip_address","date_modified","date_added"]

i = 1
with open('users_to_import.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    # writer.writerow(users_header)
    for child in root:
        for children in child:
            if children.tag == "guid":
                guid = children.text
                page_id = int(''.join(filter(str.isdigit, guid)))
            if children.tag == "comments":
                for comments in children:
                    commentID = comments.attrib["id"]
                    for commentData in comments:
                        if commentData.tag == "name":
                            username = commentData.text
                        if commentData.tag == "url":
                            userurl = commentData.text
                        if commentData.tag == "date":
                            commentDate = commentData.text
                    i = i + 1
                    # Random token doubles as the user's token column
                    digits = random.choices(string.digits, k=10)
                    letters = random.choices(string.ascii_lowercase, k=10)
                    sample = random.sample(digits + letters, 20)
                    session_id = "".join(sample)
                    row = [i,"0","0","","",username,"","default",session_id,"1","1","1","1","html","127.0.0.1",commentDate,commentDate]
                    writer.writerow(row)
Note that my pages all follow a numeric sequence (/1, /2, etc.) hence the loop to quickly regenerate them. Your pages may be different.
Importing
You will need to use phpMyAdmin or a similar database manager. TRUNCATE the pages, users, and comments tables, then import the CSV files into each one.
If everything works, you will have all your old comments imported into your new system.
Enjoy!