Importing IntenseDebate Comment XML into Commentics


This is ridiculously niche. If this is of help to anyone other than to me... please shout!

The IntenseDebate comment system is slowly dying. It hasn't received any updates from Automattic for years. Recently it stopped being able to let users submit new comments.

So I've switched to Commentics which is a self-hosted PHP / MySQL comment system. It's lightweight, pretty good at respecting users' privacy, and very customisable. But it doesn't let you easily import comments. Here's how I fixed that.

Export From IntenseDebate

Go to your site's dashboard and click export. They'll email you a link when the process has finished. It's an XML file which looks something like this:

 XML<?xml version="1.0" encoding="UTF-8"?>
<output>
    <blogpost>
        <url>https%3A%2F%2Fopenbenches.org%2Fbench%2F123</url>
        <title>The Page Title</title>
        <guid>https://openbenches.org/bench/123</guid>
        <comments>
            <comment id='123456798' parentid='0'>
                <isAnon>0</isAnon>
                <name><![CDATA[Terence Eden]]></name>
                <email>terence.eden@example.com</email>
                <url>https://example.com</url>
                <ip>198.51.100.123</ip>
                <text><![CDATA[You can read more about the song at Wikipedia <a href="https://en.wikipedia.org/wiki/Pokarekare_Ana" target="_blank">https://en.wikipedia.org/wiki/Pokarekare_Ana</a>  ]]></text>
                <date>2017-08-01 14:51:55</date>
                <gmt>2017-08-01 14:51:55</gmt>
                <score>1</score>
            </comment>
        </comments>
    </blogpost>
    ...

Understand the Commentics Table Structure

Once you've installed and configured Commentics, you will be able to replace the database with your old comments. To do that, you'll need to convert your exported XML file into three CSV files.

there are 3 tables you need to understand.

Users Table

This is the easiest one to understand. The columns are:

id, avatar_id, avatar_pending_id, avatar_selected, avatar_login, name, email, moderate, token, to_all, to_admin, to_reply, to_approve, format, ip_address, date_modified, date_added

Hopefully they're self-explanatory. The moderate is always set to default. The token can be any random string. A typical user will look like this:

2, 0, 0, , , Terence Eden, terence.eden@example.com, default, abc123, 1, 1, 1, 1, html, 127.0.0.1, 2017-08-01 14:51:55, 2017-08-01 14:51:55

Note The user's URL is not part of this table! That confused me at first. It is part of the comment table.

Pages Table

Every comment is associated with a page. Therefore every page needs a table.

id, site_id, identifier, reference, url, moderate, is_form_enabled, date_modified, date_added

Again, pretty straightforward. A typical page looks like:

123, 1, openbenches.org/bench/123, OpenBenches - Bench 123, https://openbenches.org/bench/123, default, 1, 2023-07-16 21:29:52, 2023-07-16 21:29:52

The id is a unique number. I've set it to be the same as my page's actual ID - but it doesn't need to be.

Comments Table

This is a slightly cumbersome table:

id, user_id, page_id, website, town, state_id, country_id, rating, reply_to, headline, original_comment, comment, reply, ip_address, is_approved, notes, is_admin, is_sent, sent_to, likes, dislikes, reports, is_sticky, is_locked, is_verified, number_edits, session_id, date_modified, date_added

You can ignore most of them - unless you really want to record someone's home town - and a typical user looks like this:

4, 17, 123, , , , , , , , Wow! That&#039;s a great bench. And I imagine the view must be special too. , Wow! That&#039;s a great bench. And I imagine the view must be special too., , 127.0.0.1, 1, Moderating all comments., 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, abc123, 2017-08-10 17:56:50, 2017-08-10 17:56:50

The id is unique per comment. The user_id is matched to the id on the users table. And the page_id is the id on the pages table. notes appears to be hardcoded to Moderating all comments.

Horrible evil no-good Python

Here's a Python script I made after a few beers. It ingests the XML file and spits out 3 CSV files.

Python 3 Python 3import csv
import random
import string

import xml.etree.ElementTree as ET
tree = ET.parse('IntenseDebate.xml')
root = tree.getroot()

comment_header = ["id","user_id","page_id","website","town","state_id","country_id","rating","reply_to","headline","original_comment","comment","reply","ip_address","is_approved","notes","is_admin","is_sent","sent_to","likes","dislikes","reports","is_sticky","is_locked","is_verified","number_edits","session_id","date_modified","date_added"]

i = 1

with open('comments_to_import.csv', 'w') as file:
    writer = csv.writer(file)
    # writer.writerow(comment_header)  
    for child in root:
        for children in child:
            if (children.tag == "guid") :
                guid = children.text
                page_id = int(''.join(filter(str.isdigit, guid)))
            if (children.tag == "comments") :
                for comments in children :
                    commentID = comments.attrib["id"]
                    for commentData in comments :
                        if (commentData.tag == "name") :
                            username = commentData.text
                        if (commentData.tag == "text") :
                            commentHTML = commentData.text
                        if (commentData.tag == "url") :
                            userurl = commentData.text
                        if (commentData.tag == "date") :
                            commentDate = commentData.text
                            i = i+1
                            digits  = random.choices(string.digits, k=10)
                            letters = random.choices(string.ascii_lowercase, k=10)
                            sample  = random.sample(digits + letters, 20)
                            session_id  = "".join(sample)
                            row = [i,i,page_id,userurl,"","","","","","",commentHTML,commentHTML,"","127.0.0.1","1","Moderating all comments.","0","1","0","0","0","0","0","0","0","0",session_id,commentDate,commentDate]
                            writer.writerow(row)

pages_header = ["id","site_id","identifier","reference","url","moderate","is_form_enabled","date_modified","date_added"]
with open('pages_to_import.csv', 'w') as file:
    writer = csv.writer(file)
    # writer.writerow(pages_header)
    for i in range(30000):
        row = [i,"1","openbenches.org/bench/" + str(i),"OpenBenches - Bench " + str(i),"https://openbenches.org/bench/" + str(i),"default","1","2023-07-16 21:29:52","2023-07-16 21:29:52"]
        writer.writerow(row)

users_header = ["id","avatar_id","avatar_pending_id","avatar_selected","avatar_login","name","email","moderate","token","to_all","to_admin","to_reply","to_approve","format","ip_address","date_modified","date_added"]
i = 1
with open('users_to_import.csv', 'w') as file:
    writer = csv.writer(file)
    # writer.writerow(users_header)
    for child in root:
        for children in child:
            if (children.tag == "guid") :
                guid = children.text
                page_id = int(''.join(filter(str.isdigit, guid)))
            if (children.tag == "comments") :
                for comments in children :
                    commentID = comments.attrib["id"]
                    for commentData in comments :
                        if (commentData.tag == "name") :
                            username = commentData.text
                        if (commentData.tag == "url") :
                            userurl = commentData.text
                        if (commentData.tag == "date") :
                            commentDate = commentData.text
                            i = i+1
                            digits  = random.choices(string.digits, k=10)
                            letters = random.choices(string.ascii_lowercase, k=10)
                            sample  = random.sample(digits + letters, 20)
                            session_id  = "".join(sample)
                            row = [i,"0","0","","",username,"","default",session_id,"1","1","1","1","html","127.0.0.1",commentDate,commentDate]
                            writer.writerow(row)

Note my pages all follow a numeric sequence /1, /2 etc, hence the loop to quickly regenerate them. Your pages may be different.

Importing

You will need to use PHPmyAdmin or a similar database manager. TRUNCATE the tables for pages, users, and comments. Then import the CSV files into each one.

If everything works, you will have all your old comments imported into your new system.

Enjoy!


Share this post on…

  • Mastodon
  • Facebook
  • LinkedIn
  • BlueSky
  • Threads
  • Reddit
  • HackerNews
  • Lobsters
  • WhatsApp
  • Telegram

What are your reckons?

All comments are moderated and may not be published immediately. Your email address will not be published.

Allowed HTML: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong> <p> <pre> <br> <img src="" alt="" title="" srcset="">