Commenting system in weblogs

A weblog is called a weblog if it contains these basic features:

  • Content, either syndicated or from your brain
  • Permanent hyperlinks, often abbreviated to permalinks
  • Daily, monthly, yearly, and/or categorical archives
  • RSS and/or Atom subscription feeds
  • Commenting system

Without them, the overall blogging experience is not complete.

This means my weblog is not complete, because a decent commenting system is nowhere to be seen. No, it's not Blogger's fault. I have simply disabled comments on purpose. Why? That question kept me thinking for days, and I'm unable to give a simple answer. It evolved into a whole list of questions which keep popping up in my head and overloading my memory capacity. So here I finally open up my mind and release them to the world wild web.

The commenting system, explained. Sort of.

What is a commenting system?

Generally, it's a posting system, not very different from forums, guestbooks, tagboards and weblog scripts. A weblog script such as WordPress lets you post anything you like on your personal blog. A tagboard is simpler, with character limits, IP address blocking, blacklists and word censoring. A guestbook is a bit like a tagboard but with fewer character limits, an additional archiving system and a better administration interface. A forum such as phpBB is more complicated, with sub-forums, avatar customisation, file uploading, member control panels, a built-in search engine and more.

A weblog commenting system sits somewhere between a tagboard and a guestbook. Its purpose is to let readers give feedback on specific weblog posts. It also acts as a conversational tool and encourages the exchange of ideas between the author and the readers.

I know there are non-weblog sites with commenting systems out there, such as A List Apart. I will focus only on weblogs, though some points mentioned here may apply to those sites as well, and vice versa.

How do I comment on a weblog post?

There are three ways.

  • Use the comment interface readily provided by most weblog sites. It's a simple form where you put in your details such as name, e-mail address, web site address and your message, then submit. Within a certain time, your comment will appear there. Done.

  • Use Trackback or Pingback, which help to build blog-to-blog commentary. Though the two work in slightly different ways, each provides an automated mechanism that encourages folks to write their comments on their own blogs instead. It also gives them a good reason to start their own blog and get involved in the blogosphere.

  • Use Technorati cosmos, which tracks the various blogs linking to your blog post. One of the blogs I know that implements this is Tantek's log. Pretty neat.

Other possible ways would be alternative forms of communication, maybe e-mail, instant messaging or forums.

What should the markup look like for a comments listing?

A very common markup is:

<div class="comment">
<p>time and date, commentator commented:</p>
<p>comments</p>
</div>
...

Though the div tag acts as a container for the comment, it isn't semantic enough. Since the comments form a list, we should use a list element:

<ol class="comments-list">
<li class="comment">
<p>time and date, commentator commented:</p>
<p>comments</p>
</li>
...
</ol>

The above example uses an ordered list, ol. An unordered list, ul, may be used too, as suggested by Anne van Kesteren:

If you don't like the idea of chronologically ordering comments or you think comments can be in reply to multiple other comments you can use the unordered list element.

http://annevankesteren.nl/2005/04/comment-markup

When I looked at the markup, I sensed there was something missing, as there is no element dedicated to containing the posted comment itself. I then wondered whether blockquote could be used to quote the comments:

<ol class="comments-list">
<li class="comment">
<p>time and date, commentator commented:</p>
<blockquote><p>comments</p></blockquote>
</li>
...
</ol>

But later I discovered that, according to Henrik Lied, this is an incorrect use of blockquote:

This is absolutely not the correct use of BLOCKQUOTE. The HTML 4.01 specification explains the usage of BLOCKQUOTE as 'Long quotations from external sources.'.

The text specified in a on-site comment isn't a quotation from an external resource. This means that it's unsemantic to wrap your comments in a BLOCKQUOTE-element.

http://misinterpreted.org/archives/2005/04/11/get-semantic

Alright then. It may not be correct for on-site comments, but for off-site comments such as trackbacks and pingbacks it's suitable. That gives two types of markup:

<ol class="comments-list">

<!-- on-site comments -->
<li class="comment">
<p>time and date, commentator commented:</p>
<p>on-site comments</p>
</li>

<!-- off-site comments -->
<li class="comment">
<p>time and date, commentator or weblog name commented:</p>
<blockquote cite="URL to the cited web page"><p>off-site comments</p></blockquote>
</li>

...
</ol>

Besides this, definition lists can be considered as another option. The HTML 4.01 specification states that they can also be used for marking up dialogues:

<dl class="comments-list">
<dt>time and date, commentator commented:</dt>
<dd><p>comments</p></dd>
...
</dl>

However, the current Web Applications 1.0 working draft states otherwise:

The dl element is inappropriate for marking up dialogue, since dialogue is ordered (each speaker/line pair comes after the next).

http://whatwg.org/specs/web-apps/current-work/#the-dl

So the definition list is out, and the ordered list is my preferred choice for now. That made me curious whether there are any microformats for comments. On the Microformats wiki, I found Andy Smith's mfComment draft. Following its examples, I came up with this:

<ol class="comments-list">

<!-- on-site comments -->
<li class="mfcomment">
<p><abbr class="dtcommented" title="20050630T18:00+0800">6.00PM, 30 June 2005</abbr>, <a href="http://tantek.com/" class="commentator url fn">Tantek</a> commented:</p>
<div class="description"><p>Some comments here.</p></div>
</li>

<!-- off-site comments -->
<li class="mfcomment">
<p><abbr class="dtcommented" title="20050630T19:00+0800">7.00PM, 30 June 2005</abbr>, <a href="http://tantek.com/log/" class="commentator">Tantek's Thoughts</a> commented:</p>
<blockquote class="description" cite="http://tantek.com/log/"><p>Some comments from Tantek's site.</p></blockquote>
</li>

...
</ol>

When I checked the dictionary, I couldn't find the word 'commenter', so I use 'commentator' instead. Basically, the draft shares a few similarities with hReview, reusing hCard and hCalendar. Human-readable and ISO 8601 dates are presented with the abbr element.

By the way, I use Tantek's name and site URL above because I admire his work a lot. The reason I provide nearly identical markup and interface for comments, trackbacks and pingbacks is that he once said: a comment is a comment is a comment. Think about it.
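The abbr pattern above is easy to generate on the server side. Here is a minimal sketch in Python, assuming the timestamp is a timezone-aware datetime. The function name comment_date_abbr and the MONTHS tuple are my own inventions for illustration, and the title format simply mirrors the one used in the markup above rather than anything mandated by the mfComment draft.

```python
from datetime import datetime, timedelta, timezone
from html import escape

# English month names spelled out so the output does not depend on the locale.
MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")


def comment_date_abbr(when: datetime) -> str:
    """Return an abbr element: machine-readable title, human-readable text."""
    # Same shape as the example markup, e.g. 20050630T18:00+0800
    machine = when.strftime("%Y%m%dT%H:%M%z")
    hour12 = when.hour % 12 or 12
    meridiem = "AM" if when.hour < 12 else "PM"
    human = (f"{hour12}.{when.minute:02d}{meridiem}, "
             f"{when.day} {MONTHS[when.month - 1]} {when.year}")
    return f'<abbr class="dtcommented" title="{machine}">{escape(human)}</abbr>'


utc_plus_8 = timezone(timedelta(hours=8))
print(comment_date_abbr(datetime(2005, 6, 30, 18, 0, tzinfo=utc_plus_8)))
```

Run as is, this prints the same abbr element used for the first comment in the example above.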

What is comment spam? How do I prevent it?

Simple. It's like e-mail spam. Bots or humans run around your site and submit evil links to your comments section. It's a fact of a blogger's life.

There are many ways to combat comment spam, and WordPress has good documentation on the topic. A few of the most-used techniques are comment moderation, spam detection algorithms, blacklists, member registration, Google's rel="nofollow" attribute, Captcha authentication tests and Bayesian filters. Each has advantages and disadvantages in terms of efficiency, usability and accessibility. Comment moderation needs manual inspection, blacklists need constant maintenance, member registration is a hassle for readers, rel="nofollow" is unfair in certain cases, Captcha tests are inaccessible to visually impaired users, and Bayesian filters need time to learn. Also be aware that Captcha-decoding software exists.

I've also looked at other techniques such as Jason Calacanis's Star System and Eric Meyer's WP-Gatekeeper. Both are worth a try. Still, please keep in mind that spammers get smarter all the time, and the battle never stops. It goes on for as long as you try to keep your blog comments clean and unharmed. As Keith Robinson puts it:
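To make the blacklist idea a little more concrete, here is a minimal sketch in Python. It is not how WordPress or any of the tools named here work. The word list, the link threshold and the function name needs_moderation are all made up for illustration, and a real filter would need far more than this.

```python
import re

# Hypothetical settings: a real blacklist would be much longer and
# would need the constant maintenance mentioned above.
BLACKLIST = {"casino", "viagra"}
MAX_LINKS = 2

URL_PATTERN = re.compile(r"https?://", re.IGNORECASE)
WORD_PATTERN = re.compile(r"[a-z0-9']+")


def needs_moderation(comment_text: str) -> bool:
    """Flag a comment for manual review instead of publishing it right away."""
    words = set(WORD_PATTERN.findall(comment_text.lower()))
    too_many_links = len(URL_PATTERN.findall(comment_text)) > MAX_LINKS
    has_blacklisted_word = bool(words & BLACKLIST)
    return too_many_links or has_blacklisted_word
```

Holding a flagged comment for moderation rather than deleting it outright keeps the damage small when the check gets it wrong, which it sometimes will.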

It seems that no matter how clever the solution, spam still gets through some how. Spammers are simply smarter than I am. It just kills me that people are taking advantage of my hard work.

http://7nights.com/asterisk/archive/2004/06/comment-system-dilemma

Sounds like a lot of work? You bet.

Could people abuse the commenting system?

Of course. The comments section may suddenly turn into a tagboard or an instant messaging interface, like ICQ or IRC, when your readers start to post off-topic comments that are irrelevant to your blog article. Conversations become uncontrollable as commentators reply to one another until you don't even understand what they're talking about. They may start flame wars involving third parties, post hate messages and say discouraging words. This is a problem that sometimes happens in popular forums too. The difference is that forums have moderators and administrators to prevent this from happening in the first place, whereas on your blog you're alone, unless your blog is a collaborative, multiple-author content management system.

Therefore, it's wise to limit this by automatically or manually disabling comments after a period of time, maybe 7 days or a month. Would you still be expecting feedback on a 10-month-old article? A forced preview would be great for preventing accidents like mistyped words or double posting. It adds an extra step to evaluate the posted content, probably running it through spam-word detection and HTML validation scripts. If your blog covers socio-political issues, comment moderation is critical.
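Closing comments automatically boils down to a simple date comparison. A minimal sketch in Python, where the function name comments_open and the default 7-day window are assumptions of mine rather than any blogging tool's actual setting:

```python
from datetime import datetime, timedelta


def comments_open(published: datetime, now: datetime, window_days: int = 7) -> bool:
    """Return True while a post is still within its commenting window."""
    return now - published <= timedelta(days=window_days)


posted = datetime(2005, 7, 1, 9, 0)
print(comments_open(posted, datetime(2005, 7, 5, 9, 0)))                   # within 7 days
print(comments_open(posted, datetime(2005, 7, 20, 9, 0)))                  # past 7 days
print(comments_open(posted, datetime(2005, 7, 20, 9, 0), window_days=30))  # within a month
```

The comment form would only be shown, and submissions only accepted, while this returns True.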

Should I enable HTML input for comments?

Yes and no.

Enabling HTML is useful for adding semantics to comment posts. Elements such as blockquote, strong, em, a, pre, code, ul, ol and li are necessary in particular situations, especially for web standards enthusiasts. Unfortunately, there are risks in this approach. Some people might insert malicious JavaScript or submit malformed markup that could mess up your site. For XHTML-validated sites, tag soup is definitely a no-no. Just imagine if someone submits this:

<i>Hi, <b />I am <a href="http://spam.com/ title='some weird text > plus few characters & more">spam</a></i/> man.</b>

Or perhaps a little more advanced...

<div style="position: absolute; top: 0; left: 0; width: 100%; height: 100%; font-size: 5em;">I've hacked your blog! Hahaha!</div>

Of course, none of this should happen in most mature blogging systems, thanks to solutions like Ulf Harnhammar's kses and Simon Willison's SafeHtmlChecker.

Another point that needs consideration is your target audience. How are they going to comment if they have little or no knowledge of HTML? I often see cases where commentators forget to use the pre or code tags for computer code, which ends up inside a p tag instead. For message boards, BBCode has emerged as a simpler alternative to HTML. Nevertheless, commentators still have to learn it, and not many will spend that much time just to leave a comment.

Some blog owners, for fear of abuse and spam, have disabled HTML for comments. Line breaks are automatically converted to br or p tags. Since commentators cannot use a tags, they have to type the whole URL, which is later converted into a clickable link. Long URLs will break the layout unless they are truncated or overflow: hidden is applied to crop them. List items are coded with br tags instead of li tags. There is no more structured text to indicate emphasis, citations, code fragments and the like. Semantics are gone. This is simply unacceptable.
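The general idea behind such filters is a whitelist: keep the tags and attributes you trust, drop everything else, escape the text and close whatever was left open. The sketch below is my own rough illustration using Python's standard html.parser module. It is not kses or SafeHtmlChecker, the names CommentSanitizer and sanitize_comment and the whitelists are assumptions, and it should not be treated as a vetted security filter.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelists, based on the elements listed earlier.
ALLOWED_TAGS = {"blockquote", "strong", "em", "a", "pre", "code", "ul", "ol", "li", "p"}
ALLOWED_ATTRS = {"a": {"href"}, "blockquote": {"cite"}}
SAFE_SCHEMES = ("http://", "https://", "mailto:")


class CommentSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []        # pieces of cleaned markup
        self.open_tags = []  # whitelisted tags still waiting for a close tag

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED_TAGS:
            return  # drop div, script, style and friends; their text is kept, escaped
        kept = ""
        for name, value in attrs:
            allowed = name in ALLOWED_ATTRS.get(tag, ())
            if allowed and value and value.strip().lower().startswith(SAFE_SCHEMES):
                kept += f' {name}="{escape(value, quote=True)}"'
        self.out.append(f"<{tag}{kept}>")
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            return  # stray close tag: ignore it
        # Close anything opened after the matching tag, then the tag itself.
        while self.open_tags:
            top = self.open_tags.pop()
            self.out.append(f"</{top}>")
            if top == tag:
                break

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))

    def result(self):
        self.close()  # flush any buffered text
        while self.open_tags:
            self.out.append(f"</{self.open_tags.pop()}>")
        return "".join(self.out)


def sanitize_comment(markup: str) -> str:
    parser = CommentSanitizer()
    parser.feed(markup)
    return parser.result()


print(sanitize_comment('<div style="position: absolute;">I\'ve hacked your blog! Hahaha!</div>'))
print(sanitize_comment('<a href="javascript:alert(1)" onclick="x()">click</a>'))
```

With this sketch the absolutely positioned div loses its wrapper and style and comes out as plain text, and the link loses both its javascript: URL and its onclick handler.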
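For reference, the HTML-disabled approach described above can be sketched in a few lines of Python. This is a simplified illustration under my own assumptions: the function names, the URL pattern and the 40-character truncation limit are invented, and real blogging tools handle many more edge cases.

```python
import re
from html import escape

# Simplified URL matcher; it avoids swallowing trailing punctuation such as a full stop.
URL_PATTERN = re.compile(r'https?://[^\s<>"]+[^\s<>".,;:!?)]')
MAX_LINK_TEXT = 40  # hypothetical limit so long URLs don't break the layout


def render_inline(text: str) -> str:
    """Escape the text and turn bare URLs into clickable links."""
    parts = []
    last = 0
    for match in URL_PATTERN.finditer(text):
        parts.append(escape(text[last:match.start()], quote=False))
        url = match.group(0)
        label = url if len(url) <= MAX_LINK_TEXT else url[:MAX_LINK_TEXT] + "..."
        parts.append(f'<a href="{escape(url)}">{escape(label)}</a>')
        last = match.end()
    parts.append(escape(text[last:], quote=False))
    return "".join(parts)


def plain_text_to_html(text: str) -> str:
    """Blank lines become paragraphs; single line breaks become br elements."""
    normalised = text.replace("\r\n", "\n")
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", normalised) if p.strip()]
    rendered = []
    for paragraph in paragraphs:
        body = render_inline(paragraph).replace("\n", "<br />\n")
        rendered.append(f"<p>{body}</p>")
    return "\n".join(rendered)


print(plain_text_to_html("Nice post!\nSee http://example.com/?a=1&b=2.\n\nBye <b>"))
```

Note that any markup the commentator typed, like the b tag at the end, is escaped and shown literally, which is exactly the loss of semantics complained about above.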

In this context, HTML plays an important role in achieving structural markup and keeping content under control. Thus I shift my focus to humane text markup languages, which are mostly devised by blog authors, probably following the concept of separating content from markup. Dean Allen's Textile and John Gruber's Markdown are two good examples. Both convert simple text and characters to HTML, much like the way we compose e-mail messages. Simple, clean and no code at all.

There are two kinds of users here: those who know HTML and those who don't know it at all. When composing in a comment form, HTML-literate users might wonder whether they should code p tags for paragraphs instead of inserting double line breaks, or encode characters like '&' as '&amp;', especially in URLs. If HTML is disabled, all the code they've typed will show up as gibberish in the comments. Meanwhile, users who don't know HTML might be confused when their line breaks aren't turned into br elements.

A comment posting guideline would be useful to inform commentators on such issues. Which tags and attributes can be used? Does it use a text markup language? Or simply no HTML is allowed?

An idea worth toying around was mentioned by Richard Allsebrook:

A basic and advanced comment option. Basic allows NO hand coded markup and squirts it out as a <pre> tag. The Advanced option would allow markup (which would be validated before being accepted). That way you would keep most level of user happy.

http://7nights.com/asterisk/archive/2004/02/random-acts-of-validation#comment21

Hmm, maybe a little too complex, I think. I haven't seen any blog implement this idea yet.

What's next? WYSIWYG?

Do comments have permanent links?

Now we see two permanent links. One is for your weblog post, the other is for individual comments. The reason to include permanent links for comments is to allow direct references. You might want to quote a commentator's words and point to that specific comment.

To make this work, numeric IDs are applied to each comment, for example:

<ol class="comments-list">
<li id="comment-1">...</li>
<li id="comment-2">...</li>
<li id="comment-3">...</li>
<li id="comment-x">...</li>
...
</ol>

Or...

<ol class="comments-list">
<li id="comment-99">...</li>
<li id="comment-105">...</li>
<li id="comment-121">...</li>
<li id="comment-x">...</li>
...
</ol>

What's the difference? In the first example, the comments are assigned IDs unique to that particular blog article. In the second, the comments are assigned IDs unique to the whole blog archive. The first comment on a blog article might be the 99th comment on the whole blog. For some remotely hosted weblogs, assigned IDs are unique to the whole system, which means across all hosted blogs!

Permanent links for comments are not like those for blog posts. They are appended to a blog post's permanent link in the form /2005/07/post-title#comment-1. The number itself need not mean anything as long as it is unique. Just don't complicate it into something like #comment-013514167121. Unnecessary, in my opinion.
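The two numbering schemes and the resulting permalinks can be sketched like this in Python. Both function names are hypothetical, and the per-post numbering assumes that blog-wide IDs grow in the order the comments were posted.

```python
def comment_permalink(post_permalink: str, comment_id: int) -> str:
    """Append the comment's fragment identifier to the post's permalink."""
    return f"{post_permalink}#comment-{comment_id}"


def number_per_post(global_ids):
    """Map blog-wide comment IDs to per-post numbers starting at 1.

    Assumes a larger blog-wide ID means a later comment.
    """
    return {global_id: position
            for position, global_id in enumerate(sorted(global_ids), start=1)}


print(comment_permalink("/2005/07/post-title", 1))
print(number_per_post([99, 105, 121]))
```

Either scheme works as long as an ID, once published, never changes, otherwise existing links to a comment would break.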

How do I reply to comments?

Let's say 10 people commented on a blog post and you're going to reply to each of them one by one. How are you going to do that? The most common technique is to type the name of the commentator, followed by a colon or comma and your reply:

commentator #1: bla bla

commentator #2: bla bla

commentator #3: bla bla

commentator #4: bla bla

commentator #5: bla bla

...

You have to do this because your reply will appear at the very end of the comments section, so the commentator's name is needed for readers to know who, and which comment, you're replying to.

Another, more semantic way is to use blockquote to quote the text typed by the commentator and append your reply below it, much like how you reply to e-mail messages. A little help from the cite attribute might do the trick:

<blockquote cite="#comment-x">quoted text from commentator #x</blockquote>
<p>your reply</p>

I guess this will get very complicated if there are hundreds of comments.

Basically, that is a flat commenting system, not a threaded one. In a threaded system, your reply appears directly under the commentator's post, styled with some left margin to indicate that it's a child element. One very good example is Slashdot's powerful comment interface. Unfortunately, the way threaded comments are displayed can make your site look a bit bloated and take up a lot of space.

Besides flat and threaded comment systems, there is one more, the relational comment system introduced on Dunstan Orchard's wonderful blog. The concept is quite innovative, providing a visual guide to navigating the comments. You can highlight a specific comment and see which comments were inspired by others. A few JavaScript and CSS tricks make a comment easily recognisable in its child, parent or focused state.
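To show what threading means for the markup, here is a small Python sketch that nests replies inside their parent's list item. The comment structure (dicts with id, parent_id, author and text keys) and the function name render_thread are assumptions of mine, not how Slashdot or any particular tool does it.

```python
from html import escape


def render_thread(comments, parent_id=None):
    """Render comments as nested ordered lists, replies inside their parent."""
    children = [c for c in comments if c["parent_id"] == parent_id]
    if not children:
        return ""
    items = []
    for c in children:
        replies = render_thread(comments, c["id"])  # recurse into this comment's replies
        items.append(
            f'<li id="comment-{c["id"]}">'
            f'<p>{escape(c["author"])} commented:</p>'
            f'<p>{escape(c["text"])}</p>'
            f'{replies}</li>'
        )
    return '<ol class="comments-list">' + "".join(items) + "</ol>"


comments = [
    {"id": 1, "parent_id": None, "author": "A", "text": "First"},
    {"id": 2, "parent_id": 1, "author": "B", "text": "Reply"},
    {"id": 3, "parent_id": None, "author": "C", "text": "Second"},
]
print(render_thread(comments))
```

A flat system is simply the special case where every comment has no parent, so the same markup and IDs carry over.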

On a related matter, there is one usability problem on comments. Imagine a list of 50 comments and the comment form located at the very bottom of the page. Let's say you want to reply to the 12th comment or to quote a text from the article above, then for sure you'll be scrolling back and forth, strenuously trying to refer to those sections.

Popup comments can help solve this problem by staying fixed on the screen while you scroll up or down the article, so commentators can read and write at the same time. But popups have fallen out of favour thanks to ever-improving popup-blocking technologies. So we turn to fixed positioning for comment forms, a technique devised by Jonathan Snook and implemented as part of his 78th redesign. For those who want to implement the functionality right away, Derek Featherstone's dockable comments are highly recommended too. Copy and paste has never been easier.

Is it useful for comments to have their own separate subscription feed?

I am a reader of your blog. I post a comment and later surf to another web site. If I want to know whether you, the blog author, or any other commentator has replied to my comment, I have to load your site and check again. But what if I've left comments on 50 weblogs? Should I check them one by one all over again? Ridiculous.

E-mail notification is one solution, but a rather impractical one because it would throw junk into my inbox. Comment spam might eventually turn into e-mail spam! Another solution is an RSS or Atom feed for comments. Simply subscribe to the comments feed and you'll be able to keep track of any comments on a specific post or blog. The question is, would anyone actually do that? Whether the comments feed is per-post or per-blog, it feels a bit weird for anyone to add it to a personalised feed reader such as FeedDemon, NetNewsWire, Bloglines or Firefox's built-in Live Bookmarks.
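A per-post comments feed is not much work to produce. Here is a minimal RSS 2.0 sketch using Python's standard library. The function name build_comments_feed, the shape of the comment dicts and the choice to reuse the comment permalink as the guid are my own assumptions.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def build_comments_feed(post_title, post_url, comments):
    """Build a minimal RSS 2.0 document listing the comments on one post."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = f"Comments on: {post_title}"
    ET.SubElement(channel, "link").text = post_url
    ET.SubElement(channel, "description").text = f"Comments posted on {post_title}"
    for c in comments:
        item = ET.SubElement(channel, "item")
        link = f"{post_url}#comment-{c['id']}"  # the comment's permanent link
        ET.SubElement(item, "title").text = f"Comment by {c['author']}"
        ET.SubElement(item, "link").text = link
        ET.SubElement(item, "guid").text = link
        ET.SubElement(item, "pubDate").text = format_datetime(c["posted"])
        ET.SubElement(item, "description").text = c["text"]  # escaped on output
    return ET.tostring(rss, encoding="unicode")


feed = build_comments_feed(
    "Commenting system in weblogs",
    "http://example.com/2005/07/post-title",
    [{"id": 1, "author": "A", "text": "Nice & useful",
      "posted": datetime(2005, 6, 30, 18, 0, tzinfo=timezone(timedelta(hours=8)))}],
)
print(feed)
```

A per-blog feed would be the same thing with the items drawn from every post instead of one.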

I would be very interested to know if any research or study has been conducted on subscriptions of comment feeds. For me, I certainly wouldn't subscribe to any comment feeds. There are over 100 subscriptions in my Bloglines and another 30 Live Bookmarks on my Firefox! It's crazy, you know.

Can I have fun modifying, editing or deleting comments?

If someone accidentally made a typing error and has posted the comment, who should fix it? Commentator or blog author? If someone posted an off-topic or offensive comment, should you delete it? Blog owners have the authority to do whatever they like to the comments left by anyone. If you found that someone has criticised your blog article and made you angry, you can delete his comments and ban him right away!

It may be fun to have such administrative power to control your blog comments but there is a side effect. From Gervase Markham's blog, I learnt about the limitations in comments on Bugzilla. Surprisingly, comments are permanent. Gervase reasoned:

Editing comments is a bad idea because it is altering history. You said that stuff and, even if you wish you hadn't, it had an effect on things other people had said. If you revise what you said, and they don't, then there's potential for confusion and inconsistency.

http://weblogs.mozillazine.org/gerv/archives/006449.html

Wow, this time we're talking history. His points are very good, in the sense that we should preserve the historical record, as we learnt in school. When comments are altered or deleted, other commentators get confused, not knowing why a comment was deleted, when it was deleted and what was posted before. Replies to deleted comments make the situation even more confusing. Yet I'm not sure whether there are any conceptual differences between commenting in a bug tracking system and in a weblog system.

On the commentator's side, do you think they should have the option to edit their own comments? If you think that disallowing comment editing is too restrictive and you want to enable it, please review Ryan Brill's methodologies before diving in. Different perspectives will affect how comments are managed. How are you going to control what may be edited and when? Can you edit a comment even after it has been read by someone else? Could such changes affect the flow of the conversation in the comments thread? Hmm...

One idea that pops up in my brain is to alter the comments semantically, using ins and del tags. For example to delete a comment, the markup would be like this:

<ol class="comments-list">
<li><del datetime="date and time of deletion">
commentator name, time, date and comments
</del></li>
...
</ol>

With CSS, you may either fade the colours of the comment item or apply display: none to it. You can also tidy up a comment:

...
<p>Your article is so <del datetime="date and time of deletion" cite="#comment-x">intersting</del> <ins datetime="date and time of insertion" cite="#comment-x">interesting</ins> and informative!</p>
...

Note that the spelling error has been fixed. In addition, the cite attribute can be used to point to another posted comment that explains the changes made. Cool. Does anyone dare to take a step forward and put this into practice?
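Generating that markup automatically is straightforward. A minimal Python sketch, where the function name mark_correction and its parameters are hypothetical and the datetime value is written in ISO 8601 form:

```python
from datetime import datetime, timezone
from html import escape


def mark_correction(old, new, when, explained_in=None):
    """Wrap the old text in del and the new text in ins, both timestamped.

    explained_in is the ID of a comment that explains the change, if any.
    """
    stamp = when.isoformat()
    cite = f' cite="#comment-{explained_in}"' if explained_in is not None else ""
    return (f'<del datetime="{stamp}"{cite}>{escape(old)}</del> '
            f'<ins datetime="{stamp}"{cite}>{escape(new)}</ins>')


fixed_at = datetime(2005, 7, 1, 9, 30, tzinfo=timezone.utc)
print(mark_correction("intersting", "interesting", fixed_at, explained_in=12))
```

Because the original wording stays in the markup, this keeps the history intact while still letting the visible text be corrected.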

Whoa, do I need to care so much about this commenting stuff?

You have the choice to ignore everything you read here. Go on and enable comments for your blog. I'm not stopping anyone. Mind you, this article is only a set of possible questions and answers, not a complete hitchhiker's guide. I may not have much experience in handling comments, but I do learn from other people's weblogs.

And I care.

The end. Thanks for reading.