Of Tags and Categories

Tags. For as long as I’ve used them on this blog (and the ones before it, for that matter), they’ve just seemed to come naturally. I thought it was time to do some tag cleaning on my blog, and got sucked into Wikipedia. Here are some of the things I’ve learned.

When I searched for more information about tags, I mostly ended up with ads and promotions for online SEO courses. What I did find was mostly about categorization. Unlike tags in blogs, categories are hierarchical and meant to be exclusive. In the case of blogging, this means posts can share a common parent category, but each post should end up in exactly one child category. For example:

- tech
  - emacs
    - org-mode
    - dired
- life
  - hobbies
    - photography

In the example above, a post can be under the “photography” category. This means it belongs in “hobbies,” which belongs in “life”. Similarly, a post labeled “org-mode” fits under “emacs,” which goes under “tech”. In both cases, the end-category should be final, so each post lands in exactly one branch of the tree.

While this system works for biology (as linked above), there is a logical paradox here that anyone who writes a blog can smell a mile away: what of posts that are about both org-mode and Dired? Or, to make things more obvious, consider a post about org-mode and photography, using the example above. In the latter case, the end-categories don’t even have a common parent. We could, if we wanted, give the parent categories a shared parent of their own (in this case, something like “self” perhaps), but such a category is going to be so broad it’s essentially useless.
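To make the problem concrete, here is a small Python sketch of the example tree. The `PARENT` table and both function names are my own illustration, not something from Wikipedia or from any blogging tool. It looks for the closest category two end-categories share.

```python
# Hypothetical parent links mirroring the example tree in this post.
PARENT = {
    "emacs": "tech",
    "org-mode": "emacs",
    "dired": "emacs",
    "hobbies": "life",
    "photography": "hobbies",
}


def ancestors(category):
    """Return the chain from a category up to its root, inclusive."""
    chain = [category]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def common_parent(a, b):
    """Return the closest category shared by both chains, or None."""
    seen = set(ancestors(a))
    for node in ancestors(b):
        if node in seen:
            return node
    return None


print(common_parent("org-mode", "dired"))        # emacs
print(common_parent("org-mode", "photography"))  # None
```

A post about org-mode and Dired could at least retreat to “emacs”, but a post about org-mode and photography has nowhere to go without inventing a new catch-all root.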

This is why we use tags. Tags are mutually inclusive, meaning more than one can apply to the same post. While I couldn’t find a good Wikipedia article or a philosophical explanation for tags, I discovered that Karl Voit is not only familiar with the topic, but also wrote his thesis about it and even came up with software for tagging.

Karl has more information than I could handle, and I left his paper mostly unread (I was trying to tackle a couple of additional projects as well [1]). Karl also has a helpful video presentation linked in the post above, as well as a couple of tips for tagging.

At the end of the day (and the end of the research) I decided to mostly stick to my own tagging “philosophy” while adopting some of Karl’s points. Here are my rules:

  1. Aim to have as few tags as possible
  2. Tags should be singular
  3. Tags should be in lowercase
  4. Tags should be single-worded, without spaces.
    1. If this is not possible, use underscore only ( _ )
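Rules 3 and 4 are mechanical enough to sketch in Python. `normalize_tag` is a hypothetical helper of mine, not part of Hugo or ox-hugo. Treating hyphens as word separators is my own reading of “underscore only”. Rule 2 (singular form) is left to human judgment, since it can’t be automated reliably.

```python
import re


def normalize_tag(raw):
    """Lowercase a tag and join its words with underscores (rules 3 and 4)."""
    # Assumption: whitespace, hyphens, and underscores all count as separators.
    words = re.split(r"[\s\-_]+", raw.strip().lower())
    return "_".join(word for word in words if word)


print(normalize_tag("Org Mode"))   # org_mode
print(normalize_tag("  Hiking "))  # hiking
```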

Apart from rule number two, I think Karl and I mostly see eye to eye. In his video presentation, he advocates for tags being in plural form. He explains that this just seems to be what most people do. Personally, I think this doesn’t make sense. In English especially, many words can be “converted” to a singular form, but the same is not true for the plural form. Photos to photography, books to reading, etc. The other way around does not work. What’s the plural for Emacs…?

The first rule is the most important one. I think at one point or another all bloggers rush to create tags, and this leads to messiness. Some tags are duplicates (in my case, “life” and “self”) while others are barely used. The other reason to keep tags to a minimum is usability. The idea is to create tags as you go, as the need grows. I came up with a simple process: keep a list of topics I want to open tags for, and keep on writing without adding them. When a certain topic comes up three times in different posts, it’s time to open a tag using the rules above. This means tags evolve over time. I can start the blog with “Life”, but with time new tags will come into play, like photography, hiking, work, etc. At this point, “Life” will still be the parent tag which connects all the others. I believe this helps readers: there’s a choice to read all Life-related posts or to focus on something specific.
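The “three times” process can be sketched as a small counter. This is only an illustration with made-up posts. `tags_ready`, `THRESHOLD`, and the sample data are hypothetical names, and in practice I keep this list by hand.

```python
from collections import Counter

THRESHOLD = 3  # the "three times in different posts" rule


def tags_ready(posts, existing_tags):
    """Return topics seen in at least THRESHOLD posts that are not tags yet."""
    counts = Counter()
    for topics in posts:
        counts.update(set(topics))  # count each topic once per post
    return sorted(
        topic
        for topic, seen in counts.items()
        if seen >= THRESHOLD and topic not in existing_tags
    )


# Made-up candidate topics, one list per post.
posts = [
    ["life", "hiking"],
    ["life", "photography"],
    ["life", "hiking", "work"],
    ["life", "hiking"],
]
print(tags_ready(posts, existing_tags={"life"}))  # ['hiking']
```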

Karl admits that his blog does not follow the first rule. He has a good reason for this: he’s been around for a long time, and re-publishing all the posts with altered tags is going to be a serious pain in the behind. I don’t have that many posts. All I had to do was a regex replacement in Emacs on certain tags in my org file, to either switch them or remove them altogether. ox-hugo and Netlify took care of the rest.
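I did the replacement inside Emacs, so what follows is not my actual command. It’s a rough Python stand-in for the same idea, assuming the tags sit at the end of Org headings in the usual `:tag1:tag2:` form. `retag_line` and its arguments are hypothetical, it expects a line without its trailing newline, and it does not preserve Org’s tag column alignment.

```python
import re

# Matches an Org heading with trailing tags, e.g. "* Post title  :life:self:"
HEADING = re.compile(r"^(\*+ .*?)[ \t]+(:(?:[\w@#%]+:)+)[ \t]*$")


def retag_line(line, rename=None, remove=()):
    """Rename or drop tags on one Org heading line; other lines pass through."""
    rename = rename or {}
    match = HEADING.match(line)
    if not match:
        return line
    title, tag_string = match.groups()
    tags = []
    for tag in tag_string.strip(":").split(":"):
        tag = rename.get(tag, tag)
        if tag not in remove and tag not in tags:  # also drops duplicates
            tags.append(tag)
    return f"{title} :{':'.join(tags)}:" if tags else title


line = "* Of Tags and Categories  :self:emacs:misc:"
print(retag_line(line, rename={"self": "life"}, remove={"misc"}))
# * Of Tags and Categories :life:emacs:
```

Renaming “self” to “life” on a heading that already carries both tags collapses them into one, which is the kind of duplicate I was cleaning up.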

By the time you read this post, I will have corrected the tags on this blog. As usual, feel free to let me know what you think about tags and categories, especially if you blog yourself. Remember, if you use the comments, you can use GitLab or GitHub to sign in, so I know who you are – or stay anonymous [2].

Footnotes


  1. During the tagging research, I found out that I’m still publishing this blog on Hugo 0.71.0, which is some 30 versions behind the current one. This in turn led to a whole different project I should blog about. In addition, TAONAW’s brother blog, Curious Musings, got some recent tweaking which was fun to discuss over on Mastodon. Have a look! ↩︎

  2. OK, at this point I realize I need to explain the commenting system on this blog. Commento deserves its own post. The more I write, the more stuff there is to write… ↩︎