Org ID, Org Attach & Better Folder Names

2022-03-15 / emacs

You might have heard of org-mode header IDs. By default, these are Universally Unique Identifiers (UUIDs). In this post, I want to talk about what they are, why I use them (and you should, too), and how to make them into slugs: human-readable IDs that make sense. This will be a bit of a long explanation of what I discovered in org-mode, so buckle up!

What Are UUIDs and Why Should I Care?

The org-mode manual doesn’t exactly point at header IDs unless you know where to look. Because of this, a casual user of org-mode might overlook these powerful organizational tools.

Referring to a header by its path in an org-mode link, such as myorgfile.org/myproject/subproject-task, is risky because we refile headers all the time. The org-mode workflow is built on moving headers around: from a capture template to a generic project to a journal entry, headers move away from their original files and parent headers on a regular basis. This means that any link using a path like the one above will break¹.

For this reason, we have header IDs. They require that we include org-id in our init (as part of org-mode’s modules), and then we can generate them for any header by calling org-id-get-create: “create an ID for the current entry and return it. If the entry already has an ID, just return it.”

By default, org-id-get-create creates a string of random numbers and letters: our UUID. Now, every time we want to reference the header, we link to the UUID instead of the header’s path, like so: [[id:our UUID here][description here]].
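To make this concrete, here is a minimal sketch of a command that creates (or reuses) the ID of the header at point and copies a ready-made id: link for it. The command name my/org-copy-id-link is made up for this example; org-id-get-create and org-get-heading are the only org functions it relies on.

```elisp
(require 'org-id)

(defun my/org-copy-id-link ()
  "Copy an id: link to the header at point, creating the ID if needed.
Hypothetical helper, not part of org-mode."
  (interactive)
  (let* ((id (org-id-get-create))           ; returns the existing ID if there is one
         (title (org-get-heading t t t t))  ; header text without tags, TODO, priority
         (link (format "[[id:%s][%s]]" id title)))
    (kill-new link)
    (message "Copied %s" link)))
```

Call it with M-x my/org-copy-id-link while on a header, then yank the link wherever you need it.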

This is great, but the problem with these UUIDs is that they are a string of random characters that make no sense to us. There’s no way a header with an ID like 05576976-a33c-11ec-9da6-020017000b7b tells us anything. We will return to this problem in a bit, because first I want to discuss another great (and perhaps overlooked) org-mode feature that makes this even more problematic.

Referencing Files with Org-Attach

Org-mode comes with a built-in attachment mechanism called org-attach. You can summon it with C-c C-a when you are in org-mode under a header. Org-attach was one of the features I always knew existed but never used, and I suspect the same is true for many others.

The problem is that we already have a system to navigate and organize our files ingrained deep into our mind, be it Finder, Windows Explorer, or whatever GUI version is on your Linux distro. For us dedicated Emacs users, there’s of course the excellent Dired, which blows them all out of the water.

What’s more though, org-attach is using some seemingly weird system to store our files deep in folders that don’t make sense to us, and the only way to find them later is to use org-attach again to summon these attachments. At least, that’s what I thought.

Assuming you haven’t tweaked around with your org IDs, org-attach will nest an attachment under three folders. First, the default “data” folder, which is where all the attachments are stored. Second, under it, a folder with the first two characters of the header’s UUID. Then, third, a folder with the rest of that UUID as its name.

This is a bit confusing, so let’s break it down a bit. Again, the following workflow assumes you don’t have any settings affecting org-attach. Launch Emacs with emacs -q if you have any doubts:

Go visit an org file, and navigate to one of its headers.
Summon org-attach with C-c C-a. This will bring up a menu. For now, just use the first option, a.
Next, org-attach will ask you which file you want to attach. Navigate to one and select it. Don’t worry, this will create a copy; the original will stay where it is.
Standing on the header you just worked with, which should now have an :ATTACH: tag, summon org-attach again, but this time call option F, which will open the directory the file is in with Dired.

What you’re going to see is that you’re inside your org file’s folder, inside a data folder, and then inside a weird two-letter folder, and then inside a long string of the rest of the UUID. Something like this: /home/user/orgfiles/data/ab/f4b2cf-4b38-45ec-9333-346b42861d24.
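If you want to see how that path falls out of the ID, here is a small sketch using nothing but string functions. The ID and the data folder are the example values from above; the two-character split is the default behavior described earlier, reproduced by hand rather than by calling into org-attach.

```elisp
;; Rebuild the default attachment path from a UUID by hand.
(let* ((id "abf4b2cf-4b38-45ec-9333-346b42861d24")
       (attach-root "/home/user/orgfiles/data/")
       ;; First two characters become the parent folder, the rest the child.
       (relative (format "%s/%s" (substring id 0 2) (substring id 2))))
  (concat attach-root relative))
;; => "/home/user/orgfiles/data/ab/f4b2cf-4b38-45ec-9333-346b42861d24"
```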

With this default org-attach behavior in place, no wonder you’d prefer to use other methods to store your files. That’s too bad because you’re missing out on org-mode’s excellent ability to organize your projects with their files attached right to them.

This is a huge miss. Think of all the data you could organize this way if you could only make these folders make sense. Well… you guessed it: you can. It just needs a bit of tweaking.

From Org-IDs to Timestamp Slugs

By now, you can probably see where I’m going with this (and if not, you’ve guessed from the sub-title): timestamp-based org-mode header IDs.

The idea is simple: change how these IDs are created in org-mode by modifying org-id-method. By default, its value is uuid, which produces the UUIDs explained previously. We can change it to ts²: (setq org-id-method 'ts). That’s it. The next time we create an ID using org-id-get-create, it will produce something like 20220315T083403.413614. Still a bit confusing to read, but much better than UUIDs! It’s the year, month, and day, followed by T for time, and then the current time down to a fraction of a second. I believe it’s broken down to these tiny time fragments to ensure a unique ID.
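For a feel of where that string comes from, here is a minimal sketch that builds a timestamp of the same shape with plain format-time-string. The format string is my assumption about what the ts method does under the hood, inferred from the example ID above; check org-id.el in your Org version for the real thing.

```elisp
;; Build a timestamp shaped like the `ts' IDs shown above.
;; %Y%m%d = year, month, day; T = literal separator;
;; %H%M%S = hours, minutes, seconds; %6N = microseconds.
(format-time-string "%Y%m%dT%H%M%S.%6N")
```

Evaluate it in the scratch buffer a few times and you will see the last digits change on every call, which is what keeps the IDs unique.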

This will set up our unique IDs as timestamps³, but we still need to configure org-attach to use them. Because org-mode is set up to use UUIDs by default, org-attach is set to create directories that are meant to work with UUIDs. The functions that create directories for org-attach are listed in a variable, org-attach-id-to-path-function-list. Specifically, it points to two functions: org-attach-id-uuid-folder-format and org-attach-id-ts-folder-format. You can go into org-attach.el and see that they break down the folder structure in a pretty straightforward way: the UUID function (which is the one used by default) takes the first two characters of the UUID and makes a parent folder out of those (as seen above), while the ts function takes the first six. The first six characters make sense because they include the year and the month.
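You can check what each of the two functions does without attaching anything, by calling them directly on an ID. This sketch assumes an Org version recent enough to ship both functions in org-attach.el; the ID is the example timestamp from above.

```elisp
(require 'org-attach)

;; The UUID function splits after the first two characters...
(org-attach-id-uuid-folder-format "20220315T083403.413614")
;; => "20/220315T083403.413614"

;; ...while the ts function splits after the first six (year and month).
(org-attach-id-ts-folder-format "20220315T083403.413614")
;; => "202203/15T083403.413614"
```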

By default, if we use the above example of 20220315T083403.413614 as a timestamp, we will get the following directory structure: /home/user/orgfiles/data/20/220315T083403.413614. Not very useful: you will need to keep using org-mode until the year 2100 for a new sub-folder to be created! This is exactly what happened to me, and it required some head-scratching and diving into org-attach to figure out. I tried to mess around with the functions in org-attach directly, but that didn’t go well until someone on IRC pointed me at what I missed: what needs to be changed is org-attach-id-to-path-function-list. It is as simple as changing the order of the functions on this list, so org-attach will know to use the ts function first.

Together with org-id-method, which we defined above, we can write the whole thing like so:

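;; Generate timestamp-based IDs instead of random UUIDs.
(setq org-id-method 'ts)
;; The first function in this list decides where new attachment folders go.
(setq org-attach-id-to-path-function-list
      '(org-attach-id-ts-folder-format
        org-attach-id-uuid-folder-format))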

Now when you use org-attach, it will use the ts function and create the following directory (to use the example above): /home/user/orgfiles/data/202203/15T083403.413614. This makes much more sense. You could also build your own function that would look like org-attach-id-ts-folder-format, but perhaps using the first 4 characters, to create a parent directory for the year only. You will just need to make sure your custom function shows up first in org-attach-id-to-path-function-list.
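Here is a rough sketch of what such a year-only function could look like. The name my/org-attach-id-year-folder-format is made up for this example, and the function simply mirrors the split-and-join approach of the built-in ones with a different cut-off point. Note that it assumes every ID is at least four characters long; substring will signal an error on anything shorter.

```elisp
(require 'org-attach)

(defun my/org-attach-id-year-folder-format (id)
  "Translate a timestamp ID into a YEAR/REST folder path.
Hypothetical helper: the first four characters of ID become the parent folder."
  (format "%s/%s" (substring id 0 4) (substring id 4)))

;; Put the custom function first so it is used for new attachment folders;
;; the built-in ones stay on the list so existing folders are still found.
(setq org-attach-id-to-path-function-list
      '(my/org-attach-id-year-folder-format
        org-attach-id-ts-folder-format
        org-attach-id-uuid-folder-format))

(my/org-attach-id-year-folder-format "20220315T083403.413614")
;; => "2022/0315T083403.413614"
```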

A few Extra Things

I mentioned before that I use an excellent package called org-super-links. In a nutshell, this package automates creating org-IDs, linking them to an org header, and creating a backlink from that header to the one linking to it. If you haven’t yet, you should read Karl’s post about it and how he uses it to get a better idea than I’m giving here. As a matter of fact, if you want to get some more background, read the previous post mentioned above and you’ll see I’ve been trying to change org-IDs into timestamp slugs for a while. So much so, in fact, that I wrote my own function to do that for me until I discovered org has a system built in already. As it goes with Emacs though, there’s no wrong answer. The previous attempt took me deeper into elisp, which was a fun learning experience in itself.

With org-super-links, the process above is quicker, since I don’t need to bother with org-id-get-create. I just search for the header I want to link to, and everything’s created automatically: an ID for the header I’m on, an ID for the header I’m linking to, a link to that ID at my cursor, and a backlink at the header I am linking to, pointing back to the header I’m on. You’ll probably need to read that last sentence again. The bottom line is that if you’re serious about header IDs, you should probably take a look at that wonderful package.

Another post that’s been floating around since 2016 by Matthew Lee Hinman approaches org-mode IDs from a different angle, that of publishing to HTML. In the post, Hinman explains that if you use IDs in your org-file, you will also benefit when you export it to an HTML file: the header links you will use will link where you need them to go, and that’s even after you moved headers around.

I’m actually using HTML more and more at work when I want to export my org-mode files into KB articles that go into wikis. This is especially helpful when you include a table of contents: the headers the TOC connects to will not break if you use IDs. This is maybe a bit more of a niche use, but having a TOC in a how-to article makes a lot of sense, and org-mode creates one for you automatically because it’s amazing like that. I don’t understand why anyone would want to use something like Word and transfer these docx files over to a wiki by copy-pasting and seeing all the spaces and broken bullet lists and… ugh. OK OK, Emacs is not exactly something your coworkers all know about. But with something like what I just described above, doesn’t it make you feel sorry for them sometimes?

Footnotes

¹ As a matter of fact, I think breaking links to headers and losing information is one of the reasons for org-roam’s popularity. Of course, the new versions come with a lot more than just linking notes across a huge database, but at its core, this is why people started adopting it.
² The Emacs documentation specifies a third method, org: “Org’s own internal method, using an encoding of the current time to microsecond accuracy, and optionally the current domain of the computer. See the variable ‘org-id-include-domain’.” This generates what seems to be a random string of text that is also not human-friendly. I’m not sure about the computer’s domain part, but this might be interesting for folks who have several computers on a domain using Emacs.
³ Giving headers unique IDs as timestamps is also useful because it leaves “clues” to help locate lost information later. For example, when you create a project with this unique ID and later forget what it was, you can use the date to clue you in. It gives you another layer of search on your agenda (“202203” for example) to show all the projects created in March, provided you create an ID for each project, which you totally should. Because this is a simple text string, you can also use it outside of Emacs with other scripts to automate tasks that look for this ID. It opens a world of options now that you have a range of unique IDs that you understand in your head, AKA slugs. Also, it just looks better.