Why I built a static site generator

23rd November 2024

“Don’t re-invent the wheel” has its place. The problem is when you over-generalise the sentiment - you end up ignoring the problems simply because solving them is, itself, a bigger problem.

On Thursday, I finally gave up on the system I had used to generate this site previously. I was using a Python script I had written to take my human-readable MarkDown files and batch convert them to machine-readable HTML for posting on the web. This is, strictly speaking, a static site generator. It generates a series of HTML web pages which your browser can process, and which don’t require a server at all (apart from to actually transfer the files across the web). However, things like the list of latest blogs on the home page had to be written by hand, which quite often resulted in mistakes and broken links on the site. There was a lot to remember to do before I could publish a blog. My process looked like this:

Write a blog you’re happy with
Add in things like the nav bar and title using MarkDown
Use the Python script to convert the files over to HTML
Edit the home page to have a link to the new blog
Edit the category page to have a link to the blog
Manually add a new item to the RSS feed so that people who subscribe get a ping
Push to GitHub for the automatic build and deployment process to be handled
After a minute or two, the update is visible

There were so, so many problems with this system. Foremost, there were so many manually edited files where things kept going wrong. I have found broken links on the site which have sat there undetected for months, which is the sort of thing that keeps me up at night! Secondly, I am a very big fan of semantic separation. When I write a blog in MarkDown, I shouldn’t have to handle things like the navbar etc: that should be handled during the compilation process. Also, I wasn’t happy with the organisation system. My old site had 4 main categories of blog:

Walks and outdoors
Thoughts and ideas
Projects
Art and music

When I was picking those categories, I tried to find 4 which reflected me, and the idea always was that I could add more if the need ever arose. But I didn’t want to add a whole new category for just one blog - that wouldn’t feel right. I ended up stuffing most things into “thoughts and ideas” as a catch-all category, which didn’t feel quite right either. I had seen a lot of sites on the indie web using tags as an architecture for organising posts.

In a tags architecture, posts exist as an independent, disorganised entity. There is a folder which contains every single post, regardless of date, time, topic, length etc. Each post has thematic tags attached to it, and during the build process, the compiler reads these tags, and builds a new page for each one. Every post tagged with the specific tag (say for example posts tagged as “walks”) will be linked from this page. That means that anyone who wants to read about all of my walks needs to go onto the tags page, find the “walks” tag, and click through all the blogs on there.
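To make that build step concrete, here is a minimal sketch in Python of how a tag index could be assembled. It is not the code behind this site: the post records, their keys (`title`, `url`, `tags`) and the function names are all made up for illustration, and a real build would read the tags from each post's header rather than from a hard-coded list.

```python
from collections import defaultdict

# Hypothetical post records; a real build would read these from the header
# of each MarkDown file in the posts folder.
POSTS = [
    {"title": "A walk up the hill", "url": "/posts/hill-walk.html", "tags": ["walks", "ideas"]},
    {"title": "Why I built a static site generator", "url": "/posts/ssg.html", "tags": ["web", "python"]},
    {"title": "Coastal path", "url": "/posts/coastal-path.html", "tags": ["walks"]},
]


def build_tag_index(posts):
    """Map each tag to the list of posts carrying it, in input order."""
    index = defaultdict(list)
    for post in posts:
        for tag in post["tags"]:
            index[tag].append(post)
    return dict(index)


def render_tag_page(tag, posts):
    """Return the MarkDown source for one tag page as a list of links."""
    lines = [f'# Posts tagged "{tag}"', ""]
    lines += [f"- [{p['title']}]({p['url']})" for p in posts]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # One page per tag, printed here instead of being written to disk.
    for tag, tagged in sorted(build_tag_index(POSTS).items()):
        print(render_tag_page(tag, tagged))
```

Note that the first post appears on both the “walks” page and the “ideas” page, because the index is keyed by tag rather than by post.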

Tags have another major advantage over my old hierarchical system too: one post can be given multiple tags. If I had, in the past, written a post which fell both into the categories of “walks” and “ideas”, I’d have had to choose which of those categories to file it under. This is why I kept filing things under “thoughts and ideas”, as I specifically kept it as a rather ambiguous category; what is a blog if not a collection of thoughts and ideas, after all?

The wheel

Re-inventing the wheel, then. If what I wanted was a dynamically generated home page and a tags architecture, then I could have used plenty of pieces of open-source software to do that for me.

One of the points I am very much trying to hammer in through my Honours dissertation is that not all software is suitable in all scenarios. I’m focusing there on digital Integrated Library Systems (ILS) for home use, drawing on my own experience over the summer of trying to find a piece of software which would let me tag my books without having to set up complex systems which would benefit a large lending library but which are, to me, a royal pain.

Indeed, this seems to hold true of static site generators too. What I wanted was something that could:

Batch convert my MarkDown to HTML
Build a dynamic home page etc.
Build a tags architecture
Build the relevant RSS feeds

Everything I found would do that, but overcomplicated the process by including things like HTTP servers, preview environments, templates, databases, etc. It was too much. I understand that there are people in the world who really benefit from being able to shave a few seconds off their site’s build time, but I am not one of them. I am more than happy to hit build and wait for a few minutes whilst my site does its thing.

All I needed really was something that could quickly feed the relevant files into Pandoc for conversion, and then build a few “pseudo-markdown” files with the dynamic contents, and feed those into Pandoc too.
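As a rough illustration of that idea (a sketch under assumptions, not my actual build script), the Python below generates a “pseudo-markdown” home page from a list of posts and builds the Pandoc command for converting a file. The function names and the `title`, `date` and `url` keys are hypothetical. The Pandoc options used are the standard `-f`, `-t`, `-s` and `-o` flags, and `convert_all` assumes `pandoc` is installed and on the PATH.

```python
import subprocess
from pathlib import Path


def render_home_markdown(posts, limit=5):
    """Build the "pseudo-markdown" for a home page listing the newest posts.

    `posts` is a list of dicts with hypothetical keys: title, date (an ISO
    string, so it sorts chronologically as plain text) and url.
    """
    newest = sorted(posts, key=lambda p: p["date"], reverse=True)[:limit]
    lines = ["# Latest blogs", ""]
    lines += [f"- {p['date']}: [{p['title']}]({p['url']})" for p in newest]
    return "\n".join(lines) + "\n"


def pandoc_command(source, target):
    """Return the argument list for converting one MarkDown file to HTML."""
    # -f/-t set the input and output formats, -s asks for a standalone page,
    # -o names the output file.
    return ["pandoc", "-f", "markdown", "-t", "html", "-s",
            "-o", str(target), str(source)]


def convert_all(source_dir, output_dir):
    """Run Pandoc over every .md file in source_dir."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    for source in sorted(Path(source_dir).glob("*.md")):
        target = output_dir / (source.stem + ".html")
        # check=True stops the build if Pandoc reports an error.
        subprocess.run(pandoc_command(source, target), check=True)
```

The generated home page would be written to a `.md` file alongside the posts, so it goes through exactly the same conversion as everything else.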

I started out by defining my design goals for this system:

Write in MarkDown, convert to HTML
MarkDown supports inline HTML
Auto-generate RSS/Atom feed based on YAML headers
/latest redirects to latest blog automatically
POSSE syndication of notes
Pandoc based build process
Tag-based organisation system

Most of these are implemented; some of them (like /latest and POSSE) I still need to do at some point soon.
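To give a flavour of the “RSS feed based on YAML headers” goal, here is a small Python sketch using only the standard library. It is an illustration rather than the code this site runs on: the header keys (`title`, `date`, `slug`), the function names and the example.com URL are all assumptions, and the header parser only understands flat `key: value` lines, so anything more elaborate would need a proper YAML library.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def parse_header(text):
    """Read flat `key: value` pairs from a leading '---' delimited header.

    Only the simplest YAML subset is handled; lists and nested values
    would need a real YAML parser.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip().strip('"')
    return meta


def build_rss(site_title, site_url, posts):
    """Return an RSS 2.0 document as a string, one <item> per post."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = site_title
    ET.SubElement(channel, "link").text = site_url
    ET.SubElement(channel, "description").text = f"Latest posts from {site_title}"
    for meta in posts:
        item = ET.SubElement(channel, "item")
        link = site_url + meta["slug"]
        ET.SubElement(item, "title").text = meta["title"]
        ET.SubElement(item, "link").text = link
        ET.SubElement(item, "guid").text = link
        # RSS wants RFC 822 style dates; midnight UTC is assumed here.
        published = datetime.fromisoformat(meta["date"]).replace(tzinfo=timezone.utc)
        ET.SubElement(item, "pubDate").text = format_datetime(published)
    return ET.tostring(rss, encoding="unicode")
```

With something like this in the build, publishing a post no longer involves touching the feed by hand: the header at the top of the MarkDown file is the only place the title and date are written down.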

I drew this little data-flow chart explaining my build chain, and then went and wrote that in Python and a bit of Bash.

[Image: a hand-drawn data-flow diagram showing lots of connections between nodes representing the processes needed to build the site automatically]

And really, this is why I built a static site generator. Not because I could do it better (in fact it currently takes 1 minute 30 seconds to build this site - I think I might have got a loop wrong somewhere), but because I could do it better for me. When I was 400 lines of Python deep, silently cursing Pandoc for not doing what I was expecting (my fault, naturally, I’d messed something up; Pandoc is amazing), the available options did look appealing. But I am very glad I persisted, because I was able to sit down and think about what I needed, then build it all in a day.

The problems we work on these days are far more complex than wheels. We have far more moving parts and systems, and sometimes it can take burning them all down and starting right from scratch with the scars and pains which the last system gave you to actually make progress.

Those who know me personally are absolutely sick to death of hearing my catchphrase, “Progression is not always progress”. It means making a change for the sake of making a change is not the way to progress something - it reflects the lack of forethought I see in a lot of decision making. I’d like to propose a spin-off: “Progress is rarely progression”. This phrase means something distinctly different; continuing with the status quo might keep us moving in what feels like the right direction, but that’s no good if what we really need to do is stop and go back to the beginning of the problem.

Tagged as: web programming thoughts indie-web python bash