Going Dark (Building A Shinobi Site)

Mon, 18 Apr 2022

I’m revamping my personal website yet again, having found inspiration on a little site called shinobi.website. It seems to be UI designer Bradley Taunt’s latest project, and he has the following to say:

A shinobi website is a text-based, RSS focused blogging “system”. I put the word system in quotes since it’s really just a simple bash script that converts plain text files into an RSS feed. So, it isn’t an actual blogging platform or website in the traditional sense.

Why the name “shinobi”? Well, a shinobi was a covert agent or mercenary during the time of feudal Japan. Due to their focus on infiltration and assassination, they required a strong focus on stealth and being unseen.

A shinobi website follows the same principles of being secretive and unseen (minus the assassinations and espionage). Only those who choose to include your feed in their respective RSS readers can view your content via the included URL.

I loved this idea, and immediately set about making my own. I no longer had to worry about HTML or CSS. I no longer had to pretend I gave a shit about SEO. Browser support was no longer something I had to deal with when I wasn’t getting paid.

I could just write in plain text, the way I used to on my very first computer (a secondhand IBM that had nothing but DOS and its built-in tools installed), and just upload the files and a feed. If people found my feed and subscribed, great! If not, that was fine too.

The only question I had was whether I should manually wrap my text at 72 columns or leave it unwrapped. However, it occurred to me that hard-wrapping might result in mobile browsers mangling the flow, so I leave my text un-wrapped. If you’re viewing this on a desktop or laptop with a high-resolution display, I suggest you resize the window or increase text zoom.

Limitations of the Original Design

It wasn’t hard to get my shinobi site going with a few posts. Once I got started, however, I got to thinking about how limited Taunt’s approach had turned out to be. His original script for generating feeds assumes you’re putting text files in a single directory and that the first four lines contain metadata in the following format:

  1. Date
  2. Title
  3. (blank)
  4. Summary
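To make that layout concrete, here is a minimal sketch of pulling the metadata out of such a file. It is not Taunt's actual script; the helper name post_meta and the sample post are made up for illustration.

```shell
#!/bin/sh
# Sketch only: read the date, title, and summary from the first four
# lines of a plain-text post laid out as described above.
# post_meta is a hypothetical helper, not part of Taunt's script.
post_meta() {
    file=$1
    date_line=$(sed -n '1p' "$file")
    title_line=$(sed -n '2p' "$file")
    summary_line=$(sed -n '4p' "$file")   # line 3 is the blank line
    printf 'date=%s\ntitle=%s\nsummary=%s\n' \
        "$date_line" "$title_line" "$summary_line"
}

# Demo on a throwaway sample post.
sample=$(mktemp)
printf '%s\n' '2022-04-18' 'Going Dark' '' 'I made my own shinobi site.' 'Body text.' > "$sample"
post_meta "$sample"
rm -f "$sample"
```

The catch is visible right in the sketch: the metadata has to live inside the file itself, which only works when the file is text.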

This works just fine if you only want to make a blog out of text files, but I wasn’t content with this. It occurred to me that I wanted a feed for my fiction as well as an RSS blog. No problem; I could still do that with Taunt’s script.

However, if I wanted to share photos after a trip, I was screwed. Where was I going to get metadata for the feed? Out of the EXIF data? Sure, there’s a tool for that, but what if the EXIF data isn’t there because I was smart about security for once and made sure that my devices automatically wiped all EXIF data?

Here’s another use case for you: suppose I wanted to share a playlist without a YouTube or Spotify account? You can do that with RSS by creating one entry per URL, but where do you get the metadata for each URL so you can generate the feed?

It was plain that Bradley Taunt’s feed generator, which he adapted from the original by Len Falken, wasn’t suited to all of the use cases I had in mind. However, Bradley Taunt isn’t my local Burger King. If I wanted it done my way, I had to do it myself.

I Did it My Way

This was fine; I’m a FULLSTACK THAUMATURGE, after all, and it wasn’t like I was being called upon to build a cathedral on quicksand from blueprints scribbled on bar napkins. I knew what I wanted, and knew how to do it.

You see, James Tomasino wrote a little blog post back in 2020 about a GNU package called recutils.

GNU Recutils is a set of tools and libraries to access human-editable, plain text databases called recfiles. The data is stored as a sequence of records, each record containing an arbitrary number of named fields. The picture below shows a sample database containing information about GNU packages, along with the main features provided by Recutils.

I didn’t do anything with recutils when I first learned about it in 2020 because I didn’t have a use case in mind, but for some reason it stuck in the back of my mind. Maybe it was because of the mascots: two male turtles named Fred and George merrily humping away.

However, once I started making my own shinobi site it occurred to me that maybe I should be storing metadata in recfiles, so that I can run them through templates to generate feeds.

So, that’s what I did. First, I needed to establish a format. Fortunately, the recutils manual explains how to do this if you need more detail than James Tomasino provides in his post.

My Recfile Format

I only have a few fields, and most of them are mandatory. Though the Atom format only requires an <updated> date, I also provide the <published> date. Unfortunately, recutils only sorts in ascending order. This makes generating the feeds a bit complicated.

# -*- Mode: rec -*-

%rec: Posts
%mandatory: Title Url Description Created Updated
%key: Id
%auto: Id
%sort: Id
%type: Created date
%type: Updated date
%type: Sort int
%type: Url line
%type: Description line
%doc:
+ A plain-text database that can be used to generate HTML index,
+ RSS feeds, and XML sitemaps. Used with GNU recutils.

When adding an entry, I only provide the mandatory fields. Here’s the entry for this post:

Title: Going Dark
Url: https://matthewgraybosch.com/posts/going-dark.txt
Description: I found a little site at https://shinobi.website/, and I liked it enough that I want to make my own.
Created: 2022-04-18T16:33:56-05:00
Updated: 2022-04-22T18:50:20-05:00

When I build the feeds using my makefile, it uses recfix to set the Id field. Whenever I add an entry, I add it at the top of the file, just after the format documentation. Since I’m using GNU Emacs to edit my files, I can use M-x flush-lines and have it remove every line that matches the “^Id.*” regexp (don’t include the quotes!).
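If you don’t use Emacs, something like the following should do the same job from the shell. Treat it as a sketch: strip_ids is a made-up name, and it assumes the Id field always starts its own line as “Id:”.

```shell
#!/bin/sh
# Sketch only: print a recfile with every Id field removed, so that
# `recfix --auto` can fill the Ids in again on the next build.
# strip_ids is a hypothetical helper name.
strip_ids() {
    sed '/^Id:/d' "$1"
}

# Demo on a throwaway recfile with two records.
sample=$(mktemp)
printf '%s\n' 'Id: 0' 'Title: Going Dark' '' 'Id: 1' 'Title: Older Post' > "$sample"
strip_ids "$sample"
rm -f "$sample"
```

Matching on “^Id:” rather than “^Id” is a little stricter; it won’t eat a field that merely begins with those two letters.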

Setting the Created and Updated dates is a bit of a pain, but using GNU Emacs makes it easier. Invoking C-u M-! (CTRL+u and then ALT+!) will give me a shell prompt that will insert the output wherever the cursor happens to be. I then run the following command to get a timestamp:

$ date +"%Y-%m-%dT%H:%M:%S-05:00"

The date command should be available on any UNIX-style system whether it’s GNU/Linux, BSD, macOS, Solaris, AIX, etc.[1] You might even find it on Windows if you’ve installed Git for Windows and had the installer include all of the extras (like bash, the Bourne Again shell). The format string is defined by the standard C library strftime(3). If you want details, type man strftime at a shell prompt.
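If hard-coding the “-05:00” offset bothers you, one alternative is to record times in UTC instead. This is only a sketch: now_utc is a made-up name, and I’m assuming the “Z” suffix is accepted wherever the timestamp ends up (Atom allows it; check recfix before relying on it for date fields).

```shell
#!/bin/sh
# Sketch only: print the current time in UTC as an RFC 3339 timestamp,
# for example 2022-04-18T21:33:56Z. The trailing Z means "UTC", so no
# offset has to be typed by hand. now_utc is a hypothetical helper name.
now_utc() {
    date -u +"%Y-%m-%dT%H:%M:%SZ"
}

now_utc
```

The -u flag and these format letters are plain POSIX, so this should behave the same with GNU and BSD date.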

This seems pretty simple to me, but I’m used to it. Once the recfile is ready, we’re ready to process it using a shell script that calls recsel and recfmt.

My Atom Feed Generator

Unlike Bradley Taunt and Len Falken, I prefer to use the more modern Atom format instead of RSS. Atom has a formal IETF specification (RFC 4287) and an online validator. Most feed readers will handle both RSS and Atom, so it’s really just a matter of personal preference.

#!/usr/bin/env sh

# © 2022 Matthew Graybosch <contact@matthewgraybosch.com>
# This is anti-capitalist software. See LICENSE for details.
# If the file isn't present, please visit https://anticapitalist.software/

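# Site domain and feed name; this script builds the posts feed.
DOMAIN="matthewgraybosch.com"
FILE="posts"
DATE=$(date +"%Y-%m-%dT%H:%M:%S-05:00")
YEAR=$(date +"%Y")
AUTHOR="Matthew Graybosch"
TITLE="Fullstack Thaumaturge"
SUBTITLE="a personal blog by ${AUTHOR}, accessible only with a feed reader"
RIGHTS="© ${YEAR} ${AUTHOR}, all rights reserved"

# Feed header. Atom requires <id>, <title>, and <updated> at the feed level.
echo "<?xml version=\"1.0\" encoding=\"utf-8\"?>
<feed xmlns=\"http://www.w3.org/2005/Atom\">
    <id>https://${DOMAIN}/feeds/${FILE}.xml</id>
    <title>${TITLE}</title>
    <subtitle>${SUBTITLE}</subtitle>
    <updated>${DATE}</updated>
    <author><name>${AUTHOR}</name></author>
    <rights>${RIGHTS}</rights>
    <link href=\"https://${DOMAIN}/feeds/${FILE}.xml\" rel=\"self\"/>
    <generator>POSIX shell and GNU recutils</generator>"

# One <entry> per record, formatted by the template.
recsel "${FILE}.rec" | recfmt -f atom.templ

echo "</feed>"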

This script will dump the feed into standard output, so you need to redirect it to a file. I handle that in my makefile.
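Since recutils only sorts in ascending order, another way to get newest-first output would be to flip the records on their way from recsel to recfmt. The sketch below is an alternative I’m assuming would work, not what the script above does; reverse_records is a made-up name, and it relies on records being separated by blank lines.

```shell
#!/bin/sh
# Sketch only: read blank-line-separated records on standard input and
# print them in reverse order, so the last record comes out first.
# reverse_records is a hypothetical helper name.
reverse_records() {
    awk 'BEGIN { RS = "" }
         { rec[NR] = $0 }
         END {
             for (i = NR; i >= 1; i--) {
                 print rec[i]
                 if (i > 1) print ""
             }
         }'
}

# Demo: two records in, last record first on the way out.
printf '%s\n' 'Id: 0' 'Title: Older Post' '' 'Id: 1' 'Title: Going Dark' | reverse_records
```

In a pipeline it would sit in the middle, along the lines of recsel posts.rec | reverse_records | recfmt -f atom.templ.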

Hey, What’s atom.templ?

It’s a reference to the Temple of the Atom. No, that’s just me being silly. Here’s the deal: if you pipe the output of recsel into recfmt, you can specify a template for the data you’ve pulled out of the recfile. In this case, the template looks like this:

<entry>
    <title>{{Title}}</title>
    <link href="{{Url}}" />
    <id>{{Url}}</id>
    <updated>{{Updated}}</updated>
    <published>{{Created}}</published>
    <summary>
        {{Description}}
    </summary>
</entry>

Of course, the recutils manual uses mail merge as an example, but recfmt works just as well for generating Atom feed entries. Hell, you could probably use it to generate JSON if you’re feeling masochistic but don’t have a taste for From Software’s action-RPGs.

Putting It All Together with GNU make

Some people (like Len Falken) might argue that my use of GNU make[2] to build my feeds adds needless complexity and that I could get the same result by writing another shell script. They aren’t wrong, but since I got the hang of writing my own makefiles I’ve grown to appreciate make. It’s been part of UNIX for decades, and it does its job well. It lets me not only build my feeds but deploy them too.

I even have a build target for local testing using Python’s built-in HTTP server, and I use the classic UNIX “make install” idiom to push my site to my host using rsync, since I host on NearlyFreeSpeech.net instead of Netlify or AWS.

# Running plain "make" builds every feed.
.DEFAULT_GOAL := build
include ./sshvars

.PHONY: build
build: posts.xml fiction.xml bookmarks.xml playlist.xml

posts.xml: posts.rec atom.templ
	recfix --auto posts.rec
	recfix --sort posts.rec
	./posts.sh | tidy -q -i -w -utf8 -xml > feeds/posts.xml
	./posts-index.sh > posts/index.txt

fiction.xml: fiction.rec atom.templ
	recfix --auto fiction.rec
	recfix --sort fiction.rec
	./fiction.sh | tidy -q -i -w -utf8 -xml > feeds/fiction.xml
	./fiction-index.sh > fiction/index.txt

bookmarks.xml: bookmarks.rec atom.templ
	recfix --auto bookmarks.rec
	recfix --sort bookmarks.rec
	./bookmarks.sh | tidy -q -i -w -utf8 -xml > feeds/bookmarks.xml

playlist.xml: playlist.rec playlist.templ
	recfix --auto playlist.rec
	recfix --sort playlist.rec
	./playlist.sh | sed -e 's/\ \&\ /\ \&amp;\ /g' | tidy -q -i -w -utf8 -xml > feeds/playlist.xml

# Preview the site locally at http://localhost:8000/
.PHONY: serve
serve: clean build
	python3 -m http.server --directory .

# Push the site to the host over ssh; connection details come from ./sshvars.
.PHONY: install
install: clean build
	rsync --rsh="ssh ${SSH_OPTS}" \
	      --delete-delay \
	      --exclude-from='./rsync-exclude.txt' \
	      -acvz . ${SSH_USER}@${SSH_HOST}:${SSH_PATH}

# -f keeps clean from failing when there is nothing to remove.
.PHONY: clean
clean:
	rm -f feeds/*.xml posts/index.txt fiction/index.txt

Each feed target – “posts.xml” and “fiction.xml” as of 2022-04-22 – has a couple of dependency checks. If their respective recfiles and atom.templ aren’t present, make will abort with an error. If the files are present, we call recfix to set up auto-incremented Ids and sort the files before calling our feed generator scripts. Each feed generator pipes its output to tidy to pretty-print the resulting XML before writing to files.
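The playlist target also runs its output through sed to turn a bare “ & ” into “ &amp; ” before tidy sees it. A more general version of that idea, sketched here with a made-up helper called xml_escape, would escape all three characters that are special in XML text.

```shell
#!/bin/sh
# Sketch only: escape the characters that are special in XML text.
# The ampersand must be handled first, otherwise the entities produced
# by the later substitutions would be escaped a second time.
# xml_escape is a hypothetical helper name.
xml_escape() {
    sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

printf '%s\n' 'Simon & Garfunkel <live>' | xml_escape
```

Two caveats: it would have to be applied to field values before they go through the template, not to the finished feed (where it would mangle the tags), and it will double-escape text that already contains entities.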

Each feed target is a dependency for the “build” target. The “build” target is itself a dependency for the “serve” and “install” targets. If I simply want to build with my current files, I need only run “make” in my terminal. If I want to build and upload, “make install” will do the job.

GNU make also provides an “include” directive which lets me store variables in a separate file. That way I can commit the makefile to a git repository while putting the “sshvars” file in .gitignore for safety.

Serving index.txt With .htaccess

If you’re running a shinobi site on a traditional web host like Dreamhost or NearlyFreeSpeech, you should bear in mind that without an “index.html” file in your site’s root directory anybody visiting your site will get a directory listing by default if they’re lucky and an error message if they aren’t.

Fortunately, we can fix that with a file called “.htaccess”[3], which allows us to give some directives to the HTTP daemon serving up our files.

DirectoryIndex index.txt

The directive we want is “DirectoryIndex”, but you can use .htaccess to specify custom error pages, set up caching, configure redirects, and even block traffic referred from certain sites. If you’re curious, I’d suggest starting with Apache’s documentation.

Make Your Own

I think I’ve provided all of the information you need to build your own shinobi site, but please email me if you have any questions. And don’t forget that Bradley Taunt’s version is still fine if you want to keep it simple. Have fun!

  1. Just bear in mind that GNU date (gdate when installed on BSD systems) works differently than POSIX date. Be sure to review the date(1) manual page.

  2. My makefiles might work with BSD make, too, but don’t take my word for it. For example, OpenBSD’s make doesn’t support .PHONY. To be safe, make sure to invoke gmake when running on BSD or adjust your $PATH variable so that locally installed GNU make shows up before the system make command.

  3. .htaccess is a mechanism provided by the Apache HTTP server for situations where it isn’t safe to give website operators access to the main configuration, such as shared hosting. It isn’t implemented in nginx.