Over the last week I have been moving my blog over to Octopress, a lightweight blogging framework for Jekyll, the static site generator powering GitHub Pages. I had previously been posting to a Tumblr page and, over the nearly four years that I had been doing that, I had somehow racked up just over 4000 posts. I was not looking forward to migrating across.
However, the fact that the Jekyll project has a number of scripts for migrating from other platforms assuaged my concerns about the difficulty of this task. That sense of relief was short-lived. Neither of the two Tumblr migration scripts was of any assistance: both would die during their initial runs, probably due to some funky characters in the post titles, or perhaps the posts themselves.
I certainly had no intention of trying to wade through the entire back catalogue identifying the rogue posts. Rather than admit defeat, and probably more due to a sense of misguided optimism about the “straightforward” nature of the task, I saw this setback as an opportunity to cull all of the cruft[1] from the blog and decided to manually import the fifty posts that I thought were of some interest.
Being an assiduous record keeper, I had helpfully bookmarked all of the posts on Pinboard under one tag, so it was simple enough to create a list of the required URLs. Armed with this list, it was just a matter of cobbling together a script to do the bulk of the work for me.
The first task was to retrieve the posts from the list:
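The original command has not survived in this copy of the post, so what follows is only a sketch of how such a retrieval loop might look. It assumes the URLs sit one per line in a file (called `urls.txt` here, a made-up name), that `curl` is installed, and that the last path segment of each permalink can serve as a file name. The function names are hypothetical too.

```shell
#!/bin/sh
# Sketch only: file names and layout are assumptions, not the original script.

# Derive a local file name from a post URL, e.g.
#   http://example.tumblr.com/post/12345/my-post -> my-post.html
slug_for_url() {
    # Strip any trailing slash, then keep the last path segment.
    url=${1%/}
    printf '%s.html\n' "${url##*/}"
}

# Download every URL listed (one per line) in the file given as $1
# into the directory given as $2.
fetch_posts() {
    list=$1
    dest=$2
    mkdir -p "$dest"
    while IFS= read -r url; do
        # Skip blank lines in the list.
        [ -n "$url" ] || continue
        # -f: fail on HTTP errors, -sS: quiet but report errors, -L: follow redirects.
        curl -fsSL "$url" -o "$dest/$(slug_for_url "$url")" ||
            printf 'failed: %s\n' "$url" >&2
    done < "$list"
}

# Usage (hypothetical file and directory names):
#   fetch_posts urls.txt raw
```

Reporting failures instead of aborting means one bad URL does not stop the other downloads.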
Then I needed to remove all of the HTML surrounding the actual posts: an awk one-liner took care of that.
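The one-liner itself is missing here. As a hedged illustration, an awk range pattern can print only the lines between two markers. The markers below (`<div class="post">` and a `<!-- /post -->` comment) are assumptions, since the real ones depend on the Tumblr theme in use, and the Pandoc call is likewise a guess at how the HTML became Markdown.

```shell
#!/bin/sh
# Sketch only: the start/end markers are hypothetical and theme-dependent.

# Print the lines from the opening post marker through the closing one,
# discarding the page header, sidebar and footer around them.
extract_post() {
    awk '/<div class="post">/,/<!-- \/post -->/' "$1"
}

# Convert the extracted fragment to Markdown with Pandoc.
# Usage (hypothetical file names):
#   html_to_markdown raw/my-post.html > my-post.markdown
html_to_markdown() {
    extract_post "$1" | pandoc -f html -t markdown
}
```

A range pattern only works if both markers appear on lines of their own, which is worth checking against one downloaded page before running it over all fifty.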
The final result was fifty markdown files holding all of my posts, almost ready to be committed to GitHub. I say “almost” because the files still required what turned out to be a reasonable amount of cleaning up. Pandoc did a great job, for example, but would inexplicably break multi-word hyperlinks over two lines. Similarly, all of the internal links to my other posts pointed to the (meaningless) Tumblr URLs[2].
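Both clean-up chores lend themselves to scripting. The sketch below is an assumption about how that could be done, not a record of what was actually run. Current Pandoc releases accept `--wrap=none`, which stops Pandoc from hard-wrapping lines (and therefore link text) at all, and a `sed` substitution can repoint internal links. The host name `example.tumblr.com` and the `/blog/` target path are placeholders; a real mapping would have to match whatever permalink scheme the new site uses.

```shell
#!/bin/sh
# Sketch only: host name and target path are placeholders.

# Convert HTML on stdin to Markdown without hard line wrapping, so that
# multi-word link text stays on one line.
to_markdown_unwrapped() {
    pandoc -f html -t markdown --wrap=none
}

# Rewrite links such as
#   http://example.tumblr.com/post/12345/my-post
# to site-relative paths such as
#   /blog/my-post/
# Reads Markdown on stdin and writes the result to stdout.
fix_internal_links() {
    sed -E 's#https?://example\.tumblr\.com/post/[0-9]+/([A-Za-z0-9-]+)/?#/blog/\1/#g'
}

# Usage (hypothetical file names):
#   fix_internal_links < my-post.markdown > my-post.fixed.markdown
```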
While the migration was not entirely pain-free, I am pleased that I have done it. Tumblr’s service increasingly left a lot to be desired but, as it was free, I couldn’t complain too much. Or, more accurately, when I did complain, no-one actually listened…
Indeed, moving to a paid service like GitHub (yes, it’s free at first, but once you have enough data there you need to pay a small amount every month) makes a lot of sense. The paid services I do use, like Pinboard and Tarsnap, are both inexpensive and much more reliable than their free counterparts[3]; and you get to invest in great software that is a pleasure to use.
1. Initially, I had set up the site as a simple holding page and dumped a whole lot of feeds into it: Twitter, bookmarks, scrobbled music, etc. Those 4000 posts were mostly just that sort of internet detritus…
2. For creating redirections (GitHub Pages does not support .htaccess) I can’t recommend enough the Jekyll Alias Generator. Just. Brilliant.
3. And much more scrupulous about how they use your personal data.