jasonwryan.com

Miscellaneous ephemera…

Pruning Tarsnap Archives

I started using Tarsnap about three years ago and I have been nothing but impressed with it since. It is simple to use, extremely cost effective and, more than once, it has saved me from myself; making it easy to retrieve copies of files that I have inadvertently overwritten or otherwise done stupid things with1. When I first posted about it, I included a simple wrapper script, which has held up pretty well over that time.

What became apparent over the last couple of months, as I began to consciously make more regular backups, was that pruning the archives was a relatively tedious business. Given that Tarsnap de-duplicates data, there isn’t much mileage in keeping around older archives because, if you do have to retrieve a file, you don’t want to have to search through a large number of archives to find it; so there is a balance between making use of Tarsnap’s efficient functionality, and not creating a rod for your back if your use case is occasionally retrieving single—or small groups of—files, rather than large dumps.

I have settled on keeping five to seven archives, depending on the frequency of my backups, which is somewhere around two to three times a week. Pruning these archives was becoming tedious, so I wrote a simple script to make it less onerous. Essentially, it writes a list of all the archives to a tmpfile, runs sort(1) to order them from oldest to newest, and then deletes the oldest minus whatever the number to keep is set to.

The bulk of the code is simple enough:

snapclean
# generate list
tarsnap --list-archives > "$tmpfile"

# sort by descending date, format is: host-ddmmyy_hh:mm
{
  rm "$tmpfile" && sort -k 1.11,1.10 -k 1.8,1.9 -k 1.7,1.6 > "$tmpfile"
} < "$tmpfile"

# populate the list
mapfile -t archives < "$tmpfile"

# print the full list
printf "%s\n%s\n" "${cyn}Current archives${end}:" "${archives[@]#*-}"

# identify oldest archives
remove=$(( ${#archives[@]} - keep ))
targets=( $(head -n "$remove" "$tmpfile") )

# if there is at least one to remove
if (( ${#targets[@]} >= 1 )); then
  printf "%s\n" "${red}Archives to delete${end}:"
  printf "%s\n" "${targets[@]#*-}"

  read -p "Proceed with deletion? [${red}Y${end}/N] " YN

  if [[ $YN == Y ]]; then
    for archive in "${targets[@]}"; do
      tarsnap -d --no-print-stats -f "$archive"
    done && printf "%s\n" "${yel}Archives successfully deleted...${end}"

    printf "\n%s\n" "${cyn}Remaining archives:${end}"
    tarsnap --list-archives
  else
    printf "%s\n" "${yel}Operation aborted${end}"
  fi
else
  printf "%s\n" "Nothing to do"
  exit 0
fi

You can see the rest of the script in my bitbucket repo. It even comes with colour.

Once every couple of weeks, I run the script, review the archives marked for deletion and then blow them away. Easy. If you aren’t using Tarsnap, you really should check it out; it is an excellent service and—for the almost ridiculously small investment—you get rock solid, encrypted peace of mind. Why would you not do that?

Coda

This is the one hundredth post on this blog: a milestone that I never envisaged getting anywhere near. Looking back through the posts, nearly 60,000 words worth, there are a couple there that continue to draw traffic and are obviously seen at some level as helpful. There are also quite a few that qualify as “filler”, but blogging is a discipline like any other and sometimes you just have to push something up to keep the rhythm going. In any event, this is a roundabout way of saying that, for a variety of reasons both personal and professional, I am no longer able to fulfil my own expectations of regularly pushing these posts out.

I will endeavour to, from time to time when I find something that I genuinely think is worth sharing, make an effort to write about it, but I can’t see that happening all that often. I’d like to thank all the people that have read these posts; especially those of you that have commented. With every post, I always looked forward to people telling me where I got something wrong or how I could have approached a problem differently or more effectively2; I learned a lot from these pointers and I am grateful to the people that were generous enough to share them.

Notes

  1. The frequency with which this happens is, admittedly, low; but not low enough to confidently abandon a service like this…
  2. Leaving a complimentary note is just as welcome, don’t get me wrong…

Multi-arch Packages in AUR

One of the easiest ways to contribute to Arch is to maintain a package, or packages, in the AUR; the repository of user contributed PKGBUILDs that extends the range of packages available for Arch by some magnitude. Given that PKGBUILDs are just shell scripts, the barrier to entry is relatively low, and investing the small amount of effort required to clear that barrier will not only give you a much better understanding of how packaging works in Arch, but will scratch your own itch for a particular package and hopefully assuage someone else’s similar desire at the same time.

Now that I have a Raspberry Pi1, I am naturally much more interested in packages that can be built for the ARMv6 architecture; especially those that are available in the AUR. It is worth a brief digression to note that Arch Linux ARM is an entirely separate distribution and, while they share features with Arch, support for each is restricted to their respective communities. It is with this consideration in mind that I had begun to think about multi-arch support in PKGBUILDs, particularly in the packages that I maintain in the AUR.

I have previously posted about using Syncthing across my network, including on a Pi as one of the nodes. As the Syncthing developer pushes out a release at least weekly, I have been maintaining my own PKGBUILD and, after Syncthing was pulled into [community], I uploaded it to the AUR as syncthing-bin.

Syncthing is a cross platform application so it runs on a wide range of architectures, including ARM (both v6 and v7). Initially, when I wrote the PKGBUILD, I would updpkgsums on my x86_64 machine, build the package and then, on the Pi, have to regenerate the integrity checks. This was manageable enough for my own use across two architectures, but wasn’t really going to work for people using other architectures (especially if they are using AUR helpers).

Naturally enough, this started me thinking about how I could more effectively manage the process of updating the PKGBUILD for each new release, and have it work across the four architectures—without having to manually copy and paste or anything similarly tedious. Managing multiple architectures in the PKGBUILD itself is not particularly problematic, a case statement is sufficient:

PKGBUILD
case "$CARCH" in
    armv6h) _pkgarch="armv6"
            sha1sums+=('a94e5d00cec32956eb27bc12dbbc4964b68913f9')
           ;;
    armv7h) _pkgarch="armv7"
            sha1sums+=('9b782abf95668a906bfe76ad5ceb4cda17ec2289')
           ;;
    i686) _pkgarch="386"
          sha1sums+=('b2e1961594a931201799246f5cf61cb1e1700ff9')
           ;;
    x86_64) _pkgarch="amd64"
            sha1sums+=('035730c09ca5383c90fdd9898baf66b90acdef24')
           ;;
esac

The real challenge, for me, was to be able to script the replacement of each of the respective sha1sums, and then to update the PKGBUILD with the new arrays. Each release of Syncthing is accompanied by a text file containing all of the sha1sums, each on its own line in a conveniently ordered format, like so:

sha1sums.txt.asc
b2e1961594a931201799246f5cf61cb1e1700ff9    syncthing-linux-386-v0.9.16.tar.gz
035730c09ca5383c90fdd9898baf66b90acdef24    syncthing-linux-amd64-v0.9.16.tar.gz
d743b64204f0ac7884e4b42d9b1865b2436f5ecb    syncthing-linux-armv5-v0.9.16.tar.gz

This seemed a perfect job for Awk, or more particularly, gawk’s switch statement, and an admittedly rather convoluted printf incantation.

    switch ($2) {
      case /armv6/:
        arm6 = $1
        break
      case /armv7/:
        arm7 = $1
        break
      case /linux-386/:
        i386 = $1
        break
      case /linux-amd64/:
        x86 = $1
        break
      }
  }
END {
  printf "case \"$CARCH\" in\n\t"\
         "armv6h) _pkgarch=\"armv6\"\n\t\tsha1sums+=(\047%s\047)\n\t\t;;\n\t"\
         "armv7h) _pkgarch=\"armv7\"\n\t\tsha1sums+=(\047%s\047)\n\t\t;;\n\t"\
         "i686) _pkgarch=\"386\"\n\t\tsha1sums+=(\047%s\047)\n\t\t;;\n\t"\
         "x86_64) _pkgarch=\"amd64\"\n\t\tsha1sums+=(\047%s\047)\n\t\t;;\n"\
         "esac\n",
         arm6, arm7, i386, x86
}

The remaining step was to update the PKGBUILD with the new sha1sums. Fortunately, Dave Reisner had already written the code for this in his updpkgsums utility; I had only to adapt it slightly:

excerpt from updpkgsums
{
  rm "$buildfile"
  exec awk -v newsums="$newsums" '
    /^case/,/^esac$/ {
      if (!w) { print newsums; w++ }
        next
      }; 1
      END { if (!w) print newsums }
  ' > "$buildfile"
} < "$buildfile"

Combining these two tasks means that I have a script that, when run, will download the current Syncthing release’s sha1sum.txt.asc file, extract the relevant sums into the replacement case statement and then write it into the PKGBUILD. I can then run makepkg -ci && mkaurball, upload the new tarball to the AUR and the two other people that are using the PKGBUILD can download it and not have to generate new sums before installing their shiny, new version of Syncthing. You can see the full version of the script in my bitbucket repo.

Notes

  1. See my other posts about the Pi

Creative Commons image of the Mosque at Agra, by yours truly.

Simple Reminders

Due to a rather embarrassing episode in #archlinux a couple of weeks ago, where I naively shared one of the first bash scripts I had written without first looking back over it1, and had to subsequently endure what felt like the ritual code mocking, but was in fact some helpful pointers as to how I could make the script suck less (a lot less) I have been going through those older scripts and applying the little knowledge that I have picked up in the interim; reappraising the usefulness of the scripts as I go.

One that has proved to be of some utility for many years now is a simple wrapper script I wrote to help manage my finances. Like many useful scripts, it was written quickly and has been in constant use ever since; becoming almost transparent it is so ingrained in my workflow.

The script allows me to manage the lag between when a company emails me an invoice and when the payment is actually due. I find that companies will typically email their invoices to me some weeks in advance, whereupon I will make a mental note and then, unsurprisingly, promptly forget all about it, thereby opening myself up for penalties for late payment. It didn’t take me long (well, in my defence, a lot less time than it took for invoices to become digital) to realise that there was a better way™ - a script.

The at command is purpose built for running aperiodic commands at a later time (whereas cron is for periodic tasks). So, using at(1), once I receive an invoice, I can set a reminder closer to the final payment window, thereby avoiding both the late payment penalty—and the loss of interest were I to pay it on receipt. I just needed a script to make it painless to achieve.

The main function of the script is pretty self-explanatory:

todo
aread() {
  read -p "Time of message? [HH:MM] " attime
  read -p "Date of message? [DD.MM.YY] " atdate
  read -p "Message body? " message

  timexp='^[0-9]{2}:[0-9]{2}'
  datexp='^[0-9]{2}.[0-9]{2}.[0-9]{2}'

  if [[ $attime =~ $timexp && $atdate =~ $datexp ]]; then
     at "$attime" "$atdate" << EOF
     printf '%s\n' "$message" | mutt -s "REMINDER" jasonwryan@gmail.com
EOF
  else
     printf '%s\n' "Incorrectly formatted values, bailing..." && exit 1
  fi
}

Now, an invoice arrives, I open it and fire up a scratchpad, and follow the prompts. A couple of weeks later, the reminder email arrives and I login to my bank account and dispatch payment. You could, of course, have the script trigger some other form of notification, but an email works well for me.

The rest of the script is similarly basic; just some options for listing and reading any queued jobs and some more rudimentary checking. The full script is in my bitbucket repo2.

Update 7/09/14

Not more than a couple of hours after posting this, Florian Pritz pinged me in #archlinux with some great suggestions for improving the script. I particularly liked relying on date(1) handling the input format for the time and date values. He also suggested a readline wrapper called (appropriately enough) rlwrap and a tmpfile to better manage input validation. You can see his full diff of changes. In the end, I adopted the date suggestion but passed on rlwrap. Thanks for the great pointers, Florian.

Notes

  1. In the interests of full disclosure, the most egregious line was myterm=$(echo $TERM) which I would hope I copied blindly from somewhere else, but accept full responsibility for nonetheless.
  2. Don’t poke around too much in there, I still have quite a lot of cleaning up to do…

Creative Commons image by Adelle and Justin on Flickr.