My blog has been running for more than a decade, using the same cute-but-creaky old software made by my chum Sam. It was a lo-fi PHP and MySQL blog, and it did everything I needed. (Oh and it suited my stupid lo-fi blog aesthetics too, the clunky visuals are entirely my fault.)
Now, if you were starting such a project today you wouldn't use PHP and you wouldn't use MySQL (just search the web for all the rants about those technologies). But if it isn't broken, don't fix it. So it ran for 10 years. Then my annoying web provider TalkTalk messed up and lost all the databases. They lost all ten years of my articles. SO. What to do?
Well, one thing you can do is simply drop it and move on. Make a fresh start. Forget all those silly old articles. Sure. But I have archivistic tendencies. And the web's supposed to be a repository for all this crap anyway! The web's not just a medium for serving you with Facebook memes, it's meant to be a stable network of stuff. So, ideal would be to preserve the articles, and also to prevent link rot, i.e. make sure that the URLs people have been using for years will still work...
So, job number one, find your backups. Oh dear. I have a MySQL database dump from 2013. Four years out of date. And anyway, I'm not going back to MySQL and PHP, I'm going to go to something clean and modern and ideally Python-based... in other words Pelican. So even if I use that database I'm going to have to translate it. So in the end I found three different sources for all my articles:
- The old MySQL backup from 2013. I had to install MySQL software on my laptop (meh), load the database, and then write a script to iterate through the database entries and output them as nice markdown files.
- archive.org's beautiful Wayback Machine. If you haven't already given money to archive.org then please do. They're the ones making sure that all the old crap from the web 5 years ago is still preserved in some form. They're also doing all kinds of neat things like preserving old video games, masses and masses of live music recordings, and more. ... Anyway I can find LOTS of old archived copies of my blog items. There are two problems with this though: firstly they don't capture everything and they didn't capture the very latest items; and secondly the material is not stored in "source" format but in its processed HTML form, i.e. the form you actually see. So to address the latter, I had to write a little regular expression based script to snip the right pieces out and put them into separate files.
- For the very latest stuff, much of it was still in Google's web cache. If I'd thought of this earlier, I could have rescued all the latest items, since Google is I think the only service that crawls fast enough and widely enough to have captured all the little pages on my little site. So, just like with archive.org, I can grab the HTML files from Google, and scrape the content out using regular expressions.
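The database-to-markdown step above was roughly along these lines. This is just a sketch: the row layout (title, date, body) and the Pelican-style metadata headers are assumptions, and the real script read its rows from a MySQL cursor rather than a list.

```python
def row_to_markdown(title, date, body):
    """Render one rescued blog entry as markdown with Pelican-style headers."""
    return "Title: %s\nDate: %s\n\n%s\n" % (title, date, body)

# stand-in for rows fetched from the old MySQL dump
rows = [("Hello world", "2007-01-01", "My first post.")]

for title, date, body in rows:
    fname = title.lower().replace(" ", "-") + ".md"
    with open(fname, "w") as f:
        f.write(row_to_markdown(title, date, body))
```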
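The regular-expression scraping (for both the Wayback Machine copies and the Google cache copies) boils down to a non-greedy match to snip the article body out of the processed HTML. The div class name here is an assumption about my old theme's markup:

```python
import re

# a snippet of the kind of archived HTML the Wayback Machine serves up
html = ('<div class="blogitem"><h3>My post</h3><p>Body text.</p></div>'
        '<div class="footer">...</div>')

# non-greedy match so we stop at the first closing div
match = re.search(r'<div class="blogitem">(.*?)</div>', html, re.DOTALL)
content = match.group(1) if match else ""
```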
That got me almost everything. I think the only thing missing is one blog article from a month ago.
Next step: once you've rescued your data, build a new blog. This was easy because Pelican is really nice and well-documented too. I even recreated my silly old theme in their templating system. I thought I'd have problems configuring Pelican to reproduce my old site, but it's basically all done, even the weird stuff like my separate "recipes" page which steals one category from my blog and reformats it.
Now how to prevent linkrot? The Pelican pages have URLs like "/blog/category/science.html" instead of the old "/blog/blog.php?category=science", and if I'm moving away from PHP then I don't really want those PHP-based links to be the ones used in future. I need to catch people who are going to one of those old links, and point them straight at the new URLs. The really neat thing is that I could use Pelican's templating system to output a little lookup table, a CSV file listing all the URL rewrites needed. Then I wrote a tiny little PHP script which uses that file and emits HTTP redirect messages. ... and relax. A URL like http://www.mcld.co.uk/blog/blog.php?category=science is back online.
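The lookup idea is simple. Here's a sketch of it in Python (the real script is a tiny bit of PHP, and the CSV column layout is an assumption):

```python
import csv
import io

# stand-in for the CSV of URL rewrites that the Pelican template emits
table = io.StringIO(
    "blog.php?category=science,category/science.html\n"
    "blog.php?id=123,2010/some-article.html\n")
redirects = dict(csv.reader(table))

def redirect_for(query):
    """Look up the new URL; the PHP script emits an HTTP 301 pointing at it."""
    return redirects.get(query)
```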
OK now here we've got lots of lovely good news. Not only have my colleague Andrew McPherson and his team created an ultra-low-latency Linux audio board called Bela. Not only can it do audio I/O latencies measured in microseconds (as opposed to the usual milliseconds). Not only did it just finish its Kickstarter campaign, receiving eleven times more funding than they asked for.
The extra good news is that we've got SuperCollider running on Bela. So you can run your favourite crazy audio synthesis/processing ideas on a tiny little low-latency box, almost as easily as running it on a laptop.
Can everyone use it? Well, not just yet - the code to use Bela's audio driver isn't yet merged into the main SuperCollider codebase, so you need to compile my forked version of SC. This blog post is just a preview. But we've got the code, as well as instructions for compiling, in this fork over here, and two of the Bela crew (Andrew and Giulio) have helped get it to the point where I can now run it in low-latency mode with no audio glitching.
Where do we go from here? It'd be nice if other people can test it out. (All those Kickstarter backers who are receiving their boards sometime soon...) There are a couple of performance improvements that can hopefully be done. Then eventually I hope we can propose it gets merged in to the SC codebase, perhaps for SC 3.8 or suchlike.
The whole idea of static site generators is interesting, especially for someone who has had to deal with the pain of content management systems for website making. I've been dabbling with a static site generator for our research group website and I think it's a good plan.
What's a static site generator?
Firstly here's a quick historical introduction to get you up to speed on what I'm talking about:
- When the web was invented it was largely based on "static" HTML webpages with no interaction, no user-customised content etc.
- If you wanted websites to remember user details, or generate content based on the weather, or based on some database, the server-side system had to run software to do the clever stuff. Eventually these evolved into widely-used "content management systems" (CMSes) - such as drupal, wordpress, plone, mediawiki.
- However, CMSes can be a major pain in the arse to look after. For example:
- They're very heavily targeted by spammers these days. You can't just leave the site and forget about it, especially if you actually want to use CMS features such as user-edited content, or comments - you need to moderate the site.
- You often have to keep updating the CMS software, for security patches, new versions of programming languages, etc.
- They can be hard to move from one web host to another - they'll often have not-the-right version of PHP or whatever.
A static site generator takes the opposite approach: you run the software offline, on your own machine, and it generates the whole site as plain "static" HTML files, which you then upload to the server. That gets rid of many security issues and compatibility issues.
- It also frees you up a bit: you can use whatever software you like to generate the content, it doesn't have to be software that's designed for responding to HTTP requests.
- It does prevent you from doing certain things - you can't really have a comments system (as in many blogs) if it's purely client-side, for example. There are workarounds but it's still a limitation.
It's not as if SSGs are poised to wipe out CMSes, not at all. But an SSG can be a really neat alternative for managing a website, if it suits your needs. There are lots of nice static site generators out there.
Static site generators for academic websites
So here in academia, we have loads of old websites everywhere. Some of them are plain HTML, some of them are CMSes set up by PhD students who left years ago, and some are big whizzy CMSes that the central university admin paid millions for, which still don't quite do everything you want.
If you're setting up a new research group website, questions that come to mind are:
- How much pain will it take to convince the IT department to install this specific version of Python/PHP/Ruby, plus all the weird little plugins that this software demands?
- Who's going to maintain the website for years, applying security patches, dealing with hacks, etc?
- If I go through this hassle of setting up a CMS, which of its whizzy features do I actually want to use? Often you don't really care about many core CMS features, and the features you do want (such as publications lists) are handled by some half-baked plugin that a half-distracted academic cobbled together years ago and now doesn't work properly.
Using a static site generator (SSG) might be a really handy idea. So that's what I've done, with an SSG called Poole: it's written in Python, and it appealed to me because of how minimal it is.
Poole does ALMOST NOTHING.
It has one HTML template which you can make yourself, and then it takes content written in markdown syntax and puts the two together to produce your HTML website. It lets you embed bits of python code in the markdown too, if there's any whizzy stuff needed during page generation. And that's it, it doesn't do anything else. Fab!
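The essence of what it does is tiny. As a sketch (the real Poole runs a proper markdown converter and handles per-page metadata; all the names below are made up):

```python
# one template for the whole site
template = "<html><body>{content}</body></html>"

# stand-in for a directory of source pages; a real SSG would hold
# markdown here and convert it, but we use ready-made HTML to keep it short
pages = {"about": "<h1>About us</h1><p>We do research.</p>"}

site = {}
for name, body in pages.items():
    site[name + ".html"] = template.format(content=body)
```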
But there's more: how do people in our research group edit the site? Do they need to understand this crazy little coding system? No! I plugged Poole together with github for editing the markdown pages. The markdown files are in a github project. As with any github project, anyone can propose a change to one of the textfiles. If they're not pre-authorised then it becomes a "Pull Request" which someone like me checks before approving. Then, I have a little script that regularly checks the github project and regenerates the site if the content has changed.
(This is edging a little bit more towards the CMS side of things, with the server actually having to do stuff. But the neat thing is firstly that this auto-update is optional - this paradigm would work even if the server couldn't regularly poll github, for example - and secondly, because Poole is minimal the server requirements are minimal. It just needs Python plus the python-markdown module.)
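That auto-update script boils down to checking git's output and only rebuilding when something changed. A sketch (command details and git's exact message wording are assumptions, and older git versions print "Already up-to-date" with hyphens):

```python
import subprocess

def needs_rebuild(git_pull_output):
    """Decide from `git pull`'s output whether the site content changed."""
    return "Already up to date" not in git_pull_output

def update_site(repo_dir, build_cmd):
    """Pull the content repo, and rerun the site build only if needed."""
    result = subprocess.run(["git", "-C", repo_dir, "pull"],
                            capture_output=True, text=True)
    if needs_rebuild(result.stdout):
        subprocess.run(build_cmd, cwd=repo_dir, check=True)
```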
We did need a couple of whizzy things for the research site: a publications list, and a listing of research group members. We wanted these to come from data such as a spreadsheet so it could be used in multiple pages and easily updated. This is achieved via the embedded bits of python code I mentioned: we have publications stored in bibtex files, and people stored in a CSV file, and the python loads the data and transforms it into HTML.
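For example, the people listing boils down to something like this (the CSV column names are an assumption):

```python
import csv
import io

# stand-in for the group's people.csv spreadsheet export
people_csv = io.StringIO(
    "name,role\n"
    "Ada Lovelace,Professor\n"
    "Alan Turing,PhD student\n")

# load the data and transform it into an HTML list for embedding in a page
items = ["<li>%s (%s)</li>" % (p["name"], p["role"])
         for p in csv.DictReader(people_csv)]
html = "<ul>\n%s\n</ul>" % "\n".join(items)
```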
It's really neat that the SSG means we have all our content stored in a really portable format: a single git repository containing some of the most widely-handled file formats: markdown, bibtex and CSV.
So where is this website? Here: http://c4dm.eecs.qmul.ac.uk/
Agh, I just got caught out by a "silent" change in the behaviour of scipy for Python. By "silent" I mean it doesn't seem to be in the scipy 0.12 changelog even though it should be. I'm documenting it here in case anyone else needs to know:
Here's the simple code example - using scoreatpercentile to find a percentile for some 2D array:
import numpy as np
from scipy.stats import scoreatpercentile
scoreatpercentile(np.eye(5), 50)
On my laptop with scipy 0.11.0 (and numpy 1.7.1) the answer is:
array([ 0., 0., 0., 0., 0.])
On our lab machine with scipy 0.13.3 (and numpy 1.7.0) the answer is:
0.0
In the first case, it calculates the percentile along one axis. In the second, it calculates the percentile of the flattened array, because in scipy 0.12 someone added a new "axis" argument to the function, whose default value "None" means to analyse the flattened array. Bah! Nice feature, but a shame about the compatibility. (P.S. I've logged it with the scipy team.)
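The defensive fix is to state the axis explicitly rather than rely on a default that shifted between versions. For instance, numpy's own percentile function makes both behaviours explicit:

```python
import numpy as np

a = np.eye(5)

# per-column percentile: each column of eye(5) is four 0s and one 1,
# so the median of each column is 0
per_axis = np.percentile(a, 50, axis=0)

# percentile of the flattened array: 20 zeros and 5 ones, median 0.0
flattened = np.percentile(a, 50)
```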
I'm going to a conference next week, and the conference invites me to "Download the app!" Well, OK, you think, maybe a bit of overkill, but it would be useful to have an app with schedules etc. Here is the app listed on google play.
Oh and here's a list (abbreviated) of permissions that the app requires:
"""This application has access to the following:
- Your precise location (GPS and network-based)
- Full network access
- Connect and disconnect from Wi-Fi - Allows the app to connect to and disconnect from Wi-Fi access points and to make changes to device configuration for Wi-Fi networks.
- Read calendar events plus confidential information
- Add or modify calendar events and send email to guests without owners' knowledge
- Read phone status and identity
- Camera - take pictures and videos. This permission allows the app to use the camera at any time without your confirmation.
- Modify your contacts - Allows the app to modify the data about your contacts stored on your device, including the frequency with which you've called, emailed, or communicated in other ways with specific contacts. This permission allows apps to delete contact data.
- Read your contacts - Allows the app to read data about your contacts stored on your device, including the frequency with which you've called, emailed, or communicated in other ways with specific individuals. This permission allows apps to save your contact data, and malicious apps may share contact data without your knowledge.
- Read call log - Allows the app to read your device's call log, including data about incoming and outgoing calls. This permission allows apps to save your call log data, and malicious apps may share call log data without your knowledge.
- Write call log - Allows the app to modify your device's call log, including data about incoming and outgoing calls. Malicious apps may use this to erase or modify your call log.
- Run at start-up."""
Now tell me, what fraction of those permissions should a conference-information app legitimately use? (I've edited out some of the mundane ones.) Should ANYONE install this on their phone/tablet?
I saw Brandon Mechtley's splmap which is for plotting sound-pressure measurements on a map. He mentioned a problem: the default "heatmap" rendering you get in google maps is really a density estimate which combines the density of the points with their values. "I need to find a way to average rather than add" he says.
Just playing with this, here's my take on the situation. You don't average the values: you create some kind of interpolated overall map, and separately you use the density of datapoints to decide how confident you are in your estimate at each point on the map. Python code is here and here's an example plot:
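A minimal sketch of the idea using scipy, with random stand-in data (the masking threshold here is an arbitrary choice, not part of the method):

```python
import numpy as np
from scipy.interpolate import griddata
from scipy.stats import gaussian_kde

rng = np.random.RandomState(0)
pts = rng.rand(50, 2)    # measurement locations
vals = rng.rand(50)      # e.g. sound-pressure readings

# evaluation grid over the map area
gx, gy = np.mgrid[0:1:40j, 0:1:40j]
grid = np.vstack([gx.ravel(), gy.ravel()])

# the interpolated value map (NaN outside the convex hull of the points)
est = griddata(pts, vals, (gx, gy), method="linear")

# datapoint density, used as a confidence measure
density = gaussian_kde(pts.T)(grid).reshape(gx.shape)

# mask the estimate wherever we have too little data to trust it
est_masked = np.where(density > np.median(density), est, np.nan)
```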
Dataviz folks might already have a name for this...
Over Christmas I helped someone set up their brand new Kindle Fire HD. I hadn't realised quite how coercive Amazon have been: they're using Android as the basis for the system (for which there is a whole world of handy free stuff), but they've thrown various obstacles ...
I've been storing a lot of my files in a private git repository, for a long time now. Back when I started my PhD, I threw all kinds of things into it, including PDFs of handy slides, TIFF images I generated from data, journal-article PDFs... ugh. Mainly a lot ...
Just notes for posterity: I am installing Cyanogenmod on my HTC Tattoo Android phone.
There are instructions here which are perfectly understandable if you're comfortable with a command-line. But the instructions are incomplete. As has been discussed here I needed to install "tattoo-hack.ko" in order to get the ...
I've been enjoying writing my research code in Python over the past couple of years. I haven't had to put much effort into optimising it, so I never bothered, but just recently I've been working on a graph-search algorithm which can get quite heavy - there was one ...