I was setting up a new laptop recently. If you're not familiar with Linux, you probably don't know what an amazing ecosystem of software you can have for free, almost instantly. Yes, sure, the software is free, but what's actually impressive is how well it all stitches together through "package managers". I use Ubuntu (based on Debian), and Debian provides this amazing jiu-jitsu wherein you can just type
sudo apt install sonic-visualiser
and hey presto, you get Sonic Visualiser nicely installed and ready to go.
So what that means for me is that when I'm setting up a new computer, I don't need to go running around a million websites, clicking through download links and licence agreements. I can just copy over the list of all my favourite software packages, and
apt will install them for me in just a few steps.
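Here's that workflow as a tiny Python sketch (a shell one-liner would do equally well; the list filename is made up):

import subprocess

# packages.txt: one package name per line, with '#' starting a comment
with open("packages.txt") as f:
    packages = [line.split("#")[0].strip() for line in f]
packages = [p for p in packages if p]

# One apt invocation installs the lot
subprocess.run(["sudo", "apt", "install", "-y", *packages], check=True)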
For whatever reason - for my own recollection, at least - here's a list of lots of great packages I tend to install on my desktop/laptop. General useful stuff, plus things that an audio hacker, Python machine-learning developer, and computer science academic might use. I'll add some comments to highlight notable things:
# file sharing, synchronisation
syncthing # for fabulous dropbox-without-dropbox file synchronising
syncthing-gtk
git
transmission-gtk

# graphics/photo editing
cheese
darktable
gimp # great for bitmap (e.g. photo) editing
imagemagick
inkscape # great for vector graphics
openshot # great for video editing

# for a nice desktop environment:
pcmanfm
gnome-tweak-tool
caffeine-indicator # helps to pause screensaver etc when you need to watch a film, give a talk, etc
xcalib # I use this to invert colours sometimes

# academic
jabref
r-base
texlive
texlive-latex-extra
texlive-bibtex-extra
texlive-fonts-extra
texlive-fonts-recommended
texlive-publishers
texlive-science
texlive-extra-utils # for texcount (latex word counting)
graphviz
gnuplot
latexdiff # super-useful for comparing original text against the re-submission text...
poppler-utils # PDF manipulation
psutils
bibtex2html
pandoc

# for python programming fun
jupyter-notebook
virtualenv
python-matplotlib
python-nose
python-numpy
python-pip
python-scipy
python-six
python-skimage
python3-numpy
cython
ipython
ipython3

# for music playback
mopidy
mopidy-local-sqlite
ncmpcpp
pavucontrol
paprefs
brasero
banshee
qjackctl
jack-tools
jackd2
mixxx
mplayer
vlc

# music/audio file manipulation
audacity
youtube-dl
ffmpeg
rubberband-cli
sndfile-tools
sonic-visualiser
sox
id3v2
vorbis-tools
lame
mencoder
mp3splt

# audio programming libraries
libsndfile1
libsndfile1-dev
libfftw3-dev
librubberband-dev
libvorbis-dev

# for blogging / websiting:
pelican
lftp

# office
simple-scan
ttf-ubuntu-font-family
thunderbird-locale-en-gb
orage
xul-ext-lightning # alt calendar software

# misc programming stuff
ansible
ant
build-essential
ccache
cmake
cmake-curses-gui
debhelper
debianutils
default-jdk
default-jre
devscripts
git-buildpackage
vim-gtk

# system utilities
apparmor
apport
anacron
nmap
hfsprogs
printer-driver-hpijs
dconf-editor
chkrootkit
dmidecode
zip
zsh # zsh is so much better than bash
gparted
htop
baobab
wireshark-qt
bzip2
curl
dnsutils
dos2unix
dvd+rw-tools
less
openssh-server
openvpn
screen
unrar
unzip
wget
I've been trying out an e-ink reader for my academic work.
"e-ink" - these are greyscale LCD-like displays. You see the image by light reflectance, almost the same way you read a printed page, not by luminance like a TV/laptop screen. This should be better in lots of ways: better on your eyes, low-power, and you can read outside. The low-power comes because it doesn't need a full jolt of energy 50 times a second as does an LED display: if the image doesn't change, no power is needed, the image stays there for free.
Why for academic work? A LARGE portion of my everyday work consists of looking at academic article PDFs, scribbling on them, then giving/implementing feedback. These come from students, collaborators, reviews for journals/conferences, and from editing my own work as I go. Some people can do this kind of stuff directly on a laptop screen. I'm afraid I can't: it's less effective and less detailed when I do that. So, for years, I've been printing things out, scribbling notes on them, then throwing them away afterwards.
If an e-ink reader can replace all that, maybe that's a good thing for the environment?
Note: it takes a lot of resources to build an e-reader. At what point is it "better" to print thousands of pages of paper, versus manufacture one e-reader? I don't know.
You can't use any old e-reader for academic reviewing: it needs to be large enough to render an A4 PDF well (ideally, full A4 size), and it needs some way of annotating. The one I'm trying has a stylus you can use to scribble, and it works. Surprisingly good so far.
NOW THE NEXT STEP:
We sometimes have sunny days, you know. For some reason this often happens when we've a workshop or conference organised. "Why don't we have the session outside on the grass?" I'm tempted to say. The answer would be... because you can't really look at people's slideshow slides out on the grass. Pass a laptop around? Broadcast the slides to everyone's smartphones? Redraw everything from scratch on a flipchart? Meh.
What I'd like to see is an e-ink screen large enough to host a seminar with. The resolution doesn't need to be all that high - certainly not as high as is needed for reviewing PDFs. It just needs to be big. It would be great if there were a stylus or some other way of scribbling on the screen too.
Most academic slides are not animated. So an e-ink type screen is much more suitable than an LED screen, and would use much much less power. (Ever noticed the amount of cooling needed for those LED advertising signs in the street? Crazy power consumption.)
On Wednesday we had a "flagship seminar" from Prof Andy Hopper on Computing for the future of the planet. How can computing help in the quest for sustainability of the planet and humanity?
Lots of food for thought in the talk. I was surprised to come out with a completely different take-home message than I'd expected - and a different take-home message than I think the speaker had in mind too. I'll come back to that in a second.
Some of the themes he discussed:
- Green computing. This is pretty familiar: how can computing be less wasteful? Low-energy systems, improving the efficiency of computer chips, that kind of thing. A good recent example is how DeepMind used machine learning to reduce the cooling requirements of a Google data centre by 40% - reductions that large are really rare. Hopper also gave a nice example of "free lunch" computing: the idea is that energy is going unused somewhere out there in the world (a remote patch of the sea, for example), so if you stick a renewable energy generator and a server farm there, you essentially get your computation done at no resource cost.
- Computing for green, i.e. using computation to help us do things in a more sustainable way. Hopper gave a slightly odd example of high-tech monitoring that improved efficiency of manufacturing in a car factory; not very clear to me that this is a particularly generalisable example. How about this much better example? Open source geospatial maps and cheap new tools improve farming in Africa. "Aerial drones, crowds of folks gathering soil samples and new analysis techniques combine as pieces in digital maps that improve crop yields on African farms. The Africa Soil Information Service is a mapping effort halfway through its 15-year timeline. Its goal is to publish dynamic digital maps of all of Sub-Saharan Africa at a resolution high enough to serve farmers with small plots. The maps will be dynamic because AfSIS is training people now to continue the work and update the maps." - based on crowdsourced and other data, machine-learning techniques are used to create a complete picture of soil characteristics, and can be used to predict where's good to farm what, what irrigation is needed, etc.
- While we're on the subject of computing-for-green: have a look at Jack Kelly's excellent list of tips, Climate change mitigation for hackers.
Then Hopper also talked about replacing physical activities by digital activities (e.g. shopping), and this led him on to the topic of the Internet, worldwide sharing of information and so on. He argued (correctly) that a lot of these developments will benefit the low-income countries even though they were essentially made by-and-for richer countries - and also that there's nothing patronising in this: we're not "developing" other countries to be like us, we're just sharing things, and whatever innovations come out of African countries (for example) might have been enabled by (e.g.) the Internet without anyone losing their own self-determination.
Hopper called this "wealth by proxy"... but it doesn't have to be as mystifying as that. It's a well-known idea called the commons.
The name "commons" originates from having a patch of land which was shared by all villagers, and that makes it a perfect term for what we're considering now. In the digital world the idea was taken up by the free software movement and open culture such as Creative Commons licensing. But it's wider than that. In computing, the commons consists of the physical fabric of the Internet, of the public standards that make the World Wide Web and other Internet actually work (http, smtp, tcp/ip), of public domain data generated by governments, of the Linux and Android operating systems, of open web browsers such as Firefox, of open collaborative knowledge-bases like Wikipedia and OpenStreetMap. It consists of projects like the Internet Archive, selflessly preserving digital content and acting as the web's long-term memory. It consists of the GPS global positioning system, invented and maintained (as are many things) by the US military, but now being complemented by Russia's GloNass and the EU's Galileo.
All of those are things you can use at no cost, and which anyone can use as bedrock for some piece of data analysis, some business idea, some piece of art, including a million opportunities for making a positive contribution to sustainability. It's an unbelievable wealth, when you stop to think about it, an amazing collection of achievements.
The surprising take-home lesson for me was: for sustainable computing for the future of the planet, we must protect and extend the digital commons. This is particularly surprising to me because the challenges here are really societal, at least as much as they are computational.
There's more we could add to the commons; and worse, what we already have is often under threat of encroachment. Take the Internet and World Wide Web: it's increasingly becoming centralised into the control of a few companies (Facebook, Amazon), which is bad news generally, but also presents a practical systemic risk. This was seen recently when Amazon's AWS service suffered an outage. AWS powers so many of the commercial and non-commercial websites online that this one outage took down a massive chunk of the digital world. As another example, I recently had problems when Google's "ReCAPTCHA" system locked me out for a while - so many websites use ReCAPTCHA to confirm that there's a real human filling in a form that if ReCAPTCHA decides to give you the cold shoulder, you instantly lose access to a weird random sample of services, some of which may be important to you.
Another big issue is net neutrality. "Net neutrality is like free speech" and it repeatedly comes under threat.
Those examples are not green-related in themselves, but they illustrate that out of the components of the commons I've listed, the basic connectivity offered by the Internet/WWW is the thing that is, surprisingly, perhaps the flakiest and most in need of defence. Without a thriving and open internet, how do we join the dots of all the other things?
But onto the positive. What more can we add to this commons? Take the African soil-sensing example. Shouldn't the world have a free, public stream of such land use data, for the whole planet? The question, of course, is who would pay for it. That's a social and political question. Here in the UK I can bring the question even further down to the everyday. The UK's official database of addresses (the Postcode Address File) was... ahem... sold off privately in 2013. This is a core piece of our information infrastructure, and the government - against a lot of advice - decided to sell it as part of privatisation, rather than make it open. Related is the UK Land Registry data (i.e. who owns what parcel of land), which is not published as open data but is stuck behind a pay-wall - all very inconvenient for data analysis, investigative journalism etc.
We need to add this kind of data to the commons so that society can benefit. In green terms, geospatial data is quite clearly raw material for clever green computing of the future, to do good things like land management, intelligent routing, resource allocation, and all kinds of things I can't yet imagine.
As citizens and computerists, what can we do?
- We can defend the free and open internet. Defend net neutrality. Support groups like the Mozilla Foundation.
- Support open initiatives such as Wikipedia (and the Wikimedia Foundation), OpenStreetMap, and the Internet Archive. Join a local Missing Maps party!
- Demand that your government does open data, and properly. It's a win-win - forget the old mindset of "why should we give away data that we've paid for" - open data leads to widespread economic benefits, as is well documented.
- Work towards filling the digital commons with ace opportunities for people and for computing. For example satellite sensing, as I've mentioned. And there's going to be lots of drones buzzing around us collecting data in the coming years; let's pool that intelligence and put it to good use.
If we get this right, 20 years from now our society's computing will be green as anything, not just because it's powered by innocent renewable energy but because it can potentially be a massive net benefit - data-mining and inference to help us live well on a light footprint. To do that we need a healthy digital commons which will underpin many of the great innovations that will spring up everywhere.
My blog has been running for more than a decade, using the same cute-but-creaky old software made by my chum Sam. It was a lo-fi PHP and MySQL blog, and it did everything I needed. (Oh and it suited my stupid lo-fi blog aesthetics too, the clunky visuals are entirely my fault.)
Now, if you were starting such a project today you wouldn't use PHP and you wouldn't use MySQL (just search the web for all the rants about those technologies). But if it isn't broken, don't fix it. So it ran for 10 years. Then my annoying web provider TalkTalk messed up and lost all the databases. They lost all ten years of my articles. SO. What to do?
Well, one thing you can do is simply drop it and move on. Make a fresh start. Forget all those silly old articles. Sure. But I have archivistic tendencies. And the web's supposed to be a repository for all this crap anyway! The web's not just a medium for serving you with Facebook memes, it's meant to be a stable network of stuff. So, ideal would be to preserve the articles, and also to prevent link rot, i.e. make sure that the URLs people have been using for years will still work...
So, job number one, find your backups. Oh dear. I have a MySQL database dump from 2013. Four years out of date. And anyway, I'm not going back to MySQL and PHP, I'm going to go to something clean and modern and ideally Python-based... in other words Pelican. So even if I use that database I'm going to have to translate it. So in the end I found three different sources for all my articles:
- The old MySQL backup from 2013. I had to install MySQL software on my laptop (meh), load the database, and then write a script to iterate through the database entries and output them as nice markdown files (first sketch after this list).
- archive.org's beautiful Wayback Machine. If you haven't already given money to archive.org then please do. They're the ones making sure that all the old crap from the web 5 years ago is still preserved in some form. They're also doing all kinds of neat things like preserving old video games, masses and masses of live music recordings, and more. ... Anyway, I can find LOTS of old archived copies of my blog items. There are two problems with this, though: firstly, they don't capture everything, and they didn't capture the very latest items; and secondly, the material is not stored in "source" format but in its processed HTML form, i.e. the form you actually see. So to address the latter, I had to write a little regular-expression-based script to snip the right pieces out and put them into separate files (second sketch below).
- For the very latest stuff, much of it was still in Google's web cache. If I'd thought of this earlier, I could have rescued all the latest items, since Google is I think the only service that crawls fast enough and widely enough to have captured all the little pages on my little site. So, just like with archive.org, I can grab the HTML files from Google, and scrape the content out using regular expressions.
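Rescuing the database entries looked roughly like this - a minimal sketch, assuming the dump is loaded into a local server, and with hypothetical table/column names:

import re
import pymysql  # any MySQL client library would do

# Hypothetical schema: a `posts` table with title, date, category and body columns
conn = pymysql.connect(host="localhost", user="root", password="", database="blogbackup")
with conn.cursor() as cur:
    cur.execute("SELECT title, postdate, category, body FROM posts ORDER BY postdate")
    for title, postdate, category, body in cur.fetchall():
        slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
        with open(f"content/{slug}.md", "w") as f:
            # Pelican-style metadata header, then the post body as markdown
            f.write(f"Title: {title}\nDate: {postdate}\nCategory: {category}\n\n{body}\n")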
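And the HTML-scraping step, for both the Wayback and Google-cache copies, is essentially this (the marker div is hypothetical - whatever the old theme actually wrapped each post in):

import re
import pathlib

# Match each post body in a saved page; the class name is made up for illustration
ARTICLE_RE = re.compile(r'<div class="blogpost">(.*?)</div><!--/blogpost-->', re.DOTALL)

outdir = pathlib.Path("rescued")
outdir.mkdir(exist_ok=True)
for page in pathlib.Path("saved_pages").glob("*.html"):  # Wayback or Google-cache dumps
    html = page.read_text(errors="ignore")
    for i, match in enumerate(ARTICLE_RE.finditer(html)):
        (outdir / f"{page.stem}_{i}.html").write_text(match.group(1))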
That got me almost everything. I think the only thing missing is one blog article from a month ago.
Next step: once you've rescued your data, build a new blog. This was easy because Pelican is really nice and well-documented too. I even recreated my silly old theme in their templating system. I thought I'd have problems configuring Pelican to reproduce my old site, but it's basically all done, even the weird stuff like my separate "recipes" page which steals one category from my blog and reformats it.
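The URL scheme and theme come down to a few settings in pelicanconf.py - these values are illustrative rather than my exact config:

# pelicanconf.py (illustrative values, not the real site config)
SITENAME = "my blog"
SITEURL = "http://www.mcld.co.uk"
PATH = "content"   # where the rescued markdown files live
THEME = "mytheme"  # the recreated lo-fi theme

# Make articles and category pages land at tidy static URLs
ARTICLE_URL = "blog/{slug}.html"
ARTICLE_SAVE_AS = "blog/{slug}.html"
CATEGORY_URL = "blog/category/{slug}.html"
CATEGORY_SAVE_AS = "blog/category/{slug}.html"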
Now, how to prevent linkrot? The Pelican pages have URLs like "/blog/category/science.html" instead of the old "/blog/blog.php?category=science", and if I'm moving away from PHP then I don't really want those PHP-based links to be the ones used in future. I need to catch people who are going to one of those old links, and point them straight at the new URLs. The really neat thing is that I could use Pelican's templating system to output a little lookup table: a CSV file listing all the URL rewrites needed. Then I wrote a tiny little PHP script which uses that file and emits HTTP redirect messages. ... And relax. A URL like http://www.mcld.co.uk/blog/blog.php?category=science is back online.
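The live redirector is PHP, but the logic is tiny; here's the same idea sketched in Python (the CSV filename is made up):

import csv

# redirects.csv is the lookup table emitted by the Pelican template:
# each row is "old PHP-style URL, new static URL"
with open("redirects.csv") as f:
    REDIRECTS = dict(csv.reader(f))

def respond(old_url):
    """Emit a CGI-style HTTP redirect for a legacy URL, or a 404."""
    new_url = REDIRECTS.get(old_url)
    if new_url:
        return f"Status: 301 Moved Permanently\nLocation: {new_url}\n\n"
    return "Status: 404 Not Found\n\n"

# respond("/blog/blog.php?category=science") -> Location: /blog/category/science.html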
OK, now here we've got lots of lovely good news. Not only have my colleague Andrew McPherson and his team created an ultra-low-latency Linux audio board called Bela. Not only can it do audio I/O latencies measured in microseconds (as opposed to the usual milliseconds). Not only did it just finish its Kickstarter launch, receiving eleven times more funding than they asked for.
The extra good news is that we've got SuperCollider running on Bela. So you can run your favourite crazy audio synthesis/processing ideas on a tiny little low-latency box, almost as easily as running it on a laptop.
Can everyone use it? Well, not just yet - the code to use Bela's audio driver isn't yet merged into the main SuperCollider codebase, and you need to compile my forked version of SC. So this blog post is just to preview it. But we've got the code, as well as instructions for compiling, in this fork over here, and two of the Bela crew (Andrew and Giulio) have helped get it to the point where I can now run it in low-latency mode with no audio glitching.
Where do we go from here? It'd be nice if other people can test it out. (All those Kickstarter backers who are receiving their boards sometime soon...) There are a couple of performance improvements that can hopefully be done. Then eventually I hope we can propose it gets merged in to the SC codebase, perhaps for SC 3.8 or suchlike.
The whole idea of static site generators is interesting, especially for someone who has had to deal with the pain of content management systems for website making. I've been dabbling with a static site generator for our research group website and I think it's a good plan.
What's a static site generator?
Firstly here's a quick historical introduction to get you up to speed on what I'm talking about:
- When the web was invented it was largely based on "static" HTML webpages with no interaction, no user-customised content etc.
- If you wanted websites to remember user details, or generate content based on the weather, or based on some database, the server-side system had to run software to do the clever stuff. Eventually these evolved into widely-used "content management systems" (CMSes) - such as drupal, wordpress, plone, mediawiki.
- However, CMSes can be a major pain in the arse to look after. For example:
- They're very heavily targeted by spammers these days. You can't just leave the site and forget about it, especially if you actually want to use CMS features such as user-edited content, or comments - you need to moderate the site.
- You often have to keep updating the CMS software, for security patches, new versions of programming languages, etc.
- They can be hard to move from one web host to another - they'll often have not-the-right version of PHP or whatever.
- Static site generators (SSGs) turn this on its head: you run the content-generating software offline on your own machine, and upload the resulting plain HTML files, so the server only ever serves static pages. That gets rid of many security issues and compatibility issues.
- It also frees you up a bit: you can use whatever software you like to generate the content, it doesn't have to be software that's designed for responding to HTTP requests.
- It does prevent you from doing certain things - you can't really have a comments system (as in many blogs) if it's purely client-side, for example. There are workarounds but it's still a limitation.
It's not as if SSGs are poised to wipe out CMSes, not at all. But an SSG can be a really neat alternative for managing a website, if it suits your needs. There are lots of nice static site generators out there.
Static site generators for academic websites
So here in academia, we have loads of old websites everywhere. Some of them are plain HTML, some of them are CMSes set up by PhD students who left years ago, some of them are big whizzy CMSes that the central university admin paid millions for and doesn't quite do everything you want.
If you're setting up a new research group website, questions that come to mind are:
- How much pain would it take to convince the IT department to install this specific version of Python/PHP/Ruby, plus all the weird little plugins that this software demands?
- Who's going to maintain the website for years, applying security patches, dealing with hacks, etc?
- If I go through this hassle of setting up a CMS, which of its whizzy features do I actually want to use? Often you don't really care about many core CMS features, and the features you do want (such as publications lists) are handled by some half-baked plugin that a half-distracted academic cobbled together years ago and now doesn't work properly.
So a static site generator (SSG) might be a really handy option - and that's what I've gone for. I used an SSG called Poole, which is written in Python; it appealed to me because of how minimal it is.
Poole does ALMOST NOTHING.
It has one HTML template which you can make yourself, and then it takes content written in markdown syntax and puts the two together to produce your HTML website. It lets you embed bits of python code in the markdown too, if there's any whizzy stuff needed during page generation. And that's it, it doesn't do anything else. Fab!
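That's not Poole's actual source, of course, but the whole idea fits in a few lines of Python - a sketch with assumed filenames and an assumed placeholder marker:

import pathlib
import markdown  # the python-markdown module

# One hand-written HTML template containing a <!-- CONTENT --> marker
template = pathlib.Path("page.html").read_text()

outdir = pathlib.Path("output")
outdir.mkdir(exist_ok=True)
for src in pathlib.Path("input").glob("*.md"):
    body = markdown.markdown(src.read_text())
    (outdir / (src.stem + ".html")).write_text(template.replace("<!-- CONTENT -->", body))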
But there's more: how do people in our research group edit the site? Do they need to understand this crazy little coding system? No! I plugged Poole together with github for editing the markdown pages. The markdown files are in a github project. As with any github project, anyone can propose a change to one of the textfiles. If they're not pre-authorised then it becomes a "Pull Request" which someone like me checks before approving. Then, I have a little script (sketched below) that regularly checks the github project and regenerates the site if the content has changed.
(This is edging a little bit more towards the CMS side of things, with the server actually having to do stuff. But the neat thing is firstly that this auto-update is optional - this paradigm would work even if the server couldn't regularly poll github, for example - and secondly, because Poole is minimal the server requirements are minimal. It just needs Python plus the python-markdown module.)
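Roughly, that cron-driven update script does the following - the paths and the Poole invocation here are assumptions, so treat this as a sketch:

import subprocess

REPO = "/var/www/site-src"  # hypothetical path to the checked-out github project

def head():
    return subprocess.run(["git", "-C", REPO, "rev-parse", "HEAD"],
                          capture_output=True, text=True, check=True).stdout.strip()

before = head()
subprocess.run(["git", "-C", REPO, "pull", "--ff-only"], check=True)
if head() != before:
    # Content changed, so regenerate the static pages
    subprocess.run(["./poole.py", "--build"], cwd=REPO, check=True)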
We did need a couple of whizzy things for the research site: a publications list, and a listing of research group members. We wanted these to come from data such as a spreadsheet so it could be used in multiple pages and easily updated. This is achieved via the embedded bits of python code I mentioned: we have publications stored in bibtex files, and people stored in a CSV file, and the python loads the data and transforms it into HTML.
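For example, the group-members listing boils down to something like this (the column names are invented for illustration):

import csv
import html

def people_table(path="people.csv"):
    """Turn the members CSV into an HTML table, for embedding in a page."""
    rows = []
    with open(path) as f:
        for p in csv.DictReader(f):  # assumed columns: name, role, url
            name = html.escape(p["name"])
            if p.get("url"):
                name = '<a href="{}">{}</a>'.format(html.escape(p["url"]), name)
            rows.append("<tr><td>{}</td><td>{}</td></tr>".format(name, html.escape(p["role"])))
    return "<table>\n" + "\n".join(rows) + "\n</table>"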
It's really neat that the SSG means we have all our content stored in a really portable format: a single git repository containing some of the most widely-handled file formats: markdown, bibtex and CSV.
So where is this website? Here: http://c4dm.eecs.qmul.ac.uk/
Agh, I just got caught out by a "silent" change in the behaviour of scipy for Python. By "silent" I mean it doesn't seem to be in the scipy 0.12 changelog even though it should be. I'm documenting it here in case anyone else needs to know:
Here's the …
I'm going to a conference next week, and the conference invites me to "Download the app!" Well, OK, you think, maybe a bit of overkill, but it would be useful to have an app with schedules etc. Here is the app listed on google play.
Oh and here's a list …
I saw Brandon Mechtley's splmap which is for plotting sound-pressure measurements on a map. He mentioned a problem: the default "heatmap" rendering you get in google maps is really a density estimate which combines the density of the points with their values. "I need to find a way to average …
Over Christmas I helped someone set up their brand new Kindle Fire HD. I hadn't realised quite how coercive Amazon have been: they're using Android as the basis for the system (for which there is a whole world of handy free stuff), but they've thrown various obstacles in your way …