I am a research fellow, working on automatic analysis of bird sounds using machine learning.
→ Click here for more about my research.
It looks like the bigger consequences of the Brexit vote are about to hit. Everyone thought "no deal" was a laughable extreme back in 2016, and now our government seems to be sailing deliberately towards it.
Do we blame David Cameron, who naively called an ill-prepared vote? Theresa May, who dogmatically stuck to extreme "red lines" in the negotiations, failed to get a majority in a General Election, failed to get the agreement voted through, and yet refused to consider compromise positions, again and again? Or do we blame incompetent cartoon character Boris Johnson - whose main achievement as London Mayor was the foolish "garden bridge" plan that went nowhere after wasting tons of our money - now arrogantly pushing us towards no-deal despite the danger for ordinary people, as well as the democratic deficit?
It's crucial to remember that none of these people is at the root of all this. The Conservative Party as a whole should carry the blame. (Or the credit, if no-deal is a success - sure, why not.) They are currently putting a heck of a lot of effort into trying to blame the EU for any no-deal Brexit: pretending the EU is refusing to negotiate, when in fact the EU is refusing to reopen negotiations it's just spent two years on, or at least refusing to reopen them unless the UK government proposes a way forwards. The Conservative Party have got us into this mess - not just the British public! There were many different routes the government could have taken from 2016 onwards, and this embarrassing mismanagement comes from the Conservative Party, again and again. Their MPs and leaders, the government ministers, the membership. We can consider blaming Corbyn for not being an EU cheerleader, but frankly, his subdued triangulating is a minor footnote in this shambles of bad tactics.
I might be tempted to go deeper and say our First-Past-The-Post (FPTP) voting system is on the hook, since that's the reason that the Conservative Party (and Labour) sticks together in the shape it does, and why they feel the need to pander to extremists. But that might be a little too indirect.
We're already suffering the consequences of the Brexit vote. (Personally: I've lost multiple colleagues who have gone overseas, lost opportunities, etc. Collectively, we've lost a lot of influence, and we've wasted a heck of a lot of time we could have been spending fixing the climate crisis.) I'm particularly concerned about what happens next, because a crisis such as no-deal carries a perverse incentive: it gives the sitting government a rare opportunity to ram through emergency legislation which might reshape the British settlement more radically than is normally achievable. The Conservative Party has already shown simple-minded cruelty in voluntarily imposing a cost-slashing "austerity" agenda on the UK for a decade - an agenda which may have led to over a hundred thousand excess deaths. They have demonstrated that they don't worry about whether poorer people suffer the side-effects of their big ideas. What more would they like to do?
We see some hints of changes that Brexit might enable: today the food industry claimed it's going to need exemption from the laws of fair competition in order to keep everybody fed. That argument wouldn't get anywhere near the table in ordinary times.
I'm opposed to Brexit, but I certainly think naively "cancelling" it would be as harmful as going forward. The right way to proceed, clearly, is to take more time to find a true compromise outcome (which may well turn out to be a soft-ish Brexit, although no-one on any side wants to admit that). The rush now is for selfish reasons: we don't yet have a plan. And for our current government, perhaps that's more than just a necessary evil - perhaps not-having-a-plan is an opportunity they are looking forward to?
We've sampled LOTS of alcohol-free beer in the past year. Why? Well - you might not believe me about this - some of it's getting really good. And it's great to be able to have a lovely beer even if you don't want to be woozy afterwards.
So WE HAVE DONE THE RESEARCH and YOU can benefit from it! We've drunk beers good and bad from all over Europe. Here are six fabulous alcohol-free beers that are truly excellent and you should DRINK THEM!
Others which are ALSO recommended:
Plus there are plenty I don't recommend, of course :) but I won't bore you with them. Some more of my tasting notes are in this twitter thread.
While writing this, I found a nice list of top 70 low-alcohol beers from "Steady Drinker". However... it looks like he hasn't been to Sweden! He hasn't reviewed Mariestads, Värmdö or Sigtuna, which is why he's missed some of my top ones. In Sweden they have quite hefty taxes on alcohol, which may be why it's become a great place for innovative low-alcohol beers.
Our holiday this year was great "grounded travel" - we went from the UK to Sweden, going all the way by train! We stopped in multiple cities on the way, in Germany and Denmark as well as Sweden.
I want to tell you how we did it. But before all that there's one handy thing you need to know:
We met LOTS of people on our travels who said "Oh, I thought Interrail was just for under-25s". It's not. There are some extra-cheap offers for young people, but even without those it was the most economical way for us to do it.
I'm not going to tell you the details about Interrail passes, because I don't need to: the magnificent Seat 61 Interrail guide is all you need. We bought ourselves Interrail passes, and then added a couple of reservations: on some services, in particular Eurostar (the Channel Tunnel), you'll need a reserved seat in addition to the pass. I used the UK phone line for Deutsche Bahn to book my Eurostar and other reservations, and it was all really easy and friendly.
Taking the train in Europe is great. The trains are generally more modern, spacious and relaxing than UK trains, at least in the countries we've seen. You get to see some great countryside - fields, mountains, lakes, rivers, little town centres - from your seat. And of course there's none of the hassle of flying (getting to the airport; going through security; hanging round after security). We only had to show our passports at two points: the Eurostar, and at the Danish border when we got off a boat.
Oh yes, a boat: we didn't 100% exactly take the train all the way. There was one point in Denmark where we took a rail-replacement bus. And in order to get from Germany to Denmark we took the train that goes on a ferry, woo!
We met lots of lovely people on the way. We shared food with people, we got some excellent local tips for things to do. We even played Yahtzee with some strangers, and a memory game with a six-year-old Swedish girl :)
How far did we get? Stockholm. It takes two days to get from London to Stockholm (stopover in Hamburg or Cologne) and seat61 has some tips for other ways to do it.
We then went into the Swedish countryside and stayed in a... converted train! In a beautiful setting by a lake.
We spent about £350 each on the Interrail pass that lets you travel on 10 different days (over a stretch of two months), plus about £60 extra on reservations (mainly the Eurostar). In the end we only travelled on seven of the days, meaning we could have gone for a cheaper (£300) ticket, but we weren't sure which we'd need.
You can do it much cheaper if you don't want to visit other places on the way. We deliberately wanted to hop around.
Here's our route:
Some random tips for you:
And enjoy it! We did :)
I've been enjoying using Mopidy and ncmpcpp as my music player.
On Ubuntu, though, I encountered a persistent problem. I wanted to do two things at once: run Mopidy as a system service, and have it output audio via PulseAudio.
However, it kept failing to connect to PulseAudio. It seemed to be unable to make any sound play back, unless I manually killed pulse, after which it could start playing.
Yes I had followed all the instructions. I had changed Mopidy's config as well as PulseAudio's config. I tried everything, setting PulseAudio to be as permissive as possible (allowing remote connections etc).
The solution, as far as I can tell, for my standard Ubuntu 18.04 is: you should not do both of these things. This is because if you run Mopidy as a service, you're running something that starts up as soon as the computer boots, and doesn't use your login user ID. PulseAudio, however, does not run on startup, but is part of your login session. So it simply isn't there for Mopidy to connect to, until you log in. (I still don't know why that should mean it's unable to connect even after logging in. I suspect it's because it's running under a different user ID.)
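If you want to check this diagnosis on your own machine, here's a minimal sketch, assuming a systemd-style Ubuntu where PulseAudio's per-user socket lives under /run/user/&lt;uid&gt; and the pactl tool is installed:

```python
# Quick diagnostic sketch: is there a PulseAudio daemon for THIS user session?
# (Assumes a systemd-style Ubuntu; the per-user socket path may differ elsewhere.)
import os
import subprocess

socket_path = "/run/user/{}/pulse/native".format(os.getuid())
print("Per-user PulseAudio socket exists:", os.path.exists(socket_path))

# `pactl info` connects to the daemon the same way a client such as Mopidy would,
# so if this fails, Mopidy will fail too.
result = subprocess.run(["pactl", "info"],
                        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
print("PulseAudio reachable:", result.returncode == 0)
```

Run it once from your desktop session and once as the service's user (e.g. via sudo -u) and you should see the difference.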
So here's what I've got now: instead of running Mopidy as a service (using mopidyctl to control it), I'm running it as a "startup application", i.e. something that runs on my own user account as part of my graphical desktop login. On Ubuntu there's a program called "Startup Applications" that you use to add/remove things easily. The one small drawback with this approach is that the music player won't be running if the machine reboots and I haven't logged in yet. However, there's usually a logged-in session running.
You might think that "running as a service" and "outputting audio via PulseAudio" would be compatible, especially as they're both covered on the same page of documentation on the Mopidy website. Perhaps there's an extra trick (e.g. some permissions) that my system needs. But this non-service setup works fine.
The OSM UK community is great, but it's hard to guarantee we can do the detective work to spot all 800,000 solar PV installations. There's a "long tail" of solar panels tucked away down side-streets. It's very much a needle in a haystack, and we would benefit from as many hints as possible.
(We're already working with (a) machine learning and (b) official data sources. They're good sources of hints too, but not the full picture.)
We could ask the general public for help with this. Almost everyone must pass a solar panel during their daily commute, their weekend stroll, or suchlike. But we can't expect the general public to use map editing tools, or indeed anything that requires technical commitment or expertise - nor anything that requires login or user registration.
Can we make a tool that makes it so simple, that many thousands of people can send in just one or two sightings each?
We don't have the resources to make a fancy phone app. (And would people use it if we did?)
Option one: the "drop a pin in a map" approach. Provide a simple webpage which lets people, with no login required, put a pin at a location where they think there's a solar panel. This is fairly easy to code (and could use the OSM Notes API, e.g. with a specific pre-agreed template for the Note text). However, it's not ideal for people out-and-about with a smartphone, since it's all about the top-down bird's-eye view.
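For what it's worth, the coding here really is small. A minimal sketch of the reporting call, assuming the OSM Notes API's anonymous note-creation endpoint and an invented template text:

```python
# Sketch: file a "possible solar panel" report as an OSM Note.
# The Notes API allows anonymous note creation; the template text is hypothetical.
import requests

OSM_NOTES_URL = "https://api.openstreetmap.org/api/0.6/notes"

def report_panel(lat, lon):
    """POST one sighting as an OSM Note at the given coordinates."""
    params = {
        "lat": lat,
        "lon": lon,
        "text": "Possible solar panel here #solarspotting",  # pre-agreed template
    }
    resp = requests.post(OSM_NOTES_URL, params=params)
    resp.raise_for_status()  # the API returns the created note as XML
    return resp.text
```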
Option two: people can take a photo of a solar panel they see, and post it to a service they already use. (Smartphones often record GPS location along with photos.) Posting to Twitter (with a particular hashtag) would be easy to set up and to scrape. However not everyone uses Twitter. Loads of people use WhatsApp. Can they report their solar spottings directly through a WhatsApp number? It would need someone to set up a number, fine - and then the coding required is to create a system that can slurp whatever photos were sent in, do a bit of sanity-checking and maybe some basic "bot" interactivity, and output a dataset of suggested-geolocations for panels. (These suggested-geolocations would not go into OSM directly, they're not appropriate for that. We can simply provide them as a dataset for mappers to refer to.)
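The geolocation step of that slurping system is also small, provided the EXIF survives delivery (worth checking: some messaging services strip photo metadata). A sketch using the Pillow library:

```python
# Sketch: pull a suggested geolocation out of a submitted photo's EXIF tags.
# Assumes the delivery service hasn't stripped the metadata (worth checking!).
from PIL import Image
from PIL.ExifTags import TAGS, GPSTAGS

def _to_degrees(dms, ref):
    """Convert EXIF (degrees, minutes, seconds) rationals to signed decimal degrees."""
    deg = float(dms[0]) + float(dms[1]) / 60.0 + float(dms[2]) / 3600.0
    return -deg if ref in ("S", "W") else deg

def photo_location(path):
    """Return (lat, lon) from a JPEG's GPS tags, or None if they're absent."""
    exif = Image.open(path)._getexif() or {}
    gps_raw = next((v for k, v in exif.items() if TAGS.get(k) == "GPSInfo"), None)
    if not gps_raw:
        return None
    gps = {GPSTAGS.get(k, k): v for k, v in gps_raw.items()}
    try:
        return (_to_degrees(gps["GPSLatitude"], gps["GPSLatitudeRef"]),
                _to_degrees(gps["GPSLongitude"], gps["GPSLongitudeRef"]))
    except KeyError:
        return None
```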
Thanks to Max+Esther at 10:10 for suggesting the second idea. I approached them because 10:10 has previously been involved in something a little bit similar (a mobile app for spotting roofs that would be good places for new panels). They've clearly got the right idea about the kind of simple everyday interaction that's needed.
The H-index is one of my favourite publication statistics. It's really simple to define: a person's H-index is the largest number H such that they have H publications cited at least H times each. It's robust to outliers: if you have a million publications with no citations, or one publication with a million citations, this barely influences the outcome - it's the "core" of your H most-cited publications that matters. This makes it quite a nice heuristic for the academic impact of a body of work. A common source of the H-index is Google Scholar, which automatically calculates it for each scholar who has an account, and influential academics with long publication records typically have a high H-index.
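The definition is compact enough to compute in a few lines. A quick sketch, with made-up citation counts:

```python
def h_index(citations):
    """Largest h such that h publications each have at least h citations."""
    h = 0
    for rank, count in enumerate(sorted(citations, reverse=True), start=1):
        if count >= rank:
            h = rank  # the rank-th most-cited paper still has >= rank citations
        else:
            break
    return h

print(h_index([10, 8, 5, 4, 3]))   # 4
print(h_index([1000000]))          # 1: a single mega-hit barely registers
print(h_index([0] * 1000000))      # 0: uncited papers don't count at all
```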
However, the H-index should not be used as a primary measure for evaluating academics, e.g. for recruitment or promotion.
The main reason is that it's straightforward, in fact almost trivial, to manipulate your own H-index. You can make it artificially high.
Google Scholar doesn't exclude self-citations from its counting. It even counts self-citations in preprints, so the citations might not even be peer-reviewed. You could chuck a handful of hastily-written preprints into arXiv just before you apply for a job. (Should Google exclude self-citations? Yes, in my opinion: it would be trivially easy, given that they have ground truth of which academic "owns" which paper. However, that wouldn't remove the vulnerability, because pairs of authors could go one level beyond and conspire to cross-cite each other, etc.) Self-citations are often valid, but they're also often used by academics to promote their own previous papers, so it's a grey area.
Google Scholar often automatically adds papers to a person's profile, using text matching to guess if the author matches. I've seen real examples in which an academic's profile included extremely highly-cited papers... that were not by them. In fact they were from completely different research topics! Google's text-matching isn't perfect, and like most text-matching it often has a problem with working out which names are actually the same author.
You can further manipulate your H-index by choosing how to publish: you can divide research outputs into multiple smaller publications rather than single integrated papers.
Or you can do that after the fact, by tweaking your options in Google about whether two particular publications should be merged into one record or not. (Google has this option, since it often picks up two slightly-different versions of the same publication.)
Most of the vulnerabilities I've listed relate to Google's chosen way of implementing the H-index; however, at least some of them will apply however it is counted.
The H-index is a heuristic. It's OK to look at it as a quick superficial statistic, or even to use it as part of a general assessment making use of other stats and other evidence. But I'm increasingly seeing academic job adverts that say "please submit your Google Scholar H-index". This should not be done: it sends a public signal that this number is considered potentially decisive for recruitment (which it shouldn't be), creating a strong incentive to game the value. It also enforces a new monopoly position for a private company, demanding that academics create Google accounts in order to be eligible for a job. Academia is too important to have single points of failure centred on single companies (witness the recent debates around Elsevier!).
When trying to sift a large pile of applications, people like to have simple heuristics to help them make a start. That's understandable. It's naive to think that one's opinion isn't influenced by the first-pass heuristics - and so it's vital that you use heuristics that aren't so trivially gameable.
New journal article from us!
"Automatic acoustic identification of individuals in multiple species: improving identification across recording conditions" - a collaboration published in the Journal of the Royal Society Interface.
For machine learning, the main takeaway is that data augmentation is not just a way to create bigger training sets: used judiciously, it can mitigate the effect of confounds in the training data. It can also be used at test time to check a classifier's robustness.
For bioacoustics, the main takeaway is that previous automatic acoustic individual ID studies may have been overconfident in their claimed accuracy, due to dataset confounds - and we provide methods to try and quantify such issues, even without gathering new data.
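To make the augmentation idea concrete, here's an illustrative sketch - not the exact pipeline from the paper - of mixing background noise into training clips at a controlled signal-to-noise ratio, so that a classifier can't latch onto recording-condition cues:

```python
import numpy as np

def mix_at_snr(signal, noise, snr_db):
    """Mix a background-noise clip into a training clip at a chosen
    signal-to-noise ratio (in dB), returning the augmented clip."""
    noise = np.resize(noise, signal.shape)     # loop/trim noise to match length
    sig_power = np.mean(signal ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12  # avoid division by zero
    # Scale noise so that sig_power / (scale**2 * noise_power) == 10**(snr_db/10)
    scale = np.sqrt(sig_power / (noise_power * 10 ** (snr_db / 10)))
    return signal + scale * noise
```

The same function can be reused at test time: progressively degrade the test clips and watch how gracefully the classifier's accuracy holds up.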
This journal article is the output of a nice collaboration we've been working on, trying to bring machine learning closer to solving the problems zoologists really need solved. It's been very pleasant working on these ideas with Pavel Linhart and Tereza Petrusková (I didn't actually meet Martin Šálek!). The problem of detecting individual animals' vocal signatures is not yet solved, but I hope this paper nudges us part of the way there, and helps the field get there more efficiently through careful use of audio datasets.
Where are all the solar panels in Britain? Are they in the south? The sunny east? The countryside, the city?
The UK's energy regulator, Ofgem (the Office of Gas and Electricity Markets), publishes open data about the solar PV installations that they know about. In the latest "feed-in tariff" (FiT) data, there are about 800,000 of them. The "installed capacity" adds up to about 4.9 gigawatts, about half of which comes from big industrial field-scale installations and half from domestic rooftop solar.
It would be handy to know where the solar panels are - for example, if you're searching for solar panels to map...
For privacy purposes, Ofgem don't publish exact locations, nor unique IDs, in their big spreadsheet. So the data aren't perfect for mapping, but they do give us the postcode district for 90% of these 800 thousand. So, using that postcode info, I've taken their data and simply plotted them on a choropleth. Let's take a look!
Before you look, please note that I'm plotting the raw numbers per postcode district, and NOT normalising the data to account for the size of the district. This partly explains why the plots look "dark" in the regions (such as London) which are chopped up into lots of small districts. Smaller districts should have fewer things in... but on the other hand, smaller districts are supposed to equate to higher density of households, so maybe the postcode district is a good unit of analysis after all.
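The processing behind these maps is simple: group the spreadsheet rows by postcode district, then join onto district boundary polygons. A sketch using pandas and geopandas - the file and column names here are hypothetical stand-ins, so check them against the actual Ofgem download:

```python
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt

# Hypothetical file/column names -- check them against the actual Ofgem download.
fit = pd.read_csv("fit_installations.csv")
solar = fit[fit["Technology"] == "Photovoltaic"]

# One row per postcode district: how many installations, and how much capacity.
per_district = solar.groupby("Postcode district").agg(
    installations=("Postcode district", "size"),
    capacity_kw=("Installed capacity (kW)", "sum"),
)
per_district.to_csv("fit_by_district.csv")

# Join onto district boundary polygons and draw the choropleth.
districts = gpd.read_file("postcode_districts.geojson")
merged = districts.merge(per_district, left_on="name", right_index=True, how="left")
merged.plot(column="installations", cmap="viridis", legend=True)
plt.show()
```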
Here are the plots: three maps showing, respectively, the raw number of installations per district, the total installed capacity in each district, and - to give an idea of household density - the number of households in each district according to census data:
And here's a CSV spreadsheet of the summary FiT numbers I used to plot these. Sorry for not showing (Northern) Ireland, it's not in the data I found.
(The CSV and the images are all derived from Ofgem's FiT data which are published under the Open Government Licence.)
Note that there are A LOT of caveats about this data. About 10% of the solar installations (80 thousand!) have their postcode district listed as "unknown". Also, some postcodes are allegedly not quite right (e.g. some of them are the postcode of the person who registered, not the location of the thing itself). Some of the installations listed might have been discontinued, and we don't really have much way of knowing. Oh, and... the postcode area data I'm using seems to have some omissions, hence the occasional white gap in Britain. But notwithstanding all that, this gives us some indication of the distribution.
One thing that pops out to me is that these three plots don't seem very correlated - I'd have expected them all to be highly correlated. For some reason there seems to be a relatively high number of small-capacity installations stretching from Yorkshire down into Essex. There's plenty of regional variation and clustering, which may be due to geographical/weather differences, or perhaps to local initiatives.