I had a great time at the Biodiversity_next conference, meeting a lot of people involved in biodiversity informatics. (I was part of an "AI" session discussing the state of the art in deep learning and bioacoustic audio.)
I was glad to get more familiar with the biodiversity information frameworks. GBIF is one worth knowing: an aggregator for worldwide observations of species. It holds millions and millions of observations. Plants, animals, microbes... expert, amateur, automatic observations - lots of different types of "things spotted in the wild". They use the cutely-named "Darwin Core" as a data formatting standard (informatics folks will get the joke!).
Here's my first play at downloading some GBIF data. I downloaded all the data they've got about rose-ringed parakeets in Britain - the bright green parrots that are quite a new arrival in Britain, an invasive species which we can see in many city parks now. I plotted the observations per year. I also plotted a second species on the same chart, just to have a baseline comparison. So the parakeets are plotted in green, and the other species (common sandpiper) in yellow:
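For anyone wanting to try the same thing, here is a minimal sketch of the "observations per year" counting step. It is not necessarily how the chart was made. It assumes a tab-separated export with a `year` column (a Darwin Core term); the function name `observations_per_year` and the sample rows are made up for illustration and are not real GBIF data.

```python
import csv
import io
from collections import Counter


def observations_per_year(tsv_text):
    """Count occurrence records per year in a tab-separated, GBIF-style export."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    counts = Counter()
    for row in reader:
        year = (row.get("year") or "").strip()
        if year:  # skip records that have no year filled in
            counts[int(year)] += 1
    return dict(sorted(counts.items()))


# Made-up sample rows for illustration only (assumed column names).
SAMPLE = (
    "species\tyear\tindividualCount\n"
    "Psittacula krameri\t2008\t2\n"
    "Psittacula krameri\t2009\t5000\n"
    "Psittacula krameri\t2009\t\n"
    "Psittacula krameri\t\t1\n"
)

print(observations_per_year(SAMPLE))  # {2008: 1, 2009: 2}
```

Each row counts once, however many birds it reports, and rows without a year are simply dropped.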
Many caveats with this data. For a start, each dot represents an "observation" not an "individual" - some of the observations are of a whole flock. I chose to keep it simple, not least because some of the observations list "5000" birds at a time, which may well be true but might swamp the visualisation! Also, some of the co-ordinates are scrambled, for data-privacy reasons - you can see it in the slight grid-like layout of the dots - and some are exact.
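To make the "observation" versus "individual" distinction concrete, here is a small sketch that tallies both. It assumes each record is a dict with an optional `individualCount` field (a Darwin Core term), and it treats a blank or missing count as one bird, which is purely my assumption for illustration. The helper name and the rows are hypothetical.

```python
def count_records_and_individuals(rows):
    """Return (number of records, total individuals) for a list of record dicts."""
    records = 0
    individuals = 0
    for row in rows:
        records += 1
        raw = (row.get("individualCount") or "").strip()
        # Assumption: a blank or non-numeric count means "at least one bird".
        individuals += int(raw) if raw.isdigit() else 1
    return records, individuals


# Made-up records: one flock-sized count dominates the total.
rows = [
    {"individualCount": "2"},
    {"individualCount": "5000"},
    {"individualCount": ""},
    {},
]

print(count_records_and_individuals(rows))  # (4, 5004)
```

Four records become 5004 individuals, which shows how a single large flock could swamp a chart based on individuals.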
Further, I don't think I have any way of normalising for the amount of survey effort, at least for most of the data points. There seems to be a strange spike of parakeet density in 2009 - probably due to some surveying initiative, not to some massive short-term surge in the bird numbers! I think if the numbers really had increased eight-fold and then fallen back again, someone would have said something...
Regarding "survey effort": GBIF does offer ways of indicating survey effort, and also "absences" as well as "presences", but most of the data submissions don't make use of those fields.
The sandpiper data fluctuates too. There's definitely an increase as time goes by, primarily due to the increasing number of surveys adding to GBIF. That's why I added a comparison species. Even allowing for that, you can clearly see the difference in distribution between the two.
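One crude way to put the comparison species to work is to divide the parakeet counts by the baseline counts for each year. This is only a sketch of the idea, not something I'd claim properly corrects for survey effort, since the two species are not recorded in the same places or by the same people. The function name and the numbers are invented for illustration.

```python
def ratio_to_baseline(target_counts, baseline_counts):
    """Per-year ratio of target records to baseline records.

    Years with no baseline records are skipped rather than guessed at.
    """
    ratios = {}
    for year in sorted(target_counts):
        base = baseline_counts.get(year, 0)
        if base > 0:
            ratios[year] = target_counts[year] / base
    return ratios


# Invented yearly record counts, not real GBIF figures.
parakeet = {2008: 50, 2009: 400, 2010: 60}
sandpiper = {2008: 100, 2009: 100, 2010: 120}

print(ratio_to_baseline(parakeet, sandpiper))  # {2008: 0.5, 2009: 4.0, 2010: 0.5}
```

If a spike shows up in the ratio as well as in the raw counts, as in this made-up example, it is probably specific to the target species' records (for instance a parakeet-focused survey) rather than a general rise in submissions.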