Yesterday I went to a philosophy talk by Margaret Moore, on timbre and the ontology of music. I'd better say up front that I'm not a philosopher and I don't know the literature she was referring to. But I found it a frustrating talk - she was considering a position she calls "timbral sonicism" attributed to Julian Dodd, and asserting what she held to be problems with adding timbre (as well as pitch and duration) into the account of what a musical work can be, in terms of it being a normative description which a particular performance might or might not match.
I thought her argument had a couple of weird components in it: the dodgy assertion that there can never be a synthesiser whose sound is indistinguishable from that of a real instrument (unless the synth actually is functionally equivalent), and the requirement that a performance would have to match all dimensions of timbre (rather than just, say, the brightness dimension) before Dodd's inclusion of timbre as normative could make sense. But those problems are irrelevant for me because this "timbral sonicist" view is part of the "aesthetic empiricist" approach in which you have to claim that our evaluation of a music performance must only be done in terms of the sonic content of that performance. This is so clearly misguided that I don't see the point of talking about it: this is the main reason I was frustrated. Music performances are so many and varied, and many other criteria come into our assessments - not only assessments of whether it was a good performance, but more importantly of whether it was indeed a performance of a particular work. We judge based on our own background and cultural expectations, on what we see, and on what we believe (e.g. whether the performers are humans or holograms).
But there are some interesting things in this philosophical consideration of the ontology of music, and it led me to think, so let me address one issue in my own way (with an uninformed disregard for any literature on the topic!):
This question is one that was floating about: "What is a musical work?" and, more pertinently, "How do we judge whether a particular performance is indeed an instantiation of a particular musical work?"
For me there are two really important components to answer this:
The concept of "a musical work" only has meaning in some musical traditions, e.g. Western classical or Western pop. In other traditions (e.g. free improv, raga, and I think gamelan) the abstract structures that give form to a musical act have different granularities, and are brought to bear in different combinations.
As Moore said, a musical work can be described as an abstract "sound structure" or a "normative type". The latter is the one Moore prefers, and I think she draws some difference between those two, though I can't be sure what the exact differences are. I think the idea of a musical work as a normative type is a useful one, and it reminds me strongly of the idea of an abstract class or abstract type in object-oriented programming: a composer might specify a particular series of notes, for example, and not bother to specify every note's timbre, or not bother to specify which instrument must be used, so we consider it an incomplete specification. The specification is fuzzy as well as incomplete: a composer might specify "getting faster" but not exactly how much.
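To make the abstract-class analogy a bit more concrete, here is a minimal Python sketch. Everything in it is my own illustration rather than anything from Moore or Dodd: the class names, the four-note "work" and the tempo numbers are all hypothetical. The work fixes the notes, leaves the instrument and the exact tempi open, and carries one fuzzy constraint ("getting faster") that any rendition can be checked against.

```python
from abc import ABC, abstractmethod


class Work(ABC):
    """Hypothetical 'normative type': fixes the notes, leaves the rest open."""

    # Specified by the composer: a particular series of notes.
    notes = ["C4", "E4", "G4", "C5"]

    @abstractmethod
    def instrument(self) -> str:
        """Left unspecified by the composer; each rendition must decide."""

    @abstractmethod
    def tempo_at(self, note_index: int) -> float:
        """Tempo in beats per minute at a given note; exact values left open."""

    def gets_faster(self) -> bool:
        """Fuzzy constraint: 'getting faster', without saying by how much."""
        tempi = [self.tempo_at(i) for i in range(len(self.notes))]
        return all(a < b for a, b in zip(tempi, tempi[1:]))


class PianoRendition(Work):
    """One way of filling in what the work leaves open (illustrative)."""

    def instrument(self) -> str:
        return "piano"

    def tempo_at(self, note_index: int) -> float:
        return 90.0 + 5.0 * note_index  # a gentle accelerando


class SynthRendition(Work):
    """A different filling-in of the same specification (illustrative)."""

    def instrument(self) -> str:
        return "synthesiser"

    def tempo_at(self, note_index: int) -> float:
        return 60.0 + 20.0 * note_index  # a much steeper accelerando
```

Both renditions share the same notes and both satisfy gets_faster(), even though they disagree about instrument and about how much faster to get. The bare Work cannot be instantiated at all, which is roughly what I mean by calling it an incomplete specification.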
So in my way of thinking, putting these two points together, a musical work is not special: other abstract things that can be instantiated in a performance (genres, cliches, keys) are the same kind of normative type, and they don't have to sit in a hierarchical relationship to each other. Musical works don't have special status in general, but are a bundle of normative constraints which have a particular granularity that we are used to in Western music.
To say a musical performance is an instance of a particular musical work, then, we check if the constraints are satisfied. We'd need to allow for errors (a few constraints not met, a few constraints sort-of-met) - our tolerance depends on our expectations (maybe we tolerate timbre deviations more readily than pitch deviations, in a particular tradition; maybe we tolerate wider deviations in a school band than a professional orchestra). Criteria should also depend on context in the form of the background corpus - are enough constraints met that we can positively say this is a performance of work A and not of another work B?
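The checking procedure described above can be sketched roughly as follows. This is only an illustration under assumptions of my own: a performance is reduced to a small dictionary of hypothetical numeric features, each constraint returns a score between 0 (not met) and 1 (met), weights stand in for how much a tradition cares about each dimension, and the threshold and margin values are arbitrary.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Performance = Dict[str, float]  # hypothetical feature summary of a performance


@dataclass
class Constraint:
    name: str
    score: Callable[[Performance], float]  # 1.0 = met, 0.0 = not met
    weight: float = 1.0                    # how much this dimension matters


def near(key: str, target: float, tolerance: float, weight: float = 1.0) -> Constraint:
    """Fuzzy constraint: full score at the target, falling linearly to 0."""
    def score(p: Performance) -> float:
        return max(0.0, 1.0 - abs(p[key] - target) / tolerance)
    return Constraint(key, score, weight)


def match(performance: Performance, constraints: List[Constraint]) -> float:
    """Weighted fraction of constraints satisfied, allowing 'sort-of-met'."""
    total = sum(c.weight for c in constraints)
    return sum(c.weight * c.score(performance) for c in constraints) / total


def identify(performance: Performance,
             corpus: Dict[str, List[Constraint]],
             threshold: float = 0.7,
             margin: float = 0.1) -> Optional[str]:
    """Name a work only if it matches well AND clearly beats the alternatives."""
    ranked = sorted(((match(performance, cs), name)
                     for name, cs in corpus.items()), reverse=True)
    best_score, best_name = ranked[0]
    runner_up = ranked[1][0] if len(ranked) > 1 else 0.0
    if best_score >= threshold and best_score - runner_up >= margin:
        return best_name
    return None


# Illustrative corpus: pitch (as a MIDI note number) weighted above tempo.
corpus = {
    "A": [near("opening_pitch", 61, 1.0, weight=3.0), near("tempo_bpm", 120, 40.0)],
    "B": [near("opening_pitch", 64, 1.0, weight=3.0), near("tempo_bpm", 120, 40.0)],
}
slightly_sharp_and_slow = {"opening_pitch": 61.2, "tempo_bpm": 100.0}
```

With these made-up numbers, identify(slightly_sharp_and_slow, corpus) picks work A at the default threshold, while raising the threshold to 0.9 (the professional orchestra rather than the school band) gives None. The margin is the background-corpus part: a performance that matches A and B about equally well is not positively identified as either.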
But again, to describe it as work A vs work B is only really relevant in the Western idea of a "musical work", in which the piece (e.g. the sequence of notes) is so tightly specified that it's generally only ever a realisation of one work. In other situations, a performer might simultaneously be performing two traditional Irish tunes, woven in and out of each other, and that's the way these tunes are expected to be treated: the result is not a bastardised new work but a simultaneous realisation of two known normative types.
I must also state explicitly that I don't believe for a second that such normative types must only ever include acoustic or psychoacoustic properties (which is the line Moore was sticking to in her talk - whether to criticise it from within, or whether she believes it, I don't know). In some traditions it may be explicit or implicit that a work can only be played on a piano and not on a synthesiser: that's a constraint about the means of production, not about the sound that is produced. Our choice of how strongly to attend to that part of the specification affects our judgment of whether a particular performance counts as an instantiation of a particular work. But there is no a priori way to know what balance of judgments is correct: constraints are always fuzzy (was that definitely a C#, or was it slightly flat?) and pretty much any normative description of musical structure is under-specified.
In this view, pitch, timbre, rhythm, duration, instrumentation, lyrics, and potentially other stuff such as the performer's clothing all have the same status: they are examples of things that in the Western tradition are specified to a greater or lesser extent at the level of a "musical work". (Note that there's not much limit to what might be specified: in raga, the time of day is specified, though that idea might be a surprise to many Western listeners.) And musical works have the same status as genres, cliches, motifs etc, as bundles of constraints which I hope fit Moore's term "normative types". These constraints are brought to bear in what a performer chooses to do in a given performance, and also brought to bear by observers in deciding if it really was "a good/faithful rendition of the piece" or "a trad jazz show".
So is there a use for this? I can't speak for the philosophers, but in Music Information Retrieval I'm reminded of the task of "cover song identification", i.e. determining automatically if a recording is an instantiation of a particular piece (which might be represented as a score, or as a reference recording). All too often, this task is reduced depressingly quickly to the question of whether the melody or chord sequence matches sufficiently. This is an impoverished idea of the "cover song" and fails badly for many widespread genres - an obvious one is hip-hop, but also much club music.
If it were possible, I'd like to imagine a system which does something like "cover song identification" by identifying, from a wide range of potential dimensions, the specific constraints that a musical work represents, over and above the constraints of any assumed background such as genre or common corpus of known works. It would then use these constraints to identify matching instances. In order to do this usefully, it would need to identify enough constraints to distinguish a work from other candidate works, but would need to leave enough dimensions free (or loosely specified) to allow interpretative variation. What can be held fixed, and what can be allowed to vary, clearly depends on musical tradition, so the context for such an inference would need to be aware not just of a corpus of musical works but probably some cultural parameters that couldn't be inferred directly from audio, no matter how much audio is available.
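As a toy illustration of the first step of such a system (not a description of any existing cover-song method), here is one naive way it might guess which dimensions a work holds fixed. The assumptions are all mine: each recording is summarised by hypothetical numeric features, a dimension counts as "constrained" if known renditions of the work vary much less along it than the background corpus does, and the always_free argument stands in for the cultural knowledge that can't be inferred from audio.

```python
from statistics import mean, pstdev
from typing import Dict, FrozenSet, List

Features = Dict[str, float]  # hypothetical per-recording feature summary


def infer_constraints(renditions: List[Features],
                      background: List[Features],
                      always_free: FrozenSet[str] = frozenset(),
                      ratio: float = 0.5) -> Dict[str, float]:
    """Guess which dimensions the work fixes, and at what value.

    A dimension is kept as a constraint when its spread across known
    renditions is small compared with its spread across the background.
    Dimensions in always_free are never constrained (supplied by hand,
    standing in for what the tradition lets performers vary).
    """
    constraints = {}
    for dim in renditions[0]:
        if dim in always_free:
            continue
        within = pstdev([r[dim] for r in renditions])
        across = pstdev([b[dim] for b in background])
        if across > 0 and within / across <= ratio:
            constraints[dim] = mean([r[dim] for r in renditions])
    return constraints


# Made-up numbers: renditions agree on the melody feature but not on
# tempo or brightness, which vary as widely as in the background corpus.
renditions = [
    {"melody_feature": 0.90, "tempo_bpm": 80.0, "brightness": 0.2},
    {"melody_feature": 0.92, "tempo_bpm": 140.0, "brightness": 0.8},
    {"melody_feature": 0.88, "tempo_bpm": 110.0, "brightness": 0.5},
]
background = [
    {"melody_feature": 0.1, "tempo_bpm": 90.0, "brightness": 0.3},
    {"melody_feature": 0.5, "tempo_bpm": 120.0, "brightness": 0.6},
    {"melody_feature": 0.9, "tempo_bpm": 150.0, "brightness": 0.9},
]
constraints = infer_constraints(renditions, background)
```

On this toy data only melody_feature survives as a constraint, which is exactly the melody-centric picture I was complaining about; for a hip-hop corpus one would hope different dimensions come out as fixed. Passing always_free=frozenset({"melody_feature"}) shows how a hand-supplied cultural parameter can override what the audio statistics suggest.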