Metadata aggregation services

A presentation by Paul Walk at OAI7, covering the 'aggregating metadata' space.

There's a lot of stuff here, most of which is very good but some of which I think I disagree with (but I suspect the slides have lost something without hearing what Paul said).

Slide 3 - the "why aggregate metadata?" bit is missing a "because you got your content architecture wrong" bullet (see below). Also, I see little evidence that aggregating metadata raises the visibility of content thru Google by some kind of 'gaming'. I mean, I can see how this might be technically possible, but I'm not sure that anyone is currently trying to do it and I'm not really sure how well it would work if they did. I think slide 10 touches on this.

Slide 5 - "data" or "data service" presumably refers forward to the comment on slide 11 that "a well supported API might be more open than a dump of gigabytes of data" but I'm not sure the slide makes this clear. The use of 'might' here is a bit of a cop-out. I mean, yes, it might, but then again, it might not - so what? API vs data dump has no impact on the semantic interoperability issues around data - but I agree that an API might play to the mindset (even if it's a stupid mindset) of developers who don't like dealing with large datasets. It's not totally clear to me that we can infer developers' preferences around metadata aggregations from the preferences of different developers working with different data.

Slide 15 - "Google, etc. are committed to precision thru microdata" - yes... in some areas but it's not clear what the extent of their commitment is going to be.

I think one could make the argument that the need for metadata aggregation (as we most commonly see it - i.e. around institutional repositories) is really just a recognition that we got the repository architecture wrong in the first place. Or, to put it another way, discipline repositories care a lot less about metadata aggregation than institutional ones.

Also... the weirdest thing about metadata aggregation is that we most often see it in the context of supposedly open full text resources - i.e. where the full text of the content could be aggregated and indexed (as Google does) but where we choose, for some weird reason, to aggregate only the metadata most of the time :-)