Sunday, March 11, 2018

Correspondent Comments on the Suitability of Pleiades Data for Scholars


A friend of mine replied to my post on the inaccuracy and unsuitability of Pleiades data for scholarly work (http://mycenaeanatlasproject.blogspot.com/2018/02/pleiades-data-does-crowd-sourcing-for.html).

I reproduce his letter here:

"Pleiades isn't structured to provide a single, accurate set of coordinates, though I think it hopes to evolve in that direction. Its most useful role currently is as a set of identifiers that allow links to superior gazetteers. For example, the huge error for ancient Messene is the result of displaying a calculated representative point that includes one spurious DARMC location (from the modern village of Messene), in addition to a mildly inaccurate DARMC location plus a very accurate DARE location. (DARE has assimilated a bunch of Google Earth-validated ToposText points for Greece, and from other sources as well, but uses Pleiades IDs as an easy pivot to other resources). Some of the tools Pleiades funding has produced for the purpose of improving its data are not being used very much -- one problem being a technological gap between laborious on-the-ground collectors ... and people who automate things."

Now I'll go through it piece by piece (passages from the letter are in quotation marks; my replies follow each one).

"Pleiades isn't structured to provide a single, accurate set of coordinates,"

So then where do we go from here?

" ..., though I think it hopes to evolve in that direction."

Spoiler alert: they're not going to. This would involve an enormous amount of work - actual scholarship. They're not going to commit to this because they think that this can be done on the cheap - through copying other data sets or through crowd sourcing. That's not the way that any of this works. My experience with them is that they will correct an error if you bring it forcefully to their attention but not otherwise.

"For example, the huge error for ancient Messene is the result of displaying a calculated representative point that includes one spurious DARMC location (from the modern village of Messene), in addition to a mildly inaccurate DARMC location plus a very accurate DARE location. (DARE has assimilated a bunch of Google Earth-validated ToposText points for Greece, and from other sources as well, but uses Pleiades IDs as an easy pivot to other resources). "

You've explained Messene but what about all the other errors? Nor have you questioned my estimate that approx. 1/3 of Pleiades has serious errors. What you're describing sounds like a real incestuous tangle. I don't even want to get into unpacking this beyond saying that topographical accuracy does not come from copying other data sets. It's like the old saw about buying a used car: you're just buying someone else's problems.
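The mechanism my correspondent describes for Messene is easy to illustrate. What follows is only a minimal sketch: it assumes that the representative point is a plain average of the contributing locations, which may not be how Pleiades actually calculates it, and the three coordinates are made up for illustration. They are not taken from Pleiades, DARMC, or DARE.

```python
import math

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def representative_point(points):
    """Plain average of latitudes and longitudes.

    ASSUMPTION: a stand-in for whatever calculation Pleiades really uses.
    Only reasonable for points that lie close together.
    """
    lats = [p[0] for p in points]
    lons = [p[1] for p in points]
    return (sum(lats) / len(lats), sum(lons) / len(lons))

# HYPOTHETICAL coordinates, invented for this example only.
accurate = (37.18, 21.92)    # stands in for the very accurate location
mildly_off = (37.17, 21.93)  # stands in for the mildly inaccurate location
spurious = (37.05, 22.01)    # stands in for the location of the wrong (modern) place

with_spurious = representative_point([accurate, mildly_off, spurious])
without_spurious = representative_point([accurate, mildly_off])

# How far each averaged point lands from the accurate location.
error_with = haversine_km(accurate, with_spurious)
error_without = haversine_km(accurate, without_spurious)

print(f"error with the spurious point:    {error_with:.2f} km")
print(f"error without the spurious point: {error_without:.2f} km")
```

With these invented numbers the averaged point lands several kilometers from the accurate location, and dropping the spurious point brings it back to within a kilometer. One bad copied location is enough to spoil two better ones.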

" Some of the tools Pleiades funding has produced for the purpose of improving its data are not being used very much -- one problem being a technological gap between laborious on-the-ground collectors (like me) and people who automate things. "

Sounds like you're describing Recogito. Is that what you mean? Are there other tools that they support? I tried out their conversion tool Geocollider. It failed miserably.

Everything about Pleiades/Pelagios/Peripleo is a sham. The Barrington Atlas data was useful for its printed purpose but now they're trying to roll that data over into the digital world where its approximative nature makes it unfit for use. And they've wrapped the whole thing up with bad and outmoded ideas - not from scholarly practice, from anthropology or toponymy or history or classical studies or any other relevant discipline - but from computer science. None of what they're doing (crowd sourcing and linked data) has anything to do with any scholarly practice or purpose but this is what they're selling and they're getting pots of money for it. In the end actual scholars wind up exactly where they started - having to do the topography of the Mediterranean from scratch. I actually know a fellow (from a very prestigious school) who's preparing a study of Mediterranean habitations. I was shown one of his spreadsheets and it was stuffed with errors since he had relied on Pleiades. In fact that's where my blog post came from.

I've asked myself what their game is. I suspect that what they want is to license their data (or perhaps their follow-on project Pelagios/Peripleo) to schools for so much per seat and deny access to non-customers. That's a time-honored approach in Computer Science. First get a lot of contributors to fork over their work for nothing under the name of something noble-sounding like 'Open Data' or the 'Semantic Web'. Second, license the whole to third parties and keep all the money.

I'm not sure that they can really carry this out successfully, though, because it's the Underpants Gnomes business model.


Bibliography

DARE: The Digital Atlas of the Roman Empire.  https://dare.ht.lu.se/

DARMC: The Digital Atlas of Roman and Medieval Civilizations. https://darmc.harvard.edu/

Geocollider: https://pleiades.stoa.org/news/blog/introducing-geocollider

Recogito:  https://recogito.pelagios.org/

Underpants Gnomes: https://vimeo.com/79954057
