Sunday, February 25, 2018

Pleiades Data - Does Crowd Sourcing for Toponymy Actually Work?

In a previous post I discussed the magnitude of the errors we could expect in a positioning system limited to two-place decimal fractions of a degree. I suggested that the average error we could expect in such a system at latitudes of approximately 35° N would be about 400 m. I thought that this imprecision would bar such a system from serious work in toponymy, and I asked what product would use a system so limited in mathematical range and so obviously unsuitable for the purpose.
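To show where a figure of about 400 m can come from, here is a minimal sketch that rests on stated assumptions: a spherical Earth (R = 6,371 km), coordinates rounded (not truncated) to the nearest 0.01°, and a true point uniformly distributed within its grid cell. Under those assumptions the expected error is the mean distance from the cell centre to a random point in the cell. The code evaluates that distance with the standard closed-form mean distance from a corner of a rectangle. The function names are illustrative only.

```python
import math

# Assumption: spherical Earth. A rigorous treatment would use an ellipsoid.
EARTH_RADIUS_M = 6_371_000.0
M_PER_DEG = math.pi * EARTH_RADIUS_M / 180.0  # metres per degree of arc


def mean_dist_from_corner(p: float, q: float) -> float:
    """Mean distance from one corner of a p-by-q rectangle to a uniform random point in it."""
    d = math.hypot(p, q)
    return (d / 3.0
            + (p * p / (6.0 * q)) * math.log((q + d) / p)
            + (q * q / (6.0 * p)) * math.log((p + d) / q))


def mean_rounding_error_m(decimals: int, lat_deg: float) -> float:
    """Expected horizontal error (m) when latitude and longitude are rounded to `decimals` places.

    Assumes the true point is uniform within the grid cell, so the rounded
    (stored) point sits at the cell centre.
    """
    step_deg = 10.0 ** (-decimals)
    cell_ns = step_deg * M_PER_DEG                                    # north-south side
    cell_ew = step_deg * M_PER_DEG * math.cos(math.radians(lat_deg))  # east-west side
    # The centre splits the cell into four quarter-cells that share a corner at the centre.
    return mean_dist_from_corner(cell_ns / 2.0, cell_ew / 2.0)


print(round(mean_rounding_error_m(2, 35.0)))
```

At 35° N this comes out at roughly 388 m, close to the 400 m figure. If coordinates were truncated rather than rounded, the stored point would sit at a cell corner, and the expected error would be exactly twice as large (about 776 m).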

The Pleiades Project is an attempt to adapt crowd-sourcing to the field of toponymy. They have received generous financial support from the National Endowment for the Humanities, amounting to $1,140,780. You can read about their grants here, here, here, and here.

What has the world of scholarship received for all this money?

Recently I was checking the positions of a random sample of classical and Hellenistic sites in Greece. The locations were derived from Pleiades. The list was selected not by me but by a colleague. Out of 45 locations, 17 (38%) had significant errors.

The smallest error was 357 m and the largest was 3,775 m. The average error, or arithmetic mean (within the erroneous part of the sample), was 1,514.7 m. The median error was 1,200 m. I present no standard deviation because the errors are not normally distributed. In fact, it appears as though the error distribution in the Pleiades data might be bimodal. This suggests that there is more than one underlying cause for the Pleiades system's inaccuracies.

Figure: errors from my error worksheet (y-axis: error in m).
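The check described above amounts to measuring the distance between each Pleiades position and a verified reference position, then summarizing the errors that exceed some cut-off. The sketch below assumes a spherical Earth and the haversine formula. `haversine_m`, `summarize_errors` and the 300 m `threshold_m` are all hypothetical: the post does not say which cut-off separated "significant" errors from the rest. As in the figures given in the post, the mean and median are taken over the flagged locations only.

```python
import math
from statistics import mean, median

EARTH_RADIUS_M = 6_371_000.0  # spherical-Earth assumption


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def summarize_errors(pairs, threshold_m=300.0):
    """Summarize position errors for ((gazetteer lat, lon), (reference lat, lon)) pairs.

    threshold_m is a hypothetical cut-off for a "significant" error.
    Summary statistics cover only the errors at or above the cut-off.
    """
    errors = [haversine_m(g_lat, g_lon, r_lat, r_lon)
              for (g_lat, g_lon), (r_lat, r_lon) in pairs]
    flagged = [e for e in errors if e >= threshold_m]
    summary = {
        "n": len(errors),
        "flagged": len(flagged),
        "percent_flagged": 100.0 * len(flagged) / len(errors) if errors else 0.0,
        "min_m": None, "max_m": None, "mean_m": None, "median_m": None,
    }
    if flagged:
        summary.update(min_m=min(flagged), max_m=max(flagged),
                       mean_m=mean(flagged), median_m=median(flagged))
    return summary
```

Reporting the median next to the mean, as the post does, is the safer choice when the distribution is skewed or possibly bimodal.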

The 13 locations with smaller errors may derive from crowd contributions. The 4 positions with the largest errors (Passaron, Skotoussa, Messene, Antigonia) may be remnants of the original digitization of the Barrington Atlas data, however that was accomplished. In other words, I am suggesting that crowd-sourcing appears to smooth out, but not eliminate, the original digitizing errors. I emphasize that these are suppositions on my part. But, clearly, the complicated history of Pleiades' data generation has left a signature in the error results.

Here is a link to my worksheet. Occasional references in that worksheet to sites in the form 'Fnnn' or 'Cnnn' may be resolved at the site.

If these results are upheld by others, then I would suggest that Pleiades is not an appropriate component of any scholarly work. If a system with a two-place fractional component has an average error of nearly 400 m, and each decimal place dropped multiplies that error roughly tenfold, then an average Pleiades error of more than 1,500 m suggests that Pleiades data, at some point in its generation, never had an accuracy better than 0.5 to 1.5 fractional places (10^-0.5 to 10^-1.5 degrees).
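The scaling step can be written out explicitly. If we assume the expected error grows tenfold for every decimal place dropped, a mean error e corresponds to roughly 2 - log10(e / 400 m) decimal places. The helper below is a hypothetical illustration of that assumption and is not a method taken from the original analysis.

```python
import math


def effective_decimal_places(mean_error_m, ref_error_m=400.0, ref_places=2):
    """Decimal places of a coordinate grid whose expected error equals mean_error_m.

    Assumes error scales linearly with grid spacing, so each decimal place
    dropped multiplies the error by ten. ref_error_m is the ~400 m figure
    for two decimal places at about 35 degrees N.
    """
    return ref_places - math.log10(mean_error_m / ref_error_m)


print(round(effective_decimal_places(1514.7), 2))  # mean of the flagged errors -> 1.42
```

For the largest error, 3,775 m, the same formula gives about 1.0 decimal places.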

I estimate that it takes at least 2 hours of research to reliably establish the location of a Bronze Age or later site in Greece. I do not know how many data points Pleiades claims, but if it is, for example, 10,000 points, then a reliability review of Pleiades would require an effort of about 20,000 man-hours. At 2,200 man-hours in a man-year, that would take about 9 man-years to complete. This is an order-of-magnitude estimate.
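The effort arithmetic is simple, but writing it as a function makes it easy to redo once the actual number of Pleiades places is known. The function name is hypothetical, and the 10,000 points remain the illustrative figure from the text.

```python
def review_effort_years(n_points, hours_per_point=2.0, hours_per_year=2200.0):
    """Order-of-magnitude effort, in man-years, for a point-by-point reliability review."""
    total_hours = n_points * hours_per_point
    return total_hours / hours_per_year


print(round(review_effort_years(10_000), 1))  # 20,000 hours / 2,200 per year -> 9.1
```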

Crowd-sourcing in toponymy studies does not appear to work.

Pleiades' defective data casts a shadow downstream, for example on such derivative products as Pelagios/Peripleo.

If Pleiades cannot undertake a good-faith reliability study, it should be rejected by the scholarly community.

1 comment:

  1. Pleiades isn't structured to provide a single, accurate set of coordinates, though I think it hopes to evolve in that direction. Its most useful role currently is as a set of identifiers that allow links to superior gazetteers. For example, the huge error for ancient Messene is the result of displaying a calculated representative point that includes one spurious DARMC location (from the modern village of Messene), in addition to a mildly inaccurate DARMC location plus a very accurate DARE location. (DARE has assimilated a bunch of Google Earth-validated ToposText points for Greece, and from other sources as well, but uses Pleiades IDs as an easy pivot to other resources.) Some of the tools Pleiades funding has produced for the purpose of improving its data are not being used very much; one problem is a technological gap between laborious on-the-ground collectors (like me) and people who automate things.