Monday, March 7, 2022

Lies, Damn Lies, and Digitization

Around the year 2000 the data which now makes up the Pleiades 'database' was created by digitization, either directly from Barrington Atlas maps or from maps that contributed to the Barrington Atlas.  This data has not been looked at, proofed, or corrected since that time.

How do I know?

Let's take a look.

On the coast of southern Laconia a small peninsula juts out into the Aegean.  For centuries it has been the home of the community of Monemvasia - a world-renowned tourist destination.  In antiquity it was called the promontory of Minoa.  Its location is 36.683 N and 23.05 E.  I show it here as it appears in the Barrington Atlas.

Pleiades has 'Minoa Prom.' in a very different place: 36.75 N and 23.25 E.  How far is the real Minoa Pr. from the Pleiades marker?  More than eighteen kilometers, and the marker sits smack dab in the middle of the ocean.  Here it is in Google Earth:

The distance between the two is more than 18 km.

Now I'm not picking on Pleiades because of this one error.  There are lots of errors in my own Mycenaean Atlas.  I have made errors of commission, omission, faulty reasoning, and pure ignorance.  I know this because I sometimes catch them.  This error of Pleiades', however, is something different.  Failure to see this glaring digitizing error results from never actually working with their own data - a dataset which has been in existence for more than twenty years.  

There are more examples of failure to catch digitization errors:

Pleiades 603253 is described as "An ancient place, cited: BAtlas 61 E4 unnamed quarry (on Horomedon M. on Kos)".  Pleiades locates it in the middle of the strait separating Kos from the Turkish mainland.  This is about 8.7 km distant from its true location, which is shown on the Barrington Atlas at about 36.8312 N, 27.234 E.  This is how it looks in Google Earth:

Mt. Dhikeos on Kos is the ancient Horomedon.  The radius of the circle centered on Pleiades 603253 and extended to the quarry's approximate location in the Barrington Atlas is ~8700 m.

Pleiades 570688 is called "Spiraion Pr." and placed at 37.75 N, 23.25 E, in the middle of the Saronic Gulf.  The true location of Spiraion (the modern Spiri) is 37.8025 N, 23.1754 E, about 8500 m. distant.  Topostext (green arrow) places it correctly.

The Mycenaean Atlas Project's new Digital Atlas of Antiquity.
Pleiades position (red arrow) is about 8500 m. from the
true position (green arrow) where the Topostext marker is.

The ancient Grotta was located where the Chora of Naxos is located now, at 37.1084 N, 25.3748 E.  Pleiades (599630) has it in the middle of the bay at 37.1129 N, 25.3783 E, just over 500 m. distant.  This error, though very minor, is a particular clue because the point was digitized accurately from the printed map.  The Barrington Atlas shows the symbol in the middle of the bay (in order to print clear of other labelled places), in the very place Pleiades shows it.  The symbol was offset on paper for legibility, and the digitized point faithfully reproduces that offset instead of the site's real position.  No one caught this when going from printed map to digitized data point.

The harbor of Naxos with Pleiades 599630 in the middle of the bay just as depicted in the Barrington Atlas.

Pleiades 585129, described as "An ancient place, cited: BAtlas 59 B3 unnamed wall (Phalerikon Teichos, Leophoros Syngrou)", is shown deep in the Saronic Gulf at 37.875 N, 23.625 E, which is about 8700 m from the nearest part of the Phaleron wall.

The Piraeus.  End of the Phaleron wall at green. 
Pleiades 585129 in the middle of the bay (red).
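
For the three examples above where both coordinate pairs are quoted (Minoa, Spiraion, Grotta), the distances can be checked with a few lines of code.  What follows is only a minimal sketch: it assumes a spherical Earth with a mean radius of 6371 km and uses the standard haversine formula, and the function and variable names are my own for illustration.  It is not necessarily how the distances quoted in this post were measured.

```python
import math

EARTH_RADIUS_M = 6_371_000  # assumed mean Earth radius (spherical approximation)

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

# (name, true lat, true lon, Pleiades lat, Pleiades lon), all taken from the text above
CASES = [
    ("Minoa Pr.",    36.683,  23.05,   36.75,   23.25),
    ("Spiraion Pr.", 37.8025, 23.1754, 37.75,   23.25),
    ("Grotta",       37.1084, 25.3748, 37.1129, 25.3783),
]

distances = {name: haversine_m(a, b, c, d) for name, a, b, c, d in CASES}

for name, meters in distances.items():
    print(f"{name}: {meters / 1000:.1f} km")
```

Under this approximation the three offsets come out at roughly 19 km, 8.8 km, and 0.6 km, in line with the rough figures given above.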

I found these problems in just a few minutes because the Mycenaean Atlas Project is now offering a way to see the entire Pleiades data set.  I anticipate running across many more of these digitizing errors.

At one time I thought I saw a curious error signature in Pleiades' data.  There appeared to be two different values around which their errors tended to cluster, which made the error distribution bimodal.  We even see that a little bit here.  Three of the five errors I've noted here are about 8500-8700 m.  That has to be a regularity.  I was puzzled by this when I first saw it, but now I may have discovered the reason.  I suspect that Pleiades digitized their data from maps of different scales.  The same slip on paper covers more ground on a small-scale map than on a large-scale one, so errors would be greater on small-scale maps and smaller on large-scale maps.
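
The relationship between map scale and ground error is simple enough to put in numbers.  The sketch below is illustrative only: the 1 mm placement error and the three scales are assumptions chosen to show the effect, not values I have confirmed for any particular map sheet, and the function name is my own.

```python
def ground_error_m(map_error_mm, scale_denominator):
    """Ground distance in meters covered by a given distance on paper.

    map_error_mm      -- size of the slip on the printed map, in millimeters
    scale_denominator -- the N in a 1:N map scale
    """
    return map_error_mm * scale_denominator / 1000

# Hypothetical scales, from large-scale (detailed) to small-scale (overview)
ASSUMED_SCALES = [150_000, 500_000, 1_000_000]

for n in ASSUMED_SCALES:
    print(f"1 mm on a 1:{n:,} map = {ground_error_m(1, n):,.0f} m on the ground")
```

If points really were digitized from sheets at two or more scales, the same habits of placement would produce error clusters at correspondingly different distances, which is the kind of bimodal pattern described above.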

What factors might be responsible for Pleiades' lack of interest in their own data?  There are several.

1. Size.  The Pleiades database (if that's the word I want) contains nearly forty thousand items.  The consensus among people who work in creating maps of antiquity is that it takes at least one hour per data point.  Often it takes five or six hours.  In six years of steady work on the Mycenaean Atlas I have mapped about eleven thousand sites both modern and Bronze Age.  A man-year is 2200 hours.  Six man-years is 13200 hours.  So: 1.2 hours per site.  Using the same figure of 1.2 hours on 38000 sites (the size of Pleiades' database) gives 45600 hours, which suggests that it would require about 20.7 man-years to validate the sites in the Pleiades DB.  Who has that kind of time?  It's clear that when the Pleiades managers were faced with this potential cost they decided that what they had was based on the Barrington Atlas and so the raw digitized data was great.  Pleiades as a scholarly endeavor was never in the cards.

2. Pleiades appears not to have their data in a true database.  They provide downloads in several different formats: .csv, .kml, .json, .xml, .turtle, etc. but no .sql or anything that looks like true DB output.  I created a true relational DB from their data in about four days but I had to use their .csv version.  If I'm right in my opinion that there is no true relational DB behind Pleiades then it means that their basic representation of all this data is just incredibly long lists, probably in .json format.  If that's true it might explain the instability of their Peripleo product.  There is a lot of talk on the Internet about .json databases.  That's a contradiction in terms.  .json formatted data is just a heap of data, not a database.  It has none of the advantages of a DB and all of the disadvantages of a verbose, over-inflated list, including long run times which get longer as the list gets longer.

3. Lack of interest.  I have read a number of the Pleiades papers over the years.  From these papers it is obvious that the Pleiades team's real interest is Computer Science.  Their primary focus is the netherworld of symbol-based Artificial Intelligence where computers are 'semantically aware' and new knowledge is always just one Improved Reasoner away.  It is a world of religious belief, of the faith that moves mountains and in which continued funding is the new version of everlasting life.  It's not that they don't talk much about toponymy - they don't talk about toponymy at all.

4. Inability to easily see their data on a map.  It's not clear to me that the Pleiades team ever had decent tools for looking at their own data.
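
The effort estimate in item 1 is easy to reproduce.  This is just the arithmetic from that paragraph written out, using the figures quoted there (2200 hours per man-year, six years, about 11000 sites mapped, 38000 Pleiades sites); the function and variable names are mine.

```python
HOURS_PER_YEAR = 2200  # hours in one man-year, as stated in item 1

def validation_estimate(years_worked, sites_mapped, target_sites):
    """Return (hours per site, total hours, man-years) for validating target_sites."""
    hours_per_site = years_worked * HOURS_PER_YEAR / sites_mapped
    total_hours = hours_per_site * target_sites
    return hours_per_site, total_hours, total_hours / HOURS_PER_YEAR

hours_per_site, total_hours, man_years = validation_estimate(
    years_worked=6, sites_mapped=11_000, target_sites=38_000
)

print(f"{hours_per_site:.1f} hours per site")
print(f"{total_hours:,.0f} hours in total")
print(f"{man_years:.1f} man-years")
```

Note that 1.2 hours per site is close to the low end of the range given above.  At five or six hours per site the total would be several times larger.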

Soon, perhaps next time, I'm going to put the Pleiades enterprise into perspective and try to explain what they're really doing.
