Sunday, February 25, 2018

Pleiades Data - Does Crowd Sourcing for Toponymy Actually Work?




In a previous post I discussed the magnitude of the errors we could expect in a positioning system limited to two-place decimal fractions.  I suggested that the average error that we could expect in such a system at latitudes approximately 35° N would be about 400 m.  I thought that this imprecision would bar such a system from serious work in toponymy and I asked what product would use a system so limited in mathematical range and so obviously unsuitable for the purpose.

The Pleiades Project is an attempt to adapt crowd-sourcing to the field of toponymy.  They have received generous financial support from the National Endowment for the Humanities to the amount of $1,140,780.  You can read about their grants here, here, here, and here.

What has the world of scholarship received for all this money?

Recently I was checking the positions of a random sample of Classical and Hellenistic sites in Greece.  The locations were derived from Pleiades.  This list was not selected by me but by a colleague.  Out of 45 locations 17 (38%) had significant errors.

The smallest error was 357 m. and the largest was 3775 m.  The average error or arithmetic mean (in the erroneous part of the sample) was 1,514.7 m.   The median error was 1200 m.   I present no standard deviation because the errors are not normally distributed. In fact it appears as though the error distribution in Pleiades data might be bimodal. This suggests that there is more than one underlying cause for the Pleiades system’s inaccuracies.   

From my error worksheet.  Y-dimension is error in m.


The 13 less erroneous locations may derive from crowd contributions.  The uppermost 4 positions  (Passaron, Skotoussa, Messene, Antigonia) may be remnants of the original digitization of the Barrington Atlas data – however that was accomplished.  In other words I am suggesting that it appears as though crowd-sourcing tends to smooth out but not eliminate the original digitizing errors.  I emphasize that these are suppositions on my part.  But, clearly, the complicated history of Pleiades' data generation has left a signature in the error results.

Here is a link to my worksheet.  Occasional references in that worksheet to sites as 'Fnnn' or 'Cnnn' may be resolved at the site http://helladic.info/.

If these results are upheld by others then I would suggest that Pleiades is not an appropriate component of any scholarly work.  If a system with a two-place fractional component has an average error of nearly 400 m. then the average error of Pleiades data of more than 1500 m. suggests that Pleiades data - at some point in its generation - never had an accuracy better than 0.5 to 1.5 fractional places (10^-0.5 to 10^-1.5).

I estimate that it takes at least 2 hours of research to reliably establish the location of a Bronze Age or later site in Greece.  I do not know how many data points Pleiades claims but if it is, for example, 10000 points then it would require an effort of about 20000 man hours to complete a reliability review for Pleiades.  At 2200 man hours in a man year that would require about 9 man years to complete.  This is an order of magnitude estimate.
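The arithmetic behind that estimate can be sketched in a few lines of Python.  The 10000-point count is the hypothetical figure used above, not a number claimed by Pleiades, and the function name is invented for illustration.

```python
HOURS_PER_SITE = 2        # minimum research time per location, from the text
HOURS_PER_MAN_YEAR = 2200  # working hours in a man year, from the text


def review_effort(n_points):
    """Return (total hours, man years) needed to review n_points locations."""
    hours = n_points * HOURS_PER_SITE
    return hours, hours / HOURS_PER_MAN_YEAR


# 10000 is an assumed, illustrative point count.
hours, man_years = review_effort(10_000)
print(f"{hours} man hours, about {man_years:.1f} man years")
```

Changing the assumed point count scales the result linearly, which is why this is only an order of magnitude figure.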

Crowd-sourcing in toponymy studies does not appear to work.

This defective data of Pleiades casts a shadow downstream - for example in such derivative products as Pelagios/Peripleo.

If Pleiades cannot undertake a good faith reliability study it should be rejected by the scholarly community.

Sunday, February 11, 2018

Fact Computing, Part 2






In a previous post I said that Pelagios Commons/Peripleo does not have its roots in the world of scholarship but in the world of computer science - specifically the ideas of Tim Berners-Lee.(1)  Now let's concentrate more specifically on what Pelagios Commons/Peripleo really does.

The first thing that must be clearly understood is that Pelagios Commons/Peripleo creates no scholarly content and has nothing whatsoever to do with any Classical scholarship.  It is strictly a computer science construct and, with a different database, would be perfectly at home in the world of migration tracking, chemical research, or anything else.

Peripleo is simply a front-end site or data aggregator of a very common type.

Pelagios Commons links large amounts of data produced by other non-related sites and entities and subsumes them under a common format.  It then exploits this umbrella format in order to write its own front-end viewing tools (Peripleo).

Its business model is exactly like that of Huffington Post and any one of hundreds of similar sites.  Through an agreement with providers it reproduces their work tout court.  They say that these unpaid contributors are members of a ‘Community’ but this ‘Community’ is nothing more than the stable of content providers who give away to Pelagios the fruits of their labors.  The most amusing statement on the Pelagios website strenuously denies this plainly obvious fact:


Well,

Pelagios provides no original scholarly content.

Pelagios exclusively displays content provided by others.

Pelagios forces their providers to reduce their own work into a Pelagios format in order for Pelagios' software to display it.

Peripleo implements numerous search options.


What else can Pelagios/Peripleo be but an aggregator/search portal?  In fact, if you go to their Peripleo splash page they clearly say 'Peripleo is a search engine ...'.  The fact that their content providers cooperate in the theft of their own labors does not change the essential nature of the arrangement (This was true for Huffington Post which disguised its essential nature until the moment it went public).  The content providers are said by Pelagios to be members of a ‘Community’.  From my many years as a professional computer scientist I can assure my readers that this type of dishonest rebranding is quite common everywhere in the online world.  The first step in any internet grift is to give it a name that expresses the opposite of what it really is.

That it is the contributors who are to do all the work is also obvious from the tools that Pelagios Commons provides:

Recogito   This is an ‘online platform for collaborative document annotation’.  But it is not the staff of Pelagios Commons that’s going to do this annotation (how could they?).  It is the contributor, the member of the ‘Community’ who creates this content.

Their Cookbook  makes it easy to see who it is who does all the work for Pelagios (hint: not the Pelagios staff themselves).  In every case the contributors are responsible for massaging all their data into a form that Pelagios can accept.   This is a cost to the contributor of many hours of uncompensated labor.  Pelagios should disguise this aspect better than they do.

The following picture should make these several relationships clearer.



I have claimed that the Pelagios Commons enterprise creates no content.  Strictly speaking that is not quite true.  In fact, Pelagios Commons has achieved the Holy Grail of academia: it is a perpetual motion machine for producing conference papers and web presentations.  If you inspect the list to which I’ve linked you will quickly see who it is who specifically benefits from the Pelagios Commons enterprise.

~~~~~~~~~~~~~~~~~~~~~~~~~

Casting doubt on the Pelagios enterprise is not to deny that some sort of digital structuring of the data that we have from Mediterranean societies of antiquity would be useful.  It would be useful.  But how is that goal to be attained?

The data that comes to us (or is generated by us) relative to antiquity takes the most heterogeneous forms: locations, building plans, daily customs, food stuffs and their hypothesized yields, clothing, trade, etc.  Everything of human interest falls within the purview of scholars of antiquity.  This is a classic data fusion problem.  Data fusion problems arise in environments where a number of sensors of different types provide data of interest that is to be presented in a uniform view.  Such problems arise in the cockpits of fighter pilots and in very many environmental studies where, again, different sensors (or the same types of sensors with different capabilities) are used to gather data which is then to be united, combined or fused into a single point of view.

Pelagios Commons dimly recognizes that this is the real problem.  But they have performed this task backwards.  They start from the assumption that Linked Data is the solution to everything.  Upon that ideology  they built a product which is useful for no one.  That’s the essential problem.  The site really isn’t good for anything because it started ideologically.   It did not start by asking what it is that scholars of ancient societies really need in the form of digital support.

How should the social data from ancient Mediterranean societies be fused?  But, before that, what does it mean, from the digital point of view, to support such scholars?  Particularly in view of the fact that the scholars in such fields have radically differing interests.



Notes

1) Pelagios Commons here links directly to a discussion of Tim Berners-Lee's idea of Linked Data here.



Friday, February 9, 2018

Release of Database 52 to Helladic.info




The announcement reads as follows:


Release of DB 52. MAP_Rev_52__02_05_18. Twenty seven new sites. Various corrections and emendations. Search table files updated.

Geographic Positions with Two-Digit Fractions


I’m going to keep this simple:

Let’s pretend that the earth is flat (1); for our purposes it clarifies nothing to introduce spherical trig.  



Here you see part of a grid that marks locations.  The horizontal lines are latitude lines.  They measure distances north and south.  The vertical lines are longitude lines.  They measure distances east and west.  

Now let’s say that the locational precision in our system is two decimal places.  That means that no position on our flat earth can be described with numbers more precise than one one hundredth of a degree.  All lat/lon values must be in one of these forms:

0.nn
N.nn
NN.nn
NNN.nn

Our system forbids anything else.  

So let’s say that the lines in our grid representation are exactly two fractional decimal places apart – a line (lat or lon) every one one hundredth (0.01) of a degree.


I’ve put in some suggested numbers for us to work with.  The first thing that you must notice is that unless a location is sitting on one of the vertices (that’s where the lines actually cross) it is NOT REPRESENTABLE in our system.

Let’s look at an example.  Let us suppose that there is a place called, oh I don’t know, let’s make up some unusual name like ‘Prophitis Elias’.  


Now PE is a disadvantaged site because it happens to be located exactly halfway between all the vertices;  at position 34.345 N and 22.835 E.  Our system only allows us two decimal places of representation and so the position of PE is rendered as 34.35, 22.84. (2)   That shifts the position of PE onto the vertex at that position.  This is a deliberately introduced mistake (called ‘aliasing’) that tries to keep the town of PE in the system.  But it’s still a falsification which arises out of the constraints of our system and so it’s up to users to determine how serious this aliasing is.
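A minimal sketch of this snapping, in Python.  It assumes that 'normal rounding rules' means round-half-up, and it uses decimal arithmetic so that the half-way values are not disturbed by binary floating point.  The function name snap_to_grid is made up for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP

GRID = Decimal("0.01")  # two fractional decimal places


def snap_to_grid(coord):
    """Round a coordinate (given as a string) to the nearest grid line."""
    return Decimal(coord).quantize(GRID, rounding=ROUND_HALF_UP)


# The true position of PE from the text.
lat = snap_to_grid("34.345")
lon = snap_to_grid("22.835")
print(lat, lon)  # prints 34.35 22.84
```

Passing the coordinates as strings is a deliberate choice here: a float literal such as 34.345 may not be stored as exactly the half-way value, and the rounding direction could then differ.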

How serious is it?  How far is PE from its true position under this kind of aliasing?

I’m going to introduce some simplifying assumptions.   I calculate that the circumference of the earth at  35.34 N is 20,291.484 miles (3) and so one one hundredth of a degree longitude at that latitude is 0.56365 miles or 2976.0851 feet.

One one hundredth of a degree in latitude is 3652.14666 feet.  Since our town of PE is sitting exactly equal distances from the vertices the error would be the hypotenuse (A) of the triangle shown here



The aliasing error for the town of PE would be 1826.0733 feet in latitude (NS) and 1488.043 feet in longitude (EW).  These numbers,  1826.1 and 1488.0, are half of the distances mentioned just above because PE is half-way from all the vertices. The actual error (A) is merely the hypotenuse of the resulting right triangle.  Solving for A by the Pythagorean theorem gives us the maximum aliasing error in this system which is 2355.592 feet or 717.984 m.   So the maximum error in this system is almost  ¾ of a kilometer.  I emphasize that this is true only for this latitude; these errors would grow smaller towards the poles and larger towards the equator.
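The calculation above can be checked with a short Python sketch.  It takes the circumference of the parallel and the latitude cell size as quoted in the text rather than deriving them, so those two constants are inputs, not results; the variable names are chosen for illustration.

```python
import math

FEET_PER_MILE = 5280
METERS_PER_FOOT = 0.3048

# Quoted in the text: circumference of the parallel at this latitude.
PARALLEL_CIRCUMFERENCE_MILES = 20291.484
# Quoted in the text: 0.01 degree of latitude, in feet.
LAT_CELL_FT = 3652.14666

# 360 degrees * 100 grid steps per degree = 36000 cells around the parallel.
lon_cell_ft = PARALLEL_CIRCUMFERENCE_MILES / 36000 * FEET_PER_MILE

# PE sits at the center of a cell, so each offset is half a cell.
half_lat_ft = LAT_CELL_FT / 2
half_lon_ft = lon_cell_ft / 2

# The maximum aliasing error is the hypotenuse of the two half-cell offsets.
max_error_ft = math.hypot(half_lat_ft, half_lon_ft)
max_error_m = max_error_ft * METERS_PER_FOOT

print(f"longitude cell: {lon_cell_ft:.2f} ft")
print(f"maximum error:  {max_error_ft:.1f} ft = {max_error_m:.1f} m")
```

The results agree with the figures in the text to within rounding of the quoted inputs.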

More important: what is the average error?  How far off will we be in the usual case?

The average error has to be less than the maximum error but how much less?  Is it half?
To answer this question I created a simulation and ran 1,000,000 trials several times.  It turns out that the error is not quite normally distributed (a little skewed to the low end) and the average (the arithmetic mean) size of the error converges on ~ 1272 feet (388 m.)


Here's a bar chart of 5,000,000 runs.  Each cell from L to R represents an additional 73 m. in error.
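The simulation itself is not reproduced in this post.  An independent sketch of such a simulation, in Python, might look like the following.  It assumes that true positions are uniformly distributed within a grid cell, that each one is moved to its nearest vertex, and that the cell measures 2976.0851 by 3652.14666 feet as given above.  The function name, the seed and the trial count are arbitrary choices for illustration.

```python
import math
import random

LON_CELL_FT = 2976.0851   # 0.01 degree of longitude, from the text
LAT_CELL_FT = 3652.14666  # 0.01 degree of latitude, from the text
METERS_PER_FOOT = 0.3048


def mean_aliasing_error_ft(trials, seed=0):
    """Average distance from a random point in a cell to its nearest vertex."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    total = 0.0
    for _ in range(trials):
        x = rng.uniform(0.0, LON_CELL_FT)
        y = rng.uniform(0.0, LAT_CELL_FT)
        # Distance to the nearest grid line in each direction.
        dx = min(x, LON_CELL_FT - x)
        dy = min(y, LAT_CELL_FT - y)
        total += math.hypot(dx, dy)
    return total / trials


mean_ft = mean_aliasing_error_ft(200_000)
print(f"mean error: {mean_ft:.0f} ft = {mean_ft * METERS_PER_FOOT:.0f} m")
```

Under these assumptions the mean should settle close to the figure of roughly 1272 feet reported above, and well below the maximum error of about 2356 feet.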

To make a long story short – In a positional system limited to two decimal places in the fraction you can expect an aliasing error of nearly 400 m.

Who would create such a limited cockamamie geographical positioning system and expect it to be used for precise work such as describing the position of, oh I don't know, say something like Bronze Age find sites?

Who indeed?


Notes

     (1)    Did you hear the one about the Flat Earth Society circular that claims ‘Thousands of members around the globe’.  Around the Globe!  Get it?, get it?  O.k. back to work you lot!  
     
     (2)   Under normal rounding rules.


     (3)  For radius and circumference of the earth calculation at 35.34 N:

Fact Computing



In this life, we want nothing but Facts, sir; nothing but Facts!
Mr. Gradgrind in Hard Times
Dickens


Recently a friend suggested to me that Helladic.info was a natural candidate for Pelagios Commons.


Sites of classical learning being assimilated into Pelagios Commons/Peripleo

‘Absolutely not’, I replied.

And then found myself tongue-tied because I couldn’t adequately express why not.  So that’s what this blog post is about. 

Pelagios Commons is an effort to bring together a large number of websites (themselves all concerned with various aspects of Classical learning) under a single umbrella and, in some sense, merge them into a common resource.  Pelagios is, in computer parlance, a 'front end' or 'concentrator'.  They perform no original scholarship; they take the results and research of various other sites, smash them into homogeneous factlets, and then spew them out again through their results engine which is called Peripleo.

Since they contribute nothing to scholarship itself, what is it that Pelagios Commons is really trying to accomplish?  They are trying to demonstrate that Classical Studies, in all its forms, is suitable for, and can be represented by, something called 'Open Linked Data'.
What is ‘Open Linked Data’?
Wikipedia defines ‘Linked Data’ as:
‘In computing, linked data ... is a method of publishing structured data so that it can be interlinked and become more useful through semantic queries.’ [1]
(emphasis is mine)

(‘Open Data’ is merely linked data which has no copyright or other usage restrictions.)
‘Open Linked Data’ might be visualized as an enormous network of little fact nuggets.  The ‘interlinking’ mechanism is, of course, the internet but the underlying goal is given in the definition – it is to facilitate ‘Semantic Queries’ and to make the entire congeries of facts ‘more useful’.
It’s the ‘Semantic Queries’ part which gives it away – Open Linked Data is, conceptually, part of the Semantic Web of Tim Berners-Lee.  Now I’m opposed to the idea of any scholar's wasting their time on Semantic Web efforts and have blogged about it here.  I strongly urge my readers to go back and read that entire post but I reproduce the conclusion here:
“And no automation can replace scholarship.  By scholarship I mean the several activities of gathering evidence, organizing, patient collation, reflection, judgment and the expression of these activities in the form of essays, books, diagrams and, yes, even in the form of web sites or blogs.  There is no grand slam against reality; no Tower to the Heavens that we can build that will let us storm the citadel of knowledge.   We have to patiently scrape away at the matrix of the Unknown with our small intellects in order to see it more plainly.”

I oppose the Semantic Web because it is a totalizing (and trivializing) view of knowledge that is inappropriate for fields in the Social Sciences such as Classical Studies.
For example, as the proprietor of Helladic.info I take my field to be the locations of Bronze Age sites.  So far, there are about 2400 such sites in my database and it seems as though site locations would be prima facie appropriate for an Open Linked Data approach.  But it turns out that there is ambiguity about nearly every one of those sites and many of them - as they stand in the DB right now - are likely to be wrong.  My database is not a database of sites so much as it is a database of scholarly arguments about the location and meanings of Bronze Age sites.  In other words, a database of ambiguities, many of which can never be resolved.   And if this is true of relatively straight-forward things like lat/lon pairs how much more true must it be of other Social Science topics in Classical Studies?  Where will we find the semantic web approach to Slavery in ancient Greece and Rome?  Where will we find the Open Linked Data representation of the efflorescence in Fifth-century Athens?  Where will we find the automated web-linked explanation of the collapse at the end of LH III?  Oh, but I forget.  There are simple fact-based answers available, suitable for the Semantic Web, on all these topics.  For example:
Slavery was a bad but necessary thing for Greece and Rome in an age with no petroleum, electricity, or engines.  Classical Athens experienced an efflorescence in the Fifth-century because of the indomitable Will of its people and their love of Freedom.  Mycenaean civilization collapsed at the end of LH III because some Invaders from the Sea destroyed everything.
But I exalt myself onto a plane where I do not belong.  Tolstoy was here long before me.  The famous chapter 1 in Epilogue 2 of War and Peace is perfectly ready for semantic net representation. [2]
Open Linked Data is a trivializing approach to knowledge.
Ready for your close-up, Mr. Gradgrind?
Notes
[1] It's pointless to footnote Wikipedia but the above quote was taken from the article 'Linked Data' on February 9, 2018.
[2] "Louis XIV was a very proud and self-confident man; he had such and such mistresses and such and such ministers and he ruled France badly. His descendants were weak men and they too ruled France badly. And they had such and such favorites and such and such mistresses. Moreover, certain men wrote some books at that time. At the end of the eighteenth century there were a couple of dozen men in Paris who began to talk about all men being free and equal. This caused people all over France to begin to slash at and drown one another. ..."

Thursday, February 8, 2018

Lost In Venice




‘In the Id the principle of non-contradiction does not apply.’
Freud

~~~

Night.   Bed.   

A susurrus of distant whispers; the surf on the Lido.  But it’s only S. breathing quietly on her pillow.

We travel in order to be lost.  If we want to be found we can stay at home.  It’s cheaper.

A feeling of alienation comes suddenly upon the traveler after having been in Venice for a few days.  Being lost there is not as it is in other places.  Being lost in the medieval warren that is Toledo in Spain is being lost in only two dimensions.  You’re merely displaced in linear space from where you wish to be.  But when you’re lost in Venice you’re lost in three dimensions; you’re not only displaced in a linear fashion from your desired destination, you feel as though you’re lost in some third, unspecified, dimension.  That’s confounding because Toledo is on a hill and Venice is as flat as a board.  There’s no way you can be lost in vertical space there but the creepy feeling is that you are.  Or perhaps it’s time that’s the missing dimension.  You have the feeling that not only are you not at your destination – that quotidian hotel, albergo, restaurant, cafe, or campo where you have arranged your rendezvous – it may not even currently exist. 

It doesn’t help that the map of Venice, when turned upside down, still looks right-side up.  Nor does it help that Venice, even at noon, is in a perpetual twilight and that every shop looks like one that you’ve seen just moments ago.  That paper shop; how familiar it seems.  Those are the identical leather bindings that we saw but half an hour since.  That shop of mascherie looks very much like the one we passed in the Ghetto.  Same owner?  Same shop?  Or is it coincidence?  Perhaps we’re not lost at all; we simply haven’t the wit to recognize that we’ve reached our goal.


 Average Venetians

Your senses are already overloaded by the exotica for sale: Paper goods, glass vases, parti-colored fish forever fixed in the glassy interior of a paper weight (85 inches around), masks, costumes, leather-bound books, capes, cloaks, tricornes, Punchinellos, antiques and faux antiques – Canalettos, Titians, Veroneses – all gently pulsing in the golden Venetian twilight.  Objects endlessly recurring in a riot of feathers, papier mache, leather, and silk.  A welter of swords, canes, and objets en verre.

A friend of mine explains in a restaurant – “A few years ago there was a butcher shop on this street and a drug store, – you could live here – Now it’s all masks, costumes, leather, and glass.”  He’s right.

Signs point in opposite directions – both say ‘Rialto’.

Another acquaintance confirms: “Street names mean nothing here, signore.”
Others solemnly warn you against Venice; ‘It’s not healthy.  The cold, the damp, molto artritico.’

To live here would be an agony; it’s cold, misty, foggy, and damp in the winter.  Spring brings warm rain and the mosquitoes (in 2002 I killed a dozen in my hotel room).  In summer the canals smell like drains .. and not good drains either.  The acqu’alta is likely to strike at any time.  Always there is the plague of tourists.  (And it is a plague; only Waikiki could possibly be worse.)  Only autumn (late September or October) is really bearable.

Venice is an enigma; it is the home of esoteric knowledge, of Hermeticism, of Kabbala, of the Tarot.  Why has it never found its Robert Byron or its Lawrence Durrell? [1]  It’s a scheme; Casanova planning his next conquest or your restaurant man passing off the shark as scallops.  

Florence is rational, a dream of the Renaissance emerging from the Medieval in a shower of rose-colored glories.  Venice is a grey contemplation of the Hidden and Revealed, of Hermes Trismegistus.  Venice’s patron saint should be Giambattista Vico for Descartes has no place here.    Ambiguity and obscurantism are the great themes of Venice.  Whenever I dream in Venice I’m always in a library.  I’m hauling down some musty volume of Casanova’s Vita or of Piccolomini.  I’m surrounded by volumes by A.D. Nock, by Festugiere, ..  , even Marx.  There’s a sense in my dream of real urgency; I’m always about to learn … it.    But I never do.   

Muddled identities, ambiguous sex.  Misrule; upset; reversal.  Love affairs by small bridges, of fights, strange cries and obscure alarms in the dark.  And how much is just oneself? The other day at sunset on the balustrade in front of San Marco, standing beneath the bronze horses, we heard a muffled thud; it sounded as though half of Venice had blown up.  It was only New Year’s fireworks; … perhaps I’m the only one who heard it.

Everywhere is the smell of fresh pastry.  It’s the cries of the watermen like birds of prey swooping by in their barchieti.   And around every corner the bright shop windows.  Last night a pretty shop woman said, ‘You speak Italian well!’ At the time I was paying her 800 euro for a turquoise necklace. 

Damp and dirty, decaying, decrepit (in San Marco the marbles are in ruins and cold as death); even to me the city seems to have gone downhill since I first visited in ’93.  The garbage lies in street corners; filth and decay, but then you wake up to a blanket of virginal snow.

The other day (was it today?), S. and I crossed over the Ponte Storto or ‘twisted bridge’.  Who are they kidding? They’re all distorted.  And don’t get me started on the campanili.  They all lean at different angles (Santo Stefano looks on the verge of collapse); Pisa has little to boast of with just the single leaning campanile.
 
SS. Giovanni e Paolo, Santa Maria Formosa

But I was speaking of being lost.  Here’s an example of what happens:
Some time ago I determined to visit the Church of the Zanipolo (Venetian dialect for ‘SS.  Giovanni e Paolo’) and off I ventured.  I followed the map very carefully and – for my pains – was led to the Church of Santa Maria di Formosa.  A charming and important church, certainly, but not my destination.  I sat down, consulted the map and set off again.  By following the map to the letter it was only a short time before I was led, ineluctably, back to Santa Maria di Formosa.  In despair, and like the prophets of old, I lifted my face unto the sky and, to be sure, I could see the top of the Zanipolo’s dome.  Keeping it before my eyes – like the proverbial column of smoke – I was able to guide my faltering steps there.

After my visit I turned my back on it and crossed a little bridge away from the Campo di Zanipolo.  On the other side I turned – like Lot’s wife – and gazed back.  This giant Gothic church (the largest in Venice save for San Marco itself) had disappeared as utterly as though it had never existed.  It had gone back into the same imaginative space where we keep Aladdin’s cave – or Samarkand.   

For those who call the Zanipolo their own parish church the exterior world with its linear streets and strictly numbered houses must be as alien and disconcerting as the warrens of Venice are to the outsider.


Notes

[1] Although in the Alexandria Quartet, in Balthazar I think, there’s a brilliant bagatelle which has Venice as the setting.   Subject?  Vampires.   

Yes, I find it in Durrell [1958] 196:

“When I was twenty, I went to Venice for the first time at the invitation of an Italian poet with whom I had been corresponding, …”

Bibliography

Durrell [1958]:  Durrell, Lawrence.  Balthazar.  Penguin. 1958.