Sunday, October 8, 2017

Promiscuous Lex



‘.. let us go down, and there confound their language,
that they may not understand one another’s speech.’
Genesis xi.7

‘Girl number twenty unable to define a horse!’ said Mr Gradgrind, ...  [1]

Edmond:         There are 72,519 stones in my walls. I've counted them many times.
Abbé Faria:     But have you named them yet? [2]


So, imagine that there are these two websites, http://www.napoleon-scholar-a.com and http://www.napoleon-scholar-b.org, and they both blog about, guess what, Napoleon Bonaparte.  These are two reputable scholars, although A blogs mostly about the period before 1804 and B blogs mostly about the Imperial period, with some overlap.  Here’s the idea: someone says ‘what a great resource it would be if we could put these two together somehow’.  And when we consider that there’s a large amount of material on the web related to Napoleon, it would be great if this could be automated.  That is, to create from a multiplicity of on-line resources one large indexable or searchable reference on Napoleon![3]



And not just search; we should be able to create an automated reasoner to which we could ask questions about Napoleon.  Simple questions like ‘When was the Battle of Marengo?’ and more complex questions such as ‘Was Napoleon good for France?’  Automation is the key, but how would that be done?  Well, as a number of computer types have pointed out, all these sources use the very same nouns or referents.  For example they all use such words as: ‘Napoleon’, ‘empire’, ‘Pope’, ‘Josephine’, ‘France’, ‘Marengo’, ‘Austerlitz’, etc., etc.  What we need to do is come up with a formal way of representing all the relevant nouns, enumerate their properties, and relate them to each other.  The sites, after all, are quite different in style and presentation but they are semantically similar.  And that leads us to formulate such quasi-RDF (Resource Description Framework) triples as:


‘Napoleon’ ‘has-a’ ‘wife’;
‘Napoleon’ ‘is-a’ ‘general’;
‘Marengo’ ‘is-a’ ‘battle’;
‘general’ ‘has-a’ ‘army’;
‘Josephine’ ‘is-a’ ‘wife’;
‘wife’ ‘has-a’ ‘husband’.
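
As a rough illustration, quasi-triples like these can be held as plain Python tuples and queried by pattern.  This is only a sketch: the `match` helper and the use of `None` as a wildcard are assumptions made for the example, not part of RDF or of any real triple store.

```python
# The six quasi-RDF triples from the text, as (subject, predicate, object) tuples.
# 'general' is lower-cased throughout so that the 'has-a army' triple lines up
# with the 'is-a general' triple.
TRIPLES = [
    ("Napoleon", "has-a", "wife"),
    ("Napoleon", "is-a", "general"),
    ("Marengo", "is-a", "battle"),
    ("general", "has-a", "army"),
    ("Josephine", "is-a", "wife"),
    ("wife", "has-a", "husband"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return every triple that agrees with the given fields.

    None acts as a wildcard (a convention invented for this sketch).
    """
    return [
        (s, p, o)
        for (s, p, o) in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]


# "What is Napoleon?"
print(match(TRIPLES, subject="Napoleon", predicate="is-a"))
# "Who or what is a wife?"
print(match(TRIPLES, predicate="is-a", obj="wife"))
```

Notice that nothing in this store connects Josephine to Napoleon: each is related only to the common noun ‘wife’.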


And if we defined enough of these (a large number, to be sure, but graduate students have lots of time) we’d create a representational form strong enough to describe Napoleon and all of his works.  Our automated reasoner would search the aforementioned blogs, find each noun, relate it to the relevant triple in our database, and automatically place it in context.  In that way the various sites on Napoleon would be united in a Semantic Web.  We would be able to ask questions about Napoleon or related subjects and not only learn where the answers are but the answers themselves.  And, of course, not just Napoleon but every conceivable subject – a grand semantic web that unites all knowledge (on-line at least) and allows us to ask questions about anything and receive complete and detailed answers along with the degree of reliability of each answer.  And the important thing is that all of this would be automated.

The idea of the Semantic Web is usually attributed to Tim Berners-Lee.  But, of course, none of this is new.  Philosophers have been trying to reduce reality to a series of unambiguous predicates since the dawn of time.  The only thing that’s new here is the intended scale and the means; computers have allowed dreamers to envision a totally automated and effortlessly constructed compendium of every conceivable statement about reality.

It is darkly curious, then, that none of this ever seems to succeed.   No matter how many buzz-words are invented, no matter how many convincing papers are written, no matter how many conferences are held, web-sites designed, contributors or gullible foundations (looking at you, NEH) milked for 'start-up' money - none of this ever seems to work.   

So, whenever you DHers hear the words ‘Semantic Web’, ‘RDF’, or ‘XML’ I solemnly warn you that you are about to be bamboozled into wasting huge amounts of your valuable time.  Please take that to heart.  Just ignore advocates of such schemes; like the religious fanatics who pass out pamphlets at your door (and religious fanaticism is exactly what drives the concept of the semantic web) they’ll go away if you ignore the doorbell.  Remember that you are scholars and RDF/XML semantic web schemes are the death of scholarship.

What’s so wrong about the Semantic Web?

Reality.[4]

The problem is that there is an infinite number of domains of discourse and no Semantic Web can ever hope to unite them.  To see this, imagine that we have a third website to be covered by our Napoleonic RDF.  It is called ‘www.napoleon-scholar-c.com’ and it tells the compelling story of how Napoleon came from outer space to wreak havoc in order to pave the way for an alien invasion.  But, foiled by the crafty British, and imprisoned on Saint Helena, his avatar went back into space – there to bide its time on a moon of Saturn where it waits to try again.[5]  And, even though this site uses the same referents as the first two sites, ‘Napoleon’, ‘Marengo’, ‘Josephine’, etc., and even though its propositions can be expressed in the same or similar RDF, it does not belong to the same domain of discourse.  No attempt to semantically unite these three sites can ever lead to anything except nonsense.[6]  Now, of course, you’ll say that no cuckoo web site like that should be included in our semantic web.  But a human being would have to make that judgment.  To make the judgment, that is, that this web site belongs to a totally different domain of discourse.  So much for the dream of automation.
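
The point can be made concrete with a small Python sketch.  Everything in it is hypothetical: the predicate `came-from`, the site labels, and the individual triples are invented for illustration, and the `merge` and `conflicts` helpers are not part of any RDF toolkit.

```python
from collections import defaultdict

# Hypothetical triples, invented for this sketch, one list per website.
SITE_A = [("Napoleon", "came-from", "Corsica")]
SITE_B = [("Napoleon", "came-from", "Corsica")]
SITE_C = [("Napoleon", "came-from", "outer space")]


def merge(*sites):
    """Naively union the triples of several sites, dropping duplicates but keeping order."""
    merged = []
    for site in sites:
        for triple in site:
            if triple not in merged:
                merged.append(triple)
    return merged


def conflicts(triples):
    """Map each (subject, predicate) pair to its objects when it has more than one."""
    objects = defaultdict(list)
    for s, p, o in triples:
        objects[(s, p)].append(o)
    return {key: objs for key, objs in objects.items() if len(objs) > 1}


print(conflicts(merge(SITE_A, SITE_B)))          # the two scholars agree
print(conflicts(merge(SITE_A, SITE_B, SITE_C)))  # site C disagrees
```

The code can report that the merged store now holds two answers to the same question.  It has no way of telling which answer belongs to history and which to science fiction; someone still has to decide that site C is out of bounds.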

And, in fact, there's no guarantee that any particular web site is consistent in the domains of discourse that it presents.  That means that even if you choose a website to include in your semantic web scheme, someone knowledgeable still has to go through each statement and test it for reliability (however reliability is defined in your particular semantic web).[6a]


Darker examples could be adduced.  Imagine two web-sites, ‘www.darwin-savior-of-mankind.com’ and ‘www.the-beagle-was-only-a-dog.info’, the first a pro-evolution site and the second vehemently anti-evolution.  They both use the same terms, ‘evolution’, ‘fitness’, ‘selection’, and in, probably, very similar ways.  The same RDF could be formed for both.  But at some point someone is naively going to ask our semantic web about the truth value of evolution and survival of the fittest.  Any semantic web that tries to unite these two domains of discourse will be incoherent on that question.  There is no knowledge schema that covers or can cover these two separate realities.  Again the problem could be solved by a human being culling the web sites covered.  That is, by reading all of them and making a human judgment about which are reliable.  (Another name for this is 'scholarship'.)  Again, the death of automation.

And what about these two: ‘www.abortion-is-murder.com’ and ‘www.celebrating-roe-v-wade.net’?  Or these two: ‘www.gay-is-the-future.org’ and ‘www.true-cause-of-hurricanes-revealed.net’?  Or these two: ‘www.united-nations-benefits.gov’ and ‘www.real-no-shit-black-helicopter-sightings.info’?  Or these two: ‘www.my-guns-my-self.me’ and ‘www.gun-control-failure-scandal.info’?  Or these two: https://www.cdc.gov/coronavirus/2019-ncov/index.html vs. 'www.wake_up_sheeple.us'?

In other words the proposed RDF schemes will fail precisely where we, as human beings, are most concerned to know something reliable.[7]  That is, where our very selves are most involved, RDF and related schemes are powerless.  RDFs through all time have relied on the idea that all knowledge is one; that Truth is One.  I blame Plato for this but that’s just me.  The fact that some of these RDF schemes are ‘ISO-certified’ is just the rotted icing on the absurdist cake. [8]


All knowledge is not reducible to atoms.  And call me a grumpy old man but I have decades of experience in advanced computer science and I've never personally encountered a computer scientist who was educated about anything outside the narrow field of computers (and it is a narrow field).  They are not to be trusted on the issues with which the rest of us are concerned (although I might make an exception for Jaron Lanier).

What divides us as human beings isn’t just a few propositions which, once we learn them, will put us on the track to ‘right thought’.  It is not information that divides us.  This is the classic mistake of computer scientists – and the Holy Grail for every totalitarian.  Au fond, most computer scientists really believe that words are things.  But they aren’t.  We, as human beings, live in our own inherently valuable universes.  Not all of those universes can be harmonized with all the others.  What separates these universes – these selves – are not wrong propositions, or bad-thought, but deeply felt passions, needs, appetites, and loves.  Other human universes cannot be stormed by the Dialectic.  Our connections have to be built up patiently over time.

And no automation can replace scholarship.  By scholarship I mean the several activities of gathering evidence, organizing, patient collation, reflection, judgment and the expression of these activities in the form of essays, books, diagrams and, yes, even in the form of web sites or blogs.  There is no grand slam against reality; no Tower to the Heavens that we can build that will let us storm the citadel of knowledge.   We have to patiently scrape away at the matrix of the Unknown with our small intellects in order to see it more plainly.

Just as we have to work to see each other more plainly.


Endnotes

[1] Hard Times, Charles Dickens

[2] The Count of Monte Cristo, Jay Wolpert, 2002.

[3] Paul Ford suggests exactly this approach for sociobiology.  See Ford [2003].

[4] The best critique I know on this subject is Hubert Dreyfus’ invaluable (it deserved a Pulitzer) What Computers Can’t Do: A Critique of Artificial Reason from 1972 and his new edition, What Computers Still Can’t Do, from 1992.  The budding DHer can also benefit by reading the amusing remarks of Turow (2010) on tort law.  Turow shines a brilliant light on this very problem of the connection between clearly expressed facts and reasoning about these same facts in various contexts.  The money quotes are:

"Him and his goddamn questions, I thought, his crazy hypos: If battery is a mere offensive touching, 'Is it battery to kiss a woman good night, if she demurely says no?  To push a man off a bridge that's about to collapse?  ...
    I wondered when he would cut it out.  There was no answer to these questions.  There never would be.
    I sat still for a second.  Then I repeated what I'd just thought to myself: There were no answers.  That was the point, the one Zechman - and some of the other professors, less tirelessly - had been trying to make for weeks.  Rules are declared.  But the theoretical dispute is never settled.  If you start out in Torts with a moral system that fixes blame on the deliberately wicked - the guy who wants to run somebody over - what do you do when that running down is only an accident?   How do you parcel out blame when A hopes to hurt B in one way - frighten him by shooting a gun; and ends up injuring him in another freakishly comic manner - clobbered on the head with a falling duck?"   Scott Turow, One-L, pp. 112-113.

and this:

"Was it assault if a midget took a harmless swing at Muhammad Ali?  Was it negligent to refuse to spend $200,000 for safeguards on a dam which could wash away $100,000 worth of property?", p. 62.

Turow's example of the collapsing bridge is a very simple formulation of a famous problem in law which is described in Leo Katz's Bad Acts and Guilty Minds, Chicago, 1987, p. 210:
"Henri plans a trek through the desert.  Alphonse, intending to kill Henri, puts poison in his canteen.  Gaston also intends to kill Henri  but has no idea what Alphonse has been up to.  He punctures Henri's canteen, and Henri dies of thirst.  What has caused Henri's death?  Was it Alphonse?  How could it be, since Henri never swallowed the poison.  Was it Gaston?  How could it be, since he only deprived Henri of some poisoned water that would have killed him more swiftly even than thirst.  Was it neither then?  But if neither had done anything, Henri would still be alive.  So who killed Henri?"  Katz follows up with a number of real-world examples.

Now if we tried to express these facts in triples form we might have this:

(1) Alphonse - Poison Water - Henri
(2) Gaston - Steal Water - Henri
(3) Steal Water - cause - Thirst
(4) Thirst - cause - Death
(5) Poison Water - cause - Death
(6) Henri - Death - Thirst

Now that we have our DB of triples we ask our automated Reasoner 'Who killed Henri?'  It's hard to imagine a Reasoner that wouldn't conclude, with a certitude of 100%, that Gaston killed Henri, and then only because it happens upon the 'Henri - Death - Thirst' triple first in the database.  Thus the ambiguity in the situation is elided by the completely unrelated chance ordering of the triples in the DB.  A Greek teacher once pointed out to me that expressing an argument in Greek imposed an 'artificial clarity' on the argument.  So here.

An automated Reasoner will start out with our question

<blank> - cause death - Henri

and is asked to fill in the <blank>.  It looks for a triple about Henri's death and does this:

(6) Henri - Death - Thirst
(4) Thirst - cause - Death
(3) Steal Water - cause - Thirst
(2) Gaston - Steal Water - Henri
so:
<Gaston> - cause death - Henri

No fuss, no muss.  Ambiguities resolved.  Our automated reasoner sends Gaston off to prison for life as a reward for his saving Henri from a horrible death by poison.
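
The trace above can be reproduced with a few lines of Python.  This is a minimal sketch of one possible Reasoner, not a description of any real inference engine; the helper names `first` and `who_killed` are invented for the example.

```python
# The six triples from the Henri example, numbered as in the text.
# 'Thirst' is capitalized in (6) to match the spelling used in the trace.
TRIPLES = [
    ("Alphonse", "Poison Water", "Henri"),  # (1)
    ("Gaston", "Steal Water", "Henri"),     # (2)
    ("Steal Water", "cause", "Thirst"),     # (3)
    ("Thirst", "cause", "Death"),           # (4)
    ("Poison Water", "cause", "Death"),     # (5)
    ("Henri", "Death", "Thirst"),           # (6)
]


def first(triples, subject=None, predicate=None, obj=None):
    """Return the first triple matching the given fields, or None (None = wildcard)."""
    for s, p, o in triples:
        if subject in (None, s) and predicate in (None, p) and obj in (None, o):
            return (s, p, o)
    return None


def who_killed(victim, triples):
    """Fill in '<blank> - cause death - victim' and return (agent, triples used)."""
    chain = []

    # (6): how did the victim die?
    death = first(triples, subject=victim, predicate="Death")
    chain.append(death)
    manner = death[2]

    # (4): confirm that this manner is recorded as a cause of death.
    chain.append(first(triples, subject=manner, predicate="cause", obj="Death"))

    # (3): walk the 'cause' triples backward until nothing causes the current item.
    action = manner
    while True:
        link = first(triples, predicate="cause", obj=action)
        if link is None:
            break
        chain.append(link)
        action = link[0]

    # (2): who performed that action on the victim?
    deed = first(triples, predicate=action, obj=victim)
    chain.append(deed)
    return deed[0], chain


agent, chain = who_killed("Henri", TRIPLES)
print(agent)
for step in chain:
    print(step)
```

Triples (1) and (5) are never consulted, so Alphonse and his poison simply drop out of the answer.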

[5] A very mild formulation compared to what we often find on the internet.  Even the practitioners of the Pleiades Linked Open Data initiative have some slight awareness of the difficulties of describing even a single individual in RDF triples.  In Isaksen et al. [2014] we read the following:

"people can be harder to denote, especially where the evidence is fragmentary. Should Aristotle be defined by his place of birth, his association with Athens (of which he was not a citizen), his contributions to philosophy (which?), his tutoring of Alexander the Great, or a combination of these and other ‘facts’?"

I love the shudder quotes around the word 'facts'.  And 'fragmentary'?  Aristotle is one of the best described individuals from antiquity.  If we can't describe Aristotle what will we do with Jesus?  Socrates?  Julius Caesar?  Would anyone want to be the worker who reduces Alexander the Great to RDF triples?

[6] 'unite' : The real  purpose behind the creation of all these RDF triples is to create a reasoning machine (in our terms a piece of software) that would virtually traverse the Semantic Web answering questions.

[6a] On issues of consistency, non-contradiction, varying authorities, out-of-date data, etc. in the context of the semantic web see Wright [2011] 77-78.  The money quote here is: "Gil and Artz define the challenges thus: Content trust is often subjective, ... "  Ya' think?

[7] Facing exactly this problem of scaling up, Dreyfus [1972] says (quoting from memory): We don't want to play automated chess.  We want to know how to find our way out of the woods when we're lost.  We want to know which fork to use for the salad when dining at the White House.

[8] ISO is another bad idea from the ’80s whose sell-by date has long passed.


Bibliography

Dreyfus [1972]: Dreyfus, Hubert.  What Computers Can't Do: A Critique of Artificial Reason.  Harper and Row.  1972.

Dreyfus [1992]: Dreyfus, Hubert. What Computers Still Can't Do.  MIT Press. 1992.

Ford [2003]: Ford, Paul.  'A Response to Clay Shirky's “The Semantic Web, Syllogism, and Worldview”'.  www.ftrain.com/ContraShirky.  November 2003.

Isaksen et al. [2014]: Isaksen, Leif, Rainer Simon, Elton Barker, and Pau de Soto.  'Pelagios and the emerging graph of ancient world data'.  June 2014.  DOI: 10.1145/2615569.2615693.

Katz [1987]: Katz, Leo.  Bad Acts and Guilty Minds; Conundrums of the Criminal Law.  University of Chicago Press.  ISBN: 0-226-42592-4.  1987.

Turow [2010]: Turow, Scott.  One-L.  Penguin Books (reprinted 2010).

Wright [2011]: Wright, Holly M.  Seeing Triple; Archaeology, Field Drawing and the Semantic Web.  Dissertation for the degree of Ph.D., Department of Archaeology, The University of York, England.  September 2011.
