‘... let us go down, and there confound their language,
that they may not understand one another’s speech.’
So, imagine that there are two websites, http://www.napoleon-scholar-a.com and http://www.napoleon-scholar-b.org, and they both blog about, guess what, Napoleon Bonaparte. Both are run by reputable scholars, although A blogs mostly about the period before 1804 and B mostly about the Imperial period, with some overlap. Here’s the idea: someone says ‘what a great resource it would be if we could put these two together somehow’. And when we consider that there’s a large amount of material on the web related to Napoleon, it would be great if this could be automated. That is, to create, from a multiplicity of on-line resources, one large indexable or searchable reference on Napoleon!
And not just search; we should be able to ask questions. Simple questions like ‘When was the Battle of Marengo?’ and more complex questions such as ‘Was Napoleon good for France?’ Automation is the key, but how would that be done? Well, as a number of computer types have pointed out, all these sources use the very same nouns or referents. For example, they all use such words as ‘Napoleon’, ‘empire’, ‘Pope’, ‘Josephine’, ‘France’, ‘Marengo’, ‘Austerlitz’, etc. What we need to do is come up with a formal way of representing all the relevant nouns, enumerate their properties, and relate them to each other. The sites, after all, are quite different in style and presentation but they are semantically similar. And that leads us to formulate such quasi-RDF (Resource Description Framework) triples as:
‘Napoleon’ ‘has-a’ ‘wife’;
‘Napoleon’ ‘is-a’ ‘general’;
‘Marengo’ ‘is-a’ ‘battle’;
‘general’ ‘has-a’ ‘army’;
‘Josephine’ ‘is-a’ ‘wife’;
‘wife’ ‘has-a’ ‘husband’.
And if we defined enough of these (a large number to be sure, but graduate students have lots of time) we’d create a representational form strong enough to describe Napoleon and all of his works. A computer program could be written that would search the aforementioned blogs, find each noun, relate it to the relevant triple in our database, and automatically place it in context. In that way the various sites on Napoleon would be united in a Semantic Web. We would be able to ask questions about Napoleon or related subjects and learn not only where the answers are but the answers themselves. And, of course, not just Napoleon but every conceivable subject: a grand semantic web that unites all knowledge (on-line at least) and allows us to ask questions about anything and receive complete and detailed answers, along with the degree of reliability of each answer. And the important thing is that all of this would be automated.
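To make the idea concrete, here is a minimal sketch in Python of what such a triple database and a question against it might look like. It is an illustration only, not any real RDF library: the names `Triple`, `match`, `what_is` and `inherited_possessions` are hypothetical, and the data is just the six quasi-RDF triples given above (with ‘general’ written in lower case throughout so that the terms line up).

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# The six quasi-RDF triples from the text, nothing more.
TRIPLES = [
    Triple("Napoleon", "has-a", "wife"),
    Triple("Napoleon", "is-a", "general"),
    Triple("Marengo", "is-a", "battle"),
    Triple("general", "has-a", "army"),
    Triple("Josephine", "is-a", "wife"),
    Triple("wife", "has-a", "husband"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return every triple that agrees with the given fields; None is a wildcard."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


def what_is(triples, noun):
    """Answer 'what is X?' by following 'is-a' triples."""
    return [t.obj for t in match(triples, subject=noun, predicate="is-a")]


def inherited_possessions(triples, noun):
    """Things the noun 'has-a', directly or through one 'is-a' hop."""
    owned = [t.obj for t in match(triples, subject=noun, predicate="has-a")]
    for kind in what_is(triples, noun):
        owned += [t.obj for t in match(triples, subject=kind, predicate="has-a")]
    return owned


print(what_is(TRIPLES, "Napoleon"))
print(inherited_possessions(TRIPLES, "Napoleon"))
```

Asking what Napoleon ‘has’ returns his wife and, because he ‘is-a’ general and a general ‘has-a’ army, an army as well. That one-hop inference is the whole attraction of the scheme, and it works only as long as every triple in the database belongs to the same story.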
This kind of work is attributed to Tim Berners-Lee. But, of course, none of this is new. Philosophers have been trying to reduce reality to a series of unambiguous predicates since the dawn of time. The only thing that’s new here is the intended scale and the means; computers have allowed dreamers to envision a totally automated and effortlessly constructed compendium of every conceivable statement about reality.
It is darkly curious, then, that none of this ever seems to succeed. Whenever you DHers hear the words ‘Semantic Web’, ‘RDF’, or ‘XML’ I solemnly warn you that you are about to be bamboozled into wasting huge amounts of your valuable time.
Please take that to heart. Just ignore advocates of such schemes; like the religious fanatics that pass out pamphlets at your door (and religious fanaticism is exactly what drives the semantic web) they’ll go away if you ignore the door bell. Remember that you are scholars and RDF/XML semantic web schemes are the death of scholarship.
What’s so wrong about the Semantic Web?
The problem is that there is an infinite number of domains of discourse and no Semantic Web can ever hope to unite them. To see this, imagine that we have a third website to be covered by our Napoleonic RDF. It is called ‘www.napoleon-scholar-c.com’ and it tells the compelling story of how Napoleon came from outer space to wreak havoc in order to pave the way for an alien invasion. But, foiled by the crafty British and imprisoned on Saint Helena, his avatar went back into space, there to bide its time on a moon of Saturn where it waits to try again. And, even though this site uses the same referents as the first two sites, ‘Napoleon’, ‘Marengo’, ‘Josephine’, etc., and even though its propositions can be expressed in the same or similar RDF, it does not belong to the same domain of discourse. No attempt to unite these three sites can ever lead to anything except nonsense. Now, of course, you’ll say that no cuckoo web site like that should be included in our semantic web. But a human being would have to make that judgment: the judgment, that is, that this web site belongs to a totally different domain of discourse. So much for the dream of automation.
And, in fact, there's no guarantee that any particular web site is consistent in the domains of discourse that it presents. That means that even if you choose a website to include in your semantic web scheme, someone knowledgeable still has to go through each statement and test it for reliability (however reliability is defined in your particular semantic web).
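The failure can be sketched in a few lines of Python. Everything here is hypothetical: the statements attributed to site A and site C are invented for illustration (‘alien avatar’ paraphrases the outer-space story), and `merge` and `merge_trusted` are made-up helper names, not part of any RDF toolkit.

```python
from collections import defaultdict

SITE_A = "http://www.napoleon-scholar-a.com"
SITE_C = "www.napoleon-scholar-c.com"

# (subject, predicate, object, source): illustrative statements only.
STATEMENTS = [
    ("Napoleon", "is-a", "general", SITE_A),
    ("Marengo", "is-a", "battle", SITE_A),
    ("Napoleon", "is-a", "alien avatar", SITE_C),
    ("Marengo", "is-a", "battle", SITE_C),
]


def merge(statements):
    """Naive automated union: same referents, provenance thrown away."""
    merged = defaultdict(set)
    for subject, predicate, obj, _source in statements:
        merged[(subject, predicate)].add(obj)
    return merged


def merge_trusted(statements, trusted_sources):
    """Same union, restricted to sources a human has decided to trust."""
    return merge(st for st in statements if st[3] in trusted_sources)


# The automated merge answers 'what is Napoleon?' from both sites at once.
print(merge(STATEMENTS)[("Napoleon", "is-a")])

# A sensible answer appears only after someone supplies the trusted list.
print(merge_trusted(STATEMENTS, {SITE_A})[("Napoleon", "is-a")])
```

Nothing in the data marks the second ‘is-a’ as belonging to a different domain of discourse; the two sites even agree about Marengo. The only thing that removes the nonsense is the `trusted_sources` argument, and that set has to come from a person who has read the sites.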
Darker examples could be adduced. Imagine two web-sites, ‘www.darwin-savior-of-mankind.com’ and ‘www.the-beagle-was-only-a-dog.info’, the first a pro-evolution site and the second vehemently anti-evolution. They both use the same terms, ‘evolution’, ‘fitness’, ‘selection’, and in, probably, very similar ways. The same RDF could be formed for both. But at some point someone is naively going to ask our semantic web about the truth value of evolution and survival of the fittest. Any semantic web that tries to unite these two domains of discourse will be incoherent on that question. There is no knowledge schema that covers or can cover these two separate realities. Again the problem could be solved by a human being culling the web sites covered. That is, by reading all of them and making a human judgment about which are reliable. (Another name for this is 'scholarship'.) Again, the death of automation.
And what about these two: ‘www.abortion-is-murder.com’ and ‘www.celebrating-roe-v-wade.net’? Or these two: ‘www.gay-is-the-future.org’ and ‘www.true-cause-of-hurricanes-revealed.net’? Or these two: ‘www.united-nations-benefits.gov’ and ‘www.real-no-shit-black-helicopter-sightings.info’? Or these two: ‘www.my-guns-my-self.me’ and ‘www.gun-control-failure-scandal.info’?
In other words, the proposed RDF schemes will fail precisely where we, as human beings, are most concerned to know something reliable. That is, where our very selves are most involved, RDF and related schemes are powerless. RDF and its predecessors have always relied on the idea that all knowledge is one; that Truth is One. I blame Plato for this but that’s just me. The fact that some of these RDF schemes are ‘ISO-certified’ is just the rotted icing on the absurdist cake.(3)
The truth is not One. And call me a grumpy old man but I have decades of experience in advanced computer science and I've never personally encountered a computer scientist who was educated about anything outside the narrow field of computers (and it is a narrow field). They are not to be trusted on the issues with which the rest of us are concerned (although I might make an exception for Jaron Lanier).
What divides us as human beings isn’t just a few propositions which, once we learn them, will put us on the track to ‘right thought’. It is not information that divides us. This is the classic mistake of computer scientists – and the Holy Grail for every totalitarian. Au fond, most computer scientists really believe that things are only words. But they aren’t. We, as human beings, live in our own inherently valuable universes. Not all of those universes can be harmonized with all the others. What separates these universes – these selves – are not wrong propositions, or bad-thought, but deeply felt passions, needs, appetites, and loves. Other human universes cannot be stormed by the Dialectic. Our connections have to be built up patiently over time.
And no automation can replace scholarship. By scholarship I mean the several activities of gathering evidence, organizing, patient collation, reflection, judgment and the expression of these activities in the form of essays, books, diagrams and, yes, even in the form of web sites or blogs. There is no grand slam against reality; no Tower to the Heavens that we can build that will let us storm the citadel of knowledge. We have to patiently scrape away at the matrix of the Unknown with our small intellects in order to see it more plainly.
Just as we have to work to see each other more plainly.
(1) The best critique I know of on this subject is Hubert Dreyfus’ invaluable (it deserved a Pulitzer) What Computers Can’t Do: A Critique of Artificial Reason from 1972 and his new edition, What Computers Still Can’t Do, from 1992. The budding DHer can also benefit from reading the amusing remarks of Turow (2010) on tort law. Turow shines a brilliant light on this very problem of knowledge representation and of reasoning from slightly differing circumstances.
(2) A very mild formulation compared to what we often find on the internet.
(3) ISO is another bad idea from the ’80s whose sell-by date has long passed.
Dreyfus (1972): Dreyfus, Hubert. What Computers Can't Do: A Critique of Artificial Reason. Harper and Row. 1972.
Dreyfus (1992): Dreyfus, Hubert. What Computers Still Can't Do. MIT Press. 1992.
Turow (2010): Turow, Scott. One L. Penguin Books (reprinted 2010).