Tuesday, November 9, 2021

Buzzword Bingo: How to speak 'Pleiadese'

                 ἢ τίς ἐστιν ἐξ ὑμῶν ἄνθρωπος,
                 ὃν αἰτήσει ὁ υἱὸς αὐτοῦ ἄρτον, μὴ λίθον ἐπιδώσει αὐτῷ;
    "Or what man is there among you who, if his son asks for bread, will give him a stone?"

    Matthew 7:9


    Like most of you, I'm a huge fan of the papers and websites that come out of the Pleiades/Pelagios/Peripleo/Recogito workshop.  It turns out, though, that a lot of the vocabulary they use comes from the world of Computer Science and not from the Humanities.  The Pleiades Team (PT) has never met a computer buzzword that they didn't fall madly in love with, and so their writings can be ... abstruse ... and require a lot of interpretation before you can figure out what the heck they're saying.  The result is that many of you want to know more about this software effort but can't get past the vocabulary.

    How about a nice dictionary of all these buzzwords, one which will not only help you understand what PT is on about but also has the potential to lift your own writing to a whole new level of buzz-worthiness?

    Well, I've got your back. 

    This post is a vocabulary list of some of the PT's favorite buzzwords.  I've arranged them alphabetically (more or less).  In my next post I'll begin the work of relating these terms to each other and putting them into a clear context. 

    I hope that you'll play along by submitting other buzzwords from their writings which I may have missed and by suggesting corrections.  And don't let the number of terms worry you.  Like all buzzwords, most of them mean exactly the same thing.

    Some of these entries refer to articles in Wikipedia.  I know, I know.  However, Wikipedia often deals with technical topics in clear and precise terms, and that's why I used it in these instances.  In a later post I'm going to try to tell the Pleiades story in terms that everyone can understand, showing how these vocabulary items fit into the Pleiades context.

    A * before a word means that it's defined elsewhere in the list.

    Term
    Definition
    Sources
    ARCHES
    This is the Getty-sponsored *ontology.
    PGPP0002; ARCHES is described here.
    CIDOC
    An *ontology.  An instance of a *Conceptual Reference Model (CIDOC CRM).  "The CIDOC Conceptual Reference Model (CRM) provides definitions and a formal structure for describing the implicit and explicit concepts and relationships used in cultural heritage documentation...to promote a shared understanding of cultural heritage information by providing a common and extensible semantic framework that any cultural heritage information can be mapped to."
    PGPP0002; Wikipedia provides more here.
    conceptual reference model  (CRM)
    A theoretical *ontology.  A controlled vocabulary with variations for many areas of professional work.  Here, in particular, the area of cultural heritage.
    Described by Wiki here.
    contextualisation
    Here it means lots of different people's data/projects hooking up to the same geographical network of nodes, which provides the context.
    CRM
    An *ontology.

    *conceptual reference model
    https://www.igi-global.com/dictionary/conceptual-reference-model-crm/5242
    decentralise,

    decentralization
    decentralise here means 'other people do the work and then attach it to our backbone.'
    From PGPP0002 (Abstract) 

    "This paper discusses an emerging cloud of Linked Open Data in the humanities sometimes referred to as the Graph of Ancient World Data (GAWD). It provides historical background to the domain, before going on to describe the open and decentralised characteristics which have partially characterised its development."
    discovery
    Here it means 'You'll have more hits to your project if you attach it to our backbone.'   No numbers are presented.
    PGPP0002: " ..., as well as the increased likelihood of discovery by consumers following the same API in the opposite direction."
    Dublin Core
    "The Dublin Core™ Metadata Element Set is a vocabulary of fifteen properties for use in resource description."
    PGPP0009; Their homepage is here.
    emic concepts
    "emic concepts, i.e. those which originated in the languages of those periods."
    PGPP0002
    etic concepts
    "etic conceptual schemes developed by contemporary scholars to describe cultural phenomena for which no linguistic evidence survives."
    PGPP0002
    GAWD
    Graph of Ancient World Data
    R. Robineau. Graph of Ancient World Data, June 2012.  See an explanatory diagram here.

    GeoNames
    An enormous DB of geographical names which can be used as the basis for a standardized naming system to which, if it were adopted in Humanities research, all researchers would have to conform.


    "The GeoNames geographical database covers all countries and contains over eleven million placenames that are available for download free of charge."
    Their primary web page is here.
    graph database
    A database that supports the establishment and modification of data that is envisaged as a collection of linked nodes.

    A paper by Pokorný is here

    Pokorný makes the important point that graph structures can be represented by relational DBs.

    "For example, we can represent a graph by tables in a relational DBMS (RDBMS) and use sophisticated constructs of SQL or Datalog to express some graph queries."
    interconnection format
    PGPP0001.  This term is deprecated and has been replaced by the *Linked Places Format.
    ISAW
    Institute for the Study of the Ancient World
    PGPP0002; Homepage is here.
    ISO, NISO
    International Organization for Standardization,

    National Information Standards Organization
    This is ominous to me.   Real scholarship has no business whatsoever being anywhere near any Standards Organization. 


    The NISO (founded in 1939) is described here, and it is hand-in-glove with the International Organization for Standardization, or ISO.  I should think that ISO wouldn't, for shame, be able to hold up its head after the disasters of the '80s and '90s.
    Jisc
    A UK funding body.  They provide 'digital solutions'.  God only knows what they actually do.
    PGPP0002; Their website is here.
    Linked Ancient World Data Institute  (LAWDI)
    A workshop held in 2012.  " ... the Linked Ancient World Data Institute (LAWDI), an internationally attended workshop funded by the National endowment for humanities' office of Digital humanities ... "  It was hosted by ISAW and Drew University.

    More information from the National Information Standards Organization (NISO) here.  This seems alarming to me.  What relevance could the NISO have to scholarship?

    LAWDI is dealt with by National Endowment for the Humanities here.
    Linked Data
    A structure consisting of virtual nodes of information each one of which has a connection to one or more others or is connected to by one or more others.
    I don't think too much harm can be caused by linking to Wikipedia in this instance.
    Linked Open Data
    From the Getty document : "When data is linked and open, it means that data is structured and published according to the principles of Linked Data, so that it can be both interlinked and made openly accessible and shareable on the Semantic Web."
    PGPP0002;  e.g. the Getty's LOD which is described here.
    These are  the documents which relate the Getty Institute vocabularies to an actual data linking standard.
    Linked Places Format
    A regimented name list:

    "Linked Places format is used to describe attestations of places in a standard way, primarily for linking gazetteer datasets."

    "LPF v1.2 is implemented in current versions of World Historical Gazetteer and Pelagios projects, including Recogito."
    Defined here.
    Links
    A convention by which one entity may refer to another.
    PGPP0001
    "The purpose of the place records is to form anchor points for cross-linking between different gazetteer datasets (no more, no less) in order to indicate similarity."   and  "defined as a relation “[...] used to link two concepts that are sufficiently similar that they can be used interchangeably in some information retrieval applications”"
    mereological
    Pertaining to mereology, the study of the relationship of part to part or of parts to a whole.
    PGPP0002;  Defined here.

    I have to confess that this buzzword was my hands-down favorite.
    meta-
    Tags which are added to the semantic web in order to, among other things, establish the authoritative document from which the semantic node is derived.
    The main difficulty with the Semantic Web is that there is no built-in provision for sourcing subject or object nodes and no way to prevent different triples in the same system from being contradictory.  'Metatags' are layered on to try to correct this problem.  The resulting complexity which tags introduce often leads to the collapse of the system.
    namespace
    Here:   A specific closed set (only these names are allowable for members that are formally connected to the specific namespace).
    This is also a concept in Anthropology where certain societies draw from a bounded set of names as the only proper names for their members.  See Harrison [1990].
    Neo4j
    A database tool that creates graph databases where the link is automatically stored with the node.  This product is orders of magnitude more powerful than anything Pleiades could ever conceivably require.

    There is a 1 minute video here which gets right to the (simple) nitty-gritty. 
    ontology
    In this instance: A formalism for defining object names and definitions unambiguously.  E.gg. CIDOC and Open Annotation.  Also the Getty ARCHES system.  The intent is usually to create an exhaustive *namespace for the primary entities in the Domain of Discourse.
    Really a 'cosmology'.  See the remarks in Cramer [2007].  [1]
    Open Annotation
    " ... an interoperable framework for creating associations between related resources, annotations, ... "
    PGPP0002; 

    More here.
    Open Annotation Ontology
    An open annotation formalism that deals with objects.
    Online here.
    PDDL
    An Open Data License (the Open Data Commons Public Domain Dedication and License).
    PGPP0002
    RDF
    Resource Description Framework

    An RDF is a formalism that describes how triples should wear fancy dress.  See *serialization.

    From Ontotext website:
    "RDF is a standard for data interchange that is used for representing highly interconnected data. Each RDF statement is a three-part structure consisting of resources where every resource is identified by a URI."

    From the W3C wiki:  "RDF extends the linking structure of the Web to use URIs to name the relationship between things as well as the two ends of the link (this is usually referred to as a “triple”)."
    RDF is dealt with clearly on the Ontotext website here.
    RDF-based *serialization format
    *Serialization based on RDF triples.
    PGPP0001;
    For a discussion of the advantages and disadvantages of various *serialization techniques see this.
    Recogito
    A note-taking application.
    Homepage is here.
    REST
    Representational State Transfer.  A set of 'best practices' for programming on the Internet.  Makes use of the ideas of *URIs, etc.
    PGPP0009
    https://en.wikipedia.org/wiki/Representational_state_transfer

    https://shareurcodes.com/blog/creating%20a%20simple%20rest%20api%20in%20php
    RESTful
    "A web API that obeys the *REST constraints is informally described as RESTful."  God help us.
    https://en.wikipedia.org/wiki/Representational_state_transfer
    RFC
    Request for Comments.  A document which specifies a protocol or specific form or procedure for accomplishing something.
    Semantic
    In all cases under consideration this refers to 'meaning'.  That is (the machine) understanding and reacting as though Athens is a real place as opposed to understanding 'Athens' as a coded token only.   That machines cannot interpret anything in terms of semantics is what drives 99% of everything being discussed here.
    Semantic Web
    A computer structure optimistically thought to embody *semantic meaning but really only interlinked computer tokens.  See *linked data.
    See this on Wiki.
    Seneschal
    Semantic ENrichment Enabling Sustainability of arCHAeological Links.  Vocabularies and tools aimed at homogenization of terminology in the field of Archaeology.  Although portions of it have been re-used the project itself seems defunct.
    PGPP0002;  Home page for Seneschal is here.
    serialization, serialization format
    A formal representation of RDF triples.  There are several ways that this is customarily done: e.gg.,   XML, RDFa, .n3, Turtle, N-Triples, N-Quads, JSON-LD, JSON-AD, HexTuples, HDT, RDF Binary Thrift. 
    After you form ('write down') your RDF triples then you'll have to 'serialize' them or convert them into some computer formalism meant to support triples.
    For a discussion of the advantages and disadvantages of various serialization techniques see this.
    Shared backbone
    The use of a single reference gazetteer to which all the others must link : "to choose a single “reference gazetteer” (or a small number of them) to which every specialist gazetteer should strive to link."    " ... and are available as Linked Data."
    PGPP0001
    SKOS
    Simple Knowledge Organization System.  An *ontology.
    PGPP0001; Home page for SKOS is here.
    SNAP
    Standards for Networking Ancient Prosopographies
    Home page is here.


    stakeholders
    Participants and contributors to a project.
    PGPP0002
    TEI
    Text Encoding Initiative;  A text markup formalism.  On their site: "a consortium which collectively develops and maintains a standard for the representation of texts in digital form."
    https://tei-c.org/
    triple
    Three entities, although here: two objects connected by a link.
    "A semantic triple, or RDF triple or simply triple, is the atomic data entity in the Resource Description Framework data model. As its name indicates, a triple is a set of three entities that codifies a statement about semantic data in the form of subject–predicate–object expressions". More in Wikipedia.
    Turtle (RDF)
    A formalism for expressing an entity in a linked data system.
    http://snapdrgn.net/about
    A sample definition (for Athens) is in PGPP0001 in Figure 8.5.
    URC
    Uniform Resource Characteristics
    A helpful blog entry that explains URI, URN, URL is here.

    This Venn Diagram shows the relationship between URL, URN, and URI.
    URI
    Uniform Resource Identifier.  An identifier of a specific resource,  e.gg. page, book, document.  Does not necessarily contain the means or address for accessing it.  See: PGPP0001  Also PGPP0002 : "the establishment of services providing stable URIs for shared categorical and instance thesauri."
    PGPP0002

    This Venn Diagram shows the relationship between URL, URN, and URI.
    URN
    Uniform Resource Name: A URI that uses the URN scheme.  "persistent, location-independent identifiers assigned within defined namespaces, typically by an authority responsible for the namespace, so that they are globally unique and persistent over long periods of time"
    For more on URNs see the wiki article here.

    This Venn Diagram shows the relationship between URL, URN, and URI.
    Upper Ontology
    "In information science, an upper ontology (also known as a top-level ontology, upper model, or foundation ontology) is an ontology (in the sense used in information science) which consists of very general terms (such as "object", "property", "relation") that are common across all domains."
    https://en.wikipedia.org/wiki/Upper_ontology
    URL
    Uniform Resource Locator.  All URLs are URIs but not all URIs are URLs. (A subset of URIs)   "special type of identifier that also tells you how to access it, such as HTTPs, FTP, etc.—like https://www.google.com."
    A helpful blog entry that explains URI, URN, URL is here.
    This Venn Diagram shows the relationship between URL, URN, and URI.

    Vocabulary (controlled Vocabulary)
    Lists of standardized and homogenized terms for every aspect of the Humanities to which researchers will be forced to conform.
    PGPP0002 : "Controlled vocabularies   An extremely important development has been the establishment of services providing stable URIs for shared categorical and instance thesauri.  These include place gazetteers, type classifications for coins and canonical citations for classical literature.  Without them, earlier attempts at 'interoperability' were seriously hampered by the lack of common reference terms for analogous content despite the availability of ontologies that defined shared or equivalent properties."
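
    For readers who would rather see a *triple than read about one, here is a minimal sketch in Python.  Everything in it is my own illustration and an assumption on my part: the example.org URIs, the 'partOf' and 'label' predicates, and the helper names (to_ntriples, load_into_sqlite, parts_of) are hypothetical and come from neither Pleiades nor any of the PGPP papers.  It writes down three triples, serializes them as N-Triples (one of the formats listed under *serialization), and then stores the same triples in one relational table, which is a small instance of Pokorný's point that a graph can live in an ordinary relational DB.

```python
import sqlite3

# Hypothetical placeholder URIs; these are NOT real Pleiades identifiers.
ATHENS = "http://example.org/place/athens"
ATTICA = "http://example.org/place/attica"
PART_OF = "http://example.org/vocab/partOf"
LABEL = "http://example.org/vocab/label"

# A triple is (subject, predicate, object).
# The object is tagged as either another URI or a plain text literal.
triples = [
    (ATHENS, LABEL, ("literal", "Athens")),
    (ATTICA, LABEL, ("literal", "Attica")),
    (ATHENS, PART_OF, ("uri", ATTICA)),
]


def to_ntriples(triples):
    """Serialize triples as N-Triples text, one statement per line.

    Only backslashes and double quotes are escaped here, which is
    enough for this example but not for arbitrary literals.
    """
    lines = []
    for subject, predicate, (kind, value) in triples:
        if kind == "uri":
            obj = f"<{value}>"
        else:
            escaped = value.replace("\\", "\\\\").replace('"', '\\"')
            obj = f'"{escaped}"'
        lines.append(f"<{subject}> <{predicate}> {obj} .")
    return "\n".join(lines)


def load_into_sqlite(triples):
    """Store the same triples as rows of a single relational table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE triple "
        "(subject TEXT, predicate TEXT, object TEXT, is_literal INTEGER)"
    )
    conn.executemany(
        "INSERT INTO triple VALUES (?, ?, ?, ?)",
        [
            (s, p, value, 1 if kind == "literal" else 0)
            for s, p, (kind, value) in triples
        ],
    )
    return conn


def parts_of(conn, whole_uri):
    """A one-hop graph query in plain SQL: which nodes are part of whole_uri?"""
    rows = conn.execute(
        "SELECT subject FROM triple "
        "WHERE predicate = ? AND object = ? AND is_literal = 0",
        (PART_OF, whole_uri),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(to_ntriples(triples))
    print(parts_of(load_into_sqlite(triples), ATTICA))
```

    Note that nothing in either form 'knows' that Athens is a place; both are just tokens and links between tokens, which is the point made under *Semantic above.  The 'partOf' link is also about as *mereological as this blog is ever likely to get.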

    Footnotes

    [1] Cramer [2007] :  " ... what he and computer science  call "ontology" is, outside such jargon and in a more common sense  language, not an ontology, but a cosmology."  'He' is Tim Berners-Lee.


    Bibliography

    Berman et al. [2016] : Berman, Merrick Lex and Ruth Mostern, Humphrey Southall eds. Placing Names: Enriching and Integrating Gazetteers. Bloomington: Indiana University Press, 2016.

    Cramer [2007] :  Cramer, Florian.  'Animals that Belong to the Emperor',  Online here.

    PGPP0001 :  Simon, Rainer and Leif Isaksen, Elton Barker, Pau de Soto Cañamares, '8. The Pleiades Gazetteer and the Pelagios Project' in Berman et al. [2016], pp. 97-109.  Online here.

    PGPP0002 :  Isaksen, Leif and Elton Barker, Rainer Simon, Pau de Soto, 'Pelagios and the emerging graph of ancient world data', June 2014.  DOI: 10.1145/2615569.2615693.  Online here.

    PGPP0003 :  Barker, Elton, and Anna Foka, Kyriaki Konstantinidou, 'Coding for the Many, Transforming Knowledge for All: Annotating Digital Documents', Publications of the Modern Language Association, 135(1) pp. 195–202.  2020.  Online here.

    PGPP0009 :  Elliott, Tom and Sean Gillies.  'Digital Geography and Classics', Digital Humanities Quarterly 3(1), 2009.  Online here.

    Harrison [1990] :  Harrison, Simon.  Stealing People's Names, Cambridge University Press, 1990.


        




