Semantic web

From Organic Design wiki
Revision as of 20:52, 13 May 2020 by Nad (talk | contribs) (Nad moved page Web 3.0 to Semantic web)
This page describes a concept which is part of our glossary

MediaWiki's semantic support

The best-known integration of MediaWiki with the semantic web is Semantic MediaWiki (SMW), but there is now a strong movement towards native support.

First came Wikidata, a free and open semantic network that can be read and edited by both humans and machines. The information can be exported as RDF, RDFS, OWL and other semantic web formats. Wikidata acts as central storage for the structured data of its Wikimedia sister projects, including Wikipedia, Wikivoyage, Wikisource and others.

Note: The Wikidata search doesn't appear to be very useful for general-purpose queries. For example, I wanted to find properties that had equivalents in the RDFS schema, but the search returned no results for that, so I had to use Google instead.

See Wikibase data model and Property talk:P31 (P31 is "instance of") for more detail.

Next, there is a big movement under way to allow MediaWiki to fully support microformats (see the page on MediaWiki progress). Native support for microdata (the schema.org initiative started by Google, Yahoo, Bing and Yandex in 2011) is also progressing well; for example, the $wgAllowMicrodataAttributes configuration option has been removed in favour of default native support.

I've added some custom changes to our skin on the AfterFinalPageOutput hook, which isn't a very efficient way of doing things, but serves as a good test. Before implementing this, the search engine results for pages in Organic Design were very disorganised, since they included any page that matched the query in the sidebar or other areas of the page. After adding schema.org microdata, only pages that match the query in the actual content were shown. This is our general page HTML structure now, with the microdata attributes included:

<body class="ns-0 ns-subject page-Foo skin-monobook action-view user" itemscope itemtype="http://schema.org/WebPage">
	<div id="globalWrapper">
		<div id="languages" itemscope itemtype="http://schema.org/SiteNavigationElement"> ... </div>
		<div id="column-content" itemprop="mainContentOfPage" itemscope itemtype="http://schema.org/WebPageElement"> ... </div>
		<div id="column-one" itemscope itemtype="http://schema.org/WPSideBar"> ... </div>
		<div class="visualClear"></div>
		<div id="footer" itemscope itemtype="http://schema.org/WPFooter"> ... </div>
	</div>
</body>
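
To see what a consumer of this markup might pick up, here is a minimal sketch in Python (not part of our skin, which is PHP) that scans a simplified copy of the structure above and lists every element carrying an itemscope. The PAGE string, the MicrodataScanner class and the scan function are made up for illustration; a real search engine does far more than this.

```python
from html.parser import HTMLParser

# Simplified copy of the page skeleton (illustrative, not the live page).
PAGE = """
<body class="ns-0" itemscope itemtype="http://schema.org/WebPage">
  <div id="column-content" itemprop="mainContentOfPage" itemscope itemtype="http://schema.org/WebPageElement">...</div>
  <div id="column-one" itemscope itemtype="http://schema.org/WPSideBar">...</div>
  <div id="footer" itemscope itemtype="http://schema.org/WPFooter">...</div>
</body>
"""


class MicrodataScanner(HTMLParser):
    """Collects (tag, id, itemtype, itemprop) for each element with itemscope."""

    def __init__(self):
        super().__init__()
        self.items = []

    def handle_starttag(self, tag, attrs):
        # Valueless attributes such as itemscope are reported with value None.
        attr = dict(attrs)
        if "itemscope" in attr:
            self.items.append(
                (tag, attr.get("id"), attr.get("itemtype"), attr.get("itemprop"))
            )


def scan(html):
    """Return the microdata items found in an HTML string."""
    scanner = MicrodataScanner()
    scanner.feed(html)
    return scanner.items


for tag, elem_id, itemtype, itemprop in scan(PAGE):
    print(tag, elem_id, itemtype, itemprop)
```

The point of the sketch is that the main content is distinguishable from the sidebar and footer purely by its itemprop and itemtype, which is what lets a crawler ignore matches outside the content area.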

RDF Triples

RDF is a family of World Wide Web Consortium (W3C) specifications originally designed as a metadata model but which has come to be used as a general method of modeling information, through a variety of syntax formats.

The RDF metadata model is based upon the idea of making statements about resources in the form of subject-predicate-object expressions, called triples in RDF terminology. The subject denotes the resource, and the predicate denotes traits or aspects of the resource and expresses a relationship between the subject and the object. For example, one way to represent the notion "The sky has the color blue" in RDF is as a triple of specially formatted strings: a subject denoting "the sky", a predicate denoting "has the color", and an object denoting "blue".

This mechanism for describing resources is a major component in what is proposed by the W3C's Semantic Web activity: an evolutionary stage of the Web in which automated software can store, exchange, and use machine-readable information distributed throughout the web, in turn enabling users to deal with the information with greater efficiency and certainty. RDF's simple data model and ability to model disparate, abstract concepts has also led to its increasing use in knowledge management applications unrelated to Semantic Web activity.

The subject of an RDF statement is a resource, possibly as named by a Uniform Resource Identifier (URI). Some resources are unnamed and are called blank nodes or anonymous resources. They are not directly identifiable. The predicate is a resource as well, representing a relationship. The object is a resource or a Unicode string literal.
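
The parts of a statement can be modelled directly as small value types. The following is only a sketch in Python under my own naming (URIRef, BlankNode, Literal and Triple are illustrative classes, and the example.org URIs are invented); it is not the API of any particular RDF library.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class URIRef:
    """A resource named by a URI."""
    uri: str


@dataclass(frozen=True)
class BlankNode:
    """An unnamed resource; the label is only meaningful locally."""
    label: str


@dataclass(frozen=True)
class Literal:
    """A Unicode string value."""
    text: str


@dataclass(frozen=True)
class Triple:
    subject: Union[URIRef, BlankNode]
    predicate: URIRef
    object: Union[URIRef, BlankNode, Literal]


# "The sky has the color blue" (hypothetical URIs).
sky_is_blue = Triple(
    URIRef("http://example.org/sky"),
    URIRef("http://example.org/hasColor"),
    Literal("blue"),
)

# A statement about an anonymous resource.
something_is_grey = Triple(
    BlankNode("b0"),
    URIRef("http://example.org/hasColor"),
    Literal("grey"),
)
```

Frozen dataclasses are used so that triples are immutable and hashable, and so that a URIRef and a Literal holding the same string never compare as equal.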

RDF at Runtime

Before talking about RDF at run time, a quick look at the relationship between XML and the Document Object Model (DOM) may be useful. XML is a standard way of describing hierarchical data in textual form for storage and communication, but a textual representation is not very useful at run time because it needs to be converted into a system of objects and references native to the run-time environment. The Document Object Model is a standard which defines how an XML document can be described and interacted with at run time. It provides a standard set of methods for navigating and manipulating such data, and for transforming information between XML text and DOM objects.

RDF can be described using XML and can therefore be made available within a run-time environment that way, but this is not optimal: RDF is far simpler and more general than XML, so it could be implemented with a lower-level, more efficient run-time data structure. RDF allows the simple definition of a set of subjects, each containing many relationships (effectively key:value pairs), but it does not introduce any other concepts such as elements, attributes or entities.
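
As a rough illustration of such a structure, the sketch below keeps triples in nested dictionaries keyed first by subject and then by predicate. The Graph class and its method names are my own invention, and plain strings stand in for URIs; a set of objects is stored per predicate because one subject may have several values for the same relationship.

```python
from collections import defaultdict


class Graph:
    """Minimal in-memory triple container: subject -> predicate -> {objects}."""

    def __init__(self):
        self._by_subject = defaultdict(lambda: defaultdict(set))

    def add(self, subject, predicate, obj):
        self._by_subject[subject][predicate].add(obj)

    def objects(self, subject, predicate):
        # .get() is used so that a lookup never creates empty entries.
        return set(self._by_subject.get(subject, {}).get(predicate, ()))

    def triples(self):
        """Yield every stored (subject, predicate, object) tuple."""
        for subject, predicates in self._by_subject.items():
            for predicate, objs in predicates.items():
                for obj in objs:
                    yield (subject, predicate, obj)


g = Graph()
g.add("sky", "hasColor", "blue")
g.add("sky", "hasColor", "grey")
print(g.objects("sky", "hasColor"))
```

Nothing here corresponds to elements, attributes or entities; the whole model is subjects with predicate:object associations.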

URIs & Object References

A Uniform Resource Identifier (URI) is a string of characters used to identify a name or a resource on the Internet. Such identification enables interaction with representations of the resource over the network using specific protocols. Schemes specifying a concrete syntax and associated protocols define each URI.

One can classify URIs as locators (URLs), or as names (URNs), or as both. A Uniform Resource Name (URN) functions like a person's name, while a Uniform Resource Locator (URL) resembles that person's street address. In other words: the URN defines an item's identity, while the URL provides a method for finding it.
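
The distinction can be approximated from the scheme alone. This is a simplified sketch: the classify function and the LOCATOR_SCHEMES set are assumptions made for the example, and it ignores the case of identifiers that act as both name and locator.

```python
from urllib.parse import urlparse

# Assumption for this sketch: schemes we treat as locators.
LOCATOR_SCHEMES = {"http", "https", "ftp"}


def classify(uri):
    """Label a string as 'URN', 'URL', generic 'URI' or 'no scheme'."""
    scheme = urlparse(uri).scheme.lower()
    if not scheme:
        return "no scheme"
    if scheme == "urn":
        return "URN"  # names an item without saying where it is
    if scheme in LOCATOR_SCHEMES:
        return "URL"  # says how and where to fetch a representation
    return "URI"


for candidate in ("urn:isbn:0451450523", "http://example.org/sky"):
    print(candidate, classify(candidate))
```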

Large-scale organisation

The Semantic Web document by Tim Berners-Lee and a couple of others in 2001 is an excellent introduction to how the semantic web would be used and how it works. It's set in the not-too-distant future, at a time when the entire internet and its applications are semantically structured. But one thing I find wrong with the picture they paint is that, although the user's experience of the internet is very different, the physical organisations they're interacting with are completely unchanged in their nature; one example is a profit-driven health insurance company.

What I'm driving at is that if the user's experience is so much richer, more efficient and more productive, then so too would be the organisations. Think about some of the large open-source projects we have today. In an environment such as the one discussed in The Semantic Web, large, decentralised open-source projects would form and grow automatically as people supported the ideas with their resources.

Even now, many such projects are branching out of the purely informational realm and into real-world aspects such as reselling related products or services, or hiring developers and administrators. The corporations will find it harder and harder to compete because they can't be trusted, yet the new decentralised organisations have nothing hidden. Once this way of organising becomes commonplace, it will spread into all areas of organisation and will eventually become the preferred method of handling organisation at the largest scale, such as health, transport, finance and industry.

Foundation Ontology

The N3 notation of the semantic web has one very important aspect in common with the foundation ontology of the nodal model: they both describe a data structure similar to standard hash tables, associative arrays or dictionaries, except that the keys are global object references (URIs) rather than just names. This allows the relationships themselves to exhibit universal meaning. Spaces whose nodes are all described by N3 triples are called triplespaces. A triplespace's nodes can be used either as objects or as the relationships connecting other objects.

The optimal run-time data structure of a triplespace cannot be described by the usual hash-table implementations, because they expect the key parameter to be a simple datatype and cannot handle an object reference as a key. Some languages, such as Perl and LISP, have implementations of associative arrays that do work with arbitrary types.
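
To make the idea of reference-keyed associations concrete, here is a small sketch in Python, whose dictionaries accept any hashable object as a key. It is not the nodal model's implementation; the Node class is a hypothetical stand-in for a node in a triplespace.

```python
class Node:
    """A bare node. Instances hash and compare by identity, not by label."""

    def __init__(self, label):
        self.label = label


sky = Node("sky")
has_color = Node("has color")
blue = Node("blue")

# A second node with the same label is still a different key.
other_has_color = Node("has color")

# The keys are object references: a node acts as subject, relationship or object.
space = {sky: {has_color: blue}}

print(space[sky][has_color].label)
print(other_has_color in space[sky])
```

Because the relationship is itself a node, it can also appear as a subject elsewhere in the same space, which is what a name-keyed hash table cannot express.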

In information science, an upper ontology (top-level ontology, or foundation ontology) is an attempt to create an ontology which describes very general concepts that are the same across all domains of knowledge and application. The aim is to have a large number of ontologies accessible under this upper ontology. It is usually a hierarchy of entities and associated rules (both theorems and regulations) that attempts to describe those general entities that do not belong to a specific problem domain.

Upper ontologies are commercially valuable, creating competition to define them. Peter Murray-Rust has claimed that this leads to "semantic and ontological warfare due to competing standards", and accordingly any standard foundation ontology is likely to be contested among commercial or political parties, each with their own idea of 'what exists' (in the philosophical sense).

No single upper ontology has yet gained widespread acceptance as a de facto standard. Different organisations are attempting to define standards for specific domains. The Process Specification Language (PSL) created by the National Institute of Standards and Technology (NIST) is one example.

  • GIST - The minimalist upper ontology
  • BFO - Basic Formal Ontology (manual)
  • SUMO - Suggested Upper Merged Ontology
  • PSL - Process Specification Language
  • OpenCYC
  • DOLCE - Descriptive Ontology for Linguistic and Cognitive Engineering
  • GFO - General Formal Ontology

TripleSpace (Semantic TupleSpace)

The web and the tuplespace have many things in common. They are both global information spaces for persistent publication. Therefore, they share many of their underlying principles. They differ in their application context. The web is a world wide information space for the human reader and the tuplespace is a local space for parallel computation. We propose to extend the tuplespace into a triple-space, where <subject, predicate, object> describe content and semantics of information. The object can become a subject in a new triple and so defining a graph structure capturing structural information.

- Dieter Fensel, Tim Berners-Lee & Eva Kühn, May 2004
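
A toy version of the proposal can be sketched as a tuplespace whose tuples are always triples and whose read templates use wildcards. This is only an illustration in Python: the TripleSpace class and its write, read and take methods are invented names loosely modelled on tuplespace operations, and plain strings stand in for URIs.

```python
class TripleSpace:
    """In-memory space of (subject, predicate, object) triples."""

    def __init__(self):
        self._triples = set()

    def write(self, subject, predicate, obj):
        """Publish a triple into the space."""
        self._triples.add((subject, predicate, obj))

    def read(self, subject=None, predicate=None, obj=None):
        """Return all triples matching the template; None is a wildcard."""
        template = (subject, predicate, obj)
        return [
            triple
            for triple in self._triples
            if all(want is None or want == got for want, got in zip(template, triple))
        ]

    def take(self, subject=None, predicate=None, obj=None):
        """Like read, but also removes the matching triples from the space."""
        matches = self.read(subject, predicate, obj)
        self._triples.difference_update(matches)
        return matches


space = TripleSpace()
space.write("sky", "hasColor", "blue")
space.write("blue", "isA", "colour")

# The object of one triple becomes the subject of another, forming a graph.
(_, _, colour), = space.read(subject="sky", predicate="hasColor")
print(space.read(subject=colour))
```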

RDF triples are almost identical to the associative array, or key:value pair-based structure we're used to; triples are just a slightly more atomic way of describing the information. A statement of <subject, predicate, object> can be thought of as saying the subject contains the predicate:object association. See also the Notation3 (N3) format for triples.
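
The correspondence can be shown by flattening an associative array into triples and rebuilding it. The helper names below are made up for this sketch, the example.org URIs are invented, and the rebuild step assumes one object per subject and predicate. The to_n3 helper writes a single statement with URIs in angle brackets and a quoted literal, without handling escaping.

```python
def dict_to_triples(subject, mapping):
    """Flatten {predicate: object} for one subject into a list of triples."""
    return [(subject, predicate, obj) for predicate, obj in mapping.items()]


def triples_to_dicts(triples):
    """Rebuild {subject: {predicate: object}} from triples.

    Assumes each (subject, predicate) pair has a single object.
    """
    result = {}
    for subject, predicate, obj in triples:
        result.setdefault(subject, {})[predicate] = obj
    return result


def to_n3(subject, predicate, literal):
    """Format one statement with URI subject/predicate and a plain literal."""
    return f'<{subject}> <{predicate}> "{literal}" .'


SKY = "http://example.org/sky"  # hypothetical URIs
sky = {
    "http://example.org/hasColor": "blue",
    "http://example.org/isAbove": "ground",
}

triples = dict_to_triples(SKY, sky)
for triple in triples:
    print(to_n3(*triple))
```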

Hydranode

Hydranode is more complete than the nodal reduction algorithm and already supports multiplexed and load-balanced communications over a diverse range of protocols, even though it may no longer be under development. In fact, any content distribution system that is protocol-independent can easily be turned into a work reduction system similar to the nodal model's approach, because any processes that can be broken up into arbitrarily small work units are completed in the same way as sending or receiving any packet.

Tripoli

Tripoli is a GPL project written in Python that implements a TripleSpace over the Twisted infrastructure, which is an event-driven network programming framework under the MIT License.

Perl Object Environment

POE was originally developed as the core of a persistent object server and runtime environment. It has evolved into a general purpose multitasking and networking framework, encompassing and providing a consistent interface to other event loops. POE is a Perl equivalent to Tripoli.

Web4.0

I've been using the term Web4.0 to mean not only a semantic applicational internet (i.e. web3.0), but one combined with p2p technology such as DHTs and triplespaces, i.e. a serverless semantic internet. All of its components are already available and just need to be glued together (which will be an automatic result of web3 anyway).
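
As one possible way the gluing could look, the sketch below decides which peer stores a triple by hashing its subject URI and choosing the peer whose hashed name is closest by XOR distance, the metric used by Kademlia-style DHTs. Everything here is an assumption for illustration: the function names, the peer names and the choice to partition by subject are mine, and a real DHT also needs routing, replication and churn handling.

```python
import hashlib


def key_id(text):
    """Map a string to a 160-bit integer identifier using SHA-1."""
    return int.from_bytes(hashlib.sha1(text.encode("utf-8")).digest(), "big")


def responsible_peer(subject_uri, peers):
    """Pick the peer whose identifier is closest (by XOR) to the subject's."""
    target = key_id(subject_uri)
    return min(peers, key=lambda peer: key_id(peer) ^ target)


# Hypothetical peer names; no server is special.
PEERS = ["peer-a", "peer-b", "peer-c"]

triple = ("http://example.org/sky", "http://example.org/hasColor", "blue")
print(responsible_peer(triple[0], PEERS))
```

Partitioning by subject keeps all predicate:object associations of one subject on the same peer, so a lookup for a subject needs to contact only one place.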

Writing

See also