Semantic web

RDF Triples

RDF is a family of World Wide Web Consortium (W3C) specifications originally designed as a metadata model but which has come to be used as a general method of modeling information, through a variety of syntax formats.

The RDF metadata model is based upon the idea of making statements about resources in the form of subject-predicate-object expressions, called triples in RDF terminology. The subject denotes the resource, and the predicate denotes traits or aspects of the resource and expresses a relationship between the subject and the object. For example, one way to represent the notion "The sky has the color blue" in RDF is as a triple of specially formatted strings: a subject denoting "the sky", a predicate denoting "has the color", and an object denoting "blue".
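As a rough illustration of the sky example, a triple can be held as three strings. The URIs under example.org and the helper name to_ntriples_line are made up for this sketch, and the output format only approximates the N-Triples syntax.

```python
from collections import namedtuple

# A triple is three specially formatted strings: subject, predicate, object.
Triple = namedtuple("Triple", ["subject", "predicate", "object"])

# Hypothetical identifiers under example.org, used only for illustration.
sky_is_blue = Triple(
    subject="http://example.org/things#sky",
    predicate="http://example.org/terms#hasColor",
    object="blue",
)

def to_ntriples_line(t):
    """Render a triple with URIs in angle brackets and the literal in quotes."""
    return '<{}> <{}> "{}" .'.format(t.subject, t.predicate, t.object)

print(to_ntriples_line(sky_is_blue))
```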

This mechanism for describing resources is a major component in what is proposed by the W3C's Semantic Web activity: an evolutionary stage of the Web in which automated software can store, exchange, and use machine-readable information distributed throughout the web, in turn enabling users to deal with the information with greater efficiency and certainty. RDF's simple data model and its ability to model disparate, abstract concepts have also led to its increasing use in knowledge management applications unrelated to Semantic Web activity.

The subject of an RDF statement is a resource, possibly as named by a Uniform Resource Identifier (URI). Some resources are unnamed and are called blank nodes or anonymous resources. They are not directly identifiable. The predicate is a resource as well, representing a relationship. The object is a resource or a Unicode string literal.
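The kinds of term allowed in each position could be sketched as follows. The class names URIRef, BlankNode and Literal and the function is_valid_statement are assumptions chosen for this example, not part of any particular library. The check requires the predicate to be a named resource, which is how the RDF specification constrains it.

```python
from dataclasses import dataclass
from itertools import count

@dataclass(frozen=True)
class URIRef:
    """A resource named by a URI."""
    uri: str

@dataclass(frozen=True)
class BlankNode:
    """An unnamed resource; the label is only meaningful locally."""
    label: str

@dataclass(frozen=True)
class Literal:
    """A Unicode string value."""
    value: str

_blank_ids = count()

def new_blank_node():
    """Create a blank node with a fresh local label."""
    return BlankNode("b{}".format(next(_blank_ids)))

def is_valid_statement(subject, predicate, obj):
    """Check the term kinds allowed in each position of a triple."""
    return (
        isinstance(subject, (URIRef, BlankNode))
        and isinstance(predicate, URIRef)
        and isinstance(obj, (URIRef, BlankNode, Literal))
    )

HAS_COLOR = URIRef("http://example.org/terms#hasColor")  # hypothetical URI
anonymous = new_blank_node()

# A blank node is acceptable as a subject, a literal is not.
print(is_valid_statement(anonymous, HAS_COLOR, Literal("blue")))
print(is_valid_statement(Literal("sky"), HAS_COLOR, Literal("blue")))
```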

RDF at Runtime

Before talking about RDF at run time, a quick look at the relationship between XML and the Document Object Model (DOM) may be useful. XML is a standard way of describing hierarchical data in textual form for storage and communication, but a textual representation is not very useful at run time because it needs to be converted into a system of objects and references native to the run-time environment. The Document Object Model is a standard which defines how an XML document can be represented and interacted with at run time. It provides a standard set of methods for navigating and manipulating such data, and for transforming information between XML text and DOM objects.

RDF can be serialised as XML and can therefore be made available within a run-time environment that way, but this is not optimal: RDF is far simpler and more general than XML, so it could be implemented with a lower-level, more efficient run-time data structure. RDF allows the simple definition of a set of subjects, each containing many relationships (effectively key:value pairs), but it does not introduce any other concepts such as elements, attributes or entities.

URI's & Object References

List Space

Nodes

Nodal Reduction

Large-scale organisation

The Semantic Web article by Tim Berners-Lee, James Hendler and Ora Lassila (2001) is an excellent introduction to the concepts of how it would be used and how it works. It's set in the not too distant future at a time when the entire internet and its applications are semantically structured. But one thing I find wrong with the picture they paint is that although the user's experience of internet usage is very different, the physical organisations they're interacting with are completely unchanged in their nature; one example is a profit-driven health insurance company.

What I'm driving at is that if the user's experience is so much richer, more efficient and more productive, then so too would be the organisations. Think about some of the large open-source projects we have today. In an environment such as the one discussed in The Semantic Web, large, decentralised open-source projects would form and grow automatically as people supported the ideas with their resources.

Even now many such projects are branching out of the purely informational realm and into real-world aspects such as reselling related products or services, or hiring developers and administrators. The corporations will find it harder and harder to compete because they can't be trusted, yet the new decentralised organisations have nothing hidden. Once this way of organising becomes commonplace it will spread into all areas of organisation and will eventually become the preferred method of handling organisation at the largest scale, such as health, transport, finance and industry.

Web4.0

I've been using the term Web4.0 to mean not only a semantic applicational internet (i.e. web3.0) but one combined with p2p technology such as DHTs and Triplespaces, i.e. a serverless semantic internet. All the components of it are available already but just need to be glued together (which will be an automatic result of web3 anyway).

Ontologies

Even if we use many diverse open source applications rather than developing all the functionality ourselves, we still need to define all operations in our wiki-based ontologies:

Organisation: entities, roles, processes, workflow...
Accounting: transactions, accounts, reports...
People: HR, CRM, comms, groups...
Content: Research, packages, channels, backups...
Software: Development, testing, feedback, versioning, deployment...

Foundation Ontology

The N3 notation of the semantic web has one very important aspect in common with the foundation ontology of the nodal model: both describe a data structure similar to standard hash tables, associative arrays or dictionaries, except that the keys are global object references (URIs) rather than just names. This allows the relationships themselves to exhibit universal meaning. Spaces whose nodes are all described by N3 triples are called triplespaces. A triplespace's nodes can be used either as objects or as the relationships connecting other objects.

The optimal runtime data structure of a triplespace cannot be described by the usual hash-table implementations, because they expect the key parameter to be a simple datatype and cannot handle an object reference as a key. Some languages, such as Perl and LISP, have implementations of associative arrays that do work with arbitrary types.
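Python's built-in dict is one example of an associative array that accepts object references as keys, since any hashable object can be a key and plain objects hash by identity. The sketch below is only an illustration of that idea, not a claim about the optimal structure. The Node class, its links attribute and the example.org URIs are all hypothetical.

```python
class Node:
    """A node with a global name; hashable by identity (the Python default)."""

    def __init__(self, uri):
        self.uri = uri
        self.links = {}  # maps a predicate Node to an object (Node or string)

    def __repr__(self):
        return "Node({!r})".format(self.uri)

# Hypothetical nodes used only for illustration.
sky = Node("http://example.org/things#sky")
has_color = Node("http://example.org/terms#hasColor")
label = Node("http://example.org/terms#label")

# The key is an object reference, not a name.
sky.links[has_color] = "blue"

# The same node that acts as a relationship can carry associations of its own.
has_color.links[label] = "has the color"

for predicate, value in sky.links.items():
    print(sky.uri, predicate.links[label], value)
```

Because the key is the node itself, a relationship can be followed to its own description without a separate name lookup.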

In information science, an upper ontology (top-level ontology, or foundation ontology) is an attempt to create an ontology which describes very general concepts that are the same across all domains of knowledge and application. The aim is to have a large number of ontologies accessible under this upper ontology. It is usually a hierarchy of entities and associated rules (both theorems and regulations) that attempts to describe those general entities that do not belong to a specific problem domain.

Upper ontologies are commercially valuable, creating competition to define them. Peter Murray-Rust has claimed that this leads to "semantic and ontological warfare due to competing standards", and accordingly any standard foundation ontology is likely to be contested among commercial or political parties, each with their own idea of 'what exists' (in the philosophical sense).

No single upper ontology has yet gained widespread acceptance as a de facto standard. Different organisations are attempting to define standards for specific domains. The Process Specification Language (PSL) created by the National Institute of Standards and Technology (NIST) is one example.

GIST - The minimalist upper ontology
BFO - Basic Formal Ontology (manual)
SUMO - Suggested Upper Merged Ontology
PSL - Process Specification Language
OpenCYC
DOLCE - Descriptive Ontology for Linguistic and Cognitive Engineering
GFO - General Formal Ontology

Writing

http://semantic-conference.blogs.com - a good down to earth blog about the semantic web
http://www.sciam.com/print_version.cfm?articleID=00048144-10D2-1C70-84A9809EC588EF21 - By Tim Berners-Lee, James Hendler and Ora Lassila
http://www.oreillynet.com/pub/a/oreilly/tim/news/2005/09/30/what-is-web-20.html - Tim O'Reilly's article
http://tomgruber.org/writing/index.htm - Thomas Gruber

The Nodal Model

The idea behind the nodal model was to develop this functionality from simple common rules that can be easily described in terms of the foundation ontology. The current implementation of peerd.c runs a trivial test application in a functioning node space.

These rules are an ontology similar to the prototype-based object-oriented ontology by which objects can be merged together into composite objects. Constructor methods allow the objects to affect the container into which they are instantiated, according to their class.

It is extended to handle a universal way of describing process scheduling and workflow, using an execution model called nodal reduction. This allows a node to inherently have its content (its key/value pairs or attributes and methods) change over a schedule in accord with its class, thereby adding the time domain to the generic associative-array model.

Reflecting on this method and the current direction of the semantic web, I find that it's still very relevant and is a nice compact way of handling triplespace structure at runtime. The only change at the fundamental level would be to change serialise.c to work with N3 instead of an arbitrary wiki-based syntax, and to use a URI-compatible global-identity system.

Some applications forming the semantic web

Knoodl - Semantic wiki for communities and ontologies & organisation, very similar to mediawiki, but closed-source
Flickr
OpenBC
Various social networking sites
http://www.myspace.com/
http://www.friendster.com/
http://www.google.com/help/faq_clicktocall.html
http://www-128.ibm.com/developerworks/lotus/library/ae/
List of Google Apps

OD System

The Organic Design System uses a generic record type which is the foundation for the description of all changes of state in any dimension of the organisation. This has not yet been formalised into semantic annotations and templates in the wiki, but it is conceptually very simple and is being used in spreadsheet form for various organisational aspects like bookkeeping, simple reporting, keeping track of stock and time management.

Records

In the wiki, the generic record type is a template containing named parameters. Queries can then be made via special pages that result in selections of record articles, and these can then be rendered as sortable and filterable tables. Each record is a row in the table, and the columns match the parameter names in the generic record template.
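The query-then-table idea could be sketched like this, with each record as a set of named parameters. The sample records, parameter names and the functions select and render_table are invented for illustration and do not describe the actual wiki templates or special pages.

```python
# Made-up sample records; each dict plays the role of one record article.
records = [
    {"date": "2008-03-01", "activity": "sale", "from": "stock", "to": "customer", "amount": 3},
    {"date": "2008-01-15", "activity": "purchase", "from": "supplier", "to": "stock", "amount": 10},
    {"date": "2008-02-10", "activity": "sale", "from": "stock", "to": "customer", "amount": 2},
]

def select(records, criteria):
    """Return the records whose parameters equal every value in criteria."""
    return [r for r in records if all(r.get(k) == v for k, v in criteria.items())]

def render_table(rows, columns, sort_by):
    """Render rows as text: one header line, then one line per record."""
    ordered = sorted(rows, key=lambda r: r[sort_by])
    lines = [" | ".join(columns)]
    for r in ordered:
        lines.append(" | ".join(str(r.get(c, "")) for c in columns))
    return "\n".join(lines)

sales = select(records, {"activity": "sale"})
print(render_table(sales, ["date", "activity", "amount"], sort_by="date"))
```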

The record template parameters fit into groups, which in the wiki should probably be sub-templates with their own forms and sub-types. These groups are time (incl. date, duration, cycle, multiple etc), activity (type of job, event or change) and transaction (from, to, amount, type, bals incl. $, hrs, qty, etc). Any other information such as textual description, images, specific annotations, categories etc is accessible from the main record form and is directed to the main content of the article.

All hierarchy and grouping (eg. many transactions making up a single job, or one transaction involving many accounts, etc) are done with normal categorisation, and it's up to SimpleDatabase to render this structure as a tree.

TripleSpace (Semantic TupleSpace)

The web and the tuplespace have many things in common. They are both global information spaces for persistent publication. Therefore, they share many of their underlying principles. They differ in their application context. The web is a world wide information space for the human reader and the tuplespace is a local space for parallel computation. We propose to extend the tuplespace into a triple-space, where <subject, predicate, object> describe content and semantics of information. The object can become a subject in a new triple and so defining a graph structure capturing structural information.

- Dieter Fensel, Tim Berners-Lee & Eva Kühn, May 2004
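A very small in-memory sketch of the proposal might look like the following. It borrows the Linda tuplespace operation names out and rd, and uses take in place of Linda's in, which is a reserved word in Python. The TripleSpace class itself is an assumption made for this example, and unlike a real tuplespace it is local, non-blocking, and returns all matches rather than one.

```python
import threading

class TripleSpace:
    """A toy shared space of (subject, predicate, object) triples."""

    def __init__(self):
        self._triples = set()
        self._lock = threading.Lock()

    @staticmethod
    def _matches(triple, pattern):
        # None in the pattern acts as a wildcard for that position.
        return all(want is None or want == got for got, want in zip(triple, pattern))

    def out(self, s, p, o):
        """Publish a triple into the space."""
        with self._lock:
            self._triples.add((s, p, o))

    def rd(self, s=None, p=None, o=None):
        """Read all matching triples without removing them."""
        with self._lock:
            return sorted(t for t in self._triples if self._matches(t, (s, p, o)))

    def take(self, s=None, p=None, o=None):
        """Remove and return all matching triples."""
        with self._lock:
            found = {t for t in self._triples if self._matches(t, (s, p, o))}
            self._triples -= found
        return sorted(found)

space = TripleSpace()
space.out("sky", "hasColor", "blue")
space.out("blue", "isA", "color")  # the object of one triple is the subject of another

print(space.rd(s="sky"))
print(space.rd(s="blue"))
```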

RDF-triples are almost identical to the associative array, or key:value pair-based structure we're used to; triples are just a slightly more atomic way of describing the information. A statement of <subject, predicate, object> can be thought of as saying the subject contains the predicate:object association. See also the Notation3 (N3) format for triples.
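The correspondence between triples and associative arrays can be shown with a small conversion in each direction. The function names and sample data are invented for this sketch. Each predicate maps to a list of objects, since a subject may carry the same predicate more than once.

```python
from collections import defaultdict

def triples_to_dict(triples):
    """Group triples so each subject contains its predicate:object associations."""
    nodes = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        nodes[s][p].append(o)
    return {s: dict(po) for s, po in nodes.items()}

def dict_to_triples(nodes):
    """Flatten the nested structure back into individual triples."""
    return [(s, p, o) for s, po in nodes.items() for p, objs in po.items() for o in objs]

# Made-up sample statements.
triples = [
    ("sky", "hasColor", "blue"),
    ("sky", "isA", "thing"),
    ("grass", "hasColor", "green"),
]

nodes = triples_to_dict(triples)
print(nodes["sky"])
print(sorted(dict_to_triples(nodes)) == sorted(triples))
```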

Hydranode

Hydranode is more complete than the nodal reduction algorithm and already supports multiplexed and load-balanced communications over a diverse range of protocols, even though it may no longer be under development. In fact, any content distribution system that is protocol-independent can easily be turned into a work reduction system similar to the nodal model's approach, because any processes that can be broken up into arbitrarily small work units are completed in the same way as sending or receiving any packet.

Tripoli

Tripoli is a GPL project written in Python that implements a TripleSpace over the Twisted infrastructure, which is an event-driven network programming framework under the MIT License.

Perl Object Environment

POE was originally developed as the core of a persistent object server and runtime environment. It has evolved into a general purpose multitasking and networking framework, encompassing and providing a consistent interface to other event loops. POE is a Perl equivalent to Tripoli.
