Difference between revisions of "Semantic web"
Revision as of 06:45, 18 September 2011
RDF Triples
RDF is a family of World Wide Web Consortium (W3C) specifications originally designed as a metadata model but which has come to be used as a general method of modeling information, through a variety of syntax formats.
The RDF metadata model is based upon the idea of making statements about resources in the form of subject-predicate-object expressions, called triples in RDF terminology. The subject denotes the resource, and the predicate denotes traits or aspects of the resource and expresses a relationship between the subject and the object. For example, one way to represent the notion "The sky has the color blue" in RDF is as a triple of specially formatted strings: a subject denoting "the sky", a predicate denoting "has the color", and an object denoting "blue".
This mechanism for describing resources is a major component in what is proposed by the W3C's Semantic Web activity: an evolutionary stage of the Web in which automated software can store, exchange, and use machine-readable information distributed throughout the web, in turn enabling users to deal with the information with greater efficiency and certainty. RDF's simple data model and ability to model disparate, abstract concepts have also led to its increasing use in knowledge management applications unrelated to Semantic Web activity.
The subject of an RDF statement is a resource, possibly as named by a Uniform Resource Identifier (URI). Some resources are unnamed and are called blank nodes or anonymous resources. They are not directly identifiable. The predicate is a resource as well, representing a relationship. The object is a resource or a Unicode string literal.
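The sky example and the three kinds of node described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, the `http://example.org/` namespace and the `new_blank_node` helper are hypothetical and are not taken from any RDF library.

```python
from dataclasses import dataclass
from itertools import count
from typing import Union


@dataclass(frozen=True)
class URIRef:
    """A resource named by a URI."""
    uri: str


@dataclass(frozen=True)
class BlankNode:
    """An unnamed resource; the label only has meaning locally."""
    label: str


@dataclass(frozen=True)
class Literal:
    """A Unicode string value."""
    value: str


@dataclass(frozen=True)
class Triple:
    subject: Union[URIRef, BlankNode]
    predicate: URIRef
    obj: Union[URIRef, BlankNode, Literal]

    def __post_init__(self) -> None:
        # Only the object position may hold a literal.
        if isinstance(self.subject, Literal) or isinstance(self.predicate, Literal):
            raise TypeError("a literal may only appear as the object of a triple")


_blank_ids = count(1)


def new_blank_node() -> BlankNode:
    """Hypothetical helper: mint a fresh, locally unique blank node."""
    return BlankNode(f"b{next(_blank_ids)}")


EX = "http://example.org/"  # assumed namespace, for illustration only
sky = URIRef(EX + "sky")
has_color = URIRef(EX + "hasColor")

# "The sky has the color blue"
statement = Triple(sky, has_color, Literal("blue"))
```

Making the classes frozen means each node is hashable and compares by value, so two references to the same URI are treated as the same resource.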
RDF at Runtime
Before talking about RDF at run time, a quick look at the relationship between XML and the Document Object Model (DOM) may be useful. XML is a standard way of describing hierarchical data in textual form for storage and communication, but a textual representation is not very useful at run time because it needs to be converted into a system of objects and references native to the run-time environment. The Document Object Model is a standard which defines how an XML document can be represented and interacted with at run time. It provides a standard set of methods for navigating and manipulating such data, and for converting between XML text and DOM objects.
RDF can be described using XML, and so can be made available within a run-time environment that way, but this is not optimal: RDF is far simpler and more general than XML, so it could be implemented with a lower-level, more efficient run-time data structure. RDF allows the simple definition of a set of subjects, each containing many relationships (effectively key:value pairs), but it does not introduce any other concepts such as elements, attributes or entities.
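One possible shape for such a run-time structure is a nested associative array: subject, then predicate, then the set of objects. The sketch below assumes plain strings as identifiers and uses a hypothetical `TripleIndex` class; it is not how any particular RDF implementation stores its data.

```python
from collections import defaultdict


class TripleIndex:
    """In-memory index: subject -> predicate -> set of objects."""

    def __init__(self):
        self._by_subject = defaultdict(lambda: defaultdict(set))

    def add(self, subject, predicate, obj):
        """Record one statement; repeated statements are stored once."""
        self._by_subject[subject][predicate].add(obj)

    def objects(self, subject, predicate):
        """Return a copy of the objects for a subject/predicate pair (empty if none)."""
        return set(self._by_subject.get(subject, {}).get(predicate, ()))

    def triples(self):
        """Flatten the index back into (subject, predicate, object) tuples."""
        for subject, predicates in self._by_subject.items():
            for predicate, objs in predicates.items():
                for obj in objs:
                    yield (subject, predicate, obj)


index = TripleIndex()
index.add("ex:sky", "ex:hasColor", "blue")
index.add("ex:sky", "ex:hasColor", "grey")
index.add("ex:sky", "ex:isAbove", "ex:ground")
```

Nothing here corresponds to an XML element, attribute or entity: the whole structure is subjects holding key:value pairs, where a key may have several values.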
URIs & Object References
A Uniform Resource Identifier (URI) is a string of characters used to identify a name or a resource on the Internet. Such identification enables interaction with representations of the resource over the network using specific protocols. Schemes specifying a concrete syntax and associated protocols define each URI.
One can classify URIs as locators (URLs), or as names (URNs), or as both. A Uniform Resource Name (URN) functions like a person's name, while a Uniform Resource Locator (URL) resembles that person's street address. In other words: the URN defines an item's identity, while the URL provides a method for finding it.
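As a rough illustration of the locator/name distinction, the scheme of a URI can be inspected with Python's standard `urllib.parse` module. The `classify_uri` function and its short list of locator schemes are assumptions made for this sketch; real classification is less clear-cut, since the same URI can serve as both a name and a locator.

```python
from urllib.parse import urlparse

# Assumed, deliberately short list of schemes treated as locators here.
LOCATOR_SCHEMES = {"http", "https", "ftp", "file"}


def classify_uri(uri: str) -> str:
    """Classify a URI by its scheme as 'URN', 'URL', 'other URI' or 'no scheme'."""
    scheme = urlparse(uri).scheme.lower()
    if not scheme:
        return "no scheme"
    if scheme == "urn":
        return "URN"
    if scheme in LOCATOR_SCHEMES:
        return "URL"
    return "other URI"


examples = {
    "urn:isbn:0451450523": classify_uri("urn:isbn:0451450523"),
    "http://dbpedia.org/About": classify_uri("http://dbpedia.org/About"),
    "mailto:someone@example.org": classify_uri("mailto:someone@example.org"),
}
```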
Large-scale organisation
The Semantic Web article by Tim Berners-Lee and a couple of others in 2001 is an excellent introduction to how the semantic web would be used and how it works. It's set in the not too distant future, at a time when the entire internet and its applications are semantically structured. But one thing I find wrong with the picture they paint is that although the user's experience of the internet is very different, the physical organisations they're interacting with are completely unchanged in their nature; one example is a profit-driven health insurance company.
What I'm driving at is, if the user's experience is so much richer, more efficient and more productive, then so too would be the organisations. Think about some of the large open-source projects we have today. In an environment such as the one discussed in The Semantic Web, large, decentralised open-source projects would form and grow automatically as people supported the ideas with their resources.
Even now many such projects are branching out of the purely informational realm and into real world aspects such as reselling related products or services, or hiring developers and administrators. The corporations will find it harder and harder to compete because they can't be trusted, yet the new decentralised organisations have nothing hidden. Once this way of organisation becomes commonplace it will spread into all areas of organisation and will eventually become the preferred method of handling organisation at the largest scale, such as health, transport, finance and industry.
- Open-source business
- What's Wrong With Politics and Can Technology Do Anything To Fix It?
- What business can learn from Open Source
- Open Source Democracy
Foundation Ontology
The N3 notation of the semantic web has one very important aspect in common with the foundation ontology of the nodal model: they both describe a data structure similar to standard hash tables, associative arrays or dictionaries, except that the keys are global object references (URIs) rather than just names. This allows the relationships themselves to exhibit universal meaning. Spaces whose nodes are all described by N3 triples are called triplespaces. A triplespace's nodes can be used either as objects or as the relationships connecting other objects.
The optimal runtime data structure of a triplespace cannot be described by the usual hash-table implementations because they expect the key parameter to be a simple datatype, and cannot handle an object reference as a key. Some languages, such as Perl and LISP, have implementations of associative arrays that do work with arbitrary types.
In information science, an upper ontology (top-level ontology, or foundation ontology) is an attempt to create an ontology which describes very general concepts that are the same across all domains of knowledge and application. The aim is to have a large number of ontologies accessible under this upper ontology. It is usually a hierarchy of entities and associated rules (both theorems and regulations) that attempts to describe those general entities that do not belong to a specific problem domain.
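As one illustration of keys that are object references rather than names, Python's built-in `dict` accepts any hashable object as a key, and instances of user-defined classes hash by identity by default. The sketch below is a toy: the `Node` class and its labels are hypothetical and are not part of N3 or the nodal model.

```python
class Node:
    """Hypothetical node type; equality and hashing fall back to object identity."""

    def __init__(self, label: str) -> None:
        self.label = label  # for human readers only, never used for lookup

    def __repr__(self) -> str:
        return f"Node({self.label!r})"


sky = Node("sky")
blue = Node("blue")
has_color = Node("hasColor")
is_a = Node("isA")
relationship = Node("relationship")

# Each node maps to its own associative array, whose keys are themselves
# nodes (object references), not strings.
space = {
    sky: {has_color: blue},
    # has_color acts as a relationship above and as an ordinary object here.
    has_color: {is_a: relationship},
}

# A second node with the same label is a different key.
look_alike = Node("hasColor")
```

Here `space[sky][has_color]` returns the `blue` node, while `look_alike` finds nothing, because lookup depends on which object the key is and not on what it is called.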
Upper ontologies are commercially valuable, creating competition to define them. Peter Murray-Rust has claimed that this leads to "semantic and ontological warfare due to competing standards", and accordingly any standard foundation ontology is likely to be contested among commercial or political parties, each with their own idea of 'what exists' (in the philosophical sense).
No single upper ontology has yet gained widespread acceptance as a de facto standard. Different organisations are attempting to define standards for specific domains. The Process Specification Language (PSL) created by the National Institute of Standards and Technology (NIST) is one example.
- GIST - The minimalist upper ontology
- BFO - Basic Formal Ontology (manual)
- SUMO - Suggested Upper Merged Ontology
- PSL - Process Specification Language
- OpenCYC
- DOLCE - Descriptive Ontology for Linguistic and Cognitive Engineering
- GFO - General Formal Ontology
TripleSpace (Semantic TupleSpace)
The web and the tuplespace have many things in common. They are both global information spaces for persistent publication. Therefore, they share many of their underlying principles. They differ in their application context. The web is a world wide information space for the human reader and the tuplespace is a local space for parallel computation. We propose to extend the tuplespace into a triple-space, where <subject, predicate, object> describe content and semantics of information. The object can become a subject in a new triple and so defining a graph structure capturing structural information.
- - Dieter Fensel, Tim Berners-Lee & Eva Kühn, May 2004
RDF-triples are almost identical to the associative array, or key:value pair-based structure we're used to; triples are just a slightly more atomic way of describing the information. A statement of <subject, predicate, object> can be thought of as saying the subject contains the predicate:object association. See also the Notation3 (N3) format for triples.
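The triple-space idea quoted above can be sketched as a tuplespace-style store whose entries are always triples and whose reads take a template. This is a minimal single-process toy under stated assumptions: the `TripleSpace` class, its `write` and `read` methods and the sample names are hypothetical, and the blocking and destructive reads found in real tuplespaces are left out.

```python
from typing import Iterator, Optional, Set, Tuple

Triple = Tuple[str, str, str]


class TripleSpace:
    """Toy store of <subject, predicate, object> triples with template reads."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def write(self, subject: str, predicate: str, obj: str) -> None:
        """Publish one triple into the space."""
        self._triples.add((subject, predicate, obj))

    def read(
        self,
        subject: Optional[str] = None,
        predicate: Optional[str] = None,
        obj: Optional[str] = None,
    ) -> Iterator[Triple]:
        """Yield every triple matching the template; None acts as a wildcard."""
        template = (subject, predicate, obj)
        for triple in self._triples:
            if all(want is None or want == got for want, got in zip(template, triple)):
                yield triple


space = TripleSpace()
space.write("alice", "knows", "bob")
space.write("bob", "knows", "carol")
space.write("bob", "worksAt", "acme")

# "bob" is the object of one triple and the subject of others, which is
# what turns a flat set of triples into a graph that can be walked.
friends_of_friends = {
    second
    for _, _, first in space.read("alice", "knows")
    for _, _, second in space.read(first, "knows")
}
```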
Hydranode
Hydranode is more complete than the nodal reduction algorithm and already supports multiplexed and load-balanced communications over a diverse range of protocols, even though it may no longer be under development. In fact, any content distribution system that is protocol-independent can easily be turned into a work reduction system similar to the nodal model's approach, because any process that can be broken up into arbitrarily small work units is completed in the same way as sending or receiving any packet.
Tripoli
Tripoli is a GPL project written in Python that implements a TripleSpace over the Twisted infrastructure, which is an event-driven network programming framework under the MIT License.
Perl Object Environment
POE was originally developed as the core of a persistent object server and runtime environment. It has evolved into a general purpose multitasking and networking framework, encompassing and providing a consistent interface to other event loops. POE is a Perl equivalent to Tripoli.
Web4.0
I've been using the term Web4.0 to mean not only a semantic applicational internet (i.e. web3.0) but one combined with p2p technology such as DHTs and triplespaces, i.e. a serverless semantic internet. All the components of it are available already and just need to be glued together (which will be an automatic result of web3 anyway).
Writing
- The Semantic Web - By Tim Berners-Lee, James Hendler and Ora Lassila
- http://semantic-conference.blogs.com - a good down to earth blog about the semantic web
- http://www.oreillynet.com/pub/a/oreilly/tim/news/2005/09/30/what-is-web-20.html - Tim O'Reilly's article
- http://tomgruber.org/writing/index.htm - Thomas Gruber
See also
- Wiki workflow
- MediaWiki Semantic Bundle
- Long live the web - a call for continued open standards and neutrality by Tim Berners-Lee
- DBpedia - structured data built from Wikipedia and held in a triplespace
- Microformat - A microformat is a web-based approach to semantic markup that seeks to re-use existing XHTML and HTML tags to convey metadata and other attributes
- Intro to semantic web for noobs
- Operator - the Firefox Microformats addon
- Cloud storage in a post-SQL world
- P2P:Peer Governance
- EarthOS
- Self-aware e-society
- Introduction to Ontology - Barry Smith
- Barry Smith is behind the Basic Formal Ontology and is heavily influenced by Husserl (particularly Husserl's Logical Investigations)
- Open-source business
- WFS OpenOrganizations
- The Semantic Web - the defining document by Tim Berners-Lee and others in 2001
- RDF
- Grid
- Wikipedia 3.0
- How to make a Semantic Browser - paper by D. A. Quan and R. Karger
- oEmbed - protocol for remote content embedding