PeerPedia
PeerPedia is the idea of a MediaWiki extension which allows the wiki to use a P2P network as its file and database storage mechanism. It would be a downloadable package enabling users to install and run MediaWiki as a daemon or service in a standalone way, requiring neither a web server nor a database server.
We would not want this to be a MediaWiki fork, but rather an extension which would work with the most current version of MediaWiki. Since the file aspect of PeerPedia is built on unified distributed space, the MediaWiki source can be kept automatically and transparently up to date for all users.
The plan is to develop the PeerPedia solution using a standard MediaWiki code base running on every peer. A peer-based MediaWiki doesn't need to be highly scalable, so it requires neither a powerful database server like MySQL or PostgreSQL nor a standalone web server like Apache. The MediaWikiLite extension is being developed to replace the database server with a PHP library and to allow the MediaWiki PHP code to run as a daemon which communicates directly with its clients over an HTTP socket instead of running as a module or CGI of a web server.
The Extension:P2P.php extension is used in conjunction with MediaWikiLite to provide a P2P layer which directly connects clients who are editing the same articles so that they can synchronise data and reduce edit conflicts. A DHT maintains metadata about all the wiki::namespace:title names in use, allowing all peers to act as redundant caches for the data they share. The maintained content should include an index of all articles.
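As a rough sketch of how the DHT metadata might be keyed, every peer could hash the canonical wiki::namespace:title form the same way, so any of them can locate an article's record without a central server. The key derivation, record fields and peer names below are illustrative assumptions, not part of any existing extension:

```python
import hashlib

def dht_key(wiki: str, namespace: str, title: str) -> str:
    """Derive a DHT key from a fully-qualified article name.

    Spaces are canonicalised to underscores (MediaWiki-style) before
    hashing, so equivalent titles map to the same key."""
    canonical = f"{wiki}::{namespace}:{title.replace(' ', '_')}"
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()

# The record published under that key might carry enough metadata
# for peers to find cached copies and to contact current editors.
record = {
    "latest_rev": 1234,               # newest revision id seen
    "holders": ["peer-a", "peer-b"],  # peers caching the article text
    "editors": ["peer-c"],            # peers with the article open for editing
}
```

Because the key is a plain hash of the canonical name, the same scheme works for any wiki's namespace without coordination between peers.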
It should be possible for PeerPedia to also function as a web overlay, allowing articles to be associated with URLs and to integrate very tightly with the source site when that site is itself a MediaWiki. In this way PeerPedia could become a decentralisation solution for Wikipedia.
Distributed storage
To allow migration of existing web-based wiki content into distributed space, the article space should include interwiki prefixes and provide a page listing all wikis.
Due to the potential size of some wikis (such as Wikipedia), most clients will prefer to store only the articles they have themselves visited or watched, so access to the text content needs to be adjusted to allow for distributed storage and collection.
Since PeerPedia uses MediaWikiLite, each wiki's data is stored in a single file, which makes low-level access and manipulation more practical to develop for than MySQL. The details of SQLite's interaction with the storage media are explained here, and the file could be accessed directly using PHP's File Wrappers. Hopefully this level of integration will not be necessary, though, since only the text content and uploaded files need to be distributed, and that should be achievable by hooking into the MediaWiki code at the SQL level and above.
Peercasting
Peercasting uses a serverless P2P infrastructure to implement channels of content which can broadcast to many clients. This can't be done with normal P2P file-sharing infrastructures because they are based on breaking whole files into smaller parts which are all treated with the same priority; peercasting uses multiplexing to create the effect of continuous streams and can also multicast.
Using an existing decentralised multicast-channel solution benefits the nodal development effort greatly, because ideally every node would have a live channel to all the places that make use of it. The one-to-many aspect is a conceptual level above a normal DHT, because it's essentially a DHT with built-in change propagation amongst all the instances.
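A minimal sketch of what "a DHT with built-in change propagation" could mean: each key keeps a subscriber list alongside its value, and every write pushes the new value to the subscribers. The class and its interface are invented for illustration; a real peercasting layer would push notifications over the network rather than invoke local callbacks:

```python
class ChannelDHT:
    """DHT sketch where each key doubles as a one-to-many channel."""

    def __init__(self):
        self.store = {}        # key -> current value
        self.subscribers = {}  # key -> list of callbacks to notify

    def subscribe(self, key, callback):
        """Register interest in changes to a key."""
        self.subscribers.setdefault(key, []).append(callback)

    def put(self, key, value):
        """Store a value and propagate the change to all subscribers."""
        self.store[key] = value
        for cb in self.subscribers.get(key, []):  # one-to-many push
            cb(key, value)

    def get(self, key):
        return self.store.get(key)
```

With article titles as keys, every peer watching a page would receive each new revision as it is published, instead of polling the DHT.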
Old Notes
Current P2P technology such as DHTs is very mature now and can easily integrate with a Linux or Windows filesystem interface, but there is currently no support for SQL-like database queries directly on the network. There is much development in the area of range queries on distributed spaces, such as P-Grid and PIER, which are the foundation on which an SQL-like language could be constructed.
A cut down version of SQL which is sufficient to cover MediaWiki's operational needs may be practical to develop. To do this we would first need to log and analyse all the patterns of SQL queries which the MediaWiki software performs.
If a more complete SQL interface to distributed spaces becomes available, the PeerPedia idea could be made even more generic by supplying a basic LAMP environment in which any standard web application built in PHP, Perl, Python, Ruby etc. could run without modification. The applications would run as if connecting to a specific server for their file I/O and database queries, but there would actually be no specific servers at all: the network itself would be responsible for resolving queries and requests, and would work persistently and reliably regardless of the spontaneous arrival and departure of individual peers.
MediaWiki MySQL Query Analysis
I set our MySQL logs to capture all queries, and during that time navigated through some special pages and pages containing DPL queries, and also edited and watched some pages to create a wide variety of SQL queries. After about five minutes I switched the logging off and began analysing the resulting log, which was about 5MB.
The first stage of the analysis is to reduce the log down to the basic query structures used, by removing all specific table names and values from the queries and then removing duplicates. Here's the list of replacement rules I'll be using to remove these specifics and meta-information:
- Each command starts with a number ending at character 23
- The command type starts at character 25 (Connect|Init DB|Query|Quit|Shutdown)
- The command content starts at character 37 and may span many lines, with no trailing semicolon
- Remove comments /*...*/
- Trim and remove double spaces
- Change all `...` to `TABLE`
- Change all '...' to 'NUMBER' if the quoted value is numeric, otherwise to 'STRING'
- Change all numbers not in quotes to NUMBER
- Change all [a-z]\s*,\s*[a-z]... not in quotes to COLUMNS
- Change all [a-z] not in quotes to COLUMN
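The replacement rules above could be sketched as a small normaliser. This assumes, as MediaWiki's query builder does, that SQL keywords are uppercase, so any lowercase token can be treated as an identifier; the regexes are illustrative rather than a complete implementation of the rules:

```python
import re

def normalise(query: str) -> str:
    """Reduce a logged SQL query to its basic structure."""
    q = re.sub(r"/\*.*?\*/", "", query, flags=re.S)   # strip /*...*/ comments
    q = re.sub(r"`[^`]*`", "TABLE", q)                # backquoted names -> TABLE
    q = re.sub(r"'(?:[^'\\]|\\.)*'",                  # quoted values -> 'NUMBER'/'STRING'
               lambda m: "'NUMBER'" if m.group(0)[1:-1].isdigit() else "'STRING'",
               q)
    q = re.sub(r"\b\d+\b", "NUMBER", q)               # bare numbers -> NUMBER
    q = re.sub(r"\b[a-z_]+(?:\s*,\s*[a-z_]+)+\b", "COLUMNS", q)  # column lists
    q = re.sub(r"\b[a-z_]+\b", "COLUMN", q)           # remaining identifiers
    return re.sub(r"\s+", " ", q).strip()             # trim and collapse spaces
```

Deduplicating the normalised forms (e.g. `set(map(normalise, queries))`) then yields the distinct query structures the software actually uses.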
Pseudo P2P MediaWiki
A normal MediaWiki extension could be installed which would give all its users the ability to access articles in a DHT using standard interwiki syntax. The general idea is to make interwiki prefixes writable over various protocols/rules.
- This would also be able to allow FileSync
- Many destinations could be allowed
- What about revisions? Store them in the XML article-export format?
General P2P Wiki Concept
The general idea is for all users of the P2P wiki to run a local instance which allows normal browser access etc., but which is completely distributed. It would need to be able to separate content in different wikis by making the interwiki prefix mandatory.
The MediaWiki code would need very little modification, making it practical to keep up to date with new versions rather than creating a fork. This might even generalise to any PHP/MySQL application that could run in a distributed way.
Components like GNUnet give us DHT and content distribution. Edit conflicts aren't such a problem in a DHT because DHTs are designed to handle content changes in ways at least as dynamic and responsive as the current web paradigm. The real problem is with querying.
- Decentralisation (including development)
- Intellectual property & censorship concerns
- The Freenet project has a question regarding censorship of "undesirables" such as kiddie-porn distributors. They said (here) that:
- "Undesirables" are a personal view, and the biggest test of your commitment to bypass censorship is distributing materials that you consider "undesirable".
- But the Freenet project's philosophy is not in accord with the spiritual principles here, because it forces people to use their bandwidth, processing and storage resources for content they don't wish to support.
- Nad 10:41, 18 Feb 2006 (NZDT)
- Routing and naming
Querying
The main problem with developing a wiki in P2P is the querying: how can SQL-like queries be performed on a distributed space? One approach is to maintain the necessary indexes ourselves (probably as trees, to avoid lists larger than the allowed size of a DHT message). Alternatively, find some existing content network or component which handles distributed querying.
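The tree-index idea can be sketched as follows. The DHT is modelled as a plain dict, and the fanout constant stands in for whatever the real per-message size limit works out to; all names here are invented for illustration:

```python
def store_index(dht, key, titles, fanout=64):
    """Store a title index as a tree of DHT entries so no single
    entry holds more than `fanout` items."""
    titles = sorted(titles)
    if len(titles) <= fanout:
        dht[key] = {"leaf": True, "items": titles}
        return
    size = -(-len(titles) // fanout)  # titles per child (ceil division)
    children = []                     # (first title in child, child key)
    for i in range(0, len(titles), size):
        chunk = titles[i:i + size]
        child_key = f"{key}/{chunk[0]}"  # child keyed by its first title
        store_index(dht, child_key, chunk, fanout)
        children.append((chunk[0], child_key))
    dht[key] = {"leaf": False, "children": children}

def lookup(dht, key, title):
    """Walk the tree from the root entry down to the leaf that would
    contain the title, one DHT fetch per level."""
    node = dht[key]
    if node["leaf"]:
        return title in node["items"]
    # descend into the rightmost child whose first title <= the target
    child_key = node["children"][0][1]
    for first, ck in node["children"]:
        if first <= title:
            child_key = ck
    return lookup(dht, child_key, title)
```

Each lookup touches only O(log n) DHT entries, and no stored value ever exceeds the fanout, which is the point of holding the index as a tree rather than one large list.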
Wiki engines which use flat-file storage instead of SQL make better candidates for migrating to a P2P environment, since P2P networks generally combine their peers into a persistent logical storage space. This can be mapped to the local filesystem with FUSE, or the wiki's filesystem-access functions can be overridden and redirected. Two such file-based wiki engines are MoinMoin and UseModWiki.
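For the function-override route, here is a sketch of a page store whose reads and writes go through a shared DHT (modelled as a plain dict) while keeping a local cache of visited pages, which also matches the earlier point that most clients only store the articles they have themselves visited. The class and its interface are invented, not the MoinMoin or UseModWiki API:

```python
import os

class DHTPageStore:
    """Sketch of a flat-file wiki's storage layer redirected to a DHT.

    Reads are served from the local cache when possible; writes go to
    both the local cache and the shared DHT."""

    def __init__(self, dht, cache_dir):
        self.dht = dht
        self.cache_dir = cache_dir
        os.makedirs(cache_dir, exist_ok=True)

    def _path(self, title):
        return os.path.join(self.cache_dir, title.replace("/", "%2F"))

    def read(self, title):
        path = self._path(title)
        if os.path.exists(path):       # locally cached copy of a visited page
            with open(path) as f:
                return f.read()
        text = self.dht.get(title)     # otherwise fetch from the network
        if text is not None:
            with open(path, "w") as f: # cache it for next time
                f.write(text)
        return text

    def write(self, title, text):
        self.dht[title] = text         # publish to the network
        with open(self._path(title), "w") as f:
            f.write(text)
```

Because the interface is just read/write by title, a file-based wiki engine could be pointed at it with little change to the rest of its code, which is why such engines look like the easier migration targets.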
Existing P2P Wiki Ideas
- Wiki over Freenet
- TriblerWiki - A wiki is being developed over the Tribler P2P network
- http://www.cs.bsu.edu/homepages/chl/P2PWiki
- PIER
- GitTorrent
GIT & MediaWiki
- intro to GIT SVN
- ikiwiki - a wiki with a GIT backend
- MW:Git - mediawiki.org article on Git integration
- Blog entry about MediaWiki over Git
- Another blog entry about MediaWiki over Git
See also
- Tech Fusion Outline: Organising the World's Knowledge
- Wikipedia 3.0
- P2PFoundation Wiki
- PineWiki:PeerToPeer
- General P2P Articles
- BitTorrent
- File sharing
- Yochai Benkler - good talk on P2P and free software
- MediaWikiLite and Extension talk:Daemoniser.php
- Tribler - web overlay network
- Tagging in Peer-to-Peer Wikipedia - using web overlay for Wikipedia browser
- 11 May 2007 - local news article about a P2P Wikipedia project
- Globule - Open source content distribution system
- Edit conflict
- MediaWiki offline reader
- OneSwarm - privacy preserving P2P
- DBpedia - structured data built from Wikipedia and held in a triplespace