PeerPedia
PeerPedia is the idea of a MediaWiki extension that lets a wiki use a P2P network as its file and database storage mechanism. It would ship as a downloadable package allowing users to install and run a MediaWiki daemon or service standalone, with no separate web server or database server required.
We would not want this to be a MediaWiki fork, but rather an extension that tracks the most current version of MediaWiki. Since the file aspect of PeerPedia is built on a unified distributed space, the MediaWiki source itself can be kept automatically up to date, transparently for all users.
Current P2P technologies such as DHTs are now very mature and can integrate easily with a Linux or Windows filesystem interface, but there is still no support for SQL-like database queries directly on the network. There is much development in the area of range queries over distributed spaces, such as P-Grid and PIER, which are the foundation on which an SQL-like language could be constructed.
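As a toy illustration of that gap (an in-memory dict standing in for a real DHT client, with made-up key names), an exact-match lookup hashes cleanly to a single responsible peer, while a range scan has no key to hash:

```python
import hashlib

dht = {}  # in-memory stand-in for a distributed hash table

def put(key, value):
    # hashing the key decides which peer stores it; here, a dict slot
    dht[hashlib.sha1(key.encode()).hexdigest()] = value

def get(key):
    # exact-match lookup: one key, one hash, one responsible peer
    return dht.get(hashlib.sha1(key.encode()).hexdigest())

put('page:Main_Page', 'wikitext...')
print(get('page:Main_Page'))  # works fine

# But "all pages edited since time T" has no single key to hash:
# hashing destroys the ordering a range scan relies on, so every key
# in the network would have to be enumerated -- this is the problem
# that P-Grid and PIER style range-query overlays set out to solve.
```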
A cut-down version of SQL sufficient to cover MediaWiki's operational needs may be practical to develop. To do this, we would first need to log and analyse the patterns of SQL queries that the MediaWiki software actually performs.
If a more complete SQL interface to distributed spaces becomes available, the PeerPedia idea could be made even more generic by supplying a basic LAMP environment in which any standard web application built in PHP, Perl, Python, Ruby etc. could run without modification. Each application would run as if it were connecting to a specific server for its file I/O and database queries, but there would actually be no specific servers at all: the network itself would resolve queries and requests, working persistently and reliably regardless of the spontaneous arrival and departure of individual peers.
MediaWiki MySQL Query Analysis
I set our MySQL server to log all queries, then navigated through some special pages and pages containing DPL queries, and also edited and watched some pages to generate a wide variety of SQL. After about five minutes I switched the logging off and began analysing the resulting log, which was about 5MB.
The first stage of the analysis is to reduce the log down to the basic query structures used, by removing all specific table names and values from the queries and then removing duplicates. Here's the log layout and the replacement rules I'll be using to strip out these specifics and meta-information; a sketch of a script implementing them follows the list.
- Each log entry starts with a number ending at chr 23
- The command type starts at chr 25 (Connect | Init DB | Query | Quit | Shutdown)
- The command content starts at chr 37 and may span many lines, with no trailing semicolon
- Remove /*...*/ comments
- Trim and collapse double spaces
- Change all `...` to TABLE
- Change all '...' to 'NUMBER' or 'STRING' depending on content
- Change all numbers not in quotes to NUMBER
- Change all [a-z]\s*,\s*[a-z]... lists not in quotes to COLUMNS
- Change all remaining [a-z] names not in quotes to COLUMN
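Here's a sketch in Python of how these rules could be implemented. It's a hypothetical reduction script rather than a definitive one: the query.log filename is an assumption, the column positions come straight from the layout above, and the quoted-value rule runs first so the later "not in quotes" rules can use plain regexes (once every quoted literal is already 'NUMBER' or 'STRING', the lowercase patterns can no longer match inside quotes).

```python
import re
from collections import Counter

COMMANDS = ('Connect', 'Init DB', 'Query', 'Quit', 'Shutdown')

def entries(path):
    """Yield (type, content) pairs from the general query log, using the
    1-indexed character positions above: entry number ending at chr 23,
    command type from chr 25, content from chr 37. Lines that don't match
    the header layout continue the previous entry's content."""
    kind, content = None, []
    for line in open(path):
        line = line.rstrip('\n')
        head = line[:24].rstrip()
        if head and head[-1].isdigit() and line[24:].startswith(COMMANDS):
            if kind:
                yield kind, ' '.join(content)
            kind, content = line[24:36].strip(), [line[36:]]
        elif kind:
            content.append(line)  # continuation of a multi-line query
    if kind:
        yield kind, ' '.join(content)

def _quoted(m):
    # rule: '...' becomes 'NUMBER' if all digits, otherwise 'STRING'
    return "'NUMBER'" if m.group(1).isdigit() else "'STRING'"

def reduce_query(q):
    """Apply the replacement rules to one query, in an order that makes
    the 'not in quotes' conditions automatic."""
    q = re.sub(r'/\*.*?\*/', '', q, flags=re.S)             # remove comments
    q = re.sub(r"'((?:[^'\\]|\\.)*)'", _quoted, q)          # quoted values
    q = re.sub(r'`[^`]*`', 'TABLE', q)                      # backquoted names
    q = re.sub(r'\b\d+\b', 'NUMBER', q)                     # bare numbers
    q = re.sub(r'\b[a-z_]\w*(?:\s*,\s*[a-z_]\w*)+', 'COLUMNS', q)  # name lists
    q = re.sub(r'\b[a-z_]\w*\b', 'COLUMN', q)               # remaining names
    return re.sub(r'\s+', ' ', q).strip()                   # trim, collapse spaces

if __name__ == '__main__':
    shapes = Counter(reduce_query(content) for kind, content
                     in entries('query.log') if kind == 'Query')
    for shape, count in shapes.most_common():
        print(count, shape)
```

Run over the whole log, this collapses every query to a shape such as SELECT COLUMNS FROM TABLE WHERE COLUMN = 'STRING' LIMIT NUMBER, so duplicates disappear and the distinct query structures MediaWiki actually relies on can be counted.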
Pseudo P2P MediaWiki
A normal MediaWiki extension could be installed which would give all its users the ability to access articles in a DHT using standard interwiki syntax. The general idea is to allow interwiki links to be writable over various protocols and rule sets; a rough sketch follows the list below.
- This would also allow FileSync
- Many destinations could be allowed
- What about revisions? Store them in XML article-export format?
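Here's a toy sketch of what the writable-interwiki idea could look like, assuming articles are keyed by a hash of their prefix and title, and each write appends a revision as a MediaWiki-export-style XML fragment. The dht dict, dht_key, save and load names are all hypothetical stand-ins for a real DHT client:

```python
import hashlib
import time
from xml.sax.saxutils import escape

dht = {}  # stand-in for a real DHT put/get client

def dht_key(interwiki_title):
    # "dht:Foo" -> network key; the prefix selects the protocol/rules
    prefix, title = interwiki_title.split(':', 1)
    return hashlib.sha1(f'{prefix}:{title}'.encode()).hexdigest()

def save(interwiki_title, text):
    """Append a new revision in export-style XML under the title's key."""
    rev = ('<revision><timestamp>%s</timestamp><text>%s</text></revision>'
           % (time.strftime('%Y-%m-%dT%H:%M:%SZ', time.gmtime()), escape(text)))
    dht.setdefault(dht_key(interwiki_title), []).append(rev)

def load(interwiki_title):
    """Return the latest revision, or None if the title is unknown."""
    revs = dht.get(dht_key(interwiki_title))
    return revs[-1] if revs else None

save('dht:PeerPedia', 'A P2P storage layer for MediaWiki.')
print(load('dht:PeerPedia'))
```

Whether each revision is a separate DHT value or appended under one key as here is exactly the open question in the last bullet above.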
See also
- Tribler - web overlay network
- Tagging in Peer-to-Peer Wikipedia - using web overlay for Wikipedia browser
- 11 May 2007 - local news article about a P2P Wikipedia project
- Globule - Open source content distribution system