Difference between revisions of "PeerPedia"
Revision as of 05:10, 6 January 2008
PeerPedia is the notion of making a MediaWiki extension which allows the wiki to use a P2P network for its file and database storage mechanism. This would be a downloadable package enabling users to install and run a MediaWiki daemon or service which could run standalone, without requiring a web server or database server.
We would not want this to be a MediaWiki fork, but rather an extension which would use the most current version of MediaWiki. Since the file aspect of PeerPedia is built on a unified distributed space, the MediaWiki source itself can be kept at the most recent version automatically and transparently for all users.
The idea is to develop PeerPedia using a standard MediaWiki code base running on every peer. The peer-based MediaWiki doesn't need to be highly scalable, so it doesn't require a powerful database server like MySQL or PostgreSQL, nor a standalone web server like Apache. The MediaWikiLite extension is being developed to replace the database server with a PHP library and to allow the MediaWiki PHP code to run as a daemon which communicates directly with its clients over an HTTP socket, instead of running as a module or CGI of a web server.
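To illustrate the standalone architecture described above (the database embedded as a library, the wiki code answering HTTP itself), here is a minimal sketch in Python rather than PHP. It is not MediaWikiLite: it assumes SQLite through Python's standard `sqlite3` module as the embedded database, a hypothetical one-table schema, and the request path used directly as the page title.

```python
import sqlite3
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import unquote


def open_store(path=":memory:"):
    """Open an embedded SQLite database; no separate database server is involved."""
    db = sqlite3.connect(path, check_same_thread=False)
    db.execute("CREATE TABLE IF NOT EXISTS page (title TEXT PRIMARY KEY, text TEXT NOT NULL)")
    return db


def save_page(db, title, text):
    db.execute("INSERT OR REPLACE INTO page (title, text) VALUES (?, ?)", (title, text))
    db.commit()


def load_page(db, title):
    row = db.execute("SELECT text FROM page WHERE title = ?", (title,)).fetchone()
    return row[0] if row else None


def make_handler(db):
    class WikiHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            # Hypothetical URL scheme: GET /Main_Page returns that page's text
            title = unquote(self.path.lstrip("/")) or "Main_Page"
            text = load_page(db, title)
            body = (text if text is not None else "No such page").encode("utf-8")
            self.send_response(200 if text is not None else 404)
            self.send_header("Content-Type", "text/plain; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return WikiHandler


def serve(db, port=8080):
    # One long-running process speaks HTTP itself, with no Apache in front of it
    HTTPServer(("127.0.0.1", port), make_handler(db)).serve_forever()
```

Calling `serve(open_store("wiki.db"))` (the filename and port are illustrative) would start a single process answering requests such as `http://127.0.0.1:8080/Main_Page` with neither a web server nor a database server running. A real peer would also need to render wikitext and accept edits.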
The Extension:P2P.php extension is used in conjunction with MediaWikiLite to provide a P2P layer which directly connects clients who are editing the same articles, so that they can synchronise data and reduce edit conflicts. A DHT is used to maintain metadata about all the wiki::namespace:title names in use, allowing all peers to act as redundant caches for the data they share. The maintained content should include an index of all articles.
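A minimal sketch of that DHT metadata, assuming (hypothetically) that each article is keyed by the SHA-1 of its full wiki::namespace:title name and that the stored value is the set of peers currently caching it. A Python dict stands in for the real DHT; `dht_key`, `MetadataDHT` and its methods are illustrative names, not the interface of Extension:P2P.php.

```python
import hashlib
from collections import defaultdict


def dht_key(wiki, namespace, title):
    """Hypothetical key scheme: SHA-1 of the full wiki::namespace:title name."""
    name = f"{wiki}::{namespace}:{title}"
    return hashlib.sha1(name.encode("utf-8")).hexdigest()


class MetadataDHT:
    """In-memory stand-in for a DHT mapping an article key to the peers caching it."""

    def __init__(self):
        self._table = defaultdict(set)

    def announce(self, peer_id, wiki, namespace, title):
        # A peer advertises that it holds a copy of this article
        self._table[dht_key(wiki, namespace, title)].add(peer_id)

    def withdraw(self, peer_id, wiki, namespace, title):
        # A departing peer removes itself; copies on other peers remain listed
        self._table[dht_key(wiki, namespace, title)].discard(peer_id)

    def peers_for(self, wiki, namespace, title):
        return sorted(self._table.get(dht_key(wiki, namespace, title), set()))
```

Because the wiki name is part of the key, identical titles in different wikis never collide, and every peer that announces a copy becomes one more redundant cache a lookup can fall back on.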
It should also be possible for PeerPedia to function as a web overlay, allowing articles to be associated with URLs and to integrate very tightly with the source site when that site is itself a MediaWiki. In this way PeerPedia could become a decentralisation solution for Wikipedia.
Old Notes
Current P2P technologies such as DHTs are now very mature and can easily integrate with a Linux or Windows filesystem interface. However, there is currently no support for SQL-like database queries directly on the network. There is much development in the area of range queries on distributed spaces such as P-Grid and PIER, which could serve as the foundation for an SQL-like language.
A cut-down version of SQL which is sufficient to cover MediaWiki's operational needs may be practical to develop. To do this we would first need to log and analyse all the patterns of SQL queries which the MediaWiki software performs.
If a more complete SQL interface to distributed spaces becomes available, the PeerPedia idea could be made even more generic by supplying a basic LAMP environment in which any standard web application built in PHP, Perl, Python, Ruby etc. could run without modification. Such applications would behave as if connecting to a specific server for their file IO and database queries, but there would actually be no specific servers at all. The network itself would be responsible for resolving queries and requests, and would work persistently and reliably regardless of the spontaneous arrival and departure of individual peers.
MediaWiki MySQL Query Analysis
I set our MySQL server to log all queries, and during that time I navigated through some special pages and pages containing DPL queries, and also edited and watched some pages, to generate a wide variety of SQL queries. After about five minutes I switched logging off and began analysing the resulting log, which was about 5MB.
The first stage of the analysis will be to reduce the log to the basic query structures used, by removing all specific table names and values from the queries and then removing duplicates. Here's the list of replacement rules I'll be using to strip out these specifics and the log's meta information.
- Each command starts with a number ending at chr 23
- Command type starts at chr 25 (Connect|Init DB|Query|Quit|Shutdown)
- Command content starts at chr 37 and may span many lines; there is no trailing semicolon
- Remove comments /*...*/
- Trim and remove double spaces
- Change all `...` to `TABLE`
- Change all '...' to 'NUMBER' or 'STRING' depending on their content
- Change all numbers not in quotes to NUMBER
- Change all [a-z]\s*,\s*[a-z]... not in quotes to COLUMNS
- Change all remaining [a-z] names not in quotes to COLUMN
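The rules above can be sketched in Python as follows. This assumes the fixed 1-indexed column positions listed (number ending at chr 23, type at chr 25, content at chr 37), which may not hold for other MySQL versions or log layouts, and it assumes SQL keywords are written in uppercase so that lowercase words can be treated as column names. The regular expressions are illustrative approximations, not MediaWiki's own `generalizeSQL()`.

```python
import re
from collections import Counter

COMMAND_TYPES = {"Connect", "Init DB", "Query", "Quit", "Shutdown"}
NAME = r"[a-z_][a-z0-9_]*"  # assumption: lowercase words are column names


def split_commands(lines):
    """Group raw log lines into (type, content) pairs using the fixed columns."""
    commands = []
    for raw in lines:
        raw = raw.rstrip("\n")
        ctype = raw[24:36].strip()                           # type from chr 25
        if raw[22:23].isdigit() and ctype in COMMAND_TYPES:  # number ends on chr 23
            commands.append([ctype, raw[36:]])               # content from chr 37
        elif commands:
            commands[-1][1] += "\n" + raw                    # multi-line continuation
    return [(ctype, content) for ctype, content in commands]


def _quoted(match):
    body = match.group(1)
    return "'NUMBER'" if re.fullmatch(r"-?\d+(?:\.\d+)?", body) else "'STRING'"


def normalize(sql):
    """Apply the replacement rules to one query, leaving only its structure."""
    sql = re.sub(r"/\*.*?\*/", "", sql, flags=re.S)               # remove /*...*/ comments
    sql = " ".join(sql.split())                                   # trim, collapse whitespace
    sql = re.sub(r"`[^`]*`", "`TABLE`", sql)                      # `...` -> `TABLE`
    sql = re.sub(r"'((?:[^'\\]|\\.|'')*)'", _quoted, sql)         # '...' -> 'NUMBER'/'STRING'
    sql = re.sub(r"\b\d+(?:\.\d+)?\b", "NUMBER", sql)             # unquoted numbers
    sql = re.sub(rf"\b{NAME}(?:\s*,\s*{NAME})+\b", "COLUMNS", sql)  # lists of names
    sql = re.sub(rf"\b{NAME}\b", "COLUMN", sql)                   # remaining single names
    return sql


def query_patterns(lines):
    """Count each distinct normalized Query; duplicates collapse into one entry."""
    return Counter(normalize(c) for t, c in split_commands(lines) if t == "Query")
```

For instance, `with open("query.log") as f: patterns = query_patterns(f)` (the filename is hypothetical) would give each distinct query structure with its frequency, i.e. the set of query shapes a cut-down SQL layer would need to support.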
See also
- The generalizeSQL() method in Database.php
Pseudo P2P MediaWiki
A normal MediaWiki extension could be installed which would give all of its users the ability to access articles in a DHT using standard interwiki syntax. The general idea is to make interwiki links writable over various protocols/rules.
- This would also allow FileSync
- Many destinations could be allowed
- What about revisions? Store them in the XML article export format?
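A rough sketch of writable interwiki routing, assuming each interwiki prefix maps to one or more storage destinations (plain dicts here, standing in for a DHT, a FileSync target and so on) and that every write appends a new revision to each destination. `InterwikiRouter`, `Revision` and their methods are hypothetical names; storing revisions in the XML export format instead would be an alternative not shown here.

```python
from dataclasses import dataclass


@dataclass
class Revision:
    text: str
    comment: str = ""


class InterwikiRouter:
    """Hypothetical registry mapping an interwiki prefix to one or more destinations."""

    def __init__(self):
        self._routes = {}

    def register(self, prefix, *destinations):
        # Assumption: prefixes are matched case-insensitively
        self._routes[prefix.lower()] = list(destinations)

    def _resolve(self, link):
        prefix, _, title = link.partition(":")
        destinations = self._routes.get(prefix.lower())
        if destinations is None or not title:
            raise KeyError(f"no interwiki route for {link!r}")
        return destinations, title

    def write(self, link, text, comment=""):
        destinations, title = self._resolve(link)
        for dest in destinations:
            # Every destination receives the edit and keeps the full revision history
            dest.setdefault(title, []).append(Revision(text, comment))

    def read(self, link):
        destinations, title = self._resolve(link)
        for dest in destinations:
            if title in dest:
                return dest[title][-1].text  # latest revision from the first holder
        return None
```

Writing to every destination keeps the copies in step at the cost of extra writes; a read falls back to later destinations only when earlier ones lack the page.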
General P2P Wiki Concept
The general idea is for every user of the P2P wiki to run a local instance which allows normal browser access etc., but which is completely distributed. Content belonging to different wikis would need to be kept separate by making the interwiki prefix mandatory.
The MediaWiki code should need very little modification, so that keeping up to date with new versions stays practical rather than creating a fork. Perhaps this could even be a generic PHP/MySQL solution that can run in a distributed way.
Components like GNUnet give us a DHT and content distribution. Edit conflicts aren't such a problem in a DHT, because DHTs are designed to handle content changes in ways at least as dynamic and responsive as the current web paradigm. The real problem is querying.
Querying
The main problem with developing a P2P wiki is querying: how can SQL-like queries be performed on a distributed space? One approach is to maintain the necessary indexes ourselves (probably as trees, to avoid lists larger than the maximum size of a DHT message). Alternatively, we could find an existing content network or component which handles distributed querying.
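One way to keep an index within a DHT's message size limit, as suggested above, is a tree of bounded-size nodes. The sketch below is a simplified B-tree-like structure, assuming a dict as the stand-in DHT, nodes stored under the SHA-1 of their JSON serialisation, and an entry-count limit (`MAX_ENTRIES`, arbitrarily 4) in place of a real byte-size limit. A range query fetches only the nodes whose key range overlaps the requested one.

```python
import hashlib
import json

MAX_ENTRIES = 4  # assumed per-node limit standing in for the DHT message size


def put_node(dht, node):
    """Store a node under the SHA-1 of its JSON form and return that key."""
    blob = json.dumps(node, sort_keys=True)
    key = hashlib.sha1(blob.encode("utf-8")).hexdigest()
    dht[key] = blob
    return key


def get_node(dht, key):
    return json.loads(dht[key])


def chunks(items):
    return [items[i:i + MAX_ENTRIES] for i in range(0, len(items), MAX_ENTRIES)]


def build_index(dht, titles):
    """Build the tree bottom-up and return the root key."""
    titles = sorted(set(titles))
    if not titles:
        return put_node(dht, {"leaf": True, "entries": []})
    # Leaves hold titles; each level above holds [first_title, child_key] pairs
    level = [[c[0], put_node(dht, {"leaf": True, "entries": c})] for c in chunks(titles)]
    while len(level) > 1:
        level = [[c[0][0], put_node(dht, {"leaf": False, "entries": c})] for c in chunks(level)]
    return level[0][1]


def range_query(dht, key, low, high):
    """Return every title t with low <= t < high, visiting only overlapping nodes."""
    node = get_node(dht, key)
    if node["leaf"]:
        return [t for t in node["entries"] if low <= t < high]
    entries, result = node["entries"], []
    for i, (first, child) in enumerate(entries):
        # Child i covers [first, next_first); skip it if that misses [low, high)
        next_first = entries[i + 1][0] if i + 1 < len(entries) else None
        if first >= high or (next_first is not None and next_first <= low):
            continue
        result.extend(range_query(dht, child, low, high))
    return result
```

Because nodes are content-addressed, rebuilding after an edit yields a new root key; how that root key would be published to other peers is left open here.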
Wiki engines which use flat-file storage instead of SQL would make better candidates for migrating to a P2P environment, since P2P networks generally use their peers to form a persistent logical network. This network can then be mapped to the local filesystem with FUSE, or the wiki's filesystem access functions can be overridden and redirected to it. Two such file-based wiki engines are MoinMoin and UseModWiki.
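A sketch of the second option above, redirecting a flat-file wiki's storage calls. It assumes the wiki engine goes through a small store object with `read` and `write` methods rather than calling `open()` directly, so that a DHT-backed store can be swapped in. The class and function names and the `wiki/filename` key scheme are hypothetical; this is not how MoinMoin or UseModWiki are actually structured.

```python
import os


class LocalFiles:
    """Pages stored as plain files in one directory, as in a flat-file wiki."""

    def __init__(self, root):
        self.root = root

    def read(self, name):
        try:
            with open(os.path.join(self.root, name), encoding="utf-8") as f:
                return f.read()
        except FileNotFoundError:
            return None  # missing page

    def write(self, name, data):
        with open(os.path.join(self.root, name), "w", encoding="utf-8") as f:
            f.write(data)


class DHTFiles:
    """Same interface, redirected to a DHT (a dict stands in for the network here)."""

    def __init__(self, dht, wiki):
        self.dht, self.wiki = dht, wiki

    def _key(self, name):
        # Prefix with the wiki name so different wikis never share keys
        return f"{self.wiki}/{name}"

    def read(self, name):
        return self.dht.get(self._key(name))

    def write(self, name, data):
        self.dht[self._key(name)] = data


def write_page(store, title, text):
    # The engine only ever calls the store, never open() directly,
    # so swapping LocalFiles for DHTFiles redirects all page I/O.
    store.write(title + ".txt", text)


def read_page(store, title):
    return store.read(title + ".txt")
```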
Existing P2P Wiki Ideas
- TriblerWiki - A wiki is being developed over the Tribler P2P network
- http://www.cs.bsu.edu/homepages/chl/P2PWiki
- PIER
See also
- Extension talk:SQLite.php and Extension talk:Daemoniser.php
- Tribler - web overlay network
- Tagging in Peer-to-Peer Wikipedia - using web overlay for Wikipedia browser
- 11 May 2007 - local news article about a P2P Wikipedia project
- Globule - Open source content distribution system