Difference between revisions of "PeerPedia"
Revision as of 17:55, 13 February 2013
PeerPedia is the notion of making a MediaWiki extension which allows the wiki to use a P2P network for its file and database storage mechanism. This would be a downloadable package enabling users to install and run a MediaWiki daemon or service which could run standalone, without requiring a web-server or database-server.
We would not want this to be a MediaWiki fork, but rather an extension which would use the most current version of MediaWiki. Since the file aspect of PeerPedia is built on unified distributed space, the MediaWiki source can be automatically and transparently kept at the most recent version for all users.
The aim is to develop the PeerPedia solution using a standard MediaWiki code base running on every peer. The peer-based MediaWiki doesn't need to be highly scalable, so it doesn't require a powerful database server like MySQL or PostgreSQL, nor a standalone web-server like Apache. The MediaWikiLite extension is being developed to replace the database server with a PHP library and to allow the MediaWiki PHP code to run as a daemon which communicates directly with its clients over an HTTP socket instead of running as a module or CGI of a web server.
The Extension:P2P.php extension is used in conjunction with MediaWikiLite to allow a P2P layer to directly connect clients who are editing the same articles so that they can synchronise data and reduce edit-conflicts. A DHT is used to maintain metadata about all the wiki::namespace:title keys in use, allowing all peers to act as redundant caches for the data they share. The maintained content should include an index of all articles.
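As a rough illustration of how a wiki::namespace:title key could be mapped onto the peers that cache it, here is a minimal Python sketch. It assumes a Kademlia-style XOR distance between hashed identifiers, which is only one possible DHT design and is not specified anywhere in these notes; the function names, the peer names and the replica count are all hypothetical.

```python
import hashlib

def node_id(name: str) -> int:
    """Hash any string (peer name or article key) into a 160-bit identifier."""
    return int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest(), "big")

def article_key(wiki: str, namespace: str, title: str) -> str:
    """Build the wiki::namespace:title key described above."""
    return f"{wiki}::{namespace}:{title}"

def responsible_peers(key: str, peers: list[str], replicas: int = 3) -> list[str]:
    """Pick the peers whose ids are closest to the key (XOR distance, an assumption).

    Several peers are returned so that each acts as a redundant cache.
    """
    target = node_id(key)
    return sorted(peers, key=lambda p: node_id(p) ^ target)[:replicas]

# Hypothetical peer names, for illustration only.
peers = ["peer-a", "peer-b", "peer-c", "peer-d", "peer-e"]
key = article_key("wikipedia", "Help", "Contents")
holders = responsible_peers(key, peers)
```

Every peer can compute the same list of holders from the key alone, so no central directory is needed to find who caches an article's metadata.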
It should be possible for PeerPedia to also function as a web overlay, allowing articles to be associated with URLs and to integrate very tightly with the source site when that site is a MediaWiki. This way PeerPedia can become a decentralisation solution for Wikipedia.
Update as of February 2013
MariaDB is a drop-in replacement for MySQL but comes with many features that MySQL doesn't support such as a variety of storage engines and many performance benefits. The most exciting thing about MariaDB is that they're bridging the gap between the SQL and NoSQL worlds by developing support for Apache Cassandra as a storage engine option. Wikimedia has begun a migration from MySQL to MariaDB, but their motivations are that they prefer a truly open source solution (which MySQL no longer is) rather than performance (which has increased by about 8%) or the NoSQL potential.
Updated notes as of April 2010
Rather than trying to tackle the problem of how to distribute the SQL-oriented storage with a P2P network layer, I think a better approach is to create a simple means to access the full power of the MediaWiki parser, including template processing and the effects of MediaWiki extensions such as tags and parser-functions.
The problem still remains, though, that many of these extensions (such as DPL and RA) perform SQL queries to determine their results, and many special-pages also require form posting and processing.
The general idea still remains feasible and would be based on a PHP command-line wrapper of the main MediaWiki functionality, similar to api.php except that it would run as a daemon and stay resident, since it's designed to run on each client machine and serve requests only to that client locally. It could save a snapshot of its entire $GLOBALS state at the ExtensionSetup hook and then return directly to that state after serving each request. This approach would not be possible if different requests could come from different users concurrently.
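The snapshot-and-restore idea can be sketched in a language-neutral way. The following is a minimal Python illustration, not PHP and not MediaWiki code: the `ResidentWiki` class and its `state` dictionary are hypothetical stand-ins for the resident process and its $GLOBALS array.

```python
import copy

class ResidentWiki:
    """Hypothetical stand-in for a resident, single-user wiki process."""

    def __init__(self):
        # One-time setup; in the notes this is everything up to ExtensionSetup.
        self.state = {"extensions": ["ParserFunctions"], "request": None, "output": []}
        # Deep copy so later mutations cannot leak into the snapshot.
        self._snapshot = copy.deepcopy(self.state)

    def serve(self, title: str) -> list[str]:
        """Serve one request, then return to the post-setup state."""
        try:
            self.state["request"] = title
            self.state["output"].append(f"rendered {title}")
            return list(self.state["output"])
        finally:
            # Discard whatever the request changed and restore the snapshot.
            self.state = copy.deepcopy(self._snapshot)

wiki = ResidentWiki()
first = wiki.serve("Main_Page")
second = wiki.serve("Help:Contents")
```

Because there is a single shared state that is reset after each request, two requests being served at the same time would overwrite each other's changes, which is why the approach only suits one local client.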
Distributed storage
To allow migration of existing web-based wiki content into distributed space, the article space should include interwiki prefixes and exhibit a page listing all wikis.
Due to the potential size of some wikis (such as Wikipedia), most clients will prefer to store only the articles they have themselves visited or watched, so access to the text content needs to be adjusted to allow for distributed storage and collection.
Since PeerPedia uses MediaWikiLite, each wiki's data is stored in a single file, which makes low-level access and manipulation more practical to develop than for MySQL. The details of SQLite's interaction with the storage media are documented, and the file could be accessed directly using PHP's File Wrappers. Hopefully this level of integration will not be necessary though, since it's only the text content and uploaded files which need to be distributed, and this should be achievable by hooking in to the MediaWiki code at the SQL level and above.
Peercasting
Peercasting uses a serverless P2P infrastructure to implement channels of content which can broadcast to many clients. This can't be done with normal P2P file-sharing infrastructures because they're based on breaking whole files into smaller parts which are all treated with the same priority. Peercasting instead uses multiplexing to create the effect of continuous streams, and can also multicast.
Using an existing decentralised multicast channels solution benefits the nodal development effort greatly because every node would ideally have a live channel to all the places that make use of it. Having the one-to-many aspect is a conceptual level above a normal DHT, because it's essentially a DHT with built in change propagation amongst all the instances.
Old Notes
Current P2P technologies such as DHTs are very mature now, and can easily integrate with a Linux or Windows filesystem interface. But currently there is no support for SQL-like database queries directly on the network. There is much development in the area of range queries on distributed spaces, such as P-Grid and PIER, which are the foundation on which an SQL-like language could be constructed.
A cut down version of SQL which is sufficient to cover MediaWiki's operational needs may be practical to develop. To do this we would first need to log and analyse all the patterns of SQL queries which the MediaWiki software performs.
If a more complete SQL interface to distributed spaces becomes available, the PeerPedia idea could be made even more generic by supplying a basic LAMP environment in which any standard web application built in PHP, Perl, Python, Ruby etc. could run without modification. Such applications would run as if connecting to a specific server for their file IO and database queries, but there would actually be no specific servers at all. The network itself would be responsible for resolving queries and requests, and would work persistently and reliably regardless of the spontaneous arrival and departure of individual peers in the network.
MediaWiki MySQL Query Analysis
I set our MySQL logs to capture all queries, and while logging was on I navigated through some special pages and pages containing DPL queries, and also edited and watched some pages etc. to create a wide variety of SQL queries. After about five minutes I switched the logging off to begin analysis of the resulting log, which was about 5MB.
The first stage of the analysis will be to reduce the log down to the basic query structures used, by removing all specific table names and values from the queries and then removing duplicates. Here's a list of the replacement rules I'll be using to reduce these specifics and meta information.
- Each command starts with a number ending on chr 23
- Command type starts on chr 25 (Connect|Init DB|Query|Quit|Shutdown)
- Command content starts on chr 37 and may span many lines, no trailing semicolon
- Remove comments /*...*/
- Trim and remove double spaces
- Change all `...` to `TABLE`
- Change all '...' to 'NUMBER' or 'STRING'
- Change all numbers not in quotes to NUMBER
- Change all [a-z]\s*,\s*[a-z]... not in quotes to COLUMNS
- Change all [a-z] not in quotes to COLUMN
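The value-related rules above could be applied along the following lines. This is only a Python sketch under stated assumptions: it works on query text that has already been extracted from the log (the fixed character offsets are not handled), it deliberately leaves out the COLUMNS and COLUMN rules because a plain [a-z] pattern would also rewrite SQL keywords, and the function names and sample queries are made up for illustration.

```python
import re

def reduce_query(sql: str) -> str:
    """Reduce one SQL query to its basic structure."""
    # Remove /* ... */ comments.
    sql = re.sub(r"/\*.*?\*/", " ", sql, flags=re.DOTALL)

    # Change '...' to 'NUMBER' when the content is numeric, otherwise 'STRING'.
    def literal(match: re.Match) -> str:
        body = match.group(1)
        return "'NUMBER'" if re.fullmatch(r"-?\d+(\.\d+)?", body) else "'STRING'"
    sql = re.sub(r"'((?:[^'\\]|\\.)*)'", literal, sql)

    # Change `...` to `TABLE`.
    sql = re.sub(r"`[^`]*`", "`TABLE`", sql)

    # Change numbers that are not part of an identifier to NUMBER.
    sql = re.sub(r"(?<![\w'])\d+(?:\.\d+)?(?![\w'])", "NUMBER", sql)

    # Trim and collapse repeated whitespace.
    return re.sub(r"\s+", " ", sql).strip()

def unique_structures(queries: list[str]) -> list[str]:
    """Reduce every query and drop duplicates, keeping first-seen order."""
    return list(dict.fromkeys(reduce_query(q) for q in queries))

# Made-up sample queries, for illustration only.
sample = [
    "SELECT /* lookup */  page_latest FROM `page` WHERE page_id = 42 AND page_title = 'Main_Page'  LIMIT 1",
    "SELECT page_latest FROM `page` WHERE page_id = 7 AND page_title = 'Help'  LIMIT 1",
]
structures = unique_structures(sample)
```

Both sample queries reduce to the same structure, so only one entry survives the de-duplication step.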
Pseudo P2P MediaWiki
A normal MediaWiki extension could be installed which would give all users of the extension the ability to access articles in a DHT using standard interwiki syntax. The general idea would be to be able to allow interwiki to be writable over various protocols/rules.
- This would also be able to allow FileSync
- Many destinations could be allowed
- What about revisions? store in XML-article-export format?
General P2P Wiki Concept
The general idea is for all users of the P2P wiki to run a local instance which allows normal browser access etc., but it's completely distributed. It would need to be able to separate the content of different wikis by making the interwiki prefix mandatory.
The MediaWiki code would need to have very little modification done, so that it's practical to keep up to date with new versions rather than creating a fork. Maybe even a generic PHP/MySQL idea that can run in a distributed way.
Components like GNUnet give us DHT and content distribution. Edit-conflicts aren't such a problem with DHTs because they're designed to handle content changes in ways at least as dynamic and responsive as the current web paradigm. The problem really is with querying.
- Decentralisation (including development)
- Intellectual property & censorship concerns
- The freenet project has a question regarding censorship of "undesirables" such as kiddie-porn distributors. They said (here) that,
- "Undesirables" are a personal view, and the biggest test of your commitment to bypass censorship is distributing materials that you consider "undesirable".
- But the Freenet project's philosophy is not in accord with the spiritual principles here because it's forcing people to use their bandwidth, processing and storage resource for content they don't wish to support.
- Nad 10:41, 18 Feb 2006 (NZDT)
- Routing and naming
Querying
The main problem with developing a wiki in P2P is the querying: how can SQL-like queries be performed on a distributed space? One option is to maintain the necessary indexes in the DHT (probably as trees, to avoid lists larger than the allowed size of a DHT message). Alternatively, some existing content network or component which handles distributed querying could be used.
Wiki engines which use flat-file storage instead of SQL would make better candidates for migrating to a P2P environment, since P2P networks generally use their peers to form a persistent logical network. This can then be mapped to the local filesystem with FUSE, or the wiki's filesystem access functions can be overridden and redirected. Two such file-based wiki engines are MoinMoin and UseModWiki.
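To make the tree-shaped index idea concrete, here is a minimal Python sketch. It assumes a DHT that behaves like a key/value store with a maximum value size; a plain dictionary stands in for the network, and the 256-byte limit, the fan-out of 4, the JSON encoding and all function names are hypothetical choices made for the example.

```python
import hashlib
import json

MAX_VALUE_BYTES = 256  # hypothetical per-message size limit of the DHT

def put(dht: dict, node: dict) -> str:
    """Store one tree node under the hash of its content and return the key."""
    data = json.dumps(node).encode("utf-8")
    if len(data) > MAX_VALUE_BYTES:
        raise ValueError("node exceeds the DHT message size")
    key = hashlib.sha1(data).hexdigest()
    dht[key] = data
    return key

def store_index(dht: dict, titles: list[str], fanout: int = 4) -> str:
    """Store a list of titles as a tree of small nodes; return the root key."""
    # Leaves hold at most `fanout` titles each.
    keys = [put(dht, {"titles": titles[i:i + fanout]})
            for i in range(0, max(len(titles), 1), fanout)]
    # Inner nodes hold at most `fanout` child keys, until one root remains.
    while len(keys) > 1:
        keys = [put(dht, {"children": keys[i:i + fanout]})
                for i in range(0, len(keys), fanout)]
    return keys[0]

def load_index(dht: dict, key: str) -> list[str]:
    """Walk the tree from `key` and return the titles in their original order."""
    node = json.loads(dht[key])
    if "titles" in node:
        return node["titles"]
    titles: list[str] = []
    for child in node["children"]:
        titles.extend(load_index(dht, child))
    return titles

dht: dict = {}
titles = [f"Page_{i}" for i in range(50)]
root = store_index(dht, titles)
```

Only the root key needs to be published; a peer can fetch just the branches it needs, and no single stored value grows with the total number of articles.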
Existing P2P Wiki Ideas
- File:Scalaris.pdf - this project used a p2p Wikipedia clone as its proof of concept (see also 7 April 2012)
- PeerVote
- Wiki over Freenet
- TriblerWiki - A wiki is being developed over the Tribler P2P network
- http://www.cs.bsu.edu/homepages/chl/P2PWiki
- PIER
- GitTorrent
GIT & MediaWiki
- intro to GIT SVN
- ikiwiki - a wiki with a GIT backend
- MW:Git - mediawiki.org article on Git integration
- Blog entry about MediaWiki over Git
- Another blog entry about MediaWiki over Git
See also
- MediaWikiLite - Running MediaWiki as a portable folder-structure without a web-server or database server
- Tech Fusion Outline: Organising the World's Knowledge
- Wikipedia 3.0
- P2PFoundation Wiki
- PineWiki:PeerToPeer
- Peer-to-peer network
- BitTorrent
- File sharing
- Yochai Benkler - good talk on P2P and free software
- Tribler - web overlay network
- Tagging in Peer-to-Peer Wikipedia - using web overlay for Wikipedia browser
- 11 May 2007 - local news article about a P2P Wikipedia project
- Globule - Open source content distribution system
- Edit conflict
- MediaWiki offline reader
- OneSwarm - privacy preserving P2P
- DBpedia - structured data built from wikipedia and held in a triplespace
- VoltDB - not quite what it appears at first sight - does not supply an SQL pipe usable by other apps such as PHP