Talk:Plone
From Organic Design wiki
Latest revision as of 19:34, 4 January 2011
Let's see what the developers say :-) I sent the following message to the Plone and ZODB developer lists:
Hi, I'm part of a development team who are helping an organisation to architect a CMS-based project that they want to work in a P2P network rather than using a centralised web server. We'd prefer to use an existing popular CMS as a starting point so that it is mature, has a large development community and a wide range of extensions/modules available.

From our initial research it seems that Plone should be more capable of moving into the P2P space, because it uses ZODB rather than SQL, and ZODB seems able to be connected to a variety of storage mechanisms. I'm wondering what you guys, the core developers, think of the practicalities of Plone in P2P. For example, could ZODB use a DHT as its storage layer? What kind of querying is required on the DHT?

We have a good budget available for this and will be developing it as a completely free, open-source component, so we'd also like to hear from developers who may be interested in working on the project too.

Thanks,
Aran
----
First you should bring up arguments for why the existing backends like ZEO, Relstorage or NEO are not good enough in your case. Looking at the development history of Relstorage or NEO, implementing an enterprise-level storage for the ZODB seems to be hard and time-consuming (and expensive).

-- a j
----
I have looked at NEO, which is the closest thing I've found to the answer; in fact NEO is why I felt Plone was the best choice of CMS to inquire further about.

The problem is that it uses SQL for its indexing queries (they quote "NoSQL" as meaning "Not only SQL"). SQL cannot work in P2P space, but can be made to work on server clusters.

We intend not to have any machines in our network other than the users' computers running the P2P application. So we would need to know exactly what kinds of querying ZODB expects to be available in its interface to the storage layer. DHTs can be slow for the first read but cache locally after that.

-- Aran
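The question about what querying ZODB expects from its storage layer can be made concrete with a small sketch. The method names below mirror those in ZODB.interfaces.IStorage (exact signatures vary slightly between ZODB versions), and the DHTClient class is a hypothetical stand-in for a Kademlia-style get/put API; the point is that reads are plain key lookups by object id plus two-phase-commit hooks, with no SQL-style querying involved.
<pre>
# Sketch only: shows the rough shape of the calls ZODB makes on a storage.
# DHTClient is hypothetical; a real DHT would expose similar get/put calls.
class DHTClient:
    """Stand-in for a Kademlia-style distributed hash table."""
    def __init__(self):
        self._local = {}
    def get(self, key):
        return self._local[key]
    def put(self, key, value):
        self._local[key] = value

class DHTStorageSketch:
    """Rough outline of a ZODB storage backed by a DHT."""
    def __init__(self, dht):
        self.dht = dht
    def load(self, oid):
        # the only read "query" ZODB needs: one record fetched by key
        return self.dht.get(oid)
    def store(self, oid, serial, data, transaction):
        # a real storage buffers this until commit; written directly here
        self.dht.put(oid, (serial, data))
    # two-phase-commit hooks driven by transaction.commit()
    def tpc_begin(self, transaction): pass
    def tpc_vote(self, transaction): pass
    def tpc_finish(self, transaction): pass
    def tpc_abort(self, transaction): pass
</pre>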
----
Yes, we use MySQL, and it bites us in both worlds actually:
- in the relational world, we irritate developers as we ask questions like "why does InnoDB load a whole row when we just select primary key columns", which ends up with "don't store blobs in mysql"
- in the key-value world, because NoSQL using MySQL doesn't look consistent

So, why do we use MySQL in NEO? We use InnoDB as an efficient BTree implementation, which handles persistence. We use MySQL as a handy data definition language (NEO is still evolving, and we need an easy way to tweak table structure when a new feature requires it), but we don't need any transactional isolation (each MySQL process used for NEO is accessed by only one process through one connection).

We want to stop using MySQL & InnoDB in favour of leaner-and-meaner back-ends. I would especially like to try kyoto cabinet[1] in on-disk BTree mode, but it requires more work than the existing MySQL adaptor and there are more urgent tasks in NEO.

Just as a proof of concept, NEO can use a Python BTree implementation as an alternative (RAM-only) storage back-end. We use ZODB's BTree implementation, which might look surprising as it's designed to be stored in a ZODB... But they work just as well in RAM, and that's all I needed for such a proof of concept.

-- Vincent

[1] http://fallabs.com/kyotocabinet/
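A minimal sketch of the RAM-only idea Vincent describes, assuming the BTrees package (shipped with ZODB) is installed; the oid and record bytes are made up for illustration.
<pre>
# Sketch: ZODB's own BTree type used as a plain in-memory mapping
# from object id to serialized record, as in NEO's proof-of-concept back-end.
from BTrees.OOBTree import OOBTree

store = OOBTree()                                # ordered mapping, RAM only
store[b'\x00' * 8] = b'serialized root object'   # hypothetical oid -> record

def load(oid):
    """Return the stored record for an oid, or raise KeyError."""
    return store[oid]

print(load(b'\x00' * 8))
</pre>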
----
Thanks for the feedback, Vincent :-) It sounds like NEO is pretty close to being SQL-free. As one of the NEO team, what are your thoughts on the practicality of running Plone in a P2P environment with the latencies experienced in standard DHT implementations (such as, for example, those based on Kademlia)?

-- Aran
----
Something which may be worthwhile, and would give you an impression of what the storage backend does for a common operation, would be to instrument the ZODB code a bit. Just look at the current FileStorage and add a few log()s into its load/store methods. Maybe there are other methods of interest, too. Hooking this shouldn't take long.

Then I suggest you generate (or maybe you already have) a well-sized Plone site to test on. Perform a typical request on the site and see what the storage is doing. This will give you a solid idea of what the storage has to do.

My guess (and I mostly infer this from the ZODB code I've looked at) is that you can get many storage read requests for something like searching the catalog. I guess this will happen because you get a load() call for each BTree bucket that you are traversing. Maybe I am totally wrong of course :) However, instrumenting the storage will show.

You might also find there are certain hotspots (like the catalog). Depending on their size, you could make your users download these completely from the cloud before being able to use them. This would reduce the number of small randomly-seeking requests a lot.

Another thing you need to consider is implementing the transaction functionality. I'm not sure how to do something like this in the cloud, or even if it's possible at all (given certain performance limitations).

And finally, my experience using P2P is that it takes a while to build up speed for a certain download, and client uplinks are not as fast as your typical web server's uplink. Also, firewalls seem to cause problems sometimes (maybe workable with NAT punching?).

Maybe all of this is feasible in your situation, but it depends heavily on your requirements, usage and goals. Doing some fast prototyping and contacting authors who have already written papers (or, better, implementations) is probably the best way to get a solid idea.

-- Matthias
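One possible way to do the instrumentation Matthias describes without editing ZODB's source is to wrap FileStorage.load and FileStorage.store at runtime. This is only a sketch, assuming a working ZODB install; Matthias's own suggestion of putting log() calls directly into FileStorage works just as well.
<pre>
# Sketch: log every load/store the storage performs, so the number of
# storage hits per Plone request can be counted. No Data.fs is opened here.
import logging
from ZODB.FileStorage import FileStorage

logging.basicConfig(level=logging.INFO)
log = logging.getLogger('storage-trace')

def traced(method):
    def wrapper(self, *args, **kw):
        # args[0] is the oid for both load() and store()
        log.info('%s oid=%r', method.__name__, args[0] if args else None)
        return method(self, *args, **kw)
    return wrapper

FileStorage.load = traced(FileStorage.load)
FileStorage.store = traced(FileStorage.store)
# After this, opening a site's Data.fs and serving one request shows
# exactly how many storage reads and writes that request triggered.
</pre>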
----
I'm not very optimistic about this, I'm afraid. First, the problems with using Plone:

* Plone relies heavily on its in-ZODB indexes of all content (portal_catalog). This means that every edit will change lots of objects (without versioning ~15-20, most of which are in the catalogue).

* At least with Archetypes, a content object's data is spread over multiple objects. (This should be better with Dexterity, though you will still have multiple objects for locking and workflow.)

* If you use versioning you'll see ~100 objects changed in an edit.

* Even loading the front page will take a long time - in my experiments writing an Amazon S3 backend for ZODB, the extra latency of fetching each object was really noticeable.

But I'm not sure even a simpler ZODB CMS would be a good fit for a P2P DHT:

* ZODB is transactional, using two-phase commit. With P2P latencies these commits will be horribly slow - all clients storing changed objects would need to participate in the transaction.

* Each client's object cache will need to know about invalidations, and I don't see any way of supplying these from a DHT.

I expect you'd have more success storing content items as single content objects/pages in the DHT and then generating indexes based on that. You'll need some way of storing parent-child relationships between the content objects too, as updating a single list of child objects will be incredibly difficult to get right in a distributed system.

-- Laurence
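A toy sketch of the alternative Laurence suggests: each content item lives in the DHT as one self-contained record carrying its own parent pointer, and indexes are derived locally from the records a node has fetched rather than kept as a single mutable children list in the DHT. The dict standing in for the DHT and all field names are illustrative only.
<pre>
# Toy example: whole content items in a DHT, indexes generated locally.
import json, hashlib
from collections import defaultdict

dht = {}  # key -> bytes; stand-in for a real Kademlia-style put/get API

def put(record):
    data = json.dumps(record, sort_keys=True).encode()
    key = hashlib.sha1(data).hexdigest()
    dht[key] = data
    return key

home = put({'title': 'Home', 'parent': None, 'body': 'front page'})
news = put({'title': 'News', 'parent': home, 'body': 'child page'})

# Derive a parent -> children index from whatever records a node has seen,
# instead of updating one shared children list in the DHT on every edit.
children = defaultdict(list)
for key, raw in dht.items():
    rec = json.loads(raw)
    if rec['parent']:
        children[rec['parent']].append(key)

print(children[home])
</pre>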
----
> As one of the NEO team, what are your thoughts on the practicality of
> running Plone in a P2P environment with the latencies experienced in
> standard DHT implementations (such as, for example, those based on Kademlia)?

First, I must say that we have not run any benchmark on NEO outside LAN conditions yet, because there are some issues which need attention before we can increase test hostility. To name a few blockers, there is a need for "peaceful" deadlock resolution/avoidance when the same set of objects gets modified concurrently, and an important "from scratch" replication performance issue. Another show-stopper for NEO production-readiness is the lack of backup tools, as NEO currently relies on storage back-end tools (e.g. mysqldump) and on a replication scheme which is not implemented (useful in an all-nodes-in-a-datacenter setup, not if nodes are to be scattered around the globe).

That covers the current implementation status; now I'll try to answer from NEO's design point of view.

NEO was not designed with international network latency in mind, so I doubt it would compete with Kademlia on this metric.

In NEO, each node knows the entire hash table. When loading an object, one node known to contain that object is selected and a connection is established (if not already available). The highest latency to fetch any piece of data is the latency toward the node with the most latency (plus extra latency if the node turns out to be offline, as the next valid node will be attempted). This (latency cost and late discovery of absent nodes) can be mitigated by integrating node latency into the node weight computed to select a node to connect to when loading an object. So the more replicas there are, the lower the worst-case latency gets. This is not implemented, but would be a very welcome addition.

When writing an object, a client pushes copies to each and every node supposed to contain that object (known via the hash table) and must wait for all related acknowledgements, so it will always suffer from the worst-case latency. This is already mitigated by pipelining stores, so that acknowledgements are only waited for during tpc_vote rather than proportionally to the number of stored objects. It could be further mitigated by considering multicast (currently, NEO does everything with unicast: TCP).

Although it's not required for all nodes to always have the most up-to-date view of the hash table for reading (beyond causing the late discovery of absent nodes described above), it will cause increasing problems when writing as nodes go up and down more often.

-- Vincent
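A rough, hypothetical illustration of the write-path mitigation Vincent mentions (not NEO's actual code): stores to every replica are sent off immediately without waiting, and acknowledgements are only collected at tpc_vote, so the wait tends toward one worst-case round trip rather than one per stored object. The node names and the send_to_replica stub are placeholders.
<pre>
# Illustration only: pipeline stores to all replicas, collect acks at vote.
import time
from concurrent.futures import ThreadPoolExecutor

REPLICAS = ['node-a', 'node-b', 'node-c']   # hypothetical node names

def send_to_replica(node, oid, data):
    time.sleep(0.05)                        # stand-in for network latency
    return (node, oid, 'ack')

pool = ThreadPoolExecutor(max_workers=30)   # enough to overlap all sends here
pending = []

def store(oid, data):
    # fire off the copies immediately; do not wait here
    for node in REPLICAS:
        pending.append(pool.submit(send_to_replica, node, oid, data))

def tpc_vote():
    # only now wait for every outstanding acknowledgement
    return [f.result() for f in pending]

for i in range(10):
    store(i, b'object state')
print(len(tpc_vote()), 'acknowledgements collected')
</pre>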