Talk:Plone

Revision as of 19:31, 4 January 2011

Let's see what the developers say :-) I sent the following message to the Plone and ZODB developers' lists:

Hi, I'm part of a development team helping an organisation to architect a
CMS-based project that they want to run over a P2P network rather than a
centralised web server. We'd prefer to use an existing popular CMS as a
starting point so that it is mature, has a large development community and
a wide range of extensions/modules available.

From our initial research it seems that Plone should be better able to move
into the P2P space, because it uses ZODB rather than SQL and ZODB appears
able to be connected to a variety of storage mechanisms. I'm wondering what
you, the core developers, think of the practicalities of Plone in P2P: for
example, could ZODB use a DHT as its storage layer? What kind of querying
is required of the DHT?

We have a good budget available for this and will be developing it as a
completely free, open-source component, so we'd also like to hear from
developers who may be interested in working on the project.

Thanks,
Aran



First you should bring up arguments for why the existing backends like ZEO,
Relstorage or NEO are not good enough in your case. Judging by the
development history of Relstorage and NEO, implementing an enterprise-level
storage for the ZODB seems to be hard and time-consuming (and expensive).
   -- a j



I have looked at NEO, which is the closest thing I've found to the answer;
in fact NEO is why I felt Plone was the best choice of CMS to inquire
further about.

The problem is that it uses SQL for its indexing queries (they quote
"NoSQL" as meaning "Not only SQL"). SQL cannot work in P2P space, but it
can be made to work on server clusters.

We intend not to have any machines in our network other than the users'
computers running the P2P application, so we would need to know exactly
what kinds of querying ZODB expects to be available in its interface to
the storage layer. DHTs can be slow for the first read but cache locally
after that.    -- Aran



Yes, we use MySQL, and it bites us in both worlds actually:
- in the relational world, we irritate developers by asking questions like
"why does InnoDB load a whole row when we just select primary key columns",
which ends up with "don't store blobs in MySQL"
- in the key-value world, because NoSQL built on MySQL doesn't look
consistent

So, why do we use MySQL in NEO?
We use InnoDB as an efficient BTree implementation, which handles persistence.
We use MySQL as a handy data definition language (NEO is still evolving, and
we need an easy way to tweak table structures when a new feature requires
it), but we don't need any transactional isolation (each MySQL process used
for NEO is accessed by only one process through one connection).
We want to stop using MySQL & InnoDB in favour of leaner-and-meaner back-ends.
I would especially like to try Kyoto Cabinet [1] in on-disk BTree mode, but
it requires more work than the existing MySQL adaptor and there are more
urgent tasks in NEO.

Just as a proof of concept, NEO can use a Python BTree implementation as an
alternative (RAM-only) storage back-end. We use ZODB's BTree implementation,
which might look surprising as it's designed to be stored in a ZODB... but
BTrees work just as well in RAM, and that's all I needed for such a proof
of concept.

Regards, -- V P         [1] http://fallabs.com/kyotocabinet/
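
As a side note on that last point, ZODB's BTrees package really does behave
as an ordinary in-memory sorted mapping when no database is attached, which
is what makes a RAM-only back-end like that proof of concept possible. A
minimal sketch (illustrative only, not NEO code):

<pre>
# Minimal illustration (not NEO code): ZODB's BTrees work as plain in-memory
# sorted mappings, no database or storage needed.
from BTrees.OOBTree import OOBTree   # object keys -> object values

index = OOBTree()
index['alpha'] = 'first'
index['beta'] = 'second'

print(index['alpha'])               # 'first'
print(list(index.keys('a', 'b')))   # range query over keys: ['alpha']
</pre>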



Thanks for the feedback, Vincent :-) It sounds like NEO is pretty close to
being SQL-free. As one of the NEO team, what are your thoughts on the
practicality of running Plone in a P2P environment with the latencies
experienced in standard DHT implementations (such as, for example, those
based on Kademlia)?  -- Aran



Something which may be worthwhile, and would give you an impression of what
the storage backend does for a common operation, would be to instrument the
ZODB code a bit. Just look at the current FileStorage and add a few log()
calls to its load/store methods. Maybe there are other methods of interest,
too. Hooking this up shouldn't take long.
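
A minimal sketch of that kind of instrumentation, assuming a stock
FileStorage and the standard logging module (the TracingFileStorage name is
invented for this example):

<pre>
# Sketch of the instrumentation suggested above: subclass FileStorage and
# log every load()/store() so storage hits per request can be counted.
import logging
from ZODB.FileStorage import FileStorage
from ZODB.utils import u64   # 8-byte oid -> integer, for readable log lines

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("storage-trace")

class TracingFileStorage(FileStorage):
    """FileStorage that logs every object load and store."""

    def load(self, oid, version=''):
        data, serial = FileStorage.load(self, oid, version)
        log.info("load  oid=%d size=%d", u64(oid), len(data))
        return data, serial

    def store(self, oid, serial, data, version, transaction):
        log.info("store oid=%d size=%d", u64(oid), len(data))
        return FileStorage.store(self, oid, serial, data, version, transaction)

# Point your Zope/Plone instance (or a plain ZODB.DB) at a TracingFileStorage
# instead of FileStorage, perform a typical request, and count the log lines.
</pre>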

Then I suggest you generate (or maybe you already have) a well-sized Plone
site to test on. Perform a typical request on the site and see what the
storage is doing. This will give you a solid idea of what the storage has
to do.

My guess (and I mostly infer this from the ZODB code I've looked at) is
that you can get many storage read requests for something like searching
the catalog. I expect this will happen because you get a load() call for
each BTree bucket that you traverse. Maybe I am totally wrong, of course :)
However, instrumenting the storage will show.
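
One rough way to check that guess, using an in-memory MappingStorage rather
than a real Plone site (the CountingStorage wrapper is invented for this
example): fill a BTree, ghost the object cache, and count storage loads
during a range search.

<pre>
# Rough check of the "one load per bucket" guess above, against an
# in-memory MappingStorage instead of a real Plone site.
import transaction
from ZODB import DB
from ZODB.MappingStorage import MappingStorage
from BTrees.IOBTree import IOBTree

class CountingStorage(MappingStorage):
    loads = 0
    # Count both load() and loadBefore(), since which one the connection
    # uses depends on the ZODB version.
    def load(self, oid, version=''):
        CountingStorage.loads += 1
        return MappingStorage.load(self, oid, version)
    def loadBefore(self, oid, tid):
        CountingStorage.loads += 1
        return MappingStorage.loadBefore(self, oid, tid)

db = DB(CountingStorage())
conn = db.open()
conn.root()['tree'] = tree = IOBTree()
for i in range(10000):
    tree[i] = i
transaction.commit()

conn.cacheMinimize()                     # ghost everything in the object cache
CountingStorage.loads = 0
_ = list(tree.values(2000, 3000))        # range search over part of the tree
print(CountingStorage.loads)             # roughly one load per bucket traversed
</pre>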

You might also find there are certain hotspots (like the catalog).
Depending on their size, you could make your users download these
completely from the cloud before being able to use them. This would reduce
the number of small, randomly-seeking requests a lot.

Another thing you need to consider is implementing the transaction  
functionality. I'm not sure how to do something like this in the cloud or  
even if it's possible at all (given certain performance limitations).

And finally, my experience using P2P is that it takes a while to build up
speed for a given download. Client uplinks are not as fast as your typical
web server's uplink, and firewalls seem to cause problems sometimes (maybe
workable with NAT hole punching?).

Maybe all of this is feasible in your situation, but that depends heavily
on your requirements, usage and goals. Doing some fast prototyping and
contacting authors who have already written papers (or, better,
implementations) is probably the best way to get a solid idea.

-Matthias



I'm not very optimistic about this, I'm afraid. First, the problems with
using Plone:

 * Plone relies heavily on its in-ZODB indexes of all content
(portal_catalog). This means that every edit will change lots of objects
(without versioning, ~15-20, most of which are in the catalogue).

 * At least with Archetypes, a content object's data is spread over
multiple objects. (This should be better with Dexterity, though you will
still have multiple objects for locking and workflow.)

 * If you use versioning, you'll see ~100 objects changed in an edit.

 * Even loading the front page will take a long time. In my experiments
writing an Amazon S3 backend for ZODB, the extra latency of fetching each
object was really noticeable.

But I'm not sure even a simpler ZODB CMS would be a good fit for a P2P DHT:

 * ZODB is transactional, using two-phase commit. With P2P latencies these
commits will be horribly slow: all clients storing changed objects would
need to participate in the transaction (the storage-level calls involved
are sketched after this message).

 * Each client's object cache will need to know about invalidations; I
don't see any way of supplying these from a DHT.

I expect you'd have more success storing content items as single content
objects / pages in the DHT and then generating indexes based on that.
You'll need some way of storing parent-child relationships between the
content objects too, as updating a single list of child objects will be
incredibly difficult to get right in a distributed system.     -- Laurence
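
To make the two-phase commit point concrete, the sketch below shows the
commit hooks a ZODB storage has to provide (tpc_begin, store, tpc_vote,
tpc_finish, tpc_abort), with a local dict standing in for the DHT. It is an
illustrative skeleton only, not working Plone, ZODB or NEO code, but it
shows where the per-object network round trips and the all-peers
acknowledgement step would land:

<pre>
# Illustrative skeleton only: the shape of ZODB's two-phase commit as seen
# by a storage backend, with a local dict standing in for the DHT.

class FakeDHT:
    """Stand-in for a distributed hash table (here just a local dict)."""
    def __init__(self):
        self.data = {}
    def put(self, key, value):
        self.data[key] = value      # real DHT: pushed to several remote peers
    def get(self, key):
        return self.data[key]       # real DHT: fetched from a remote peer

class DHTStorageSketch:
    """Minimal outline of the commit hooks a ZODB storage must provide."""
    def __init__(self, dht):
        self.dht = dht
        self._pending = {}

    def tpc_begin(self, txn):
        self._pending = {}

    def store(self, oid, serial, data, version, txn):
        # One changed object per call; a Plone edit can trigger ~15-100 of
        # these, as noted above.
        self._pending[oid] = data

    def tpc_vote(self, txn):
        # In a distributed setting, every peer holding a changed object
        # would have to acknowledge here before the commit can proceed.
        pass

    def tpc_finish(self, txn):
        for oid, data in self._pending.items():
            self.dht.put(oid, data)
        self._pending = {}

    def tpc_abort(self, txn):
        self._pending = {}
</pre>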



> As one of the NEO team, what are your thoughts on the practicality of
> running Plone in a P2P environment with the latencies experienced in
> standard DHT implementations (such as, for example, those based on Kademlia)?

First, I must say that we have not run any benchmarks on NEO outside LAN
conditions yet, because there are some issues which need attention before
we can increase test hostility. To name a few blockers, there is a need for
"peaceful" deadlock resolution/avoidance when the same set of objects gets
modified concurrently, and an important "from scratch" replication
performance issue. Another show-stopper for NEO production-readiness is the
lack of backup tools, as NEO currently relies on storage back-end tools
(e.g. mysqldump) and on a replication scheme which is not implemented
(useful in an all-nodes-in-datacenter setup, not if nodes are to be
scattered around the globe).

That covers the current implementation status; now I'll try to answer from
NEO's design point of view.

NEO was not designed with international network latency in mind, so I doubt it 
would compete with Kademlia on this metric.

In NEO, each node knows the entire hash table. When loading an object, one
node known to contain that object is selected and a connection is
established (if not already available). The highest latency to fetch any
piece of data is the latency toward the node with the most latency (plus
extra latency if that node turns out to be offline, as the next valid node
will then be attempted). This (both the latency cost and the late discovery
of an absent node) could be mitigated by integrating node latency into the
node weight that is computed to select which node to connect to when
loading an object. So the more replicas there are, the lower the worst-case
latency gets. This is not implemented, but would be a very welcome addition.
When writing an object, a client pushes copies to each and every node
supposed to contain that object (known via the hash table) and must wait
for all related acknowledgements, so it will always suffer from the
worst-case latency. This is already mitigated by pipelining stores, so that
acknowledgements are only waited for during tpc_vote rather than in
proportion to the number of stored objects. It could be further mitigated
by considering multicast (currently, NEO does everything with unicast: TCP).
Although it's not required for all nodes to always have the most up-to-date
view of the hash table for reading (aside from the late discovery of absent
nodes described above), it will cause increasing problems when writing as
nodes go up and down more often.    -- Vincent
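
A back-of-the-envelope model of the read/write asymmetry described above
(illustrative numbers only, not NEO code): a read pays the latency of the
single replica it picks, while a write waits for acknowledgements from
every replica holding the object, so the slowest replica dominates.

<pre>
# Toy latency model for the read/write asymmetry described above.
import random

def read_latency(replica_latencies_ms, latency_aware=False):
    """One load hits a single replica; latency-aware weighting picks the closest."""
    if latency_aware:
        return min(replica_latencies_ms)
    return random.choice(replica_latencies_ms)

def write_latency(replica_latencies_ms):
    """A store waits for acknowledgements from all replicas, so the slowest dominates."""
    return max(replica_latencies_ms)

replicas_ms = [20, 45, 180]                           # hypothetical round-trip times
print(read_latency(replicas_ms))                      # 20, 45 or 180
print(read_latency(replicas_ms, latency_aware=True))  # 20
print(write_latency(replicas_ms))                     # always 180
</pre>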