Difference between revisions of "Binary traversal"
(pedia statement on trie speed) |
({{legacy}}) |
||
(12 intermediate revisions by the same user not shown) | |||
Line 1: | Line 1: | ||
+ | {{legacy}} | ||
Binary traversal is the algorithm that the [[nodal model]] uses to give a simple [[list space]] functionality of an [[w:Associative array|associative array]]. It does it by mapping the binary digits of an list-item's binary address to a traversal path between list-items in the space, effectively forming a bitwise [[w:Trie|trie]] data structure. | Binary traversal is the algorithm that the [[nodal model]] uses to give a simple [[list space]] functionality of an [[w:Associative array|associative array]]. It does it by mapping the binary digits of an list-item's binary address to a traversal path between list-items in the space, effectively forming a bitwise [[w:Trie|trie]] data structure. | ||
− | Although this process might sound slow, it is very cache-local and highly parallelisable due to the lack of register dependencies and therefore in fact performs excellently on modern out-of-order execution CPUs. A [[w:red-black tree|red-black tree]] for example performs much better on paper, but is highly cache-unfriendly | + | Although this process might sound slow, it is very cache-local and highly parallelisable due to the lack of register dependencies and therefore in fact performs excellently on modern out-of-order execution CPUs. A [[w:red-black tree|red-black tree]] for example performs much better on paper, but is highly cache-unfriendly on modern CPUs which makes that algorithm bound by memory latency rather than CPU speed. |
In comparison, a bitwise trie rarely accesses memory and when it does it does so only to read, thus avoiding SMP cache coherency overhead, and hence is becoming increasingly the algorithm of choice for code which does a lot of insertions and deletions such as memory allocators (e.g. recent versions of the famous [[w:Malloc#dlmalloc and its derivatives|Doug Lea's allocator (dlmalloc) and its descendents]]). | In comparison, a bitwise trie rarely accesses memory and when it does it does so only to read, thus avoiding SMP cache coherency overhead, and hence is becoming increasingly the algorithm of choice for code which does a lot of insertions and deletions such as memory allocators (e.g. recent versions of the famous [[w:Malloc#dlmalloc and its derivatives|Doug Lea's allocator (dlmalloc) and its descendents]]). | ||
− | |||
− | |||
− | |||
== Making a unique bit sequence from a node index == | == Making a unique bit sequence from a node index == | ||
− | The [[node]]s of a [[node space]] are all identified by an | + | The [[node]]s of a [[node space]] are all identified by an integer index which starts at 0 for the first element created (the ''root'' node), and 1 for the next and so on. All integers can be represented as a unique binary sequence which allows them to all exhibit an address based on a unique path, this kind of data structure is called [[Wikipedia:Trie|trie]]. Here is a table showing some integers and their binary and path representations: |
{{code|1= | {{code|1= | ||
<table border=0 width=100> | <table border=0 width=100> | ||
Line 71: | Line 69: | ||
Both the decimal and the binary numbers really have an infinite number of leading zeros before them, but we can still maintain a unique sequence of digits to represent every integer even if we omit all leading zeros as we have in this table. The point is that this rule means that every single number except for 0 always ends with a non-zero digit (numbers are read from right to left). While this may not be of much interest for most number systems, you will notice that with binary it means that every number except zero ends with a 1 - and if we forget about the two first items in the list, then we could drop the last bit altogether. As can be seen from the ''Nodal Path'' column in the list, that's exactly what we do in path traversal. This can't usually work with normal numeric data types because ''0'' and ''00'' evaluate to the same, but a ''sequence'' of binary digits is more like a string comparison of ''"0"'' and ''"00"''. | Both the decimal and the binary numbers really have an infinite number of leading zeros before them, but we can still maintain a unique sequence of digits to represent every integer even if we omit all leading zeros as we have in this table. The point is that this rule means that every single number except for 0 always ends with a non-zero digit (numbers are read from right to left). While this may not be of much interest for most number systems, you will notice that with binary it means that every number except zero ends with a 1 - and if we forget about the two first items in the list, then we could drop the last bit altogether. As can be seen from the ''Nodal Path'' column in the list, that's exactly what we do in path traversal. This can't usually work with normal numeric data types because ''0'' and ''00'' evaluate to the same, but a ''sequence'' of binary digits is more like a string comparison of ''"0"'' and ''"00"''. | ||
− | In the actual implementations of the ''listTraverse'' function, a multiplication by three must also occur to account for the fact that [[list item]]s are groups of three [[node references]]. | + | In the actual implementations of the ''listTraverse'' function, a multiplication by three must also occur to account for the fact that [[list item]]s are groups of three [[node references]]. For example here's the Binary the function shown using the C language. |
+ | {{:LT.c}} | ||
== Zero length traversal == | == Zero length traversal == | ||
− | In the table above showing the conversion of integer | + | In the table above showing the conversion of integer node references into binary paths, we see that the first two indexes, 0 and 1 both translate to a null path which involves no traversal at all, and then index 2 and 3 translate to the paths of "0" and "1" which involve a single iteration of traversal. In the actual implementations of the ''listTraverse'' function an addition of 1 occurs before translation to a binary sequence so that only index 0 results in no traversal, and indexes 1 and 2 result in the two single iteration traversals. |
This ''zero length traversal'' is still valid because the subject node from which traversal would start has a value just like every other [[node]] or [[list item]]. If a non-zero value is returned from ''nodeGetValue'' when passed a zero key, the value is the [[focus]] of the subject node's [[loop]]. | This ''zero length traversal'' is still valid because the subject node from which traversal would start has a value just like every other [[node]] or [[list item]]. If a non-zero value is returned from ''nodeGetValue'' when passed a zero key, the value is the [[focus]] of the subject node's [[loop]]. | ||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
== Eastern philosophy == | == Eastern philosophy == | ||
− | As mentioned above, the meaning of | + | As mentioned above, the meaning of the ''root'' node (the node at address 0) should be reserved for the meaning that is most commonly traversed, then the two single-step traversals reserved for the next most popular. The most popular traversal is the one that goes withinward, called ''loop'' or ''within'', and the next two most popular are ''next'' and ''previous''. |
− | Another more philosophical reason to make the first node be the movement withinward is that if the size of the space is reduced to just a single item having address zero, then the nodal reduction process still makes sense. In such a space, references occupy no bits (a space of size two has single-bit-references), and the single only reference refers to itself. | + | Another more philosophical reason to make the first node be the movement ''withinward'' is that if the size of the space is reduced to just a single item having address zero, then the nodal reduction process still makes sense. In such a space, references occupy no bits (a space of size two has single-bit-references), and the single only reference refers to itself. |
− | The first thing nodal reduction does is start at node 0 and go withinward by following the zero-length traversal, this is set to zero (itself) which is an infinite process. This fits perfectly with the eastern traditions of Taoism, Zen and Advaita in which the ultimate source of everything is nothing and exhibits the attribute of self-reference or awareness-of-awareness. | + | The first thing nodal reduction does is start at node 0 and go withinward by following the zero-length traversal, this is set to zero (itself) which is an infinite process. This fits perfectly with the eastern traditions of [[Taoism]], Zen and Advaita in which the ultimate source of everything is nothing and exhibits the attribute of self-reference or awareness-of-awareness. |
== Notes == | == Notes == | ||
*It may be possible to make traversal work as a [[w:Patricia trie|Patricia trie]] which combines all path segments that have only one direction | *It may be possible to make traversal work as a [[w:Patricia trie|Patricia trie]] which combines all path segments that have only one direction | ||
+ | |||
+ | == See also == | ||
*[[Wikipedia:Trie]] | *[[Wikipedia:Trie]] | ||
+ | *[[w:Patricia trie|Patricia trie]] | ||
+ | *[https://blog.datopia.io/2018/11/03/hitchhiker-tree/ The Hitchhiker tree] | ||
*[[Wikipedia:Linked list]] | *[[Wikipedia:Linked list]] | ||
+ | *[[FFT]] | ||
+ | *[[w:Chinese remainder theorem|Chinese remainder theorem]] ''- Some weird theorem to do with modulo arithmetic closely related to multiplexing from 300AD'' | ||
[[Category:Nodal Concepts]]__NOTOC__ | [[Category:Nodal Concepts]]__NOTOC__ |
Latest revision as of 14:18, 4 December 2019
Binary traversal is the algorithm that the nodal model uses to give a simple list space functionality of an associative array. It does it by mapping the binary digits of an list-item's binary address to a traversal path between list-items in the space, effectively forming a bitwise trie data structure.
Although this process might sound slow, it is very cache-local and highly parallelisable due to the lack of register dependencies and therefore in fact performs excellently on modern out-of-order execution CPUs. A red-black tree for example performs much better on paper, but is highly cache-unfriendly on modern CPUs which makes that algorithm bound by memory latency rather than CPU speed.
In comparison, a bitwise trie rarely accesses memory and when it does it does so only to read, thus avoiding SMP cache coherency overhead, and hence is becoming increasingly the algorithm of choice for code which does a lot of insertions and deletions such as memory allocators (e.g. recent versions of the famous Doug Lea's allocator (dlmalloc) and its descendents).
Making a unique bit sequence from a node index
The nodes of a node space are all identified by an integer index which starts at 0 for the first element created (the root node), and 1 for the next and so on. All integers can be represented as a unique binary sequence which allows them to all exhibit an address based on a unique path, this kind of data structure is called trie. Here is a table showing some integers and their binary and path representations:
Both the decimal and the binary numbers really have an infinite number of leading zeros before them, but we can still maintain a unique sequence of digits to represent every integer even if we omit all leading zeros as we have in this table. The point is that this rule means that every single number except for 0 always ends with a non-zero digit (numbers are read from right to left). While this may not be of much interest for most number systems, you will notice that with binary it means that every number except zero ends with a 1 - and if we forget about the two first items in the list, then we could drop the last bit altogether. As can be seen from the Nodal Path column in the list, that's exactly what we do in path traversal. This can't usually work with normal numeric data types because 0 and 00 evaluate to the same, but a sequence of binary digits is more like a string comparison of "0" and "00".
In the actual implementations of the listTraverse function, a multiplication by three must also occur to account for the fact that list items are groups of three node references. For example here's the Binary the function shown using the C language.
// Start at subject listItem and traverse the key as an association to a new listItem
// - key is also a listItem reference and its binary address is used as the traversal path
// - subject and key (all list-item references are ints starting at item 0)
// - if key is 0 the subject is returned unchanged, key 1 and 2 are the two single-iteration traversals
item listTraverse(item subject, item key) {
if(key++ > 0) {
int i,j;
for(i = 1; i <= key >> 1; i <<= 1) {
j = subject * 3;
if(key & i) j++;
if(space[j] == 0) space[j] = listInsert();
subject = space[j];
}
}
return subject;
}
Zero length traversal
In the table above showing the conversion of integer node references into binary paths, we see that the first two indexes, 0 and 1 both translate to a null path which involves no traversal at all, and then index 2 and 3 translate to the paths of "0" and "1" which involve a single iteration of traversal. In the actual implementations of the listTraverse function an addition of 1 occurs before translation to a binary sequence so that only index 0 results in no traversal, and indexes 1 and 2 result in the two single iteration traversals.
This zero length traversal is still valid because the subject node from which traversal would start has a value just like every other node or list item. If a non-zero value is returned from nodeGetValue when passed a zero key, the value is the focus of the subject node's loop.
Eastern philosophy
As mentioned above, the meaning of the root node (the node at address 0) should be reserved for the meaning that is most commonly traversed, then the two single-step traversals reserved for the next most popular. The most popular traversal is the one that goes withinward, called loop or within, and the next two most popular are next and previous.
Another more philosophical reason to make the first node be the movement withinward is that if the size of the space is reduced to just a single item having address zero, then the nodal reduction process still makes sense. In such a space, references occupy no bits (a space of size two has single-bit-references), and the single only reference refers to itself.
The first thing nodal reduction does is start at node 0 and go withinward by following the zero-length traversal, this is set to zero (itself) which is an infinite process. This fits perfectly with the eastern traditions of Taoism, Zen and Advaita in which the ultimate source of everything is nothing and exhibits the attribute of self-reference or awareness-of-awareness.
Notes
- It may be possible to make traversal work as a Patricia trie which combines all path segments that have only one direction
See also
- Wikipedia:Trie
- Patricia trie
- The Hitchhiker tree
- Wikipedia:Linked list
- FFT
- Chinese remainder theorem - Some weird theorem to do with modulo arithmetic closely related to multiplexing from 300AD