Difference between revisions of "Binary traversal"

From Organic Design wiki
(Remove obsolete comment)
({{legacy}})
 
(42 intermediate revisions by 2 users not shown)
Line 1: Line 1:
[[Category:Documents]]
+
{{legacy}}
The array elements of the @space are all identified by an integer index which starts at 0 for the first element created, and one for the next and so on. All integers can be represented as a unique binary sequence as in the ''decimal'' and ''binary'' columns of the following list of example index numbers:
+
Binary traversal is the algorithm that the [[nodal model]] uses to give a simple [[list space]] functionality of an [[w:Associative array|associative array]]. It does it by mapping the binary digits of an list-item's binary address to a traversal path between list-items in the space, effectively forming a bitwise [[w:Trie|trie]] data structure.
<table class=layout style="text-align:right;">
+
 
 +
Although this process might sound slow, it is very cache-local and highly parallelisable due to the lack of register dependencies and therefore in fact performs excellently on modern out-of-order execution CPUs. A [[w:red-black tree|red-black tree]] for example performs much better on paper, but is highly cache-unfriendly on modern CPUs which makes that algorithm bound by memory latency rather than CPU speed.
 +
 
 +
In comparison, a bitwise trie rarely accesses memory and when it does it does so only to read, thus avoiding SMP cache coherency overhead, and hence is becoming increasingly the algorithm of choice for code which does a lot of insertions and deletions such as memory allocators (e.g. recent versions of the famous [[w:Malloc#dlmalloc and its derivatives|Doug Lea's allocator (dlmalloc) and its descendents]]).
 +
 
 +
== Making a unique bit sequence from a node index ==
 +
The [[node]]s of a [[node space]] are all identified by an integer index which starts at 0 for the first element created (the ''root'' node), and 1 for the next and so on. All integers can be represented as a unique binary sequence which allows them to all exhibit an address based on a unique path, this kind of data structure is called [[Wikipedia:Trie|trie]]. Here is a table showing some integers and their binary and path representations:
 +
{{code|1=
 +
<table border=0 width=100>
  
 
<tr><th>Decimal&nbsp;&nbsp;&nbsp;<th>Binary&nbsp;&nbsp;&nbsp;<th>Nodal Path
 
<tr><th>Decimal&nbsp;&nbsp;&nbsp;<th>Binary&nbsp;&nbsp;&nbsp;<th>Nodal Path
 
<tr><td style="text-align:right;">0&nbsp;&nbsp;&nbsp;
 
<tr><td style="text-align:right;">0&nbsp;&nbsp;&nbsp;
 
<td style="text-align:right;">0&nbsp;&nbsp;&nbsp;
 
<td style="text-align:right;">0&nbsp;&nbsp;&nbsp;
<td style="text-align:right;">''never traversed''
+
<td style="text-align:right;">-
  
 
<tr><td style="text-align:right;">1&nbsp;&nbsp;&nbsp;
 
<tr><td style="text-align:right;">1&nbsp;&nbsp;&nbsp;
 
<td style="text-align:right;">1&nbsp;&nbsp;&nbsp;
 
<td style="text-align:right;">1&nbsp;&nbsp;&nbsp;
<td style="text-align:right;">''never traversed''
+
<td style="text-align:right;">-
  
 
<tr><td style="text-align:right;">2&nbsp;&nbsp;&nbsp;
 
<tr><td style="text-align:right;">2&nbsp;&nbsp;&nbsp;
Line 23: Line 31:
 
<td style="text-align:right;">100&nbsp;&nbsp;&nbsp;
 
<td style="text-align:right;">100&nbsp;&nbsp;&nbsp;
 
<td style="text-align:right;">0-0
 
<td style="text-align:right;">0-0
 +
 +
<tr><td style="text-align:right;">&#8942;&nbsp;&nbsp;&nbsp;
 +
<td style="text-align:right;">&#8942;&nbsp;&nbsp;&nbsp;
 +
<td style="text-align:right;">&#8942;
  
 
<tr><td style="text-align:right;">100&nbsp;&nbsp;&nbsp;
 
<tr><td style="text-align:right;">100&nbsp;&nbsp;&nbsp;
 
<td style="text-align:right;">1100100&nbsp;&nbsp;&nbsp;
 
<td style="text-align:right;">1100100&nbsp;&nbsp;&nbsp;
 
<td style="text-align:right;">0-0-1-0-0-1
 
<td style="text-align:right;">0-0-1-0-0-1
 +
 +
<tr><td style="text-align:right;">&#8942;&nbsp;&nbsp;&nbsp;
 +
<td style="text-align:right;">&#8942;&nbsp;&nbsp;&nbsp;
 +
<td style="text-align:right;">&#8942;
  
 
<tr><td style="text-align:right;">1000&nbsp;&nbsp;&nbsp;
 
<tr><td style="text-align:right;">1000&nbsp;&nbsp;&nbsp;
 
<td style="text-align:right;">1111101000&nbsp;&nbsp;&nbsp;
 
<td style="text-align:right;">1111101000&nbsp;&nbsp;&nbsp;
 
<td style="text-align:right;">0-0-0-1-0-1-1-1-1
 
<td style="text-align:right;">0-0-0-1-0-1-1-1-1
 +
 +
<tr><td style="text-align:right;">&#8942;&nbsp;&nbsp;&nbsp;
 +
<td style="text-align:right;">&#8942;&nbsp;&nbsp;&nbsp;
 +
<td style="text-align:right;">&#8942;
  
 
<tr><td style="text-align:right;">1023&nbsp;&nbsp;&nbsp;
 
<tr><td style="text-align:right;">1023&nbsp;&nbsp;&nbsp;
Line 45: Line 65:
  
 
</table>
 
</table>
Both the decimal and the binary numbers really have an infinite number of leading zeros before them, but we can still maintain a unique sequence of digits to represent every integer even if we omit all leading zeros as we have in this table. The point is that this rule means that every single number except for 0 always ends (when reading from right to left) with a non-zero digit. While this may not be of much interest for most number systems, you will notice that with binary it means that every number except zero ends with a 1 - and if we forget about the two first items in the list, then we could drop the last bit altogether. As can be seen from the ''Nodal Path'' column in the list, that's exactly what we do in path traversal. This can't usually work with normal numeric data types because ''0'' and ''00'' evaluate to the same, but a ''sequence'' of binary digits is more like a string comparison of ''"0"'' and ''"00"''.
+
}}
 +
 
 +
Both the decimal and the binary numbers really have an infinite number of leading zeros before them, but we can still maintain a unique sequence of digits to represent every integer even if we omit all leading zeros as we have in this table. The point is that this rule means that every single number except for 0 always ends with a non-zero digit (numbers are read from right to left). While this may not be of much interest for most number systems, you will notice that with binary it means that every number except zero ends with a 1 - and if we forget about the two first items in the list, then we could drop the last bit altogether. As can be seen from the ''Nodal Path'' column in the list, that's exactly what we do in path traversal. This can't usually work with normal numeric data types because ''0'' and ''00'' evaluate to the same, but a ''sequence'' of binary digits is more like a string comparison of ''"0"'' and ''"00"''.
 +
 
 +
In the actual implementations of the ''listTraverse'' function, a multiplication by three must also occur to account for the fact that [[list item]]s are groups of three [[node references]]. For example here's the Binary the function shown using the C language.
 +
{{:LT.c}}
 +
 
 +
== Zero length traversal ==
 +
In the table above showing the conversion of integer node references into binary paths, we see that the first two indexes, 0 and 1 both translate to a null path which involves no traversal at all, and then index 2 and 3 translate to the paths of "0" and "1" which involve a single iteration of traversal. In the actual implementations of the ''listTraverse'' function an addition of 1 occurs before translation to a binary sequence so that only index 0 results in no traversal, and indexes 1 and 2 result in the two single iteration traversals.
 +
 
 +
This ''zero length traversal'' is still valid because the subject node from which traversal would start has a value just like every other [[node]] or [[list item]]. If a non-zero value is returned from ''nodeGetValue'' when passed a zero key, the value is the [[focus]] of the subject node's [[loop]].
 +
 
 +
== Eastern philosophy ==
 +
As mentioned above, the meaning of the ''root'' node (the node at address 0) should be reserved for the meaning that is most commonly traversed, then the two single-step traversals reserved for the next most popular. The most popular traversal is the one that goes withinward, called ''loop'' or ''within'', and the next two most popular are ''next'' and ''previous''.
 +
 
 +
Another more philosophical reason to make the first node be the movement ''withinward'' is that if the size of the space is reduced to just a single item having address zero, then the nodal reduction process still makes sense. In such a space, references occupy no bits (a space of size two has single-bit-references), and the single only reference refers to itself.
 +
 
 +
The first thing nodal reduction does is start at node 0 and go withinward by following the zero-length traversal, this is set to zero (itself) which is an infinite process. This fits perfectly with the eastern traditions of [[Taoism]], Zen and Advaita in which the ultimate source of everything is nothing and exhibits the attribute of self-reference or awareness-of-awareness.
 +
 
 +
== Notes ==
 +
*It may be possible to make traversal work as a [[w:Patricia trie|Patricia trie]] which combines all path segments that have only one direction
  
So that's how the binary conversion from a decimal index goes, but there's one more inefficiency to remove; every item in list-space is composed of three list-item-keys (which are each integer indexes of other items) and each index points to the middle of the three. So the index of the root item is 1, the next item is 4, then 7, 10, 13... so as you can see, a division by three can be done to ensure that we're not missing out any binary paths. Actually we subtract one, then divide by three giving us a proper zero-based integer range of binary paths.
+
== See also ==
*'''See Also:''' [[Wikipedia:Trie]]
+
*[[Wikipedia:Trie]]
 +
*[[w:Patricia trie|Patricia trie]]
 +
*[https://blog.datopia.io/2018/11/03/hitchhiker-tree/ The Hitchhiker tree]
 +
*[[Wikipedia:Linked list]]
 +
*[[FFT]]
 +
*[[w:Chinese remainder theorem|Chinese remainder theorem]] ''- Some weird theorem to do with modulo arithmetic closely related to multiplexing from 300AD''
 +
[[Category:Nodal Concepts]]__NOTOC__

Latest revision as of 14:18, 4 December 2019

Legacy.svg Legacy: This article describes a concept that has been superseded in the course of ongoing development on the Organic Design wiki. Please do not develop this any further or base work on this concept, this is only useful for a historic record of work done. You may find a link to the currently used concept or function in this article, if not you can contact the author to find out what has taken the place of this legacy item.

Binary traversal is the algorithm that the nodal model uses to give a simple list space functionality of an associative array. It does it by mapping the binary digits of an list-item's binary address to a traversal path between list-items in the space, effectively forming a bitwise trie data structure.

Although this process might sound slow, it is very cache-local and highly parallelisable due to the lack of register dependencies and therefore in fact performs excellently on modern out-of-order execution CPUs. A red-black tree for example performs much better on paper, but is highly cache-unfriendly on modern CPUs which makes that algorithm bound by memory latency rather than CPU speed.

In comparison, a bitwise trie rarely accesses memory and when it does it does so only to read, thus avoiding SMP cache coherency overhead, and hence is becoming increasingly the algorithm of choice for code which does a lot of insertions and deletions such as memory allocators (e.g. recent versions of the famous Doug Lea's allocator (dlmalloc) and its descendents).

Making a unique bit sequence from a node index

The nodes of a node space are all identified by an integer index which starts at 0 for the first element created (the root node), and 1 for the next and so on. All integers can be represented as a unique binary sequence which allows them to all exhibit an address based on a unique path, this kind of data structure is called trie. Here is a table showing some integers and their binary and path representations:

Decimal   Binary   Nodal Path
0    0    -
1    1    -
2    10    0
3    11    1
4    100    0-0
⋮    ⋮   
100    1100100    0-0-1-0-0-1
⋮    ⋮   
1000    1111101000    0-0-0-1-0-1-1-1-1
⋮    ⋮   
1023    1111111111    1-1-1-1-1-1-1-1-1
1024    10000000000    0-0-0-0-0-0-0-0-0-0
1025    10000000001    1-0-0-0-0-0-0-0-0-0

Both the decimal and the binary numbers really have an infinite number of leading zeros before them, but we can still maintain a unique sequence of digits to represent every integer even if we omit all leading zeros as we have in this table. The point is that this rule means that every single number except for 0 always ends with a non-zero digit (numbers are read from right to left). While this may not be of much interest for most number systems, you will notice that with binary it means that every number except zero ends with a 1 - and if we forget about the two first items in the list, then we could drop the last bit altogether. As can be seen from the Nodal Path column in the list, that's exactly what we do in path traversal. This can't usually work with normal numeric data types because 0 and 00 evaluate to the same, but a sequence of binary digits is more like a string comparison of "0" and "00".

In the actual implementations of the listTraverse function, a multiplication by three must also occur to account for the fact that list items are groups of three node references. For example here's the Binary the function shown using the C language.

// Start at subject listItem and traverse the key as an association to a new listItem
// - key is also a listItem reference and its binary address is used as the traversal path
// - subject and key (all list-item references are ints starting at item 0)
// - if key is 0 the subject is returned unchanged, key 1 and 2 are the two single-iteration traversals
item listTraverse(item subject, item key) {
	if(key++ > 0) {
		int i,j;
		for(i = 1; i <= key >> 1; i <<= 1) {
			j = subject * 3;
			if(key & i) j++;
			if(space[j] == 0) space[j] = listInsert();
			subject = space[j];
		}
	}
	return subject;
}

Zero length traversal

In the table above showing the conversion of integer node references into binary paths, we see that the first two indexes, 0 and 1 both translate to a null path which involves no traversal at all, and then index 2 and 3 translate to the paths of "0" and "1" which involve a single iteration of traversal. In the actual implementations of the listTraverse function an addition of 1 occurs before translation to a binary sequence so that only index 0 results in no traversal, and indexes 1 and 2 result in the two single iteration traversals.

This zero length traversal is still valid because the subject node from which traversal would start has a value just like every other node or list item. If a non-zero value is returned from nodeGetValue when passed a zero key, the value is the focus of the subject node's loop.

Eastern philosophy

As mentioned above, the meaning of the root node (the node at address 0) should be reserved for the meaning that is most commonly traversed, then the two single-step traversals reserved for the next most popular. The most popular traversal is the one that goes withinward, called loop or within, and the next two most popular are next and previous.

Another more philosophical reason to make the first node be the movement withinward is that if the size of the space is reduced to just a single item having address zero, then the nodal reduction process still makes sense. In such a space, references occupy no bits (a space of size two has single-bit-references), and the single only reference refers to itself.

The first thing nodal reduction does is start at node 0 and go withinward by following the zero-length traversal, this is set to zero (itself) which is an infinite process. This fits perfectly with the eastern traditions of Taoism, Zen and Advaita in which the ultimate source of everything is nothing and exhibits the attribute of self-reference or awareness-of-awareness.

Notes

  • It may be possible to make traversal work as a Patricia trie which combines all path segments that have only one direction

See also