Organic Design AI
This article is about how we at Organic Design intend to use Artificial Intelligence in relation to implementing the Holarchy concept.
Holarchy is the foundation ontology within the LangChain schema, so the agents are GATO compliant and operate within the context of the unified holarchy as their worldview and universe of discourse.
(Old notes follow)
The project we're currently basing our AI on is Open Assistant (by LAION, the Large-scale Artificial Intelligence Open Network), a 100% open source LLM chat AI. Some of the plans I'll talk about in this section are beyond the current capabilities of Open Assistant, but these use cases are well within the ballpark of GPT-4's current (mid 2023) capabilities, and we're quite certain that a number of libre AI options will match this level of capability within a year or so.
Now that we have a fully open source, trustable AI available, we can begin using it in our own organisation, giving it access to all the organisation's historical information, internal communications and activity stream. This will allow the assistant to actually assist in real day-to-day ways, such as helping with documentation and reports, writing blog posts, notifying us about out-of-date content and ensuring that new information is linked to from relevant contexts, to name a few.
Having a version of our AI running on local LANs and even devices is important for us, because local assistants can be trusted with personal data, and we can rest assured that private data never leaves the device. The more external a service is, the more its content must be limited to aggregate data and statistics.
One of the most important aspects of what we need AI for in our organisation is representation maintenance (discussed in more detail in the holarchy section below). We can enforce that this is done by code written and maintained by the AI agent, rather than by the agent processing the data directly (it needs to work this way anyway so that AI query load is minimised). The classification aspect is aggregate anyway, but it too should be handled through code that the AI maintains, never directly.
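The pattern above — the AI maintaining classification code rather than touching the data itself — could be sketched roughly as follows. All function names, rules and sample items here are hypothetical illustrations, not part of any existing system:

```python
# Sketch of the "AI maintains the code, not the data" pattern.
# The agent rewrites the rules in this function when misclassifications
# are reported; it never sees the individual items flowing through it.

def classify_activity(item: dict) -> list[str]:
    """Rule-based tagger maintained by the AI agent."""
    tags = []
    text = item.get("text", "").lower()
    if "meeting" in text:
        tags.append("meetings")
    if "deploy" in text or "server" in text:
        tags.append("infrastructure")
    return tags or ["unclassified"]

# The activity stream is then processed locally, with zero AI queries
# per item — the AI query load is limited to maintaining the code.
stream = [
    {"text": "Weekly meeting notes uploaded"},
    {"text": "Deployed new server config"},
]
tagged = [(item["text"], classify_activity(item)) for item in stream]
```

The design choice being illustrated: per-item AI calls are replaced by a one-off code-maintenance loop, which is both cheaper and keeps the data local.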
We can start by using a central server, while building the pipelines in preparation for distribution of AI agents (which will work just like changing AI models in any context: the agent is simply seen as another specialist). Each context can have any number of AI agents of various models and roles, some external, some local. One aspect of this connection job is to try to ensure the context is able to continue without AI...
We may also want to use a remote AI like ChatGPT for some things that are too difficult for Open Assistant, such as some connector code. But even when no AI can write working code, it will at least have produced decent, well-commented, in-context boilerplate and expressed demand for human developer attention.
This is just the idea that AIs are themselves instantiatable API endpoints. It just so happens that this particular kind of API opens up access to many more connectables... but also many pre-instantiated connections could come pre-packaged, and AIs implement new connections such that they do not require an ongoing AI presence in order to keep working (though without AI they won't be serviced).
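One way to sketch this "AI as just another instantiatable endpoint" idea is a common interface that local models, remote APIs and plain non-AI code all satisfy. The names below are illustrative, not an existing API:

```python
from typing import Protocol

class Specialist(Protocol):
    """Any agent — local model, remote API, or plain code —
    exposes the same call surface, so swapping models in a
    context is just swapping instances."""
    def respond(self, prompt: str) -> str: ...

class RemoteModel:
    """Hypothetical wrapper around a hosted chat API."""
    def __init__(self, endpoint: str):
        self.endpoint = endpoint
    def respond(self, prompt: str) -> str:
        raise NotImplementedError("network call elided in this sketch")

class EchoSpecialist:
    """Trivial stand-in showing that non-AI code satisfies the same
    interface — the context can continue operating without AI."""
    def respond(self, prompt: str) -> str:
        return prompt.upper()

def handle(agent: Specialist, prompt: str) -> str:
    # The context doesn't care which specialist is behind the call.
    return agent.respond(prompt)
```

For example, `handle(EchoSpecialist(), "hi")` and `handle(RemoteModel("https://example/api"), "hi")` are invoked identically from the context's point of view.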
Our primary aim is to make OD into a unified AI enabled organisation in the form of a holarchy. Remember that the holarchy starts as an ontology of the resource types, languages and their instances used throughout the network.
This is essentially a normal AI-powered distributed application model at the most general level, but done in a very AI-agnostic way so that we can stay flexible. The key point here is that we're optimising our system in the context of OD being an AI user, not an AI software or model developer.
What that means is that our AI will be used to maintain and interact with live representations of the main organisational information, entities and users making up our system; to maintain history and an evolving ecosystem of knowledge, tools and interfaces upon those representations; and to maintain up-to-date versions of the underlying AI models and extensions while also maintaining the integrity of its own experiential history (history of conversations and prompt-structure context).
- long term memory specialising in all things OD and holarchy
- maximal connection into and abstraction of our informational representation
- auto classification ("tagging") of our structure
- to manage OD's threads of operation productively and with clarity
- to expand to more threads, such as news, transcribing and research
- introducing a common way of organising based on the multiplexing concept
GPU server costs
Currently, running an Open Assistant instance that can serve a single user has the following minimum requirements:
- Any industry standard CPU is fine
- 32GB RAM
- 500GB fast SSD storage
- A single GPU with 80GB of VRAM
As of mid 2023, we may have to settle for only 48GB or even 24GB of VRAM, which means we won't be able to process as many tokens. But it should only be a matter of a few months before 80GB comes into our price range, the Open Assistant software is updated to spread processing across multiple GPUs, or new, more efficient models become available that perform well under tighter resource constraints.
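As a rough sketch of the arithmetic behind these VRAM figures (the bytes-per-parameter and overhead numbers below are assumptions for illustration, not measured values):

```python
def vram_estimate_gb(params_billion: float,
                     bytes_per_param: float = 2.0,
                     overhead: float = 1.2) -> float:
    """Back-of-envelope VRAM estimate: weights only (fp16 ~= 2
    bytes/param), times an assumed 1.2x factor for activations
    and KV cache."""
    return params_billion * bytes_per_param * overhead

# A 30B-parameter model in fp16 lands around 72 GB — close to the
# 80 GB card above — while 4-bit quantisation (~0.5 bytes/param)
# brings it to around 18 GB, within reach of a 24 GB card.
print(round(vram_estimate_gb(30), 1))        # 72.0
print(round(vram_estimate_gb(30, 0.5), 1))   # 18.0
```

This is why both quantised models and multi-GPU sharding are plausible routes to running under the tighter VRAM budgets mentioned above.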
Constantly falling costs
The cost of compute power is sure to keep falling, as there are many new technologies coming to market and in development, such as cheaper ways of producing current chips and much more powerful new kinds of chips coming onto the scene.
There are also constant efficiency gains being realised at the software level: models with more parameters can be built with less computational overhead, models are being made more effective with fewer parameters, and more tokens can be generated with less memory. The Vicuna chatbot is a great example of this.
Efficient scaling of GPU resource
At some point we may want to scale up our operation so that it can serve more concurrent connections, handle more different tasks or serve more clients.
The most cost-efficient way to scale is to design our compute infrastructure in such a way that the vast majority of our GPUs can be spontaneously removed with only 30 seconds' notice (enough time to finish the current "atomic" workload). GPU resource covered by this kind of SLA has an extremely reduced cost, often by over 90%.
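A minimal sketch of a worker honouring that kind of preemption notice, assuming the scheduler signals eviction via SIGTERM (common for spot/preemptible instances, though the exact mechanism varies by provider):

```python
import signal

class PreemptibleWorker:
    """Finishes the current atomic workload on preemption notice,
    then stops taking new work. Job structure is illustrative."""

    def __init__(self):
        self.draining = False
        # Assumed: the scheduler delivers SIGTERM ~30s before
        # reclaiming the instance.
        signal.signal(signal.SIGTERM, self._on_preempt)

    def _on_preempt(self, signum, frame):
        self.draining = True  # finish current job, take no more

    def run(self, jobs):
        done = []
        for job in jobs:
            if self.draining:
                break           # remaining jobs get requeued elsewhere
            done.append(job())  # each job must fit the notice window
        return done
```

The key design constraint is that every workload is kept "atomic" and short enough to complete inside the notice window; anything longer must checkpoint.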
Our technical AI infrastructure requirements
All the following items are things we need to find fully open source solutions for that we can run independently on our own server, packaged up for easy installation on new servers, and in a modular way so we can swap components and models easily. Eventually all of this should be able to run on local LANs, and even on local devices one day.
- GPT-3.5 level chat API running on our own server
- Note that our own server's privacy is not needed for representation maintenance, because the AI maintains the implementation, not the data. Privacy is required for the analysis and categorisation of local data, such as transcribing meetings and classifying activity.
- Matrix bot interface
- Persistent long-term memory and learning independent of the underlying model (can be built out on top of new models)
- e.g. see Chroma, Weaviate and the Vicuna chatbot, which seems to do everything we need, including being able to query any of a number of active models.
- Our foundation ontology as an overall automation and organisational context
- Extending an AutoGPT type environment
- Incorporating other connectors like Zapier
- Speech-to-text and text-to-speech running independently on our own server
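As an illustration of the model-independent long-term memory item above, here is a minimal in-memory sketch of what a vector store like Chroma or Weaviate provides at scale: store (embedding, text) pairs and retrieve by cosine similarity. `embed` is a stand-in for whichever embedding model is current, which is what makes the memory independent of the underlying chat model:

```python
import math

class VectorMemory:
    """Toy vector store: persistent memory lives in the (vector,
    text) pairs, so the chat model behind it can be swapped."""

    def __init__(self, embed):
        self.embed = embed   # any text -> list[float] function
        self.items = []      # list of (vector, text) pairs

    def add(self, text: str) -> None:
        self.items.append((self.embed(text), text))

    def query(self, text: str, k: int = 1) -> list[str]:
        q = self.embed(text)

        def cos(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            norm = (math.sqrt(sum(x * x for x in a))
                    * math.sqrt(sum(y * y for y in b)))
            return dot / norm if norm else 0.0

        ranked = sorted(self.items, key=lambda it: cos(q, it[0]),
                        reverse=True)
        return [t for _, t in ranked[:k]]
```

A real deployment would replace this class with a Chroma or Weaviate client, but the interface — add text, query by similarity — is the same shape.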
The first thing the prompt structure needs to do is describe the general nature of the closures it will find itself within:
- how does it use its extensions to obtain its holarchy "CWD"
- how does it establish its workload from the context
- how does it put the execution environments it has access to into the context
- how to use instances in the context (they have descriptions like any API)
- how to make workflows in the context (new instances)
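A rough sketch of how those closure descriptions might be assembled into a context prompt. All field names and values here are hypothetical, chosen only to mirror the list above:

```python
def build_context_prompt(ctx: dict) -> str:
    """Render a context description covering the closure questions:
    holarchy CWD, workload, execution environments, and the
    instances (APIs with descriptions) available in the context."""
    lines = [
        f"Holarchy CWD: {ctx['cwd']}",
        f"Workload: {ctx['workload']}",
        "Execution environments: " + ", ".join(ctx["environments"]),
        "Available instances:",
    ]
    for name, desc in ctx["instances"].items():
        lines.append(f"  - {name}: {desc}")
    return "\n".join(lines)

prompt = build_context_prompt({
    "cwd": "/organic-design/ai/notes",
    "workload": "keep documentation pages up to date",
    "environments": ["python", "shell"],
    "instances": {"wiki": "read/write wiki pages (REST-like API)"},
})
```

New workflows would then appear to the agent simply as additional entries under the instances section of this structure.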