A Theory Of Meta-Intelligence

Ron Itelman
Feb 8, 2022 · 24 min read


The continuum of states that are ‘good’ or ‘bad’ relative to objectives is defined using concepts from psychometric frameworks. Force = mass × acceleration, and we can measure intelligent systems (human, machine, and cybernetic) as the force moving a system from one state to another along that continuum. This reward-and-state system must be designed, and the benefit is that it creates a quantitative way to define outcomes relative to objectives through a state machine paradigm. Mass can be defined as the number of bits in a state space, and energy as the number of bits changed between two states at two points in time. Therefore, we can change the number of bits changed relative to reward, or we can change the acceleration (or both), to increase the force in our generalizable and universal definition of intelligence. This is a similar approach to dynamic programming, where we hierarchically decompose states into smaller components.

Working in strategy for analytics, ML, and human learning products, I’ve been asking: “How do we apply principles from intelligence to creating intelligent systems?”

This theory, named “Meta-Intelligence,” is a hypothesis for how to design and measure the intelligence of human+machine learning network systems.

In doing so, a unique paradigm emerges that unites business strategy, experience design, and software development perspectives into an organizational-scale intelligence.

This isn’t perfect. I am neither a physicist nor a data scientist, and I imagine I will be working on this theory for the rest of my life because it is an incredible field to study. However, it is grounded in first principles from working with some amazing learning scientists (who focus on designing measurable human learning frameworks), computational psychometricians (who focus on creating statistical models to predict how people think, what they need to learn, and how to assess them), and ML engineers (who focus on creating predictive learning models for personalized and adaptive experiences).

The reason this field is so incredibly exciting to me is that it proposes a way to create a singular intelligence system for an entire organization, and even for networks of organizations, as a holistic entity. I believe this is the paradigm of the future, with the potential to change the world, hopefully for the good.

My background is in product management, product strategy, and now data strategy, helping organizations think through how to leverage their data to drive outcomes. I’m currently focused in the Life Sciences, and my deep hope is that we can use this theory to help life science companies exponentially improve patient health.


We view intelligence as similar to a force, f=ma. In other words, given a system’s state space (mass), we measure the rate of change in entropy reduction (acceleration) toward states conceptually defined as maximizing reward. Goal setting and reward mechanisms from Reinforcement Learning will be covered later in this paper.

Using this perspective, we can derive metrics for this measure and define intelligence simply. If “System 1” can learn to solve problems and achieve goals at a greater acceleration than “System 2,” then this paradigm says “System 1” is more intelligent. This allows the system an infinite number of tasks or goals to achieve, and the domain is irrelevant. You could define the measure of intelligence as a comparison between two systems on one task, or as a comparison between two systems learning as many new tasks as possible.
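As a toy sketch of this comparison (all numbers invented), we can approximate ‘acceleration’ as the second difference of each system’s entropy series and call the system whose entropy reduction accelerates faster the more intelligent one:

```python
# Hypothetical sketch: "intelligence as acceleration of entropy reduction".
# Given each system's entropy measured at fixed time steps, the second
# difference approximates acceleration; the result is negated so that
# faster entropy *reduction* scores higher.

def entropy_acceleration(entropy_over_time):
    """Mean second difference of an entropy series, negated."""
    e = entropy_over_time
    second_diffs = [e[i + 2] - 2 * e[i + 1] + e[i] for i in range(len(e) - 2)]
    return -sum(second_diffs) / len(second_diffs)

# System 1 reduces entropy at an increasing rate; System 2 at a constant rate.
system_1 = [100, 95, 85, 70, 50]   # accelerating reduction
system_2 = [100, 90, 80, 70, 60]   # constant-rate reduction

assert entropy_acceleration(system_1) > entropy_acceleration(system_2)
```

Under this sketch, System 1 would be called more intelligent even though both end at similar entropy levels, because its rate of improvement is itself improving.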

The idea of using a physics-based approach in ML is not new, and we will follow RL principles as well in this theory. Energy-based models are a way to deconstruct problems into data representations that can be used in Markov Decision Processes to make predictions:

“An energy-based model (EBM) is a form of generative model (GM) imported directly from statistical physics to learning.”


Direction is the sign of the change in velocity of a system’s state-space entropy (image below). If the system ‘accelerates forward,’ the rate at which entropy is reduced increases over time; if it ‘accelerates backward,’ the rate at which entropy grows increases over time:

E stands for Entropy, and T stands for time. Entropy is reducing over time in this example in an accelerating forward curve.

Mass is the number of bits in a state space that are used to construct the concept representations used by an entity. For example, if ‘perfection’ is defined as a single bit in the state ‘1’, then that state space has a mass of 1. If the state is ‘0’, the mass of that state space is still 1; it is just not optimized for maximum reward. If the state space has 5 bits of information (_ _ _ _ _, for example), then its mass is 5 bits. We think of a bit as a kind of Planck-length unit of information.

If organization A has 1 billion bits in its state space to describe 10,000 business concepts and processes, and organization B has 1 trillion bits to describe exactly the same concepts and processes, then B has 1,000x the mass.
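A minimal sketch of this mass arithmetic, under the simplifying assumption that every concept is encoded with a fixed number of bits:

```python
# Hypothetical sketch: "mass" as the total number of bits used to represent
# the same set of business concepts. Bits-per-concept values are invented
# to reproduce the 1 billion vs 1 trillion comparison above.

def state_space_mass(num_concepts, bits_per_concept):
    return num_concepts * bits_per_concept

org_a = state_space_mass(10_000, 100_000)      # 1 billion bits total
org_b = state_space_mass(10_000, 100_000_000)  # 1 trillion bits total

assert org_a == 1_000_000_000
assert org_b == 1000 * org_a  # B carries 1,000x the mass for the same concepts
```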

This has implications for the granularity and representation of information, for the number of parameters used to describe concepts, and for the predictive capabilities that result from the mass of conceptual representations.

Example of how an energy-model approach is applied to learning in action spaces that change (which differentiates fixed systems, like a chess game, from dynamic complex systems, like a business, where rules and possible actions can shift):

Learning in continuous action space for developing high dimensional potential energy models

Nature Communications volume 13, Article number: 368 (2022)

Reinforcement learning (RL) approaches that combine a tree search with deep learning have found remarkable success in searching exorbitantly large, albeit discrete action spaces, as in chess, Shogi and Go. Many real-world materials discovery and design applications, however, involve multi-dimensional search problems and learning domains that have continuous action spaces. Exploring high-dimensional potential energy models of materials is an example. Traditionally, these searches are time consuming (often several years for a single bulk system) and driven by human intuition and/or expertise and more recently by global/local optimization searches that have issues with convergence and/or do not scale well with the search dimensionality. Here, in a departure from discrete action and other gradient-based approaches, we introduce a RL strategy based on decision trees that incorporates modified rewards for improved exploration, efficient sampling during playouts and a “window scaling scheme” for enhanced exploitation, to enable efficient and scalable search for continuous action space problems. Using high-dimensional artificial landscapes and control RL problems, we successfully benchmark our approach against popular global optimization schemes and state of the art policy gradient methods, respectively. We demonstrate its efficacy to parameterize potential models (physics based and high-dimensional neural networks) for 54 different elemental systems across the periodic table as well as alloys. We analyze error trends across different elements in the latent space and trace their origin to elemental structural diversity and the smoothness of the element energy surface. Broadly, our RL strategy will be applicable to many other physical science problems involving search over continuous action spaces.

Energy is the number of bits changed between two points in time. In other words, if 100 bits of information in a system flip from 0 to 1, or from 1 to 0, then the entity has an energy of 100 bits. An entity might have a mass of 500 bits of which only 5 bits changed, or perhaps all 500 bits changed. There is therefore a relationship between energy and mass in a state space between any two points in time.
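This bit-change notion of energy is effectively a Hamming distance between two snapshots of the same state space; a sketch with invented states:

```python
# Sketch of "energy" as the number of bits flipped between two snapshots
# of the same state space — a Hamming distance. The bit strings here are
# hypothetical examples.

def energy(state_t0: str, state_t1: str) -> int:
    """Count bit positions that changed between two equal-length bit strings."""
    assert len(state_t0) == len(state_t1)
    return sum(a != b for a, b in zip(state_t0, state_t1))

before = "00000"   # a 5-bit mass
after_ = "10010"

assert energy(before, after_) == 2   # 2 bits of energy within a 5-bit mass
```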

If the energy of a business process is a change in all 500 bits of information, then in effect this is something like ‘gravity’: an exerted influence. If the energy is only 1 bit of information, that is very little ‘gravity’.

If there are 500 different agents in a system with a total mass of 1 billion bits of possible state space, and one agent has a potential energy of 1,000 bits of information while another has a potential energy of 10 bits, then the first agent has more influence over the system and more impact on the other agents within it.

There are many books on entropy, and many kinds of entropy, so this is a simplistic treatment. We define entropy as the amount of uncertainty about micro-states within a macro-state.

“the system evolves toward the macrostate with the greatest number of microstates”

The Macro and Micro of it Is that Entropy Is the Spread of Energy

or a more physics-based explanation:

Ludwig Boltzmann defined entropy as a measure of the number of possible microscopic states (microstates) of a system in thermodynamic equilibrium, consistent with its macroscopic thermodynamic properties, which constitute the macrostate of the system

The short version: imagine throwing a handful of marbles on the ground. It is most likely that the marbles will randomly spread out across the floor (disorder); this is high entropy. If instead the marbles all landed perfectly packed together in a tight grid formation, that would be very low entropy (order). The disordered version has a much higher probability: the micro-states of marble positions are far more random (uncertain). The highly ordered version has a very low probability of happening, because to get that kind of formation you would have to hand-position the marbles; it wouldn’t happen in nature. For something like a gas cloud, the molecules (micro-state) spread out randomly across a room (the macro-state), and if all the molecules are spread out mostly evenly, that is maximum equilibrium. This is the opposite of all the molecules being tightly packed in a corner and staying there (like the marbles).
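One way to put numbers on the marble intuition is Shannon entropy over how the marbles distribute across floor regions (a simplification of the microstate-counting definition; the counts are invented):

```python
import math

# Sketch of the marble intuition: Shannon entropy of how marbles are
# distributed across floor regions. Evenly spread marbles (many equally
# likely micro-positions) score high; all marbles packed into one region
# score zero.

def shannon_entropy(counts):
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    return -sum(p * math.log2(p) for p in probs)

spread_out = [4, 4, 4, 4]   # 16 marbles spread across 4 regions: disorder
packed     = [16, 0, 0, 0]  # all 16 marbles in one region: order

assert shannon_entropy(spread_out) == 2.0  # log2(4) bits of uncertainty
assert shannon_entropy(packed) == 0.0      # no uncertainty at all
```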

This tendency is a universal law of information from a physics perspective, and we view the bits of information which make up knowledge in an organization as being distributed in highly ordered or random positions.

In order to make predictions with confidence, we need to minimize the entropy of all variables in the algorithm. When discussing systems in infinite games like organizations, there is a lot of entropy to deal with. This is different from a game of chess, which has a fixed set of rules, a clear winner and loser, and fully defined actions and possible states.

When we discuss ‘direction’ in our f=ma approach to intelligence as a type of force, what we are trying to do is have a linear measure of entropy in a system that can be graphed over time. To make a linear measure, we need to abstract a starting state, a continuum of more desirable states (higher reward levels), and an end optimal state. This can be done by abstracting objectives (business or otherwise) into a state machine model.
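A hedged sketch of such a state machine, with invented states, actions, and reward levels, where each transition moves the system along the reward continuum:

```python
# Hypothetical sketch: a business objective abstracted as a state machine
# whose states sit on a reward continuum, so progress can be graphed
# linearly over time. All states, actions, and rewards are invented.

REWARD = {"lead": 0, "qualified": 1, "proposal": 2, "closed": 3}
TRANSITIONS = {
    ("lead", "qualify"): "qualified",
    ("qualified", "send_proposal"): "proposal",
    ("proposal", "sign"): "closed",
}

def step(state, action):
    # Unknown (state, action) pairs leave the state unchanged.
    return TRANSITIONS.get((state, action), state)

state, trajectory = "lead", []
for action in ["qualify", "send_proposal", "sign"]:
    state = step(state, action)
    trajectory.append(REWARD[state])

assert trajectory == [1, 2, 3]  # monotonically rising reward toward the optimal state
```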

Psychometrics + Signal Propagation In Networks

Things like “reward”, “objectives”, and “success” are really human concepts. In nature, we have objectives like “eat” because if we don’t, we die. The desire to live necessitates avoiding pain and the damage it signals from harmful events. The brain manages hormones that are triggered by events biologically coded into us. We develop psychological constructs and models of the world in order to move through it, achieve our objectives, and maximize our reward (which is inside of us).

We don’t consider rocks intelligent. If a rock gets smashed, there’s no “IT” to care; “IT” has no objective. So the human definition of intelligence is biased by human experience: reward is contextual to an objective, not a random process of the environment in which an agent exists (biological entities live in an environment whose rules drive pain, pleasure, life, death, and the energy needed to survive). If we say a virus is intelligent, it is because we are projecting onto the virus our biologically driven objective of replicating and continuing to live. I doubt a virus has any notion that it is alive and desires to spread; rather, it is a mechanistic biological process functioning at the cellular and molecular levels. Viruses are intelligently designed in the sense of their incredible efficiency at propagating and the massive influence they exert. But I wouldn’t call a virus intelligent. It is our lens of intelligence that we are layering over the concept of a virus.

There are arguments in computer science between psychometric and non-psychometric approaches. After reviewing multiple papers (one of the best, which proposes an approach decoupled from psychometrics, is recommended below), we define intelligence as a conceptual construct involving ‘objectives’ and ‘rewards’, which can be driven by business, computational, or biological functions. We agree with Chollet’s approach and theories in general. One impression from reading his paper on assessments of intelligence is that tasks can be reduced to a kind of “QR code” and made completely generalizable from anything else, which we will explore later in this paper as well.

On the Measure of Intelligence
-François Chollet

To make deliberate progress towards more intelligent and more human-like artificial systems, we need to be following an appropriate feedback signal: we need to be able to define and evaluate intelligence in a way that enables comparisons between two systems, as well as comparisons with humans. Over the past hundred years, there has been an abundance of attempts to define and measure intelligence, across both the fields of psychology and AI. We summarize and critically assess these definitions and evaluation approaches, while making apparent the two historical conceptions of intelligence that have implicitly guided them. We note that in practice, the contemporary AI community still gravitates towards benchmarking intelligence by comparing the skill exhibited by AIs and humans at specific tasks such as board games and video games. We argue that solely measuring skill at any given task falls short of measuring intelligence, because skill is heavily modulated by prior knowledge and experience: unlimited priors or unlimited training data allow experimenters to “buy” arbitrary levels of skills for a system, in a way that masks the system’s own generalization power. We then articulate a new formal definition of intelligence based on Algorithmic Information Theory, describing intelligence as skill-acquisition efficiency and highlighting the concepts of scope, generalization difficulty, priors, and experience. Using this definition, we propose a set of guidelines for what a general AI benchmark should look like. Finally, we present a benchmark closely following these guidelines, the Abstraction and Reasoning Corpus (ARC), built upon an explicit set of priors designed to be as close as possible to innate human priors. We argue that ARC can be used to measure a human-like form of general fluid intelligence and that it enables fair general intelligence comparisons between AI systems and humans.

The final part of our theory relates to wrapping the human concept of intelligence into a computational framework that is generalizable to humans and machines.


The concept of “memes” comes from the idea of treating ideas as genes that spread among communities like viruses. Societies and cultures are abstract concepts that share stories and concepts.

If a human is at the mezzo-scale and societies are at the macro-scale, then concepts are at a ‘mega-scale’. There is no tangible physical object called ‘good’ or ‘ideas’, yet at the same time we can computationally represent them, and they definitely are communicated across societies and have major impacts on life. For example, the concept of ‘revenge’ has driven war and death, with massive localized impact. Concepts permeate everything an organization does, and because of ML we have begun to think about how to organize concepts in a way that machines can learn from, to create recommendation engines for what movie to see, or what document to read, in order to achieve goals.

If we reduce concepts to a “QR code” and map them to localized organizational tasks, we have created a type of field: a bridge between the conceptual world and the physical world. We could measure the impact of concepts, and the probabilities related to concepts in systems at certain states in certain localities.
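As an illustrative stand-in only (a real system would learn these codes; hashing is just a placeholder), a concept can be reduced to a fixed-width bit string that tasks and states can reference:

```python
import hashlib

# Hypothetical sketch of reducing a concept to a fixed-width bit string
# (a "QR code" of sorts) so it can be mapped to organizational tasks and
# measured. Hashing is only a stand-in for a learned encoding.

def concept_code(concept: str, bits: int = 16) -> str:
    digest = hashlib.sha256(concept.encode()).digest()
    value = int.from_bytes(digest[:4], "big")
    return format(value % (1 << bits), f"0{bits}b")

code = concept_code("revenge")
assert len(code) == 16 and set(code) <= {"0", "1"}

# The same concept always maps to the same code, so it can be referenced
# consistently across systems and localities.
assert concept_code("revenge") == code
```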

The most interesting paper I have attempted to read and understand comes with a video by the authors, which introduces how concepts can be communicated over cell phones and can have many representations pointing to the same thing. I recommend their work:

Constructor Theory of Information
-David Deutsch, Chiara Marletto

We present a theory of information expressed solely in terms of which transformations of physical systems are possible and which are impossible — i.e. in constructor-theoretic terms. Although it includes conjectured laws of physics that are directly about information, independently of the details of particular physical instantiations, it does not regard information as an a priori mathematical or logical concept, but as something whose nature and properties are determined by the laws of physics alone. It does not suffer from the circularity at the foundations of existing information theory (namely that information and distinguishability are each defined in terms of the other). It explains the relationship between classical and quantum information, and reveals the single, constructor-theoretic property underlying the most distinctive phenomena associated with the latter, including the lack of in-principle distinguishability of some states, the impossibility of cloning, the existence of pairs of variables that cannot simultaneously have sharp values, the fact that measurement processes can be both deterministic and unpredictable, the irreducible perturbation caused by measurement, and entanglement (locally inaccessible information).

One thing became apparent in product strategy and design: different teams in an organization are totally siloed. They use different language and different concepts, and represent those concepts differently in their respective domains and scales. Let’s define the scales as follows: bits are Planck-scale; database structures, nano-scale; code in applications, micro-scale; user experiences, mezzo-scale; the logical business rules and processes set by leaders for the organization, macro-scale; networks of organizations, mega-scale; and the information about a system that connects all of these levels with universal design principles, meta-scale.

When we could get teams to agree on a common schema of information at all levels, tremendous amounts of efficiency were gained. We believe that the more ‘fractal symmetry’ there is in the form of information across multiple scales (humans, machines, business logic, etc.), the more efficient and effective the system becomes. However, if a system like this is not designed well, it can become more efficient at producing negative consequences, which is why it must be done correctly.

If processes, databases, APIs, applications, and user experiences are synchronized so that information flows dynamically and symmetrically throughout the organization over time, there will be far less entropy than if teams operate in silos using different conceptual representations. The easiest way to think of the fractal nature of information is to think of concepts as musical notes and tempo. If people all play in different scales and different tempos, it sounds awful: a chaotic mess. If everyone agrees on the tempo and musical scale in advance, the music becomes harmonious, and resonant events can occur, which have even greater energy and efficiency.

If you can create one representation in one system that is shared across multiple scales and multiple teams, that information compresses the number of systems previously required to achieve the task. This is similar to compressing an image and recreating it with a smaller number of bits.

Designing Systems Using A Meta-Intelligence Theoretical Approach

When trying to transform an organization, we typically have to break its processes down into granular enough units that we can first measure entropy, and then represent the parameters affected by the ‘gravity’ or ‘energy’ of the concepts associated with employees’ tasks.

So if a company wants to develop a customer management system, and the concepts it currently uses involve many inefficient and dependent steps, we would say the system has a lot of complexity relative to representing the state space.

This has ties to Reinforcement Learning as Shane Legg uses it in his paper:

Universal Intelligence: A Definition of Machine Intelligence

Shane Legg, Marcus Hutter

A fundamental problem in artificial intelligence is that nobody really knows what intelligence is. The problem is especially acute when we need to consider artificial systems which are significantly different to humans. In this paper we approach this problem in the following way: We take a number of well known informal definitions of human intelligence that have been given by experts, and extract their essential features. These are then mathematically formalised to produce a general measure of intelligence for arbitrary machines. We believe that this equation formally captures the concept of machine intelligence in the broadest reasonable sense. We then show how this formal definition is related to the theory of universal optimal learning agents. Finally, we survey the many other tests and definitions of intelligence that have been proposed for machines.

Legg uses Kolmogorov complexity to assert, in effect, Occam’s razor:

The Kolmogorov complexity of a binary string x is defined as the length of the shortest program that computes x: K(x) := min_p {l(p) : U(p) = x}, where p is a binary string which we call a program, l(p) is the length of this string in bits, and U is a prefix universal Turing machine called the reference machine. To gain an intuition for how this works, consider a binary string 0000…0 that consists of a trillion 0s. Although this string is very long, it clearly has a simple structure and thus we would expect it to have a low complexity. Indeed this is the case, because we can write a very short program p that simply loops a trillion times, outputting a 0 each time. Similarly, other strings with simple patterns have a low Kolmogorov complexity. On the other hand, if we consider a long irregular random string 111010110000010… then it is much more difficult to find a short program that outputs this string.
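Kolmogorov complexity itself is uncomputable, but compressed size is a standard practical proxy, and it reproduces the intuition in the quote: the all-zeros string compresses to almost nothing, while a random-looking string barely compresses at all. A small sketch (scaled down from a trillion zeros to ten thousand bytes):

```python
import random
import zlib

# Compressed size as a rough proxy for Kolmogorov complexity: a string
# with simple structure admits a much shorter description than an
# irregular random one.

simple = b"0" * 10_000                                   # simple structure
random.seed(42)                                          # deterministic example
irregular = bytes(random.getrandbits(8) for _ in range(10_000))

simple_size = len(zlib.compress(simple))
irregular_size = len(zlib.compress(irregular))

assert simple_size < 100        # a tiny "program" reproduces all the zeros
assert irregular_size > 9_000   # nearly incompressible: no short description
```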

If it is very difficult to create a system to achieve a goal, then the system probably has complexity that can be reduced. If the current system has a lot of entropy, we need to reduce that as well.

We cannot achieve this unless we borrow from dynamic programming: the key is hierarchical decomposition of problems, in a recursive manner, looking for optimal substructure.

Dynamic programming

From Wikipedia, the free encyclopedia

Figure 1. Finding the shortest path in a graph using optimal substructure; a straight line indicates a single edge; a wavy line indicates a shortest path between the two vertices it connects (among other paths, not shown, sharing the same two vertices); the bold line is the overall shortest path from start to goal.

Dynamic programming is both a mathematical optimization method and a computer programming method. The method was developed by Richard Bellman in the 1950s and has found applications in numerous fields, from aerospace engineering to economics.

In both contexts it refers to simplifying a complicated problem by breaking it down into simpler sub-problems in a recursive manner. While some decision problems cannot be taken apart this way, decisions that span several points in time do often break apart recursively. Likewise, in computer science, if a problem can be solved optimally by breaking it into sub-problems and then recursively finding the optimal solutions to the sub-problems, then it is said to have optimal substructure.
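A minimal illustration of optimal substructure, using an invented graph: the shortest path to the goal from any vertex is computed recursively from its neighbors’ shortest paths, with memoization so each sub-problem is solved once:

```python
from functools import lru_cache

# Dynamic programming sketch: shortest path via optimal substructure.
# The graph is invented for illustration.

GRAPH = {                       # vertex -> {neighbor: edge_cost}
    "start": {"a": 2, "b": 5},
    "a": {"b": 1, "goal": 7},
    "b": {"goal": 3},
    "goal": {},
}

@lru_cache(maxsize=None)        # memoize: each sub-problem solved once
def shortest(vertex):
    if vertex == "goal":
        return 0
    return min(cost + shortest(nbr) for nbr, cost in GRAPH[vertex].items())

assert shortest("start") == 6   # start -> a -> b -> goal: 2 + 1 + 3
```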

So we want to break apart complexity, entropy, and the granularity of concept representation, so that we can design concept representations in a way that minimizes their complexity and entropy. Then, moving recursively up our fractal system, those concepts need to be computationally and dynamically assemblable in a way that minimizes entropy and complexity all the way up from data to code to operational business processes. The more symmetrical the conceptual building blocks are across the organization, the more ‘compressed’ the concepts will be across the conceptual fractal and the conceptual field.

The way we design data is to simply say that “Button 1”, “Button 2”, and “Button 3” are in the “action space” of an experience, and when a user clicks a button, that button (represented as a column) takes a true value (a 1).

User-experience states of the system are conceptual representations, such as the user being at workflow point X, Y, or Z. Any concept can be represented: reward level, data sets used, images, text, etc. One can A/B/n test any variant, such as “Do this please” versus “DO THIS NOW!”, to see how the user-experience state influences employee behavior. This allows a system to learn which user experiences best influence the employee, given the state of the system and the state of the environment (the situation).

This modeling is completely generalizable. In the example, we can say the Experience State has two bits of ‘mass’, and the change from row 3 to row 4 carries 3 bits of ‘energy’ (changes in bits).
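The table described here can be sketched as rows of bits (button names, columns, and values are hypothetical), with ‘energy’ as the count of bits changed between snapshots:

```python
# Hypothetical sketch of the interaction table: each row is one snapshot.
# Button columns record which action was taken, and two extra bits encode
# the Experience State ('mass' of 2 bits).

rows = [
    # (button_1, button_2, button_3, exp_state_bit_1, exp_state_bit_2)
    (1, 0, 0, 0, 0),
    (0, 1, 0, 0, 1),
]

def bits_changed(row_a, row_b):
    """'Energy' between two snapshots: the number of bits that flipped."""
    return sum(a != b for a, b in zip(row_a, row_b))

assert bits_changed(rows[0], rows[1]) == 3  # 3 bits of 'energy' between snapshots
```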

What differentiates this from the regular interaction trackers I’ve seen on the market is that we want to capture what action a user took given the set of all possible actions, because the actions NOT taken in a given experience state and system state might be more valuable information than the action taken, once compiled and analyzed as a time series.

Doing the columns well requires schemas, taxonomies, and ontologies.

Because this system is generalizable, we can use it both for a human to learn and for a machine to learn. The system can make recommendations (anything from using an image, to an employee emailing a client at a certain time) and, based on the employee’s action, gain feedback as to whether that moved us toward a higher-reward state or not. In effect, it turns an organization’s processes into something like a chess game, where ML can learn optimal recommendations at scale, and humans can be taught through knowledge recommendations and learning experiences. This is a continuous learning-and-feedback-loop system.

This also allows interesting psychological concepts and tactics to be represented. For example, if a ‘fear of risk’ property can be tagged to a leader in a certain situation, then observing how they respond allows for psychometric recommendation engines, which has implications for sales and marketing.

This system should be able to observe employees who succeed and fail at tasks and, through pattern recognition, flag problems for leaders earlier than they would otherwise bubble up, creating great potential business value. If we can map out the steps in a process like a map, then we could apply Dijkstra’s algorithm to find the shortest paths.

Dijkstra’s algorithm to find the shortest path between a and b. It picks the unvisited vertex with the lowest distance, calculates the distance through it to each unvisited neighbor, and updates the neighbor’s distance if smaller. Mark visited (set to red) when done with neighbors.
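A compact Dijkstra sketch over an invented process map, where edge weights stand for the cost of moving a workflow from one step to the next:

```python
import heapq

# Dijkstra's algorithm: repeatedly pick the unvisited vertex with the
# lowest known distance, relax its neighbors, and update their distances
# if a shorter path is found. The process map is hypothetical.

def dijkstra(graph, source):
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry; a shorter path was already found
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

process_map = {
    "intake": {"review": 1, "escalate": 4},
    "review": {"approve": 2},
    "escalate": {"approve": 1},
    "approve": {},
}

assert dijkstra(process_map, "intake")["approve"] == 3  # intake -> review -> approve
```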

In fact, we should be able to ‘nudge’ the state of an entire system. This is a fractal approach: being able to nudge an employee (a micro-state of a system), and an intelligence able to nudge the entire macro-state’s micro-states toward more optimal configurations. Where Dijkstra meets Turing state machines gets interesting.

If we can create a ‘map’ from the least optimal state to the most optimal state, with the state values in between, then we can plot a course from any one point to another. This can be done for both macro- and micro-states (scale-free / fractal). If columns of bits can be reused for multiple state representations at various scales, the system is more efficient: it compresses the number of bits a state space needs to create these maps.

The question, then, is what is the right amount of information, in bits, at any one point in time to move the various micro-states. This can be represented with Turing state machines, where the tape ticker is the stream of bits constituting the algorithm that moves the system from one state to another.
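A toy version of this ‘tape ticker’ idea, with an invented transition table: the bit stream is the program that moves the system between states:

```python
# Hypothetical sketch: a stream of bits drives transitions between
# system states. States and the transition table are invented.

TRANSITIONS = {            # (current_state, bit_read) -> next_state
    ("low", 1): "mid",
    ("mid", 1): "high",
    ("mid", 0): "low",
    ("high", 0): "mid",
}

def run(tape, state="low"):
    """Read the bit tape left to right, applying one transition per bit."""
    for bit in tape:
        state = TRANSITIONS.get((state, bit), state)  # undefined pairs: stay put
    return state

assert run([1, 1]) == "high"        # two '1' bits climb two levels
assert run([1, 0, 1, 1]) == "high"  # a detour downward, then back up
```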

Turing machine

From Wikipedia, the free encyclopedia

The head is always over a particular square of the tape; only a finite stretch of squares is shown. The instruction to be performed (q4) is shown over the scanned square. (Drawing after Kleene (1952) p. 375.)

Here, the internal state (q1) is shown inside the head, and the illustration describes the tape as being infinite and pre-filled with “0”, the symbol serving as blank. The system’s full state (its “complete configuration”) consists of the internal state, any non-blank symbols on the tape (in this illustration “11B”), and the position of the head relative to those symbols including blanks, i.e. “011B”. (Drawing after Minsky (1967) p. 121.)

A Turing machine is a mathematical model of computation that defines an abstract machine[1] that manipulates symbols on a strip of tape according to a table of rules.[2] Despite the model’s simplicity, given any computer algorithm, a Turing machine capable of implementing that algorithm’s logic can be constructed.[3]

The machine operates on an infinite[4] memory tape divided into discrete “cells”.[5] The machine positions its “head” over a cell and “reads” or “scans”[6] the symbol there. Then, based on the symbol and the machine’s own present state in a “finite table”[7] of user-specified instructions, the machine first writes a symbol (e.g., a digit or a letter from a finite alphabet) in the cell (some models allow symbol erasure or no writing),[8] then either moves the tape one cell left or right (some models allow no motion, some models move the head),[9] then, based on the observed symbol and the machine’s own state in the table, either proceeds to another instruction or halts computation.[10]

The goal of ‘meta-intelligence’ is to create a blueprint for building a ‘HyperCortex’ that mimics the neocortex but is not limited by the physical constraints of a skull: it is distributable across an organization and, in fact, scale-free, supporting organization-to-organization learning.

The neocortex has several design patterns we can borrow from, specifically Jeff Hawkins’ proposed model:

A Theory of How Columns in the Neocortex Enable Learning the Structure of the World

Neocortical regions are organized into columns and layers. Connections between layers run mostly perpendicular to the surface suggesting a columnar functional organization. Some layers have long-range excitatory lateral connections suggesting interactions between columns. Similar patterns of connectivity exist in all regions but their exact role remain a mystery. In this paper, we propose a network model composed of columns and layers that performs robust object learning and recognition. Each column integrates its changing input over time to learn complete predictive models of observed objects. Excitatory lateral connections across columns allow the network to more rapidly infer objects based on the partial knowledge of adjacent columns. Because columns integrate input over time and space, the network learns models of complex objects that extend well beyond the receptive field of individual cells. Our network model introduces a new feature to cortical columns. We propose that a representation of location relative to the object being sensed is calculated within the sub-granular layers of each column.

One way to design a ‘cortical column’ is to map the steps of this paradigm onto application development. The first layer of the stack would be the schema (the contract for communicating information), which transmits information. Because the same information can be represented in different structures, this perspective is inherently isomorphic. The second layer might be the taxonomy, the third APIs, and the fourth design elements bound by reference to the taxonomy, and so on.
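A minimal sketch of such a column as a layered stack, where all layer names and the `bind` helper are illustrative assumptions rather than an existing framework:

```python
# Hypothetical 'cortical column' for application design: an ordered stack of
# layers (schema -> taxonomy -> API -> design), where each layer can bind its
# concepts to references defined in a lower layer.

from dataclasses import dataclass, field

@dataclass
class Layer:
    name: str                               # e.g. "schema", "taxonomy", "api", "design"
    concepts: dict = field(default_factory=dict)

@dataclass
class CorticalColumn:
    layers: list

    def bind(self, layer_name, concept, reference):
        """Bind a concept in one layer to a reference from a lower layer."""
        for layer in self.layers:
            if layer.name == layer_name:
                layer.concepts[concept] = reference
                return
        raise KeyError(layer_name)

column = CorticalColumn([
    Layer("schema"),      # contract for communicating information
    Layer("taxonomy"),    # shared vocabulary over the schema
    Layer("api"),         # operations exposed over the taxonomy
    Layer("design"),      # UI elements bound to taxonomy terms
])
column.bind("design", "customer_form", reference="taxonomy:customer")
print(column.layers[3].concepts)  # -> {'customer_form': 'taxonomy:customer'}
```

The value of the stack is that the binding discipline is the same in every column, which is what makes standardization across applications possible.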

The point is to achieve a homogeneous application design that is symmetrical to operational business processes and data representations: in other words, a fractally compressed system.

Because this columnar stack can be standardized, you can apply it to literally every user interface of any business application, much like the neocortex is made of homogeneous cortical columns, each a layered stack of neurons. These stacks can communicate information across each other and have predictive voting capability as well.
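Loosely inspired by the Hawkins model (this is a toy illustration, not the paper’s algorithm), lateral voting across standardized columns might look like:

```python
# Toy lateral 'voting' across columns: each column holds a partial belief
# about the observed object, and lateral connections let the network settle
# on the candidate supported by the most columns.

from collections import Counter

def vote(column_beliefs):
    """column_beliefs: one set of candidate objects per column.
    Returns the candidate backed by the most columns."""
    tally = Counter()
    for beliefs in column_beliefs:
        tally.update(beliefs)               # lateral broadcast of partial knowledge
    winner, _ = tally.most_common(1)[0]
    return winner

# Three columns, each with partial (ambiguous) knowledge of the same object:
print(vote([{"mug", "can"}, {"mug", "bowl"}, {"mug"}]))  # -> mug
```

No single column needs complete knowledge; the consensus emerges from partial views, which is the property the quoted abstract attributes to excitatory lateral connections.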

Shifting between signal representations, say from JSON to relational database structures, by converting everything into a universal format is similar to how the human body has multiple sensory input systems (touch, sight) yet converts all of that sensory input into a single standardized language for the brain to use. Our system needs to do the same before learning can begin. This is morphological intelligence: the intelligence of the body.

This paper says little about artificial intelligence or any specific machine learning technique (other than Markov decision processes). Rather, the focus is on how to design macro-state systems that enable micro-state systems (human or machine) to learn. It is a paradigm about the pre-work needed to create intelligent systems more intelligently, which involves breaking processes down to a step-level granularity. I’ve seen too many ‘intelligent products’ and ‘advanced analytics’ projects fail to return ROI. A new paradigm was required, one that allows for the intelligent design of intelligent systems and is scale-free and generalizable.
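One hedged sketch of that “universal format” step: flattening a nested JSON signal into relational-style (column path, value) pairs, so differently shaped inputs land in one common shape before any learning happens. The `flatten` helper is hypothetical:

```python
# Morphological normalization sketch: flatten nested JSON into relational-style
# rows, i.e. (dotted column path, scalar value) pairs.

def flatten(obj, prefix=""):
    """Yield (column_path, value) pairs from nested dicts and lists."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield from flatten(value, f"{prefix}{key}.")
    elif isinstance(obj, list):
        for i, value in enumerate(obj):
            yield from flatten(value, f"{prefix}{i}.")
    else:
        yield prefix.rstrip("."), obj       # leaf: emit one relational cell

signal = {"order": {"id": 7, "items": [{"sku": "A1"}]}}
print(dict(flatten(signal)))
# -> {'order.id': 7, 'order.items.0.sku': 'A1'}
```

Whatever shape the sensory input arrives in, downstream components only ever see path/value pairs: one standardized language, like the brain’s.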

Currently, I’m working on how to create relationships between column rows and JSON structures, because I think there’s an opportunity to combine an ontology experience with a relational data representation. Concepts can seemingly be stacked in reverse order, which is the opposite of dynamic programming: rather than recursively breaking problems and concepts down, we can recursively ‘connect up’ concepts. Automating both the relationships between concepts and the generation of structured data is the fuel for ML. I’m curious how we can design systems that maximize the automatic generation and curation of structured data.
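A toy illustration of ‘connecting up’ rather than decomposing; the edge list and helper names here are assumptions for illustration, not an existing system:

```python
# 'Connect up' sketch: instead of recursively decomposing a problem
# (dynamic programming), compose atomic concepts upward into larger ones,
# recording each child->parent relationship as structured data along the way.

def connect_up(pairs):
    """pairs: iterable of (child, parent) edges.
    Returns parent -> set of all transitive descendants, built bottom-up."""
    children = {}
    for child, parent in pairs:
        children.setdefault(parent, set()).add(child)

    def descendants(node):
        out = set()
        for c in children.get(node, ()):
            out |= {c} | descendants(c)     # recurse upward through the edges
        return out

    return {parent: descendants(parent) for parent in children}

edges = [("sku", "item"), ("price", "item"), ("item", "order")]
print(connect_up(edges)["order"])  # every concept composed into 'order'
```

The edge list itself is the automatically generated structured data: every time two concepts are connected up, a new labeled relationship exists for a model to train on.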

Ultimately, we want every interaction with a concept to be a learning and teaching event for both human and machine systems. Rather than spending 99% of our energy on an AI model, we might want to focus on the intelligence of our DataOps: how do we efficiently and effectively manage the metadata associated with concepts, and design software applications and user experiences, so that data scientists get pristine-quality data? AI can do far more than humans ever could, so we should be optimizing for it, but the system should be more intelligent even without AI. My goal for a Hyper-Graph is to unite any knowledge generalizably and to be able to explore it at macro- and micro-states, with universal interaction tracking.
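As a sketch of what an interaction-tracked hyper-graph could look like (all class and method names here are hypothetical):

```python
# Hypothetical 'Hyper-Graph' store: hyperedges connect any number of concepts,
# and every interaction with a concept is logged, so both humans and models
# can learn from the usage trail.

from collections import defaultdict

class HyperGraph:
    def __init__(self):
        self.edges = {}                       # edge_name -> set of concepts
        self.interactions = defaultdict(int)  # concept -> interaction count

    def relate(self, edge_name, *concepts):
        """A hyperedge may join any number of concepts, not just two."""
        self.edges[edge_name] = set(concepts)

    def touch(self, concept):
        """Record one learning/teaching event for a concept."""
        self.interactions[concept] += 1

    def neighborhood(self, concept):
        """Macro view: all concepts sharing a hyperedge with `concept`."""
        out = set()
        for members in self.edges.values():
            if concept in members:
                out |= members - {concept}
        return out

g = HyperGraph()
g.relate("fulfillment", "order", "item", "warehouse")
g.touch("order")                              # one tracked interaction
print(g.neighborhood("order"))  # -> {'item', 'warehouse'}
```

The interaction counts are the point: exploring the graph at the micro-state (a single concept’s usage) or the macro-state (its neighborhood) uses the same tracked structure.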

I don’t know whether the ideas in the meta-intelligence theory I’m proposing will be met mostly with skepticism and criticism, or with curiosity and creativity. I imagine it is outside most people’s interests, and I continuously suffer from imposter syndrome. However, I’ve been lucky enough to have some really smart people validate the direction and the existing principles I’ve woven together. At minimum, I’ve finally shared what I’ve been working on and my passion for human+machine learning networks. I’ve had fantastic teachers along the way, and I believe this paradigm has the potential to create massive positive impact in the world.



Ron Itelman

I like human + machine learning systems | Principal — Digital, Data & Analytics at Clarkston Consulting