Data representation for AI

Raji Sankar, January 31st, 2022 (Last Updated: January 24th, 2022)


In my previous few blogs I wrote about why a digital or numerical representation of data does not fit the bill for AI. The problem with both is that they reduce continuous, multi-dimensional data that naturally has relations down to multiple disparate, unrelated, discrete scalar values, which have lost a lot of the knowledge present in the original. To compensate for this lost knowledge, which we could have collected directly, we use mathematical equations, and we go woefully wrong in doing so, producing a fake output. As I have also said, mathematical equations represent only a small subset of the logic that is present in natural data. So, along with introducing a lot of overhead in the form of logic that must be learnt just to interpret the data and make it usable, which forces us to have an offline learning phase, this also reduces the amount of information that can be inferred from the data, since what is coded is just a small subset of the original knowledge.
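As a small illustration of how relations can disappear when related readings are reduced to separate scalars, here is a minimal Python sketch. The channel names, the sample values and the helper functions `scalar_summary` and `pearson` are all hypothetical, made up for this example, and taking the mean of each channel is only one assumed way of reducing data to scalars. It shows just one narrow aspect of the argument above: two data sets with opposite relationships can end up with exactly the same scalar values.

```python
from math import sqrt
from statistics import mean

def scalar_summary(xs, ys):
    """Reduce each channel to one independent scalar (here, its mean)."""
    return mean(xs), mean(ys)

def pearson(xs, ys):
    """Pearson correlation: a measure of the relation between two channels."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical readings, invented for illustration only.
temperature = [1.0, 2.0, 3.0, 4.0]
pressure_rising = [10.0, 20.0, 30.0, 40.0]   # rises with temperature
pressure_falling = [40.0, 30.0, 20.0, 10.0]  # falls with temperature

# The per-channel scalars are identical for both data sets...
print(scalar_summary(temperature, pressure_rising))
print(scalar_summary(temperature, pressure_falling))

# ...while the relation between the channels is opposite.
print(pearson(temperature, pressure_rising))
print(pearson(temperature, pressure_falling))
```

Once only the two means are kept, nothing in them says whether pressure rose or fell with temperature; that knowledge would have to be re-supplied from outside, for instance through an equation.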

As I also said in my previous blog, electronic systems contribute to making data scalar, because we try to convert the sensed parameters into a single electron flow that can be used to create a mathematical representation of the data. It is possible that data has become scalar because electronic systems have been oriented towards implementing mathematical functions, rather than because of a limitation of the electronic system itself. It is not the principles that add these limitations. It is the way we have used the principles, or rather the way we interface to them to convert the data into a different representation, that is the limiting factor. I think we have to ask ourselves how we can use these principles differently to capture the data accurately without loss of information. In my view, we seem to be capturing only a very small percentage of the information naturally available in the data.

If we look at the different ways in which we have applied the various concepts, we find that we have used the principles best where we have converted energy from one form to another and used the converted energy directly to achieve an outcome. For example, we have converted heat energy to mechanical energy and coupled it directly to get motion; this retains all the natural parameters present, along with continuity and multi-dimensionality. The same is the case when we convert electrical energy to mechanical energy. It is only when we interface into these converted forms of energy to collect data for mathematical functions that we lose information and force it into a scalar value. As I have said over and over, it is because we have interfaced these into a compute system that the problems have been exacerbated.

So, the next question we have to ask ourselves is “Why do we want to convert into scalar data at all?” Isn’t it time to think about what “data”, “input” and “output” mean? What do “algorithm” and “programming” really mean? Isn’t it our rigid definition of “output” as a “human readable or sense-able” action, such as a monitor displaying language or a speaker emitting human-understandable language, music, song etc., that makes us think this is the only way to do things? There are many ways in which the same output of displaying and speaking language can be achieved. If we look into the working of our own body, we find that it uses energy conversion very efficiently. Isn’t protein creation just a form of energy conversion from one form to another? Given that many different types of proteins are formed from various sequences of DNA through the processes of transcription and translation, isn’t this similar to our data conversion process, except that it is a much better process that retains a whole lot more information?

I think we can learn a lot from the process of protein creation. If we analyse it, we find that it can occur in any cell, based on what is required and the conditions in that area. Many different proteins are formed and they have many functions. I know the details of protein creation are a very complex topic. But protein creation is not the point here. The point is that we can see protein creation as the “absorption of information” from the surrounding data, which is in the form of energy, and the use of that information to perform actions appropriately. Viewed this way, we find that we can use many concepts present in protein creation to solve our problems. Protein creation occurs in such a way that information from a volume is translated easily. It should also be noted that, since the translation varies with a variety of factors, the multi-dimensional, continuous information is absorbed and translated easily into discrete proteins without much loss of data.

The theory here is that forming a dynamic 3D amino acid structure based on the parameters in the surrounding environment allows continuity, multi-dimensionality and the relations in the information to be encoded in a structure, instead of in a single scalar value such as the electron flow that we use. Allowing proteins to be formed in any cell ensures that data present at any point in a volume can be read, translated and used, as opposed to the point-based sensors that we create using electronics. Proteins can be seen as an output mechanism that is formed just when required, as opposed to pre-translated and stored information being processed. The various information, in the form of energy, remains present as is, in its original format.

DNA can be seen as the coding that controls the formation of the protein. As can be seen, DNA is not a complete algorithm of the kind we try to create. It is a helper in triggering and encoding the type of protein that needs to be created. Each part by itself is just a piece, but the parts come together to work as a whole that achieves an overall goal without much loss of information. I find that we need to study these principles and implement them in a system if it is to be intelligent.
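For readers who have not met transcription and translation, the “DNA as coding” idea can be sketched in a few lines of Python. This is a deliberately simplified textbook view and not a model of what happens in a cell: the function names `transcribe` and `translate`, the example sequence and the four-entry codon table are assumptions made for illustration, and the full genetic code has 64 codons. The sketch covers only the sequence mapping; it says nothing about when a protein is triggered or how the chain folds into a 3D structure, which is where the surrounding conditions come in.

```python
# Tiny excerpt of the standard genetic code (the full table has 64 codons).
# None marks a stop codon.
CODON_TABLE = {
    "AUG": "Met",  # methionine, also the usual start codon
    "UUU": "Phe",  # phenylalanine
    "GGC": "Gly",  # glycine
    "UAA": None,   # stop
}

def transcribe(coding_dna):
    """Transcription, simplified: copy the coding strand, replacing T with U."""
    return coding_dna.upper().replace("T", "U")

def translate(mrna):
    """Translation, simplified: read codons in threes until a stop codon."""
    residues = []
    for i in range(0, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[i:i + 3]]
        if residue is None:
            break
        residues.append(residue)
    return residues

mrna = transcribe("ATGTTTGGCTAA")  # hypothetical sequence for illustration
print(mrna)
print(translate(mrna))
```

The output of `translate` is only an ordered list of amino acids. The sequence alone does not decide when or where the protein is made, which fits the view above that DNA is a helper rather than the complete algorithm.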

In my view, we should leave the data in the form it is currently in and find ways of tapping into it, creating the appropriate output as and when required in the form of 3D molecular pieces that can achieve an outcome, rather than translating the data rigidly to a standard format, trying to relate the pieces and using them in that rigid format. This allows us to start looking at data-realised algorithms as opposed to the logic-based algorithms that we create.

Published on Java Code Geeks with permission by Raji Sankar, partner at our JCG program. See the original article here: Data representation for AI

Opinions expressed by Java Code Geeks contributors are their own.