The saying that best applies to all AI and data analysis programs is “Garbage in, garbage out”! We all agree that we have terabytes upon terabytes of data, heading towards petabytes. Yet the amount of useful knowledge we can derive from it is a pittance compared to the amount of data present. We need to start asking ourselves what is wrong, that we cannot easily find useful information or knowledge in so much data. In contrast, “we”, or in fact any living being, can derive huge amounts of knowledge from the least amount of data.
For example, when we see anything with our eyes, the amount of information we are able to pick up is much, much more than what we can pick up from a picture or video of the same thing. I find that when we observe a scene while actually standing at that place, we pick up many things, not just the light reflected off the scene through the eyes or the sound through the ears. We seem to pick up the light distribution, the sound distribution, the heat distribution, the motion distribution of the scene and so on. Typically, when we observe the same scene over a period of, say, 10 to 15 minutes, we seem to accumulate these parameters as a change in data over that period and analyse it. This is then stored as experience. It cannot be recreated by any of the gadgets we have built, even though the gadgets seem to capture the same light or sound that we capture.
Why is this so? Does this not imply that we have reduced the available data from those many parameters down to a single parameter, light reflection or sound reflection? Thus, we have lost data in the process. It is important to note that we have taken that reduced data, which was inherently understandable, and converted it into a series of 1’s and 0’s that makes no inherent sense to us or to the computer. Even more important to understand is that we have taken the reduced data contained in, say, the smallest photon and exploded it into combinations of 1’s and 0’s. To be able to explode it, we had to sample the single parameter. So, we lost the continuity that was present in the data. So, a single photon contained I do not know how much continuous information; we reduced it to one sampled parameter, exploded that one sample and represented it as a series of 1’s and 0’s. To add to it, we have taken co-occurring related data, such as light and sound reflections that are naturally present simultaneously, split them apart and then exploded each into some garbled representation. So, in the process of representing data, we have lost not only continuity, dimensionality and information, but also the relation between various pieces of information. Then, after losing all this, we try to derive the same continuity, dimensionality, information and relation using algorithms!
We do this not because we get a better representation of data, or because it makes algorithms easy to write. We do it because that is how we have defined data representation in computers. What is very strange is that, after all this, the computer still cannot understand the data inherently, and we cannot retrieve the knowledge from this data without writing some complex algorithm. No wonder we have terabytes of data with significantly less information in it! All I feel like saying is, “Wah re computers, kya teri lila”! (roughly, “Oh computers, how wondrous are your ways”!)
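The sample-then-encode step described above can be sketched in a few lines. This is only an illustration under assumptions of my own: the function names, the choice of a sine wave as the "continuous" signal, and the 8 samples at 4 bits each are all hypothetical, picked to keep the numbers small.

```python
import math


def sample_and_quantise(signal, duration, n_samples, bits):
    """Sample `signal` on [0, duration) and encode each sample as a bit string.

    Assumes the signal stays within [-1, 1].
    """
    levels = 2 ** bits - 1
    codes = []
    for k in range(n_samples):
        t = duration * k / n_samples               # sampling: continuity is dropped here
        value = signal(t)
        code = round((value + 1.0) / 2.0 * levels)  # quantising: precision is dropped here
        codes.append(format(code, f"0{bits}b"))     # the sample "exploded" into 1's and 0's
    return codes


def decode(codes, bits):
    """Map the bit strings back to approximate values in [-1, 1]."""
    levels = 2 ** bits - 1
    return [int(c, 2) / levels * 2.0 - 1.0 for c in codes]


codes = sample_and_quantise(math.sin, 2 * math.pi, 8, 4)
approx = decode(codes, 4)
```

Decoding gives back only eight approximate values. Everything between the sample points, and everything finer than one quantisation step, has to be guessed by whatever algorithm reads the bits later.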
Isn’t it time we started asking ourselves whether we are really achieving anything good by converting data into this sampled digital format? Haven’t we unnecessarily restricted ourselves first, and are we not now writing complex algorithms just to overcome the restrictions we placed on ourselves?
We need to start asking ourselves: “Is our data representation sufficient for what we are trying to achieve, i.e. AI?” Can we get a better representation of data? To answer this, we need to understand the data representation requirements for AI. As I said in my previous blog on logic, “Data IS Logic”. Logic should change with data rather than the other way round. First, what does this really mean?
Looking at the simple bubble sort I was talking about previously, we see the following data changes (thanks, GeeksforGeeks):
( 5 1 4 2 8 ) –> ( 1 5 4 2 8 ), Here, the algorithm compares the first two elements, and swaps since 5 > 1.
( 1 5 4 2 8 ) –> ( 1 4 5 2 8 ), Swap since 5 > 4
( 1 4 5 2 8 ) –> ( 1 4 2 5 8 ), Swap since 5 > 2
( 1 4 2 5 8 ) –> ( 1 4 2 5 8 ), Now, since these elements are already in order (8 > 5), the algorithm does not swap them.
( 1 4 2 5 8 ) –> ( 1 4 2 5 8 )
( 1 4 2 5 8 ) –> ( 1 2 4 5 8 ), Swap since 4 > 2
( 1 2 4 5 8 ) –> ( 1 2 4 5 8 )
( 1 2 4 5 8 ) –> ( 1 2 4 5 8 )
There is a third pass, with no changes, which I have ignored for now. Now, let’s reverse the way we look at this. We looked at the algorithm and, from there, traced the data changes. What if the data stream we received was: (5 1 4 2 8)(1 5 4 2 8)(1 4 5 2 8)(1 4 2 5 8)(1 4 2 5 8)(1 4 2 5 8)(1 2 4 5 8)(1 2 4 5 8)? This data is representative of the sorting logic, isn’t it? The logic is embedded in it. So why translate it into a different representation such as pseudo-code, Java, C/C++ or Python, rather than retain it as data? Isn’t it because we try to turn the logic into a descriptive, re-usable language that we lose information?
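To make the idea of "logic embedded in the stream" concrete, here is a minimal sketch. It assumes the plain, unoptimised bubble sort from the trace above and records the state of the list before every comparison. The names `bubble_sort_stream` and `swaps_from_stream` are mine, introduced only for illustration.

```python
def bubble_sort_stream(values):
    """Run a plain bubble sort and record the list before every comparison."""
    items = list(values)
    stream = []
    swapped = True
    while swapped:                       # keep passing until a pass makes no swap
        swapped = False
        for i in range(len(items) - 1):
            stream.append(tuple(items))  # snapshot before comparing items[i], items[i + 1]
            if items[i] > items[i + 1]:
                items[i], items[i + 1] = items[i + 1], items[i]
                swapped = True
    return stream, items


def swaps_from_stream(stream):
    """Recover, from the data alone, where each step swapped (None if nothing moved)."""
    steps = []
    for before, after in zip(stream, stream[1:]):
        changed = [i for i, (a, b) in enumerate(zip(before, after)) if a != b]
        steps.append(changed[0] if changed else None)
    return steps


stream, result = bubble_sort_stream([5, 1, 4, 2, 8])
recovered = swaps_from_stream(stream[:8])
```

The first eight entries of `stream` are exactly the stream quoted above, and the remaining four come from the third, no-change pass. `swaps_from_stream` reads only the data, yet it gets back the position of every swap, which is one modest sense in which the stream carries the sorting logic.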
Why retain the logic in data? Let’s consider this. Here, we have taken discrete data represented as decimal numbers. What happens when we have a continuous span that needs sorting? As I have said before, the data available in nature is continuous. We have converted it to discrete data and, in the process, lost information. Say we were able to represent continuous data; how would our algorithm work then? Any algorithm we write carries the underlying assumption that it works on a discrete data set. Algorithms cannot be applied to continuous data. Continuous data can only be moulded or melded. What do I mean?
Let’s take an area with a heat distribution. If we introduce a small heat source that raises the heat at a single point, what happens? The heat distribution continuously modifies itself until it has sorted itself out into a regularly increasing distribution of heat, just like the step-by-step modification of the numbers we saw previously. Isn’t this equivalent to the sort algorithm that we wrote? Instead of an algorithm being run, the data contained at each point in the heat distribution moulded itself until it attained equilibrium. If we had followed it continuously from disequilibrium to equilibrium, we would have got the logic used. Thus the data has become the logic: it contains knowledge and is capable of inherently understanding and modifying itself to execute a logic. It should also be noted that when two heat sources containing different data meet, they are able to meld with each other to form totally valid new data without any algorithm being executed. I think this is the type of data representation we need to start looking for if we want to write a true AI system.
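The heat example can be imitated, very roughly, on a computer. The sketch below is a discretised simulation, so it is itself an instance of the sampling this article questions; it is meant only to show the equilibrium-seeking behaviour. The setup is my assumption: a one-dimensional row of 11 points, with the left end held cold and a heat source held at 100 on the right end. The function name `relax_heat` and the parameters `alpha` and `steps` are hypothetical.

```python
def relax_heat(temps, alpha=0.25, steps=2000):
    """Relax a 1-D row of temperatures with both end points held fixed.

    Every intermediate state is kept, so the path to equilibrium survives as data.
    `alpha` must stay at or below 0.5 for this simple update to remain stable.
    """
    state = list(temps)
    history = [tuple(state)]
    for _ in range(steps):
        nxt = state[:]
        for i in range(1, len(state) - 1):
            # each interior point moves towards the average of its two neighbours
            nxt[i] = state[i] + alpha * (state[i - 1] - 2 * state[i] + state[i + 1])
        state = nxt
        history.append(tuple(state))
    return history


# Assumed setup: cold everywhere, heat source of 100 held on the right end.
start = [0.0] * 10 + [100.0]
history = relax_heat(start)
final = history[-1]
```

No comparison or swap is ever written down, yet `final` settles into a steadily increasing row from the cold end to the hot end, and `history` holds the whole journey from disequilibrium to equilibrium.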
Opinions expressed by Java Code Geeks contributors are their own.