The Big Data concept is still relatively new, but the idea behind it is very old: with more and more data, you can eliminate ambiguity and rely less on hunches, since the data are supposed to be self-explanatory.
It is an idea that has circulated among I.T. people for a long time. However, reality has always insisted on delaying the moment when it can be accomplished. Two problems stand in the way:
- As data grow, context analysis becomes more necessary to decide which data are relevant and which can be safely ignored.
- On the other side of the table, there may be people trying to mislead automatic decision-support systems. Actually, so-called SEO (Search Engine Optimization) could properly be renamed GAD (Google Algorithm Deception) to explain more clearly what it is intended to do.
Perhaps, for now, Big Data is less prone to the second problem than Web Analytics is. The Web has become the battlefield for a quiet fight:
On one side are those trying to get better positions for themselves and for the positive news about them. They are also the ones who try to bury negative news by publishing positive items and repeating them, to make sure that the bad ones stay hidden in search results.
On the other side are the Masters of Measurement. They look for magic algorithms able to defeat the tricks of the first group, unless that group decides to pay for their services.
Big Data has an advantage over Web data: if a company has its own data sources, they can be more reliable and more expensive to deceive, and any attempt to do so could be quite easy to spot. Even so, this is not new: during World War II, knowing how successful a bombing raid had been was not a matter of reading German newspapers or listening to German radio stations.
The practice known as content analysis used indirect indicators, such as information about funerals or burials, which could be more informative if and only if the enemy did not know that these data were being used to get information. In the same context, before D-Day, some places were made to look heavily defended with rubber tanks in order to fool reconnaissance planes about where the invasion was to start. That practice has lasted a long time. Actually, it was used even in the Gulf War, with heat sources added to the rubber tanks to deceive infrared detectors, which would get a picture similar to the one produced by running engines.
Deceiving Big Data will be harder than deceiving Internet data but, once it is known who is using specific data and what they are doing with them, there will always be a way to do it. An easy example is inflation indicators: a government can change the weights of the different variables, or change government-controlled prices, to get a favorable picture. In the same way, if Big Data is used to give information to external parties, no outsider is needed to deceive the system. That can be done from inside.
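The inflation example can be put in numbers. What follows is only a sketch in Python: the categories, price changes and weights are invented for illustration and are not real statistics, and `weighted_inflation` is a hypothetical helper, not any official method. It shows how the same prices give a much friendlier headline figure once the weights are shifted.

```python
def weighted_inflation(price_changes, weights):
    """Headline inflation as the weighted average of per-category price changes."""
    total = sum(weights.values())
    return sum(price_changes[c] * weights[c] for c in price_changes) / total


# Hypothetical year-on-year price changes, in percent (invented figures).
price_changes = {"food": 8.0, "energy": 12.0, "electronics": -4.0}

# Hypothetical original basket weights.
original_weights = {"food": 40, "energy": 30, "electronics": 30}

# Same prices, but weight is moved toward the category that got cheaper.
adjusted_weights = {"food": 30, "energy": 20, "electronics": 50}

original = weighted_inflation(price_changes, original_weights)
adjusted = weighted_inflation(price_changes, adjusted_weights)

print(f"original weights: {original:.1f}%")
print(f"adjusted weights: {adjusted:.1f}%")
```

Nothing in the underlying data changed between the two figures. Only the decision about how to read them did, and that is why this kind of adjustment is hard to detect from the data alone.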
Anyway, the big problem lies in the first point: data without a context are worthless…and the context may be moving faster than any algorithm designed to give meaning to the data. Many surprising outcomes have happened in places where all the information was available. However, that information was read correctly only after a major disaster. For instance, the emergence of new political parties could be seen in the data but, if the major players decided to dismiss them, it came as a surprise to those players, even though the data were available. The problem was in the decision about what deserved to be analyzed and how to do it, not in the data themselves.
At other times, the problem comes from fast changes in the context that are not included in the variables under analysis. In the case of Spain, we can point to the changes that 11M (the Madrid train bombings of 11 March 2004), and the way the different players handled it, brought about in the election three days later. In another election, everybody had a clear idea about who was going to get a position that required an alliance. Good reasons advised an agreement, and the data showed that everybody was sure that the agreement was coming…but it wasn’t. One of the players was so sure that the matter was settled that it tried to impose conditions that the other players saw as unacceptable. The consequence: the desired position went to a third player. Very recently, two people, both considered rising stars, may have spoiled their chances through minor incidents.
In short, we can have a huge amount of data, but we cannot analyze all of them, only the ones considered relevant. In making that selection, no algorithm and no amount of data can be a good replacement for an analysis performed by human experts. An algorithm or automatic system can be fooled, even by another automatic system designed to do so; context analysis can miss important variables that have been misjudged; and sudden changes in the context cannot be anticipated by any automatic system.
Big Data can be helpful if used rationally. Otherwise, it will become another fad or, worse, a standard: nobody would dare to decide against a machine with a lot of data and an algorithm, even when the machine is wrong.