tl;dr
Facebook and personal info. Big data in general. Can you have too much information? Information volume vs. information quality.
The Reality
Yesterday, the Guardian published an article about how researchers, using only data on what people “liked” on Facebook, could determine their drug use, sexual orientation and political affiliation. Like most people, I started wondering what they could tell about me. I don’t use Facebook very much (probably not nearly as much as I should for someone with a product to promote), so my initial thought was that they couldn’t tell very much, particularly since, as a grumpy pre-curmudgeon, I don’t like anything. But of course there’s more information than just the “likes”: they can tell what links you click, what videos you’re interested in seeing, etc. As I thought about that, I came to the conclusion that they probably could figure out an awful lot about me.
Of course, this isn’t just a Facebook issue; it’s an issue with Big Data in general. Now, I personally think that Big Data has a long way to go before living up to its promise, and that there’s still a lot of hype. But I don’t think anyone would argue with me if I said that we’re going to keep finding ourselves with access to more and more information about all sorts of things: current and potential employees, customers, markets, technology, etc. In general I am a big proponent of data-driven decision making and of getting as much information as possible, but as always there is the issue of editing. That, of course, is the central problem of Big Data, but I’m talking about editing at an even higher level.
Let’s imagine for the moment that the data had already been collected and edited: out of all the information out there, you had a folder that contained every possible detail about a potential job applicant, and only about him. That level of “editing” would be a dream come true for most people (and is certainly still a dream). But if you really knew EVERYTHING about potential job applicants, would anyone get hired? I’m sure people would; employers would start ignoring certain things, because otherwise there would be no applicant pool left. But would those hires be any better? Because you don’t actually know everything: you don’t know how that person is going to perform in that environment, with those people, doing that job. Knowing that is really all you care about, and while it might be interesting to know that someone got straight F’s their first semester of college, does that really bear on the core question?
It’s easy to make information volume a substitute for information quality. The problem is that there’s generally no shortcut to quality, in any sphere. But we want to make decisions quickly, so rather than wait to get more of the specific data we need, we often pile up other, less useful data until our pile feels big enough. As if it were the size of the pile, rather than the stuff in the pile, that determines how good our decisions are.
All of this was said, and far more eloquently (which is why I waited till the end, otherwise no one would read my stuff), by Dan McKinley in his January 9th post, “Whom the Gods Would Destroy, They First Give Real-time Analytics.” Here are the money quotes:
Idle hands stoked by a stream of numbers are the devil’s playthings.
Analysis is difficult enough already, without attempting to do it at speed.