I was asked a few questions about “Big Data” recently. Here’s what I said.

What is it, exactly?

Big Data isn’t a thing. Data is data, regardless of volume. If you ask for a Big Pint at the bar you still receive a pint. Big Data is just another in a long line of marketing and technology buzzwords. Back in 1941 the term being used was “information explosion”. See Forbes’ “Short history of Big Data”.

The term has fallen into popular use for two reasons: humanity generates a lot more data these days than it ever has before, and giving something a snappy name makes it easier to sell (I’m looking at IBM here).

There’s a lot more data around in part because of network bandwidth and consumer devices. If you have a fibre connection to your home, it’s suddenly an appealing option to download that movie in 2 minutes rather than 2 days. Video is a “Big Fat Lump” of data (technical term). Streaming audio through Spotify etc – data intensive. Uploading photos to Facebook from your smartphone – data intensive.

The other contributor to “Big” in terms of data is less about volume/filesize and more about variety and frequency. There are many more behaviours that generate data these days so it’s possible to record and measure a lot more. More = Big, but it’s just Data. In the 21st Century, the mere act of going about your daily business casts off a halo of data. I wrote something for Marketing Magazine about it here.

Behaviours that cast off data include: clicking a link, visiting a webpage, streaming an audio track, downloading a file, retweeting a tweet, liking something on Facebook, unsubscribing from an email, adding “journalism” to your LinkedIn profile, tracking your calorie intake via an app, weighing in on Wi-Fi scales, logging your run via Nike+, posting a comment on a blog, using GPS-enabled maps, swiping your Oyster card, etc.

Think of a cat running through the rain – each raindrop that hits the cat is a data-point that can be recorded. It’s then possible to take the recorded information, recreate the path the cat took through the rain, study it, and predict what the cat might do next.
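For the technically minded, here is a rough sketch of the cat metaphor in Python. Everything in it is made up for illustration: the `DataPoint` shape, the `reconstruct_path` helper and the sample raindrops are assumptions, not a description of any real tracking system. It just shows that once each event carries a timestamp and a position, recreating the route is a matter of sorting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataPoint:
    """One raindrop hitting the cat (hypothetical record shape)."""
    timestamp: float  # seconds since we started watching
    x: float          # position when the drop landed
    y: float


def reconstruct_path(points):
    """Order the recorded data points by time to recover the route taken."""
    ordered = sorted(points, key=lambda p: p.timestamp)
    return [(p.x, p.y) for p in ordered]


# Events rarely arrive in order, so the sample data is deliberately shuffled.
raindrops = [
    DataPoint(timestamp=2.0, x=2.0, y=1.0),
    DataPoint(timestamp=0.0, x=0.0, y=0.0),
    DataPoint(timestamp=1.0, x=1.0, y=0.5),
]

path = reconstruct_path(raindrops)
print(path)  # [(0.0, 0.0), (1.0, 0.5), (2.0, 1.0)]
```

Swap raindrops for clicks, swipes or check-ins and the idea is the same.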

What can be achieved with it in social media marketing?

It is a misconception to think of “social media” as being separate from any of the above. If a human being is taking action in a publicly observable network context, it is social by default. The internet was “social” at the moment it was created because it created connections between people. IRC was a social medium. Cave paintings were a social medium.

Does predictive analytics play a part in it?

I talk about Predictive Service here at Tribal. You can’t predict or anticipate anything unless you have data to inform your model. So, yes, having lots of data allows you to make predictions about behaviour or anticipate what might happen next. Think about the cat metaphor above. It’s possible to extrapolate where the cat might step next based on its historical trajectory. Google has been doing this for a long time, predicting where the next outbreak of flu is likely to be by analysing regional search data.
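As a toy illustration of extrapolating from a historical trajectory, here is about the simplest predictor possible: assume the cat keeps moving the way it did between its last two recorded positions. The `predict_next` function and the sample history are hypothetical, and real predictive models (Google’s included) are far more sophisticated than this straight-line guess.

```python
def predict_next(path):
    """Guess the next position by repeating the most recent step.

    This is plain linear extrapolation, used here only as a stand-in
    for a real predictive model.
    """
    if len(path) < 2:
        raise ValueError("need at least two recorded positions to extrapolate")
    (x0, y0), (x1, y1) = path[-2], path[-1]
    return (x1 + (x1 - x0), y1 + (y1 - y0))


# Hypothetical recorded positions, oldest first.
history = [(0.0, 0.0), (1.0, 0.5), (2.0, 1.0)]

next_step = predict_next(history)
print(next_step)  # (3.0, 1.5)
```

Note that the function refuses to guess with fewer than two points: no data, no prediction.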

What kinds of information can be gathered from social media feeds, and how can it be folded into large data sets?

I’m interpreting “social media feeds” as “personally attributable Facebook data” because the answer is slightly different for each social network or service (Pinterest, Twitter, etc.). With permission, very rich profiles describing someone’s life and preferences can be attached to the information an organisation already holds. For example, you run an email newsletter and know someone’s name and email address in order to deliver the email. If you asked your subscribers to sign in using their Facebook credentials and, importantly, they gave you permission, you could collect a range of information like age, gender, birthplace, favourite brands and so on.
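Here is a minimal sketch of what “attaching” a profile to data you already hold might look like, using the newsletter example. The field names, the `permission_granted` flag and the `enrich_subscribers` helper are all invented for illustration; this is not Facebook’s API, just plain dictionaries matched on email address.

```python
def enrich_subscribers(subscribers, profiles):
    """Attach profile fields to subscriber records, only where permission was given.

    `profiles` maps an email address to a dict of profile fields plus a
    hypothetical `permission_granted` flag.
    """
    enriched = []
    for subscriber in subscribers:
        record = dict(subscriber)  # copy, so the original list is untouched
        profile = profiles.get(subscriber["email"])
        if profile and profile.get("permission_granted"):
            extra = {k: v for k, v in profile.items() if k != "permission_granted"}
            record.update(extra)
        enriched.append(record)
    return enriched


# What the newsletter already knows.
subscribers = [
    {"name": "Ana", "email": "ana@example.com"},
    {"name": "Ben", "email": "ben@example.com"},
]

# What a social sign-in might return (made-up values).
profiles = {
    "ana@example.com": {"permission_granted": True, "age": 34, "favourite_brands": ["Nike"]},
    "ben@example.com": {"permission_granted": False, "age": 41},
}

enriched = enrich_subscribers(subscribers, profiles)
print(enriched[0])  # Ana's record now carries age and favourite brands
print(enriched[1])  # Ben said no, so his record is unchanged
```

The permission check is the important line. Without it, the join still works technically, which is exactly the problem.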

Does anyone know how yet?

Yes, lots of people. It’s happening all the time. It’s been happening since the web was created. There will be a lot more of it.

What are the challenges and potential downfalls of using big data in a social media context?

I think the question might be “what are the challenges and potential downfalls of operating a data-centric predictive model that is able to offer extreme personalisation?”

Good question. There are some technical challenges but they pale into insignificance compared to the social challenge. I’m using social as in “society”. Some people appreciate Amazon making recommendations. Some people use Spotify precisely because it can help them discover music they like but had no idea existed. Both require personal data and algorithms operating on that data in order to make the recommendations. The challenge is to tread the fine line between something that is useful and friendly, and something that is downright creepy. Eric Schmidt at Google said a couple of years ago that “Google policy is to get right up to the creepy line and not cross it”. Here’s a creepy potential scenario.

Photo Credit: josep m. ganyet