Humans and technology: breaking down giant data sets together

@IE University

Data is one of the main drivers of economic growth in the digital era. When properly harnessed, it can be the main source of revenue for many companies, generating thousands of dollars of income.

People often speak about data as the gold dust of the modern era. And in many ways it is, except for one key difference: the amount of it. Gold is scarce in nature, making it even more sought-after. Yet with data, companies are finding the opposite problem—there’s just so much of it!

Data sets: Go big or go home

The more data you have, the more valuable it is. This is because high volumes of data allow for better insights into human behavior, such as reasons for purchasing items, visiting a web page, switching to a new brand, etc. With these insights, you can make better decisions that ultimately improve the bottom line of your business.

In many ways, this is nothing new. Customer feedback forms have been around since time immemorial. If you owned a restaurant before the digital era, you would open the suggestion box at the end of the week and see what customers had said. You could sift through the papers and see that 11 customers said the soup was too salty. With the data you received, you could make a better decision about your business and rein in the seasoning (or even fire the chef).

Now, imagine the same restaurant manager sitting down and sorting through thousands or even millions of forms which give feedback on every aspect of the dining experience. Not just what the customer actively tells you, but the tables they naturally prefer, the most ordered items, how many dishes are being returned a day… the list goes on. Now imagine it’s not just confined to the restaurant, but all your shopping, browsing, and social habits over the course of your day, every day.


This is essentially what happens when we browse online, and on a gigantic scale. Every choice we make provides more insight into what we like, both collectively and individually. And the amount of data we’re amassing is simply mind-boggling. By 2025, there is expected to be a whopping 175 zettabytes of data. That’s 175 trillion gigabytes. Put another way, if you stored 175 zettabytes on discs, you’d have a stack that would reach the moon 23 times.
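To see where the "175 trillion gigabytes" figure comes from, here is a quick arithmetic check. It is only a sketch and assumes decimal (SI) units, where a zettabyte is 10^21 bytes and a gigabyte is 10^9 bytes; the function name is made up for illustration.

```python
# Assumption: decimal (SI) units, not binary ones.
BYTES_PER_ZETTABYTE = 10**21
BYTES_PER_GIGABYTE = 10**9


def zettabytes_to_gigabytes(zettabytes: int) -> int:
    """Convert zettabytes to gigabytes using integer arithmetic."""
    return zettabytes * BYTES_PER_ZETTABYTE // BYTES_PER_GIGABYTE


gigabytes = zettabytes_to_gigabytes(175)
print(f"{gigabytes:,} GB")  # 175,000,000,000,000 GB
```

One zettabyte works out to one trillion (10^12) gigabytes, so 175 zettabytes is 175 trillion gigabytes.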

Is it all about size?

You get the picture that big data is… big. But that’s not the only issue facing the individuals who have to break down giant data sets. There are three factors that generally define big data: volume, velocity, and variety. Not only is there a lot of information, but it comes at you at lightning speed and covers a wide range of topics.

So, how do we process it?

Spreading your data out

For many data sets, simple software is enough to process the data so you can understand it. But once we get into the realm of gigantic data sets consisting of a couple of terabytes or more, you can no longer process them on one machine. That’s when tools such as the Hadoop Distributed File System (HDFS) and Apache Spark come into play.


These are among the most popular open-source software products that allow you to use a network of many computers to share the load. If our restaurant manager is one server, the waiters and kitchen staff he asks to stay late to go through the mountain of forms are the network. Once the data has been distributed, the software gives you a huge set of tools you can use.
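The idea of sharing the load can be sketched in a few lines of Python. This is not the HDFS or Spark API; it is a toy illustration in which threads on a single machine stand in for the networked computers, and the feedback forms, function names, and worker count are all made up. Each "staff member" counts the comments in their own pile, and the partial counts are then merged into one total.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_piles(forms, workers):
    """Deal the forms out into one pile per worker (round-robin)."""
    return [forms[i::workers] for i in range(workers)]


def count_comments(pile):
    """One worker counts how often each comment appears in their pile."""
    return Counter(pile)


def tally(forms, workers=3):
    """Count each pile separately, then merge the partial counts."""
    piles = split_into_piles(forms, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(count_comments, piles))
    total = Counter()
    for partial in partial_counts:
        total.update(partial)  # update() adds counts together
    return total


# Hypothetical feedback forms, one comment per form.
forms = ["soup too salty"] * 11 + ["slow service"] * 4 + ["great dessert"] * 7
totals = tally(forms)
print(totals.most_common(1))  # [('soup too salty', 11)]
```

The result is the same however many workers share the piles; only the amount of work each one does changes. Distributed systems apply the same split, process, and merge pattern across separate machines.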

Planning and company culture

Of course, effectively breaking down data sets is about much more than just software or tools. These are essential to the process, but it takes a human-focused strategy working alongside the tech to really make it work.


Firstly, the company needs to set a proper goal. Makes sense, doesn’t it? You need to know what you want to achieve in order to work toward it. This involves identifying the problems that require solutions, and applying the data to solve them.

Secondly, we can’t forget about all of the employees spread across an organization. Many people think that data scientists are the only individuals who work with data, but in reality everyone does. There needs to be a cultural shift throughout the organization so that everyone is open and willing to use the fruits of data in their day-to-day work.


Focusing your data and remaining agile

So, you have your business goals and eager staff, but there’s just so much data. Up to 90% of your data may be unstructured, which means you have to concentrate on the data islands most relevant to your goals. But even after setting parameters, you will need to iterate, iterate, and iterate again.

That’s because the speed of business today means markets are rapidly evolving and growing more and more interconnected. With this change, new data opportunities constantly come to the fore. Good data analytics systems and technicians should always be ready to exploit them as and when new tech—and new opportunities—emerge.

Breaking down giant data sets to discover effective insights is an ongoing challenge for businesses. As we move forward, it will increasingly influence or even define the overall strategy of many companies. In this exciting environment, data scientists are emerging as increasingly important members of organizations who can leverage the enormous power of technology to break down data and produce truly amazing results.