Chapter 10: Data at Scale



The main goals of this chapter are to accomplish the following:

  • Provide an overview of some of the potential impacts of data at scale on society.
  • Introduce key methods for collecting data at scale.
  • Discuss how data at scale becomes meaningful.
  • Review key methods for visualizing and exploring data at scale.
  • Introduce design principles for making data at scale ethical.


How do you start your day? How much data do you encounter when first looking at your smartphone, switching on your laptop, or turning on another device? How much do you knowingly create and how much do you create unknowingly? Upon waking up, many people routinely ask their personal assistant something like, “Alexa, what is the weather today?” or “Alexa, what is the news?” or “Alexa, is the S-Bahn train to Schönefeld Airport running on time?” Or, they will ask Siri, “What is my first meeting?” or “Where is the meeting?”

Having oriented themselves for the day, people will walk a few blocks to the subway entrance, dip their MetroCard in the turnstile to pay the fare, exit the station at their stop, grab their favorite morning beverage at a nearby cafe, and proceed to their office, where they check in with their employee card at a security gate and take the elevator to their floor.

These are just a few of the things that many of us do to start our workdays. Each activity involves creating, searching, and storing data in some way or another. We may know that this is happening, we may suspect that it is happening, or we may be totally unaware of the data that we are generating and with which we are interacting.

There is also increasing concern about exactly what data is collected about us through personal assistants such as Amazon Echo, Google Home, Cortana, and Siri. We also know that many large cities, such as New York and London, have an enormous number of surveillance cameras (CCTV) spread around, especially in busy places such as subway stations and shopping malls. The video footage from these sources is kept for two weeks or more. Similarly, we experience being checked into an office, so we know that our movements are being tracked by security personnel. Our activities are also being tracked more surreptitiously through the technology that we use such as smartphones and credit cards.

What happens to all the data collected about us? How does it improve the services provided by society? Does it make traveling more efficient? Does it reduce traffic congestion? Does it make the streets safer? Moreover, how much of the data collected from our smartcards, smartphone Wi-Fi signals, and CCTV footage can be tracked back to us and pieced together to reveal a bigger picture of who we are and where we go? What might that data reveal about us?

Data at scale, or, as it is often called, big data, describes all kinds of data, including databases of numbers; images of people, things, and places; recordings of conversations; videos; texts; and environmentally sensed data (such as air quality). It is also being collected at an exponential rate; for example, 400 new YouTube videos are uploaded every minute, while millions of messages circulate through social media. Furthermore, sensors collect billions of bytes of scientific data.

Data at scale has huge potential for grounding and elucidating problems, and it can be collected, used, and communicated in a wide variety of ways. For example, it is increasingly being used for improving a whole range of applications in healthcare, science, education, city planning, finance, world economics, and other areas. It can also provide new insights into human behavior by analyzing data collected from people, such as their facial expressions, movements, gait, and tone of voice. These insights can be enhanced further by using machine learning and machine vision algorithms to make inferences about people’s emotions, intent, and well-being. Those inferences can then be used to inform technology interventions aimed at changing or improving people’s health and well-being. However, beyond societal benefits, data can also be used in potentially harmful ways.

As mentioned in Chapter 8, “Data Gathering,” and Chapter 9, “Data Analysis,” data can be either qualitative or quantitative. Some of the methods used to collect, analyze, and communicate data can be carried out manually or with quite simple tools. What makes this chapter on data at scale different is that it considers how huge volumes of data can be analyzed, visualized, and used to inform new interventions. While having access to large volumes of data enables analysts, designers, and researchers to address large, important issues such as climate change and world economic issues, assuming that there are tools to do this, it also raises a number of user concerns. These include whether someone’s privacy is being violated by the data being collected about them and whether the data corpora being used to make decisions about people, such as the provision of insurance and loans, are fair and transparent.

Furthermore, the combination of vast amounts of data from many sources and the availability of increasingly powerful tools to analyze that data is now making it possible to discover new information that is not available from any single data source. This is enabling new kinds of research to be conducted for understanding human behavior and environmental problems.