Chapter 10: Data at Scale
The main goals of this chapter are to accomplish the following:
- Provide an overview of some of the potential impacts of data at scale on society.
- Introduce key methods for collecting data at scale.
- Discuss how data at scale becomes meaningful.
- Review key methods for visualizing and exploring data at scale.
- Introduce design principles for making data at scale ethical.
How do you start your day? How much data do you encounter when first looking at your
smartphone, switching on your laptop, or turning on another device? How much do you
knowingly create, and how much do you create unknowingly? Upon waking up, many people
routinely ask their personal assistant something like, “Alexa, what is the weather today?”
or “Alexa, what is the news?” or “Alexa, is the S-Bahn train to Schönefeld Airport running on
time?” Or, they will ask Siri, “What is my first meeting?” or “Where is the meeting?”
Having oriented themselves for the day, people will walk a few blocks to the subway
entrance, dip their Metro Card in the turnstile to pay the fare, exit the station at their stop,
grab their favorite morning beverage at a nearby cafe, and proceed to their office where they
check in with their employee card at a security gate and take the elevator to their floor.
These are just a few of the things that many of us do to start our workdays. Each activity
involves creating, searching, and storing data in some way or another. We may know that this
is happening, we may suspect that it is happening, or we may be totally unaware of the data
that we are generating and with which we are interacting.
There is also increasing concern about exactly what data is collected about us through
personal assistants such as Amazon Echo, Google Home, Cortana, and Siri. We also know
that many large cities, such as New York and London, have an enormous number of surveillance
cameras (CCTV) spread around, especially in busy places such as subway stations
and shopping malls. The video footage from these sources is kept for two weeks or more.
Similarly, we experience being checked into an office, so we know that our movements are
being tracked by security personnel. Our activities are also being tracked more surreptitiously
through the technology that we use such as smartphones and credit cards.
What happens to all the data collected about us? How does it improve the services provided
by society? Does it make traveling more efficient? Does it reduce traffic congestion?
Does it make the streets safer? Moreover, how much of the data collected from our smartcards,
smartphone Wi-Fi signals, and CCTV footage can be tracked back to us and pieced
together to reveal a bigger picture of who we are and where we go? What might that data
reveal about us?
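As a rough illustration of how separate records might be pieced together, the following Python sketch counts how often a smartcard and a Wi-Fi device are seen at the same place at about the same time. All of the identifiers, stations, and timestamps are invented for the example, and the two-minute window is an arbitrary assumption; it is a minimal sketch of the idea, not a description of how any real transit or Wi-Fi system works.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical, made-up records: (identifier, place, timestamp).
card_taps = [
    ("card-17", "Station A", "2024-03-04 08:02"),
    ("card-17", "Station B", "2024-03-04 08:31"),
    ("card-42", "Station A", "2024-03-04 08:03"),
    ("card-17", "Station A", "2024-03-05 08:01"),
]
wifi_sightings = [
    ("device-x", "Station A", "2024-03-04 08:03"),
    ("device-x", "Station B", "2024-03-04 08:30"),
    ("device-y", "Station A", "2024-03-04 08:04"),
    ("device-x", "Station A", "2024-03-05 08:02"),
]

# Assumed tolerance for treating two events as happening "together".
WINDOW = timedelta(minutes=2)


def parse(timestamp):
    """Convert a 'YYYY-MM-DD HH:MM' string into a datetime object."""
    return datetime.strptime(timestamp, "%Y-%m-%d %H:%M")


def co_occurrences(taps, sightings, window=WINDOW):
    """Count how often each (card, device) pair appears at the same
    place within the given time window."""
    counts = Counter()
    for card, card_place, card_time in taps:
        for device, device_place, device_time in sightings:
            same_place = card_place == device_place
            close_in_time = abs(parse(card_time) - parse(device_time)) <= window
            if same_place and close_in_time:
                counts[(card, device)] += 1
    return counts


links = co_occurrences(card_taps, wifi_sightings)
for (card, device), count in links.most_common():
    print(card, device, count)
```

In this invented data, one card and one device coincide far more often than any other pair, which hints that they belong to the same person even though neither source names that person. Individually unremarkable records can become revealing in this way once they are combined.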
Data at scale, or, as it is often called, big data, describes all kinds of data, including databases
of numbers, images of people, things, and places, recordings of conversations,
videos, texts, and environmentally sensed data (such as air quality). It is also being collected
at an exponential rate; for example, 400 new YouTube videos are uploaded every minute,
while millions of messages circulate through social media. Furthermore, sensors collect
billions of bytes of scientific data.
Data at scale has huge potential for grounding and elucidating problems, and it can be
collected, used, and communicated in a wide variety of ways. For example, it is increasingly
being used for improving a whole range of applications in healthcare, science, education,
city planning, finance, world economics, and other areas. It can also provide new insights
into human behavior by analyzing data collected from people, such as their facial expressions,
movements, gait, and tone of voice. These insights can be enhanced further by using
machine learning and machine vision algorithms to make inferences about people’s
emotions, intent, and well-being. Such inferences can then be used to inform technology interventions
aimed at changing or improving people’s health and well-being. However, beyond
societal benefits, data can also be used in potentially harmful ways.
As mentioned in Chapter 8, “Data Gathering,” and Chapter 9, “Data Analysis,” data can
be either qualitative or quantitative. Some of the methods and tools used to collect, analyze,
and communicate data can be carried out manually or using quite simple tools. What makes
this chapter on data at scale different is that it considers how huge volumes of data can be
analyzed, visualized, and used to inform new interventions. Access to large volumes
of data enables analysts, designers, and researchers to address large, important issues
such as climate change and the world economy, assuming that there are tools to do this, but
it also raises a number of user concerns. These include whether someone’s privacy is being
violated by the data being collected about them and whether the data corpora being used
to make decisions about people, such as the provision of insurance and loans, are fair.
Furthermore, the combination of vast amounts of data from many sources and the availability
of increasingly powerful data analytic tools to analyze that data is now making it
possible to discover new information that is not available from any single data source. This
is enabling new kinds of research to be conducted for understanding human behavior and