Chapter 10: Data at Scale

Chapter Introduction

Objectives

The main goals of this chapter are to accomplish the following:

Provide an overview of some of the potential impacts of data at scale on both individuals and society.
Introduce key methods for collecting data at scale.
Discuss how data at scale is used in interaction design.
Review key methods for visualizing and exploring data at scale.
Introduce privacy and other ethical design concerns with data at scale and AI.

Introduction

What digital technologies do you use when you travel into the city for a day out with friends? Do you plan ahead using a variety of resources? Make a reservation for a restaurant for lunch? Think about buying tickets in advance? Do you create a WhatsApp group to do the planning, for example, to decide where and when to meet up with your friends? Do you check when the new museum you all want to go to is open, read reviews of its current exhibition, and find out how much it costs? Having done the initial planning, do you then purchase a train ticket through a train app on your phone and check whether the train you are planning to catch is running on time? Do you think about what to wear and whether you need to take an umbrella? Do you perhaps ask a personal assistant, such as Alexa, “What is the weather today?”

Having made their plans, most people will then walk, cycle, or take a ride-share requested via an app to their train station, present their phone with the QR code on their digital train ticket to the reader at the turnstile, and take a seat on the train. Most trains provide Wi-Fi, so people will often check their social media and newsfeeds or play a game on their phone. They may also keep in touch with the friends they are meeting to see where they are, maybe tracking their locations using Google Maps. On reaching their destination, they exit the station by tapping their phone at the turnstile. They may need to use a smartphone map app to navigate to the museum and may also take selfies on the journey.

These are just a few of the things that many people do when visiting a city with friends. Several of these activities involve creating, searching, and storing data in one way or another. People may know that this is happening, may suspect that it is happening, or may be totally unaware of the data that they are generating and how it is being used, as well as the data with which they are interacting. There is also increasing concern about exactly what data is collected about people through their interactions with personal assistants, such as Amazon Echo, Google Home, Cortana, and Siri, and from their social media conversations. Cities such as New York and London have extensive networks of surveillance cameras (CCTV), especially in busy places such as subway stations and shopping malls. The video footage from these sources is typically kept for two weeks or more. Similarly, when people check in at a station ticket barrier, their movements are tracked. Their activities are also recorded through many of the apps on their smartphones, such as fitness trackers, payment systems, and social media.

What happens to all the data collected about them? How does it improve the services provided by society? Does it make traveling more efficient? Does it make the streets safer? Moreover, how much of the data collected from smartcards, smartphone Wi-Fi signals, social media, and CCTV footage can be traced back to them and pieced together to reveal a bigger picture of who they are and where they go? What might that data reveal about society?

Data at scale, often called big data, describes all kinds of data, including databases of numbers; images of people, things, and places; recordings of conversations; videos; texts; and environmentally sensed data (such as air quality). It is also being collected at a tremendous rate; for example, 500 hours of video are uploaded to YouTube every minute, while millions of messages circulate through social media. Furthermore, sensors placed in cities, homes, public transport, and parks collect enormous amounts of environmental data.

Data at scale has huge potential for grounding and elucidating problems, and it can be collected, used, and communicated in a wide variety of ways. For example, it is increasingly being used to improve a whole range of applications in healthcare, science, education, city planning, finance, world economics, and other areas. With the use of machine learning, it can also provide new insights into human behavior by analyzing data collected from people, such as their facial expressions, movements, gait, and tone of voice. This includes inferring people’s emotions, intent, and well-being, which can then be used to inform technology interventions aimed at changing or improving people’s health and well-being. However, alongside these societal benefits, data can also be used in potentially harmful ways. One example is the misuse of data that reveals someone’s gender, race, approximate age, where and for how long they have been looking at something, and what emotional state they are in. This type of wide-reaching information could be used to mistakenly identify someone as a criminal or to show inappropriate ads on their phone or computer. Another nefarious use of data collected from individuals’ use of social media, online services, and apps is to target people with fake news to encourage them to vote in a particular way or to scam them into divulging personal information about their finances.

Having access to large volumes of data enables analysts, designers, and researchers to address large, important issues such as climate change and global economic problems, assuming that there are tools to do this. However, it also raises a number of societal concerns. These include whether someone’s privacy is being violated by the data being collected about them and whether the data corpora being used to make decisions about people, such as the provision of insurance and loans, are fair and transparent.

The combination of vast amounts of data from many sources and the availability of increasingly powerful analytic tools is now making it possible to discover new information that is not available from any single data source. This is enabling new kinds of research into human behavior and environmental problems.