Where Every Street Knows Your Name
This is by way of a "heads up."
Two main threads are combining just now to change the way that we live in public space. Of course, as Steve Rambam says (reprising Scott McNealy), "Privacy is Dead: Get Over It." We're used to having less privacy when we drive with GPS, use a cell phone, or find out that a business has compromised our personal information. Even The Onion Router (Tor) reportedly has many compromised exit nodes, and there have been reports of weakened RSA encryption. So good luck keeping our affairs private.
Still, we're used to having urban anonymity. People we pass on the street don't necessarily know what we're up to or who we are by name. Neither do the businesses we pass or the police. This is probably going to change, though. When it changes, it will probably be big platforms that do it.
Machine Learning and Big Datasets
Machine learning trains on big data. Some of this training uses combinations of fairly basic algorithms, but most of what we have heard about since about 2012 involves "neural networks." These are arranged as layers of stored numbers, provocatively called "neurons." There is an input layer of neurons and an output layer. In the classic case of recognizing handwritten digits, the input layer is a 28x28 grid of pixels (784 values) in different shades of gray, and the output layer has ten neurons, one for each digit. In between there may be 1, 2, or even 5,000 "hidden" layers of neurons, which, again, are just stored numbers. Typically, every input neuron connects to every neuron in the next layer, each of those connects to every neuron in the layer after that, and so on all the way to the output layer.
Each connection carries a "weight," a multiplier that scales the value one neuron sends to a neuron in the next layer. That is, pixel number 1 might contribute .25 times its value to the first neuron in the next layer, .3 times its value to the second, and so on. Each receiving neuron adds up its weighted inputs, adds a "bias" (which is another number), and squashes the total into an activation, in the classic setup a value between 0 and 1. So it's all a colossal matrix operation. The "learning" consists of adjusting the weights and biases until the outputs come out right. There's an excellent YouTube explanation, summarized below.
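For readers who like to see the arithmetic, here is a minimal sketch of one pass through such a network in plain Python. The layer sizes (two hidden layers of 16), the random starting weights, the sigmoid squashing function, and the random stand-in "image" are all assumptions chosen for illustration; nothing here is trained, so the guess it prints is meaningless.

```python
import math
import random


def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def make_layer(n_in, n_out, rng):
    """Small random weights (one row per receiving neuron) and zero biases."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def forward(pixels, layers):
    """Propagate one image through every layer; return the output activations."""
    activations = pixels
    for weights, biases in layers:
        # Each receiving neuron: weighted sum of inputs, plus bias, then squash.
        activations = [
            sigmoid(sum(w * a for w, a in zip(row, activations)) + b)
            for row, b in zip(weights, biases)
        ]
    return activations


rng = random.Random(0)
# Assumed sizes: 784 input pixels, two hidden layers of 16, ten outputs.
sizes = [28 * 28, 16, 16, 10]
layers = [make_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]

image = [rng.random() for _ in range(28 * 28)]  # stand-in for a scanned digit
outputs = forward(image, layers)
guess = max(range(10), key=lambda digit: outputs[digit])
print("untrained guess:", guess)
```

The digit whose output neuron ends up with the largest activation is the network's guess. Even this toy has more than 12,000 weights, which gives a sense of why the real thing is called a colossal matrix operation.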
So a 28x28 scan of a handwritten "6" propagates all the way through, say, 4 layers, and maybe it activates the "5" output neuron the most. This is an error. Since the handwritten "6" comes labeled with a "6" that a computer can read, the machine compares its output to the known answer and computes a number that measures how wrong it was. Plotted against all the weights and biases, that error forms what is called an error surface, because it can be visualized as a multi-dimensional graph. The goal is to move downhill on that surface. So the error is sent backward through the neural network, nudging each weight and bias in the direction that reduces it, and the network tries again. The guesses go forward through the network, the errors propagate back, and the cycle repeats. Needless to say, this takes a LOT of tries and a HUGE amount of labeled data. The MNIST dataset used in this case is pretty big: 70,000 labeled images, 60,000 of them for training. It is derived from data collected by the National Institute of Standards and Technology. Kids require far fewer example letters and numbers than computers do. If you are going to recognize human emotions or car models or tell cats from dogs, you need colossal amounts of labeled data. That means you tend to need big platforms that can collect such huge datasets.
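The forward-and-back loop can be sketched on a toy problem. This is a hedged illustration, not the MNIST setup: the four labeled examples, the three hidden neurons, the learning rate of 0.5, and the 2,000 passes are all assumptions picked to keep the code short.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


# Toy labeled data (an assumption): two "pixels" in, one known answer out.
DATA = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]


def make_net(hidden, rng):
    """One hidden layer and a single output neuron, with small random weights."""
    return {
        "w1": [[rng.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(hidden)],
        "b1": [0.0] * hidden,
        "w2": [rng.uniform(-0.5, 0.5) for _ in range(hidden)],
        "b2": 0.0,
    }


def forward(net, x):
    """The guess goes forward: return hidden activations and the output."""
    hidden = [
        sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(net["w1"], net["b1"])
    ]
    out = sigmoid(sum(w * h for w, h in zip(net["w2"], hidden)) + net["b2"])
    return hidden, out


def total_error(net):
    """Half the squared difference between guess and label, summed over the data."""
    return sum(0.5 * (forward(net, x)[1] - y) ** 2 for x, y in DATA)


def train_step(net, x, y, rate=0.5):
    """The error goes backward: nudge every weight and bias downhill."""
    hidden, out = forward(net, x)
    # Slope of the error with respect to the output neuron's weighted sum.
    delta_out = (out - y) * out * (1.0 - out)
    # Each hidden neuron's share of the blame, computed with the old weights.
    delta_hidden = [delta_out * w * h * (1.0 - h) for w, h in zip(net["w2"], hidden)]
    for j, h in enumerate(hidden):
        net["w2"][j] -= rate * delta_out * h
        net["b1"][j] -= rate * delta_hidden[j]
        for i, xi in enumerate(x):
            net["w1"][j][i] -= rate * delta_hidden[j] * xi
    net["b2"] -= rate * delta_out


net = make_net(hidden=3, rng=random.Random(1))
before = total_error(net)
for _ in range(2000):  # many tries, even for four examples
    for x, y in DATA:
        train_step(net, x, y)
after = total_error(net)
print(f"error before: {before:.3f}, after: {after:.3f}")
```

Even this tiny network needs thousands of passes over its four examples, which hints at why recognizing faces or emotions demands so much labeled data.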
The lesson, then, is that whatever you want a computer to learn has to come with a LOT of examples from a big source of data: think Google or Facebook, but probably not the corner store.
Learning Behavior Patterns
Computers can, of course, recognize certain images. They can also recognize behaviors. When a computer plays chess or Tetris or Go or whatever, it's recognizing behaviors and responding accordingly. So when the Amazon Go store in Seattle sees that a customer has taken a salad, it recognizes the behavior of taking an item. Let's think about this for just a moment. A computer somewhere figures out your intention and attributes it to you with enough accuracy that it neither generates a customer complaint nor fails to record a sale. What else can computers recognize?
- Faces and bodies. We all know about this from police procedurals on TV, but the UK is using facial recognition in real time to find soccer hooligans. Right now there are lots of false matches, but the more you know about a person the better you can recognize them. If you can get people to walk down a hallway, you can add stature, gait, and mannerisms to help remove false positives.
- Behaviors recognized. Some cities in China are using cameras to shame jaywalkers. Some systems entering the paid market can pan to find vehicles, detect different kinds of vehicles, find people walking where they should or shouldn't, detect loiterers, and figure out which direction people are facing -- or if they are looking at something in particular. Transit agencies can count riders. By using large amounts of data, the transit agencies or police could detect anomalous riders.
- Behaviors correlated. This is an extension of traditional policing. Police, for example, may be able to justify a search warrant when several white people arrive at an apartment building with only Black tenants and spend less than 10 minutes inside, depending on how the Supreme Court rules.* Some techniques for discovering people's location don't rely on GPS; location can be inferred by looking at camera data or by triangulating wireless signals (particularly as we move to smaller 5G cells).
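To make the last point concrete, here is a rough sketch of locating a phone from wireless signals alone. It assumes three antennas at known positions and a distance estimate to each one (strictly speaking that is trilateration, a close cousin of triangulation). The coordinates, the function name, and the perfectly accurate distances are hypothetical; real signal-based estimates are noisy.

```python
import math


def trilaterate(towers, distances):
    """Estimate an (x, y) position from three known points and distances to them."""
    (x1, y1), (x2, y2), (x3, y3) = towers
    r1, r2, r3 = distances
    # Subtracting the first circle's equation from the other two leaves
    # two linear equations of the form a*x + b*y = c.
    a1, b1 = 2 * (x2 - x1), 2 * (y2 - y1)
    c1 = r1**2 - r2**2 + x2**2 - x1**2 + y2**2 - y1**2
    a2, b2 = 2 * (x3 - x1), 2 * (y3 - y1)
    c2 = r1**2 - r3**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a1 * b2 - a2 * b1
    if det == 0:
        raise ValueError("the three towers must not lie on one line")
    # Solve the 2x2 system with Cramer's rule.
    return (c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det


# Hypothetical antenna positions in meters, and a phone standing at (30, 40).
towers = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0)]
phone = (30.0, 40.0)
distances = [math.dist(phone, tower) for tower in towers]

x, y = trilaterate(towers, distances)
print(f"estimated position: ({x:.1f}, {y:.1f})")
```

The closer together the antennas are, the less a given error in distance matters, which is one reason smaller cells make this kind of tracking easier.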
Prospects
The Scary. Data about us can be correlated. Obviously, most of the data that we generate doesn't have to do with our physical position in space. It's often collected without our knowledge as we browse. Moreover, certain groups like the ACLU and AI Now** are concerned with correlation and tracking that affects civil liberties. We New Urbanists may ask, "How does being tracked affect our behavior?" If you are exploring a new city, will the police question your reason for exploring back streets? If it is technically possible to identify what people do in a privately owned public open space, might we be billed for sitting in one? Researchers writing in Neurocomputing have developed ways to identify behavior such as chatting in a group. We might become more circumspect about who we talk to in public: is he a criminal? is she a terrorist?
The Exciting. While most of this tracking and behavior recognition can be misused, some of it could be very useful. Maybe we can feel safer at night if a system can identify a mugging and sound the alarm. We might be more comfortable walking in a parking garage if an attendant can chat with us along the way. If we do things right, we might be able to recover a little bit of a small-town feel. We might know who's home, whom we might run into, and whether something interesting or fun is happening. We might get warned away from altercations the way Waze warns us about accidents.
Ultimately, the combination of interactions in physical space and online may get very weird. We might get too much of a good thing if companies mine real-world interactions for addictive dopamine hits as they do online ones. While the prospect of an addictive public realm might seem scary and remote, we should keep an eye out for a strange future for public space.
* Personal experience working with the police when I worked at a nonprofit.
** Note, the article's link to AI Now is incorrect. See AI Now