Ethics

On November 19, the NTSB held a public board meeting on the 2018 Uber accident in Tempe, Arizona, involving an “automated” (actually level 3) Uber-operated Volvo SUV. A pedestrian, Elaine Herzberg, was killed in the accident. In the wake of the report, it is a good time to come back to level 3 cars and the question of “safety drivers.”

Given that the purpose of the meeting was to assign blame, media outlets were quick to pick a culprit for their headlines: the “safety driver” who kept looking at her phone? The sensors, which detected all kinds of things but never a person? Uber, which deactivated the OEM’s emergency braking? Or maybe Uber’s “safety culture”? A whole industry’s?

The Board actually blames all of them, steering clear of singling out one event or actor. It is probably the safest and most reasonable course of action for a regulator, and it has relevant implications for how law enforcement will handle accidents involving AVs in the future. But because we are humans, we may stick more strongly with the human part of the story, that of the safety driver.

She was allegedly looking at her phone, “watching TV” as one article put it, following the latest episode of The Voice. The Board determined that she looked at the road one second before the impact. That is short, but under more normal circumstances, enough to slam on the brakes. Maybe her foot was far from the pedal; maybe she just did not react because she was not in an “aware” state of mind (“automation complacency,” the report calls it). In any case, it was her job to watch the road, and she was violating Uber’s policy by using her phone while working as a safety driver.

At the time of the accident, the Tempe police released footage from the dash cam, the few seconds leading up to the impact, showing a poorly lit street. The relevance of this footage was then disputed in an Ars Technica article which aims to demonstrate how well lit the street actually is, and how the headlights of the car alone should have made the victim visible in time. Yet I think it is too easy to put the blame on the safety driver. She was not doing her job, but what kind of job was it? Humans drive reasonably well, but that’s when we’re actually driving, not sitting in the driver’s seat with nothing to do but wait for something to jump out of the roadside. Even if she had been paying attention, injury was reasonably foreseeable. And even if she had been driving in broad daylight, there remains a more fundamental problem beyond safety driver distraction.

“The [NTSB] also found that Uber’s autonomous vehicles were not properly programmed to react to pedestrians crossing the street outside of designated crosswalks,” one article writes. I find that finding somewhat more appalling than the finding that the safety driver was distracted. Call that human bias; still, I do not expect machines to be perfect. But what this tells us is that stricter monitoring of safety drivers’ cellphone usage will not cut it either, if the sensors keep failing. The sensors need to be able to handle this kind of situation. A car whose sensors cannot recognize a slowly crossing pedestrian (anywhere, even in the middle of the highway) has no place on a 45-mph road, period.
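To make that point concrete, here is a deliberately simplified Python sketch – not Uber’s actual software; the class, fields and threshold are all invented for illustration – contrasting a braking rule that only fires for a “pedestrian near a crosswalk” with a class-agnostic rule that reacts to anything predicted to enter the vehicle’s path:

```python
# Illustrative only: contrasts a braking rule gated on "pedestrian near a
# crosswalk" with one that reacts to any object about to enter the path.
from dataclasses import dataclass

@dataclass
class TrackedObject:
    label: str               # e.g. "pedestrian", "bicycle", "vehicle", "other"
    seconds_to_path: float   # predicted time until it enters the vehicle's path
    near_crosswalk: bool

BRAKING_HORIZON_S = 3.0      # assumed planning horizon, purely illustrative

def brake_crosswalk_gated(obj: TrackedObject) -> bool:
    """A rule of the kind the NTSB finding suggests: jaywalkers fall through."""
    return (obj.label == "pedestrian" and obj.near_crosswalk
            and obj.seconds_to_path < BRAKING_HORIZON_S)

def brake_class_agnostic(obj: TrackedObject) -> bool:
    """Anything predicted to enter the path within the horizon triggers braking."""
    return obj.seconds_to_path < BRAKING_HORIZON_S

# A slowly crossing pedestrian, mid-block, whose classification keeps flickering:
jaywalker = TrackedObject(label="other", seconds_to_path=2.0, near_crosswalk=False)
print(brake_crosswalk_gated(jaywalker))   # False -- no reaction
print(brake_class_agnostic(jaywalker))    # True  -- brakes
```

The point is not the code itself but the design choice it encodes: whether the car reacts depends either on what and where the object is believed to be, or simply on whether it is about to be in the way.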

If there is one thing this accident has shown, it is that “safety drivers” add little to the safety of AVs. It’s a coin flip: sometimes the reactivity and skill of the driver make up for the sensor failure; in other cases, a distracted, “complacent” driver (for any reason, phone or other) does not. It is safe to say that the overall effect on safety is at best neutral. Worse still: it may provide a false sense of safety to the operator, as it apparently did here. This, in turn, prompts us to think about level 3 altogether.

While Uber has stated that it has “significantly improved its safety culture” since the accident, the question of the overall safety of these level 3 cars remains. And beyond everything Uber can do, one may wonder whether such accidents are not bound to repeat themselves should level 3 cars see mass commercial deployment. Humans are not reliable “safety drivers.” And in a scenario that involves such drivers, it takes much less than the deadly laundry list of failures we had here for such an accident to happen. Being complacent may also mean that your foot is not close to the pedals, or that your hands are not “hovering above the steering wheel” as they (apparently) should be. The extra half second it takes to slam the brakes or grip the wheel is time enough to turn serious injury into death.

The paramount error here was to rely on a human, a person Uber should have known would be distracted or less responsive than an average driver, as the final safeguard against sensor failure. Not long ago, many industry players were concerned about early standardization. Now that some companies are out there, moving fast and literally breaking people (not even things, mind you!), the time has come to seriously discuss safety and testing standards, at the US federal and, why not, international level.

A University of Michigan Law School Problem Solving Initiative class on AV standardization will take place during the Winter semester of 2020, with deliverables in April. Stay tuned!

An important development in the artificial intelligence space occurred last month with the Pentagon’s Defense Innovation Board releasing draft recommendations [PDF] on the ethical use of AI by the Department of Defense. The recommendations, if adopted, are expected to “help guide, inform, and inculcate the ethical and responsible use of AI – in both combat and non-combat environments.”

For better or for worse, a predominant debate around the development of autonomous systems today revolves around ethics. By definition, autonomous systems are predicated on self-learning and reduced human involvement. As Andrew Moore, head of Google Cloud AI and former dean of computer science at Carnegie Mellon University, defines it, artificial intelligence is just “the science and engineering of making computers behave in ways that, until recently, we thought required human intelligence.”

How then do makers of these systems ensure that the human values that guide everyday interactions are replicated in decisions that machines make? The answer, the argument goes, lies in coding ethical principles that have been tested for centuries into otherwise “ethically blind” machines.

Critics of this argument posit that the recent trend of researching and codifying ethical guidelines is just one way for tech companies to avoid government regulation. Major companies like Google, Facebook and Amazon have all either adopted AI charters or established committees to define ethical principles. Whether these approaches are useful is still open to debate. One study, for example, found that priming software developers with ethical codes of conduct had “no observed effect” [PDF] on their decision making. Does this mean that the whole conversation around AI and ethics is moot? Perhaps not.

In the study and development of autonomous systems, the content of ethical guidelines is only as important as the institution adopting them. The primary reason ethical principles adopted by tech companies are met with cynicism is that they are voluntary and do not in and of themselves ensure implementation in practice. On the other hand, when similar principles are adopted by institutions that treat the prescribed codes as red lines and have the legal authority to enforce them, these ethical guidelines become massively important documents.

The Pentagon’s recommendations – essentially five high-level principles – must be lauded for moving the conversation in the right direction. The draft document establishes that AI systems developed and deployed by the DoD must be responsible, equitable, traceable, reliable, and governable. Of special note among these are the calls to make AI traceable and governable. Traceability in this context refers to the ability of a technician to reverse engineer the decision-making process of an autonomous system and glean how it arrived at the conclusion it did. The report calls this “auditable methodologies, data sources, and design procedure and documentation.” Governable AI similarly requires systems to be developed with the ability to “disengage or deactivate deployed systems that demonstrate escalatory or other behavior.”
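To give a rough idea of what these two principles could look like in practice – a minimal sketch under my own assumptions, not anything the DIB document prescribes – every decision below is written to an audit trail together with its inputs and model version (traceability), and an operator-controlled switch can disengage the system (governability):

```python
# Hypothetical sketch (my own construction, not the DIB's) of a "traceable" and
# "governable" wrapper: decisions are logged with their inputs and provenance,
# and a human operator can deactivate the system.
import json
import time

class GovernableModel:
    def __init__(self, model, model_version, audit_path):
        self.model = model                # any callable: inputs -> decision
        self.model_version = model_version
        self.audit_path = audit_path      # where the audit trail is appended
        self.active = True

    def deactivate(self, reason):
        """Governability: an operator can disengage the deployed system."""
        self.active = False
        self._log({"event": "deactivated", "reason": reason})

    def decide(self, inputs):
        if not self.active:
            raise RuntimeError("system has been deactivated by an operator")
        decision = self.model(inputs)
        # Traceability: record inputs, output and model version for later audit.
        self._log({"event": "decision", "inputs": inputs,
                   "decision": decision, "model_version": self.model_version})
        return decision

    def _log(self, record):
        record["timestamp"] = time.time()
        with open(self.audit_path, "a") as f:
            f.write(json.dumps(record) + "\n")

# Usage with a trivial stand-in model:
gm = GovernableModel(lambda x: "flag" if x["score"] > 0.9 else "pass",
                     model_version="demo-0.1", audit_path="audit.log")
gm.decide({"score": 0.95})
gm.deactivate("escalatory behavior observed")
```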

Both of these aspects are frequently the most overlooked in conversations around autonomous systems, and yet they are critical for ensuring reliability. They are also likely to be the most contested, as questions of accountability arise when machines malfunction, as they are bound to do. And they are likely to make “decision made by algorithm” a less viable defense when creators of AI are confronted with questions of bias and discrimination – as Apple and Goldman Sachs’ credit limit-assigning algorithm recently was.

While the most direct application of the DoD’s principles is in the context of lethal autonomous weapon systems, their relevance will likely be felt far and wide. The private technology companies currently soliciting and building autonomous systems for military use – such as Microsoft, with its $10 billion JEDI contract to overhaul the military’s cloud computing infrastructure, and Amazon, whose facial recognition system is used by law enforcement – will likely have to invest in building new fail-safes into their systems to comply with the DoD’s recommendations. These efforts will likely have a bleed-through effect on systems being developed for civilian use as well. The DoD is certainly not the first institution to adopt these principles. Non-governmental bodies such as the Institute of Electrical and Electronics Engineers (IEEE) – the largest technical professional organization in the world – have also called [PDF] for the adoption of standards around transparency and accountability in AI, to provide “an unambiguous rationale” for all decisions taken. While the specific questions around which ethical principles can be applied to machine learning will continue for the foreseeable future, the Pentagon’s draft could play a key role in moving the needle forward.

The “Trolley Problem” has been buzzing around for a while now, so much so that it has become the subject of large empirical studies aiming to find a solution as close to “our values” as possible, and, more casually, of an episode of The Good Place.

Could it be, however, that the trolley problem isn’t one? In a recent article, the EU Observer, an investigative not-for-profit outlet based in Brussels, lashed out at the European Commission for its “tunnel vision” with regard to CAVs and how it seems to embrace the benefits of this technological and social change without an ounce of doubt or skepticism. While there are certainly things to be worried about when it comes to CAV deployment (see previous posts from this very blog by fellow bloggers here and here), the famed trolley might not be one of them.

The trolley problem seeks to illustrate one of the choices that a self-driving algorithm must – allegedly – make. Faced with a situation where the only alternative to killing is killing, the trolley problem asks who is to be killed: the young? The old? The pedestrian? The foreigner? Those who put forward the trolley problem usually do so to show that, as humans, we are faced with morally untenable alternatives when coding algorithms, like deciding who is to be saved in an unavoidable crash.

The trolley problem is not a problem, however, because it makes a number of assumptions – too many. The result is a hypothetical scenario which is simple, almost elegant, but mostly blatantly wrong. One such assumption is the rails. Not necessarily the physical ones, like those of actual trolleys, but the ones on which the whole problem is cast. CAVs are not on rails, in any sense of the word, and their algorithms will include the opportunity to go “off-rails” when needed – like getting onto the shoulder or the sidewalk. The rules of the road already incorporate a certain amount of flexibility, and such flexibility will be built into the algorithms.

Moreover, the very purpose of the constant sensor input processed by the driving algorithm is precisely to avoid putting the CAV in a situation where the only remaining options are collision or collision.

But what if? What if a collision is truly unavoidable? Even then, it is highly misleading to portray CAV algorithm design as a job where one has to write a piece of code specific to every single decision to be made in the course of driving. The CAV will never be faced with an input of the kind in which we all too often present the trolley problem: go left and kill this old woman, go right and kill this baby. The driving algorithm will certainly not understand the situation as one where it would kill someone; it may understand that a collision is imminent and that multiple paths are closed. What would it do, then? Brake, I guess, and steer to try to avoid the collision, like the rest of us would.
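For what it is worth, here is a purely hypothetical Python sketch of that kind of emergency logic – the trajectories, probabilities and severity proxy are all invented: the planner brakes and picks whatever path (including “off-rails” options like the shoulder) minimizes expected impact severity, with no representation whatsoever of who might be harmed:

```python
# Purely hypothetical sketch: an emergency maneuver chosen by minimizing a
# crude expected-severity score, with no notion of "whom to kill."
candidate_trajectories = {
    # name: (predicted collision probability, predicted impact speed in m/s)
    "stay_in_lane":       (0.90, 12.0),
    "swerve_left":        (0.60,  9.0),
    "swerve_to_shoulder": (0.15,  5.0),   # the "off-rails" option
}

def emergency_maneuver(trajectories):
    """Brake hard, then pick the path with the lowest expected impact severity."""
    def expected_severity(name):
        p_collision, impact_speed = trajectories[name]
        return p_collision * impact_speed ** 2   # crude severity proxy
    return min(trajectories, key=expected_severity)

print(emergency_maneuver(candidate_trajectories))  # -> "swerve_to_shoulder"
```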

Maybe what the trolley problem truly reveals is that we are uneasy with automated cars causing accidents – that is, because they are machines, we are much more comfortable with the idea that they will be perfect and will be coded so that no accident may ever happen. If, as a first milestone, CAVs are as safe as human drivers, that would certainly be a great scientific achievement. I recognize, however, that it might not be enough for public perception, but that speaks more to our relationship with machines than to any truth behind the murderous trolley. All in all, it is unfortunate that such a problem continues to keep brains busy while there are more tangible problems (such as what to do with all those batteries) that deserve research, media attention and political action.