Today, you can hail a robotaxi in San Francisco or Los Angeles or Las Vegas, but not in New York or Toronto or London.
There are several reasons for this, but the principal one is weather. In the American sunbelt, it rains only occasionally and it never snows. Elsewhere in the United States, and in much of the world, winter weather creates challenging road conditions for months at a time.
Until—and unless—automated driving systems (ADS) can handle winter weather, most of the world will never have self-driving vehicles, or at least won’t have them at scale. So building ADS that can operate safely and consistently in winter conditions is immensely important. Doing so requires overcoming three challenges.
First, winter weather fundamentally changes how an ADS perceives its environment. Snow covers lane markings, alters the appearance of roads and signs, and transforms familiar landscapes into unrecognizable terrain. Second, precipitation and ice degrade sensor performance. Cameras struggle with visibility, lidar signals get scattered by snowflakes and fog, and only radar maintains consistent performance in these conditions. Third, winter introduces entirely new driving challenges. Snow and ice change how vehicles perform, create unpredictable road conditions, and require different driving strategies.
Worst of all, these challenges converge: at precisely the moment when an ADS’s sensors are providing degraded data, its algorithms are struggling to interpret an unfamiliar environment, and it is encountering unexpected behaviors, the system needs to make decisions faster than ever. In slippery conditions, as human drivers know, the window for corrective action shrinks dramatically. The need to brake can emerge suddenly, and the response must be both immediate and precisely calibrated to the road conditions.
To find out more about the state of the problem and how close we are to a solution, I spoke with Prof. Steven Waslander, who leads the WinTOR research program at the University of Toronto. That program, which supports six faculty members and 30 researchers, focuses on how to handle snow, ice, and severe weather conditions.
Our conversation covered four areas:
the problem of, and solution to, ADS operation in winter weather;
the kinds of sensors self-driving cars have, and how these interact;
how an ADS ‘thinks’, and how improvements in this domain are helping it handle winter driving; and
the implications for Waymo and Tesla’s competing approaches to self-driving technology.
Prof. Waslander and I spoke on 6 February 2025. I have edited our conversation for length and clarity.
The Problem, and How to Solve It
Andrew Miller: Why are winter conditions so challenging for self-driving systems?
Steve Waslander: First off, they're difficult for humans as well. There are four primary causes of accidents. Three are human misbehavior: drunk driving, distracted driving, and speeding. But the fourth is adverse driving conditions—bad weather. Bad weather doesn't happen all the time, so people aren’t practiced at it; on a bad day, the chances of an accident are something like 50% to 100% higher than on a normal day.
So clearly winter conditions are challenging for people, and for automated systems too.
Adverse weather affects those systems in at least five ways:
The environment has changed. Everything looks different. All the algorithms designed to understand what they are seeing when, say, the grass is green must now understand what they’re seeing when the grass isn’t visible, when the usual conditions don’t apply.
The sensors get degraded. The image data has less information because of precipitation. The lidar data has speckling and specular backscatter as well as missing points. Lidar has more trouble seeing things at distance and cameras have more trouble seeing things everywhere. Radar is really the only sensor relatively unaffected by precipitation, but it has its own limitations—it's never as good as the other two on its own.
Behavior changes. It's harder to predict what's going to happen next. Cars may slide through stop signs, people may be pushing cars out of snowbanks, snow plows may be pushing snow everywhere and slowing down traffic. There are behaviors that only occur in winter or adverse weather that you won't be familiar with if you've never seen them in your training data.
Decision-making gets more complicated. Lanes are not always available, some roads become dangerous, downhills are sometimes more treacherous. You need to add a whole layer of complexity to reasoning that isn’t required in benign conditions.
Finally, there's traction. You may start slipping because you don't know what the friction coefficients will be. You have to have backup maneuvers, drive more cautiously, and behave differently because the uncertainty about what will happen in the future changes.
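To make the traction point concrete, here is a back-of-the-envelope stopping-distance comparison of my own (a minimal sketch using textbook friction coefficients, not anything from the WinTOR program):

```python
# Rough stopping-distance comparison, illustrating why traction dominates
# winter decision-making. Friction values are typical textbook figures.

G = 9.81  # gravitational acceleration, m/s^2

def stopping_distance(speed_kmh: float, mu: float, reaction_time_s: float = 1.0) -> float:
    """Distance covered during the reaction time plus braking,
    assuming constant deceleration of mu * g."""
    v = speed_kmh / 3.6                      # convert to m/s
    return v * reaction_time_s + v**2 / (2 * mu * G)

for surface, mu in [("dry asphalt", 0.8), ("packed snow", 0.3), ("ice", 0.1)]:
    print(f"{surface:12s} 50 km/h -> {stopping_distance(50, mu):5.1f} m")
# dry asphalt ~26 m, packed snow ~47 m, ice ~112 m
```

On ice, the car needs more than four times the room it needs on dry pavement to stop from the same speed, which is why the window for a correct braking decision shrinks so dramatically.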
AMM: What is the solution to these problems? I imagine it's what we might call the ‘parliament of voices’—you've got multiple sensors so you've got redundancy of data, you've got a computer sophisticated enough to correlate all these data points into a single model it can have confidence in?
SW: Yes, correct. You attack it on all fronts.
We teach our networks to perform tasks on sensor data that has cars covered in snow, roadways covered in snow, and so forth. We don't want to spend two years labeling millions of kilometers of data in winter conditions everywhere we go, so we're looking at ways to advance this in an unsupervised manner—just collect data from the fleet driving around with drivers, then learn directly from that unlabeled data to refine what we already know.
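One standard way to learn from unlabeled fleet data is self-training with pseudo-labels: an existing model labels the new data, and only its confident predictions are used for further training. The sketch below is my own generic illustration of that idea, not WinTOR's pipeline; the toy classifier stands in for a full 3D detector.

```python
import torch
import torch.nn.functional as F
from copy import deepcopy

def pseudo_label_step(student, teacher, optimizer, unlabeled_batch, conf_threshold=0.9):
    """One self-training step: a frozen teacher labels unlabeled winter data,
    and only its confident predictions are used to train the student."""
    with torch.no_grad():
        probs = F.softmax(teacher(unlabeled_batch), dim=-1)
        conf, pseudo_labels = probs.max(dim=-1)
        keep = conf > conf_threshold          # discard uncertain frames
    if keep.any():
        logits = student(unlabeled_batch[keep])
        loss = F.cross_entropy(logits, pseudo_labels[keep])
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

# Toy usage with a stand-in classifier (low threshold so the example trains on something):
student = torch.nn.Linear(16, 4)
teacher = deepcopy(student).eval()
opt = torch.optim.SGD(student.parameters(), lr=1e-2)
pseudo_label_step(student, teacher, opt, torch.randn(32, 16), conf_threshold=0.4)
```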
We can clean up the sensor data—filter point cloud corruption [i.e., remove points from the point cloud that are misleading outliers], and modify images so you can almost see through fog. There are literally papers on "seeing through fog" and "seeing through snow and precipitation" that look at removing these effects from sensor data.
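The point-cloud clean-up he mentions often boils down to removing isolated returns, since airborne snowflakes produce sparse, scattered points while real surfaces produce dense clusters. Here is a simplified filter of my own along those lines; the published de-snowing methods, such as dynamic radius outlier removal, are more sophisticated, and the thresholds below are purely illustrative.

```python
import numpy as np
from scipy.spatial import cKDTree

def filter_snow_speckle(points: np.ndarray, k: int = 5, radius: float = 0.5) -> np.ndarray:
    """Drop isolated lidar returns (typical of airborne snowflakes) by keeping
    only points whose k-th nearest neighbour lies within `radius` metres.
    `points` is an (N, 3) array of x, y, z coordinates."""
    tree = cKDTree(points)
    # distances to the k nearest neighbours (first column is the point itself)
    dists, _ = tree.query(points, k=k + 1)
    return points[dists[:, -1] < radius]

# Toy usage: a dense wall of points plus sparse "snow" noise.
wall = np.random.normal([10, 0, 1], 0.05, size=(500, 3))
snow = np.random.uniform(-5, 5, size=(50, 3))
cloud = np.vstack([wall, snow])
print(filter_snow_speckle(cloud).shape)  # nearly all of the isolated snow points are removed
```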
The behavior challenge is harder because those behaviors somehow must get captured. We're looking at creating simulated environments with realistic enough actors that will replicate at least some of the physics-driven challenges.
This seems to be what companies like Waymo and especially Waabi are doing. Waabi has built this simulator that's super high-fidelity. They're training against millions of scenarios, billions of kilometers in simulation, and they've closed the gap to the real world—when they transfer what they've learned in simulation onto real trucks, it works with very little modification. That seems to be a strong way forward, although it's a huge engineering effort.
Sensors 101
AMM: Let's get granular. What are the different kinds of sensors available to a car, and what areas do they excel in?
SW: The three main perception sensors are camera, lidar, and radar.
Camera provides high-resolution, detailed appearance information. You can see things out to hundreds of meters if they're big enough. They give you all the sign information, lane information, traffic light information—they're absolutely critical. But they have huge drawbacks: they don't explicitly measure depth or geometry; they can only infer it by knowing what objects look like. They're susceptible to glare and bad illumination—driving at night is always more challenging—and precipitation can really confound the image data.
Lidar provides both depth and intensity information in a 360-degree point cloud. You can have really high-density points with modern lidar, but most are fairly low density relative to camera data—we're talking about 40-50 megapixels in a 360-degree camera view versus only 1-2 million points in lidar. But the geometric information in lidar is phenomenal—centimeter-level accuracy everywhere out to 100-250 meters on all objects. This gives you a much better chance of detecting motion and shapes of objects. That's why perception challenges are usually dominated by lidar first, and then by lidar-camera combinations a couple years later once we figure out how to fuse the sources reliably.
Radar is a weaker platform that's useful but much lower resolution than lidar. It provides both range and range rate (the rate at which the distance to a target is changing), so it directly measures velocity along its line of sight. Lidar doesn't do that yet, although some new lidar systems are starting to measure velocity directly. In a single radar scan, you can get the direction of travel of a vehicle relative to your own motion. That's why radar was used first for highway adaptive cruise control—it could reliably measure large objects ahead and immediately measure their velocity, allowing faster reaction time for braking events. Radar has this amazing characteristic that it can see through all precipitation, making it an amazing redundant sensor in adverse weather when lidar and camera get corrupted.
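For readers unfamiliar with range rate: the radar measures the Doppler shift of its return and converts it directly into velocity along its line of sight, in a single measurement. Here is a stripped-down version of that arithmetic, using an illustrative 77 GHz carrier rather than any specific automotive radar:

```python
C = 3.0e8          # speed of light, m/s
F_CARRIER = 77e9   # typical automotive radar band, Hz

def radial_velocity(doppler_shift_hz: float) -> float:
    """Radial (line-of-sight) velocity implied by a measured Doppler shift.
    Positive values mean the target is closing on the radar."""
    return doppler_shift_hz * C / (2 * F_CARRIER)

# A ~5.13 kHz Doppler shift at 77 GHz corresponds to roughly 10 m/s closing speed.
print(radial_velocity(5.13e3))  # ~10.0 m/s
```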
Our research has shown that camera is worst for precipitation, lidar can mostly handle winter conditions but loses some quality, and radar is basically unaffected, though it suffers from accuracy and resolution limitations.
There are minor sensors as well. There’s GPS that can tell you where you are and pull up correct map information—available 95% of the time. There’s inertial measurement, which gives you body orientation. Wheel odometry can give good estimates of speed and whether you're slipping, indicating available traction. These are more interoceptive—they look at what the car itself is doing rather than the environment.
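The wheel-odometry point is easy to illustrate: comparing how fast the wheels are turning against how fast the vehicle is actually moving gives a slip ratio, a standard signal in traction control. The sketch below is generic, not drawn from any particular system:

```python
def slip_ratio(wheel_speed_mps: float, vehicle_speed_mps: float) -> float:
    """Longitudinal slip ratio. Near 0 means the tire is rolling freely;
    values approaching 1 mean the wheel is spinning (or locked)
    relative to the road, i.e. traction is being lost."""
    denom = max(abs(wheel_speed_mps), abs(vehicle_speed_mps), 1e-6)
    return (wheel_speed_mps - vehicle_speed_mps) / denom

# Wheels turning at 14 m/s while the car (from GPS/IMU) moves at 10 m/s:
print(slip_ratio(14.0, 10.0))  # ~0.29 -> the wheels are spinning, traction is poor
```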
There's also short-range sonar for parking, and microphones. Those are becoming more important. People are realizing that ambulances and fire trucks can be well discerned by audio, and pedestrians shouting at you is a good thing for the car to pay attention to, especially if they're shouting from under the tires, right? So there are definitely some auxiliary sensors coming in that add a little bit more detail.
AMM: Are there things that a human driver can sense that right now a car can't? For example, in winter weather, we've all taken a curve too quickly and started to fishtail. A human driver can feel that right away.
SW: That's captured by the inertial measurement units and wheel odometry. Traction and stability control systems have made huge advances over the last 10-15 years in reacting to fishtailing. Cars themselves recover quite well now—ABS [i.e., anti-lock braking systems] and traction control have contributed a lot.
How an ADS Thinks
SW (continuing): So an ADS can recover from fishtailing. What an ADS can't do yet, and what some groups in the winter project are looking at, is predicting when that's going to happen. If you can prepare the vehicle for slippery conditions, you're less likely to cause a slip that needs recovery. It's safer to slow down before hitting the dangerous situation. They're looking at detecting surface conditions visually or through lidar to determine what the traction situation will be in the next second or so.
Our sensors don’t have the same visual acuity as humans do. They could have it, but it would be too expensive. The human eye is phenomenal! But it only sees a very small part of the scene at any time, with the rest painted in by your brain from previous images. Further, I don't think humans can outperform learning systems on detection, tracking, or prediction tasks. Humans have more common sense and a better world model—we know structurally better how things will evolve, when we can merge, whether someone is driving aggressively. Those kinds of things aren't yet fully quantified. Humans are also very good at things like seeing wheel angles to indicate whether someone will turn left at a stop sign or go straight. They look for eye contact and interaction. Those sorts of things aren't common in perception literature yet. Detecting if someone's about to open a car door—we can't get that resolution with current camera data. So there's a gap, but it's not huge.
AMM: One of the interesting things about this is that the computer not only has to retrieve all these data streams from different sensors, it also has to assess how much confidence to give to each of them—which parallels human cognition, where we get data from all sorts of sources and have to determine how to weigh it and how confident to be in any predictions we make based on it.
SW: With learning techniques, you can be guided by the outcome. We build fusion-type methods for perception that take all the streams in, process them individually at first, but then fuse their representations and finally their outputs to come up with a best estimate across all the streams.
When we work with adverse weather, we can see that the inputs from certain streams become more important than others. For example, we did experiments training networks with camera and lidar on nighttime images. Our performance improved when, instead of providing the nighttime image in the image stream, we just provided black images. Through training, the system learned to completely ignore camera data because there was no information in it. This allowed us to bias it towards relying on the lidar in those scenes, and lidar works just as reliably at night as in the daytime.
Interestingly, when we provided the actual image information directly, the system didn't learn that signal as quickly because there was also some information in the image from headlights and such. You can apply these kinds of techniques across all the lidar streams. More and more, we're getting what are called multi-modality datasets and multi-modal models that are trained on these datasets. They take in all the information while still being supervised with human-annotated labels on the output, so you're getting the best answer from all possible sources of information. It's all being learned together.
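Here is a heavily simplified sketch of what such a mid-fusion network can look like, including the 'blank out an uninformative modality' idea from the nighttime experiment. The architecture and feature sizes are mine, chosen for illustration only:

```python
import torch
import torch.nn as nn

class CameraLidarFusion(nn.Module):
    """Toy mid-fusion network: each modality is encoded separately, the
    feature vectors are concatenated, and a shared head produces the output."""
    def __init__(self, cam_dim=512, lidar_dim=256, hidden=128, num_classes=10):
        super().__init__()
        self.cam_encoder = nn.Sequential(nn.Linear(cam_dim, hidden), nn.ReLU())
        self.lidar_encoder = nn.Sequential(nn.Linear(lidar_dim, hidden), nn.ReLU())
        self.head = nn.Linear(2 * hidden, num_classes)

    def forward(self, cam_feats, lidar_feats, drop_camera=False):
        cam = self.cam_encoder(cam_feats)
        if drop_camera:
            # Simulate an uninformative camera (night, heavy snow) by zeroing
            # its features, so the head learns to lean on lidar instead.
            cam = torch.zeros_like(cam)
        lidar = self.lidar_encoder(lidar_feats)
        return self.head(torch.cat([cam, lidar], dim=-1))

model = CameraLidarFusion()
out = model(torch.randn(4, 512), torch.randn(4, 256), drop_camera=True)
print(out.shape)  # torch.Size([4, 10])
```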
One theme of my research has been about debugging these systems. It’s hard to accept the argument that they’ll be safe when they’re just relying on a mixture of linear algebra to give one answer and hope for the best. You need to get a representation of confidence or uncertainty associated with those outputs.
We've built methods that not only provide 3D bounding boxes for an object, but also provide uncertainty associated with every parameter that defines that box. How confident are you that the center of the car is where you said it was? How confident are you about the length of the car if you're only looking at the front? With each of these factors, you can have a richer uncertainty representation than just a classification score, which is what most methods provide.
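A common way to get that richer representation is to have the network predict a mean and a variance for every box parameter and train it with a Gaussian negative log-likelihood, which penalizes overconfidence as well as error. The snippet below shows that loss in isolation; it is a standard formulation, not necessarily the exact one Prof. Waslander's group uses:

```python
import torch

def gaussian_nll(pred_mean, pred_log_var, target):
    """Negative log-likelihood of the target under a per-parameter Gaussian.
    The network is penalised both for being wrong and for being
    overconfident (predicting too small a variance) about it."""
    return 0.5 * (torch.exp(-pred_log_var) * (target - pred_mean) ** 2 + pred_log_var).mean()

# 4 detections, 7 box parameters each (x, y, z, length, width, height, yaw):
mean = torch.randn(4, 7, requires_grad=True)
log_var = torch.zeros(4, 7, requires_grad=True)
target = torch.randn(4, 7)
loss = gaussian_nll(mean, log_var, target)
loss.backward()
print(float(loss))
```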
That uncertainty representation has helped us. The uncertainty explains what the network actually understood about the scene better than just having the single output. This allows for interpretability—engineers can look at problem situations and figure out the causes of failures. We look at situations like these and try to train and improve performance in that specific area.
Waymo versus Tesla
AMM: You mentioned earlier that sensors are expensive. Given Waymo's multi-sensor philosophy versus Tesla's camera-only approach—is a fully sensored car commercially viable, or would it cost so much that no one would buy it?[1]
SW: When I said expensive, I meant computationally expensive, which isn't as much of a bottleneck as I used to think. NVIDIA and others have made huge advances in what we can put in a car and process. When I worked with companies in industry, the compute power available was amazing—multiple GPUs onboard, enough to run all the large-scale models we wanted simultaneously.
On Tesla versus Waymo and the market viability—I think the jury's out. Waymo's solution involves bills of materials around $40,000 to $60,000 for all sensors, compute, and installation—basically doubling the price of a car. These aren't impossible numbers. Tesla is only putting in $5,000 to $10,000 worth of hardware, resulting in a car that's ‘self-driving-ish’. It steadily improves but is nowhere near Waymo's ‘two years without a single incident’ level—they're still making mistakes hourly, requiring driver attention.
AMM: One reason I might be more bullish on robotaxis than you is that while adding components is expensive now, that's partly because there isn't a large market yet. Look at the incredible price drops in solar and batteries over the last three decades from economies of scale.
SW: You're absolutely right—the costs will drop. Lidar prices are coming down quickly.
AMM: Tesla wants to go all-in on automated driving with no human interface, but also all-in on no lidar. Is it possible for such a car to operate in snowy conditions without lidar?
SW: It is! Humans drive with no lidar, after all. Tesla will just have to rely on better camera sensors and develop a better world model. They are, at the moment, pushing to get as close as they can to human performance. If they can eventually match and then exceed human driving performance with camera only, it will be a huge cost saving that will dominate the market.
That said, it's a longer road when you don't have redundant sensing and accurate geometry—you have to infer all that missing information from camera data. Humans drive in winter, though it's harder for us, so winter driving will be harder on a camera-only system as well, until all the issues we’ve discussed get resolved. We have proof by example that it's possible, but we don't necessarily have a clear picture for how to fully achieve this ambitious goal.
I’m grateful to Prof. Waslander for speaking to me on this topic. I come away from our conversation feeling reassured.
The winter-driving problem, like so many technical challenges, is daunting only at first glance. Upon inspection, it disaggregates into smaller, more tractable problems: ones where we know what is needed to solve them. We need distinct sets of sensors to provide overlapping data when snow and ice cause problems; better training data to help the ADS understand what it is seeing in bad weather; and improved prediction systems to handle the multiple confusions that winter offers.
The first needs to get cheaper, and the other two need to be perfected. But the path forward is clear. It's not a matter of if the solution emerges, but when. And when it does, self-driving technology will finally fulfill its potential, bringing its benefits not just to the sunbelt but to everywhere people want to drive.
[1] The self-driving car industry is divided on the question of sensors. Waymo uses a comprehensive suite of sensors including lidar, cameras, and radar, an approach shared by almost all of its competitors. Tesla has taken a radically different path: its driver-assist systems of today and its ADS of tomorrow rely solely on cameras and computer vision. The company argues that since humans can drive using only vision, machines should be able to as well, and that other sensors are not only redundant but also make a vehicle so expensive as to be commercially unviable.