
The 10 Commandments of Mobility Data

How to make intelligent decisions about intelligent data.

Alexander Pazuchanics

Oct 27, 2023

The world of transportation data is exciting and growing every day. Since the advent of connected phones, connected cars, and even connected… skis, we’ve seen so much new data being produced, with the promise that it can drive better decision-making and lead to “smarter cities”. But something still feels… off. For many cities, it feels like there is a wide gap between the promise of data and the day-to-day experience of trying to use data or even trying to figure out what data to use.

I’ve been fortunate to spend most of my career specifically addressing the question of how we use this explosion of data to make our cities safer and greener. I helped to develop the Pittsburgh Principles for data reporting from autonomous vehicle companies testing in the city. My team built the City of Seattle’s policy around mobility data sharing. And now I’ve spent the last three years working with and listening to cities around the world on the data that matters to them. In that time, I’ve learned a lot and formed some opinions of my own.

Of course, I would be thrilled if you felt like Vianova’s mobility data solutions were the right answers to your challenges. But even if they’re not, I want to help raise the bar on how cities ask for and use data to make better decisions. To that end, I’ve drafted my 10 Commandments of Mobility Data:

1. Distinguish between data and insights

Many organizations are data-rich but insight-poor. They collect (or purchase) large amounts of data, but they don’t know how to act on it. This isn’t a problem exclusive to government; it can happen to any organization.

I spend a lot of time thinking about how to make data useful, relevant, or interesting. In other words, I hope we are helping a user to get the insights that data can bring. In some cases, at Vianova, we think that we understand a user’s problem well enough to produce the insight directly for them, by taking the raw data and processing it into a refined “data product”. We’re not trying to hide anything; we just don’t think that it’s useful to walk the user through every step of the process.

In other cases, we try to build tools to make it as easy as possible for the user to transform data into insights themselves. They may better know their own issues and be more creative than we are. Or they may not know exactly what they want yet, but they’ll “know it when they see it”. In those cases, we try to make the user experience as simple as possible to start and add complexity as we go.

2. Don’t forget, no one has all the data

Perhaps an obvious point, but it’s worth repeating. Nearly every data source is missing some data; or rather, no data source is a perfect representation of reality.

For example, in the domain of road safety, most cities rely on a dataset of reported collisions to determine the areas of greatest risk. But we sometimes forget that this data is not the set of all collisions; it’s the set of reported collisions. It’s a semantic difference, but it’s also a real one: there are numerous reasons that under-reporting of collisions could occur.

Even purely empirical data collection can be subject to errors. Cameras fail, AI fails to recognize objects, tube counters go down, and people counting traffic go get a cup of coffee. Maybe the error rates are quite low, but they’re not zero. And it’s important to acknowledge that there are risks to assuming that the data is the holistic picture (more on that later).

3. Acknowledge the trade-off between more data and richer data

Generally, connected vehicle data promises significantly more data than is collected by most cities today. More unique observations, in more places, with more distinct fields than you’ve had access to before. But the bigger the data collection net you cast, the more likely you will collect the metaphorical boots and tires of bad data.

Empirical data is great! You can collect data with your own eyes, or with a camera whose footage you can review later. The data is likely quite rich, with a lot of additional context about the world around the observation. But you simply can’t collect as much data as quickly (or as cheaply) as you can with large connected vehicle datasets. Even a large number of fishermen with rods will catch fewer fish in total than those trawling with a net.

I’m not suggesting that one type of data is always better than the other. It depends on the use case, the budget, the time frame, and a host of other factors. But the techniques need to be compared on their merits — it should be an intentional choice.

4. Always have a use case

I’ve frequently heard people say something like “Give us all the data, we’ll figure out how we’ll use it once we see it” (sometimes at my own company, sometimes coming out of my own lips). Data exploration is a lot of fun! It’s rewarding to discover surprising facts in the data or to see what kinds of questions you could answer.

But to get beyond the exploration stage, users should really consider how they intend to use the data before they even open the file. It’s a tension — sometimes you don’t know what you don’t know. But without a clear rationale for collecting and interpreting the data, it’s very easy to get lost in a theoretical exercise. It’s also very possible to gain access to data you don’t need, or even data you don’t want. This is the principle of data minimization — only take what you need, when you need it.

Having a use case isn’t just good practice. For personal data, it’s a requirement under GDPR. You can learn more about this in some of our other writings.

5. Go through the “data debris” to find out what already exists

We frequently talk to cities that are trying to expand their data portfolio without fully unlocking the potential of their already existing data. There is a wealth of insights to be gained from mining pre-existing datasets, especially data that may exist in another department or team. Data fields that are already collected or could be easily added by one team could have a huge benefit to another — a concept I call “data debris”.

At Vianova, we prioritized features to allow users to upload their own data to combine different sources and start to identify patterns that may have otherwise been hidden. Better exploration of existing data sets, by more users with an understanding of the reason that data is valuable, can lead to insights more quickly.

6. Consider if the juice is worth the squeeze

Data collection (or acquisition) has a real cost — processing, cleaning, hosting, and manipulating the data takes time, effort, and computational power. At a minimum, it creates an opportunity cost — the time or resources that you could have spent doing something else.

The exact right dataset may exist (or could be created) to answer the question you have, but it’s also possible that a dataset that answers 80% of the question is available at half the cost in half the time. The same goes for sampling within the dataset: instead of one full year of historical data, you may only need six months of it. When considering data needs, it’s important to balance the effort and the value to find something actionable. It’s especially important to not let overwhelming requirements and a desire to “future-proof” lead to an impossibly large or complex data acquisition.

7. Think hard about the goal of representativity

We often get asked questions about the representativeness of the datasets that we work with, especially with connected vehicle data. It’s a fair question, but it may also be an imprecise one.

Datasets may represent quite a small share of the cars or trucks on the street (we try to target between 3% and 5% coverage, based on our modeling). This number may appear small, but (1) it’s almost certainly more broadly distributed than existing data collection by traffic cameras or tube counters, and (2) it can still result in a significant number of observations. In virtually every field (including transportation), statistical sampling is used to make decisions and deliver results.
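To make the sampling point a little more concrete, here is a rough sketch of how a small coverage rate can still produce a usable estimate. It uses the standard normal-approximation margin of error for a proportion. The corridor volume, the 4% coverage figure, and the function name are illustrative assumptions, not Vianova's methodology, and the formula only holds if the observed vehicles behave like a random sample of all vehicles.

```python
import math


def margin_of_error(sample_size: int, proportion: float = 0.5, z: float = 1.96) -> float:
    """Half-width of a normal-approximation confidence interval for a proportion.

    proportion=0.5 is the worst case; z=1.96 corresponds to roughly 95% confidence.
    Assumes the sample is random, which real connected vehicle data may not be.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)


# Hypothetical corridor: 20,000 vehicles per day, 4% of them present in the dataset.
daily_vehicles = 20_000
coverage = 0.04
observed = round(daily_vehicles * coverage)

moe = margin_of_error(observed)
print(f"{observed} observations -> +/- {moe:.1%}")
```

Under these assumptions, one day on one corridor yields 800 observations and a worst-case margin of roughly 3.5 percentage points on something like "share of vehicles exceeding the speed limit". What the formula cannot tell you is whether those 800 vehicles drive like everyone else.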

The more important question, especially when your use case revolves around transport planning, infrastructure development, or modeling, is whether the data is representative of the behaviors of the whole population. We work with several providers who serve a broad spectrum of drivers with the goal of best representing the average driver.

Representativity is a constantly changing variable, and one that data providers and cities need to work together on to achieve results.

8. Look at the trend, not just the raw number

It’s natural to want to know specific numbers — ever since we were little we’ve been learning to count. But the “big number” may not always be the most productive way of trying to answer a question using data.

When using big sets of mobility data, it’s typically more helpful to aggregate to some level of space and time. In some cases, it’s even required, as otherwise, the data can be personally identifiable. Aggregation can also make it easier to evaluate trends over time — smoothing out abnormalities in the data.
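As a minimal sketch of what that aggregation can look like, the snippet below counts trips per zone and hour and drops any cell with too few trips to publish safely. The record layout, zone names, and the threshold of 5 are assumptions for illustration; the right threshold depends on your own privacy rules.

```python
from collections import Counter
from datetime import datetime

MIN_COUNT = 5  # assumed suppression threshold, not a legal standard


def aggregate_trips(trips, min_count=MIN_COUNT):
    """Count trips per (zone, hour) cell and drop cells below min_count."""
    counts = Counter()
    for zone, timestamp in trips:
        # Truncate each timestamp to the start of its hour.
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        counts[(zone, hour)] += 1
    return {cell: n for cell, n in counts.items() if n >= min_count}


# Hypothetical records: (zone id, trip start time)
trips = (
    [("downtown", datetime(2023, 10, 2, 8, m)) for m in range(0, 60, 10)]  # 6 trips
    + [("harbor", datetime(2023, 10, 2, 8, 15))]  # 1 trip, suppressed below
)

result = aggregate_trips(trips)
print(result)
```

Here the six downtown trips collapse into a single hourly count, while the lone harbor trip is withheld because a cell of one could point to an individual.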

It’s similarly more useful to think about percentage changes than absolute numbers. Small values can sometimes have outsized effects. It’s important to make sure that data is legible, and that people are comparing things that make sense to be compared.
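A quick sketch shows why the percentage and the baseline belong side by side. The collision counts below are invented for illustration.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from before to after, in percent."""
    if before == 0:
        raise ValueError("percent change is undefined when the baseline is zero")
    return (after - before) * 100 / before


# Hypothetical monthly collision counts: (before, after)
quiet_street = (2, 4)
busy_corridor = (200, 230)

for name, (before, after) in [("quiet street", quiet_street), ("busy corridor", busy_corridor)]:
    change = percent_change(before, after)
    print(f"{name}: {before} -> {after} ({change:+.0f}%, {after - before:+d} collisions)")
```

The quiet street shows a 100% increase from just two additional collisions, while the busy corridor's 15% increase represents thirty. Neither number is wrong, but reporting only one of them invites a comparison that doesn't make sense.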

9. Trust your eyes and ears

I work at a start-up that builds mobility data products — I would love to tell you that you never need to leave your office (or your house) again, and the entire transportation system can be managed from your laptop. But that’s not true, and it shouldn’t be the goal.

To me, this is the difference between making “data-driven” decisions, and “data-informed” decisions. Big data sets and bespoke data products can help you triage, prioritize, and identify hot spots or areas of focus. They excel at giving you wider or deeper visibility than you might otherwise have. But no one has a magic machine that will spit out the correct answer 100% of the time.

These tools need to be considered alongside a range of other inputs, like site visits and conversations with community members. That doesn’t mean that the data is unreliable; it just means that it’s not comprehensive.

10. Start somewhere, anywhere

This may seem obvious, but it isn’t always. Cities are starting from varying levels of data, but practically speaking, you have to start somewhere, and that is usually far from your final goal.

Many cities have virtually no data collection for modes such as cycling or walking behavior. New data products (and new hardware to generate insights) are being created daily to expand the pool of available insights. But in all honesty, no one is 100% sure that they’re collecting all the right data. Data creators, and companies like Vianova who build data products on top of that data, want to know that they’re helping to solve city problems. And the only way we find that out is by working together and tweaking the recipes until we get the right mix. It’s a journey we’re only at the beginning of.

This is a moment for innovation! Work quickly, learn something, and build on top of it. The advantage of most data collection techniques is that they are fast and (comparatively) cheap. It’s better to try and to learn than to suffer from analysis paralysis, waiting for the perfect data set to appear.

About Vianova

Vianova is the data analytics solution to operate the mobility world. Our platform harnesses the power of connected vehicles and IoT data, to provide actionable insights to plan for safer, greener, and more efficient transportation infrastructures. From enabling regulation of shared mobility to transforming last-mile deliveries, and mapping road risk hotspots, Vianova serves 150+ cities, fleet operators, and enterprises across the globe to change the way people and goods move.
For more information: www.vianova.io

If you have any comments that you would like to share with Vianova, please send them to hello@vianova.io. If you would like to learn more about what it is like to work at Vianova and join our talented team, visit our job board or send your application directly to jobs@vianova.io.