Drew Raines
Thesis Proposal

Tentative Title:

Towards a predictive model of driving behavior on the ZIP code level: Using demographic, consumer spending, and business patterns data to model vehicle miles traveled in the United States.

Motivation for the research

Development in the United States is heavily dependent on the automobile. Since the automobile rose to prominence, our towns, cities, and communities have been built along structural norms that make the automobile a fundamental part of the transportation fabric for work, domestic, and recreational purposes. This structurally derived dependence on the automobile has worked well for America in many ways. As a country with vast land resources, the United States has always offered developers the ability to site new construction on greenfields (undeveloped sites) instead of reusing existing developed land. Additionally, this spread-out method of development complements the use of cars, whereas areas of high density do not lend themselves to a car-dependent lifestyle. Cars are simply the cheapest, most efficient means of transportation for areas of sufficiently low density, so the popularity of cars has shaped new development into forms that suit them. Car-dependent development is in no way inherently bad. In terms of human utility, a shopping trip by car in the suburbs can often take comparable time to one on foot in a more densely developed area. Cars also provide the additional benefits of protection from the elements and greater carrying capacity than an unassisted person. The choice is, and has always been, a balance between these and other goods associated with the automobile and negatives such as land use change, roadway deaths, ballooning infrastructure costs, and pollution.

Pollution has become an increasingly important negative in the automobile-use equation. While tailpipe emissions of most traditional air pollutants have steadily declined mile for mile, the release of carbon dioxide as a byproduct of combustion is becoming a greater problem. There is overwhelming evidence that the increased levels of CO2 in the atmosphere, due in large part to our combustion of fossil fuels, are having an effect on global climate patterns. Anthropogenic climate change is one of the most pressing issues today, as the energy that has fueled society since the Industrial Revolution is inextricably tied to the release of CO2.

Better understanding how elements of the built environment affect the use of automobiles can shed light on what can be done to make our lifestyles less car dependent. As transportation uses two-thirds of all petroleum and causes a third of the United States' carbon dioxide emissions, reducing CO2 emissions from the transportation sector is an important part of getting our total carbon emissions in check (McCulloch 2009).

Aside from the environmental effects of our car-dependent lifestyle, the cheapness that propelled the move ever outward from urban cores may no longer hold. The United States spent more than $700 billion on oil in 2008, and, especially problematically, $400 billion of that went to oil from other countries (EIA 2009). At a time when the US is concerned with its trade deficit, hemorrhaging that much money out of the country is especially painful. Building and maintaining the infrastructure that enables life in suburban areas is also getting increasingly expensive as the suburbs reach farther and farther into the countryside. To keep up with increasing population and increasing per capita vehicle miles traveled (VMT), the United States will have to spend $927 billion on the construction of new roads alone. This does not count road maintenance, land acquisition, or bridge construction (Burchell and Mukherji 2003). Understanding how the structure of development can reduce car dependence gives us the opportunity to design in such a way as to reduce the societal load of new infrastructure construction.

Research question

I would like my research to shed light on the total and relative influence of density, housing/development structure, transit spending, business proximity and mix, and household economic situations on the number of miles (measured in vehicle miles traveled, or VMT) people have to drive to complete daily functions. To that end, I aim to answer the question:

How do the demographic and economic attributes of a ZIP code affect vehicle miles traveled?

The purpose of my research, then, is to use the answer to that question to add to the existing literature on the issue of car use in America. Specifically, I want to be able 

Methods

My methodology has been shaped by a number of factors. The decision not to focus my research on one area of the United States, but to look at the country as a whole, has had a large effect on the type of data I am using. In order to get data for my independent variables (IVs) that is applicable across the United States, I have been forced to avoid all local sources and instead rely heavily on the US Census and on data provided by marketing and mapping companies. Similarly, shaping my question around the VMT data from CARFAX (my dependent variable) has dictated that my unit of analysis is the ZIP code, because that is how their data is organized, and it is difficult to translate to other geographic units without some loss. Finally, I am attempting to keep all of my data recent. I would like all of my data to be sourced within a year of 2008, so 2000 census data, the richest source of data available, is not workable.

Information available on the ZIP code level and standardized across the entire United States has its limitations. I am interested in how the physical structure of development affects VMT; however, zoning information is not standardized across the country, and most data describing housing stock structure is rarely broken down by ZIP code. There are many great sources of data that I would love to use in my project but that are not compatible with the constraints I have set. Chief among them is the American Community Survey (ACS), the yearly survey that has replaced the long form of the census. Unfortunately, the ACS has only been running for a few years and does not yet have the data density to be usable on the ZIP code level. I may run further analysis on the county level, where the ACS data is available, if I can find a good way to translate the VMT data to the same level of analysis.
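
One possible way to translate ZIP-level VMT to the county level is to split each ZIP code's total VMT across the counties it overlaps using allocation shares. The sketch below is only an illustration under stated assumptions: the crosswalk table and its shares (for example, the fraction of a ZIP code's residents living in each county) are hypothetical inputs, the function name is invented here, and the method assumes driving is spread across a ZIP code in proportion to those shares, which is exactly the kind of loss this translation introduces. It must be applied to total VMT, not per capita VMT.

```python
from collections import defaultdict


def zip_vmt_to_county(zip_vmt, crosswalk):
    """Allocate total VMT from ZIP codes to counties.

    zip_vmt:   {zip_code: total VMT for that ZIP code}
    crosswalk: {zip_code: [(county, share), ...]} where the shares for
               each ZIP code sum to 1 (hypothetical allocation weights).
    """
    county_vmt = defaultdict(float)
    for zip_code, vmt in zip_vmt.items():
        for county, share in crosswalk[zip_code]:
            # Each county receives its proportional slice of the ZIP total.
            county_vmt[county] += vmt * share
    return dict(county_vmt)


# Invented placeholder values, not real data.
example_vmt = {"A": 100.0, "B": 200.0}
example_crosswalk = {
    "A": [("X", 1.0)],               # ZIP A lies entirely in county X
    "B": [("X", 0.25), ("Y", 0.75)], # ZIP B straddles counties X and Y
}
county_totals = zip_vmt_to_county(example_vmt, example_crosswalk)
print(county_totals)
```

Because the shares for each ZIP code sum to 1, the national VMT total is preserved; only its geographic assignment is approximated.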

I have found a number of strong sources of data despite these limitations. Several companies take 2000 census data and keep it current using regressions and more recent data. Pinpoint Demographics is one such company. They sell a data set that is essentially 2000 census data updated to the years 2001-2009 through their proprietary regression models. They also have recent consumer spending data. While this is a great data set and will likely provide the majority of the IVs for my ZIP code analysis, there are some drawbacks. Foremost, the company is evasive about exactly where their data comes from and how they model years past 2000. I believe I will eventually be able to get enough information out of them to make the data usable, but this is not certain. Another source that I plan to use is the ZIP Code Business Patterns data set produced by the Census Bureau. These data are available across the country at the ZIP code level and report the number and size of all businesses in predefined industry groups. I believe this data will help me get some idea of the zoning of a ZIP code, as I will be able to tell the ratio of residents to jobs, the types of jobs, and the mix of jobs. While this is not a perfect indicator, I believe it will prove useful.
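
As a rough illustration of the two indicators described above, the sketch below computes a residents-to-jobs ratio and a job-mix score for a single ZIP code. It assumes that job counts per industry group have already been estimated from the business patterns data; the function names, the industry labels, and the numbers are all hypothetical. The mix score uses normalized Shannon entropy, which is one common choice for measuring mix and is an assumption here, not a method fixed by this proposal.

```python
import math


def residents_to_jobs_ratio(jobs_by_industry, residents):
    """Residents per job in one ZIP code."""
    total_jobs = sum(jobs_by_industry.values())
    if total_jobs <= 0:
        raise ValueError("ZIP code has no recorded jobs")
    return residents / total_jobs


def job_mix_entropy(jobs_by_industry):
    """Normalized Shannon entropy of industry job shares.

    Returns 0.0 when all jobs fall in one industry group and 1.0 when
    jobs are spread evenly across every group in the input.
    """
    total_jobs = sum(jobs_by_industry.values())
    if total_jobs <= 0 or len(jobs_by_industry) < 2:
        return 0.0
    shares = [n / total_jobs for n in jobs_by_industry.values() if n > 0]
    entropy = -sum(p * math.log(p) for p in shares)
    # Divide by the maximum possible entropy for this number of groups.
    return entropy / math.log(len(jobs_by_industry))


# Invented placeholder values, not real data.
example_jobs = {"retail": 500, "manufacturing": 500, "services": 1000}
print(residents_to_jobs_ratio(example_jobs, residents=4000))  # 2.0
print(round(job_mix_entropy(example_jobs), 3))                # 0.946
```

A high ratio with a low mix score would suggest a largely residential ZIP code, while a ratio near or below 1 with a high mix score would suggest an employment center with varied uses.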

I am relying on some degree of outside assistance with some of the analysis. The VMT data comes from CARFAX, Inc. Once I have determined exactly which areas I need odometer readings from, I will work with an analyst within the company to get that data into a form I can use.

While I am not at a stage of my work where I can confidently state my hypotheses, there are a few predictions fundamental to this project. Based on previous research, I expect VMT per capita to increase as density decreases. I also believe the data will show that mixed-use development increases as density increases. Both of these correlations have been shown to hold in a few studies, including Ewing, mentioned earlier. They are also some of the core tenets of contemporary city planning models such as New Urbanism. I will be making a number of other hypotheses, one for each of the IVs I plan on testing. However, at this time my list of IVs is not finalized, and I will need to do additional research in the field in order to create accurate predictions.
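
The first prediction above could be checked with a simple regression of VMT per capita on density. The sketch below fits an ordinary least squares line to the logarithms of both variables, so that the slope reads as an elasticity; a negative slope would be consistent with VMT per capita rising as density falls. The log-log form, the function name, and the numbers are assumptions made for illustration. The values are invented placeholders, not results, and the final model is expected to include many more IVs than density alone.

```python
import math


def loglog_slope(density, vmt_per_capita):
    """Fit log(VMT per capita) = intercept + slope * log(density) by OLS."""
    xs = [math.log(d) for d in density]
    ys = [math.log(v) for v in vmt_per_capita]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct density values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented placeholder values: residents per square mile and annual
# VMT per capita for three hypothetical ZIP codes.
example_density = [100, 1000, 10000]
example_vmt = [12000, 9000, 6750]

slope, intercept = loglog_slope(example_density, example_vmt)
print(round(slope, 3))  # -0.125
```

In this made-up example, a slope of about -0.125 would mean that a 10 percent increase in density goes with roughly a 1.25 percent decrease in VMT per capita.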

Despite my apprehensions about the amount of work that has already been done in this field, and my difficulty in shaping my research question in such a way as to break new ground, I am excited about the potential of this project. The accuracy and depth of the VMT data I am using is unmatched in any previous study, so I believe that at the conclusion of my work I will have added something new to the VMT literature. As I tried to show in my introduction, I believe that understanding automobile use is an important step toward both reducing it and creating alternatives.