Archives For Results

At ONA14 in Chicago in late September, we unveiled the new OpenElections data download interface. We presented at the Knight Foundation’s Knight Village during its office hours for featured News Challenge projects, and during a lightning talk. OpenElections’ Geoff Hing and Sara Schnadt showed off their handiwork, which was shaped by in-depth discussions with, and feedback from, many data journos. The crowd at ONA was receptive, and the people we talked to were eager to get access to the long-awaited data from the first few states.

[Screenshot: the data map view]

As you can see from the data map view above, only three states have data available so far: Maryland, West Virginia and Wyoming. For these states you can download ‘raw’ data. For our purposes, this means that you can get official data at the most common results reporting levels, with the most frequently used fields identified but without any further standardization. We will have ‘raw’ data for all the states in the next few months, and will work on fully cleaned and standardized data for all the states after this initial process is complete.


As things progress, you will see updates to both the map view and the detailed data view, which shows the reporting levels that have data ready for download so far.

[Screenshot: full-page view of Maryland’s data download page]

A pink download icon indicates available data, and a grey icon indicates that data exists for a particular race at a particular reporting level, but that we don’t yet have it online.

[Screenshot: the race selection tool for Florida]
The race selection tool at the top of the page includes a visualization that gives an overview of all the races in our timespan, and a slider for selecting the date range of races shown in the download table. For states like Maryland (shown in the full page-view above), which have only two races every two years, the slider isn’t crucial, but for states like Florida (directly above), it can be useful.

We encourage you to take the interface for a spin, and tell us what you think! And, if you would like to help us get more data into this interface faster, and you are fairly canny with Python, we would love to hear from you. You can learn more about what this would entail here.


As part of the National Day of Civic Hacking, we are organizing an OpenElections challenge for hacking events at locations all over the country on Saturday, May 31, and Sunday, June 1.

If you are attending one of these events near you, and would like to join in on our effort to write scrapers for elections results, let us know!

Write Scrapers for us…
Help us extend our core scraper architecture to create a series of custom scrapers that account for the idiosyncrasies in how each state structures data, stores it, and makes it available.

**Our docs for this process are now up on our site. Look here to see what joining in would involve.**

Your time and expertise would be most appreciated either day. Also, feel free to join in from home.

If you would like to help out, email sschnadt.projects@gmail.com or tweet at us @OpenElex, either before the event or on the day. Our team will be online and available to get you set up.

Thank you!

The OpenElections Team

OE: What is your beat, and what kinds of stories do you report on?

CG: I cover Washington for the Milwaukee Journal Sentinel but also write a blog about elections, public opinion and political trends with a pretty heavy Wisconsin focus. It’s called The Wisconsin Voter. The majority of my work is for the blog and most of what goes into the blog also runs in the print edition of the paper.

OE: How often do you use elections results in your stories, and why?

CG: I use election data all the time. We’ve had an inordinate number of elections in Wisconsin since 2010, including an explosion of recall fights without any real historical precedent in this country. Wisconsin is also typically a presidential battleground. I use election data to write about political trends nationally and inside the state. I write about turnout trends, and about the election process. Right now I’m doing a long-term research project for the paper in conjunction with a 6-month fellowship at the Marquette Law School. It’s about political polarization, and I’m relying pretty heavily on election data to make the case that metropolitan Milwaukee is an unusual and extreme example of a metro region polarized by party and deeply divided geographically between city and suburbs, with Democratic and Republican voters increasingly clustered in very partisan and very separate communities. I’m comparing voting patterns in Milwaukee to the other major metropolitan areas in the U.S. The project will look at how this came to be, and at the consequences of this kind of division.

OE: What processes do you go through now to access elections results? 

CG: I’ve purchased a lot of historical national and state data from Dave Leip’s US Election Atlas, which is incredibly useful, but only goes down to the county level. I recently discovered that Harvard and Stanford have a burgeoning project to publish precinct-level election data for most states. There’s obviously the election data that’s made available by state and county governments.

OE: How long does this take, and how many people are involved? 

CG: In the case of many of these sources, getting the data into a form that’s convenient for analysis, and building election databases around the numbers, can be labor-intensive. I generally do it myself, although for my current project I’m working with a political scientist, Charles Franklin, who is at the Marquette Law School.

OE: Do you feel like the information that you are able to access now is comprehensive and reliable for your purposes?

CG: No, not comprehensive. There are gaps, mainly when you get down below the county level and want to work with data at the level of municipalities and voting wards. The differences between how states report this information are vast. Wisconsin is pretty good, fortunately for me. You can get a lot of geographic detail. But some states are horrible. In many states, only county data is available. It’s amazing to me how difficult it is to get presidential election results for many major American cities (as opposed to counties), for example. Some cities like Milwaukee and Chicago post this data routinely. Others don’t, and all you get is data for the county they are located in.

OE: How would a project like OpenElections change your process?

CG: Anything that would standardize the data and provide the kind of geographic detail – wards and municipalities – mentioned above would be very helpful.

OE: Would our project make it possible for you to do more with election results? How?

CG: You could do a lot more to compare voting trends and patterns from state to state, city to city, etc. You could also write about voting trends at the local level, which is where people live. People don’t think of themselves as residents of counties, but of towns, cities and neighborhoods.

OE: Do you have any particular requests for our project that would make your work more effective?

CG: It would be nice if the election data were coded by county for sure, and coded in a way that allowed you to aggregate by municipality and congressional district. It would be nice to have results for president, governor and senator included in the project. And of course, all in Excel.

OE: In an ideal world, how far back would you like to have historical results available?

CG: In an ideal world, the 1970s, but that’s asking a lot. The 1990s would be nice.

Craig Gilbert is the Milwaukee Journal Sentinel’s Washington Bureau Chief and author of “The Wisconsin Voter” political blog. Gilbert has covered national and state politics for the paper since 1990, and has written extensively about the electoral battle for the crucial swing states of the upper Midwest. He is currently a fellow at the Marquette Law School working on a research project on polarization. He was a 2009-10 Knight-Wallace fellow at the University of Michigan, where he studied public opinion, survey research, voting behavior and statistics. He previously worked for the Miami Herald and the Kingston (NY) Daily Freeman, and was a speechwriter for New York Sen. Daniel Patrick Moynihan. Gilbert has a B.A. in History from Yale University.

The goal of OpenElections is to provide clean, consistent results data for federal, statewide and state legislative races from all 50 states, and to make it easy for journalists (or anyone else) to access them in common data formats. We want to provide a minimum set of information for each race, supplemented by additional data that may be available from the state.

Our results spec is a work in progress and will likely change as the project develops. We welcome your input on the contents of the results data and on the ways users can retrieve them. Join our Google Group, or send us an email at openelections@gmail.com.

Formats

The initial formats for results data are CSV and JSON, both lightweight text formats that can easily be used by applications ranging from Microsoft Excel to web application frameworks and mobile apps. All files described below are available in both formats: append .csv or .json to the filename. Future formats may include XML, and we welcome your feedback.
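Because every file comes in both formats, a client can pick its parser from the extension alone. The following is a minimal Python sketch, not an official OpenElections client. `load_records` is a hypothetical helper name, and the code assumes that CSV files have a header row and that JSON files hold either one object or a list of objects.

```python
import csv
import io
import json


def load_records(text: str, fmt: str) -> list:
    """Parse the body of an OpenElections-style file into a list of dicts.

    Assumption: CSV files carry a header row, and JSON files hold either
    a single object or a list of objects.
    """
    fmt = fmt.lower().lstrip(".")
    if fmt == "csv":
        # DictReader keys each row by the header line; values stay strings.
        return list(csv.DictReader(io.StringIO(text)))
    if fmt == "json":
        data = json.loads(text)
        # Wrap a lone object so callers always get a list back.
        return data if isinstance(data, list) else [data]
    raise ValueError(f"unsupported format: {fmt!r}")
```

CSV values come back as strings (`"10"`), whereas JSON keeps numbers and booleans typed. Downstream code should therefore convert CSV fields explicitly.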

State Metadata

Results data is organized primarily by state and then by year within each state. Each state has a top-level directory labeled with its two-character state abbreviation, such as “MD” for Maryland. At the top level, a file named “metadata” contains basic information about the availability of results for that state. The layout for the metadata file is as follows:

  • years – an array of integers listing the years for which OpenElections has election information.
  • updated_at – a timestamp indicating the last update to any of the state’s files.
  • volunteers – an array of OpenElections usernames indicating people who have contributed to this state’s results.

In the CSV version, years and volunteers are comma-delimited text; in JSON they are standard arrays.
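List-valued fields are plain comma-delimited text in CSV but real arrays in JSON, so code that reads both formats has to normalize them. The sketch below assumes the CSV metadata file has a header row naming the three fields and one data row, with the list fields quoted. `parse_metadata` is a hypothetical name. The timestamp is left as a string because its exact format isn't specified here.

```python
import csv
import io
import json


def parse_metadata(text: str, fmt: str) -> dict:
    """Return state metadata with years and volunteers as Python lists."""
    if fmt == "json":
        meta = json.loads(text)
        return {
            "years": [int(y) for y in meta["years"]],
            "updated_at": meta["updated_at"],
            "volunteers": list(meta["volunteers"]),
        }
    if fmt == "csv":
        # Assumption: one header row plus one data row, with the
        # comma-delimited lists quoted so each stays in a single field.
        row = next(csv.DictReader(io.StringIO(text)), None)
        if row is None:
            raise ValueError("metadata CSV has no data row")
        return {
            "years": [int(y) for y in row["years"].split(",") if y.strip()],
            "updated_at": row["updated_at"],
            "volunteers": [v.strip() for v in row["volunteers"].split(",") if v.strip()],
        }
    raise ValueError(f"unsupported format: {fmt!r}")
```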

Elections by Year

Inside a particular year’s directory within a state there is a file called “elections” listing the elections that occurred in that year, along with information about the status and scope of the results data. Example URLs are https://s3.amazonaws.com/openelex-data/us/states/md/2012/elections.json for JSON and https://s3.amazonaws.com/openelex-data/us/states/md/2012/elections.csv for CSV.
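The example URLs follow a predictable pattern, so the address of any state-year elections file can be built from its parts. This sketch assumes the pattern holds for every state and year. Note that the example path uses the lowercase abbreviation `md`. `elections_url` and `BASE_URL` are names introduced here for illustration.

```python
# Base path taken from the example URLs; assumed to apply to all states.
BASE_URL = "https://s3.amazonaws.com/openelex-data/us/states"


def elections_url(state: str, year: int, fmt: str = "json") -> str:
    """Build the URL of a state's elections file for one year."""
    if fmt not in ("json", "csv"):
        raise ValueError(f"unsupported format: {fmt!r}")
    if len(state) != 2 or not state.isalpha():
        raise ValueError(f"expected a two-letter state abbreviation, got {state!r}")
    # The example paths use the lowercase abbreviation, e.g. "md".
    return f"{BASE_URL}/{state.lower()}/{year}/elections.{fmt}"
```

For example, `json.load(urllib.request.urlopen(elections_url("MD", 2012)))` would fetch and parse the 2012 Maryland file, assuming the bucket serves it at that address.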

The layout of the elections file is as follows:

  • year – an integer representing the year of the elections
  • state – the two-character state postal abbreviation
  • elections – an array of hashes containing individual election information (JSON file only); each hash contains:
      • date – the date of the election
      • results_type – the type of results available (Certified, Unofficial or null)
      • election_type – ‘primary’, ‘general’, ‘runoff’ or other election type
      • special – a boolean indicating whether the election is a special election
      • office – a string representing the office contested in the election
      • result_levels – an array containing a hash of the availability of results at certain levels (JSON file only), with these flags:
          • race_wide – a boolean indicating whether race-level results are available
          • county – a boolean indicating whether county-level results are available
          • congressional_district – a boolean indicating whether congressional district level results are available
          • state_legislative – a boolean indicating whether state legislative district level results are available
          • precinct – a boolean indicating whether precinct-level results are available
      • updated_at – a timestamp indicating when the given election was last updated
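Result availability is nested inside each election, so a common task is to filter for elections that have results at a given level. The following is a minimal sketch over the parsed JSON. It assumes a top-level `elections` array whose entries carry a `result_levels` array holding a single hash of boolean flags. `elections_with_level` is a hypothetical helper.

```python
def elections_with_level(doc: dict, level: str) -> list:
    """Return the elections whose result_levels flag `level` as available.

    `level` is one of the flag names, e.g. "county" or "precinct".
    """
    matches = []
    for election in doc.get("elections", []):
        # Assumption: result_levels is a one-element array of flags;
        # a missing entry is treated as "nothing available".
        levels = election.get("result_levels") or [{}]
        if levels[0].get(level) is True:
            matches.append(election)
    return matches
```

Checking `is True` rather than truthiness keeps a stray non-boolean value from being counted as available.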

Race-wide Results for an Election

Each election date has its own directory containing CSV and JSON files for the contests for specific offices. For example, results from the Nov. 6, 2012 general election would be stored in a 2012-11-06 directory, with filenames representing offices such as “president”, “us-senate” and “us-house”. Each office file would contain basic information about the election (office, election type, reporting level and total votes) and one record for each candidate listed in the results. The result records consist of at least these fields:
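Result file paths can be derived from the state, the election date and an office slug. The sketch below assumes the date directory sits inside the state's year directory, which is a nesting the text above does not spell out. It uses `results_path`, a hypothetical helper. Office slugs such as `us-senate` come from the examples above.

```python
from datetime import date


def results_path(state: str, election_date: date, office: str, fmt: str = "json") -> str:
    """Build a relative path to a race-wide results file.

    Assumed layout: <state>/<year>/<YYYY-MM-DD>/<office>.<ext>
    """
    if fmt not in ("json", "csv"):
        raise ValueError(f"unsupported format: {fmt!r}")
    # isoformat() yields the YYYY-MM-DD directory name, e.g. 2012-11-06.
    return f"{state.lower()}/{election_date.year}/{election_date.isoformat()}/{office}.{fmt}"
```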

  • first_name – a string representing the parsed first name of the candidate
  • middle_name – a string representing the parsed middle name or initial of the candidate
  • last_name – a string representing the parsed last name of the candidate
  • suffix – a string representing the parsed suffix of the candidate
  • name_raw – a string representing the “raw” full name of the candidate from the results, if present
  • party – a string representing the “raw” party name or abbreviation from the results
  • office – a string representing the office
  • election_type – a string representing the type of election (“special”, “primary”, “election”, for example)
  • reporting_level – a string representing the level represented by the results (“racewide”, “county”, for example)
  • reporting_jurisdiction – a string representing the reporting jurisdiction (the two-character state abbreviation, or county name, for example)
  • winner – a boolean marked true for the winning candidate; all other candidates are marked as false
  • votes – the race-wide number of votes received by a candidate
  • pct – the race-wide percentage of votes received by a candidate
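CSV delivers every value as text, so a consumer will usually convert the required fields to proper types. The following sketch uses a dataclass. The names `Result`, `TEXT_FIELDS` and `parse_results_csv` are hypothetical. The code assumes the CSV header uses the field names above and spells booleans as `true`/`false`. `pct` is parsed as a plain number because the text doesn't say whether it is stored as a percentage or a fraction.

```python
import csv
import io
from dataclasses import dataclass

# Text fields from the list above, copied through unchanged.
TEXT_FIELDS = (
    "first_name", "middle_name", "last_name", "suffix", "name_raw",
    "party", "office", "election_type", "reporting_level",
    "reporting_jurisdiction",
)


@dataclass
class Result:
    first_name: str
    middle_name: str
    last_name: str
    suffix: str
    name_raw: str
    party: str
    office: str
    election_type: str
    reporting_level: str
    reporting_jurisdiction: str
    winner: bool
    votes: int
    pct: float


def parse_results_csv(text: str) -> list:
    """Convert race-wide CSV rows into typed Result records."""
    results = []
    for row in csv.DictReader(io.StringIO(text)):
        # Empty or absent text fields (e.g. middle_name) become "".
        texts = {name: row.get(name) or "" for name in TEXT_FIELDS}
        results.append(Result(
            **texts,
            # Assumption: booleans are spelled "true"/"false" in CSV.
            winner=row["winner"].strip().lower() == "true",
            votes=int(row["votes"]),
            pct=float(row["pct"]),
        ))
    return results
```

With typed records, finding the winner is a one-liner: `[r for r in parse_results_csv(text) if r.winner]`.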

Depending on the state, there may be other fields in the results data, including:

  • write_in – a boolean marked true if the candidate is a write-in candidate
  • precincts – the total number of precincts for a reporting level
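`write_in` and `precincts` appear only for some states, so code should not assume they exist. The following is a minimal sketch using the hypothetical helper `optional_fields`. It returns `None` for an absent field instead of guessing a default, which lets callers tell "not reported" apart from `False` or `0`. It accepts either already-typed JSON values or CSV text, and it assumes CSV booleans are spelled `true`/`false`.

```python
def optional_fields(record: dict) -> dict:
    """Read the optional write_in and precincts fields from one record.

    Absent or blank fields come back as None rather than a guessed
    default, so "not reported" stays distinct from False or 0.
    """
    write_in = record.get("write_in")
    precincts = record.get("precincts")
    if isinstance(write_in, str):
        # CSV text; assumed to be spelled "true"/"false".
        cleaned = write_in.strip().lower()
        write_in = (cleaned == "true") if cleaned else None
    if isinstance(precincts, str):
        cleaned = precincts.strip()
        precincts = int(cleaned) if cleaned else None
    return {"write_in": write_in, "precincts": precincts}
```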

Here’s an example of a race-wide JSON president file and a race-wide CSV president file. There are also county-level JSON and CSV files for a single county.

We’d love to get your reactions, comments and suggestions for these results files. Are we missing fields or attributes, or should others be changed? How should we handle different types of votes (absentee, same-day registration, etc.)? What about the URL structure or the way that county-level results are handled? Let us know.