Archives For Contributing

Offline Election Results

September 7, 2015

In some states, getting election results data is pretty easy (looking at you, Maryland, Wyoming and Florida, to name three). In others, it’s a matter of going county-by-county, as we’ve previously written. But in most of these cases, we’ve been dealing with results files that are available online.

But what about cases where the results aren’t on the Internet?


For OpenElections volunteers coming to NICAR in Atlanta next month, we’ve got a challenge for you: help us tackle Georgia election results.

As we did last year in Baltimore, OpenElections will hold an event on Sunday, March 8, with the goal of writing scrapers and parsers to load 2000-2014 election results from the Peach State, and we’re looking for some help. It’s a great way to get familiar with the project and see what our processes are.

Georgia offers some different tasks, from scraping HTML results to using our Clarify library to parse XML from more recent elections. So we’re looking for people who have some familiarity with Python and election results, but we’re happy to help guide those new to the process, too. Thanks to our volunteers, we’ve already got a good record of where election result data is stored by the state.

Here’s how the process will work: we’ll start by reviewing the state of the data – what’s available for which elections – and then start working on a datasource file that connects that data to our system. After that, we’ll begin writing code to load results data, using other states as our models. As part of that process, we’ll pre-process HTML results into CSVs that we store on Github.
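
Here’s roughly what that HTML-to-CSV pre-processing step looks like in code. This is a minimal sketch: the URL and file names are placeholders, not actual Georgia addresses, and the real pages vary by election.

```python
import csv

import requests
from bs4 import BeautifulSoup

# Placeholder URL; real Georgia results pages vary by election and
# are catalogued in our metadata.
URL = "http://example.sos.ga.gov/results/2004_general.htm"

response = requests.get(URL)
response.raise_for_status()
soup = BeautifulSoup(response.text, "html.parser")

# Collect each row of the first results table as a list of cell values.
rows = []
for tr in soup.find("table").find_all("tr"):
    cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
    if cells:
        rows.append(cells)

# Store the rows as a CSV, ready to check into the results repository.
with open("ga_2004_general.csv", "w", newline="") as f:
    csv.writer(f).writerows(rows)
```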

If you’re interested in helping out, there are two things to do: first, let us know by emailing openelections@gmail.com or on Twitter at @openelex. Second, take the time to set up the development environment on your laptop following the instructions here. We’re looking forward to seeing you in Atlanta!

Introducing Clarify

November 26, 2014

An Open Source Elections-Data URL Locator and Parser from OpenElections

By Geoff Hing and Derek Willis, for Knight-Mozilla OpenNews Source Learning


State election results are like snowflakes: each state—often each county—produces its own special website to share the vote totals. For a project like OpenElections, that involves finding results data and figuring out how to extract it. In many cases, that means scraping.

But in our research into how election results are stored, we found that a handful of sites used a common vendor: Clarity Elections, which is owned by SOE Software. States that use Clarity generally share a common look and features, including statewide summary results, voter turnout statistics, and a page linking to county-specific results.

The good news is that Clarity sites also include a “Reports” tab that has structured data downloads in several formats, including XML, XLS, and CSV. The results data are contained in .ZIP files, so they aren’t particularly large or unwieldy. But there’s a catch: the URLs aren’t easily predictable. Here’s a URL for a statewide page:

http://results.enr.clarityelections.com/KY/15261/30235/en/summary.html

The first numeric segment—15261 in this case—uniquely identifies this election, the 2010 primary in Kentucky. But the second numeric segment—30235—represents a subpage, and each county in Kentucky has a different one. Switch over to the page listing the county pages, and you get all the links. Sort of.

The county-specific links, which lead to pages that have structured results files at the precinct level, actually involve redirects, but those secondary numeric segments in the URLs aren’t resolved until we visit them. That means doing a lot of clicking and copying, or scraping. We chose the latter path, although that presents some difficulties as well. Using our time at OpenNews’ New York Code Convening in mid-November, we created a Python library called Clarify that provides access to those URLs containing structured election results data and parses the XML version of it. We’re already using it in OpenElections, and now we’re releasing it for others who work in states that use Clarity software.
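
Here’s a rough sketch of how Clarify is used, following the usage in its README (treat the details as illustrative of the initial release rather than a stable API):

```python
import clarify

# Start from the statewide summary page for the 2010 Kentucky primary.
jurisdiction = clarify.Jurisdiction(
    url="http://results.enr.clarityelections.com/KY/15261/30235/en/summary.html",
    level="state",
)

# Resolve each county's subpage and its structured-data download URL,
# following the redirects that would otherwise mean a lot of clicking.
for county in jurisdiction.get_subjurisdictions():
    print(county.name, county.report_url("xml"))

# Parse a downloaded XML results file (detail.xml inside the .ZIP)
# into Python objects.
parser = clarify.Parser()
parser.parse("detail.xml")
for result in parser.results[:5]:
    print(result.jurisdiction, result.votes)
```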

See full piece on Source Learning

Eating Our Dog Food

July 15, 2014

By Derek Willis

When Serdar and I first talked about building a national collection of certified election results, we had a very specific audience in mind: the two of us. It seemed like every two years (or more frequently), one or both of us would spend time gathering election results data as part of our jobs (me at The New York Times, Serdar then at The Washington Post). We wanted to create a project that both of us could use, and we knew that if we found it useful, others might, too.

Precinct comparison (The New York Times)

In the world of software development, using your own work is called eating your dog food, and we’ve done just that. While we’re nowhere near finished, I am happy to report that OpenElections data has proven useful to at least half of the original intended audience. Both last week and this week, The Upshot, a new politics and policy site at The Times that I work on, used results data from Mississippi collected by OpenElections to dig into the Republican primary and runoff elections for U.S. Senate. The analyses that Nate Cohn did on voting in African-American precincts would not have been possible using the PDF files posted by the Mississippi Secretary of State. We needed data, and we (and you) now have data.

We’ve completed data entry of precinct-level results for the 2012 general election and the 2014 Republican primary runoff elections, plus special elections from 2013, and we’re working on converting more files into data (we just got our first contributions from volunteers, too!). These are just the raw results as the state publishes them; we haven’t yet published them using our own results format (but that’s coming soon for Maryland and a few other states). We provide the raw results for states that have files requiring some pre-processing – usually image PDFs or other formats that can’t be pulled directly into our processing pipeline.

The Mississippi example is exactly the kind of problem that we hoped OpenElections would help solve, and it’s only the beginning for how election results data could be used. Once we begin publishing results data, we’d love to hear how you use it, too. In the meantime, if you have some time, there’s more Mississippi data to unlock!


OpenElections was represented at this year’s Transparency Camp, a national conference for civic hackers who work to make the political process and government data more, well, transparent. This is a growing and very dynamic un-conference, and the session topics ranged from ‘Why the internet hasn’t changed politics’ to ‘Interoperable Civic Data — for user-centric technology’. There were many journalists in attendance, as well as political scientists, policy makers, and technologists working within and in support of government. The atmosphere was palpably optimistic, as the general ethos of the crowd was that ‘we are all here to effect positive change’.


There were many international civic tech folks and journalists in attendance too, who were especially interested to observe how the US deals with advancing its government transparency, since the impact of this is felt all over the world. TCamp is becoming more international each year.


The conference was also very technical. OpenElections team members Derek Willis and Sara Schnadt spoke to a room full of hackers particularly attuned to the nuances of elections processes and aware of existing results infrastructures and their limitations. Derek walked through the process of acquiring a data source for a state and writing a scraper, and made the case for joining our effort. There were many thoughtful questions and a lively broader discussion about how best to create technologies that facilitate democratic process. The discussion continued, and got even more down to the nitty-gritty, in a later session that brought together representatives from OpenElections, the Voting Information Project, Google Civic Innovation, the Sunlight Foundation, and others to tease out the problem of defining open data identifiers in an open and non-hierarchical ecosystem of technology projects.


That weekend, as you heard from us leading up to it, was also National Day of Civic Hacking, and TCamp was one of over 100 events taking place around the country. We camped out and hacked in the main room at the conference a good bit (as did our teammates in Chicago and the Bay Area), ramping up new developer volunteers who were joining in from TCamp and from events in other parts of the country. A big thank you to everyone who joined us over the weekend, and great to meet all of you who came on board at TCamp!


As part of National Day of Civic Hacking, we are organizing an OpenElections challenge for the hacking events at locations all over the country on Saturday, May 31 and Sunday, June 1.

If you are attending one of these events and would like to join in on our effort to write scrapers for election results, let us know!

Write Scrapers for us…
Help us extend our core scraper architecture to create a series of custom scrapers that account for the idiosyncrasies in how each state structures data, stores it, and makes it available.

**Our docs for this process are now up on our site. Look here to see what would be involved with joining in**

Your time and expertise would be most appreciated either day. Also, feel free to join in from home.

If you would like to help out, email sschnadt.projects@gmail.com or tweet at us @OpenElex either before the event or on the day. Our team will be online and available to get you set up.

Thank you!

The OpenElections Team

When we embarked on this quest to bring sanity to election data in the U.S., we knew we were in for a heavy lift.

A myriad of data formats awaited us, along with variations in data quality across states and within them over time. In the past few months, the OpenElections team and volunteers have crafted a system to tame this wild landscape. This post takes a closer look at how we applied that system to Maryland, the first state where we defined the data workflow end to end. Hopefully it helps shine some light on our process and generates ideas on how we can improve things.

The Data Source

Maryland offers relatively clean, precinct-level results on the web. In fact, it provides so many result CSVs (over 700!) that we abstracted the process for generating links to the files, rather than scraping them off the site.
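
Because the file names follow a predictable pattern, we can build the links directly. A minimal sketch, with an illustrative URL pattern and county list rather than Maryland’s exact scheme:

```python
# Illustrative URL pattern, not Maryland's exact scheme.
BASE = "http://www.elections.state.md.us/elections/{year}/election_data"

def result_urls(year, counties, race_type="General"):
    """Yield one precinct-level CSV link per county."""
    for county in counties:
        yield f"{BASE.format(year=year)}/{county}_By_Precinct_{year}_{race_type}.csv"

for url in result_urls(2012, ["Allegany", "Anne_Arundel"]):
    print(url)
```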

Other states provide harder-to-manage formats such as database dumps and image PDFs that must be massaged into tabular data. We’ve devised a pre-processing workflow to handle these hard cases, and started to apply it in states such as Washington and West Virginia.

The common denominator across all states is the Datasource. Wiring one up in code can be a significant effort, but once complete, it allows us to easily feed raw results into the data processing pipeline. Our goal in coming months is to tackle this problem for as many states as possible, freeing contributors to work on more interesting problems such as data loading and standardization.
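
To give a feel for what a datasource produces, here is the kind of record that connects one raw file to our system. The field names are a simplified stand-in, not the project’s exact schema:

```python
# One mapping record per raw results file; field names are a
# simplified stand-in for the project's actual schema.
mapping = {
    "election": "md-2012-11-06-general",
    "raw_url": (
        "http://www.elections.state.md.us/elections/2012/"
        "election_data/Allegany_By_Precinct_2012_General.csv"
    ),
    # Standardized name the file is stored under once fetched.
    "generated_filename": "20121106__md__general__allegany__precinct.csv",
    # Open Civic Data identifier for the jurisdiction.
    "ocd_id": "ocd-division/country:us/state:md/county:allegany",
}
```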

Raw Results

When the datasource was in place, we were ready to load Maryland’s data as RawResult records in Mongo, our backend datastore. The goal was to minimize the friction of initial data loading. While we retained all available data points, the focus in this critical first step was populating a common core of fields that are available across all states.

In Maryland, this meant writing a series of data loaders to handle variations in data formats across time. Once these raw result loaders were written, we turned our attention to cleanups that make the data more useful to end users.
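
As a rough illustration of what one of those loaders does (the real loaders populate RawResult models in Mongo; the column names below stand in for one Maryland CSV layout):

```python
import csv

def load_raw_results(path, election_id):
    """Read one results CSV and yield a dict per raw result.

    A sketch only: the real loaders save RawResult records to Mongo,
    and the column names here are illustrative.
    """
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            yield {
                "election_id": election_id,
                "office": row["Office Name"],
                "candidate": row["Candidate Name"],
                "party": row["Party"],
                "jurisdiction": row["Precinct"],
                "votes": int(row["Votes"]),
            }
```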

Transforms

Loading raw results into a common set of fields is a big win, but we’ve set our sights much higher. Election data becomes much more useful after standardizing candidate names, parties, offices, and other common data points.

The types of data transforms we implement will vary by state, and in many cases, one set of cleanups must precede others. Normalizing data into unique contests and candidates is a transform common to all states, usually one that should be performed early in the process.

Transforms let us correct, clean or disambiguate results data in a discrete, easy-to-document, and replicable way.  This helps keep the data loading code simple and clear, especially when dealing with varying data layouts or formats between elections.

In Maryland, we used the core framework to create unique Contest and Candidate records for precinct results.

This opened the door to generating totals at the contest-wide level for each candidate.
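
A minimal sketch of such a transform, assuming raw results shaped like the loader output above; the real transforms create documents through the core framework, but deduplication is the heart of it:

```python
def create_unique_candidates(raw_results):
    """Collapse raw rows into one record per contest/candidate pair."""
    candidates = {}
    for result in raw_results:
        key = (result["election_id"], result["office"], result["candidate"])
        if key not in candidates:
            # Normalize "SMITH, JANE" style raw names to "Jane Smith".
            parts = [p.strip() for p in result["candidate"].split(",")]
            candidates[key] = {
                "election_id": result["election_id"],
                "contest": result["office"],
                "full_name": " ".join(reversed(parts)).title(),
            }
    return list(candidates.values())
```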

Validations

At this point, you might be getting nervous about all this processing. How do we ensure accuracy with all this data wrangling? Enter data validations, which provide a way to link data integrity checks with a particular transformation, or to check data loading and transformation more broadly. In Maryland, for example, we implemented a validation and bound it to a transform that normalizes the format of precinct names; in this case, the validation acts like a unit test for the transform. We also cross-check the loaded and transformed result data in validations that aren’t bound to specific transforms, to confirm that we’ve loaded the expected number of results for a particular election or that the sum of a candidate’s sub-racewide vote totals matches the published racewide totals.
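
The racewide cross-check might look like the sketch below, with plain dicts standing in for our Mongo queries:

```python
def validate_racewide_totals(precinct_results, racewide_totals):
    """Check that precinct-level votes sum to the published totals.

    precinct_results: iterable of dicts with contest, candidate, votes.
    racewide_totals: {(contest, candidate): published racewide total}.
    """
    for (contest, candidate), expected in racewide_totals.items():
        total = sum(
            r["votes"]
            for r in precinct_results
            if r["contest"] == contest and r["candidate"] == candidate
        )
        assert total == expected, (
            f"{candidate} in {contest}: precinct sum {total} "
            f"does not match racewide total {expected}"
        )
```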

Implementing and running validations has helped us uncover data quirks, such as precinct-level data reflecting only election day vote totals, while result data for other reporting levels includes absentee and other types of votes. Validations have also exposed discrepancies between vote counts published on the State Board of Elections website and ones provided in CSV format.  We’ve circled back to Maryland officials with our findings, prompting them to fix their data at the source.

Summary

Maryland has been a guinea pig of sorts for the OpenElections project (thank you, Maryland!). It’s helped us flesh out a data processing framework and conventions that we hope to apply across the country. Of course, challenges remain: standardizing party names across states, mapping precincts to counties, and sundry other issues we didn’t cover here.

As we tackle more states, we hope to refine our framework and conventions to address the inevitable quirks in U.S. election data. In the meantime, we hope this provides a window into our process and gives you all some footing to make it easier to contribute.

As our volunteers have gathered details on the scope and availability of election results from across the country, one thing became clear: not all election results are created equal.

Some states provide results data in multiple formats and variations; all you have to do is choose and click. Florida has a download for every election. In Pennsylvania, we found that for $7 the state provides a CD with 12 years of consistently formatted results. Idaho has multiple files covering different reporting levels.

But that’s not every state. Some have made the transition from producing PDFs to CSVs, such as West Virginia. Others, like Mississippi, basically provide a picture of the results. For states where data isn’t the norm, OpenElections needs to fill the gap, turning results into data.

This isn’t a glamorous job, but we’d like to tell you a little about how we go about it. For states that provide electronically generated PDFs, as West Virginia does for elections from 2000-2006, there are several good options for parsing PDF tables into data. The command-line utility pdftotext from the Xpdf package works well in many cases, while the excellent Tabula (a product of ProPublica, La Nación and the Knight-Mozilla OpenNews project) can do wonders with more complex files. For West Virginia, pdftotext was all we needed (along with some search and replace in a text editor) to make CSV files from the original PDFs. Here’s an example command that generates a fixed-width text file while preserving the row and column format of the original file:

$ pdftotext -layout "2000 House of Rep Pri.pdf"

We used TextWrangler, a free text editor for the Mac, to convert the spaces between columns into tabs, and from there it was trivial to copy results into CSV files. In the process of converting these results, we found several apparent errors (typos or likely copy and paste mistakes) and notified the Secretary of State’s office. To its credit, the office responded quickly and is in the process of fixing the original files (and we’ll update our CSVs when they do).
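
That search-and-replace step can also be scripted. A small sketch that splits on the runs of two or more spaces pdftotext leaves between columns (the file names are examples):

```python
import csv
import re

# Example file names; the output follows our results naming convention.
with open("2000 House of Rep Pri.txt") as src, \
        open("2000__wv__primary__house.csv", "w", newline="") as dest:
    writer = csv.writer(dest)
    for line in src:
        if line.strip():
            # Two or more spaces mark a column boundary in -layout output.
            writer.writerow(re.split(r"\s{2,}", line.strip()))
```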

In Mississippi, however, there are no programmatic options, or at least no good ones. Data entry is the best way for us to get precinct-level results that are contained in county-by-county files like this one. Here’s what we’re dealing with: a scanned image of a fax.

 


 

When it comes to doing data entry, we need to be very specific about what we want and how we want it stored in the CSV file. For our Mississippi files, we’ve developed a guide to the process that we’ll adapt to other states where manual entry is required. Which is where you come in, if you’re up for it. If you’d like to try your hand at a Mississippi file (or another state with PDFs), let us know in the Google Group and we can get you set up. Or you can fork the Mississippi results repository on Github and send us an email per the instructions in the README file.

We know that data entry is neither fun nor exciting (well, most of the time), but think of this: you’ll be part of a project that will provide a great service to journalists, researchers and citizens. And we still have some t-shirts left, too.

NICAR 14 Hackathon

By Sara Schnadt

This year’s NICAR conference was an especially great experience for me. Having spent the past year working remotely with volunteers around the country to develop the groundwork for the OpenElections project, I finally got to meet so many of those volunteers in person, feature them in our project update session, and work alongside them at our day-long hackathon on the last day of the conference. That made working on the project so much more meaningful.

From Sandra Fish passing by me in the throngs of in-between-session milling to excitedly hand off a CD of Colorado results data, to Ed Borasky telling me over our computers at the hackathon that there is a large and close-knit network of journalists in his local Portland area who would be very supportive of our work, to noticing after many hours of working together that our own Derek Willis and Nolan Hicks have very similar senses of humor, NICAR was a great and constructive convergence of OpenElections supporters.

We were also very pleased to have new volunteers join us for the hackathon, including NPR’s Jeremy Bowers and Chicago hacker Nick Bennett, who helped with scraper writing and data processing. Bloomberg designer Chloe Whiteaker and civic dev extraordinaire Margie Roswell also blithely drafted us a new public-facing site in a matter of hours. And then there was Bloomberg visualization dev Julian Burgess, who spent most of the day with us, at first trying his hand at learning Python just so he could pitch in, then giving an in-depth assessment of our interface and data acquisition strategies. I am new to digital journalism as of this past year, and I have to say I am very taken by the generosity, talent, and character of the people in this space.

More than anything else, meeting all these great folks in person brought home just how important it is to digital journalists to create new civic infrastructure where it doesn’t already exist, and how invested you all are in seeing this project succeed. During our session, ‘OpenElections, a year in review’, in addition to a detailed update on our progress gathering metadata with our small army of volunteers and defining a core results data scraper spec, there were spirited discussions about the technical nuances and interesting challenges of our system architecture. Those challenges are inherent in taking a motley and wildly varied collection of individual states’ election results archiving methods and creating a new, clean, systematic, national infrastructure. The interest and investment in the room were palpable.

From all of this, it was clear that we are on the right track, and we left with new motivation, support, perspective, talent and stamina to bring the project home in our second year!

 

Hack with us at NICAR!

January 15, 2014

We are organizing an OpenElections hackathon on the Sunday of NICAR, from 10 a.m. until 6 p.m. in the Chesapeake room.

If you are here at NICAR and would like to join us after all the other activities, we would love to have you. We are looking for coders, as well as anyone interested in helping out and spending the day with our team. Here’s how you can be involved:

Track One – Build Our Interface
Help us build the first draft of a public-facing map-based interface for OpenElections. The map will show the status of the project metadata and data acquisition, as well as scraper development. In the finished version, the interface will also give users access to results data.

Track Two – Election Results Data Scraper Development
Help us extend our core scraper architecture to create a series of custom scrapers that account for the idiosyncrasies in how each state structures data, stores it, and makes it available.

**Our docs for this process are now up on our site. Look here to see what would be involved with joining in**

Track Three – Documentation and Use Cases
Help us flesh out the guides that articulate all of the processes volunteers need to know to work with us, as well as the documentation that other developers will need in order to build on our work. Also, or alternatively, come by and give us your use cases! We will be collecting descriptions of how all kinds of journalists use elections data now, and how you would like OpenElections to work for you.

Your time and expertise would be most appreciated, for all or part of the day.

Thank you!

The OpenElections Team