Archives For Announcements

The Price of OpenElections

December 25, 2015

As we wrap up OpenElections’ work in 2015, we’d like to give you an update on how we’ve spent not only our time but also the money that we’ve gotten, particularly from the John S. and James L. Knight Foundation’s Knight News Challenge. Most of that money that we’ve spent since mid-2013 has gone to salaries for our project manager and a single developer. Neither of the project’s co-founders has been paid for working on OpenElections, and we’ve tried to keep our operations pretty lean.

While our initial grant funding from Knight is nearly exhausted, we’ve made good progress and will keep going. In the past few months we’ve added a few more states (Louisiana, Missouri and Virginia) and we have volunteers working on Wisconsin, Georgia and Oregon, among others. We’ve revised our volunteer documentation to make it easier to understand what we’re doing and how you can help.

In most states, getting county-level data isn’t too much of a problem; that data is usually available online, if not always in native electronic formats. County-level data is usually freely available as well, but we’ve always wanted to develop a resource that can offer precinct-level results where they are available. Here’s why: while counties can be homogenous, precincts are even more distinct and smaller political units and lend themselves to more sophisticated analysis. Candidates and their campaigns care about precinct results. Journalists and researchers should do the same.

Some states make precinct-level data available for free, which is a great service to the public. They include Louisiana, Maryland, Wyoming, West Virginia, Virginia, North Carolina and Florida, among others. Some states, like Pennsylvania, Colorado and Utah, charge a nominal fee for precinct results. But for other states, precinct results are only available county-by-county, and that takes both time and money. We’ve written about Oregon in the past, and we’d like to offer it as an example of the price of precinct results.

The bad news is that it’s not uniform, even within a state. In Oregon we’ve spent more than $1,000 to obtain precinct-level results covering elections 2000-2014, although in many cases we don’t have all of those years. Some counties literally don’t have results to give us before 2010. Crook County was unable to find precinct results for the 2010 general and 2012 primary elections, while a number of counties don’t have elections from before 2008. In other cases, price was a factor: we’ll only have precinct results for 2010-2014 from Tillamook County because the clerk there charged us $222.75 for results for those years. Lake County charges $50 an hour for pulling the results files and another $0.25 a page for copying them. We’ve yet to receive those results, so we don’t know what the final cost for Lake will be.

The good news is that when we do request election results that aren’t freely available online, we’re posting them on our Github site in state-specific repositories. That way other organizations or individuals won’t have to repeat our processes and/or pay for results that we’ve already gotten. We want you to use what we’ve gathered, whether that’s CSV files or original PDFs. That’s our holiday gift to you. We’ll be back at it in 2016, when there are more elections coming up, we hear.

For OpenElections volunteers coming to NICAR in Atlanta next month, we’ve got a challenge for you: help us tackle Georgia election results.

As we did last year in Baltimore, OpenElections will hold an event on Sunday, March 8, with the goal of writing scrapers and parsers to load 2000-2014 election results from the Peach State, and we’re looking for some help. It’s a great way to get familiar with the project and see what our processes are.

Georgia offers some different tasks, from scraping HTML results to using our Clarify library to parse XML from more recent elections. So we’re looking for people who have some familiarity with Python and election results, but we’re happy to help guide those new to the process, too. Thanks to our volunteers, we’ve already got a good record of where election result data is stored by the state.

Here’s how the process will work: we’ll start by reviewing the state of the data – what’s available for which elections – and then start working on a datasource file that connects that data to our system. After that, we’ll begin writing code to load results data, using other states as our models. As part of that process, we’ll pre-process HTML results into CSVs that we store on Github.

If you’re interested in helping out, there are two things to do: first, let us know by emailing openelections@gmail.com or on Twitter at @openelex. Second, take the time to setup the development environment on your laptop following the instructions here. We’re looking forward to seeing you in Atlanta!

By Derek Willis

OpenElections is nearly two years old, and we’re not nearly done yet. In most states we still have a lot of work to do.

As our initial funding from the Knight Foundation winds down, we wanted to provide an update on where the project is and our plans going forward. The first thing to know is this: OpenElections is here to stay. Our timetable has expanded, and we’re looking at other sources of money to boost our capacity to process election results data, but the work we’ve done so far and your interest in it has convinced us of the need.

When we started, Serdar and I had between us years of experience working with election results data in multiple formats. We both worked at news organizations that routinely dealt with different types of data and various election systems.

We still have been surprised by the diversity of results that we’ve found. States like Pennsylvania, North Carolina and Florida have consistent and reliable data across time. Other states, like Arkansas, Colorado and Washington, have different formats and systems depending on the year. Then there are states like Mississippi and New York, which have required significant investments of time and effort.

In practice, that has meant a lot of work within individual states in order to load and process data from 2000 onward. Those efforts have taken more time than we anticipated, for two reasons. First, we have found that states have switched the systems and software they use to publish election results, in some cases multiple times in the past 15 years. We have found some abstractions – we released a separate library to handle states that use Clarity’s software – but in many cases this meant writing several different custom parsers for a single state.

Second, machine readable data is not a universal standard, and for many states it is a recent addition to their practices. This isn’t a criticism as much as it is a statement of reality. Officials from nearly every state we’ve been in contact with have been helpful and even supportive of the project. But we’re also not too far removed from all-paper elections, either.

In response to these factors, we’ve made some adjustments. The main one is to publish “raw” results data from states even before we standardize offices, candidates and parties. We think having election results in a fairly consistent format across a number of years is pretty useful, so we’re not going to wait until everything is done to release that. This week we’ve published raw results in North Carolina, Florida, Pennsylvania and (for recent elections) Mississippi. You can download these from our site or clone them from GitHub depending on your needs. We’ll continue to follow that path as we work on standardization.

Along the way we’ve been very fortunate to have had contributions from volunteers, who both gathered information about the state of election results and also contributed code to the project. We can’t thank all of you enough for your interest and contributions. This would be a much longer road without them, and we hope that you’ll stay involved.

We’d also like to recognize the people who have lived this project with us for most of the past two years. Geoff Hing has been the main point of contact for web development volunteers and has written the bulk of the code that powers the results loading and data display portions of the project. Geoff began a new job at The Chicago Tribune this week, although he’ll still be involved with OpenElections as a volunteer. We’re extremely grateful for his efforts.

Many more of you have emailed with or spoken to Sara Schnadt, the project manager for OpenElections. She’ll be with us through the end of the year as we plan our next steps, and her organizational skills, creative thinking and ability to wrangle two co-founders living on separate coasts has made OpenElections possible.

Investigative Reporters & Editors, a source of training and inspiration for journalists for decades, has made things easy for us by handling the accounting and grant management tasks. Both Serdar and I are proud to be “graduates” of IRE, and we’re thankful for their support of OpenElections.

The goal of OpenElections – to provide access to machine-readable, standardized election results – remains the same as when we began. The path to reach that goal is now a lot clearer than it was two years ago, and with your help we’ve learned a lot about how to get there. We’ll keep moving forward, and invite you to stay involved.

 

 

Introducing Clarify

November 26, 2014

An Open Source Elections-Data URL Locator and Parser from OpenElections

By Geoff Hing and Derek Willis, for Knight-Mozilla OpenNews Source Learning

state_summary_page__KY

State election results are like snowflakes: each state—often each county—produces its own special website to share the vote totals. For a project like OpenElections, that involves having to find results data and figuring out how to extract it. In many cases, that means scraping.

But in our research into how election results are stored, we found that a handful of sites used a common vendor: Clarity Elections, which is owned by SOE Software. States that use Clarity genferally share a common look and features, including statewide summary results, voter turnout statistics, and a page linking to county-specific results.

The good news is that Clarity sites also include a “Reports” tab that has structured data downloads in several formats, including XML, XLS, andCSV. The results data are contained in .ZIP files, so they aren’t particularly large or unwieldy. But there’s a catch: the URLs aren’t easily predictable. Here’s a URL for a statewide page:

http://results.enr.clarityelections.com/KY/15261/30235/en/summary.html

The first numeric segment—15261 in this case—uniquely identifies this election, the 2010 primary in Kentucky. But the second numeric segment—30235—represents a subpage, and each county in Kentucky has a different one. Switch over to the page listing the county pages, and you get all the links. Sort of.

The county-specific links, which lead to pages that have structured results files at the precinct level, actually involve redirects, but those secondary numeric segments in the URLs aren’t resolved until we visit them. That means doing a lot of clicking and copying, or scraping. We chose the latter path, although that presents some difficulties as well. Using our time at OpenNews’ New York Code Convening in mid-November, we created a Python library called Clarify that provides access to those URLs containing structured election results data and parses the XML version of it. We’re already using it in OpenElections, and now we’re releasing it for others who work in states that use Clarity software.

See full piece on Source Learning

At ONA14 in Chicago in late September we unveiled the new OpenElections data download interface. We presented at the Knight Foundation’s Knight Village during their office hours for featured News Challenge projects, as well as during a lighting talk. OpenElections’ Geoff Hing and Sara Schnadt showed off their handiwork based on in-depth discussions and feedback from many data journos. The crowd at ONA was receptive, and the people we talked to were keen to start having access to the long awaited data from the first few states.

Screen Shot 2014-10-06 at 2.47.55 PM

As you can see from the data map view above, there are only three states that have data available so far. These are Maryland, West Virginia and Wyoming, for which you can download ‘raw’ data. For our purposes, this means that you can get official data at the most common results reporting levels, with the most frequently used fields identified but without any further standardization. We will have ‘raw’ data on all the states in the next few months, and will work on having fully cleaned and standardized data on all the states after this initial process is complete.

Screen Shot 2014-10-06 at 2.48.12 PM

As things progress, you will see updates to both the map view and the detailed data view where you can see the different reporting levels that have data ready for download so far.

Screen Shot 2014-10-06 at 4.30.19 PM

A pink download icon indicates available data, and a grey icon indicates that data exists for a particular race at a particular reporting level, but that we don’t yet have it online.

Screen Shot 2014-10-06 at 4.28.56 PM
The race selection tool at the top of the page includes a visualization that gives an overview of all the races in our timespan, and a slider for selecting a date range to review races in the download table. For states like Maryland (shown in the full page-view above), there are only two races every two years so this slider isn’t so crucial, but for states like Florida (directly above), this slider can be useful.

We encourage you to take the interface for a spin, and tell us what you think! And, if you would like to help us get more data into this interface faster, and you are fairly canny with Python, we would love to hear from you. You can learn more about what this would entail here.

Home Screenshot

As we get the first few states’ data processed and ready to release, we are building an interface to deliver it to you, and to show our progress as we go. The live site (above) now shows metadata work to date, and the volunteers involved. Soon you will be able to toggle between this view and a map of the current condition of results data (below). Clicking on each state will show you details on the most cleaned version of available data.

Data Map

The color coding on this new map will change as we get more states online with ‘raw’ results, and fully cleaned data (what you see here is hypothetical and just to illustrate how the map will work). When we say ‘raw’, we really mean results that reflect the data provided by state elections officials. These results are only available at the reporting levels provided by the states and fields like party and candidate names have not been standardized.  These results do have standardized field names, so you will be able to more easily load and analyze the data across states. We will get as many states fully cleaned as we can, but our baseline goal for this year is to wire up and make available most of the data in a ‘raw’ state.

As we build the data interface, we would love to know what you think. Is the terminology we are using clear to you? Is the interaction clear? Is there anything else you would like to see here?

Download Page

If you click on the ‘Detailed Data’ link on the data map page, you will get to this download page showing all the races for the state you have chosen. You can download results at a variety of reporting levels, depending on what is available for this state. We will include rows in this view for all processed data (clean and raw) as well as any races we haven’t processed yet, just so that you know they exist.

Above the download table there is a slider that both gives you an overview of all the races available for a state, and a way to select just a specific date range for which to browse detailed results. You can filter results by race type – such as President, Governor, State Legislative races, etc.. If there are any other ways that you need to access the data, or if anything about this interface could be clearer, please let us know!


We will be building out a preliminary version of this interface in the next couple of weeks, and will revise it further based on what we hear from you.  

To tell us what you think, comment on interface elements here, or email us at openelections@gmail.com.

Screen Shot 2014-06-05 at 7.31.32 PM

OpenElections represented at this year’s Transparency Camp, a national conference for civic hackers who work to make political process and government data more, well, transparent. This is a growing and very dynamic un-conference and the session topics ranged from ‘Why the internet hasn’t changed politics’  to ‘Interoperable Civic Data — for user-centric technology’. There were many journalists in attendance, as well as political scientists, policy makers, and technologists working within and in support of government. The atmosphere was palpably optimistic, as the general ethos of the crowd was that ‘we are all here to affect positive change’.

Screen Shot 2014-06-05 at 7.32.20 PM

There were many international civic tech folks and journalists in attendance too, who were especially interested to observe how the US deals with the issue of advancing it’s government transparency since the  impact of this is felt all over the world. TCamp is becoming more international each year.

Screen Shot 2014-06-05 at 7.31.03 PM

The conference was also very technical. OpenElections team members Derek Willis and Sara Schnadt spoke to a room full of hackers particularly attuned to the nuances of elections processes and aware of existing results infrastructures and their limitations. Derek walked through the process of acquiring a data source for a state, writing a scraper, and made the case for joining our effort. There were many thoughtful questions and a lively broader discussion about how to best create technologies to facilitate democratic process. The discussion continued and got even more down to the nitty gritty in a later session bringing together representatives from OpenElections, Voting Information Project, Google Civic Innovation, the Sunlight Foundation, and others, to tease out the problem of defining open data identifiers in an open and non-hierarchical ecosystem of technology projects.

Screen Shot 2014-06-05 at 7.29.37 PM

That weekend, as you heard from us leading up to it, was also National Day of Civic Hacking, and TCamp was one of over 100 events taking place around the country. We camped out and hacked in the main room at the conference a good bit (as did our teammates in Chicago and the Bay Area), ramping up new developer volunteers who were joining in from TCamp and from events in other parts of the country. A big thank you to everyone who joined us over the weekend, and great to meet all of you who came on board at TCamp!

Screen Shot 2014-05-26 at 1.10.01 PM

As part of National Day of Civic Hacking, we are organizing an OpenElections challenge for the hacking events at locations all over the country – Sat May 31 and Sun June 1st.

If you are attending one of these events near you, and would like to join in on our effort to write scrapers for elections results, let us know!

Write Scrapers for us…
Help us extend our core scraper architecture to create a series of custom scrapers that account for the idiosyncrasies in how each state structures data, stores it, and makes it available.

**Our docs for this process are now up on our site. Look here to see what would be involved with joining in**

Your time and expertise would be most appreciated either day. Also, feel free to join in from home.

If you would like to help out, email sschnadt.projects@gmail.com either or tweet at us @OpenElex either before the event or on the day. Our team will be online and available to get you set up.

Thank you!

The OpenElections Team

Interview with TurboVote Co-Founder Kathryn Peters

kathryn-peters-056a70501d0122524c48c2be9d6a0d97

In this series of interviews, OpenElections has conversations with the leadership of other initiatives that are improving data transparency, easing the voting process and applying new technologies to elections.

For our first piece we talk to Kathryn Peters, co-founder of TurboVote, our sister Knight News Challenge: Data project. TurboVote is a service that aims to make it as easy to vote, and keep track of all the elections you can participate in, as it is to do all the other things we now do online.

***

OE: How did the TurboVote project and Democracy Works Inc. come about, and what were your motivations for starting them?

KP: Seth [Flaxman, TurboVote co-founder] spent a summer in college registering voters in Philadelphia with a sandwich board and a stack of paper forms, and recognized that there had to be a better way to reach would-be voters than standing on street corners. When he finished his first semester of grad school and realized he’d missed a local election back home, that same realization struck him again – voting should fit the way we live. We live online, on our phones, with services and applications that help organize our lives and simplify daily tasks.

Seth asked my advice in building an election-reminder service. My first response was incredulity. I’m from Columbia, MO, where the county clerk Wendy Noren builds her own voter engagement tools and has sent email reminders about upcoming elections for a decade already. I just assumed that these were normal voter services. Once Seth convinced me that Wendy’s online voter services were rare, it made perfect sense to try and make them available to every voter. So we started prototyping.

OE: What background(s) do you bring to this work?

KP: Seth and I met in a graduate policy program, so we’re both deeply committed to innovating with and for government–in this case, local election administrators–which sets us apart from most of the tech startups we know. Seth’s previous work had been as a researcher (at the Council on Foreign Relations), and he approached graduate school with a big research question: why does the Internet seem to be passing government by? I had worked in both political organizing and information management, but was studying international affairs and thinking about how we promote and support democratic processes abroad. Those two concerns came together in a really fantastic way, even if it means I’m in Brooklyn instead of, say, Cairo right now.

OE: Can you describe how TurboVote impacts an individual voter?

KP: It depends a lot on the voter, where they are and what they need. But let’s imagine a college freshman, who arrives on campus and is offered the opportunity to register to vote during orientation, and decides to register at her parents’ home in another state. As she signs up, we’ll also get her on-campus address, and ask if she’ll need to vote by mail in elections back home. So after she joins TurboVote, we’ll send her a voter registration form filled out with her information with an addressed, stamped envelope so she can return it to her election administrator. And then as an election comes up, we’ll send her an email reminder and mail her an absentee ballot request form, again with a stamped envelope so all she has to do is sign it and send it in. And then we’ll send her reminders about the deadline to submit those forms so she gets everything in the mail on time. And election after election, she’ll hear from us and have whatever forms and information she needs to take part, even in local elections she might not hear about living on a college campus the next state over, for example.

We designed a simplified flow chart to try and simplify all the many ways we serve different voters.

process_flow_chart

OE: TurboVote is one of three projects currently in your roster. How has your work expanded and further defined itself this year?

KP: TurboVote’s growth in 2012 demonstrated how much demand there is for voting information and services, but the only way to do this sustainably is if government eventually adopts it and takes on these new tools for voter outreach. To that end, we spent 2013 researching local election administrations across the country, spending six weeks shadowing offices across six states and learning about their work, their staff, the tech they’re using, their needs and motivations. We found dedicated innovators making incremental improvements at every election in pursuit of better elections for their voters. And we found dozens of ideas worth building or popularizing that could help them run elections better, more simply.

From that research, we started building Ballot Scout, which makes it easy to add Intelligent Mail barcodes to absentee ballot envelopes and trace them through the postal system. Right now, most election officials send out their absentee ballots, get some of them back, and have no way of knowing if the others went undelivered, or weren’t cast, or are delayed in a postal processing facility and will arrive three days after the election. Barcode tracking gives officials better insight into what happens to those ballots as they leave the election office, and the ability to intervene if anything goes wrong. We’re working with seven counties from Oregon to Florida to test Ballot Scout this fall (and we’re still looking for three more counties to join the beta).

And last summer, the Pew Charitable Trusts asked us if we’d consider taking on data and technology support for the Voting Information Project. It’s the biggest election dataset in the country, providing tens of millions of Americans with polling place information each cycle, and we were eager to help build out its permanent infrastructure for data collection and processing. It’s also connected us to state election officials and let us get to know their work and needs, as well as those of the counties we’d been working with previously.

OE: What is your business model, and how does it inform your effectiveness?

KP: We’re a 501(c)(3) nonprofit, currently funded through grants from the Knight Foundation, Democracy Fund, and Google, among many others. TurboVote operates on a partnership/fee model, where each of our partner organizations contributes a small amount toward our operating costs, and we’re developing a pricing model for Ballot Scout that will do the same for that service. As we continue to grow and add new partners, these revenues should bring us to fiscal sustainability by 2017, ensuring that we can continue our work without major donations.

OE: How does Democracy Works fit within the ecosystem of voting infrastructure projects going on now? Are there other best practices you are aware of?

KP: Great question. The ecosystem is somewhat ad hoc, but we’ve used research by Dana Chisnell and Whitney Quesenbery at Civic Design for information on what  voters are looking for and how they interact with election data online, and we’re currently collaborating with ELECTricity on a project to offer free website templates to local election offices that takes the Civic Design best practices and implements them by default. We pool our election research with Long Distance Voter, whose forms we use in states that don’t otherwise provide a ballot request form, for example, and we compare deadlines, election administrator addresses, and other data where we can help check and support each others’ work.

We’re also participating in the third-annual National Voter Registration Day, which brings together civic organizations like the League of Women Voters, the Bus Federation, and Voto Latino to celebrate voting and engage new voters across the country.

I’m also keeping an eye on projects in both Los Angeles County, CA and Travis County, TX, where election administrators have recruited designers, computer scientists, academics and citizens to reimagine voting machines. Both are designing their projects to be open-source and available to other jurisdictions, and I think it’s a fantastic model for the kind of collaboration I’d like to see become even more popular in this space.

OE: What do you think of the recent Presidential Commission on Elections Administration and it’s findings? Will it affect how your work is rolled out?

KP: I’m a big fan of the report! The Presidential Commission on Election Administration issued a practical list of recommendations–and accompanying tools–that can help election officials run better elections. They think postal ballot-tracking is a great idea, too, so I may be a little bit biased.

OE: Ideally, what kinds of organizations and systems would come together to make a robust, transparent and cost-effective elections infrastructure?

KP: I think the collaborations in Travis and Los Angeles counties have the right mix – administrators, technologists, designers, and ordinary voters – and that it’s mostly a question of how we scale that and build communications among election innovators so good ideas can really take root and spread nationally.

Kathryn Peters is a co-founder of TurboVote. Her belief in better democracy has taken her from campaign organizing in rural Missouri to a Master’s in Public Policy at the Kennedy School of Government to political rights monitoring in Afghanistan. Katy has also worked for the information management team for the United Nations Department of Safety and Security and the National Democratic Institute’s Information and Communications Technology staff. In 2011, she was honored as one of Forbes magazine’s “30 Under 30” in the field of law and policy.

 

NICAR 14 Hackathon

By Sara Schnadt

This year’s NICAR conference was an especially great experience for me. Having spent the past year working remotely with volunteers around the country to develop the groundwork for the OpenElections project, I met so many volunteers in person, featured them in our project update session, and worked alongside them at our day-long hackathon on the last day of the conference, and this made working on the project so much more meaningful.

From meeting Sandra Fish by having her pass by me in the throngs of in-between-session milling to excitedly hand off a CD of Colorado results data, to Ed Borasky telling me over our computers at the hackathon that there is a large and close-knit network of journalists in his local Portland area who would be very supportive of our work, to noticing after many hours of working together that our own Derek Willis and Nolan Hicks have very similar senses of humor, NICAR was a great and constructive convergence of OpenElections supporters.

We were also very pleased to have new volunteers join us for the hackathon, including the likes of NPR’s Jeremy Bowers, and the Chicago hacker Nick Bennett helping with scraper writing and data processing. Bloomberg designer Chloe Whiteaker and civic dev extraordinaire Margie Roswell also blithely drafted us a new public-facing site in a matter of hours. And then there was Bloomberg visualization dev Julian Burgess, who spent most of the day with us, at first trying his hand at learning python just so he could pitch in, then giving an in-depth assessment of our interface and data acquisition strategies. I am new to digital journalism as of this past year, and I have to say I am very taken by the generosity, talent, and character of the people in this space.

More than anything else, meeting all these great folks in person brought home just how important it is to digital journalists to create new civic infrastructure where it doesn’t already exist, and to see how invested you all are in seeing this project succeed. During our session ‘OpenElections, a year in review’, in addition to a detailed update on our progress gathering metadata with our small army of volunteers, and defining a core results data scraper spec, there were spirited discussions about the technical nuances and interesting challenges of our system architecture. These challenges are inherent in taking a motley and wildly varied collection of individual states’ election results archiving methods and creating a new, clean, systematic, national infrastructure. The interest and investment were palpable in the room.

From all of this, it was clear that we are on the right track, and we left with new motivation, support, perspective, talent and stamina to bring the project home in our second year!