Analyze Boston puts Boston’s Data in the public domain

May 25, 2017

Published by: Analytics Team

In our journey to re-imagine open data for the City of Boston, we took all aspects of the experience into account.

The Analyze Boston team left no stone unturned as we set out to democratize access to the City’s data. We met with open data enthusiasts, spoke with average residents at bus stops, and hosted pop-ups at local libraries to understand how Boston perceives open data.

As part of this effort, we hosted summits to communicate the value of open data to the City’s new Data Coordinators. The first Data Coordinator Summit took place at the Boston Public Library in March 2016. The summits helped us realize the importance of effective licensing to broaden the value of open data in Boston.

One of our first presenters was Jake, from a Boston mapping company called Mapkin. Jake's company uses open data to provide hyper-localized directions to users. His demo and presentation intrigued the audience, but it was his closing remarks that stuck with us. Jake asked the City to clearly designate license terms for the data it publishes. If the City did this, software companies like his could easily build software with open data. As Jake explained, the worst-case scenario is building a civic tech product that cannot be used because of restrictive or incompatible licenses.

Setting goals

We took this feedback to heart. We assessed the City’s earlier licensing efforts and set goals for Boston's next-generation open data platform. We wanted to set data licensing terms that were:

  • consistent across most or all of the datasets available,
  • clear to users about how they can use the data,
  • interoperable with other common data licenses, and
  • open to broad use with minimal or no restrictions.

After a period of review, we realized our existing open data licenses didn't meet our new goals. We published datasets with varying license terms. Sometimes, datasets lacked licensing information altogether, creating confusion for our users. We knew we needed a new approach. Thankfully, we were able to enlist the aid of Harvard Law School’s Cyberlaw Clinic. Students from the clinic helped us evaluate options and figure out our next steps. Over the fall of 2016, we settled on the Open Data Commons Public Domain Dedication and License (PDDL). We believe this license best achieves the goals we set forth.

Is public data copyrighted?

Before we could choose a new license, we had to settle a foundational legal question:

Does the City of Boston have copyrights over its data?

This remains an open question for most governments. In Massachusetts, the Secretary of the Commonwealth instructs that “records created by Massachusetts government agencies [...] are not copyrighted and are available for public use.” It is unclear whether this opinion extends to sub-state entities like the City of Boston. Massachusetts Public Records Law further restricts the City’s rights in datasets and its ability to place terms on the use of its datasets. Moreover, datasets are made of facts, and facts are not copyrightable. Extensive research did not reveal any Massachusetts case law on the issue. A California court ruling concluded that public records laws mean you can't copyright public records. The same ruling said that forcing people to agree to terms is incompatible with the purpose of public records laws: promoting government transparency.

Choosing a license

It did not appear that Boston had any clear right to place restrictions on the data, and we also wanted to drive innovation by putting up as few barriers as possible. That meant we needed a license that would make the data as open as possible. Our initial Open Data Policy had required users to credit us if they used our data, but we later realized that getting people to use our data is more important. As DataSF explained when they selected PDDL for their open data, “If you note us as a source, that’s awesome, but gosh, don’t mess up your UI doing it.” In other words: we appreciate getting credit, but would rather go without it than get in users' way.

We identified the following two licenses as suitable options:

  • Open Data Commons Public Domain Dedication and License (PDDL)
  • Creative Commons Zero (CC0)

Open data programs across the country already use both licenses. They are easy to interpret, and can be used in conjunction with other datasets and licenses. We selected PDDL for two reasons:

  1. Most of the open data portals we surveyed use PDDL.
  2. PDDL is designed specifically for open data, whereas CC0 is mostly used for software (for example, the code that runs both boston.gov and data.boston.gov).

The PDDL license is now the default for all datasets published by the City on Analyze Boston.

Moving forward

We hope this change will foster a vibrant open data community in Boston. To that end, we want to make it easy for both companies and individuals to use our data without worrying about the legal fine print. We believe the switch to public domain licensing will help achieve that goal.

If you have any feedback about our data licensing or anything else, let us know via this simple form.

This post was written by Ben Green. Ben is a Data Analytics Fellow for the City of Boston’s Analytics Team. He is also a PhD Candidate in Applied Math at the Harvard School of Engineering and Applied Sciences and a Fellow at the Berkman Klein Center for Internet and Society, where he studies the intersections of data science with law, policy, and social science.