The Open Data Policy calls for us to release high-value datasets to the public. "High-value" is a subjective term that depends on the standpoint of the person making the judgment. To determine what counts as high-value, we therefore need to understand who may be making that judgment.
Components of Prioritization
In our thinking about Open Data, we identified several high-level personas that would naturally care about what is released. These personas will be part of our evaluation of which datasets are scheduled for release.
In order to prioritize, we plan to measure a dataset on the following overarching factors:
Value: The value of a dataset is gauged by demand from various stakeholders; by whether releasing it will increase transparency, accountability, or internal efficiency; and by whether it will create economic opportunity.
Security: The City will place the highest priority on protecting from release any data that would expose the City to security risks or disclose the public's private information.
Quality: Low-quality data (e.g., missing fields, erroneous entries, manual updates) could receive a lower prioritization, since they may need extra work before release.
Readiness: Readiness gauges the amount of work required to convert the data to an open format, and whether the data are already routinely published.
Based on the preceding components of prioritization, we have derived an initial prioritization matrix that will be revised as necessary following a legal and technical review:
|Field Name||Field Description||Prioritization Category|
|Mayor Demand / Council Demand||Is there demand from the Mayor / Council for this dataset?||Value|
|Interdepartmental demand||Can releasing these data positively influence workflows / performance across City silos?||Value|
|Departmental demand||Does the department desire that these data be released?||Value|
|Dataset included in Open Data Census||The Open Data Census contains some of the more highly requested datasets across the country and is a good indicator for demand (http://sdgo.io/1IzdOom)||Value|
|There is an application built and ready to use these data||Is there an application, built on sample data, that we can deliver quickly to residents or other users to provide impact?||Value|
|There is a known, constant stream of PRAs for these data||Will releasing these data relieve departments of some PRA work?||Value|
|Resident Demand||Percentage of surveyed residents who want these data released||Value|
|# Of Defined Personas Affected Positively||Based on our persona definitions, which of the personas are likely to use these data? (http://sdgo.io/1HeIOiK)||Value|
|Are these data already being published?||Are these data already being published, but not in a central, organized location and in an open format?||Readiness|
|Coordinator Value Assignment||Did the Coordinator mark these data as high, medium, or low priority?||Value|
|Data Sensitivity Assignment||Are these data public, protected, or sensitive?||Security|
|Data Quality Concerns||Are there concerns about data quality?||Quality|
|Data governance structure||Does a minimum viable level of data governance structure exist for this dataset?||Quality|
|Data frequency of change||If the data are updated often, releasing them without ETL (extract, transform, load) will render them irrelevant, and building ETLs may require an investment of time and money.||Quality|
|ETL Required?||A 3-level indicator of whether ETL is required: 1 = yes; 2 = yes, but can be delayed; 3 = no||Quality|
|Do these data contain potential PII (Personally Identifiable Information) or PCI (Payment Card Industry) information?||If these data contain PII or PCI information, they will need special handling, making them harder to release||Security|
|Do these data contain information detrimental to the City's security if released or information that is business sensitive?||If these data contain such information, they will need special handling, making them harder to release||Security|
|Do these data contain information that is public but only under specific terms?||If these data contain such information, they will need special handling, making them harder to release||Security|
|Data extraction complexity||If it's hard to extract data from a given system, it may result in a lower prioritization||Readiness|
|Metadata Availability||Is metadata available for these data?||Readiness|
|Do these data support a performance indicator?||If these data support a published performance indicator for the department, that will cause a higher level of prioritization||Value|
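
The matrix above names the fields and their categories but does not say how they combine into a ranking. As an illustration only, the Python sketch below assumes each dataset has already been summarized as a 0-to-1 score per category, that Security acts as a gate rather than a weighted term (reflecting the statement that protection receives the highest priority), and that the remaining categories are combined with weights. The weights, the class and function names, and the example dataset names are all hypothetical and do not come from the policy.

```python
from dataclasses import dataclass

# Hypothetical weights: the matrix names the categories but assigns no numbers.
WEIGHTS = {"value": 0.5, "quality": 0.2, "readiness": 0.3}


@dataclass
class DatasetAssessment:
    name: str
    value: float      # 0.0-1.0, how favorably the Value fields were answered
    quality: float    # 0.0-1.0, higher means fewer quality concerns
    readiness: float  # 0.0-1.0, higher means less work needed to publish
    sensitivity: str  # "public", "protected", or "sensitive"


def priority_score(d):
    """Return a weighted 0-1 score, or None if the dataset is held back."""
    # Assumption: anything not marked "public" is excluded from ranking
    # until it has been through legal and technical review.
    if d.sensitivity != "public":
        return None
    return sum(weight * getattr(d, field) for field, weight in WEIGHTS.items())


def rank(datasets):
    """Return names of releasable datasets, highest score first."""
    scored = [(priority_score(d), d.name) for d in datasets]
    releasable = [(score, name) for score, name in scored if score is not None]
    return [name for score, name in sorted(releasable, reverse=True)]
```

For example, `rank([...])` over a list of `DatasetAssessment` objects returns only the datasets marked public, ordered by score. Treating Security as a gate rather than a weight is one possible design choice; it prevents high demand from offsetting a sensitivity concern.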