Findings from Other Agencies
- No data warehouse currently, but they are working with consultants to build something in the next 2 years
- They have a single-purpose MS SQL Server for an ESRI Geodatabase -- used only for travel model networks. Staff are afraid of it; only a couple of "experts" on staff know how it works. It's a silo. Most analysts work around it or ignore it.
- They use both Tableau and R/Shiny for data dashboards, but there is no underlying central database. They import model outputs into these systems with scripts rather than from a shared database. Examples:
- Equity metrics for travel model runs: https://public.tableau.com/profile/psrc.data#!/vizhome/Sonar-Equity-New/Equity
- Land use summaries: https://public.tableau.com/profile/psrc.data#!/vizhome/Sonar_0/RGCsandMICs
- They also use R/Shiny for UrbanSim model outputs, but those are behind the firewall.
- Huge proponents of the OpenGeo/GeoServer/PostgreSQL open source GIS stack.
- They put everything in one giant database -- even accounting, finance, HR/payroll, APC data, AVL data... EVERYTHING. And they say that it reaps huge rewards, because they can connect things like which bus drivers are on which routes and connect that to performance data. Yowza!!
- But they don't publish much. No data warehouse portal, just a list of available shapefiles and developer APIs here
- ESRI geodatabase and MS SQL Server stack on the back end
- ArcGIS Online used for webmapping and warehouse front-end
- A bit out of scope, but related: they have excellent developer APIs.
- Official transportation data archive for the Portland-Vancouver Metropolitan region (http://portal.its.pdx.edu/)
- Has a Microsoft SQL-Server system; built in-house.
- Apparently a "database savant" designed their data schema, and they are now trying to untangle it for regular users?
- Data table structure may be a useful starting point -- but in my opinion it is clearly overengineered. Simple is better -- queries that chain many "joins" across tables get slow and are hard for regular users to write.
- City demographics and infrastructure; SWITRS data
http://transbasesf.org/transbase/
- Transit network and accessibility data
http://revision.lewis.ucla.edu/
Philly just announced a "data store" migration from Socrata over to Carto. I emailed their tech lead and he is crazy excited about Carto.
In our open data stack, the "data store" is the database where the data is hosted. It generates links to download the data in multiple formats, and an API to integrate the data into applications. These links are added to OpenDataPhilly, the public catalog. The new data store will also provide links, which will be added to OpenDataPhilly in the same way.
We initially used GitHub as the "data store," and, in 2014, we switched to Socrata. It helped us achieve our goals of providing consistent formats and APIs, and broadening the audience of open data by providing interactive visualizations alongside the raw data.
But a huge amount of our data is geospatial in nature, and advanced geospatial analysis requires the data be hosted in a geospatial database. So we had to set up a parallel data store for geospatial data: Esri's Open Data product on ArcGIS Online. This got the job done, but required us to set up and maintain two parallel data publishing processes.
Both platforms have served us well, but each comes with its own limitations, and maintaining two processes creates inefficiencies. So we've decided to consolidate our open data publishing on a single data store: Carto (formerly known as CartoDB). It's an open source platform that's essentially a PostgreSQL database with an API and user interface on top. It makes syncing data very easy on our side, supports advanced geospatial queries (via PostGIS), and offers some very impressive analysis tools.
Datasets that are currently hosted on Socrata (the ones with data.phila.gov in the URL) have been migrated to Carto, and will no longer be available in Socrata (starting next week). We will migrate the ones hosted in Esri Open Data in a later phase, over a period of time.
Carto provides a powerful API that lets you query each dataset as if it were a PostgreSQL table (it is), and download it in many formats. It's even more powerful than what Socrata provided, but it uses a slightly different format.
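To make "query each dataset as if it were a PostgreSQL table" concrete, here is a minimal sketch that builds a Carto SQL API request URL. The account name `exampleaccount` and the table name `example_dataset` are placeholders, not real Philadelphia identifiers, and the endpoint shape (`/api/v2/sql` with `q` and `format` parameters) is how the Carto SQL API is commonly called rather than something stated in the announcement.

```python
from urllib.parse import urlencode


def carto_sql_url(account: str, sql: str, fmt: str = "json") -> str:
    """Build a Carto SQL API URL that runs `sql` and returns `fmt` output.

    `account` is the Carto account name (placeholder in this sketch).
    """
    base = f"https://{account}.carto.com/api/v2/sql"
    # urlencode percent-escapes the SQL text so it is safe in a query string.
    return f"{base}?{urlencode({'q': sql, 'format': fmt})}"


# Hypothetical account and table names, for illustration only.
url = carto_sql_url(
    "exampleaccount",
    "SELECT * FROM example_dataset LIMIT 5",
    fmt="csv",
)
print(url)
```

Changing `fmt` (for example to `geojson`) is how the same query would be downloaded in a different format.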
For those of you who have applications or scripts integrated against specific Socrata APIs, this would normally mean you'd need to modify them to work with the equivalent Carto APIs. However, to ease this transition, we've built an open source reverse proxy to provide backwards compatibility. It sits at data.phila.gov and translates Socrata-style API requests into Carto API requests and points them to the corresponding Carto dataset.
We still recommend that you modify your integrations to point to the new Carto datasets, but the reverse proxy makes it less urgent.
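The translation step such a reverse proxy performs can be sketched roughly as follows. This is not the city's actual proxy code: the function name `soda_to_sql` and the table name are hypothetical, and only the simplest Socrata-style parameters (`$select`, `$where`, `$order`, `$limit`) are handled, on the assumption that their values are already valid SQL fragments.

```python
from urllib.parse import parse_qs


def soda_to_sql(table: str, query_string: str) -> str:
    """Translate a Socrata-style query string into one SQL statement.

    Illustrative only: a real proxy would validate every fragment
    instead of pasting request input directly into SQL.
    """
    # parse_qs decodes percent-escapes; keep the first value of each key.
    params = {key: values[0] for key, values in parse_qs(query_string).items()}

    sql = f"SELECT {params.get('$select', '*')} FROM {table}"
    if "$where" in params:
        sql += f" WHERE {params['$where']}"
    if "$order" in params:
        sql += f" ORDER BY {params['$order']}"
    if "$limit" in params:
        # int() rejects anything that is not a plain number.
        sql += f" LIMIT {int(params['$limit'])}"
    return sql


# Hypothetical dataset and column names.
print(soda_to_sql(
    "example_dataset",
    "$select=name,year&$where=year%20%3E%202014&$order=year%20DESC&$limit=10",
))
```

The resulting SQL string would then be sent to the Carto SQL API on behalf of the caller, which is why existing Socrata-style integrations can keep working unchanged for a while.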
One of our favorite features Socrata provides is DataLens, because it broadens the audience for open data by allowing users to drill down into data and answer questions without having to download a giant spreadsheet. Unfortunately, these pages will be deactivated as we switch to Carto. However, we've built an open source tool called VizWit that provides most of the same functionality and works with multiple providers, including Carto. While it may not have all the polish of DataLens, we hope it will continue to make our open datasets more accessible, and we're excited at the prospect of community contributions since it's open source.
- The datasets will still be available in multiple download formats (even more, actually)
- The datasets will still provide a powerful API
- You'll still access all the datasets via OpenDataPhilly
- We'll still accompany data releases with interactive visualizations
All the datasets have been migrated to Carto and are available under the data-test.phila.gov domain. Next week we'll point data.phila.gov there as well, and update all the links in OpenDataPhilly.