Overview

Each OpenBlock deployment requires data sets that are largely unique depending on the geographic region being served.

Street Address Data

To perform geocoding and provide a means of navigating at a block level, OpenBlock requires a database of the streets in your area. This must include the address range of each individual street segment. If you live in the U.S.A. and your city hasn't had much new development since the year 2000, the U.S. Census Bureau's TIGER/Line files ( http://www.census.gov/geo/www/tiger/ ) may be sufficient. Importers also exist for ESRI Shapefiles with appropriate attributes (possibly TeleAtlas street data).
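To illustrate why address ranges per street segment matter, here is a minimal sketch of geocoding by linear interpolation along a segment. It is not OpenBlock's actual geocoder: the Segment class, its field names, and the interpolate function are hypothetical, and real street data such as TIGER/Line typically carries separate left-side and right-side ranges, which this sketch ignores.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One street segment with an address range (hypothetical structure)."""
    street: str
    from_addr: int   # house number at the start of the segment
    to_addr: int     # house number at the end of the segment
    start: tuple     # (longitude, latitude) of the segment start
    end: tuple       # (longitude, latitude) of the segment end


def interpolate(segment, house_number):
    """Estimate a (longitude, latitude) for house_number on the segment."""
    low, high = sorted((segment.from_addr, segment.to_addr))
    if not low <= house_number <= high:
        raise ValueError("house number outside the segment's address range")
    span = segment.to_addr - segment.from_addr
    # A segment with a single address gets its midpoint.
    fraction = 0.5 if span == 0 else (house_number - segment.from_addr) / span
    lon = segment.start[0] + fraction * (segment.end[0] - segment.start[0])
    lat = segment.start[1] + fraction * (segment.end[1] - segment.start[1])
    return (lon, lat)


# Made-up example data: the 100 block of a hypothetical Main St.
segment = Segment("MAIN ST", 100, 200, (-71.0, 42.0), (-71.002, 42.001))
point = interpolate(segment, 150)
```

Without the address range, a geocoder could only place "150 Main St" somewhere on the street; with it, the address can be placed roughly halfway along the right block.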

Some cities, such as Portland and Washington, DC, have published their address data on OpenAddresses ( http://openaddresses.org ).

TODO: document how to load them

Geographic Region Shapes

You need to add some named regions of interest to your database and provide their geographic boundaries. Usually, these regions are defined by providing a shapefile.

Typical examples:

  • neighborhoods in your area
  • local zip codes

Some places to find boundaries:

TODO: document how to load them, where shapes might be found

News and Other Local Data

You'll need to find some online data to feed to your system, and write (or modify) some "scraper" scripts to fetch that data and load it. What sort of data? Anything is potentially usable as long as each item has, at minimum:

  • a date
  • a location (e.g. a geographic point or street address)
  • a description and/or title

If an item lacks either a date or a location, it won't be very useful in OpenBlock, since it would fail one of the two key buzzwords ("hyperlocal" and "news").
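The minimum requirements above can be expressed as a simple check. This is only a sketch: the dictionary keys (date, point, address, title, description) and the is_usable function are hypothetical names chosen for illustration, not part of OpenBlock's data model.

```python
from datetime import date


def is_usable(item):
    """Return True if item has a date, a location, and a title or description."""
    has_date = isinstance(item.get("date"), date)
    # Either a geographic point or a street address counts as a location.
    has_location = bool(item.get("point") or item.get("address"))
    has_text = bool(item.get("title") or item.get("description"))
    return has_date and has_location and has_text


# Made-up example items.
permit = {
    "date": date(2010, 6, 1),
    "address": "100 Main St",
    "title": "Building permit issued",
}
undated = {"address": "100 Main St", "title": "Building permit issued"}

usable = [item for item in (permit, undated) if is_usable(item)]
```

Running a check like this early in a scraper makes it easy to count how many items in a candidate data source would actually be worth loading.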

Examples of Data

  • local news / blog feeds
  • public meeting minutes
  • public events calendar
  • crime reports
  • building permits
  • health inspections

Recommendations for getting data

  • Find out if there are license restrictions on using the data
  • Find out if there are already feeds (RSS, Atom) you can use. Feeds are easy to parse and there is existing infrastructure to handle them.
  • Find out if there are dates in the data.
  • Find out if the data has already been geotagged with a latitude and longitude. If so, ideally it would be in the feed in a standard format like GeoRSS.
  • If there are no existing feeds, and it's government data, it's time to start making phone calls and persuading your local officials to release their publicly-owned information.  http://resource.org/8_principles.html has some good recommendations - emphasize the "machine readable" part.
  • If you can't get a feed, but the data's on the web in some form, there is scraper infrastructure to handle other formats (HTML, CSV, PDF, ...) but you'll have to do some more scripting. Scraping HTML or PDF should be seen as a last resort, since any future visual redesign is likely to break your scraper, but sometimes that's all you can get.

For more information, see Ideal Feed Formats.
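As a rough illustration of why a geotagged feed is the easiest case, the following sketch reads an RSS 2.0 feed with GeoRSS-Simple points using only the Python standard library. The sample feed, the parse_items function, and the output keys are made up for this example; this is not OpenBlock's scraper infrastructure, and a real scraper would also need to fetch the feed and handle malformed entries.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Namespace used by GeoRSS-Simple elements such as <georss:point>.
GEORSS = "{http://www.georss.org/georss}"

# A made-up feed standing in for a real local data source.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0" xmlns:georss="http://www.georss.org/georss">
  <channel>
    <title>Example local feed</title>
    <item>
      <title>Street fair on Main St</title>
      <description>Annual neighborhood street fair.</description>
      <pubDate>Tue, 01 Jun 2010 12:00:00 GMT</pubDate>
      <georss:point>42.36 -71.06</georss:point>
    </item>
    <item>
      <title>Item with no location</title>
      <pubDate>Tue, 01 Jun 2010 13:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def parse_items(feed_xml):
    """Yield a dict for each feed item that has both a date and a point."""
    root = ET.fromstring(feed_xml)
    for item in root.iter("item"):
        point = item.findtext(GEORSS + "point")
        pub_date = item.findtext("pubDate")
        if point is None or pub_date is None:
            continue  # skip items missing a date or a location
        # A GeoRSS-Simple point is "latitude longitude", space separated.
        lat, lon = (float(value) for value in point.split())
        yield {
            "title": item.findtext("title", default=""),
            "description": item.findtext("description", default=""),
            "date": parsedate_to_datetime(pub_date),
            "latitude": lat,
            "longitude": lon,
        }


items = list(parse_items(SAMPLE_FEED))
```

Because the date and the coordinates arrive in standard elements, no page-specific parsing is needed; compare that with HTML or PDF scraping, where every field has to be located by hand.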

Importing data once you find it

You'll need some scraper scripts.

Resources

Many communities already have local initiatives cataloging freely available data.