Introduce Mapbox
Building Conflation
Building Visualization
https://mapbox.com/community/
What do we do?
Connect
Respond to incoming inquiries,
attend events, and reach out to
target partners.
Assess
Identify good-fit projects
and scope their needs.
Match
Connect projects to the
support they need and
facilitate the relationship.
Our Tools & Data
Our Operations
Our Community
Partnerships
Share
Evaluate, assess impact,
promote, celebrate
Build with empathy
Visualization
Telemetry
Passive telemetry
Telemetry Deep Dive
telemetry: 250 million miles every day
Analysis
Imagery
VisionSDK
Logistics
Community Generated Data
Building Conflation
R&D Goal:
Find the most efficient way to iterate quickly toward the
best of both OSM and Microsoft buildings
In the US
Microsoft: ~125 million
OSM: ~25 million
(~300 million globally)
Failed Approaches
●PostGIS: query timed out
●PySpark with the shapely library: logic worked, but took ~7 days
●GeoSpark: limited functionality, no performance gain
STARK
A framework for Spatio-Temporal Data Analytics on Spark
https://github.com/dbis-ilm/stark
●Binary Space Partitioning
●RTree Indexing
●Spatial Join on Intersects
Data Preparation
●Query the internal validated OSM data store (in DynamoDB) and apply
cost-based binary space partitioning into 256 partitions
●Pull down the Microsoft building data and partition it into the
same 256 partitions
●Build an R-tree index for each partition
●The partition pairs are then distributed across the Spark cluster.
Uniform partitions result in more efficient operation
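The partitioning step above can be sketched roughly as follows. This is a simplified stand-in, not STARK's partitioner: it uses building centroids, treats the point count as the cost, and splits each cell at the median along its longer axis. The names build_bsp and locate, and the synthetic centroids, are hypothetical. The tree is built from one dataset and then reused to place the second dataset into the same partitions.

```python
import itertools
import random
from collections import Counter


def build_bsp(points, depth, ids=None):
    """Build a split tree. Leaves are partition ids; inner nodes are
    (axis, threshold, left, right) tuples."""
    ids = itertools.count() if ids is None else ids
    if depth == 0 or len(points) < 2:
        return next(ids)
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    # Split along the axis with the larger extent.
    axis = 0 if max(xs) - min(xs) >= max(ys) - min(ys) else 1
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2  # median split keeps both halves equally costly
    threshold = ordered[mid][axis]
    left = build_bsp(ordered[:mid], depth - 1, ids)
    right = build_bsp(ordered[mid:], depth - 1, ids)
    return (axis, threshold, left, right)


def locate(tree, point):
    """Walk the split tree and return the partition id containing `point`."""
    while isinstance(tree, tuple):
        axis, threshold, left, right = tree
        tree = left if point[axis] < threshold else right
    return tree


# Synthetic centroids (assumption: roughly the lon/lat extent of the lower 48).
rng = random.Random(0)
osm_centroids = [(rng.uniform(-125, -66), rng.uniform(24, 49)) for _ in range(10_000)]
ms_centroids = [(rng.uniform(-125, -66), rng.uniform(24, 49)) for _ in range(5_000)]

tree = build_bsp(osm_centroids, depth=8)  # 2**8 = 256 partitions
osm_counts = Counter(locate(tree, p) for p in osm_centroids)
ms_counts = Counter(locate(tree, p) for p in ms_centroids)
```

Because every split is at the median, the 256 partitions hold nearly the same number of OSM buildings, which is the uniformity the slide refers to.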
Operation
●For each Microsoft building polygon:
○Drop it if it has bad geometry or is in the manually maintained
list of non-buildings
○If the Microsoft building intersects an OSM building, choose OSM
(OSM data is assumed to be better); otherwise keep the Microsoft building
●Apply a unique ID:
○O-{osm_id}-{osm_part_id}
○M-{incrementing_integer_id}
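A minimal sketch of the selection rule above for a single partition. Several things here are assumptions made for illustration: bounding boxes stand in for real polygons, a brute-force scan stands in for the R-tree lookup, the Microsoft counter starts at 1, and the names Building and conflate are hypothetical.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Building:
    source_id: int   # OSM id, or a row id for Microsoft footprints
    bbox: tuple      # (min_x, min_y, max_x, max_y), a stand-in for the polygon
    part_id: int = 0  # OSM part id; unused for Microsoft footprints


def bbox_valid(b):
    """Treat an empty or inverted box as 'bad geometry'."""
    return b[0] < b[2] and b[1] < b[3]


def bbox_intersects(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def conflate(osm, microsoft, non_buildings=frozenset()):
    """Keep every OSM building; keep a Microsoft building only if it is
    valid, not a listed non-building, and intersects no OSM building."""
    next_id = count(1)
    out = [(f"O-{b.source_id}-{b.part_id}", b) for b in osm]
    for ms in microsoft:
        if not bbox_valid(ms.bbox) or ms.source_id in non_buildings:
            continue  # dropped: bad geometry or manually listed non-building
        if any(bbox_intersects(ms.bbox, o.bbox) for o in osm):
            continue  # OSM wins; its footprint is already in the output
        out.append((f"M-{next(next_id)}", ms))
    return out


osm = [Building(101, (0, 0, 2, 2), part_id=1)]
microsoft = [
    Building(1, (1, 1, 3, 3)),      # overlaps OSM 101, so OSM is chosen
    Building(2, (5, 5, 6, 6)),      # no overlap, kept as M-1
    Building(3, (7, 7, 7, 9)),      # zero-width box, dropped as bad geometry
    Building(4, (10, 10, 11, 11)),  # on the non-building list, dropped
]
result = conflate(osm, microsoft, non_buildings={4})
result_ids = [building_id for building_id, _ in result]
```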
Output
●Take the resulting output and render it as GeoJSON to S3
●Push to tippecanoe to generate MBTiles
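As a rough illustration of the GeoJSON rendering step, the sketch below serializes conflated footprints into a FeatureCollection using only the standard library. The property names "id" and "source", the rectangular footprints, and the helper names are assumptions; the S3 upload is left out. A file written this way could then be handed to tippecanoe, for example with `tippecanoe -o buildings.mbtiles buildings.geojson`.

```python
import json


def bbox_to_ring(bbox):
    """Closed, counterclockwise exterior ring for a rectangular footprint."""
    min_x, min_y, max_x, max_y = bbox
    return [
        [min_x, min_y],
        [max_x, min_y],
        [max_x, max_y],
        [min_x, max_y],
        [min_x, min_y],  # GeoJSON rings repeat the first position at the end
    ]


def to_feature(building_id, ring, source):
    """Wrap one footprint as a GeoJSON Feature (property names are hypothetical)."""
    return {
        "type": "Feature",
        "properties": {"id": building_id, "source": source},
        "geometry": {"type": "Polygon", "coordinates": [ring]},
    }


def to_feature_collection(features):
    return json.dumps({"type": "FeatureCollection", "features": list(features)})


features = [
    to_feature("O-101-1", bbox_to_ring((-77.04, 38.89, -77.03, 38.90)), "osm"),
    to_feature("M-1", bbox_to_ring((-77.02, 38.89, -77.01, 38.90)), "microsoft"),
]
geojson_text = to_feature_collection(features)
```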
STARK
USA: 1 Hour 35 minutes -> 125M Microsoft
Buildings + 26M OSM Buildings -> $7 Cost
2 hours
Process Evaluation
●Pan and scan samples
●All buildings present from OSM
●Buildings present on tile edges
●Attributes present
●Buildings are extrudable
Quality Evaluation
●50 randomly selected Z17 tiles in 65 US metropolitan statistical areas
(MSAs)
●Pan and scan samples against imagery
○Check for false positives and missing buildings
○Shapes approximate the shapes of representative buildings
○Logical placement of buildings (e.g., buildings not on roadways)
●Score quality from 1 (high) to 5 (low)
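One way the tile sampling above could be done is sketched below, using the standard Web Mercator (slippy map) tile formula. The bounding box is a hypothetical stand-in for an MSA boundary, and the function names and fixed seed are assumptions rather than the process actually used.

```python
import math
import random


def lonlat_to_tile(lon, lat, zoom):
    """Standard slippy-map conversion from lon/lat to XYZ tile indices."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y


def sample_tiles(bbox, zoom=17, k=50, seed=0):
    """Pick up to k distinct tiles uniformly from those covering bbox,
    where bbox is (west, south, east, north) in degrees."""
    west, south, east, north = bbox
    x_min, y_min = lonlat_to_tile(west, north, zoom)  # tile y grows southward
    x_max, y_max = lonlat_to_tile(east, south, zoom)
    width = x_max - x_min + 1
    height = y_max - y_min + 1
    rng = random.Random(seed)
    # Sample flat indices so the full tile list is never materialized.
    picks = rng.sample(range(width * height), min(k, width * height))
    return [(zoom, x_min + i % width, y_min + i // width) for i in picks]


# Hypothetical bounding box standing in for one MSA.
msa_bbox = (-122.6, 37.6, -122.3, 37.9)
tiles = sample_tiles(msa_bbox)
```

Fixing the seed keeps the sample reproducible, so a reviewer can revisit the same 50 tiles when scoring.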
Conclusion: Microsoft data is pretty good, with some areas better than
others
Building Visualization
VisionSDK
OECD Working Party on Territorial Indicators Workshop
Modernising statistical systems for better data on regions and cities
Open Buildings: Visualization and
Conflation at Scale