Building a Spatial Graph Pipeline
This tutorial walks through an end-to-end workflow for urban function inference: classifying Points of Interest (POIs) from their spatial context. The pipeline uses city2graph to connect geospatial data processing with graph-based machine learning. It starts by collecting real-world POI and street network data from OpenStreetMap (OSM) via OSMnx. So that the tutorial remains reproducible when live OSM data is unavailable, the workflow falls back to synthetic data made of clustered POIs.
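A synthetic fallback of this kind can be sketched with NumPy alone. The function name `make_synthetic_pois`, the Gaussian cluster model, and every default value below are illustrative assumptions, not the tutorial's actual fallback.

```python
import numpy as np

def make_synthetic_pois(n_clusters=4, pois_per_cluster=50, spread=150.0,
                        extent=2000.0, seed=42):
    """Generate clustered 2-D points with one category label per cluster.

    Hypothetical helper: sizes and the Gaussian model are assumptions.
    """
    rng = np.random.default_rng(seed)
    # Cluster centres drawn uniformly inside a square study area (metres).
    centers = rng.uniform(0.0, extent, size=(n_clusters, 2))
    coords, labels = [], []
    for label, center in enumerate(centers):
        # Scatter points around each centre with isotropic Gaussian noise.
        coords.append(center + rng.normal(0.0, spread, size=(pois_per_cluster, 2)))
        labels.append(np.full(pois_per_cluster, label))
    return np.vstack(coords), np.concatenate(labels)

coords, labels = make_synthetic_pois()
```

Fixing the seed makes the fallback deterministic, so downstream results can be compared across runs even without network access.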
Feature Engineering and Graph Construction
Spatial features are engineered for each POI by calculating the local POI density and the distance to the nearest street segment. The core of the spatial analysis is the construction of several proximity graph families that represent urban structure, including:
- K-Nearest Neighbors (KNN)
- Delaunay Triangulation
- Gabriel Graphs
- Relative Neighborhood Graphs (RNG)
- Euclidean Minimum Spanning Trees (EMST)
- Waxman Graphs
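Three of the families listed above (KNN, Gabriel, and RNG) can be sketched by brute force from their geometric definitions. This is a minimal illustration on a small point set, assuming planar coordinates. The function names are hypothetical, the Gabriel and RNG loops are far too slow for city-scale data, and the code is not city2graph's implementation.

```python
import numpy as np

def pairwise_distances(coords):
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def knn_edges(coords, k=3):
    """Undirected KNN graph: i-j is an edge if j is among i's k nearest, or vice versa."""
    dist = pairwise_distances(coords)
    np.fill_diagonal(dist, np.inf)  # exclude self-matches
    nearest = np.argsort(dist, axis=1)[:, :k]
    return {(min(i, int(j)), max(i, int(j)))
            for i in range(len(coords)) for j in nearest[i]}

def gabriel_edges(coords):
    """Keep i-j only if no third point lies inside the disk with diameter ij."""
    dist = pairwise_distances(coords)
    n = len(coords)
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            # k is inside the disk iff d(i,k)^2 + d(j,k)^2 < d(i,j)^2
            inside = dist[i] ** 2 + dist[j] ** 2 < dist[i, j] ** 2
            inside[[i, j]] = False
            if not inside.any():
                edges.add((i, j))
    return edges

def rng_edges(coords):
    """Keep i-j only if no third point is closer to both i and j than they are to each other."""
    dist = pairwise_distances(coords)
    n = len(coords)
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            in_lune = np.maximum(dist[i], dist[j]) < dist[i, j]
            in_lune[[i, j]] = False
            if not in_lune.any():
                edges.add((i, j))
    return edges

pts = np.random.default_rng(0).uniform(0.0, 1.0, size=(30, 2))
print(len(knn_edges(pts)), len(gabriel_edges(pts)), len(rng_edges(pts)))
```

Every RNG edge is also a Gabriel edge, which gives a cheap sanity check when comparing graph families on the same point set.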
These topologies are compared to evaluate how different connectivity strategies capture urban relationships. The data is then converted into PyTorch Geometric data objects, supporting both homogeneous graphs (for standard classification) and heterogeneous graphs (to model relationships between different urban function categories).
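For the homogeneous case, the conversion amounts to producing a node feature matrix and an edge index of shape `[2, E]` that lists each undirected edge in both directions. The sketch below does this in NumPy for a toy point set. The helpers `local_density` and `to_edge_index` and the 200 m radius are assumptions made for illustration, and the sketch does not use city2graph's own converters.

```python
import numpy as np

def local_density(coords, radius=200.0):
    """Count the other points within `radius` of each point (hypothetical feature)."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return (dist <= radius).sum(axis=1) - 1  # subtract the point itself

def to_edge_index(edges):
    """Turn undirected (i, j) pairs into a [2, 2E] array holding both directions."""
    pairs = np.array(sorted(edges), dtype=np.int64).reshape(-1, 2)
    both = np.vstack([pairs, pairs[:, ::-1]])  # add the reversed copy of every edge
    return both.T

# Toy data: four POIs on a line (metres) joined as a path graph.
coords = np.array([[0.0, 0.0], [100.0, 0.0], [150.0, 0.0], [900.0, 0.0]])
edges = {(0, 1), (1, 2), (2, 3)}

x = local_density(coords).reshape(-1, 1).astype(np.float32)  # node features
edge_index = to_edge_index(edges)                             # COO connectivity
```

With PyTorch installed, these arrays could be wrapped with `torch.from_numpy` and passed to `torch_geometric.data.Data(x=..., edge_index=..., y=...)`.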
Model Training and Inference
For classification, the tutorial implements a two-layer GraphSAGE model, which learns node representations by aggregating features from each node's local graph neighborhood. Nodes are split 60/20/20 into training, validation, and test sets, and performance is evaluated using accuracy and macro-F1 scores. Finally, the learned embeddings are visualized using PCA, and predictions are mapped back to geographic space, showing how the model's inferred urban functions relate to spatial structure.
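The neighborhood aggregation, the split, and the macro-F1 metric can each be illustrated in a few lines of NumPy. This is a forward-pass sketch only, assuming a mean aggregator with no bias terms and random, untrained weights. It is a stand-in for, not a copy of, the tutorial's PyTorch Geometric model, and all function names are hypothetical.

```python
import numpy as np

def sage_layer(x, edge_index, w_self, w_neigh):
    """Mean-aggregator layer: h_i = x_i W_self + mean_{j in N(i)}(x_j) W_neigh."""
    src, dst = edge_index
    agg = np.zeros_like(x)
    np.add.at(agg, dst, x[src])                     # sum messages arriving at each node
    deg = np.bincount(dst, minlength=x.shape[0])    # number of incoming neighbours
    agg = agg / np.maximum(deg, 1)[:, None]         # mean; isolated nodes keep zeros
    return x @ w_self + agg @ w_neigh

def two_layer_sage(x, edge_index, params):
    h = np.maximum(sage_layer(x, edge_index, *params[0]), 0.0)  # ReLU
    return sage_layer(h, edge_index, *params[1])                # class logits

def split_masks(n, seed=0):
    """Random 60/20/20 train/validation/test boolean masks."""
    perm = np.random.default_rng(seed).permutation(n)
    n_train, n_val = (6 * n) // 10, (2 * n) // 10
    masks = np.zeros((3, n), dtype=bool)
    masks[0, perm[:n_train]] = True
    masks[1, perm[n_train:n_train + n_val]] = True
    masks[2, perm[n_train + n_val:]] = True
    return masks

def macro_f1(y_true, y_pred, n_classes):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return float(np.mean(scores))

# Toy path graph 0-1-2 with one feature per node and three classes.
x = np.array([[1.0], [3.0], [5.0]])
edge_index = np.array([[0, 1, 1, 2], [1, 0, 2, 1]])
rng = np.random.default_rng(0)
params = [(rng.normal(size=(1, 8)), rng.normal(size=(1, 8))),
          (rng.normal(size=(8, 3)), rng.normal(size=(8, 3)))]
logits = two_layer_sage(x, edge_index, params)
pred = logits.argmax(axis=1)
```

Macro-F1 averages the per-class scores without weighting by class size, so it complements accuracy when some urban function categories are rare.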