The Shortest Path In San Francisco dataset is a synthetic dataset of shortest paths around San Francisco. These paths have been calculated from random start and end points, using weight maps derived from distance and from travel time, together with small random polygonal obstructions that force individual paths to avoid certain small regions. In total, the dataset contains 20,242 trajectories with about 5 million points.
It has been created for the ACM SIGSPATIAL GIS Cup 2017 on range queries under the Fréchet distance and is given as a set of trajectories in Global Web Mercator (EPSG:3857). It is based on the submission that won the ACM SIGSPATIAL GIS Cup 2015 (Werner, 2015), from which shortest paths under polygonal constraints can be extracted.
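Since the coordinates are given as Web Mercator meters, you may want to reproject them to longitude and latitude for visualization or geographic analysis. The following is a minimal sketch of such a conversion using pyproj; pyproj is not part of the dataset distribution, and the helper function name is only illustrative.

# Sketch: reproject a trajectory from EPSG:3857 (Web Mercator) to WGS84 lon/lat.
# Assumes pyproj is installed; trajectory_to_lonlat is a hypothetical helper name.
import numpy as np
from pyproj import Transformer

to_wgs84 = Transformer.from_crs("EPSG:3857", "EPSG:4326", always_xy=True)

def trajectory_to_lonlat(m):
    # m is an (N, 2) array of Web Mercator x/y coordinates
    lon, lat = to_wgs84.transform(m[:, 0], m[:, 1])
    return np.column_stack([lon, lat])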
The data is derived from OpenStreetMap, and we republish the derived data under identical terms, that is, under the Open Data Commons Open Database License (ODbL). For details, see https://www.openstreetmap.org/copyright
When you use it in your scientific work, we encourage you to cite the associated publication (Werner, 2015).
To get you started, the following Python snippet creates a list of all trajectories, each formatted as an individual numpy array, by parsing the TGZ file directly. It is rather slow and is best used only once, when importing the data into whatever working format you intend to use.
import tarfile
from os.path import isfile
from urllib.request import urlretrieve  # Python 3; in Python 2 use urllib.urlretrieve

import numpy as np
from tqdm import tqdm

if __name__ == "__main__":
    print("Checking if data exists")
    if not isfile('shortest-sf.tgz'):
        print("Downloading...")
        urlretrieve("https://www.martinwerner.de/files/shortest-sf.tgz", "shortest-sf.tgz")
    else:
        print("Found local file")

    # open the archive and extract every trajectory file
    dataset = tarfile.open('shortest-sf.tgz')
    loa = list()  # list of arrays: one numpy array per trajectory
    for f in tqdm(dataset.getmembers()):
        if f.isfile():
            f_trajectory = dataset.extractfile(f)
            m = np.loadtxt(f_trajectory, skiprows=1)  # skip the header line
            loa.append(m)  # add this trajectory to the list of arrays
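As a quick sanity check after loading, you can render a sample of the trajectories with matplotlib. The snippet below is only a sketch and assumes the list loa built by the code above is populated and that matplotlib is installed.

# Sketch: plot a sample of the loaded trajectories (assumes loa from above)
from matplotlib import pyplot as plt

for m in loa[:100]:
    plt.plot(m[:, 0], m[:, 1], linewidth=0.5)
plt.gca().set_aspect('equal')  # keep the Web Mercator geometry undistorted
plt.title("Sample of shortest paths in San Francisco")
plt.show()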