This vignette is an introduction to the photon package, an interface to the photon geocoder developed by komoot. Photon is open-source, based on OpenStreetMap data, and powered by the ElasticSearch search engine. It is – according to komoot – fast, scalable, multilingual, typo-tolerant, and up-to-date. Photon can do unstructured geocoding, reverse geocoding, and (under special circumstances) structured geocoding. Komoot offers a public photon API (https://photon.komoot.io/) but you can also set up a photon instance on a local machine.
photon supports both online and offline geocoding.
Online geocoding through komoots public API is intriguing because it is
convenient and offers up-to-date global coverage. It is appropriately
easy to use online geocoding in photon. First, it is
necessary to tell R that you want to use the public API. This can be
done using the workhorse function new_photon()
. To set up
online geocoding, simply call it without parameters:
new_photon()
#> <photon>
#> Type : remote
#> Server : https://photon.komoot.io/
The created photon
object is attached to the session and
does not have to be stored manually. Now you can geocode.
cities1 <- geocode(c("Sanaa", "Caracas"), osm_tag = ":city")
cities1
#> Simple feature collection with 2 features and 12 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: -66.9146 ymin: 10.50609 xmax: 44.20588 ymax: 15.35386
#> Geodetic CRS: WGS 84
#> # A tibble: 2 × 13
#> idx osm_type osm_id country osm_key countrycode osm_value name county state type extent geometry
#> <int> <chr> <int> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <list> <POINT [°]>
#> 1 1 N 2.58e8 Yemen place YE city Sana… At Ta… Aman… dist… <lgl> (44.20588 15.35386)
#> 2 2 R 1.12e7 Venezu… place VE city Cara… Munic… Capi… city <dbl> (-66.9146 10.50609)
Similarly, you can also reverse geocode. photon fully
supports sf
objects so that all geocoding functions return
sf
dataframes and reverse()
accepts
sf
and sfc
objects as input.
cities2 <- reverse(cities1, osm_tag = ":city")
cities2
#> Simple feature collection with 2 features and 12 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: -66.9146 ymin: 10.50609 xmax: 44.20588 ymax: 15.35386
#> Geodetic CRS: WGS 84
#> # A tibble: 2 × 13
#> idx osm_type osm_id country osm_key countrycode osm_value name county state type extent geometry
#> <int> <chr> <int> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <list> <POINT [°]>
#> 1 1 N 2.58e8 Yemen place YE city Sana… At Ta… Aman… dist… <lgl> (44.20588 15.35386)
#> 2 2 R 1.12e7 Venezu… place VE city Cara… Munic… Capi… city <dbl> (-66.9146 10.50609)
all.equal(cities1, cities2)
#> [1] TRUE
Online geocoding is nice and it is most likely what you need for basic tasks. But what if online geocoding is not enough? What if you need to geocode a dataset of 200,000 places? What if you need to geocode sensitive information from survey respondents? And what about structured geocoding?
Offline geocoding
The photon backend is freely available on the photon GitHub repository. With it, you can set up a local instance of photon. Offline geocoding is nice because it is extremely fast, versatile and it doesn’t send your potentially sensitive data around the internet. In a lot of cases, offline geocoding is absolutely imperative, yet usually, setting up an offline geocoder can be quite cumbersome. photon takes over this task!
To run photon, you need Java 11 or higher. Setting up local photon
also works through new_photon()
. This time, we pass a path
where the necessary files should be stored and a country for which a
search index should be downloaded. While global coverage is also
possible, the global search index is extremely large (around 80 GB). By
default, new_photon()
downloads a search index tagged with
latest
but it is also possible to query a search index
created at a specific date.
path <- file.path(tempdir(), "photon")
photon <- new_photon(path, country = "Samoa")
#> ℹ java version "22" 2024-03-19
#> ℹ Java(TM) SE Runtime Environment (build 22+36-2370)
#> ℹ Java HotSpot(TM) 64-Bit Server VM (build 22+36-2370, mixed mode, sharing)
#> ✔ Successfully downloaded photon 0.5.0. [7s]
#> ✔ Successfully downloaded search index. [590ms]
#> • Version: 0.5.0
#> • Coverage: Samoa
#> • Time: latest
The resulting object is an R6 class with a few methods to control the
instance. To start photon, run $start()
. This starts an
external java process which can be accessed using the $proc
attribute.
photon$start()
#> Running java -jar photon-0.5.0.jar -listen-ip 0.0.0.0 -listen-port 2322
#> 2024-10-25 17:04:26,912 [main] WARN org.elasticsearch.node.Node - version [5.6.16-SNAPSHOT] is a pre-release version of Elasticsearch and is not suitable for production
#> ✔ Photon is now running. [11s]
photon$proc
#> PROCESS 'java', running, pid 22744.
To check if the service is up and running, you can use
$is_ready()
.
photon$is_ready()
#> [1] TRUE
Finally, to properly stop photon after you used it, you can run
$stop()
. You do not actually need to run it
manually, because it is (implicitly) executed on two occasions: 1. on
garbage collection and 2. when the R session ends and external processes
are killed.
photon$stop()
To compare offline and online geocoding, let’s benchmark them by geocoding the Samoan capital Apia:
# offline geocoding
bench::mark(geocode("Apai", limit = 1), iterations = 25)$median
#> [1] 17.1ms
# online geocoding
new_photon()
bench::mark(geocode("Apai", limit = 1), iterations = 25)$median
#> [1] 1.05s
That is a speed increase by a factor of almost 60 (and possibly more on faster machines)!
Finally, to clean up photon, i.e. stop the instance and delete the
photon directory, run $purge()
.
photon$purge()
#> ℹ Purging an instance kills the photon process and removes the photon directory.
#> Continue? (y/N/Cancel) y