Feature request: a batch version of mapzen.places.getHierarchiesByLatLon #99

simonw · 2017-10-02T23:31:34Z

Or, more broadly, a general mechanism for batch API calls would be fantastic.

thisisaaronland · 2017-10-02T23:36:43Z

Tell me more?

simonw · 2017-10-03T20:00:50Z

We often need to resolve hierarchies for a bunch of places at once. For example... let's say we're returning a page with 10 events on it. Each event has a latitude/longitude point, and we want to show a breadcrumb on each event "card" showing the state, city and neighbourhood.

We do that by hitting our own internal service action which serves up an aggressively cached set of data derived from calls to getHierarchiesByLatLon. Provided those points have already been queried by our service, we'll be able to return the result direct from our cache.

BUT... what if we don't have the results cached yet? We need to make up to 10 individual calls to getHierarchiesByLatLon to pull back the data we need.

That's when we run into the mapzen 4-requests-per-second rate limit.

It would be fantastic if we could do something like this instead:

https://places.mapzen.com/v1/
    ?method=batch
    &api_key=mapzen-xxx
    &batch=URLENCODE({
        "1":{"method":"mapzen.places.getHierarchiesByLatLon","latitude":37.777228,"longitude":-122.470779},
        "2":{"method":"mapzen.places.getHierarchiesByLatLon","latitude":37.677228,"longitude":-122.470779},
        "3":{"method":"mapzen.places.getHierarchiesByLatLon","latitude":37.577228,"longitude":-122.470779},
        "4":{"method":"mapzen.places.getHierarchiesByLatLon","latitude":37.477228,"longitude":-122.470779}
    })

And get back a response something like this:

{
    "batch_results": {
        "1": {
            "hierarchies": [
                {
                    "neighbourhood_id": 85865919,
                    "continent_id": 102191575,
                    "macrohood_id": "1108830805",
                    "country_id": 85633793,
                    "locality_id": 85922583,
                    "county_id": 102087579,
                    "region_id": 85688637
                }
            ],
            "stat": "ok"
        },
        "2": {
            "hierarchies": [
                {
                    "neighbourhood_id": 85865919,
                    "continent_id": 102191575,
                    "macrohood_id": "1108830805",
                    "country_id": 85633793,
                    "locality_id": 85922583,
                    "county_id": 102087579,
                    "region_id": 85688637
                }
            ],
            "stat": "ok"
        },
        "3": {
            "hierarchies": [
                {
                    "neighbourhood_id": 85865919,
                    "continent_id": 102191575,
                    "macrohood_id": "1108830805",
                    "country_id": 85633793,
                    "locality_id": 85922583,
                    "county_id": 102087579,
                    "region_id": 85688637
                }
            ],
            "stat": "ok"
        },
        "4": {
            "hierarchies": [
                {
                    "neighbourhood_id": 85865919,
                    "continent_id": 102191575,
                    "macrohood_id": "1108830805",
                    "country_id": 85633793,
                    "locality_id": 85922583,
                    "county_id": 102087579,
                    "region_id": 85688637
                }
            ],
            "stat": "ok"
        }
    }
}

Doing this via a GET may not be the right thing (url-encoded JSON in a query string is ugly and long) - maybe a POST would be more sensible:

POST https://places.mapzen.com/v1/?method=batch&api_key=mapzen-xxx
{
    "1": {
        "method": "mapzen.places.getHierarchiesByLatLon",
        "latitude": 37.777228,
        "longitude": -122.470779
    },
    "2": {
        "method": "mapzen.places.getHierarchiesByLatLon",
        "latitude": 37.677228,
        "longitude": -122.470779
    },
    "3": {
        "method": "mapzen.places.getHierarchiesByLatLon",
        "latitude": 37.577228,
        "longitude": -122.470779
    },
    "4": {
        "method": "mapzen.places.getHierarchiesByLatLon",
        "latitude": 37.477228,
        "longitude": -122.470779
    }
}

Here's how we built this for Eventbrite's API: https://www.eventbrite.com/developer/v3/api_overview/batching/

There are all sorts of complexities around this - the need for sensible limits, how it interacts with rate-limiting etc - but being able to group requests in this way would be really useful.
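
As a rough illustration of the keyed shape proposed above, here is a minimal Python sketch that builds the batch body and pairs the results back to the original points. The `batch` method and the `batch_results` envelope are only the proposal from this issue, not an existing Mapzen API, and `build_batch` and `pair_results` are hypothetical helper names.

```python
import json

METHOD = "mapzen.places.getHierarchiesByLatLon"


def build_batch(points):
    """Build the keyed batch body proposed in this issue.

    points is a list of (latitude, longitude) tuples. Keys are "1", "2", ...
    so that each response can be matched to the request that produced it.
    """
    return {
        str(i): {"method": METHOD, "latitude": lat, "longitude": lon}
        for i, (lat, lon) in enumerate(points, start=1)
    }


def pair_results(points, response):
    """Pair each input point with its hierarchies from a batch response.

    Assumes the hypothetical {"batch_results": {...}} envelope shown above.
    Points whose entry is missing or not "ok" are paired with None.
    """
    results = response.get("batch_results", {})
    paired = []
    for i, point in enumerate(points, start=1):
        entry = results.get(str(i))
        if entry and entry.get("stat") == "ok":
            paired.append((point, entry["hierarchies"]))
        else:
            paired.append((point, None))
    return paired


points = [(37.777228, -122.470779), (37.677228, -122.470779)]
body = json.dumps(build_batch(points))  # would be sent as the POST body
```

Pairing by key means a failed or missing entry for one point does not shift the results for the others.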

thisisaaronland · 2017-10-04T01:46:26Z

As you mention, there are all sorts of complexities around this. I could imagine (in that way I can imagine all kinds of crazy stuff at the end of the day... :-) building a thin layer of icing... I mean a "service" on top of this:

https://github.com/whosonfirst/go-whosonfirst-api

Which would basically manage all the requests, whether they are executed concurrently or not, and take care of all the boring details (rate limiting, billing, etc.) behind the scenes.

I will have a closer look at the Eventbrite docs and start thinking about it more generally.

Do you imagine that you would want to mix and match API calls/methods inside a single batch request?

thisisaaronland · 2017-10-04T17:18:19Z

Okay, so this is incredibly wet paint but:

https://github.com/whosonfirst/go-whosonfirst-api-batch/blob/master/batch.go

As in:

./bin/wof-api-batch-server
2017/10/04 10:15:14 listening on localhost:8080
2017/10/04 10:15:18 TIMING 793.099943ms

And:

curl -s 'localhost:8080?api_key=mapzen-****' -d @batch.json | jq '.[].stat'
"ok"
"ok"
"ok"
"ok"

Where batch.json looks like this:

[
    {"method":"mapzen.places.getHierarchiesByLatLon","latitude":37.777228,"longitude":-122.470779},
    {"method":"mapzen.places.getHierarchiesByLatLon","latitude":37.677228,"longitude":-122.470779},
    {"method":"mapzen.places.getHierarchiesByLatLon","latitude":37.577228,"longitude":-122.470779},
    {"method":"mapzen.places.getHierarchiesByLatLon","latitude":37.477228,"longitude":-122.470779}
]
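
For anyone calling this from Python rather than curl, a minimal sketch of the same request using only the standard library. It mirrors the curl invocation above (JSON list as the POST body, `api_key` in the query string); the function names are hypothetical and nothing here is taken from the server's source.

```python
import json
import urllib.parse
import urllib.request


def build_batch_request(calls, api_key, base_url="http://localhost:8080"):
    """Build a POST request carrying the list of calls as a JSON body."""
    query = urllib.parse.urlencode({"api_key": api_key})
    data = json.dumps(calls).encode("utf-8")
    return urllib.request.Request(f"{base_url}/?{query}", data=data, method="POST")


def send_batch(request):
    """Send the request and decode the JSON response (needs a running server)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


calls = [
    {"method": "mapzen.places.getHierarchiesByLatLon",
     "latitude": 37.777228, "longitude": -122.470779},
    {"method": "mapzen.places.getHierarchiesByLatLon",
     "latitude": 37.677228, "longitude": -122.470779},
]
request = build_batch_request(calls, "mapzen-xxx")
```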

Question: Is there a particular reason your example batch request has numeric keys?

simonw · 2017-10-04T18:02:21Z

The numeric key thing was just one way to make it easy to keep track of "I asked these questions, I got these responses back again". Doing it as a list is just as good; it just means the client code that matches each response to the question it asked would work very slightly differently.
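
To make the difference concrete, a small sketch of the list-based variant in Python. It assumes the server returns responses in the same order as the requests, which is an assumption and not something confirmed in this thread; `pair_by_position` is a hypothetical helper.

```python
def pair_by_position(requests, responses):
    """Pair each request with the response at the same list index.

    Raises ValueError if the lengths differ, since positional matching
    would then silently attach responses to the wrong requests.
    """
    if len(requests) != len(responses):
        raise ValueError(
            f"expected {len(requests)} responses, got {len(responses)}"
        )
    return list(zip(requests, responses))
```

With keyed requests the client looks responses up by key; with a list it relies on position, so the length check is the only guard against a mismatch.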

thisisaaronland · 2017-10-04T18:49:25Z

A couple more questions:

How would you feel about an API that returned a ticket and required you to poll for results?
How would you feel about an API that handled requests/responses over a WebSocket connection?

simonw · 2017-10-04T22:45:50Z

I'd love the above as additions to a traditional request/response API, but not as a replacement for it.

A request/response batch API like the one described above would certainly need to be strict about how many requests are allowed in a single batch. The neatest mechanism I've considered for this would be to assign each method a "cost", and allow a budget for a batch call.

For example, maybe mapzen.places.getHierarchiesByLatLon is assigned a cost of 5, and mapzen.places.getInfo has a cost of 1. If the batch API had a budget of 20, I would know that I could run 3 getHierarchiesByLatLon calls and 5 getInfo calls in a single batch request.
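
A minimal Python sketch of that accounting, using the illustrative costs and budget from the paragraph above. The numbers are examples only, not real Mapzen values, and `batch_cost` and `split_by_budget` are hypothetical helpers.

```python
# Illustrative values from this comment; not real Mapzen costs.
COSTS = {
    "mapzen.places.getHierarchiesByLatLon": 5,
    "mapzen.places.getInfo": 1,
}
BUDGET = 20


def batch_cost(calls, costs=COSTS):
    """Total cost of a list of calls, each a dict with a "method" key."""
    return sum(costs[call["method"]] for call in calls)


def split_by_budget(calls, budget=BUDGET, costs=COSTS):
    """Greedily split calls, in order, into batches that each fit the budget."""
    batches, current, spent = [], [], 0
    for call in calls:
        cost = costs[call["method"]]
        if cost > budget:
            raise ValueError(f"{call['method']} alone exceeds the budget")
        if spent + cost > budget:
            # Current batch is full; start a new one.
            batches.append(current)
            current, spent = [], 0
        current.append(call)
        spent += cost
    if current:
        batches.append(current)
    return batches
```

Three getHierarchiesByLatLon calls (3 × 5) plus five getInfo calls (5 × 1) come to exactly 20, so they fit in one batch under these example numbers.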

As a consumer of an API, I want to be confident that the API will return in a sensible amount of time - so having guidance that says "you can spend up to 20 credits in a batch call and we're confident we could return in <100ms" would be really useful.

An API that returns a ticket and asks me to poll for a result... that would be fantastic for big batch jobs. I have 80,000 venue locations I'd like to geocode right now - I'd love it if I could send you the whole lot in one go and then poll for a few minutes waiting for a giant response to be ready.
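
A rough sketch of what the client side of a ticket-and-poll API could look like in Python. Everything about the response shape is an assumption: `fetch_status` is a hypothetical callable that takes a ticket and returns a dict with a `"status"` of `"pending"` or `"complete"`, plus a `"result"` once complete.

```python
import time


def poll_for_result(fetch_status, ticket, interval=5.0, timeout=600.0,
                    sleep=time.sleep):
    """Poll fetch_status(ticket) until the job completes or timeout elapses.

    Returns the "result" value of the completed status. Raises TimeoutError
    if the job is still pending once `timeout` seconds have passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status(ticket)
        if status.get("status") == "complete":
            return status["result"]
        if time.monotonic() >= deadline:
            raise TimeoutError(f"ticket {ticket!r} not ready after {timeout}s")
        sleep(interval)
```

Passing `fetch_status` in as a callable keeps the loop independent of any HTTP library, so it would work equally well with requests or urllib.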

The WebSocket thing: I'll be honest, from regular Python (using the requests library) I think I'd just find it too fiddly to use. I'd have to drop in a Python websocket library instead. I'd do it if I had to, but given the choice between that and a polling-based API I'd take the polling one. I'm sure node.js developers would disagree with me wildly here :)

simonw · 2017-10-04T22:46:52Z

Huh, I just noticed that you already have a mapzen.places.getInfoMulti method: https://mapzen.com/documentation/places/methods/#mapzen.places.getInfoMulti

thisisaaronland · 2017-10-04T22:54:05Z

That's good to know and I tend to share your feelings. The WS stuff seems sufficiently fiddly and complex across languages that I can imagine it rapidly outstripping any potential benefits. I might implement a proof-of-concept endpoint but mostly as an experiment...
