Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Discovery makes implementation assumptions for listing TDs via pagination parameters #278

Open
thjaeckle opened this issue Feb 24, 2022 · 3 comments

Comments

@thjaeckle
Copy link

I just had a look of what to do in order to provide a WoT Discovery compliant Listing of TDs and wanted to provide feedback.

I think it is very problematic that this chapter has quite some bias of how a TD Directory implements its persistence:

  • the limit / offset based pagination is a very inefficient mechanism for pagination (e.g. see articles here or here) which nowadays is replaced by cursor based pagination instead
    • I don't see a need to specify how pagination is concretely done, it must only be possible to provide a "next" page somehow
  • the "canonical" Link header is specified to include an additional mandatory etag
    • which also is only needed for "limit/offset" based pagination - where adding a new TD at e.g. the "first" position shifts the result array
    • in a cursor based pagination approach, the cursor for the next page would include an identifier for the first TD of the next page in its "where clause" - which would not be affected by adding new TDs during pagination
    • also, specifying the required etag puts quite a lot of effort to the TD Directory, e.g. it would have to calculate a hash of all matching TDs for each "next" page, which could be a very expensive operation

In my opinion, the specification forces the implementer to choose an inefficient persistence.
Does it add much benefit of specifying something like limit and offset query parameters?
Or would it not be sufficient to define (which even already is defined) that a next link header must be returned when there are more results to load than returned in the response? And that this link must be followed to load the next page?

One other thing I noticed:
The query parameters which have to be provided by an implementation are very hidden in the text, I would have expected a table of the required/optional query parameters to support for the "Listing" API.

@mmccool
Copy link
Contributor

mmccool commented Apr 11, 2022

@farshidtz can you comment? I think at this point we are out of time to make big changes in how this is done, but we should definitely look into improving this in Discovery 2.0 (which we def need to do anyway for other reasons, i.e. to intercept the JSONPath spec when it is published by IETF).

@mmccool
Copy link
Contributor

mmccool commented Aug 23, 2022

I'm also wondering if we can fix this in a way that preserves compatibility. Suppose we introduce a new way to do pagination, but keep the old API around. That gives compatibility, but forces a system to implement pagination twice, and one way is inefficient. Do we do that for 1.1 then deprecate the "old way" for 2.0? Or do we just drop the old way and go straight for a new, cursor-based approach and publish a 2.0 spec?

@farshidtz
Copy link
Member

The reason for using limit and offset was to allow programmatic generation of page links and allow parallel queries. Some databases have O(n) complexity for offset, and are more efficient when paginating with cursor-based queries. But there are other issues:

  • With cursor-based, the client has to receive and process the request to get the link and issue a follow up request. This adds significant latency when querying many pages.
  • Cursor-based pagination only works reliably if the identifier is unique. We use id by default but allow sorting by other fields too.

For the etag, calculating the hash of the whole dataset is inefficient, but the spec is suggesting something different:

The etag value could be a revision number, timestamp, or UUID Version 4, set whenever the TD collection changes in a way that affects the ordering of the TDs. The clients may rely on the etag value to know whether the collection remains consistent across paginated retrieval of the collection. For example, creation or deletion of TDs or update of TD fields used for ordering may shift the calculated paging window.

With that said, we could make the use of limit, offset query arguments, and etag in canonical link header optional if there is consensus.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants