feat(taps): Implement reference paginators (#732)
* feat: Implement paginators

* docs: Add documentation for new implementation

* Update singer_sdk/pagination.py

* Update docs/porting.md

Co-authored-by: Aaron ("AJ") Steers <aj@meltano.com>

* Make linter happy

* Make pre-commit happy

* Use args and kwargs for base class

* Full coverage for new code

* Remove commented type var

Co-authored-by: Aaron ("AJ") Steers <aj@meltano.com>
edgarrmondragon and aaronsteers authored Sep 1, 2022
1 parent e84036b commit 16751ee
Showing 18 changed files with 1,015 additions and 68 deletions.
7 changes: 7 additions & 0 deletions docs/classes/singer_sdk.pagination.BaseAPIPaginator.rst
@@ -0,0 +1,7 @@
singer_sdk.pagination.BaseAPIPaginator
======================================

.. currentmodule:: singer_sdk.pagination

.. autoclass:: BaseAPIPaginator
    :members:
7 changes: 7 additions & 0 deletions docs/classes/singer_sdk.pagination.BaseHATEOASPaginator.rst
@@ -0,0 +1,7 @@
singer_sdk.pagination.BaseHATEOASPaginator
==========================================

.. currentmodule:: singer_sdk.pagination

.. autoclass:: BaseHATEOASPaginator
    :members:
7 changes: 7 additions & 0 deletions docs/classes/singer_sdk.pagination.BaseOffsetPaginator.rst
@@ -0,0 +1,7 @@
singer_sdk.pagination.BaseOffsetPaginator
=========================================

.. currentmodule:: singer_sdk.pagination

.. autoclass:: BaseOffsetPaginator
    :members:
@@ -0,0 +1,7 @@
singer_sdk.pagination.BasePageNumberPaginator
=============================================

.. currentmodule:: singer_sdk.pagination

.. autoclass:: BasePageNumberPaginator
    :members:
7 changes: 7 additions & 0 deletions docs/classes/singer_sdk.pagination.HeaderLinkPaginator.rst
@@ -0,0 +1,7 @@
singer_sdk.pagination.HeaderLinkPaginator
=========================================

.. currentmodule:: singer_sdk.pagination

.. autoclass:: HeaderLinkPaginator
    :members:
7 changes: 7 additions & 0 deletions docs/classes/singer_sdk.pagination.JSONPathPaginator.rst
@@ -0,0 +1,7 @@
singer_sdk.pagination.JSONPathPaginator
=======================================

.. currentmodule:: singer_sdk.pagination

.. autoclass:: JSONPathPaginator
    :members:
@@ -0,0 +1,7 @@
singer_sdk.pagination.LegacyPaginatedStreamProtocol
===================================================

.. currentmodule:: singer_sdk.pagination

.. autoclass:: LegacyPaginatedStreamProtocol
    :members:
7 changes: 7 additions & 0 deletions docs/classes/singer_sdk.pagination.LegacyStreamPaginator.rst
@@ -0,0 +1,7 @@
singer_sdk.pagination.LegacyStreamPaginator
===========================================

.. currentmodule:: singer_sdk.pagination

.. autoclass:: LegacyStreamPaginator
    :members:
7 changes: 7 additions & 0 deletions docs/classes/singer_sdk.pagination.SimpleHeaderPaginator.rst
@@ -0,0 +1,7 @@
singer_sdk.pagination.SimpleHeaderPaginator
===========================================

.. currentmodule:: singer_sdk.pagination

.. autoclass:: SimpleHeaderPaginator
    :members:
7 changes: 7 additions & 0 deletions docs/classes/singer_sdk.pagination.SinglePagePaginator.rst
@@ -0,0 +1,7 @@
singer_sdk.pagination.SinglePagePaginator
=========================================

.. currentmodule:: singer_sdk.pagination

.. autoclass:: SinglePagePaginator
    :members:
74 changes: 73 additions & 1 deletion docs/porting.md
@@ -103,10 +103,82 @@ _Important: If you've gotten this far, this is a good time to commit your code b

Pagination differs from one API to the next, and no single method handles every API's approach.

Most likely you will use `get_next_page_token` to parse and return whatever the "next page" token is for your source, and you'll use `get_url_params` to define how to pass the "next page" token back to the API when asking for subsequent pages.
Most likely you will use [get_new_paginator](singer_sdk.RESTStream.get_new_paginator) to instantiate a [pagination class](./classes/singer_sdk.pagination.BaseAPIPaginator) for your source, and you'll use `get_url_params` to define how to pass the "next page" token back to the API when asking for subsequent pages.

When you think you have it right, run `poetry run tap-mysource` again, and debug until you are confident that the output includes records from multiple pages of the API.
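To make the division of labor concrete, here is a minimal sketch of how a paginator object and `get_url_params` might cooperate in a request loop. It does not import the SDK: `PageNumberPaginatorSketch`, `fake_request`, `ALL_RECORDS`, and the `has_more` flag are all hypothetical stand-ins, and only the `finished`, `current_value`, `count`, and `advance` names mirror the paginator interface exercised in the unit test in this section.

```python
from typing import Optional


class PageNumberPaginatorSketch:
    """Simplified stand-in for a page-number paginator (not the SDK class)."""

    def __init__(self, start_value: int) -> None:
        self.current_value = start_value
        self.finished = False
        self.count = 0

    def advance(self, response: dict) -> None:
        """Record one processed page, then move to the next page or finish."""
        self.count += 1
        if response["has_more"]:
            self.current_value += 1
        else:
            self.finished = True


def get_url_params(next_page_token: Optional[int]) -> dict:
    """Translate the paginator's current value into query parameters."""
    params: dict = {"per_page": 2}
    if next_page_token:
        params["page"] = next_page_token
    return params


# Hypothetical API: five records served two per page.
ALL_RECORDS = [{"id": i} for i in range(1, 6)]


def fake_request(params: dict) -> dict:
    """Return one page of ALL_RECORDS plus a made-up "has_more" flag."""
    page, size = params.get("page", 1), params["per_page"]
    start = (page - 1) * size
    return {
        "records": ALL_RECORDS[start : start + size],
        "has_more": start + size < len(ALL_RECORDS),
    }


def read_all_records() -> list:
    """Request pages until the paginator reports that it is finished."""
    paginator = PageNumberPaginatorSketch(start_value=1)
    records: list = []
    while not paginator.finished:
        response = fake_request(get_url_params(paginator.current_value))
        records.extend(response["records"])
        paginator.advance(response)
    return records
```

In this sketch the paginator only decides what the next token is and when to stop, while `get_url_params` alone decides how that token is sent to the API. Keeping the two concerns apart is what makes the paginator testable without a stream.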

You can also add unit tests for your pagination implementation for additional confidence:

```python
from __future__ import annotations

import json

from requests import Response
from singer_sdk.helpers.jsonpath import extract_jsonpath
from singer_sdk.pagination import BaseHATEOASPaginator, first


class CustomHATEOASPaginator(BaseHATEOASPaginator):
    """Paginator for HATEOAS APIs - or "Hypermedia as the Engine of Application State".

    This paginator expects responses to have a "links" array containing an
    entry whose "rel" is "next" and whose "href" is a value like
    "https://api.com/link/to/next-item".
    """

    def get_next_url(self, response: Response) -> str | None:
        """Get the HATEOAS link for the next page, if the response has one."""
        try:
            return first(
                extract_jsonpath("$.links[?(@.rel=='next')].href", response.json())
            )
        except StopIteration:
            # No "next" link in the response: this was the last page.
            return None


def test_paginator_custom_hateoas():
    """Validate that the custom paginator advances and finishes correctly."""
    resource_path = "/path/to/resource"
    response = Response()
    paginator = CustomHATEOASPaginator()
    assert not paginator.finished
    assert paginator.current_value is None
    assert paginator.count == 0

    # First page: the response links to page 2.
    response._content = json.dumps(
        {
            "links": [
                {
                    "rel": "next",
                    "href": f"{resource_path}?page=2&limit=100",
                }
            ]
        }
    ).encode()
    paginator.advance(response)
    assert not paginator.finished
    assert paginator.current_value.path == resource_path
    assert paginator.current_value.query == "page=2&limit=100"
    assert paginator.count == 1

    # Second page: the response links to page 3.
    response._content = json.dumps(
        {
            "links": [
                {
                    "rel": "next",
                    "href": f"{resource_path}?page=3&limit=100",
                }
            ]
        }
    ).encode()
    paginator.advance(response)
    assert not paginator.finished
    assert paginator.current_value.path == resource_path
    assert paginator.current_value.query == "page=3&limit=100"
    assert paginator.count == 2

    # Last page: no "next" link, so the paginator finishes.
    response._content = json.dumps({"links": []}).encode()
    paginator.advance(response)
    assert paginator.finished
    assert paginator.count == 3
```

Note: Depending on how well the API is designed, this could take anywhere from 5 minutes to multiple hours. If you need help, tools like [Postman](https://postman.com) or [Thunder Client](https://marketplace.visualstudio.com/items?itemName=rangav.vscode-thunder-client) can help you debug the API's specific quirks.

## Run pytest
19 changes: 19 additions & 0 deletions docs/reference.rst
@@ -89,3 +89,22 @@ JSON Schema builder classes
    :template: module.rst

    typing


Pagination
----------

.. autosummary::
    :toctree: classes
    :template: class.rst

    pagination.BaseAPIPaginator
    pagination.SinglePagePaginator
    pagination.BaseHATEOASPaginator
    pagination.HeaderLinkPaginator
    pagination.JSONPathPaginator
    pagination.SimpleHeaderPaginator
    pagination.BasePageNumberPaginator
    pagination.BaseOffsetPaginator
    pagination.LegacyPaginatedStreamProtocol
    pagination.LegacyStreamPaginator
1 change: 1 addition & 0 deletions pyproject.toml
@@ -147,6 +147,7 @@ exclude_lines = [
    "raise NotImplementedError",
    "if __name__ == .__main__.:",
    '''class .*\bProtocol\):''',
    '''@(abc\.)?abstractmethod''',
]
fail_under = 82
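
The entries in `exclude_lines` are regular expressions that coverage.py matches against source lines. As a rough illustration of what the new `@(abc\.)?abstractmethod` entry is meant to catch, the sketch below applies the same pattern with Python's `re.search`. The `is_excluded` helper is hypothetical and only approximates the matching step; it does not reproduce how coverage.py then excludes the decorated function.

```python
import re

# Same pattern as the new exclude_lines entry in pyproject.toml.
ABSTRACT_PATTERN = re.compile(r"@(abc\.)?abstractmethod")


def is_excluded(line: str) -> bool:
    """Return True if the line contains a match for the pattern."""
    return ABSTRACT_PATTERN.search(line) is not None


# Both spellings of the decorator match; an ordinary def line does not.
samples = [
    "    @abstractmethod",
    "    @abc.abstractmethod",
    "    def get_next(self, response):",
]
results = [is_excluded(line) for line in samples]
```

The optional `(abc\.)?` group is what lets one entry cover both `from abc import abstractmethod` and `import abc` styles.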

36 changes: 18 additions & 18 deletions samples/sample_tap_gitlab/gitlab_rest_streams.py
@@ -1,11 +1,12 @@
"""Sample tap stream test for tap-gitlab."""

from pathlib import Path
from typing import Any, Dict, List, Optional, cast
from __future__ import annotations

import requests
from pathlib import Path
from typing import Any, cast

from singer_sdk.authenticators import SimpleAuthenticator
from singer_sdk.pagination import SimpleHeaderPaginator
from singer_sdk.streams.rest import RESTStream
from singer_sdk.typing import (
    ArrayType,
@@ -21,7 +22,7 @@
DEFAULT_URL_BASE = "https://gitlab.com/api/v4"


class GitlabStream(RESTStream):
class GitlabStream(RESTStream[str]):
    """Sample tap test for gitlab."""

    _LOG_REQUEST_METRIC_URLS = True
@@ -39,8 +40,8 @@ def authenticator(self) -> SimpleAuthenticator:
        )

    def get_url_params(
        self, context: Optional[dict], next_page_token: Optional[Any]
    ) -> Dict[str, Any]:
        self, context: dict | None, next_page_token: str | None
    ) -> dict[str, Any]:
        """Return a dictionary of values to be used in URL parameterization."""
        params: dict = {}
        if next_page_token:
@@ -50,21 +51,20 @@ def get_url_params(
            params["order_by"] = self.replication_key
        return params

    def get_next_page_token(
        self, response: requests.Response, previous_token: Optional[Any]
    ) -> Optional[Any]:
        """Return token for identifying next page or None if not applicable."""
        next_page_token = response.headers.get("X-Next-Page", None)
        if next_page_token:
            self.logger.debug(f"Next page token retrieved: {next_page_token}")
        return next_page_token
    def get_new_paginator(self) -> SimpleHeaderPaginator:
        """Return a new paginator for GitLab API endpoints.

        Returns:
            A new paginator.
        """
        return SimpleHeaderPaginator("X-Next-Page")


class ProjectBasedStream(GitlabStream):
    """Base class for streams that are keys based on project ID."""

    @property
    def partitions(self) -> List[dict]:
    def partitions(self) -> list[dict]:
        """Return a list of partition key dicts (if applicable), otherwise None."""
        if "{project_id}" in self.path:
            return [
@@ -162,7 +162,7 @@ class EpicsStream(ProjectBasedStream):

    # schema_filepath = SCHEMAS_DIR / "epics.json"

    def get_child_context(self, record: dict, context: Optional[dict]) -> dict:
    def get_child_context(self, record: dict, context: dict | None) -> dict:
        """Perform post processing, including queuing up any child stream types."""
        # Ensure child state record(s) are created
        return {
@@ -183,8 +183,8 @@ class EpicIssuesStream(GitlabStream):
    parent_stream_type = EpicsStream  # Stream should wait for parents to complete.

    def get_url_params(
        self, context: Optional[dict], next_page_token: Optional[Any]
    ) -> Dict[str, Any]:
        self, context: dict | None, next_page_token: str | None
    ) -> dict[str, Any]:
        """Return a dictionary of values to be used in parameterization."""
        result = super().get_url_params(context, next_page_token)
        if not context or "epic_id" not in context:
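
The `GitlabStream` change in this file swaps the removed `get_next_page_token` method for a `SimpleHeaderPaginator("X-Next-Page")` returned from `get_new_paginator`. The sketch below is an illustration of why the two are expected to yield the same tokens, not the SDK's implementation: `HeaderPaginatorSketch`, `legacy_next_page_token`, and `HEADER_SEQUENCE` are hypothetical, and it assumes the last page carries an empty or missing `X-Next-Page` header.

```python
from typing import Optional


def legacy_next_page_token(headers: dict) -> Optional[str]:
    """Old style: the stream reads the token from the headers itself."""
    return headers.get("X-Next-Page") or None


class HeaderPaginatorSketch:
    """Simplified stand-in for a header-based paginator (not the SDK class)."""

    def __init__(self, key: str) -> None:
        self.key = key
        self.current_value: Optional[str] = None
        self.finished = False

    def advance(self, headers: dict) -> None:
        """Store the next token, or mark pagination as finished."""
        self.current_value = headers.get(self.key) or None
        if self.current_value is None:
            self.finished = True


# Hypothetical headers from three consecutive responses.
HEADER_SEQUENCE = [
    {"X-Next-Page": "2"},
    {"X-Next-Page": "3"},
    {"X-Next-Page": ""},  # assumed shape of the final page
]


def collect_tokens() -> list:
    """Feed each response's headers to the paginator and record its token."""
    paginator = HeaderPaginatorSketch("X-Next-Page")
    tokens = []
    for headers in HEADER_SEQUENCE:
        paginator.advance(headers)
        tokens.append(paginator.current_value)
    return tokens
```

The visible difference is where the state lives: the legacy function is stateless and relies on the stream to stop when it sees `None`, whereas the paginator tracks `finished` itself.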