-
Notifications
You must be signed in to change notification settings - Fork 96
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[PECO-1263] Implement a .returned_as_direct_result property for AsyncExecution status #325
base: peco-1263-staging
Are you sure you want to change the base?
Conversation
Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com>
Oh man, why? This is really unfortunate behavior. |
src/databricks/sql/ae.py
Outdated
@@ -225,6 +228,15 @@ def last_sync_timestamp(self) -> Optional[datetime]: | |||
"""The timestamp of the last time self.status was synced with the server""" | |||
return self._last_sync_timestamp | |||
|
|||
@property | |||
def is_available(self) -> bool: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How long is it available for? How does this limitation work in the CUJ of recovering from a user-space crash?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to check with the thrift server folks to get that answer. I believe it's available for a few hours. With cloud fetch enabled it's 24 hours.
What do you think of the name is_available
? I'd like a name that's reflective but also easy enough to type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its really a tough one; I think this might be a little misleading, because isn't the answer 'Unknown' for crash recovery CUJ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so. Because under this situation, if the client crashes it will do so before the query_id
is returned. Which means there would be no way to pick up the execution anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, so if it gets a query_id, then it must be available, unless its been a long time? Although, again, available doesn't sound quite right because the query could still be executing. Maybe just returned_as_direct_result?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea. I've done the rename and pushed.
Note: we could get around this by allowing users to disable directResults
when the TExecuteStatementReq
is emitted.
thrift server team. Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com>
Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com>
Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com>
Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com>
Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com>
will this cause any issues? |
Jade, this is the issue:
|
@@ -81,6 +81,7 @@ class AsyncExecution: | |||
] | |||
_last_sync_timestamp: Optional[datetime] = None | |||
_result_set: Optional["ResultSet"] = None | |||
_returned_as_direct_result: bool = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need change get_result method to also check this flag?
Description
This PR implements an additional boolean flag on
AsyncExecution
that indicates whether this execution returned directResults. When this happens, results cannot be fetched from the server again. Because when adirectResults
query returns its operation is closed immediately.I determined this is necessary while documenting the new
execute_async()
behaviour. For context, when we send aTExecuteStatementReq
to the thrift server, there are two possible return conditions:TExecuteStatementResp
will actually include the results!TExecuteStatementResp
will only include thequery_id
for the query, which we can use to poll for the results until they are ready.In case 1, the result is sent within the initial
TExecuteStatementResp
and they cannot be fetched a second time. What this means for users is that if they are going to call.serialize()
to persist aquery_id
andsecret
for use by another thread, they need a way to verify that the results will actually be available to another thread.This new
.returned_as_direct_result
property makes that easier.One unfortunate implication of this behaviour is that users cannot use one thread to kickoff all their queries and a separate thread to fetch their results. Because not all
AsyncExecution
objects can actually be picked up separately.