-
Notifications
You must be signed in to change notification settings - Fork 117
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement dynamic status #667
base: main
Are you sure you want to change the base?
Implement dynamic status #667
Conversation
4b15707
to
c76bc82
Compare
47df40e
to
95edc07
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before this gets integrated, I would like to have the opportunity to test it with https://github.com/crim-ca/weaver to evaluate if any refactor causes major breaking issues.
That could break multiple birdhouse stacks that rely on pywps/weaver across multiple projects.
FYI @huard @cehbrecht
pywps/app/Service.py
Outdated
@operations("status") | ||
def get_status(self, wps_request, uuid): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be GetStatus
?
This is the name of the WPS operation: http://opengeospatial.github.io/e-learning/wps/text/operations.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello @fmigneault ,
Thanks for feedbacks.
I was not aware of GetStatus, I am focused on WPS 1.0.0 spec. I used request=status to avoid complex change in code. Initially I used a separate url than the [server] url
but it require more setup to ensure wsgi to expose the new url path, and need some change on how we dispatch urls within the code, thus I dropped the idea and switched to the most convenient implementation to start the discussion. I also considered to use service=something. I will be happy to get some suggestion about this url, the only requirement is that the request need the status uuid to retrieve the status within the database.
As far as I understand GetStatus, it's seems it not the same as the WPS status file required in WPS 1.0.0 thus they cannot be merged together. Nevertheless, the GetStatus request can be implemented on top of this patch suite, all data required for generating the GetStatus outputs is stored in the database in this patch suite. We can also implement the GetResult.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the long delay.
service=...
should be avoided as it is (normally) intended to redirect to other services such as WMS, WCS, etc.
Currently, PyWPS does not have other services running under the same webapp, but a server using a PyWPS behind the scene could filter this parameter to perform some redirections to other endpoints.
The idea of GetStatus
is to retrieve the current status of the job, regardless of where the status is stored.
However, the full XML details of the status can always be retrieved from statusLocation
provided in the execute response. This location does not necessarily have to be a file, but this is what PyWPS currently uses. That location does not have to be on the same server either. It could point at a remote instance or an S3 bucket for example.
If GetStatus
is needed, I think the right approach would be to employ WPS 2.0. It would be preferable to support (relatively) newer revisions than patch older ones with unofficial requests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, do you have some idea of which url I should use ?
I will remove the decorator @operations
that can be confusion within the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be a configurable setting to allow instances to adapt as needed.
Similar to how the [server] outputurl
can be configured, it could be a [server] statusurl
or similar.
This way, the server can decide where to host status files (or not if the setting is left empty by default).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no need of file for dynamic status, thus the url does not need a mapping to a file on the file system. I will keep an option to keep the file on file system with same behaviour as before, but the url that I talk about will be provided by wsgi application directly, not through apache, in the same manner as DescribeProcess request for instance, or as we will have to implement for WPS 2.0.0, i.e. GetStatus.
Currently this url look like:
http://hostname/path/to/wps?service=WPS&request=status&uuid=SOMEUUID
where http://hosname/pat/to/wps
is the setting provided by [server] url
If we want to use another unrelated path than the one above, it require that user configure a second wsgi application.
One option is to use a sub-path on top of the one above as instance it can be:
http://hostname/path/to/wps/status?uuid=SOMEUUID
or
http://hostname/path/to/wps/status/SOMEUUID
Another option is to not bind the WPS with the url option [server] url
directly but having wps service binded to sub-path that way in the configuration above the wps may by bound to a path like:
http://hostname/path/to/wps/wps?service=WPS&request=GetCapabilities
having status urls and wps request url at the same level.
In this case we may have following configurations:
[server]
rooturl=http://hostname/path/to/wsgi
status_sub_path=custom_status_path
wps_sub_path=custom_wps_path
that will produce the following urls for WPS request:
http://hostname/path/to/wsgi/custom_wps_path?service=WPS&request=GetCapabilities
and
http://hostname/path/to/wsgi/custom_status_path?uuid=someuuid
The last options is my favorite compromise. default value for wps_sub_path can be 'wps' and the default status_sub_path can be 'status'
pywps/response/status.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this should be kept this way, and not move it to the pywps.__init__.py
.
Init is not where the status should be defined.
All the code should import from pywps.response.status import WPS_STATUS
.
There is no reason to break this interface.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the reason why I moved the WPS_STATUS
was to solve a recursive import issue. I think it was because implicitly python load __init__.py
from all directory within an import path, even if it's not required. For instance if you import a.b.c.d, python load a.__init__.py
, a.b.__init__.py
and a.b.c.__init__.py
before loading a.b.c.d.py
and if you need import a.b.c.d
within a.b.__init__.py
you are in trouble.
I will check it.
Best regards
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If pywps.response.status
only defines WPS_STATUS
without importing other pywps
locations, it should be possible to do from pywps.response.status import WPS_STATUS
from anywhere. There's actually more chances that a circular import occurs if from pywps import WPS_STATUS
is called while pywps
is still being imported by some other module.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get this error.txt if I do not move WPS_STATUS.
I will check closer if I can find a better change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is because of the relative imports added to pywps/response/__init__.py
.
There is no need to import them there as shortcuts.
Anywhere that needs to import these classes can simply use the complete module path.
Avoid using ..
and .
imports. Always do explicit from pywps...
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fmigneault Ok thanks I will try it :)
pywps/app/Service.py
Outdated
|
||
from werkzeug.wrappers import Request, Response | ||
from pywps.app.WPSExecuteRequest import WPSExecuteRequest | ||
from ..response import CapabilitiesResponse, DescribeResponse, ExecuteRawResponse, StatusResponse |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use full pywps import path
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I will update the code to follow this policy
pywps/app/Process.py
Outdated
def _setup_status_storage(self): | ||
self._status_store = StorageBuilder.buildStorage() | ||
|
||
@property | ||
def status_store(self): | ||
if self._status_store is None: | ||
self._setup_status_storage() | ||
return self._status_store | ||
|
||
@property | ||
def status_location(self): | ||
return self.status_store.location(self.status_filename) | ||
|
||
@property | ||
def status_filename(self): | ||
return str(self.uuid) + '.xml' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if we want to keep this file available locally (on top of allowing request=GetStatus
)?
Can't the handlers be left available?
The OGC API - Processes package I work on that uses PyWPS for corresponding WPS operations expects the XML files to be made available to users to help them debug why their process failed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello @fmigneault ,
The file can be retrieved with a quite easy GET request from uuid and the json data for generating the status is also stored within the database, thus I think this would be duplicated. I can implement it, but I would prefer that it is not implemented until the following comments get addresses.
Another thing is that currently the data storage within PyWPS seems to be a mess. The pywps.inout.storage does not seems to be used consistently across the current code. As far as remember the FileStorage
must be used to ensure that the status file get a valid output url i.e. [server] outputurl
configuration. And the FileStorage
setup must be made in way that the path of stored file it the one exposed by [server] outputurl
, thus I tryed to avoid that, that why I put the whole data within the database, avoiding using FileStorage
.
The general idea here is that I would like to remove completly [server] outputurl
and I would like every request goes to one entry point. that entry point will map request using the proper backend Storage facility. That way the storage facility and the way of the data are exposed is independent, and users are free to store data in the way they prefer by implementing they own pywps.inout.Storage
or using one provided by pywps. For instance, this entry point may be a root path with several sub-path such as 'http://host/path/wps', 'http://host/path/status' or 'http://host/path/files' as I designed in the first place. We may also use GET parameters instead as I did in the current patch suite. I droped the path idea for GET method because the later is less invasive within the code and make the the setup of the server simpler. Note that I have another patch suite where I implement this behavior to store and access results data.
Best regards
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that users could have the option to control where the status is stored and/or reported. However, to avoid breaking existing instances, it would be preferable if the default uses the local file as it was previously done. An option would have to be used to avoid producing the file. Any server that relies on this local XML file to exist locally will break.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will implement that option
You are welcome to test it and to provide feedbacks. I'm using it with my own WPS server :) Best regards. |
de7cc67
to
ae7947a
Compare
Hello, I did updated the patch suite to address comment above, to summarize changes:
I do not like to much the setup of URL but as said, it's compromise, maybe in futur we may implement a cleaner aproach. Best regards |
ae7947a
to
039b44b
Compare
The WPS request contain the pre-parsed WPS data in json, i.e. an intermediate form of the request. We can store this form for later processing. In other hand the WPSExecuteRequest is the final WPS request form to run the process, it is used into the handler function as wps_request parameter. This simplify the data stored for later use. This also avoid that serialization of pywps.inout.inputs.*Input is used in very diferent context.
WPSExecuteRequest.inputs['id'] is collections deque, this patch convert them to regular python list.
The ExecuteResponse is responsible to render the output XML and this is not mandatory for running process, in particular background process. This patch split the part that are required to run a given execute request from the part that is required to render the execute output.
Within test the database is in memory database, thus upon fork the database is also forked and all async process will update a copy of the database instead of actually updating the main database, making actual test of async request wrong.
039b44b
to
667aea8
Compare
Fix more relative imports |
Overview
This patch suite implement the #354 with several refactor. I did this patch to simplify and split the previous PR #662 to make review easier. I expect to submit other relevant changes but I rework them to avoid dramatic change into the user API. For instance I did extended the service=WPS API by adding request=status instead of adding a new separate endpoint as I did in PR #662
Related Issue / Discussion
#662
Additional Information
This contribution is supported by MINES ParisTech.
Contribution Agreement
(as per https://github.com/geopython/pywps/blob/master/CONTRIBUTING.rst#contributions-and-licensing)