Suggestion: Regex-based inbound data filters #482
Comments
The window watcher recently had another PR merged for filtering window titles by regex on the client-side before sending: ActivityWatch/aw-watcher-window#99 I don't like the idea of server-side data filters (ideally it'd happen already on the client), but totally agree about discoverability/ease of configuration. This could be addressed with the server-side settings that are in the recent betas. Watchers could fetch the server's filter settings (which could be configured in the Settings view) and filter before sending, just like in the PR above. I think your plan sounds great, PRs welcome!
What's your argument for client-side data filtering exactly? I don't see the point of implementing regex filtering in each client again and again tbh. I think even performance-wise it would be nicer to have a single efficient implementation in Rust. I see that there is a bit of overhead involved with sending the full window title to the server, but after all, the communication is happening on

I'm not sure if there's any privacy advantage to filtering earlier either. I see that there is some possibility of MITM'ing the server, but I don't think it's very likely that such an attack happens -- and if it does, the early filtering is not a sufficient privacy guarantee at all. On the other hand, it's much more likely that users want to show their timeline to others, but want to make sure that some data will never be visible there. Depending on what each watcher implements, they might not have the ability to do so.

Implementing it in a central place also makes it much easier to iterate on the design, e.g. when adding replacements. I see that different watchers provide different fields on which the regexes could be applied, so they might want to have some control over the way the filtering works. But then again, the categories work the same way.

Speaking of which, I think that with the filtering implemented in the server, it could deliver a much nicer user experience when setting it up, since it could preview the changes it would have applied if it had been active in the past. We could even re-use the implementation for data scrubbing.
It feels wrong to send potentially sensitive information (even if locally) only for it to be discarded.
Performance is not a concern as regexes are fast in any language and the strings involved are short. Most people are still using aw-server-python (default) and we are keeping them at feature-parity, so there'd be no "single implementation" anyway.
It's practically already implemented in aw-watcher-window. imo there's very little to iterate on here. The design is clear, just need to add a setting for it in aw-webui and make the watcher respect it.
I don't see how it would affect the user experience in any way. None of those things require filtering implemented in the server.
Data scrubbing with previews would be purely a UI feature in aw-webui using the existing API, no changes needed to the server.
This seems like a different but similar feature, where you want some sensitive data stored (not filtered in the first place), but you want it hidden/masked for the purpose of sharing/screenshots (prob what I want instead of a filter). Seems like another purely UI feature. Already stored data matching the filter expression could be hidden/masked by default in the UI.
Ok, I think I'll just write myself a proxy for scrubbing then
Just for perspective, we had a similar discussion with a different resolution: #302 (comment)
That's fair, I think that's a good take. I do think clients should be able to check server configuration, via the settings endpoint, and respect filters set there (i.e. from the web UI). But it might get messy. It'd be problematic if we want them to apply generally, since every watcher would have to opt in by respecting the filter (potentially with different regex engines etc., not great). I guess we can do both. Have a setting, let privacy-aware watchers filter before sending, and let the server double-filter. In addition to letting users filter events to avoid saving them, I've been meaning to add "hidden" categories that are hidden by default in visualizations and either replaced with "Private" or hidden altogether. For the sensitive stuff people want to keep but not see (by default).
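To make the "replaced with Private" idea concrete, here is a minimal sketch in Python. It is only an illustration and not how aw-webui works: the function name `mask_for_display`, the `placeholder` argument, and the assumed event shape (a dict with a `data` dict of fields) are all hypothetical.

```python
import re


def mask_for_display(events, patterns, placeholder="Private"):
    """Return a new list where events matching any pattern have their data masked.

    Stored events are not modified; only the copy used for display changes.
    The event shape (a dict with a "data" dict) is an assumption for this sketch.
    """
    masked = []
    for event in events:
        data = event.get("data", {})
        # An event is sensitive if any string field matches any pattern.
        sensitive = any(
            pattern.search(value)
            for pattern in patterns
            for value in data.values()
            if isinstance(value, str)
        )
        if sensitive:
            # Build a shallow copy with every data field replaced.
            event = {**event, "data": {key: placeholder for key in data}}
        masked.append(event)
    return masked
```

Because the masking runs on a copy at display time, the sensitive data stays stored and can still be shown on request.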
Being able to filter the inbound data has been the most popular feature request for a very long time and comes up again and again in the issues of different repositories. In the status quo, some watchers (e.g. the window watcher) have their own filtering, but they have various problems:
Suggestion
We could add regex-based filtering on the heartbeat level: whenever a heartbeat comes in, it's checked against a set of user-configurable regexes. If one matches on any field of the entry, the entry is discarded. We could also extend this feature to allow regex-replacing entries or matching only some fields of the JSON entry.
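A rough sketch of the discard check described above, in Python. This is an assumption-laden illustration rather than an existing ActivityWatch API: `should_discard`, `iter_strings`, and the optional `fields` argument are hypothetical names, and the entry is assumed to be a JSON-like dict.

```python
import re
from typing import Any, Iterable, List, Optional, Pattern, Set


def iter_strings(value: Any) -> Iterable[str]:
    """Yield every string found in a nested JSON-like value."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for item in value.values():
            yield from iter_strings(item)
    elif isinstance(value, (list, tuple)):
        for item in value:
            yield from iter_strings(item)


def should_discard(
    data: dict,
    patterns: List[Pattern],
    fields: Optional[Set[str]] = None,
) -> bool:
    """Return True if any pattern matches any string in the entry.

    If `fields` is given, only those top-level keys are checked, which
    covers the "matching only some fields" extension.
    """
    if fields is not None:
        data = {key: value for key, value in data.items() if key in fields}
    return any(
        pattern.search(text)
        for pattern in patterns
        for text in iter_strings(data)
    )
```

A server or watcher would call `should_discard` once per incoming heartbeat and drop the entry when it returns True. Using `search` rather than `match` means a pattern can hit anywhere in a field, which is probably what users expect for window titles.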
Similar inbound data filters can be found in e.g. Sentry.
I could probably implement this by myself, at least in Python and the Vue frontend, but probably in Rust as well, but I'd like to know if the approach is welcome in the first place. Tbh, if it isn't, I'd consider writing a simple proxy server that does exactly this -- applying some replacements to the heartbeat endpoint and passing everything else through.
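The replacement step such a proxy would apply could look roughly like this in Python. It only covers rewriting the payload, not the HTTP forwarding, and `scrub` plus the `(pattern, replacement)` pair format are hypothetical choices made for this sketch.

```python
import re
from typing import List, Pattern, Tuple


def scrub(data: dict, replacements: List[Tuple[Pattern, str]]) -> dict:
    """Return a copy of `data` with each replacement applied to string fields.

    Only top-level string values are rewritten; other values pass through
    unchanged. Replacements are applied in the order given.
    """
    cleaned = {}
    for key, value in data.items():
        if isinstance(value, str):
            for pattern, replacement in replacements:
                value = pattern.sub(replacement, value)
        cleaned[key] = value
    return cleaned


# Example rule: mask anything that looks like an email address.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
example = scrub(
    {"app": "firefox", "title": "Inbox (3) - alice@example.com"},
    [(EMAIL, "<email>")],
)
```

Unlike discarding, this keeps the entry (and its duration) while removing the sensitive part of the text.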