-
-
Notifications
You must be signed in to change notification settings - Fork 32
draft about distinct id implementation #342
base: master
Are you sure you want to change the base?
Conversation
import org.jetbrains.annotations.NotNull; | ||
import org.jetbrains.annotations.Nullable; | ||
|
||
// TODO: to think, this could be a SentryOptionsCache that could be expanded, does it make sense? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to discuss, related to the naming and scope of this class (IUserCache/AndroidUserCache)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense to me to keep this focused on user caching, specially the abstraction
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if Cache is the right naming here, because we actually persist the data to the SharedPreferences
. A cache is usually used to improve the performance of something, see CPU caches, browser caches, web server caches. For me it is more like a UserRepo
or UserStorage
. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not entirely, I borrowed the idea from https://developer.android.com/reference/android/content/Context.html#getCacheDir()
I see the Caching term as something temporary, being in memory or in the disk.
Also, https://docs.sentry.io/clients/java/integrations/#android (old SDK version)
Events will be buffered to disk (in the application’s cache directory) by default
I'm fine changing it as well, but it might make it inconsistent with other SDKs.
@bruno-garcia opnions?
// TODO: only these 3 fields make sense to me, otherwise caching a raw json would be better, but I | ||
// don't want to do that. | ||
private static final String EMAIL = "email"; | ||
private static final String ID = "id"; | ||
private static final String USERNAME = "user_name"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to discuss, for sessions, we only need these 3 fields
// TODO: I believe this makes sense only for global hub SDKs, but if not, | ||
// how do we cache multiple users (because we have multiple scopes)? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to discuss
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For me it makes sense to keep this class only specific to Android. I guess we are not going to run into the problem of caching multiple users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep, this one is Android-specific, but we can't forget that sentry-core
is cross-platform.
nothing to do here for now.
// is it a good idea to do it here? this guarantees that it cleans up on restart if the flag | ||
// has been swapped | ||
cleanUserFields(edit); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to discuss
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it makes sense, because otherwise you could receive an old user. Imagine the following scenario:
setUser(A)
cacheUserForSessions = false
setUser(B)
, which doesn't do anything.cacheUserForSessions = true
getUser()
returnsA
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep, that was exactly the use-case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just came up with another scenario:
setUser(A)
- cacheUserForSessions = false
- cacheUserForSessions = true
- getUser() returns A
In this scenario, even with calling cleanUserFields
in setUser
and getUser
, you get the old user A. SentryOptions.setCacheUserForSessions
could clear the UserCache
if set to false
to solve this problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, I'd say on SDK. init, if the flag is false
, we should clean this up.
SentryOptions fields are only mutable on SDK init., but not on runtime.
good catch
// fallback to sentryDeviceId | ||
if (distinctId == null) { | ||
// if I add a transient field (not serializable) to the User object, I can get rid of the | ||
// distinctId on options | ||
// I cannot simply set on user.Id cus its gonna be a breaking change | ||
distinctId = options.getDistinctId(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to discuss
final User finalUser = getUser(); | ||
// synchronized (userLock) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's racey if we don't sync it. but as session is already locked, only a copy of User would be fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically user would need to be readonly but yeah
|
||
String sentryDeviceId; | ||
// cache it if not set | ||
if (!sharedPreferences.contains(SENTRY_DEVICE_ID)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the place to have this code on some package private static class? If so we could just call it from here.
I believe this contains
call could be replace with the sharedPreferences.getString(SENTRY_DEVICE_ID, null) != null
to avoid two I/O calls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SharedPreferences
does in-memory caching sync and it flushes to the disk async., so unless its value has changed, it's not an issue.
I'd separate it in a method or class, yes.
import org.jetbrains.annotations.NotNull; | ||
import org.jetbrains.annotations.Nullable; | ||
|
||
// TODO: to think, this could be a SentryOptionsCache that could be expanded, does it make sense? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense to me to keep this focused on user caching, specially the abstraction
if (options.isCacheUserForSessions()) { | ||
final SharedPreferences.Editor edit = sharedPreferences.edit(); | ||
if (user != null) { | ||
edit.putString(ID, user.getId()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we settle for only caching the userId
instead? Or at least a single value (when a user is set, we evaluate the which value based on an order (id>username>email)? Would be worth seeing what the issues page already considers when accounting for "user" of a crash.
If we're expecting to store the actual User, we'd need to consider the protocol can change (versioning/new fields added), user define data and makes it trickier:
All other keys are stored as extra information but not specifically processed by Sentry.
If we're storing just a single value to be used for session tracking only, it would minimize these edge cases.
Everything else you said stands true:
setUser(...)
makes the SDK evaluate a new value for session distrinctId
. setUser(null)
deletes that cached value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep, a single value would be ok, makes sense.
the order would be id
> email
> username
if following the release health specs.
/** | ||
* enable the usage of Settings.Secure.ANDROID_ID to keep retro compatible with older SDK versions | ||
*/ | ||
private boolean enableSecureAndroidId; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this option be named in such as way as to indicate a fallback (as opposed to enable
which sounds like a feature).
Perhaps useLegacyUserId
or something else?
Also, should this be guarded by sendDefaultPii
? Meaning we'd need both this flag on + sendDefaultPii
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed, naming here can be improved, useLegacyAndroidId
maybe? the behavior is very dependent on it.
so initially I've added sendDefaultPii
and later removed. as it's the only field to be guarded, it sounds redundant, one would need to swap 2 flags for a single behavior change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we add it, I would already mark it as deprecated and tell our users to get rid of it.
@@ -102,16 +105,24 @@ public void setTransaction(@Nullable String transaction) { | |||
* @return the user | |||
*/ | |||
public @Nullable User getUser() { | |||
return user; | |||
synchronized (userLock) { | |||
if (user == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of potential improvements:
- double-check lock
- account for when no user is ever set. Now it'll take the lock and check the cache on each use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've noticed the multiple checks of getUser
call.
Ideally, we'd only check it again if the value has been changed.
Definitely improvements to be done.
final User finalUser = getUser(); | ||
// synchronized (userLock) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically user would need to be readonly but yeah
options.getDistinctId(), user, options.getEnvironment(), options.getRelease()); | ||
String distinctId = null; | ||
|
||
final User finalUser = getUser(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this mean that now even if the user isn't cached we'll be user the user from the scope?
Seems like this brings us back to a potential double-user count per device issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this mean that now even if the user isn't cached we'll be user the user from the scope?
mm guess I don't get it, what do you mean?
Seems like this brings us back to a potential double-user count per device issue.
this is true only on the first run of the App., if cacheUserForSessions
is enabled and value has been set, it's gonna reuse the same distinctId, unless setUser has been called again with different data.
if (distinctId == null) { | ||
// if I add a transient field (not serializable) to the User object, I can get rid of the | ||
// distinctId on options | ||
// I cannot simply set on user.Id cus its gonna be a breaking change |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment made me wonder if we should just release 3.0.0 with sessions and by default report the installation id as both distinctId
and user.id
. With that fallback to secureId you added already. Would simplify things?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we do a major bump, I'd remove secureAndroidId
once for all.
I'd set installationId
by default onto Scope.user.id
on SDK init if cacheUserForSessions
is disabled and if enabled, the cached userId
.
this would make event.user.id
and session.distinctId
normalized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense. Also because a btter fallback is to tell users to call secure.AndroidId and set to the user.id by themselves
|
||
// TODO: only these 3 fields make sense to me, otherwise caching a raw json would be better, but I | ||
// don't want to do that. | ||
private static final String EMAIL = "email"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we prefix those constants here? Could it be that the users of our SDK also save email to their shared preferences? What about 'sentry_user_cache_email' or something similar?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
its a dedicated SharedPreferences, see:
https://github.com/getsentry/sentry-android/pull/342/files#diff-5b8dbf3e5fc61d592bc69f90d2a3b53cR76-R77
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a few comments :)
/** | ||
* enable the usage of Settings.Secure.ANDROID_ID to keep retro compatible with older SDK versions | ||
*/ | ||
private boolean enableSecureAndroidId; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we add it, I would already mark it as deprecated and tell our users to get rid of it.
// TODO: I believe this makes sense only for global hub SDKs, but if not, | ||
// how do we cache multiple users (because we have multiple scopes)? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For me it makes sense to keep this class only specific to Android. I guess we are not going to run into the problem of caching multiple users.
edit.putString(ID, user.getId()) | ||
.putString(EMAIL, user.getEmail()) | ||
.putString(USERNAME, user.getUsername()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For me it is a bit confusing why we pass the whole user to IUserCache.setUser
and only save id, email and username. Why don't we save the other users properties?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could, but our session
guideline only considers id
, email
, and username
as a valid unique identifier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe then we should create an extra class for this and name it something like SessionIdCache
and add this as a comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the Caching class should not be aware of what id
it is, by that I mean (id, email or username).
maybe this class should take a String nominated as distinctId
and that's all, so the naming would not matter
.
A session also has an id
so SessionIdCache
would not fit, as we are caching the Session.distinctId
which is a unique identifier of the user or device.
UserIdCache
would make sense though?! naming is hard hehe
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe this class should take a String nominated as distinctId
I prefer this too. It says what it does. So when setting the user, if this option is enabled, we select the data from the user object (id, username, email) to become the distinctId and cache that so the caching itself isn't aware of this logic and we don't have to deal with caching the actual user object and possible user-defined fields.
import org.jetbrains.annotations.NotNull; | ||
import org.jetbrains.annotations.Nullable; | ||
|
||
// TODO: to think, this could be a SentryOptionsCache that could be expanded, does it make sense? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if Cache is the right naming here, because we actually persist the data to the SharedPreferences
. A cache is usually used to improve the performance of something, see CPU caches, browser caches, web server caches. For me it is more like a UserRepo
or UserStorage
. What do you think?
|
||
@Override | ||
public void setUser(final @Nullable User user) { | ||
if (options.isCacheUserForSessions()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we could separate the logic of whether the user data should be saved or nor from the actual Android code. I would imagine two classes. One called something like UserCache
that holds SentryAndroidOptions
and SentryDeviceId
and this class task to a simple interface like UserPersistance
, that has an Android implementation an actually saves the user's data to the SharedPreferences
. Then we have the SDK logic separated from persisting the data to Android. It would be easier to test and easier to extend. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep, we could do that, just not sure how deep this abstraction should be, do we really see this user caching thing going forward with other SDKs? like java-backend, java-desktop etc... otherwise, it's just more classes for a single block of code, which is an if
.
@bruno-garcia opinions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could be pulled out from the Android code but I believe it would only make sense then in desktop apps, right? Since this User in server apps are usually one per request and doesn't really make sense to cache.
Tbh we're not even sure this feature will even be used, it's just a work around to support distinctId
for sessions based on user defined user id
.
The proposal for distinctId
has an option of not dealing with this at all (using only the generated id) which sounds like a good first version until we can get customer feedback to make a more educated decision.
// is it a good idea to do it here? this guarantees that it cleans up on restart if the flag | ||
// has been swapped | ||
cleanUserFields(edit); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it makes sense, because otherwise you could receive an old user. Imagine the following scenario:
setUser(A)
cacheUserForSessions = false
setUser(B)
, which doesn't do anything.cacheUserForSessions = true
getUser()
returnsA
|
||
if (androidId == null | ||
|| androidId.isEmpty() | ||
|| androidId.equalsIgnoreCase("9774d56d682e549c")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we maybe add a comment about what 9774d56d682e549c
is? I had to Google it ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ups, copy-pasta mistake.
the original code has a comment which leads to the google blogpost
see
Line 869 in 19aaac5
// https://android-developers.googleblog.com/2011/03/identifying-app-installations.html |
if (finalUser != null) { | ||
if (finalUser.getId() != null && !finalUser.getId().isEmpty()) { | ||
distinctId = finalUser.getId(); | ||
} else if (finalUser.getEmail() != null && !finalUser.getEmail().isEmpty()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we use the email
or the username
as the distinctId
if the id
is not set?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because all of them are considered to be a unique identifier, in this order.
if there's no id, check for email, no email as well? username.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was the approach taken in the session supported added in the python SDK at least.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for clarifying that 👍. Didn't know
thanks for the review :) |
📢 Type of change
📜 Description
draft about distinct id implementation
💡 Motivation and Context
distinctId should be changeable, but still keeping compatibility with older SDKs.
secureAndroidId should be deprecated, but still possible to keep compatibility.
💚 How did you test it?
no tests at all, only a draft PR to support my DACI request.
📝 Checklist
🔮 Next steps
get approval and do the real implementation.