-
Notifications
You must be signed in to change notification settings - Fork 694
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sequential learner syncing #11390
Sequential learner syncing #11390
Conversation
Build Artifacts
|
d5b8737
to
2523791
Compare
2523791
to
40f830c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I get the overall gist - although I can't claim to have followed the entire thing all the way through. One question about handling of network connection errors.
from kolibri.utils.translation import ugettext as _ | ||
|
||
|
||
logger = logging.getLogger(__name__) | ||
SOUD_SYNC_PROCESSING_JOB_ID = "50" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As an aside, I am starting to wonder if we shouldn't just allow a task to register with static_id=True
and then have the task machinery generate a hash of the task for the id, to avoid collisions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would be helpful
instance_id=instance_id, | ||
defaults={ | ||
"status": SyncQueueStatus.Pending, | ||
"updated": time.time(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The difference in the application of the defaults between get_or_create
and update_or_create
shouldn't confuse me as much as it does, but I always forget the defaults will always be applied in the update.
self.set_next_attempt(28 + 2 ** self.attempts) | ||
|
||
class Meta: | ||
unique_together = ("user_id", "instance_id") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Came back to check for this, and wasn't disappointed!
context | ||
) | ||
) | ||
sync_queue.increment_and_backoff_next_attempt() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we be backing off in this case, or waiting for the device to come back online again and then requeueing instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to say that we could skip the backoff, but I think there are scenarios where we do get some benefit here in the time being, particularly with static locations that may not be on the same subnet. During the next run, if the network location doesn't exist or is marked with a connection error, it won't make any requests, but continue to backoff until it exceeds the max attempts
Hi @bjester, I was asked to start regression testing on this one however using the latest available .deb asset and the Android build I can't get the sync to start working. Once I import the learner on the LOD the assigned lesson resources are correctly displayed on the device but after that the status is constantly 'Not connected to server'. I am testing with a clean install of Kolibri on all of the devices. sync.not.working.mp4Logs: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been testing some syncs and the merge facility feature and everything works fine in the merging process. However, after restarting the device, when it tries to sync again, this error happens:
INFO 2023-10-24 19:52:03,759 Response failure: http://192.168.0.201:80/api/public/info/
ERROR 2023-10-24 19:52:03,958 Internal Server Error: /api/public/syncqueue/
Traceback (most recent call last):
File "/datos/le/mio/kolibri/venv/lib/python3.10/site-packages/django/db/models/query.py", line 464, in get_or_create
return self.get(**lookup), False
File "/datos/le/mio/kolibri/venv/lib/python3.10/site-packages/django/db/models/query.py", line 378, in get
raise self.model.DoesNotExist(
kolibri.core.device.models.DoesNotExist: SyncQueue matching query does not exist.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/datos/le/mio/kolibri/venv/lib/python3.10/site-packages/django/db/backends/utils.py", line 69, in execute
return self.cursor.execute(sql, params)
File "/datos/le/mio/kolibri/venv/lib/python3.10/site-packages/django/db/backends/sqlite3/base.py", line 328, in execute
return Database.Cursor.execute(self, query, params)
sqlite3.IntegrityError: UNIQUE constraint failed: device_syncqueue.user_id, device_syncqueue.instance_id
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/datos/le/mio/kolibri/venv/lib/python3.10/site-packages/django/core/handlers/exception.py", line 41, in inner
response = get_response(request)
File "/datos/le/mio/kolibri/venv/lib/python3.10/site-packages/django/core/handlers/base.py", line 187, in _get_response
response = self.process_exception_by_middleware(e, request)
File "/datos/le/mio/kolibri/venv/lib/python3.10/site-packages/django/core/handlers/base.py", line 185, in _get_response
response = wrapped_callback(request, *callback_args, **callback_kwargs)
File "/datos/le/mio/kolibri/venv/lib/python3.10/site-packages/django/views/decorators/csrf.py", line 58, in wrapped_view
return view_func(*args, **kwargs)
File "/datos/le/mio/kolibri/venv/lib/python3.10/site-packages/rest_framework/viewsets.py", line 116, in view
return self.dispatch(request, *args, **kwargs)
File "/datos/le/mio/kolibri/venv/lib/python3.10/site-packages/rest_framework/views.py", line 495, in dispatch
response = self.handle_exception(exc)
File "/datos/le/mio/kolibri/venv/lib/python3.10/site-packages/rest_framework/views.py", line 455, in handle_exception
self.raise_uncaught_exception(exc)
File "/datos/le/mio/kolibri/venv/lib/python3.10/site-packages/rest_framework/views.py", line 492, in dispatch
response = handler(request, *args, **kwargs)
File "/datos/le/mio/kolibri/kolibri/core/public/api.py", line 342, in create
queue_object = self.create_or_update(request)
File "/datos/le/mio/kolibri/kolibri/core/public/api.py", line 305, in create_or_update
queue_object, created = SyncQueue.objects.get_or_create(
File "/datos/le/mio/kolibri/venv/lib/python3.10/site-packages/django/db/models/manager.py", line 85, in manager_method
return getattr(self.get_queryset(), name)(*args, **kwargs)
File "/datos/le/mio/kolibri/venv/lib/python3.10/site-packages/django/db/models/query.py", line 466, in get_or_create
return self._create_object_from_params(lookup, params)
File "/datos/le/mio/kolibri/venv/lib/python3.10/site-packages/django/db/models/query.py", line 509, in _create_object_from_params
six.reraise(*exc_info)
File "/datos/le/mio/kolibri/venv/lib/python3.10/site-packages/django/utils/six.py", line 686, in reraise
raise value
File "/datos/le/mio/kolibri/venv/lib/python3.10/site-packages/django/db/models/query.py", line 500, in _create_object_from_params
obj = self.create(**params)
File "/datos/le/mio/kolibri/venv/lib/python3.10/site-packages/django/db/models/query.py", line 394, in create
obj.save(force_insert=True, using=self.db)
File "/datos/le/mio/kolibri/venv/lib/python3.10/site-packages/django/db/models/base.py", line 807, in save
self.save_base(using=using, force_insert=force_insert,
File "/datos/le/mio/kolibri/venv/lib/python3.10/site-packages/django/db/models/base.py", line 838, in save_base
updated = self._save_table(raw, cls, force_insert, force_update, using, update_fields)
File "/datos/le/mio/kolibri/venv/lib/python3.10/site-packages/django/db/models/base.py", line 924, in _save_table
result = self._do_insert(cls._base_manager, using, fields, update_pk, raw)
File "/datos/le/mio/kolibri/venv/lib/python3.10/site-packages/django/db/models/base.py", line 962, in _do_insert
return manager._insert([self], fields=fields, return_id=update_pk,
File "/datos/le/mio/kolibri/venv/lib/python3.10/site-packages/django/db/models/manager.py", line 85, in manager_method
return getattr(self.get_queryset(), name)(*args, **kwargs)
File "/datos/le/mio/kolibri/venv/lib/python3.10/site-packages/django/db/models/query.py", line 1079, in _insert
return query.get_compiler(using=using).execute_sql(return_id)
File "/datos/le/mio/kolibri/venv/lib/python3.10/site-packages/django/db/models/sql/compiler.py", line 1112, in execute_sql
cursor.execute(sql, params)
File "/datos/le/mio/kolibri/venv/lib/python3.10/site-packages/django/db/backends/utils.py", line 65, in execute
with self.db.wrap_database_errors:
File "/datos/le/mio/kolibri/venv/lib/python3.10/site-packages/django/db/utils.py", line 94, in __exit__
six.reraise(dj_exc_type, dj_exc_value, traceback)
File "/datos/le/mio/kolibri/venv/lib/python3.10/site-packages/django/utils/six.py", line 685, in reraise
raise value.with_traceback(tb)
File "/datos/le/mio/kolibri/venv/lib/python3.10/site-packages/django/db/backends/utils.py", line 69, in execute
return self.cursor.execute(sql, params)
File "/datos/le/mio/kolibri/venv/lib/python3.10/site-packages/django/db/backends/sqlite3/base.py", line 328, in execute
return Database.Cursor.execute(self, query, params)
django.db.utils.IntegrityError: UNIQUE constraint failed: device_syncqueue.user_id, device_syncqueue.instance_id
INFO 2023-10-24 19:52:04,036 192.168.0.60 - - "POST /api/public/syncqueue/" 500 0 "" "Kolibri/0.16.0b6.dev0+git.211.g1a3009ce python-requests/2.27.1"
I've tried it both using an old kolibri instance migrated from 0.15.x or with a new instance created from scratch. In both cases the result is the same
On the code side, nothing called my attention excepting the small problem with timestamp.
Just to leave a paper trail here - there seem to be two issues at play here. The first is the issue that both @pcenov and @jredrejo have reproduced, which is that syncing seems to get into a failure state, where trying to create a SyncQueue object on the server continuously results in integrity errors. The other appears to be a regression in the merge user workflow, whereby after the user is merged, and the old user deleted, this is not properly propagated to the frontend, and the cached FacilityUser that is now deleted is still being referenced. |
Have made fixes here for currently failing test cases as a start. Now focusing on the behaviour issues. |
Prevent errors by relying on unique together enforcement.
It appears that these were actually related - resolving the former also resolved the latter. However, there still appear to be issues with sync status being properly reported to the frontend. |
A sync status is now being generated, unfortunately, it always seems to be 'not recently synced'. Working to resolve. |
Sync status should now also be properly reported on the LOD side and the server side (I have confirmed both locally). This should be ready for more testing. |
Hi @rtibbles syncing is still not working at all. Tried with both Ubuntu VM and Windows 11 as server and clean installs of Kolibri. On the LOD the main error in the logs is: The sync status seems to be partially working - initially on the coach side it's displayed correctly as 'Synced' but on the LOD it's displayed as 'Not recently synced' right from the start. Logs: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Everything seems to work now fine after the latest commits
Happy for the validation, but will await QA approval for final sign off.
Hi @rtibbles, the 2 main issues I was able to identify are:
2023-10-26_16-55-15.mp4
The good news is that syncing to a LOD on Ubuntu is working correctly and occurs frequently. UbuntuServer.zip |
For Android, is the behaviour the same regardless of whether it's the merged user flow or the import flow? |
OK, I replicated the merged user syncing issues and they should now be resolved. I will need to do some local replication of the Android syncing issues using an Android device, as I suspect there is something happening at the Android app layer that is not getting caught in the logs. I think we can merge this PR regardless of this, and I can file a follow up issue to investigate further. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @rtibbles I confirm that the merged user syncing issues are indeed fixed now. No issues observed after regression testing so I think this is good to go.
I also confirm my observations from yesterday about Android (for both merged and not merged learner), and today I was not able to sync it with the server as well. Adding the logs from today in case you are able to spot something but as you say it can be tracked separately.
b06cfe2
into
learningequality:release-v0.16.x
Summary
The Learn-only device syncing:
SyncQueue
modelThe
SyncQueue
API:sync_interval
configured on the server (seems like something we can integrate into device management later on)References
#11209
#11166
Reviewer guidance
Running
kolibri manage migrate --database syncqueue
may be required to fully migrate the modelsTesting checklist
PR process
Reviewer checklist
yarn
andpip
)