Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: Timeout issues occurred with search/flush/load operations after upgrading using Kafka as MQ. #37926

Open
1 task done
zhuwenxing opened this issue Nov 22, 2024 · 3 comments
Assignees
Labels
kind/bug Issues or changes related a bug triage/accepted Indicates an issue or PR is ready to be actively worked on.
Milestone

Comments

@zhuwenxing
Copy link
Contributor

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version:2.4.14--> master-20241122-cfa1f1f1-amd64
- Deployment mode(standalone or cluster):both
- MQ type(rocksmq, pulsar or kafka):    
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior


[2024-11-22T03:47:21.690Z] <name>: Checker__WsheWmII

[2024-11-22T03:47:21.690Z] <description>: 

[2024-11-22T03:47:21.690Z] <schema>: {'auto_id': False, 'description': '', 'fields': [{'name': 'int64', 'description': '', 'type': <DataType.INT64: 5>, 'is_primary': True, 'auto_id': False}, {'name': 'float', 'description': '', 'type': <DataType.FLOAT: 10>}......  (api_request.py:37)

[2024-11-22T03:47:21.690Z] [2024-11-22 03:44:08 - DEBUG - ci_test]: (api_request)  : [Collection.flush] args: [], kwargs: {'timeout': 180} (api_request.py:62)

[2024-11-22T03:47:21.690Z] [2024-11-22 03:47:08 - WARNING - pymilvus.decorators]: Retry timeout: 180s (decorators.py:106)

[2024-11-22T03:47:21.690Z] [2024-11-22 03:47:08 - ERROR - pymilvus.decorators]: RPC error: [flush], <MilvusException: (code=1, message=Retry timeout: 180s, message=wait for flush timeout, collection: Checker__WsheWmII, flusht_ts: 454098170359578632)>, <Time:{'RPC start': '2024-11-22 03:44:08.822834', 'RPC error': '2024-11-22 03:47:08.944214'}> (decorators.py:140)

[2024-11-22T03:47:21.690Z] [2024-11-22 03:47:08 - ERROR - ci_test]: Traceback (most recent call last):

[2024-11-22T03:47:21.690Z]   File "/home/jenkins/agent/workspace/tests/python_client/utils/api_request.py", line 32, in inner_wrapper

[2024-11-22T03:47:21.690Z]     res = func(*args, **_kwargs)

[2024-11-22T03:47:21.690Z]   File "/home/jenkins/agent/workspace/tests/python_client/utils/api_request.py", line 63, in api_request

[2024-11-22T03:47:21.690Z]     return func(*arg, **kwargs)

[2024-11-22T03:47:21.690Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/orm/collection.py", line 318, in flush

[2024-11-22T03:47:21.690Z]     conn.flush([self.name], timeout=timeout, **kwargs)

[2024-11-22T03:47:21.690Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 141, in handler

[2024-11-22T03:47:21.690Z]     raise e from e

[2024-11-22T03:47:21.690Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 137, in handler

[2024-11-22T03:47:21.690Z]     return func(*args, **kwargs)

[2024-11-22T03:47:21.690Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 176, in handler

[2024-11-22T03:47:21.690Z]     return func(self, *args, **kwargs)

[2024-11-22T03:47:21.690Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 107, in handler

[2024-11-22T03:47:21.690Z]     raise MilvusException(

[2024-11-22T03:47:21.690Z] pymilvus.exceptions.MilvusException: <MilvusException: (code=1, message=Retry timeout: 180s, message=wait for flush timeout, collection: Checker__WsheWmII, flusht_ts: 454098170359578632)>

[2024-11-22T03:47:21.690Z]  (api_request.py:45)

[2024-11-22T03:47:21.690Z] [2024-11-22 03
[2024-11-22T03:41:57.652Z] 2024-11-22 03:41:57.227 | INFO     | __main__:search_test:53 - recall test for index type HNSW

[2024-11-22T03:41:57.907Z] 2024-11-22 03:41:57.685 | INFO     | __main__:search_test:63 - 

[2024-11-22T03:41:57.907Z] Search...

[2024-11-22T03:45:19.279Z] grpc RpcError: [search], <_InactiveRpcError: StatusCode.DEADLINE_EXCEEDED, Deadline Exceeded>, <Time:{'RPC start': '2024-11-22 03:41:57.879525', 'gRPC error': '2024-11-22 03:45:17.956501'}>

[2024-11-22T03:45:19.282Z] Traceback (most recent call last):

[2024-11-22T03:45:19.282Z]   File "scripts/second_recall_test.py", line 103, in <module>

[2024-11-22T03:45:19.282Z]     search_test(host, index_type)

[2024-11-22T03:45:19.282Z]   File "scripts/second_recall_test.py", line 65, in search_test

[2024-11-22T03:45:19.282Z]     res = collection.search(

[2024-11-22T03:45:19.282Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/orm/collection.py", line 801, in search

[2024-11-22T03:45:19.282Z]     resp = conn.search(

[2024-11-22T03:45:19.282Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 155, in handler

[2024-11-22T03:45:19.282Z]     raise e from e

[2024-11-22T03:45:19.282Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 137, in handler

[2024-11-22T03:45:19.282Z]     return func(*args, **kwargs)

[2024-11-22T03:45:19.282Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 176, in handler

[2024-11-22T03:45:19.282Z]     return func(self, *args, **kwargs)

[2024-11-22T03:45:19.282Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 90, in handler

[2024-11-22T03:45:19.282Z]     raise e from e

[2024-11-22T03:45:19.282Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 86, in handler

[2024-11-22T03:45:19.282Z]     return func(*args, **kwargs)

[2024-11-22T03:45:19.282Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 806, in search

[2024-11-22T03:45:19.282Z]     return self._execute_search(request, timeout, round_decimal=round_decimal, **kwargs)

[2024-11-22T03:45:19.282Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 747, in _execute_search

[2024-11-22T03:45:19.282Z]     raise e from e

[2024-11-22T03:45:19.282Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 735, in _execute_search

[2024-11-22T03:45:19.282Z]     response = self._stub.Search(request, timeout=timeout)

[2024-11-22T03:45:19.282Z]   File "/usr/local/lib/python3.8/dist-packages/grpc/_interceptor.py", line 277, in __call__

[2024-11-22T03:45:19.282Z]     response, ignored_call = self._with_call(

[2024-11-22T03:45:19.282Z]   File "/usr/local/lib/python3.8/dist-packages/grpc/_interceptor.py", line 332, in _with_call

[2024-11-22T03:45:19.282Z]     return call.result(), call

[2024-11-22T03:45:19.282Z]   File "/usr/local/lib/python3.8/dist-packages/grpc/_channel.py", line 438, in result

[2024-11-22T03:45:19.282Z]     raise self

[2024-11-22T03:45:19.282Z]   File "/usr/local/lib/python3.8/dist-packages/grpc/_interceptor.py", line 315, in continuation

[2024-11-22T03:45:19.282Z]     response, call = self._thunk(new_method).with_call(

[2024-11-22T03:45:19.282Z]   File "/usr/local/lib/python3.8/dist-packages/grpc/_interceptor.py", line 343, in with_call

[2024-11-22T03:45:19.282Z]     return self._with_call(

[2024-11-22T03:45:19.282Z]   File "/usr/local/lib/python3.8/dist-packages/grpc/_interceptor.py", line 332, in _with_call

[2024-11-22T03:45:19.282Z]     return call.result(), call

[2024-11-22T03:45:19.282Z]   File "/usr/local/lib/python3.8/dist-packages/grpc/_channel.py", line 438, in result

[2024-11-22T03:45:19.282Z]     raise self

[2024-11-22T03:45:19.282Z]   File "/usr/local/lib/python3.8/dist-packages/grpc/_interceptor.py", line 315, in continuation

[2024-11-22T03:45:19.282Z]     response, call = self._thunk(new_method).with_call(

[2024-11-22T03:45:19.282Z]   File "/usr/local/lib/python3.8/dist-packages/grpc/_channel.py", line 1178, in with_call

[2024-11-22T03:45:19.282Z]     return _end_unary_response_blocking(state, call, True, None)

[2024-11-22T03:45:19.283Z]   File "/usr/local/lib/python3.8/dist-packages/grpc/_channel.py", line 1004, in _end_unary_response_blocking

[2024-11-22T03:45:19.283Z]     raise _InactiveRpcError(state)  # pytype: disable=not-instantiable

[2024-11-22T03:45:19.283Z] grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:

[2024-11-22T03:45:19.283Z] 	status = StatusCode.DEADLINE_EXCEEDED

[2024-11-22T03:45:19.283Z] 	details = "Deadline Exceeded"

[2024-11-22T03:45:19.283Z] 	debug_error_string = "UNKNOWN:Error received from peer  {grpc_message:"Deadline Exceeded", grpc_status:4, created_time:"2024-11-22T03:45:17.955338033+00:00"}"

[2024-11-22T03:45:19.283Z] >

script returned exit code 1

Expected Behavior

No response

Steps To Reproduce

No response

Milvus Log

pod info

[2024-11-22T03:34:26.794Z] + kubectl get pods -o wide

[2024-11-22T03:34:26.795Z] + grep kafka-cluster-upgrade-2929

[2024-11-22T03:34:27.051Z] kafka-cluster-upgrade-2929-0                                     2/2     Running       1 (2m47s ago)    3m5s    10.104.15.65    4am-node20   <none>           <none>

[2024-11-22T03:34:27.052Z] kafka-cluster-upgrade-2929-1                                     2/2     Running       2 (2m38s ago)    3m5s    10.104.34.112   4am-node37   <none>           <none>

[2024-11-22T03:34:27.052Z] kafka-cluster-upgrade-2929-2                                     2/2     Running       2 (2m36s ago)    3m5s    10.104.24.96    4am-node29   <none>           <none>

[2024-11-22T03:34:27.052Z] kafka-cluster-upgrade-2929-etcd-0                                1/1     Running       0                3m5s    10.104.34.111   4am-node37   <none>           <none>

[2024-11-22T03:34:27.052Z] kafka-cluster-upgrade-2929-etcd-1                                1/1     Running       0                3m5s    10.104.15.66    4am-node20   <none>           <none>

[2024-11-22T03:34:27.052Z] kafka-cluster-upgrade-2929-etcd-2                                1/1     Running       0                3m5s    10.104.24.97    4am-node29   <none>           <none>

[2024-11-22T03:34:27.052Z] kafka-cluster-upgrade-2929-exporter-ddb49dc84-mbkkx              1/1     Running       0                3m5s    10.104.32.132   4am-node39   <none>           <none>

[2024-11-22T03:34:27.052Z] kafka-cluster-upgrade-2929-milvus-datacoord-fc89d4b57-k7dhk      1/1     Running       3 (2m13s ago)    3m5s    10.104.21.184   4am-node24   <none>           <none>

[2024-11-22T03:34:27.052Z] kafka-cluster-upgrade-2929-milvus-datanode-8474d86456-6jkhc      1/1     Running       2 (2m34s ago)    3m5s    10.104.19.176   4am-node28   <none>           <none>

[2024-11-22T03:34:27.052Z] kafka-cluster-upgrade-2929-milvus-datanode-8474d86456-9fn74      1/1     Running       3 (2m13s ago)    3m5s    10.104.21.185   4am-node24   <none>           <none>

[2024-11-22T03:34:27.052Z] kafka-cluster-upgrade-2929-milvus-indexcoord-fd67cd6cb-qdwcx     1/1     Running       0                3m5s    10.104.21.187   4am-node24   <none>           <none>

[2024-11-22T03:34:27.052Z] kafka-cluster-upgrade-2929-milvus-indexnode-5b94cb684c-4v6qq     1/1     Running       3 (2m12s ago)    3m5s    10.104.21.186   4am-node24   <none>           <none>

[2024-11-22T03:34:27.052Z] kafka-cluster-upgrade-2929-milvus-indexnode-5b94cb684c-m6sbv     1/1     Running       3 (2m8s ago)     3m5s    10.104.15.67    4am-node20   <none>           <none>

[2024-11-22T03:34:27.052Z] kafka-cluster-upgrade-2929-milvus-indexnode-5b94cb684c-rcd4q     1/1     Running       2 (2m33s ago)    3m5s    10.104.19.177   4am-node28   <none>           <none>

[2024-11-22T03:34:27.052Z] kafka-cluster-upgrade-2929-milvus-proxy-67d7576588-qt28d         1/1     Running       3 (2m8s ago)     3m5s    10.104.21.188   4am-node24   <none>           <none>

[2024-11-22T03:34:27.052Z] kafka-cluster-upgrade-2929-milvus-querycoord-56dd4fc795-vtrpc    1/1     Running       3 (2m8s ago)     3m5s    10.104.21.189   4am-node24   <none>           <none>

[2024-11-22T03:34:27.052Z] kafka-cluster-upgrade-2929-milvus-querynode-666fbb45f4-6n4j9     1/1     Running       3 (2m9s ago)     3m5s    10.104.6.113    4am-node13   <none>           <none>

[2024-11-22T03:34:27.052Z] kafka-cluster-upgrade-2929-milvus-querynode-666fbb45f4-m8kmm     1/1     Running       3 (2m9s ago)     3m4s    10.104.20.215   4am-node22   <none>           <none>

[2024-11-22T03:34:27.052Z] kafka-cluster-upgrade-2929-milvus-querynode-666fbb45f4-q9tpj     1/1     Running       3 (2m10s ago)    3m4s    10.104.15.70    4am-node20   <none>           <none>

[2024-11-22T03:34:27.052Z] kafka-cluster-upgrade-2929-milvus-rootcoord-ccb55bb5c-tj89s      1/1     Running       3 (2m10s ago)    3m4s    10.104.15.68    4am-node20   <none>           <none>

[2024-11-22T03:34:27.052Z] kafka-cluster-upgrade-2929-minio-0                               1/1     Running       0                3m      10.104.15.71    4am-node20   <none>           <none>

[2024-11-22T03:34:27.052Z] kafka-cluster-upgrade-2929-minio-1                               1/1     Running       0                3m      10.104.34.114   4am-node37   <none>           <none>

[2024-11-22T03:34:27.052Z] kafka-cluster-upgrade-2929-minio-2                               1/1     Running       0                3m      10.104.24.99    4am-node29   <none>           <none>

[2024-11-22T03:34:27.052Z] kafka-cluster-upgrade-2929-minio-3                               1/1     Running       0                3m      10.104.23.54    4am-node27   <none>           <none>

[2024-11-22T03:34:27.052Z] kafka-cluster-upgrade-2929-zookeeper-0                           1/1     Running       0                3m5s    10.104.34.113   4am-node37   <none>           <none>

[2024-11-22T03:34:27.052Z] kafka-cluster-upgrade-2929-zookeeper-1                           1/1     Running       0                3m4s    10.104.15.69    4am-node20   <none>           <none>

[2024-11-22T03:34:27.052Z] kafka-cluster-upgrade-2929-zookeeper-2                           1/1     Running       0                3m4s    10.104.24.98    4am-node29   <none>           <none>

failed job: https://qa-jenkins.milvus.io/blue/organizations/jenkins/deploy_test_kafka_cron/detail/deploy_test_kafka_cron/2929/pipeline/385
log:
artifacts-kafka-cluster-upgrade-2929-server-second-deployment-logs.tar.gz

Anything else?

No response

@zhuwenxing zhuwenxing added kind/bug Issues or changes related a bug needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Nov 22, 2024
@zhuwenxing
Copy link
Contributor Author

image

checkpoint tag increase a lot

@zhuwenxing
Copy link
Contributor Author

kafka metric

image

@yanliang567
Copy link
Contributor

/assign @liliu-z
/unassign

@sre-ci-robot sre-ci-robot assigned liliu-z and unassigned yanliang567 Nov 22, 2024
@yanliang567 yanliang567 added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Nov 22, 2024
@yanliang567 yanliang567 added this to the 2.5.0 milestone Nov 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Issues or changes related a bug triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
Development

No branches or pull requests

3 participants