Fix when running nsdperf over RoCE #4

Open
wants to merge 5 commits into
base: master

@cristeab cristeab commented Mar 24, 2021

The original version used only the first GID of a given network interface. This fix collects every GID it can find for the interface into a vector, then finds the first interface name and uses it to run the tests.
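As a rough illustration of the "collect all GIDs into a vector" idea, here is a minimal C++ sketch. It is not the code from this pull request. The names `Gid`, `GidQuery`, `GidEntry`, `isZeroGid` and `collectGids` are hypothetical, and the query callback is a stand-in for a per-index lookup (with libibverbs that role would be played by `ibv_query_gid`). Skipping all-zero entries is an assumption about how unpopulated table slots look.

```cpp
#include <array>
#include <cstdint>
#include <functional>
#include <vector>

// A GID is 16 raw bytes (hypothetical alias, for illustration only).
using Gid = std::array<std::uint8_t, 16>;

// Hypothetical stand-in for a per-index GID lookup. It returns false when
// the entry at `index` cannot be read.
using GidQuery = std::function<bool(int index, Gid &out)>;

struct GidEntry {
    int index;  // position in the port's GID table
    Gid gid;    // raw GID bytes
};

// True when every byte of the GID is zero.
// Assumption: such entries are unpopulated table slots.
bool isZeroGid(const Gid &gid) {
    for (std::uint8_t byte : gid) {
        if (byte != 0) return false;
    }
    return true;
}

// Walk the whole table instead of stopping at index 0, keeping every
// entry that can be read and is not all zeros.
std::vector<GidEntry> collectGids(const GidQuery &query, int tableLen) {
    std::vector<GidEntry> found;
    for (int i = 0; i < tableLen; ++i) {
        Gid gid{};
        if (!query(i, gid)) continue;   // unreadable entry
        if (isZeroGid(gid)) continue;   // assumed empty slot
        found.push_back({i, gid});
    }
    return found;
}
```

A caller would then choose an entry from the returned vector according to its own policy instead of failing when index 0 alone is unusable.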

To show the GIDs of a Mellanox ConnectX-6 adapter, use:

[root@localhost ~]# show_gids mlx5_0
DEV     PORT  INDEX  GID                                      IPv4         VER  DEV
mlx5_0  1     0      fe80:0000:0000:0000:0e42:a1ff:fe5d:4db8               v1   ens1f0
mlx5_0  1     1      fe80:0000:0000:0000:0e42:a1ff:fe5d:4db8               v2   ens1f0
mlx5_0  1     2      fe80:0000:0000:0000:1186:a6b1:3f0b:c441               v1   ens1f0
mlx5_0  1     3      fe80:0000:0000:0000:1186:a6b1:3f0b:c441               v2   ens1f0
mlx5_0  1     4      0000:0000:0000:0000:0000:ffff:ac13:003c  172.19.0.60  v1   ens1f0
mlx5_0  1     5      0000:0000:0000:0000:0000:ffff:ac13:003c  172.19.0.60  v2   ens1f0
n_gids_found=6
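The GIDs at index 4 and 5 are IPv4-mapped: the last four bytes, ac13:003c, are the address 172.19.0.60 shown in the IPv4 column (0xac = 172, 0x13 = 19, 0x00 = 0, 0x3c = 60). The following C++ sketch shows that decoding. The helper names `parseGid` and `mappedIPv4` are hypothetical and are not part of nsdperf, and the parser assumes the fully zero-padded text form printed in the output here.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <string>

using Gid = std::array<std::uint8_t, 16>;

// Parse a GID written as eight colon-separated groups of four hex digits
// (hypothetical helper). Returns false on any other input.
bool parseGid(const std::string &text, Gid &out) {
    if (text.size() != 39) return false;  // 8 * 4 digits + 7 colons
    for (int group = 0; group < 8; ++group) {
        std::size_t pos = static_cast<std::size_t>(group) * 5;
        if (group < 7 && text[pos + 4] != ':') return false;
        unsigned value = 0;
        for (int d = 0; d < 4; ++d) {
            char c = text[pos + d];
            unsigned digit;
            if (c >= '0' && c <= '9') digit = c - '0';
            else if (c >= 'a' && c <= 'f') digit = c - 'a' + 10;
            else if (c >= 'A' && c <= 'F') digit = c - 'A' + 10;
            else return false;
            value = value * 16 + digit;
        }
        out[2 * group] = static_cast<std::uint8_t>(value >> 8);
        out[2 * group + 1] = static_cast<std::uint8_t>(value & 0xff);
    }
    return true;
}

// If the GID has the IPv4-mapped layout (ten zero bytes, then ff ff, then
// the four address bytes), write the dotted-quad form and return true.
bool mappedIPv4(const Gid &gid, std::string &out) {
    for (int i = 0; i < 10; ++i) {
        if (gid[i] != 0) return false;
    }
    if (gid[10] != 0xff || gid[11] != 0xff) return false;
    out = std::to_string(static_cast<int>(gid[12])) + "." +
          std::to_string(static_cast<int>(gid[13])) + "." +
          std::to_string(static_cast<int>(gid[14])) + "." +
          std::to_string(static_cast<int>(gid[15]));
    return true;
}
```

The fe80 link-local GIDs at index 0 through 3 do not match this layout, which is consistent with their empty IPv4 column.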

Note that in this case the original nsdperf version fails because it uses only the first GID, and nsdperf exits with the error below:

[root@localhost ~]# ./nsdperf-rdma -r mlx5_0/1 -s -d
05:32:39.904017 nsdperf-rdma 1.28 server started
Connection from 172.19.4.2
05:32:49.594867 got msg Version ID 2 len 0 from 172.19.4.2/0
05:32:49.620581 got msg Parms ID 4 len 56 from 172.19.4.2/0
05:32:49.642896 RDMA port mlx5_0:1 has no address
05:32:49.643393 sending msg ReplyErr ID 4 len 24 to 172.19.4.2/0
05:33:22.192687 got msg Kill ID 6 len 0 from 172.19.4.2/0
Connection to 172.19.4.2/0 broken
05:33:22.193039 Closed connection to 172.19.4.2/0

@bolinches
Contributor

Thanks a lot. As soon as we can test it in our lab, we will merge. Many thanks for the work.

@bolinches added the enhancement label Mar 29, 2021
@bolinches
Contributor

@cristeab Sorry for the delay.

We have tested it and it works nicely. Thank you so much for the effort you put into this. You are going to have to bear with us a little here; let me explain.

We (and I especially) were not expecting to get contributions to the code this "soon". We have an internal repository with nsdperf version 1.29, where RoCE support is already there along with other things. But your contribution has clearly shown a few things.

First and foremost, the current model we use to develop this tool does not work as-is. By using internal repos instead of a public one, we are, at best, de facto alienating non-IBMers from helping with development.

Second, now that we finally have a good contribution, we are not sure how to proceed: we already have 1.29 internally with this and other changes.

And last but certainly not least, we need to change how we work here. It won't happen overnight, but the conversations have already started and I hope we can come up with something clearer later on. To you and to any other contributor: please bear with us a bit longer.

Thanks a lot for your effort, but for now I will not merge the changes until we have a clearer idea of how to proceed: either fully moving 1.29 here, or a variation of your changes and 1.29.

Once again, I really thank you for what you have done. I hope you do not feel it is going to waste even if we go with 1.29. Even if that is the case, this clearly shows that we need to change how we move forward with this excellent network benchmark tool.
