question about emb_device and device for History class #16

Open
Chen-Cai-OSU opened this issue May 10, 2022 · 4 comments

Comments

@Chen-Cai-OSU

Hello Matthias,
Thank you very much for the code. Nice work as always.
I was wondering what the difference is between emb_device and device for the History class.

When I initialized a GCN like this

model = GCN(10, 10, 10, 10, 5, device='cpu')
print(model)

I get

GCN(
  (histories): ModuleList(
    (0): History(10, 10, emb_device=cpu, device=cpu)
    (1): History(10, 10, emb_device=cpu, device=cpu)
    (2): History(10, 10, emb_device=cpu, device=cpu)
    (3): History(10, 10, emb_device=cpu, device=cpu)
  )
  (lins): ModuleList()
  (convs): ModuleList(
    (0): GCNConv(10, 10)
    (1): GCNConv(10, 10)
    (2): GCNConv(10, 10)
    (3): GCNConv(10, 10)
    (4): GCNConv(10, 10)
  )
  (bns): ModuleList(
    (0): BatchNorm1d(10, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (1): BatchNorm1d(10, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): BatchNorm1d(10, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (3): BatchNorm1d(10, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (4): BatchNorm1d(10, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
)

but I noticed that there is a process on cuda:0 (I have multiple GPUs), and I don't understand why. Is this the desired behavior?
Also, in general, should I always set device to None in the GCN class? I noticed this is what you did in large_benchmark/main.py.

@rusty1s
Owner

rusty1s commented May 11, 2022

device refers to the device your model is on, while emb_device refers to the device where the historical embeddings are stored. In general, device=cuda and emb_device=cpu. Note that the device will be automatically set in case you call model.to(device).
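
The split between the two devices can be illustrated with a minimal sketch. TinyHistory below is a hypothetical stand-in, not the library's History class: it assumes the embedding storage is a plain tensor attribute (so .to(...) does not move it) and that the module only records the device the model was moved to.

```python
import torch


class TinyHistory(torch.nn.Module):
    """Hypothetical stand-in for a history module (not the library's class)."""

    def __init__(self, num_embeddings, embedding_dim, emb_device="cpu"):
        super().__init__()
        # Storage is a plain attribute, not a parameter or buffer, so
        # Module.to() leaves it on emb_device.
        self.emb = torch.zeros(num_embeddings, embedding_dim, device=emb_device)
        # Device on which the model computes; updated by .to(...).
        self._device = torch.device("cpu")

    def _apply(self, fn, *args, **kwargs):
        # Record where the model is moved without transferring self.emb.
        self._device = fn(torch.zeros(1)).device
        return self

    def pull(self, idx):
        # Read rows from storage and copy them to the compute device.
        return self.emb[idx.to(self.emb.device)].to(self._device)

    def push(self, x, idx):
        # Write freshly computed rows back into storage.
        self.emb[idx.to(self.emb.device)] = x.detach().to(self.emb.device)

    def extra_repr(self):
        return f"emb_device={self.emb.device}, device={self._device}"
```

On a machine with GPUs, TinyHistory(5, 3).to('cuda') would be expected to keep emb on the CPU while the recorded device becomes the GPU, which is the emb_device=cpu, device=cuda split described above.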

@Chen-Cai-OSU
Author

Chen-Cai-OSU commented May 11, 2022

Thank you for the explanation. What I don't understand is that when I run the following code,

model = GCN(10, 10, 10, 10, 2, device='cpu').to('cuda:3')
print(model)

I got

GCN(
  (histories): ModuleList(
    (0): History(10, 10, emb_device=cpu, device=cuda:3)
  )
  (lins): ModuleList()
  (convs): ModuleList(
    (0): GCNConv(10, 10)
    (1): GCNConv(10, 10)
  )
  (bns): ModuleList(
    (0): BatchNorm1d(10, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (1): BatchNorm1d(10, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
)

I observe a process on both cuda:0 and cuda:3 (I expected only cuda:3 to be used). Does that mean emb_device is somehow not the CPU? I also printed self.emb = torch.empty(num_embeddings, embedding_dim, device=device, pin_memory=pin_memory) when the History class is initialized, and it is indeed on the CPU. I just don't know why cuda:0 is used.

I am using torch 1.10.0 + cuda 11.3 + pyg 2.0.4 + python 3.7.13. Let me know if you need more info. Thank you!

@rusty1s
Owner

rusty1s commented May 12, 2022

Yes, this looks correct to me. Histories will be on CPU while model parameters are on cuda:3. If there is a process running on cuda:0, that is definitely a bug I can try to look into. Any pointers highly appreciated.
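
One thing that may be worth checking, as an unconfirmed guess rather than a diagnosis: the torch.empty(..., pin_memory=pin_memory) call quoted earlier may request page-locked host memory, and pinning goes through the CUDA runtime, which could create a context on the default GPU. The sketch below allocates CPU storage with pinning switched on or off, so that nvidia-smi can be compared between the two cases. alloc_cpu_storage is a hypothetical helper, not part of the library.

```python
import torch


def alloc_cpu_storage(num_embeddings, embedding_dim, pin=False):
    """Hypothetical helper: allocate CPU storage, optionally page-locked."""
    # Pinning is only requested when CUDA is actually usable.
    pin = bool(pin) and torch.cuda.is_available()
    return torch.empty(num_embeddings, embedding_dim, device="cpu",
                       pin_memory=pin)
```

If a process appears on cuda:0 only after alloc_cpu_storage(10, 10, pin=True) and not after the pin=False variant, that would point to the pinned allocation; if it appears in both cases, the cause lies elsewhere.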

@Chen-Cai-OSU
Author

I don't know what the possible reasons are. I also tried pyg 2.0.4 + torch 1.7.1 + cuda 11.0 and got the same behavior. To reproduce it, just add the following lines to models/gcn.py

if __name__ == '__main__':
  model = GCN(10, 10, 10, 10, 5, device='cpu')
  print(model)

and run python -m torch_geometric_autoscale.models.gcn
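
Until the cause is found, a common workaround is to hide the other GPUs from the process with the standard CUDA_VISIBLE_DEVICES environment variable, so that no context can be created on them. The sketch below is only one way to wire this up; restrict_visible_gpus is a hypothetical helper, and it must run before CUDA is initialized (in practice, before importing torch). Inside the process, the single visible GPU is then addressed as cuda:0.

```python
import os


def restrict_visible_gpus(*physical_ids):
    """Hypothetical helper: expose only the given physical GPUs to this process."""
    value = ",".join(str(i) for i in physical_ids)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


restrict_visible_gpus(3)
# import torch  # import only after the variable is set
# model = GCN(10, 10, 10, 10, 5, device='cpu').to('cuda:0')  # physical GPU 3
```

Equivalently, the variable can be set from the shell: CUDA_VISIBLE_DEVICES=3 python -m torch_geometric_autoscale.models.gcn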
