tensorflow error using GPU for training on M1 Mac #125

Open
bfhealy opened this issue Oct 13, 2022 · 1 comment
Labels
bug Something isn't working

Comments

bfhealy (Collaborator) commented Oct 13, 2022

When I run `./scope.py train` with `--gpu 0` to request the GPU, training fails with the error below, even though the GPU is recognized and available. I suspect this happens because the `ResourceApplyAdamWithAmsgrad` operation is not currently supported by tensorflow-metal: the colocation debug info lists only a CPU kernel for that op, so the whole colocation group ends up restricted to the CPU, which conflicts with the requested GPU device (see e.g. this discussion, whose error messages are similar to the ones below). Upgrading to the latest tensorflow-metal release (0.6.0) did not fix the error. Fortunately, training still runs reasonably fast on the CPU.

tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation model/conv_conv_1/separable_conv2d/ReadVariableOp: Could not satisfy explicit device specification '' because the node {{colocation_node model/conv_conv_1/separable_conv2d/ReadVariableOp}} was colocated with a group of nodes that required incompatible device '/job:localhost/replica:0/task:0/device:GPU:0'. All available devices [/job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:GPU:0].

Colocation Debug Info:
Colocation group had the following types and supported devices:
Root Member(assigned_device_name_index_=2 requested_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' assigned_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' resource_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
ResourceApplyAdamWithAmsgrad: CPU
ReadVariableOp: GPU CPU
_Arg: GPU CPU
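The suspicion about `ResourceApplyAdamWithAmsgrad` comes from reading the colocation debug info: it is the only op in the group listed with a CPU kernel alone. As a rough sketch (not part of scope), a small Python helper could pull such ops out automatically. It assumes TensorFlow prints each `OpType: DEVICE ...` entry on its own line (the sample string below uses that layout), and `ops_without_device` is a hypothetical name.

```python
import re

# Matches per-op entries such as "ResourceApplyAdamWithAmsgrad: CPU" or
# "ReadVariableOp: GPU CPU"; header and "Root Member(...)" lines do not match.
_OP_LINE = re.compile(r"^\s*(\w+):((?:\s+[A-Z]+)+)\s*$")


def ops_without_device(debug_info: str, device: str = "GPU") -> list[str]:
    """Return op types in a colocation group that list no kernel for `device`."""
    missing = []
    for line in debug_info.splitlines():
        match = _OP_LINE.match(line)
        if match is None:
            continue  # not a per-op entry
        op_type, devices = match.group(1), match.group(2).split()
        if device not in devices:
            missing.append(op_type)
    return missing


# Sample copied from the error above, one entry per line.
debug_info = """Colocation Debug Info:
Colocation group had the following types and supported devices:
Root Member(assigned_device_name_index_=2 requested_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' assigned_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' resource_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
ResourceApplyAdamWithAmsgrad: CPU
ReadVariableOp: GPU CPU
_Arg: GPU CPU
"""

print(ops_without_device(debug_info))  # ['ResourceApplyAdamWithAmsgrad']
```

If the debug info arrives flattened onto a single line, the helper finds nothing, so it is only a convenience for the usual multi-line log.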

bfhealy added the bug (Something isn't working) label on Oct 13, 2022
bfhealy (Collaborator, Author) commented Jan 17, 2023

Update: tensorflow-metal 0.7.0 no longer raises the above error. However, on my M1 Mac, training on the GPU takes slightly longer per epoch (and overall) than training on the CPU.

Side note: with the latest tensorflow-macos release (2.11), the Keras optimizer in `nn.py` must be changed to `tf.keras.optimizers.legacy.Adam` for training to work properly.
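A minimal sketch of that change, assuming TensorFlow 2.x with Keras 2 (it is not meant for Keras 3). The helper name `make_adam` is hypothetical, and since the thread does not show how `nn.py` builds its optimizer, the `amsgrad=True` argument in the usage comment is only inferred from the `ResourceApplyAdamWithAmsgrad` op in the original error; the learning rate is a placeholder. The TensorFlow module is passed in as an argument so the selection logic can be checked without a GPU or even a TensorFlow install.

```python
def make_adam(tf, **kwargs):
    """Build a Keras Adam optimizer, preferring the legacy implementation.

    `tf` is the imported tensorflow module. If `tf.keras.optimizers.legacy`
    exists, its Adam is used; older releases that predate the `legacy`
    namespace fall back to `tf.keras.optimizers.Adam`.
    """
    optimizers = tf.keras.optimizers
    legacy = getattr(optimizers, "legacy", None)
    adam_cls = legacy.Adam if legacy is not None else optimizers.Adam
    return adam_cls(**kwargs)


# Usage (assumes TensorFlow 2.x with Keras 2 is installed; values are placeholders):
# import tensorflow as tf
# optimizer = make_adam(tf, learning_rate=1e-3, amsgrad=True)
```

Falling back instead of hard-coding `legacy.Adam` keeps the same code path working on environments with an older TensorFlow.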
