tensorflow error using GPU for training on M1 Mac #125

bfhealy · 2022-10-13T16:01:40Z

When running ./scope.py train with --gpu 0 to select the GPU, I get an error even though the GPU is recognized and available. I think this happens because the ResourceApplyAdamWithAmsgrad operation is not currently supported by tensorflow-metal (see e.g. this discussion, which reports error messages similar to the ones below). I've tried upgrading to the latest version of tensorflow-metal (0.6.0) but still get the error. Fortunately, training still runs reasonably fast on the CPU.

tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation model/conv_conv_1/separable_conv2d/ReadVariableOp: Could not satisfy explicit device specification '' because the node {{colocation_node model/conv_conv_1/separable_conv2d/ReadVariableOp}} was colocated with a group of nodes that required incompatible device '/job:localhost/replica:0/task:0/device:GPU:0'. All available devices [/job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:GPU:0].

Colocation Debug Info:
Colocation group had the following types and supported devices:
Root Member(assigned_device_name_index_=2 requested_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' assigned_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' resource_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
ResourceApplyAdamWithAmsgrad: CPU
ReadVariableOp: GPU CPU
_Arg: GPU CPU
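Since training does complete on the CPU, one possible stopgap is to try the GPU first and retry on the CPU if device placement fails. The sketch below is only an illustration of that idea, not what scope.py does: `run_with_cpu_fallback` and the `train` callable are hypothetical names, and `train` is assumed to build and fit the model from scratch for whichever device string it receives.

```python
from typing import Any, Callable, Sequence, Tuple, Type


def run_with_cpu_fallback(
    train: Callable[[str], Any],
    devices: Sequence[str] = ("/GPU:0", "/CPU:0"),
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
) -> Tuple[str, Any]:
    """Call train(device) for each device in order; return the first success.

    `train` is a hypothetical callable that rebuilds the model and runs
    training under the given device. Errors listed in `retry_on` trigger a
    retry on the next device; any other error propagates immediately.
    """
    if not devices:
        raise ValueError("at least one device is required")
    last_error = None
    for device in devices:
        try:
            return device, train(device)
        except retry_on as err:
            # Remember the failure and move on to the next device.
            last_error = err
    # Every device failed: re-raise the most recent error.
    raise last_error
```

With TensorFlow, `train` could wrap model construction and `model.fit(...)` in `with tf.device(device):`, and `retry_on` could be narrowed to `(tf.errors.InvalidArgumentError,)`, the error class shown in the message above, so that unrelated failures are not silently retried.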

bfhealy · 2023-01-17T22:18:22Z

Update: version 0.7.0 of tensorflow-metal does not raise the above error. However, running training on the GPU takes slightly longer per epoch (and overall) than using the CPU on my M1 Mac.

Side note: In the latest version of tensorflow-macos (2.11), the Keras optimizer in nn.py needs to be changed to tf.keras.optimizers.legacy.Adam to work properly.
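A minimal sketch of the optimizer change described in the side note, written so the same code also works on TensorFlow versions that have no legacy namespace. The helper name `pick_adam` is hypothetical and is not taken from nn.py. It assumes that, where an `optimizers.legacy` namespace is present, its `Adam` is a working optimizer, as the side note reports for 2.11.

```python
def pick_adam(optimizers):
    """Return the legacy Adam class if `optimizers` has one, else plain Adam.

    `optimizers` is expected to be a namespace such as tf.keras.optimizers.
    Assumption: when a `legacy` sub-namespace is present, its Adam is usable.
    """
    legacy = getattr(optimizers, "legacy", None)
    if legacy is not None and hasattr(legacy, "Adam"):
        return legacy.Adam
    return optimizers.Adam


# Hypothetical usage (requires TensorFlow; the learning rate is a placeholder):
#   import tensorflow as tf
#   Adam = pick_adam(tf.keras.optimizers)
#   optimizer = Adam(learning_rate=1e-3, amsgrad=True)
```

The `amsgrad=True` argument in the usage comment is inferred from the ResourceApplyAdamWithAmsgrad op named in the original error, so treat it as an assumption about the training configuration.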

bfhealy added the bug label Oct 13, 2022

bfhealy mentioned this issue Jan 18, 2023

Fix callbacks for pre-trained model, use legacy optimizer #214

Merged
