
GafferML #6150

Open · wants to merge 12 commits into base: 1.5_maintenance
Conversation

@johnhaddon (Member) commented Nov 19, 2024

This dips Gaffer's littlest toe into the swirling waters of machine learning, adding a handful of nodes to do processing using ONNX Runtime. These are :

  • DataToTensor : Converts Gaffer data to tensors.
  • Inference : Loads ONNX models and performs inference using an array of input tensors.
  • ImageToTensor : Converts images to tensors for use with the Inference node.
  • TensorToImage : Converts tensors back to images following inference.

At this point we are deliberately not shipping any actual ML models with Gaffer, nor any nodes to do specific tasks. This is all on a strictly "bring your own" basis, whereby the preparation of the .onnx models is entirely up to the user using external tools. But we believe that Gaffer provides a decent environment for packaging such models into useful Box-based end-user tools that interoperate with Gaffer's other modules, with internal nodes for wrangling the data into appropriate shape, and external plugs to provide user control.

Currently the "bring your own" philosophy extends as far as us not even shipping ONNX Runtime with Gaffer - instead you must download it yourself and provide it via ONNX_ROOT, much as is done with third-party renderers. In future we might ship the runtime, but right now it adds more to the package size than is justified by GafferML's slightly experimental nature. Development and testing to date has used v1.19.2 from https://github.com/microsoft/onnxruntime/releases/tag/v1.19.2.

@lucienfostier has guided me expertly through the development process on this, and has been busy testing it on various more advanced image-processing tasks in the background. Hopefully he'll be able to share details on that soon, but in the meantime, here are a couple of screenshots I've prepared of Gaffer doing basic image processing using the Segment Anything and Depth Anything models respectively :

[screenshot: Segment Anything segmentation in Gaffer]
[screenshot: Depth Anything depth map in Gaffer]

@lucienfostier (Collaborator) left a comment:

This looks good to me!
I had to spread the review across three days to review everything carefully, but hopefully that will be useful.
Looking forward to integrating this in production!
Thanks!

@@ -0,0 +1,49 @@
##########################################################################
#
# Copyright (c) 2012, John Haddon. All rights reserved.
lucienfostier (Collaborator):

This copyright is different from that in most other files; is it on purpose?

johnhaddon (Member Author):

That was a sloppy copy-paste - fixed in 26dfaf8.

__import__( "GafferImage" )

if hasattr( os, "add_dll_directory" ) :
	os.add_dll_directory( ( pathlib.Path( os.environ["ONNX_ROOT"] ) / "lib" ).resolve() )
lucienfostier (Collaborator):

Just a question, this is only necessary on Windows?

johnhaddon (Member Author):

Yes, correct - Python won't find DLL dependencies of a module unless you tell it where they are with add_dll_directory(). The code only runs on Windows, because hasattr() returns False on other platforms.
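As a hedged illustration of that Windows-only guard (the helper name is invented; this is not Gaffer's actual module code):

```python
import os
import pathlib

def register_onnx_dlls(onnx_root):
    """Hypothetical helper: on Windows, tell Python's extension-module
    loader where the ONNX Runtime DLLs live. os.add_dll_directory()
    only exists on Windows, so the hasattr() check makes this a no-op
    on Linux/macOS, where the dynamic linker resolves shared libraries
    itself (rpath, LD_LIBRARY_PATH, etc.)."""
    if hasattr(os, "add_dll_directory"):
        os.add_dll_directory(str((pathlib.Path(onnx_root) / "lib").resolve()))
        return True
    return False
```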

GafferML.Tensor( IECore.IntVectorData( [ 1, 2, 3, 4 ] ), [ 4 ] )
]

self.assertEqual( len( { t.hash() for t in tensors } ), len( tensors ) )
lucienfostier (Collaborator):

What is this testing exactly?

johnhaddon (Member Author):

It checks that each tensor has a unique hash with respect to the others. If any tensor wasn't unique, the size of the set of hashes would be smaller than the number of tensors.
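The idiom can be sketched in isolation (a generic helper for illustration, not GafferML API):

```python
def all_unique(hashes):
    """True iff no two hashes are equal: duplicates collapse when the
    sequence is poured into a set, shrinking its length."""
    return len(set(hashes)) == len(hashes)

# Distinct hashes keep the set the same size as the list...
assert all_unique(["h1", "h2", "h3"])
# ...while any collision shrinks it, which the test above would catch.
assert not all_unique(["h1", "h2", "h1"])
```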

lucienfostier (Collaborator):

Oh, I didn't notice the set; now that makes sense.

self.assertNotEqual( tensor1, tensor2 ) # Different shape

tensor2 = GafferML.Tensor( IECore.IntVectorData( [ 3, 2, 1 ] ), [ 3 ] )
self.assertNotEqual( tensor1, tensor2 ) # Different shape
lucienfostier (Collaborator):

Should the comment be # Different data instead?

johnhaddon (Member Author):

Indeed it should - fixed in 9967936.

allocator, shape.data(), shape.size(), ONNX_TENSOR_ELEMENT_DATA_TYPE_BOOL
);
std::copy( typedData->readable().begin(), typedData->readable().end(), value.GetTensorMutableData<bool>() );
m_state = new State{ std::move( value ), nullptr };
lucienfostier (Collaborator):

Why are we not passing the typedData as the DataPtr arg here?

johnhaddon (Member Author):

Because the Ort::Value owns its own copy of the data in this case. We only keep the DataPtr when the Ort::Value is referencing the data that we own within that DataPtr.
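A loose Python analogy for that ownership distinction (illustrative only; Ort::Value itself is a C++ type):

```python
source = [1, 2, 3]

owned = list(source)   # like an Ort::Value holding its own copy: the
                       # source buffer can go away afterwards
alias = source         # like an Ort::Value referencing external data:
                       # the referenced buffer (the DataPtr member in
                       # Tensor's State) must be kept alive alongside it

source[0] = 99
assert owned[0] == 1   # the copy is unaffected
assert alias[0] == 99  # the reference observes the change
```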

lucienfostier (Collaborator):

Makes sense

Comment on lines +78 to +90
size_t i = shape.size() - 3;
for( size_t d = 0; d < i; ++d )
{
	if( shape[d] != 1 )
	{
		throw IECore::Exception(
			fmt::format(
				"Expected {} dimensional tensor to have size 1 in dimension {}",
				shape.size(), d
			)
		);
	}
}
lucienfostier (Collaborator):

This might need a revisit once we support models that process frame sequences.
For example, the one I'm working with right now has the shape [1, 55, 3, 296, 720], which is respectively:

  • batch size
  • frame number
  • channels
  • height
  • width

johnhaddon (Member Author):

So in that case, would a single tensor contain all the frames? And we'd need to add a TensorToImage::framePlug() to say which one to extract?

Happy to rejig this in future PRs, but I think it's worth getting this merged as-is and then working from that baseline. It's one of the reasons I've documented that we're not guaranteeing ABI stability at this point.

lucienfostier (Collaborator):

Yes I think we should merge as-is but in the future we might need to do what you suggest and have the same for the ImageToTensor to gather a frame range into a single tensor.
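A small NumPy sketch of what such a frame-selection plug might do (shapes shrunk for illustration; framePlug() is hypothetical, as discussed above):

```python
import numpy as np

# [batch, frame, channel, height, width], modelled on the
# [1, 55, 3, 296, 720] example above, with smaller spatial sizes.
video = np.zeros((1, 55, 3, 8, 16), dtype=np.float32)

# A hypothetical TensorToImage framePlug() would supply this index,
# reducing the tensor to the [batch, channel, height, width] layout
# that the node handles today.
frame = 10
image = video[:, frame]
assert image.shape == (1, 3, 8, 16)
```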

johnhaddon (Member Author):

Cool - those both sound like good future additions.

src/GafferML/TensorToImage.cpp (resolved thread)
const Box2i validTileBound = BufferAlgo::intersection( dataWindow, tileBound );
out.resize( ImagePlug::tileSize() * ImagePlug::tileSize() );

const float *sourceData = tensorData->value().GetTensorData<float>();
lucienfostier (Collaborator):

Have you already considered the case where an image tensor is stored as a different data type than float, for example as double or FLOAT16?

johnhaddon (Member Author):

Oh, no, I have not! I think it would crash right now, at least for the FLOAT16 case. Should I throw if it isn't float? Or do some templating to support the conversion? And if the latter, do I need to worry about integer formats too?

lucienfostier (Collaborator):

I haven't tested a model where the current code wouldn't work, but I was considering that for my TensorToMesh.
We can worry about it later.
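A sketch of the kind of element-type handling being discussed, in NumPy terms (the helper name is invented; the actual fix lands in C++ in a later commit):

```python
import numpy as np

def tensor_buffer_as_float32(buf):
    """Promote half/double tensor data to float32 before it is copied
    into image tiles, and reject element types the node doesn't handle,
    rather than crashing on a misinterpreted buffer."""
    if buf.dtype not in (np.float16, np.float32, np.float64):
        raise TypeError(f"Unsupported tensor element type: {buf.dtype}")
    return buf.astype(np.float32, copy=False)

half = np.array([0.5, 1.0], dtype=np.float16)
assert tensor_buffer_as_float32(half).dtype == np.float32
```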

Changes.md (two resolved threads)
@johnhaddon (Member Author) commented:
Thanks for the thorough review @lucienfostier! I think I've been through and replied to all your comments now. There are a couple where I might still need to push code changes, pending your feedback. I can get onto those tomorrow as necessary...

@lucienfostier (Collaborator) left a comment:
Thanks for addressing the comments, LGTM!



bool Tensor::isEqualTo( const IECore::Object *other ) const
{
	if( !Object::isEqualTo( other ) )
lucienfostier (Collaborator):

Makes sense


object tensorGetItemTyped( const Tensor &tensor, const std::vector<int64_t> &location )
{
	return object(
		const_cast<Ort::Value &>( tensor.value() ).At<T>( location )
lucienfostier (Collaborator):

ah I see!

script2 = Gaffer.ScriptNode()
script2.execute( script.serialise() )
self.assertIsInstance( script2["node"]["user"]["p"], GafferML.TensorPlug )
self.assertEqual( script2["node"]["user"]["p"].getValue(), GafferML.Tensor() )
lucienfostier (Collaborator):

Alright, I guess we don't need that now.

Comment on lines +251 to +262
for( int x = validTileBound.min.x; x < validTileBound.max.x; ++x )
{
	dstData[dstIndex] = sourceData[srcIndex++];
	dstIndex += dstStride;
}
lucienfostier (Collaborator):

I see your point, let's just merge it like that and we can always revisit if it becomes a problem.

vector<int64_t> shape;
if( interleaveChannels )
{
	shape = { 1, dataWindow.size().y, dataWindow.size().x, (int64_t)channels.size() };
lucienfostier (Collaborator):

Makes sense
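The two layouts that plug chooses between, sketched with NumPy (sizes invented for illustration):

```python
import numpy as np

h, w, c = 4, 5, 3

# interleaveChannels on: NHWC layout, channel values interleaved
# per pixel, matching the shape built in the snippet above.
nhwc = np.zeros((1, h, w, c), dtype=np.float32)

# interleaveChannels off: NCHW layout, one contiguous plane per
# channel, as many ONNX image models expect.
nchw = np.transpose(nhwc, (0, 3, 1, 2))
assert nchw.shape == (1, c, h, w)
```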

static_cast<TensorPlug *>( output )->setValue( tensor );
}

ComputeNode::compute( output, context );
lucienfostier (Collaborator):

nice



Commit descriptions:

  • This is a wrapper class that will allow us to pass ONNX values through Gaffer's computation graph.
  • This will be used for passing Tensor values between nodes.
  • This allows data from elsewhere in Gaffer to be converted for use in GafferML.
  • This forms the meat of GafferML, loading ONNX models and performing inference using data from an array of input TensorPlugs.
  • This is a node which converts images from GafferImage into tensors for use by the Inference node.
  • This allows tensors to be converted back to GafferImage images, after they have been processed by the Inference node.
  • And advertise them in Changes.md.
@johnhaddon (Member Author) commented:

> Thanks for addressing the comments, LGTM!

I've squashed all the fixups into the relevant commits, omitting the linear-search one. I've also added one final commit which checks the input tensor type in TensorToImage. Happy to merge?

@lucienfostier (Collaborator) commented:
Yep, let's merge!
