GafferML #6150
base: 1.5_maintenance
Conversation
This looks good to me!
I had to spread the review across three days to review everything carefully, but hopefully that will be useful.
Looking forward to integrating this in production!
Thanks!
python/GafferML/__init__.py
Outdated
@@ -0,0 +1,49 @@
##########################################################################
#
# Copyright (c) 2012, John Haddon. All rights reserved.
This copyright is different from that in most other files; is that on purpose?
That was a sloppy copy-paste - fixed in 26dfaf8.
__import__( "GafferImage" ) | ||
|
||
if hasattr( os, "add_dll_directory" ) : | ||
os.add_dll_directory( ( pathlib.Path( os.environ["ONNX_ROOT"] ) / "lib" ).resolve() ) |
Just a question: is this only necessary on Windows?
Yes, correct - Python won't find DLL dependencies of a module unless you tell it where they are with add_dll_directory(). The code only runs on Windows, because hasattr() returns False on other platforms.
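For reference, the guarded pattern being discussed looks roughly like this (a minimal sketch; the exact ONNX_ROOT layout is an assumption based on the onnxruntime release archives):

```python
import os
import pathlib

# os.add_dll_directory() only exists on Windows (Python 3.8+), so this block
# is skipped entirely on Linux and macOS, where the dynamic loader locates
# the ONNX Runtime libraries by other means.
if hasattr( os, "add_dll_directory" ) :
	# Assumes ONNX_ROOT points at an extracted onnxruntime release,
	# with the DLLs in its "lib" subdirectory.
	os.add_dll_directory( str( ( pathlib.Path( os.environ["ONNX_ROOT"] ) / "lib" ).resolve() ) )
```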
			GafferML.Tensor( IECore.IntVectorData( [ 1, 2, 3, 4 ] ), [ 4 ] )
		]

		self.assertEqual( len( { t.hash() for t in tensors } ), len( tensors ) )
What is this testing exactly?
It checks that each tensor has a unique hash with respect to the others. If any tensor wasn't unique, the size of the set of hashes would be smaller than the number of tensors.
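The same idiom in isolation, with hypothetical hash values rather than real tensors:

```python
# Hypothetical hash values, one per tensor. If every hash is unique the set
# has exactly one entry per tensor; any collision makes the set smaller.
uniqueHashes = [ "a", "b", "c", "d" ]
collidingHashes = [ "a", "b", "b", "c" ]

assert len( set( uniqueHashes ) ) == len( uniqueHashes )        # passes
assert len( set( collidingHashes ) ) != len( collidingHashes )  # set collapses to 3 entries
```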
Oh, I didn't notice the set; now that makes sense.
python/GafferMLTest/TensorTest.py
Outdated
		self.assertNotEqual( tensor1, tensor2 ) # Different shape

		tensor2 = GafferML.Tensor( IECore.IntVectorData( [ 3, 2, 1 ] ), [ 3 ] )
		self.assertNotEqual( tensor1, tensor2 ) # Different shape
Should the comment be # Different data instead?
Indeed it should - fixed in 9967936.
			allocator, shape.data(), shape.size(), ONNX_TENSOR_ELEMENT_DATA_TYPE_BOOL
		);
		std::copy( typedData->readable().begin(), typedData->readable().end(), value.GetTensorMutableData<bool>() );
		m_state = new State{ std::move( value ), nullptr };
Why are we not passing the typedData as the DataPtr arg here?
Because the Ort::Value owns its own copy of the data in this case. We only keep the DataPtr when the Ort::Value is referencing the data that we own within that DataPtr.
Makes sense
	size_t i = shape.size() - 3;
	for( size_t d = 0; d < i; ++d )
	{
		if( shape[d] != 1 )
		{
			throw IECore::Exception(
				fmt::format(
					"Expected {} dimensional tensor to have size 1 in dimension {}",
					shape.size(), d
				)
			);
		}
	}
This might need a revisit once we support models that process frame sequences.
For example, the one I'm working with right now would have the shape [1, 55, 3, 296, 720], which is respectively:
- batch size
- frame number
- channels
- height
- width
So in that case, would a single tensor contain all the frames? And we'd need to add a TensorToImage::framePlug() to say which one to extract?
Happy to rejig this in future PRs, but I think it's worth getting this merged as-is and then working from that baseline. It's one of the reasons I've documented that we're not guaranteeing ABI stability at this point.
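For illustration only, extracting a single frame from a tensor shaped like the example above might look something like this NumPy sketch (the frame index and the eventual framePlug() API are hypothetical):

```python
import numpy as np

# Hypothetical model output shaped [ batch, frame, channels, height, width ].
output = np.zeros( ( 1, 55, 3, 296, 720 ), dtype = np.float32 )

frame = 10  # hypothetical value of a future TensorToImage framePlug()

# Selecting one frame leaves a [ 1, 3, 296, 720 ] tensor, which matches the
# [ batch, channels, height, width ] layout the current conversion expects.
singleFrame = output[ :, frame ]
assert singleFrame.shape == ( 1, 3, 296, 720 )
```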
Yes, I think we should merge as-is, but in the future we might need to do what you suggest, and have the same for the ImageToTensor to gather a frame range into a single tensor.
Cool - those both sound like good future additions.
	const Box2i validTileBound = BufferAlgo::intersection( dataWindow, tileBound );
	out.resize( ImagePlug::tileSize() * ImagePlug::tileSize() );

	const float *sourceData = tensorData->value().GetTensorData<float>();
Have you already considered the case where an image tensor is stored as a different data type than float, for example as double or FLOAT16?
Oh, no, I have not! I think it would crash right now, at least for the FLOAT16 case. Should I throw if it isn't float? Or do some templating to support the conversion? And if the latter, do I need to worry about integer formats too?
I haven't tested a model where the current code wouldn't work, but I was considering that for my TensorToMesh.
We can worry about that later.
Thanks for the thorough review @lucienfostier! I think I've been through and replied to all your comments now. There are a couple where I might still need to push code changes, pending your feedback. I can get onto those tomorrow as necessary...
Thanks for addressing the comments, LGTM!
bool Tensor::isEqualTo( const IECore::Object *other ) const
{
	if( !Object::isEqualTo( other ) )
Makes sense
object tensorGetItemTyped( const Tensor &tensor, const std::vector<int64_t> &location )
{
	return object(
		const_cast<Ort::Value &>( tensor.value() ).At<T>( location )
Ah, I see!
		script2 = Gaffer.ScriptNode()
		script2.execute( script.serialise() )
		self.assertIsInstance( script2["node"]["user"]["p"], GafferML.TensorPlug )
		self.assertEqual( script2["node"]["user"]["p"].getValue(), GafferML.Tensor() )
Alright, I guess we don't need that now.
		for( int x = validTileBound.min.x; x < validTileBound.max.x; ++x )
		{
			dstData[dstIndex] = sourceData[srcIndex++];
			dstIndex += dstStride;
		}
I see your point, let's just merge it like that and we can always revisit if it becomes a problem.
	vector<int64_t> shape;
	if( interleaveChannels )
	{
		shape = { 1, dataWindow.size().y, dataWindow.size().x, (int64_t)channels.size() };
Makes sense
src/GafferML/ImageToTensor.cpp
Outdated
		static_cast<TensorPlug *>( output )->setValue( tensor );
	}

	ComputeNode::compute( output, context );
nice
- This is a wrapper class that will allow us to pass ONNX values through Gaffer's computation graph.
- This will be used for passing Tensor values between nodes.
- This allows data from elsewhere in Gaffer to be converted for use in GafferML.
- This forms the meat of GafferML, loading ONNX models and performing inference using data from an array of input TensorPlugs.
- This is a node which converts images from GafferImage into tensors for use by the Inference node.
- This allows tensors to be converted back to GafferImage images, after they have been processed by the Inference node.
- And advertise them in Changes.md.
I've squashed all the fixups into the relevant commits, omitting the linear-search one. I've also added one final commit which checks the input tensor type in TensorToImage. Happy to merge?
Yep, let's merge!
This dips Gaffer's littlest toe into the swirling waters of machine learning, adding a handful of nodes to do processing using ONNX Runtime. These are :
At this point we are deliberately not shipping any actual ML models with Gaffer, nor any nodes to do specific tasks. This is all on a strictly "bring your own" basis, whereby the preparation of the .onnx models is entirely up to the user using external tools. But we believe that Gaffer provides a decent environment for packaging such models into useful Box-based end-user tools that interoperate with Gaffer's other modules, with internal nodes for wrangling the data into appropriate shape, and external plugs to provide user control.

Currently the "bring your own" philosophy extends as far as us not even shipping ONNX runtime with Gaffer - instead you must download it yourself and provide it via ONNX_ROOT, a little like is done with 3rd party renderers. In future we might ship the runtime, but right now it adds more to the package size than is justified by GafferML's slightly experimental nature. Development and testing to date has used v1.19.2 from https://github.com/microsoft/onnxruntime/releases/tag/v1.19.2.
@lucienfostier has guided me expertly through the development process on this, and has been busy testing it on various more advanced image-processing tasks in the background. Hopefully he'll be able to share details on that soon, but in the meantime, here are a couple of screenshots I've prepared of Gaffer doing basic image processing using the Segment Anything and Depth Anything models respectively :
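As a rough illustration of how the new nodes might be wired together in a script, here's a minimal sketch. The node names come from this PR, but the plug names ("image", "tensor", "model", "in", "out") are assumptions on my part and may not match the final API:

```python
import Gaffer
import GafferImage
import GafferML

script = Gaffer.ScriptNode()

# Read an image and convert it to a tensor for the model.
script["reader"] = GafferImage.ImageReader()
script["imageToTensor"] = GafferML.ImageToTensor()
script["imageToTensor"]["image"].setInput( script["reader"]["out"] )

# Run a user-supplied ONNX model on the tensor. The model path is hypothetical.
script["inference"] = GafferML.Inference()
script["inference"]["model"].setValue( "/path/to/model.onnx" )
script["inference"]["in"][0].setInput( script["imageToTensor"]["tensor"] )

# Convert the model's output tensor back to a GafferImage image.
script["tensorToImage"] = GafferML.TensorToImage()
script["tensorToImage"]["tensor"].setInput( script["inference"]["out"][0] )
```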