Dynamically Linked Library in CPP #11439

Open. soumiiow wants to merge 1 commit into main.

@soumiiow soumiiow commented Nov 5, 2024

Related to prestodb/presto#23634 in the Prestissimo space
and based off of the following PR: https://github.com/facebookincubator/velox/pull/1005/files

These changes allow users to dynamically load functions in Prestissimo using C++. The Presto server will use this library to dynamically load User Defined Functions (UDFs), connectors, or types.

An example of dynamically registering a function is also provided for reference, along with a unit test.

Currently, this library works on Linux machines but not on macOS.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Nov 5, 2024

netlify bot commented Nov 5, 2024

Deploy Preview for meta-velox ready!

Name Link
🔨 Latest commit 4e82b71
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/674fff17c8bdbe0008b882de
😎 Deploy Preview https://deploy-preview-11439--meta-velox.netlify.app

@Yuhta Yuhta requested a review from pedroerp November 5, 2024 15:40
Contributor

pedroerp commented Nov 5, 2024

@soumiiow thanks for looking into this. Out of curiosity, why doesn't this work in MacOS?

@@ -15,6 +15,7 @@ add_subdirectory(base)
add_subdirectory(caching)
add_subdirectory(compression)
add_subdirectory(config)
add_subdirectory(dynamicRegistry)
Contributor

we use snake case for directory names "dynamic_registry"

Contributor

@pedroerp pedroerp left a comment

Very cool! A few small comments, but overall looks good.

#include <dlfcn.h>
#include <iostream>
#include "velox/common/base/Exceptions.h"
namespace facebook::velox {
Contributor

nit: new line before namespace definition.

VELOX_USER_FAIL("Couldn't find Velox registry symbol: {}", error);
}
registryItem();
std::cout << "LOADED DYLLIB 1" << std::endl;
Contributor

for consistency, could you use LOG(INFO) and print the file name / path of the library loaded?


static constexpr const char* kSymbolName = "registry";

void loadDynamicLibraryFunctions(const char* fileName) {
Contributor

We can probably omit "Functions" from the name, since this can really be used to load anything, as long as you provide the registration functions. Let's name it loadDynamicLibrary().

### 1. Create a cpp file for your dynamic library
For dynamically loaded function registration, the format followed is mirrored of that of built-in function registration with some noted differences. Using [MyDynamicTestFunction.cpp](tests/MyDynamicTestFunction.cpp) as an example, the function uses the extern "C" keyword to protect against name mangling. A registry() function call is also necessary here.

### 2. Register functions dynamically by creating .dylib or .so shared libraries and dropping them in a plugin directory
Contributor

nit: the titles are too long; maybe just add the docs as a regular numbered list?

Author

I pushed it out without the title formatting but does this look a bit cluttered now?

auto signaturesBefore = getFunctionSignatures().size();

// Function does not exist yet.
EXPECT_THROW(dynamicFunction(0), VeloxUserError);
Contributor

can you use VELOX_ASSERT_THROW() instead to validate the right exception is being thrown?

# `MyDynamicFunction.cpp` as a small .so library, and use the
# MY_DYNAMIC_FUNCTION_LIBRARY_PATH macro to locate the .so binary.
add_compile_definitions(
MY_DYNAMIC_FUNCTION_LIBRARY_PATH="${CMAKE_CURRENT_BINARY_DIR}")
Contributor

please vendor the macro. Maybe something like VELOX_TEST_DYNAMIC_LIBRARY_PATH

* limitations under the License.
*/

#include "velox/common/dynamicRegistry/DynamicLibraryLoader.h"
Collaborator

@majetideepak majetideepak Nov 5, 2024

this header include is not required.


// Dynamically load the library.
std::string libraryPath = MY_DYNAMIC_FUNCTION_LIBRARY_PATH;
libraryPath += "/libvelox_function_my_dynamic.so";
Collaborator

@majetideepak majetideepak Nov 5, 2024

Collaborator

What else is an issue for MacOS?

Comment on lines 22 to 23
${GMock}
${GTEST_BOTH_LIBRARIES})
Collaborator

please use the GTest:: targets

# To test functions being added by dynamically linked libraries, we compile
# `MyDynamicFunction.cpp` as a small .so library, and use the
# VELOX_TEST_DYNAMIC_LIBRARY_PATH macro to locate the .so binary.
add_compile_definitions(
Collaborator

Please use target_compile_definitions( on the relevant target instead.

if(${VELOX_BUILD_TESTING})
add_subdirectory(tests)
endif()
velox_add_library(velox_dynamic_function_loader DynamicLibraryLoader.cpp)
Collaborator

Suggested change
- velox_add_library(velox_dynamic_function_loader DynamicLibraryLoader.cpp)
+ velox_add_library(velox_dynamic_function_loader DynamicLibraryLoader.cpp)
+ velox_link_libraries(velox_dynamic_function_loader PRIVATE velox_exception)

Collaborator

Thanks for adding docs!

# VELOX_TEST_DYNAMIC_LIBRARY_PATH macro to locate the .so binary.
add_compile_definitions(
VELOX_TEST_DYNAMIC_LIBRARY_PATH="${CMAKE_CURRENT_BINARY_DIR}")
add_library(velox_function_my_dynamic SHARED MyDynamicFunction.cpp)
Collaborator

Add a new line before and after.

# `MyDynamicFunction.cpp` as a small .so library, and use the
# VELOX_TEST_DYNAMIC_LIBRARY_PATH macro to locate the .so binary.
add_compile_definitions(
VELOX_TEST_DYNAMIC_LIBRARY_PATH="${CMAKE_CURRENT_BINARY_DIR}")
Collaborator

MacOS support is still missing. You can create the full library path here based on the CMake options I shared earlier.
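
For reference, the platform difference mentioned here comes down to the file suffix: a CMake `SHARED` library is named `.so` on Linux and `.dylib` on macOS. The reviewer's suggestion is to build the full path in CMake. As an alternative illustration only, a minimal sketch of choosing the suffix at compile time in the test could look like this. The helper name `sharedLibraryPath` is hypothetical and not part of the PR.

```cpp
#include <string>

// Hypothetical helper (not in the PR): builds "<dir>/lib<name><suffix>",
// where the suffix is picked at compile time for the current platform.
inline std::string sharedLibraryPath(
    const std::string& dir,
    const std::string& name) {
#if defined(__APPLE__)
  constexpr const char* kSuffix = ".dylib";
#else
  constexpr const char* kSuffix = ".so";
#endif
  return dir + "/lib" + name + kSuffix;
}
```

With such a helper, the test would pass the build directory and the target name `velox_function_my_dynamic` instead of hard-coding `/libvelox_function_my_dynamic.so`.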

@@ -0,0 +1,30 @@
#include "velox/functions/Udf.h"
Collaborator

Missing license header here.

@@ -0,0 +1,22 @@
# Velox: Dynamically Loading Registry Libraries in C++
Collaborator

"Dynamic Loading of Velox Extensions" is probably a better title.

@@ -0,0 +1,22 @@
# Velox: Dynamically Loading Registry Libraries in C++

This library adds the ability to load User Defined Functions (UDFs), connectors, or types without having to fork and build Prestissimo, through the use of shared libraries that a Prestissimo worker can access. These are to be loaded on launch of the Presto server. The Presto server searches for any .so or .dylib files and loads them using this library.
Collaborator

Prestissimo -> Velox. Remaining paragraph as well.

target_link_libraries(name_of_dynamic_fn PRIVATE xsimd fmt::fmt velox_expression)
```

3. In the Prestissimo worker's config.properties file, set the plugin.dir property
Collaborator

Move this to Prestissimo.

```
plugin.dir="User\Test\Path\plugin"
```
4. When the worker or the sidecar process starts, it will scan the plugin directory and attempt to dynamically load all shared libraries
Collaborator

Move this to Prestissimo.


namespace facebook::velox {

/// Dynamically opens and registers functions defined in a shared library (.so)
Collaborator

Remove (.so)
Add fullstop.


/// Dynamically opens and registers functions defined in a shared library (.so)
///
/// Given a shared library name (.so), this function will open it using dlopen,
Collaborator

Opens a shared library using dlopen, looks for the symbol registry, and invokes it.
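
To make the three steps in that description concrete (open, look up `registry`, invoke), here is a minimal sketch of such a loader. It is not the PR's implementation: the function name `loadDynamicLibrarySketch` is hypothetical, `std::runtime_error` stands in for `VELOX_USER_FAIL`, and the `RTLD_NOW` flag is an assumption.

```cpp
#include <dlfcn.h>

#include <stdexcept>
#include <string>

// Hypothetical stand-in for the loader under review. It throws
// std::runtime_error where the PR uses VELOX_USER_FAIL.
inline void loadDynamicLibrarySketch(const char* fileName) {
  // Step 1: open the shared library. RTLD_NOW (an assumption here) resolves
  // all undefined symbols before dlopen returns.
  void* handle = dlopen(fileName, RTLD_NOW);
  if (handle == nullptr) {
    const char* error = dlerror();
    throw std::runtime_error(
        std::string("Error while loading shared library: ") +
        (error != nullptr ? error : "unknown error"));
  }

  // Step 2: clear any stale error, then look up the entry point by name.
  dlerror();
  void* symbol = dlsym(handle, "registry");
  const char* error = dlerror();
  if (error != nullptr || symbol == nullptr) {
    throw std::runtime_error(
        std::string("Couldn't find Velox registry symbol: ") +
        (error != nullptr ? error : "symbol is null"));
  }

  // Step 3: invoke the registry function. The cast is unchecked because
  // dlsym() returns an untyped pointer.
  auto entryPoint = reinterpret_cast<void (*)()>(symbol);
  entryPoint();
}
```

The handle is intentionally never closed in this sketch, since anything registered by the library would presumably need its code to stay mapped.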


// Lookup the symbol.
void* registrySymbol = dlsym(handler, kSymbolName);
auto registryItem = reinterpret_cast<void (*)()>(registrySymbol);
Collaborator

How do we ensure the signature is void registry();? What happens if the return type is different or there are arguments?
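
As background for this question: `dlsym` returns a plain `void*` with no type information, so the signature is whatever the caller casts it to. The sketch below illustrates that using `atoi` from libc rather than a Velox plugin. It assumes a dynamically linked program on Linux, and the helper names are hypothetical.

```cpp
#include <dlfcn.h>

// Looks up `name` among the symbols already loaded into the process.
// dlopen(nullptr, ...) returns a handle for the main program, and the lookup
// also covers the shared libraries it was linked against (such as libc).
inline void* lookupSymbol(const char* name) {
  void* self = dlopen(nullptr, RTLD_NOW);
  return self != nullptr ? dlsym(self, name) : nullptr;
}

using AtoiFn = int (*)(const char*);
using WrongFn = void (*)();

// Correct use: the caller happens to know the real signature of atoi.
inline int callAtoiThroughDlsym(const char* text) {
  auto fn = reinterpret_cast<AtoiFn>(lookupSymbol("atoi"));
  return fn != nullptr ? fn(text) : -1;
}

// This compiles just as happily even though atoi is not void(). Calling the
// result would be undefined behavior, so the sketch only performs the cast.
inline WrongFn castAtoiToWrongSignature() {
  return reinterpret_cast<WrongFn>(lookupSymbol("atoi"));
}
```

In other words, a loader written this way cannot detect a `registry` symbol with a different return type or arguments; the contract has to be documented, or enforced by some convention outside `dlsym`.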

Collaborator

@aditi-pandit aditi-pandit left a comment

Thanks @soumiiow. Had a bunch of minor comments, except for a bigger one around testing.

@@ -0,0 +1,4 @@
if(${VELOX_BUILD_TESTING})
Collaborator

Nit: I don't think it's a convention, but most CMakeLists files have the velox_add_library function calls before the subdirectory-related functions/macros.


// Lookup the symbol.
void* registrySymbol = dlsym(handler, kSymbolName);
auto registryItem = reinterpret_cast<void (*)()>(registrySymbol);
Collaborator

Nit : rename registryFunction

Author

Hey!! So I got some previous feedback to stay away from "registryFunction" in the naming, so as to not make it seem like this library is to be used exclusively for functions, and to move away from our initial design, which was made with only function loading in mind. Would there perhaps be a better name for this variable than the word "item"? I can only really think of registryItem or registryPtr, but would love to hear your suggestions too.

Collaborator

@soumiiow: To me this is almost like the "main" function in an executable program. How about "loadLibrary", "loadUserLibrary" or "enterUserLibrary"? There could be code beyond registration here as well.

if (error != nullptr) {
VELOX_USER_FAIL("Couldn't find Velox registry symbol: {}", error);
}
registryItem();
Collaborator

Add a comment "Invoke the registry function"


void registry() {
facebook::velox::registerFunction<
facebook::velox::common::dynamicRegistry::Dynamic123Function,
Collaborator

This assumes that certain types/functions are already available at this point. Those types and functions depend on the registrations that a service like Prestissimo has done a priori, before the "registry" function here is invoked. For this test, since it's inherited from FunctionTestBase, all the Velox registrations are available.

So in general, this function assumes some context setup has been done already. It might be better to explicitly describe those assumptions here.

Or else change the test to not assume anything and do all the registrations within its own code.

std::string libraryPath = VELOX_TEST_DYNAMIC_LIBRARY_PATH;
libraryPath += "/libvelox_function_my_dynamic.so";

loadDynamicLibrary(libraryPath.data());
Collaborator

@aditi-pandit aditi-pandit Nov 13, 2024

Let's add some more test cases:
i) One that loads 2 different libraries and checks that the number of function signatures increments at each point.
ii) One that loads the same library again and validates the behavior. We are not likely to load the same library again in the service, but it's better to make that assumption explicit. In any case, it's possible to have 2 libraries that do the exact same thing loaded one after another, so we should be explicit about the behavior in that case too.
iii) An error case with an incorrect implementation of the registry function signature.
iv) Generally when adding functions, we want to add them to a catalog, so they have a namespace. Prestissimo definitely has namespaces. How do you incorporate this in the logic? It would be good for your test to demo a function added to a non-default namespace.
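
On case (ii), one piece of background that may help when writing the test: on Linux, the `dlopen` man page states that opening the same shared object again returns the same handle and only bumps a reference count. The sketch below checks that with `libm.so.6`, used here purely as a library that is present on typical glibc systems; the helper name is hypothetical.

```cpp
#include <dlfcn.h>

// Opens `fileName` twice and reports whether both calls returned the same
// handle. Each successful dlopen is paired with a dlclose so the reference
// count goes back to where it started.
inline bool reopenReturnsSameHandle(const char* fileName) {
  void* first = dlopen(fileName, RTLD_NOW);
  void* second = dlopen(fileName, RTLD_NOW);
  const bool same = (first != nullptr) && (first == second);
  if (second != nullptr) {
    dlclose(second);
  }
  if (first != nullptr) {
    dlclose(first);
  }
  return same;
}
```

Note that this says nothing about what the loader does on the second call: as quoted in the diff, `registryItem()` is invoked after every successful lookup, so the registration code would presumably run again, and the resulting behavior is what the test should pin down.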

@@ -0,0 +1,30 @@
#include "velox/functions/Udf.h"
Collaborator

Why does this file not have the copyright header? Is it intentional, so that we ensure we didn't trigger the build rules for it?

BTW, our clients might use any copyright header they want, so we should ensure our builds can handle that.


extern "C" {

void registry() {
Collaborator

Do we need the entire function definition in the extern block, or is it possible to just declare the function here and have the definition elsewhere? Just asking, as it's possible the registration is a big function or we want it to call other functions.

e.g. In this folder we have a bunch of window functions we want to expose for users to register. Might be better to use this kind of file structure as it's more realistic: https://github.com/facebookincubator/velox/blob/main/velox/functions/prestosql/window/WindowFunctionsRegistration.cpp#L30
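
As far as the language rules go, only the declaration needs the `extern "C"` specifier: a later definition without it keeps the C linkage of the earlier declaration, and its body can call ordinary C++ code. A minimal sketch, with a hypothetical `my_plugin` namespace and a stand-in list of names instead of real Velox registration calls:

```cpp
#include <string>
#include <vector>

namespace my_plugin {

// Stand-in for a real registry: records names instead of calling Velox.
inline std::vector<std::string>& registeredNames() {
  static std::vector<std::string> names;
  return names;
}

// Ordinary C++ functions with mangled names; they could live in other files.
inline void registerWindowFunctions() {
  registeredNames().push_back("my_rank");
}

inline void registerScalarFunctions() {
  registeredNames().push_back("my_add");
}

} // namespace my_plugin

// Only the entry point needs C linkage, and the declaration alone sets it.
extern "C" void registry();

// The definition inherits C linkage from the declaration above and simply
// delegates to the C++ registration helpers.
void registry() {
  my_plugin::registerWindowFunctions();
  my_plugin::registerScalarFunctions();
}
```

This keeps the unmangled surface to a single symbol while the actual registration code is organized however the plugin author prefers.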

target_link_libraries(name_of_dynamic_fn PRIVATE xsimd fmt::fmt velox_expression)
```

3. In the Prestissimo worker's config.properties file, set the plugin.dir property
Collaborator

This is not relevant in Velox. Also, since it's not used anywhere in the current code, it's hard to picture how this fits in.

@@ -0,0 +1,22 @@
# Velox: Dynamically Loading Registry Libraries in C++

This library adds the ability to load User Defined Functions (UDFs), connectors, or types without having to fork and build Prestissimo, through the use of shared libraries that a Prestissimo worker can access. These are to be loaded on launch of the Presto server. The Presto server searches for any .so or .dylib files and loads them using this library.
Collaborator

It might be good to not talk about Prestissimo in this README.

This is a generic utility for dynamically loading a "registry" function from a library. It's sufficient to just say that this is for "Extensibility" features that add custom user code, which could include new Velox types, functions, operators and connectors.

6 participants