Skip to content

Commit

Permalink
update API and docstring
Browse files Browse the repository at this point in the history
  • Loading branch information
jnke2016 committed Sep 17, 2024
1 parent 579fd0a commit 5cdf40a
Show file tree
Hide file tree
Showing 2 changed files with 43 additions and 543 deletions.
187 changes: 33 additions & 154 deletions cpp/include/cugraph/sampling_functions.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -178,6 +178,7 @@ uniform_neighbor_sample(
* @tparam edge_t Type of edge identifiers. Needs to be an integral type.
* @tparam weight_t Type of edge weights. Needs to be a floating point type.
* @tparam edge_type_t Type of edge type. Needs to be an integral type.
* @tparam bias_t Type of bias. Needs to be an integral type.
* @tparam label_t Type of label. Needs to be an integral type.
* @tparam store_transposed Flag indicating whether sources (if false) or destinations (if
* true) are major indices
Expand Down Expand Up @@ -250,80 +251,6 @@ biased_neighbor_sample(
bool dedupe_sources = false,
bool do_expensive_check = false);

/**
* @brief Heterogeneous Neighborhood Sampling.
*
* This function traverses from a set of starting vertices, traversing outgoing edges and
* randomly selects (with edge biases or not) from these outgoing neighbors to extract a subgraph.
* The branching out to select outgoing neighbors is performed with homogeneous fanouts.
*
* Output from this function is a tuple of vectors (src, dst, weight, edge_id, edge_type, hop,
* label, offsets), identifying the randomly selected edges. src is the source vertex, dst is the
* destination vertex, weight (optional) is the edge weight, edge_id (optional) identifies the edge
* id, edge_type (optional) identifies the edge type, hop identifies which hop the edge was
* encountered in. The label output (optional) identifes the vertex label. The offsets array
* (optional) will be described below and is dependent upon the input parameters.
*
* If @p starting_vertex_labels is not specified then no organization is applied to the output, the
* label and offsets values in the return set will be std::nullopt.
*
* If @p starting_vertex_labels is specified and @p label_to_output_comm_rank is not specified then
* the label output has values. This will also result in the output being sorted by vertex label.
* The offsets array in the return will be a CSR-style offsets array to identify the beginning of
* each label range in the data. `labels.size() == (offsets.size() - 1)`.
*
* If @p starting_vertex_labels is specified and @p label_to_output_comm_rank is specified then the
* label output has values. This will also result in the output being sorted by vertex label. The
* offsets array in the return will be a CSR-style offsets array to identify the beginning of each
* label range in the data. `labels.size() == (offsets.size() - 1)`. Additionally, the data will
* be shuffled so that all data with a particular label will be on the specified rank.
*
* @tparam vertex_t Type of vertex identifiers. Needs to be an integral type.
* @tparam edge_t Type of edge identifiers. Needs to be an integral type.
* @tparam weight_t Type of edge weights. Needs to be a floating point type.
* @tparam edge_type_t Type of edge type. Needs to be an integral type.
* @tparam label_t Type of label. Needs to be an integral type.
* @tparam store_transposed Flag indicating whether sources (if false) or destinations (if
* true) are major indices
* @tparam multi_gpu Flag indicating whether template instantiation should target single-GPU (false)
* @param handle RAFT handle object to encapsulate resources (e.g. CUDA stream, communicator, and
* handles to various CUDA libraries) to run graph algorithms.
* * @param rng_state A pre-initialized raft::RngState object for generating random numbers
* @param graph_view Graph View object to generate NBR Sampling on.
* @param edge_weight_view Optional view object holding edge weights for @p graph_view.
* @param edge_id_view Optional view object holding edge ids for @p graph_view.
* @param edge_type_view Optional view object holding edge types for @p graph_view.
* @param edge_bias_view Optional view object holding edge biases (to be used in biased sampling) for @p
* graph_view. Bias values should be non-negative and the sum of edge bias values from any vertex
* should not exceed std::numeric_limits<bias_t>::max(). 0 bias value indicates that the
* corresponding edge can never be selected. passing std::nullopt as the edge biases will result in
* uniform sampling.
* @param starting_vertices Device span of starting vertex IDs for the sampling.
* In a multi-gpu context the starting vertices should be local to this GPU.
* @param starting_vertex_labels Optional device span of labels associted with each starting vertex
* for the sampling.
* @param label_to_output_comm_rank Optional tuple of device spans mapping label to a particular
* output rank. Element 0 of the tuple identifes the label, Element 1 of the tuple identifies the
* output rank. The label span must be sorted in ascending order.
* @param heterogeneous_fan_out Tuple of host spans defining branching out (fan-out) degree per
* source vertex for each level in CSR style format. The first element of the tuple is the offset
* array per edge type id and the second element corresponds to the fanout values.
* The sampling method can use different fan_out values for each edge type.
* The fan-out offsets size must be proportional to the number of edge types and fan_out values.
* @param return_hops boolean flag specifying if the hop information should be returned.
* @param prior_sources_behavior Enum type defining how to handle prior sources, (defaults to
* DEFAULT)
* @param dedupe_sources boolean flag, if true then if a vertex v appears as a destination in hop X
* multiple times with the same label, it will only be passed once (for each label) as a source
* for the next hop. Default is false.
* @param with_replacement boolean flag specifying if random sampling is done with replacement
* (true); or, without replacement (false); default = true;
* @param do_expensive_check A flag to run expensive checks for input arguments (if set to `true`).
* @return tuple device vectors (vertex_t source_vertex, vertex_t destination_vertex,
* optional weight_t weight, optional edge_t edge id, optional edge_type_t edge type,
* optional int32_t hop, optional label_t label, optional size_t offsets)
*/

struct sampling_flags_t {
/**
* Specifies how to handle prior sources. Default is DEFAULT.
Expand All @@ -349,77 +276,12 @@ struct sampling_flags_t {
bool with_replacement{true};
};

template <typename vertex_t,
typename edge_t,
typename weight_t,
typename edge_type_t,
typename bias_t,
typename label_t,
bool store_transposed,
bool multi_gpu>
std::tuple<rmm::device_uvector<vertex_t>,
rmm::device_uvector<vertex_t>,
std::optional<rmm::device_uvector<weight_t>>,
std::optional<rmm::device_uvector<edge_t>>,
std::optional<rmm::device_uvector<edge_type_t>>,
std::optional<rmm::device_uvector<int32_t>>,
std::optional<rmm::device_uvector<label_t>>,
std::optional<rmm::device_uvector<size_t>>>
heterogeneous_neighbor_sample(
raft::handle_t const& handle,
raft::random::RngState& rng_state,
graph_view_t<vertex_t, edge_t, store_transposed, multi_gpu> const& graph_view,
std::optional<edge_property_view_t<edge_t, weight_t const*>> edge_weight_view,
std::optional<edge_property_view_t<edge_t, edge_t const*>> edge_id_view,
std::optional<edge_property_view_t<edge_t, edge_type_t const*>> edge_type_view,
std::optional<edge_property_view_t<edge_t, bias_t const*>> edge_bias_view,
raft::device_span<vertex_t const> starting_vertices,
std::optional<raft::device_span<label_t const>> starting_vertex_labels,
std::optional<std::tuple<raft::device_span<label_t const>, raft::device_span<int32_t const>>>
label_to_output_comm_rank,
std::tuple<raft::host_span<int32_t const>, raft::host_span<int32_t const>>
heterogeneous_fan_out,
bool return_hops,
bool with_replacement = true,
prior_sources_behavior_t prior_sources_behavior = prior_sources_behavior_t::DEFAULT,
bool dedupe_sources = false,
bool do_expensive_check = false);

/**
* @brief Homogeneous Neighborhood Sampling.
* @brief Heterogeneous Neighborhood Sampling.
*
* This function traverses from a set of starting vertices, traversing outgoing edges and
* randomly selects (with edge biases or not) from these outgoing neighbors to extract a subgraph.
* The branching out to select outgoing neighbors is performed with homogeneous fanouts
*
* Output from this function is a tuple of vectors (src, dst, weight, edge_id, edge_type, hop,
* label, offsets), identifying the randomly selected edges. src is the source vertex, dst is the
* destination vertex, weight (optional) is the edge weight, edge_id (optional) identifies the edge
* id, edge_type (optional) identifies the edge type, hop identifies which hop the edge was
* encountered in. The label output (optional) identifes the vertex label. The offsets array
* (optional) will be described below and is dependent upon the input parameters.
*
* If @p starting_vertex_labels is not specified then no organization is applied to the output, the
* label and offsets values in the return set will be std::nullopt.
*
* If @p starting_vertex_labels is specified and @p label_to_output_comm_rank is not specified then
* the label output has values. This will also result in the output being sorted by vertex label.
* The offsets array in the return will be a CSR-style offsets array to identify the beginning of
* each label range in the data. `labels.size() == (offsets.size() - 1)`.
*
* If @p starting_vertex_labels is specified and @p label_to_output_comm_rank is specified then the
* label output has values. This will also result in the output being sorted by vertex label. The
* offsets array in the return will be a CSR-style offsets array to identify the beginning of each
* label range in the data. `labels.size() == (offsets.size() - 1)`. Additionally, the data will
* be shuffled so that all data with a particular label will be on the specified rank.
* @brief Defines flags for sampling
*/

/**
* @brief Uniform Neighborhood Sampling.
*
* This function traverses from a set of starting vertices, traversing outgoing edges and
* randomly selects from these outgoing neighbors to extract a subgraph.
* The branching out to select outgoing neighbors is performed with homogeneous fanouts.
*
* Output from this function is a tuple of vectors (src, dst, weight, edge_id, edge_type, hop,
* offsets), identifying the randomly selected edges. src is the source vertex, dst is the
Expand All @@ -442,6 +304,7 @@ heterogeneous_neighbor_sample(
* @tparam edge_t Type of edge identifiers. Needs to be an integral type.
* @tparam weight_t Type of edge weights. Needs to be a floating point type.
* @tparam edge_type_t Type of edge type. Needs to be an integral type.
* @tparam bias_t Type of bias. Needs to be an integral type.
* @tparam store_transposed Flag indicating whether sources (if false) or destinations (if
* true) are major indices
* @tparam multi_gpu Flag indicating whether template instantiation should target single-GPU (false)
Expand All @@ -452,6 +315,11 @@ heterogeneous_neighbor_sample(
* @param edge_weight_view Optional view object holding edge weights for @p graph_view.
* @param edge_id_view Optional view object holding edge ids for @p graph_view.
* @param edge_type_view Optional view object holding edge types for @p graph_view.
* @param edge_bias_view Optional view object holding edge biases (to be used in biased sampling) for @p
* graph_view. Bias values should be non-negative and the sum of edge bias values from any vertex
* should not exceed std::numeric_limits<bias_t>::max(). 0 bias value indicates that the
* corresponding edge can never be selected. passing std::nullopt as the edge biases will result in
* uniform sampling.
* @param starting_vertices Device span of starting vertex IDs for the sampling.
* In a multi-gpu context the starting vertices should be local to this GPU.
* @param starting_vertex_offsets Optional device span of offsets identifying the range of
Expand All @@ -460,16 +328,17 @@ heterogeneous_neighbor_sample(
* vertex label. This should be the same on each rank.
* @param fan_out Host span defining branching out (fan-out) degree per source vertex for each
* level
* @param flags A set of flags indicating which sampling features should be used.l
* @param flags A set of flags indicating which sampling features should be used.
* @param do_expensive_check A flag to run expensive checks for input arguments (if set to `true`).
* @return tuple device vectors (vertex_t source_vertex, vertex_t destination_vertex,
* optional weight_t weight, optional edge_t edge id, optional edge_type_t edge type,
* optional int32_t hop, optional size_t offsets)
* optional int32_t hop, optional label_t label, optional size_t offsets)
*/
template <typename vertex_t,
typename edge_t,
typename weight_t,
typename edge_type_t,
typename bias_t,
bool store_transposed,
bool multi_gpu>
std::tuple<rmm::device_uvector<vertex_t>,
Expand All @@ -479,25 +348,28 @@ std::tuple<rmm::device_uvector<vertex_t>,
std::optional<rmm::device_uvector<edge_type_t>>,
std::optional<rmm::device_uvector<int32_t>>,
std::optional<rmm::device_uvector<size_t>>>
homogeneous_uniform_neighbor_sample(
heterogeneous_neighbor_sample(
raft::handle_t const& handle,
raft::random::RngState& rng_state,
graph_view_t<vertex_t, edge_t, store_transposed, multi_gpu> const& graph_view,
std::optional<edge_property_view_t<edge_t, weight_t const*>> edge_weight_view,
std::optional<edge_property_view_t<edge_t, edge_t const*>> edge_id_view,
std::optional<edge_property_view_t<edge_t, edge_type_t const*>> edge_type_view,
std::optional<edge_property_view_t<edge_t, bias_t const*>> edge_bias_view,
raft::device_span<vertex_t const> starting_vertices,
std::optional<raft::device_span<size_t const>> starting_vertex_offsets,
std::optional<raft::device_span<int32_t const>> label_to_output_comm_rank,
raft::host_span<int32_t const> fan_out,
std::tuple<raft::host_span<int32_t const>, raft::host_span<int32_t const>>
heterogeneous_fan_out,
sampling_flags_t sampling_flags,
bool do_expensive_check = false);

/**
* @brief Biased Neighborhood Sampling.
* @brief Homogeneous Neighborhood Sampling.
*
* This function traverses from a set of starting vertices, traversing outgoing edges and
* randomly selects (with edge biases) from these outgoing neighbors to extract a subgraph.
* randomly selects (with edge biases or not) from these outgoing neighbors to extract a subgraph.
* The branching out to select outgoing neighbors is performed with homogeneous fanouts
*
* Output from this function is a tuple of vectors (src, dst, weight, edge_id, edge_type, hop,
* offsets), identifying the randomly selected edges. src is the source vertex, dst is the
Expand All @@ -512,10 +384,15 @@ homogeneous_uniform_neighbor_sample(
* in the return will be a CSR-style offsets array to identify the beginning of each label range in
* the output vectors.
*
* If @p label_to_output_comm_rank is specified then the data will be shuffled so that all entries
* for a particular label are returned on the specified rank. This will result in the offsets array
* on other GPUs indicating that there are no entries for that label (`offsets[i] == offsets[i+1]`).
*
* @tparam vertex_t Type of vertex identifiers. Needs to be an integral type.
* @tparam edge_t Type of edge identifiers. Needs to be an integral type.
* @tparam weight_t Type of edge weights. Needs to be a floating point type.
* @tparam edge_type_t Type of edge type. Needs to be an integral type.
* @tparam bias_t Type of bias. Needs to be an integral type.
* @tparam store_transposed Flag indicating whether sources (if false) or destinations (if
* true) are major indices
* @tparam multi_gpu Flag indicating whether template instantiation should target single-GPU (false)
Expand All @@ -526,24 +403,26 @@ homogeneous_uniform_neighbor_sample(
* @param edge_weight_view Optional view object holding edge weights for @p graph_view.
* @param edge_id_view Optional view object holding edge ids for @p graph_view.
* @param edge_type_view Optional view object holding edge types for @p graph_view.
* @param edge_bias_view View object holding edge biases (to be used in biased sampling) for @p
* @param edge_bias_view Optional view object holding edge biases (to be used in biased sampling) for @p
* graph_view. Bias values should be non-negative and the sum of edge bias values from any vertex
* should not exceed std::numeric_limits<bias_t>::max(). 0 bias value indicates that the
* corresponding edge can never be selected.
* corresponding edge can never be selected. passing std::nullopt as the edge biases will result in
* uniform sampling.
* @param starting_vertices Device span of starting vertex IDs for the sampling.
* In a multi-gpu context the starting vertices should be local to this GPU.
* @param starting_vertex_offsets Optional device span of offsets identifying the range of
* @param starting_vertex_offsets Optional device span of offsets identifying the range of
* starting vertex values for this label.
* @param label_to_output_comm_rank Optional device span identifying which rank should get each
* vertex label. This should be the same on each rank.
* @param fan_out Host span defining branching out (fan-out) degree per source vertex for each
* level
* @param flags A set of flags indicating which sampling features should be used.l
* @param flags A set of flags indicating which sampling features should be used.
* @param do_expensive_check A flag to run expensive checks for input arguments (if set to `true`).
* @return tuple device vectors (vertex_t source_vertex, vertex_t destination_vertex,
* optional weight_t weight, optional edge_t edge id, optional edge_type_t edge type,
* optional int32_t hop, optional size_t offsets)
* optional int32_t hop, optional label_t label, optional size_t offsets)
*/

template <typename vertex_t,
typename edge_t,
typename weight_t,
Expand All @@ -558,14 +437,14 @@ std::tuple<rmm::device_uvector<vertex_t>,
std::optional<rmm::device_uvector<edge_type_t>>,
std::optional<rmm::device_uvector<int32_t>>,
std::optional<rmm::device_uvector<size_t>>>
homogeneous_biased_neighbor_sample(
homogeneous_neighbor_sample(
raft::handle_t const& handle,
raft::random::RngState& rng_state,
graph_view_t<vertex_t, edge_t, store_transposed, multi_gpu> const& graph_view,
std::optional<edge_property_view_t<edge_t, weight_t const*>> edge_weight_view,
std::optional<edge_property_view_t<edge_t, edge_t const*>> edge_id_view,
std::optional<edge_property_view_t<edge_t, edge_type_t const*>> edge_type_view,
edge_property_view_t<edge_t, bias_t const*> edge_bias_view,
std::optional<edge_property_view_t<edge_t, bias_t const*>> edge_bias_view,
raft::device_span<vertex_t const> starting_vertices,
std::optional<raft::device_span<size_t const>> starting_vertex_offsets,
std::optional<raft::device_span<int32_t const>> label_to_output_comm_rank,
Expand Down
Loading

0 comments on commit 5cdf40a

Please sign in to comment.