
How To Use?

Michael Freund edited this page Dec 4, 2023 · 2 revisions

Parameters

Command Line

FlexRML is run from the command line and is configured with the following flags:

```shell
./FlexRML [OPTIONS]
```

| Flag | Description | Default |
|---|---|---|
| -h | Display the available flags. | None |
| -m [path] | Path to the mapping file. | None |
| -o [name] | Name of the output file. | "output.nq" |
| -d | Remove duplicate entries before writing the output file. | false |
| -t | Enable multi-threaded processing. By default, all available threads are used. | false |
| -tc [integer] | Number of threads to use for processing. A value of 0 means all available threads. Tuning this value can improve performance, depending on the workload and the machine. | 0 |
| -a | Enable adaptive Result Size Estimation and adaptive hash size selection, which tune memory usage and performance to the characteristics of the input data. | 0 |
| -p [float] | Sampling probability (0 to 1) used for Result Size Estimation. Higher values give more accurate estimates but take longer to compute. | 0.2 |
| -c [path] | Path to a config file from which settings are read instead of from command-line arguments. | None |
| -b [integer] | Fixed hash size; must be one of 32, 64 or 128. Determines the memory allocated for hash tables. | None |
| -r [tokens,to,remove] | Comma-separated list of tokens to remove from the input data, for cleaning or preprocessing. | None |
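
For scripted or repeated runs it can help to assemble the command line programmatically. The Python sketch below builds an argument list from the flags in the table. The function name build_flexrml_argv, its keyword arguments and the binary path ./FlexRML are hypothetical. It also assumes that '-tc' is passed together with '-t', and that '-p' only matters when '-a' is set; because '-b' skips Result Size Estimation, the sketch never combines '-b' with '-a'.

```python
import shlex

def build_flexrml_argv(mapping, output="output.nq", remove_duplicates=False,
                       threads=None, adaptive=False, sampling_probability=None,
                       fixed_hash_size=None, tokens_to_remove=None,
                       binary="./FlexRML"):
    """Return a FlexRML argument list (hypothetical helper, not part of FlexRML)."""
    argv = [binary, "-m", mapping, "-o", output]
    if remove_duplicates:
        argv.append("-d")
    if threads is not None:
        # Assumption: '-tc' is honoured together with '-t'; 0 means all threads.
        argv += ["-t", "-tc", str(threads)]
    if fixed_hash_size is not None:
        # A fixed hash size skips Result Size Estimation, so '-a'/'-p' are not added.
        if fixed_hash_size not in (32, 64, 128):
            raise ValueError("fixed_hash_size must be 32, 64 or 128")
        argv += ["-b", str(fixed_hash_size)]
    elif adaptive:
        argv.append("-a")
        if sampling_probability is not None:
            if not 0.0 <= sampling_probability <= 1.0:
                raise ValueError("sampling_probability must be in [0, 1]")
            argv += ["-p", str(sampling_probability)]
    if tokens_to_remove:
        # '-r' takes a comma-separated list of tokens.
        argv += ["-r", ",".join(tokens_to_remove)]
    return argv

if __name__ == "__main__":
    # Rebuilds the example command shown in the Example section of this page.
    print(shlex.join(build_flexrml_argv("./path/to/mapping_file.ttl",
                                        output="output_file.nq",
                                        remove_duplicates=True)))
```

To actually run FlexRML, the list can be passed to subprocess.run(argv, check=True); passing a list rather than a single string avoids shell-quoting problems with paths.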

Note:

  • When a config file is specified using the '-c' flag, all other command-line arguments are ignored, and settings are exclusively loaded from the config file.
  • Selecting a fixed hash size using the '-b' flag skips the adaptive Result Size Estimation. Be aware that if the manually chosen hash size is too small for the input data, hash collisions may occur. This can lead to missing N-Quads in the output.
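
To get a feel for how small is "too small", the generic birthday bound estimates the chance of at least one collision when n distinct values are hashed into 2^b buckets: P ≈ 1 - exp(-n(n-1) / 2^(b+1)). The Python sketch below assumes that the '-b' value is the hash width in bits and that hash outputs are uniformly distributed. It is a back-of-the-envelope estimate, not FlexRML's internal logic; what FlexRML actually hashes (and therefore what n is for a given mapping) is not specified here.

```python
import math

def collision_probability(n_items: int, hash_bits: int) -> float:
    """Birthday-bound approximation of P(at least one collision) when
    n_items distinct values are hashed uniformly into 2**hash_bits buckets."""
    pairs = n_items * (n_items - 1) / 2
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -math.expm1(-pairs / 2 ** hash_bits)

if __name__ == "__main__":
    for bits in (32, 64, 128):
        print(bits, collision_probability(1_000_000, bits))
```

For one million distinct values, a 32-bit hash almost certainly collides (the estimate rounds to 1.0), whereas 64 bits gives roughly 2.7e-8. As a rule of thumb, collisions become likely once n approaches 2^(b/2): about 65,536 values for 32 bits already gives roughly a 39% chance.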

Config File

Alternatively, all settings can be placed in a config file and passed with the '-c' flag:

```ini
# Config File

# I/O Settings
mapping=./path/to/mapping_file.ttl
output_file=output.nq

# Mapping Settings
remove_duplicates=true
use_threading=true
number_of_threads=
fixed_hash_size=
adaptive_hash_selection=true
sampling_probability=0.2
tokens_to_remove=NA
```

Keys left without a value (such as number_of_threads and fixed_hash_size above) fall back to their default values.
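
The Python sketch below mimics the documented config behaviour: lines starting with '#' are comments, each setting is a key=value pair, and an empty value keeps the default from the flag table. It is a hypothetical reader for illustration (load_config, DEFAULTS, CONVERTERS and the accepted boolean spellings are assumptions), not FlexRML's own parser; rejecting unknown keys is also a choice made here.

```python
def _to_bool(raw):
    # Assumption: true/false (any case) and 1/0 are the accepted spellings.
    lowered = raw.lower()
    if lowered in ("true", "1"):
        return True
    if lowered in ("false", "0"):
        return False
    raise ValueError(f"not a boolean: {raw!r}")

# Defaults taken from the flag table; None means "not set".
# '-a' is listed with default 0, read here as False.
DEFAULTS = {
    "mapping": None,
    "output_file": "output.nq",
    "remove_duplicates": False,
    "use_threading": False,
    "number_of_threads": 0,
    "fixed_hash_size": None,
    "adaptive_hash_selection": False,
    "sampling_probability": 0.2,
    "tokens_to_remove": None,
}

# How each raw string value is converted to a typed setting.
CONVERTERS = {
    "mapping": str,
    "output_file": str,
    "remove_duplicates": _to_bool,
    "use_threading": _to_bool,
    "number_of_threads": int,
    "fixed_hash_size": int,
    "adaptive_hash_selection": _to_bool,
    "sampling_probability": float,
    "tokens_to_remove": lambda raw: raw.split(","),
}

def load_config(text):
    """Parse key=value lines; empty values keep their defaults."""
    settings = dict(DEFAULTS)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if not sep or key not in CONVERTERS:
            raise ValueError(f"unrecognised line: {line!r}")
        if value:  # an empty value leaves the default in place
            settings[key] = CONVERTERS[key](value)
    return settings
```

Applied to the config file shown above, this yields number_of_threads=0 and fixed_hash_size=None, while tokens_to_remove becomes the one-element list ["NA"].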

Example

```shell
./FlexRML -m ./path/to/mapping_file.ttl -o output_file.nq -d
```

Or, equivalently, using a config file:

```shell
./FlexRML -c ./path/to/config_file.ini
```

where ./path/to/config_file.ini contains:

```ini
# Config File

# I/O Settings
mapping=./path/to/mapping_file.ttl
output_file=output_file.nq

# Mapping Settings
remove_duplicates=true
use_threading=false
number_of_threads=
fixed_hash_size=
adaptive_hash_selection=false
sampling_probability=
```
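
To check a run made with '-d', one can count repeated statements in the output file. N-Quads is a line-based format with one statement per line, so the Python sketch below (count_duplicate_quads is a hypothetical helper) compares lines textually after trimming whitespace. This is only an approximation: two quads that are equal but serialised differently, for example with different string escaping, are counted as distinct, and FlexRML's own duplicate removal may work differently.

```python
from collections import Counter

def count_duplicate_quads(lines):
    """Count statements that repeat an earlier statement in N-Quads text.

    'lines' is any iterable of strings, e.g. an open file. Comparison is
    purely textual after trimming surrounding whitespace.
    """
    counts = Counter()
    for line in lines:
        stmt = line.strip()
        if stmt and not stmt.startswith("#"):  # skip blank lines and comments
            counts[stmt] += 1
    # Every occurrence beyond the first is a duplicate.
    return sum(n - 1 for n in counts.values())
```

For example, with open("output_file.nq", encoding="utf-8") as fh: print(count_duplicate_quads(fh)). A result of 0 means no textually identical statements were written.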