### Downloads
# Each corpus listed in this section is downloaded according to its
# configuration file.
# Each entry in this section is a pair: "DL";PATH_TO_CONFIG with
# "DL" = Marks this entry as a download config
# PATH_TO_CONFIG = Path to the config file
#
# Each line of such a config file must be a triple: CORPUS_NAME;FILE_NAME;DL_LINK with
# CORPUS_NAME = Name of the folder in the data directory
# FILE_NAME = Name of the (decompressed) file
# DL_LINK = Download link of the file
#
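# For illustration, a line in a download config file could look like the
# following (corpus name, file name, and URL are hypothetical placeholders):
#   my_corpus;my_text.txt;https://example.org/my_text.txt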
###
DL;./corpora/pizza_and_chili.config
DL;./corpora/manzini_lightweight.config
#
### Random Numbers
# Each line describes one file, which is generated only if no file with that
# name exists yet.
# Each entry in this section is a 4-tuple: "RD";NAME;LENGTH;SYMBOL_WIDTH with
# "RD" = Marks this entry as a random-data generator entry
# NAME = Name of the file
# LENGTH = Number of symbols to generate
# SYMBOL_WIDTH = Number of bytes per symbol
#
# The size of the file is SYMBOL_WIDTH * LENGTH bytes.
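# For example, the entry RD;example_100000000_4;100000000;4 yields a file of
# 4 * 100000000 = 400000000 bytes (400 MB).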
#
RD;example_100000000_1;100000000;1
RD;example_100000000_2;100000000;2
RD;example_100000000_4;100000000;4
RD;example_100000000_8;100000000;8