Switch List implementation to use Trie-based lookup #134
1. Hash-based trie where children are referenced using a Hash, storing one node per word char
2. Hash-based trie (as 1) where each node contains a word char and values are stored as Symbol
3. Hash-based trie (as 1) where each node contains a word part and values are stored as String
4. Array-based trie where children are referenced using an Array, with a mapping created for each letter of the alphabet

Some caveats:

- 4) doesn't play nice with an alphabet that contains non-ASCII chars, as the mapping would be hard to achieve
- 2) doesn't play nice with an alphabet that contains non-ASCII chars, as there's a risk of memory issues with versions of Ruby where Symbols are not garbage collected
- The current list is Unicode (and not Punycode for now), hence both 2) and 4) are not usable in practice
- 3) implicitly saves space as there is no need to store the ".": silly as it seems, the current list has 8750 dots (and 8061 rules)
- memory cost is the cost of the Trie structure AND the cost of the strings allocated to store the words (including ".")
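To make the first variant concrete, here is a minimal sketch of a Hash-based trie with one node per character. `CharTrie`, `Node`, `insert` and `include?` are hypothetical names chosen for illustration; this is not the code in `trie_hash.rb`.

```ruby
# Sketch of variant 1: one node per character, children kept in a Hash.
# CharTrie and its methods are illustrative names, not the PR's classes.
class CharTrie
  Node = Struct.new(:children, :terminal)

  def initialize
    @root = Node.new({}, false)
  end

  # Walk the word character by character, creating missing nodes.
  def insert(word)
    node = @root
    word.each_char do |char|
      node = (node.children[char] ||= Node.new({}, false))
    end
    node.terminal = true
    self
  end

  # True only when the exact word was inserted (prefixes do not count).
  def include?(word)
    node = @root
    word.each_char do |char|
      node = node.children[char]
      return false if node.nil?
    end
    node.terminal
  end
end

trie = CharTrie.new
%w[com co.uk github.io].each { |rule| trie.insert(rule) }
puts trie.include?("co.uk")
puts trie.include?("co")
```

Note that in this variant every "." becomes a node of its own, which is the overhead that variant 3 avoids by storing whole word parts.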
--- Memory comparison: ➜ publicsuffix-ruby git:(thesis-trie) ruby test/profilers/tries_prosize.rb 943,325 @trie_hash 598,730 @trie_symbol 312,361 @trie_parts 1,627,182 @trie_array HashTrie: Total allocated: 23745976 bytes (333807 objects) Total retained: 16647216 bytes (172460 objects) allocated memory by gem ----------------------------------- 23745936 publicsuffix-ruby/lib 40 other allocated memory by file ----------------------------------- 23745936 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb 40 test/profilers/tries_profiler.rb allocated memory by location ----------------------------------- 12042560 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:8 6892920 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:16 3516640 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:44 1293696 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:83 120 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:39 40 test/profilers/tries_profiler.rb:16 allocated memory by class ----------------------------------- 12042560 Hash 6140656 String 2297720 Array 2297680 PublicSuffix::TrieHash::Node 967320 Enumerator 40 PublicSuffix::TrieHash allocated objects by gem ----------------------------------- 333806 publicsuffix-ruby/lib 1 other allocated objects by file ----------------------------------- 333806 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb 1 test/profilers/tries_profiler.rb allocated objects by location ----------------------------------- 172323 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:16 87916 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:44 57442 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:8 16122 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:83 3 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:39 1 test/profilers/tries_profiler.rb:16 
allocated objects by class ----------------------------------- 153418 String 57443 Array 57442 Hash 57442 PublicSuffix::TrieHash::Node 8061 Enumerator 1 PublicSuffix::TrieHash retained memory by gem ----------------------------------- 16647176 publicsuffix-ruby/lib 40 other retained memory by file ----------------------------------- 16647176 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb 40 test/profilers/tries_profiler.rb retained memory by location ----------------------------------- 12042560 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:8 4595280 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:16 9296 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:83 40 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:39 40 test/profilers/tries_profiler.rb:16 retained memory by class ----------------------------------- 12042560 Hash 2306936 String 2297680 PublicSuffix::TrieHash::Node 40 PublicSuffix::TrieHash retained objects by gem ----------------------------------- 172459 publicsuffix-ruby/lib 1 other retained objects by file ----------------------------------- 172459 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb 1 test/profilers/tries_profiler.rb retained objects by location ----------------------------------- 114882 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:16 57442 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:8 134 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:83 1 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:39 1 test/profilers/tries_profiler.rb:16 retained objects by class ----------------------------------- 57575 String 57442 Hash 57442 PublicSuffix::TrieHash::Node 1 PublicSuffix::TrieHash Retained String Report ----------------------------------- 6728 "." 5987 "a" 4263 "o" 3636 "i" 3027 "e" 3012 "n" 2918 "u" 2868 "m" ... 
HashTrieSymbol: Total allocated: 21449376 bytes (276392 objects) Total retained: 14350616 bytes (115045 objects) allocated memory by gem ----------------------------------- 21449336 publicsuffix-ruby/lib 40 other allocated memory by file ----------------------------------- 21448296 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb 1040 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash_symbol.rb 40 test/profilers/tries_profiler.rb allocated memory by location ----------------------------------- 12042560 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:8 4595280 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:16 3516640 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:44 1293696 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:83 1040 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash_symbol.rb:9 120 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:39 40 test/profilers/tries_profiler.rb:18 allocated memory by class ----------------------------------- 12042560 Hash 3843536 String 2297720 Array 2297680 PublicSuffix::TrieHashSymbol::Node 967320 Enumerator 520 Symbol 40 PublicSuffix::TrieHashSymbol allocated objects by gem ----------------------------------- 276391 publicsuffix-ruby/lib 1 other allocated objects by file ----------------------------------- 276365 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb 26 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash_symbol.rb 1 test/profilers/tries_profiler.rb allocated objects by location ----------------------------------- 114882 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:16 87916 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:44 57442 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:8 16122 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:83 26 
/Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash_symbol.rb:9 3 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:39 1 test/profilers/tries_profiler.rb:18 allocated objects by class ----------------------------------- 95990 String 57443 Array 57442 Hash 57442 PublicSuffix::TrieHashSymbol::Node 8061 Enumerator 13 Symbol 1 PublicSuffix::TrieHashSymbol retained memory by gem ----------------------------------- 14350576 publicsuffix-ruby/lib 40 other retained memory by file ----------------------------------- 14349536 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb 1040 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash_symbol.rb 40 test/profilers/tries_profiler.rb retained memory by location ----------------------------------- 12042560 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:8 2297640 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:16 9296 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:83 1040 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash_symbol.rb:9 40 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:39 40 test/profilers/tries_profiler.rb:18 retained memory by class ----------------------------------- 12042560 Hash 2297680 PublicSuffix::TrieHashSymbol::Node 9816 String 520 Symbol 40 PublicSuffix::TrieHashSymbol retained objects by gem ----------------------------------- 115044 publicsuffix-ruby/lib 1 other retained objects by file ----------------------------------- 115018 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb 26 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash_symbol.rb 1 test/profilers/tries_profiler.rb retained objects by location ----------------------------------- 57442 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:8 57441 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:16 134 
/Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:83 26 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash_symbol.rb:9 1 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:39 1 test/profilers/tries_profiler.rb:18 retained objects by class ----------------------------------- 57442 Hash 57442 PublicSuffix::TrieHashSymbol::Node 147 String 13 Symbol 1 PublicSuffix::TrieHashSymbol Retained String Report ----------------------------------- 1 "*.compute-1.amazonaws.com" 1 "*.compute.amazonaws.com.cn" 1 "*.githubcloudusercontent.com" 1 "0" 1 "1" ... HashTrieParts: Total allocated: 6263412 bytes (98963 objects) Total retained: 3392172 bytes (43476 objects) allocated memory by gem ----------------------------------- 6263372 publicsuffix-ruby/lib 40 other allocated memory by file ----------------------------------- 3971787 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb 2291585 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash_parts.rb 40 test/profilers/tries_profiler.rb allocated memory by location ----------------------------------- 2291585 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash_parts.rb:29 2232560 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:8 1739107 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:16 120 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:39 40 test/profilers/tries_profiler.rb:20 allocated memory by class ----------------------------------- 2232560 Hash 1574772 String 967320 Enumerator 909040 Array 579680 PublicSuffix::TrieHashParts::Node 40 PublicSuffix::TrieHashParts allocated objects by gem ----------------------------------- 98962 publicsuffix-ruby/lib 1 other allocated objects by file ----------------------------------- 57967 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb 40995 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash_parts.rb 1 
test/profilers/tries_profiler.rb allocated objects by location ----------------------------------- 43472 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:16 40995 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash_parts.rb:29 14492 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:8 3 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:39 1 test/profilers/tries_profiler.rb:20 allocated objects by class ----------------------------------- 39363 String 22554 Array 14492 Hash 14492 PublicSuffix::TrieHashParts::Node 8061 Enumerator 1 PublicSuffix::TrieHashParts retained memory by gem ----------------------------------- 3392132 publicsuffix-ruby/lib 40 other retained memory by file ----------------------------------- 3392067 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb 65 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash_parts.rb 40 test/profilers/tries_profiler.rb retained memory by location ----------------------------------- 2232560 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:8 1159467 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:16 65 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash_parts.rb:29 40 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:39 40 test/profilers/tries_profiler.rb:20 retained memory by class ----------------------------------- 2232560 Hash 579892 String 579680 PublicSuffix::TrieHashParts::Node 40 PublicSuffix::TrieHashParts retained objects by gem ----------------------------------- 43475 publicsuffix-ruby/lib 1 other retained objects by file ----------------------------------- 43474 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb 1 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash_parts.rb 1 test/profilers/tries_profiler.rb retained objects by location ----------------------------------- 28981 
/Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:16 14492 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:8 1 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash.rb:39 1 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_hash_parts.rb:29 1 test/profilers/tries_profiler.rb:20 retained objects by class ----------------------------------- 14492 Hash 14492 PublicSuffix::TrieHashParts::Node 14491 String 1 PublicSuffix::TrieHashParts Retained String Report ----------------------------------- 1792 "jp" 756 "no" 549 "museum" 370 "it" 332 "com" ... HashTrieArray: Total allocated: 27171176 bytes (276366 objects) Total retained: 20072416 bytes (115019 objects) allocated memory by gem ----------------------------------- 27171136 publicsuffix-ruby/lib 40 other allocated memory by file ----------------------------------- 27171136 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_array.rb 40 test/profilers/tries_profiler.rb allocated memory by location ----------------------------------- 17765400 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_array.rb:14 4595280 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_array.rb:22 3516640 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_array.rb:50 1293696 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_array.rb:89 120 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_array.rb:45 40 test/profilers/tries_profiler.rb:22 allocated memory by class ----------------------------------- 20063120 Array 3843016 String 2297680 PublicSuffix::TrieArray::Node 967320 Enumerator 40 PublicSuffix::TrieArray allocated objects by gem ----------------------------------- 276365 publicsuffix-ruby/lib 1 other allocated objects by file ----------------------------------- 276365 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_array.rb 1 test/profilers/tries_profiler.rb allocated objects by location 
----------------------------------- 114882 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_array.rb:22 87916 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_array.rb:50 57442 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_array.rb:14 16122 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_array.rb:89 3 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_array.rb:45 1 test/profilers/tries_profiler.rb:22 allocated objects by class ----------------------------------- 114885 Array 95977 String 57442 PublicSuffix::TrieArray::Node 8061 Enumerator 1 PublicSuffix::TrieArray retained memory by gem ----------------------------------- 20072376 publicsuffix-ruby/lib 40 other retained memory by file ----------------------------------- 20072376 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_array.rb 40 test/profilers/tries_profiler.rb retained memory by location ----------------------------------- 17765400 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_array.rb:14 2297640 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_array.rb:22 9296 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_array.rb:89 40 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_array.rb:45 40 test/profilers/tries_profiler.rb:22 retained memory by class ----------------------------------- 17765400 Array 2297680 PublicSuffix::TrieArray::Node 9296 String 40 PublicSuffix::TrieArray retained objects by gem ----------------------------------- 115018 publicsuffix-ruby/lib 1 other retained objects by file ----------------------------------- 115018 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_array.rb 1 test/profilers/tries_profiler.rb retained objects by location ----------------------------------- 57442 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_array.rb:14 57441 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_array.rb:22 134 
/Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_array.rb:89 1 /Users/weppos/Code/publicsuffix-ruby/lib/public_suffix/trie_array.rb:45 1 test/profilers/tries_profiler.rb:22 retained objects by class ----------------------------------- 57442 Array 57442 PublicSuffix::TrieArray::Node 134 String 1 PublicSuffix::TrieArray Retained String Report ----------------------------------- 1 "*.compute-1.amazonaws.com" 1 "*.compute.amazonaws.com.cn" 1 "*.githubcloudusercontent.com" 1 "accident-investigation.aero" 1 "accident-prevention.aero" 1 "air-traffic-control.aero" ...
In the first iteration I completely missed the point that, since the domain name system is hierarchical, it is a good idea to store the reversed string or parts to increase compression. In this way, strings sharing common suffixes such as:

- io
- github.io
- gitlab.io

will better leverage Trie compression, as the space for io will be shared with the path for the other two suffixes.

As a result of this change, the trie sizes decreased drastically:

Before:

➜ publicsuffix-ruby git:(thesis-trie) ruby test/profilers/tries_prosize.rb
943,325 @trie_hash
598,730 @trie_symbol
312,361 @trie_parts
1,627,182 @trie_array

After:

➜ publicsuffix-ruby git:(thesis-trie) ✗ ruby test/profilers/tries_prosize.rb
624,813 @trie_hash
399,660 @trie_symbol
197,291 @trie_parts
982,347 @trie_array

➜ publicsuffix-ruby git:(thesis-trie) ruby test/profilers/tries_profiler.rb hash
Total allocated: 17,067,504 bytes (262,240 objects)
Total retained: 10,433,288 bytes (112,605 objects)

➜ publicsuffix-ruby git:(thesis-trie) ruby test/profilers/tries_profiler.rb hash-symbol
Total allocated: 15,567,184 bytes (224,732 objects)
Total retained: 8,932,968 bytes (75,097 objects)

➜ publicsuffix-ruby git:(thesis-trie) ruby test/profilers/tries_profiler.rb hash-parts
Total allocated: 7,388,993 bytes (130,792 objects)
Total retained: 1,438,762 bytes (24,513 objects)

➜ publicsuffix-ruby git:(thesis-trie) ruby test/profilers/tries_profiler.rb array
Total allocated: 18,700,776 bytes (224,706 objects)
Total retained: 12,066,560 bytes (75,071 objects)
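The effect of reversing the labels can be sketched with a parts-based trie that counts the nodes it creates. `PartsTrie` and `node_count` are hypothetical names used only for this illustration, assuming rules are split on "." and inserted from the rightmost label.

```ruby
# Sketch: store rules by label, starting from the rightmost one, so that
# "github.io" and "gitlab.io" hang below the single node created for "io".
# PartsTrie and node_count are illustrative names, not the PR's classes.
class PartsTrie
  Node = Struct.new(:children, :terminal)

  attr_reader :node_count

  def initialize
    @root = Node.new({}, false)
    @node_count = 0
  end

  def insert(rule)
    node = @root
    rule.split(".").reverse_each do |label|
      node = (node.children[label] ||= new_node)
    end
    node.terminal = true
    self
  end

  def include?(rule)
    node = @root
    rule.split(".").reverse_each do |label|
      node = node.children[label]
      return false if node.nil?
    end
    node.terminal
  end

  private

  # Count every node created, to make the sharing visible.
  def new_node
    @node_count += 1
    Node.new({}, false)
  end
end

trie = PartsTrie.new
%w[io github.io gitlab.io].each { |rule| trie.insert(rule) }
puts trie.node_count # 3 nodes: "io", with "github" and "gitlab" below it
```

Inserting the same three rules in their original label order would create five nodes in this sketch, because "io" would be duplicated under both "github" and "gitlab".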
Change the Trie to store associative key/value pairs instead of a plain set of words. The key is the rule, the value is the metadata of the rule.

➜ publicsuffix-ruby git:(thesis-trie) ✗ ruby test/profilers/list_profsize.rb
8061 rules:
482,019 PublicSuffix::List size
263,451 Size of @rules
307,630 Size of @trie

It looks like the Hash is still a little bit smaller than the Trie.
Merge the entry into the trie node. This also makes it possible to drop the "length" attribute of the entry, which the trie does not need because the length can already be determined from the level in the tree.

Before:

➜ publicsuffix-ruby git:(thesis-trie) ✗ ruby test/profilers/list_profsize.rb
8061 rules:
482,019 PublicSuffix::List size
263,451 Size of @rules
307,630 Size of @trie

After:

➜ publicsuffix-ruby git:(thesis-trie) ✗ ruby test/profilers/list_profsize.rb
8061 rules:
490,391 PublicSuffix::List size
263,451 Size of @rules
226,985 Size of @trie

The trie is now beating the Hash by ~40kb.
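A rough sketch of the idea in the last two commits: the rule metadata lives directly on the trie node, and the rule length is recovered from the depth reached during lookup instead of being stored. `RuleTrie`, `Match`, and the `type` and `private_rule` fields are assumptions made for illustration; the gem's actual entry attributes may differ.

```ruby
# Sketch: metadata is stored on the node itself; length is derived from depth.
# RuleTrie, Match, :type and :private_rule are illustrative names only.
class RuleTrie
  Node = Struct.new(:children, :type, :private_rule)
  Match = Struct.new(:length, :type, :private_rule)

  def initialize
    @root = Node.new({}, nil, nil)
  end

  def insert(rule, type: :normal, private_rule: false)
    node = @root
    rule.split(".").reverse_each do |label|
      node = (node.children[label] ||= Node.new({}, nil, nil))
    end
    node.type = type
    node.private_rule = private_rule
    self
  end

  # Returns the deepest rule found along the path of the name, or nil.
  def find(name)
    node = @root
    best = nil
    name.split(".").reverse.each_with_index do |label, index|
      node = node.children[label]
      break if node.nil?
      # A node with a type marks the end of a rule; its depth is the length.
      best = Match.new(index + 1, node.type, node.private_rule) if node.type
    end
    best
  end
end

trie = RuleTrie.new
trie.insert("io").insert("github.io", private_rule: true)
match = trie.find("foo.github.io")
puts match.length
puts match.private_rule
```

Because the node carries the metadata, no separate entry object (and no stored length) is needed per rule in this sketch.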
This commit handles wildcards and exceptions, and passes all the tests.

---

Benchmark Hash vs Trie:

➜ publicsuffix-ruby git:(thesis-trie) ✗ WHAT=hash ruby test/benchmarks/bm_find.rb
Rehearsal -------------------------------------------------------------
NAME_SHORT    0.530000   0.010000   0.540000 (  0.540001)
NAME_MEDIUM   0.600000   0.000000   0.600000 (  0.608115)
NAME_LONG     0.780000   0.010000   0.790000 (  0.796897)
NAME_WILD     0.900000   0.020000   0.920000 (  0.961535)
NAME_EXCP     1.020000   0.020000   1.040000 (  1.094007)
IAAA          0.620000   0.010000   0.630000 (  0.649537)
IZZZ          0.590000   0.000000   0.590000 (  0.604190)
PAAA          1.030000   0.020000   1.050000 (  1.082507)
PZZZ          0.970000   0.020000   0.990000 (  1.009199)
JP            0.920000   0.010000   0.930000 (  0.939533)
IT            0.610000   0.010000   0.620000 (  0.618309)
COM           0.630000   0.000000   0.630000 (  0.642974)
---------------------------------------------------- total: 9.330000sec

                  user     system      total        real
NAME_SHORT    0.580000   0.010000   0.590000 (  0.592958)
NAME_MEDIUM   0.680000   0.010000   0.690000 (  0.698372)
NAME_LONG     0.820000   0.010000   0.830000 (  0.830893)
NAME_WILD     0.810000   0.010000   0.820000 (  0.831984)
NAME_EXCP     0.960000   0.010000   0.970000 (  0.981469)
IAAA          0.600000   0.010000   0.610000 (  0.611947)
IZZZ          0.610000   0.000000   0.610000 (  0.626348)
PAAA          0.970000   0.020000   0.990000 (  0.982282)
PZZZ          0.990000   0.010000   1.000000 (  1.012680)
JP            0.940000   0.010000   0.950000 (  0.954031)
IT            0.610000   0.010000   0.620000 (  0.627587)
COM           0.620000   0.010000   0.630000 (  0.636131)

➜ publicsuffix-ruby git:(thesis-trie) ✗ WHAT=trie ruby test/benchmarks/bm_find.rb
Rehearsal -------------------------------------------------------------
NAME_SHORT    0.700000   0.010000   0.710000 (  0.722887)
NAME_MEDIUM   0.750000   0.010000   0.760000 (  0.767034)
NAME_LONG     0.790000   0.010000   0.800000 (  0.802235)
NAME_WILD     0.770000   0.010000   0.780000 (  0.786366)
NAME_EXCP     0.810000   0.010000   0.820000 (  0.832109)
IAAA          0.680000   0.000000   0.680000 (  0.690577)
IZZZ          0.690000   0.010000   0.700000 (  0.694839)
PAAA          0.810000   0.010000   0.820000 (  0.826133)
PZZZ          0.790000   0.010000   0.800000 (  0.803508)
JP            0.830000   0.000000   0.830000 (  0.855188)
IT            0.710000   0.010000   0.720000 (  0.714962)
COM           0.670000   0.010000   0.680000 (  0.687400)
---------------------------------------------------- total: 9.100000sec

                  user     system      total        real
NAME_SHORT    0.690000   0.010000   0.700000 (  0.706099)
NAME_MEDIUM   0.730000   0.010000   0.740000 (  0.749351)
NAME_LONG     0.750000   0.010000   0.760000 (  0.765484)
NAME_WILD     0.770000   0.010000   0.780000 (  0.781182)
NAME_EXCP     0.800000   0.000000   0.800000 (  0.815244)
IAAA          0.670000   0.010000   0.680000 (  0.682966)
IZZZ          0.670000   0.010000   0.680000 (  0.682771)
PAAA          0.830000   0.010000   0.840000 (  0.847581)
PZZZ          0.810000   0.010000   0.820000 (  0.829023)
JP            0.810000   0.000000   0.810000 (  0.831782)
IT            0.680000   0.010000   0.690000 (  0.691071)
COM           0.660000   0.010000   0.670000 (  0.669978)
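For readers unfamiliar with wildcard and exception rules, the following is a minimal sketch of how a reversed-label trie could resolve them, following the Public Suffix List rule syntax ("*.ck" is a wildcard, "!www.ck" is an exception). `WildcardTrie` and `suffix` are hypothetical names; this is an illustration under those assumptions, not the lookup code from this commit, and it omits the list's implicit default "*" rule.

```ruby
# Sketch of wildcard/exception resolution on a reversed-label trie.
# WildcardTrie and #suffix are illustrative names, not the PR's API.
class WildcardTrie
  Node = Struct.new(:children, :type)

  def initialize
    @root = Node.new({}, nil)
  end

  # Accepts Public Suffix List syntax: "com", "*.ck", "!www.ck".
  def insert(rule)
    type = :normal
    if rule.start_with?("!")
      type = :exception
      rule = rule[1..-1]
    elsif rule.start_with?("*.")
      type = :wildcard
    end
    node = @root
    rule.split(".").reverse_each do |label|
      node = (node.children[label] ||= Node.new({}, nil))
    end
    node.type = type
    self
  end

  # Returns the public suffix of the name, or nil when no rule matches.
  def suffix(name)
    labels = name.split(".").reverse
    node = @root
    length = nil
    labels.each_with_index do |label, index|
      exact = node.children[label]
      wild = node.children["*"]
      if exact && exact.type == :exception
        # An exception wins: the suffix is the rule minus its leftmost label.
        length = index
        break
      end
      length = index + 1 if wild || (exact && exact.type)
      break if exact.nil?
      node = exact
    end
    length && labels.first(length).reverse.join(".")
  end
end

trie = WildcardTrie.new
%w[com *.ck !www.ck].each { |rule| trie.insert(rule) }
puts trie.suffix("foo.bar.ck")
puts trie.suffix("www.ck")
```

With these three rules, "foo.bar.ck" resolves to "bar.ck" through the wildcard, while "www.ck" resolves to "ck" because the exception rule takes priority.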
Ruby allocates a noticeable amount of memory even for an empty Hash. Do not allocate the children Hash until it is needed, so that nodes with no children do not use unnecessary extra memory.

Pre-initialize children:

226,985 Size of @trie

➜ publicsuffix-ruby git:(thesis-trie) ruby test/profilers/initialization_profiler.rb
Total allocated: 8950176 bytes (117512 objects)
Total retained: 2475538 bytes (40477 objects)

Lazy-initialize children:

219,329 Size of @trie

➜ publicsuffix-ruby git:(thesis-trie) ✗ ruby test/profilers/initialization_profiler.rb
Total allocated: 8643936 bytes (109856 objects)
Total retained: 2169298 bytes (32821 objects)
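The lazy initialization described above can be sketched as follows. `LazyNode` and its methods are hypothetical names for illustration; the only library call used is `ObjectSpace.memsize_of` from Ruby's `objspace` extension, whose reported sizes vary between Ruby versions.

```ruby
require "objspace"

# Sketch: the children Hash is created only when the first child is added,
# so leaf nodes never pay for an empty Hash.
# LazyNode is an illustrative name, not the PR's node class.
class LazyNode
  attr_reader :children

  def initialize
    @children = nil # no Hash allocated yet
  end

  # Look up a child without allocating anything.
  def child(label)
    @children && @children[label]
  end

  # Allocate the Hash on first use, then fetch or create the child.
  def add_child(label)
    @children ||= {}
    @children[label] ||= LazyNode.new
  end

  def leaf?
    @children.nil?
  end
end

root = LazyNode.new
root.add_child("io").add_child("github")

puts root.child("io").child("github").leaf?
# Size reported for an empty Hash; the exact number depends on the Ruby version.
puts ObjectSpace.memsize_of({})
```

In a trie where most nodes are leaves, skipping that per-node Hash is where the saving comes from.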
As a historical note, I was hacking on something similar a while ago, but never got around to integrating it with the gem. https://gist.github.com/pzb/5aba13a67bd9fa64b3769397c842889b is what I had. It is way faster than the existing gem but is missing support for dynamically enabling/disabling the private section.
Thanks for the feedback @pzb

This PR, along with #133, was the result of research I did as part of my degree thesis. I must say that the results achieved with #133 are already stunning compared with the existing gem, and I am planning on releasing it as soon as I can. Sadly, I merged it a while ago but there is a lot of extra work (mostly docs and deprecation info) I have to complete before releasing it as a major version. You can already test it using master instead of the released gem.

The library is now working in constant time, whereas before it was still linear time (although optimized). The tree-based version in this PR is a few milliseconds slower than the hash-based one, but it saves some extra bytes of allocation. That's why I was considering merging it as well.

The good news is that both this PR and #133 allow dynamic modification of the list. I worked on a DAWG/DAFSA version that was even more lightweight, but it did not allow dynamic modifications of the list, hence I discarded it for now.

If you have the chance, take a look at #133 and give the version in master, which already includes that PR, a try. I believe you'll be very happy about the improvements. :)