Aggregator support prefetch and new hasher #9679

Open · wants to merge 14 commits into master


guo-shaoge (Contributor) commented Nov 28, 2024

What problem does this PR solve?

Issue Number: close #9680

Problem Summary:

What is changed and how it works?

  1. Add a new hash function.
  2. Support prefetch for HashTable and StringHashTable.
    1. For StringHashTable, emplace goes directly to the specific submap instead of going through the StringHashMap method, because this makes prefetch easier to implement.
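To illustrate the prefetch idea in item 2, here is a minimal sketch under stated assumptions; it is not the PR's implementation. It assumes an open-addressing table with linear probing, 64-bit keys and a sum aggregate. `ToyHashMap`, `aggregateBatch` and `prefetch_distance` are hypothetical names, and the splitmix64 finalizer stands in as the hash only because the PR text does not name its new hasher. Hashes for a batch are computed first; then, while row `i` is emplaced, the cell for row `i + prefetch_distance` is prefetched with the GCC/Clang builtin `__builtin_prefetch`, so its cache miss overlaps with useful work.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy table: open addressing, linear probing, fixed size (never resizes).
struct ToyHashMap
{
    struct Cell
    {
        uint64_t key;
        int64_t sum;
        bool occupied;
    };

    std::vector<Cell> cells; // value-initialized: every cell starts unoccupied
    size_t mask;

    explicit ToyHashMap(size_t log2_size)
        : cells(size_t{1} << log2_size), mask((size_t{1} << log2_size) - 1) {}

    // Stand-in hash (splitmix64 finalizer); an assumption, not the PR's new hasher.
    static uint64_t hash(uint64_t x)
    {
        x ^= x >> 30; x *= 0xbf58476d1ce4e5b9ULL;
        x ^= x >> 27; x *= 0x94d049bb133111ebULL;
        x ^= x >> 31;
        return x;
    }

    // Ask the CPU to start loading the home cell of a hash (GCC/Clang builtin).
    void prefetch(uint64_t h) const { __builtin_prefetch(&cells[h & mask]); }

    // Find or insert `key` using a precomputed hash; new cells start with sum = 0.
    Cell & emplace(uint64_t key, uint64_t h)
    {
        size_t i = h & mask;
        size_t probes = 0;
        while (cells[i].occupied && cells[i].key != key)
        {
            ++probes;
            assert(probes < cells.size() && "toy map is full: it never resizes");
            i = (i + 1) & mask;
        }
        if (!cells[i].occupied)
            cells[i] = Cell{key, 0, true};
        return cells[i];
    }

    // Plain lookup used to inspect results; returns nullptr if the key is absent.
    const Cell * find(uint64_t key) const
    {
        for (size_t i = hash(key) & mask, probes = 0; probes < cells.size(); i = (i + 1) & mask, ++probes)
        {
            if (!cells[i].occupied)
                return nullptr;
            if (cells[i].key == key)
                return &cells[i];
        }
        return nullptr;
    }
};

// Two passes over a batch: hash every key, then emplace row i while prefetching row i + distance.
void aggregateBatch(ToyHashMap & map, const std::vector<uint64_t> & keys,
                    const std::vector<int64_t> & values, size_t prefetch_distance = 16)
{
    std::vector<uint64_t> hashes(keys.size());
    for (size_t i = 0; i < keys.size(); ++i)
        hashes[i] = ToyHashMap::hash(keys[i]);

    for (size_t i = 0; i < keys.size(); ++i)
    {
        if (i + prefetch_distance < keys.size())
            map.prefetch(hashes[i + prefetch_distance]);
        map.emplace(keys[i], hashes[i]).sum += values[i];
    }
}
```

For StringHashTable the same pattern would presumably apply per submap: the key length selects the submap first, and the prefetch then targets a cell inside that submap, which fits item 2.1's choice to bypass the generic StringHashMap method. A suitable prefetch distance depends on the hardware and table size; prefetching mainly pays off once the table no longer fits in cache, as in the high-distinct-count queries below.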

Benchmark

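Workload: TPC-H 50G (lineitem has about 300M rows)
Queries (each comment gives the group-by method, distinct keys / total rows, and the hash table type):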

-- Q1-1: key_int64; distinct rate: 10M/300M; HashMap
explain analyze select /*+ mpp_1phase_agg() */ sum(l_quantity), l_partkey from lineitem group by l_partkey limit 1 offset 9000000;
-- Q1-2: key_int64; 75M/300M; HashMap
explain analyze select /*+ mpp_1phase_agg() */ sum(l_quantity), l_orderkey from lineitem group by l_orderkey limit 1 offset 70000000;
-- Q1-3: key_int64; 7/300M; HashMap
explain analyze select /*+ mpp_1phase_agg() */ sum(l_quantity), l_linenumber from lineitem group by l_linenumber;

-- Q2-1: one_key_strbinpadding_phmap; 2/300M; StringHashMap
explain analyze select /*+ mpp_1phase_agg() */ sum(l_discount), l_linestatus from lineitem group by l_linestatus;
-- Q2-2: one_key_strbinpadding; 104M/300M; StringHashMap
explain analyze select /*+ mpp_1phase_agg() */ sum(l_discount) from lineitem group by l_comment limit 1 offset 100000000;


-- Q3-1: key_serialized as group by method; 33/300M; HashMap with StringRef key
explain analyze select /*+ mpp_1phase_agg() */ sum(l_discount), l_returnflag from lineitem group by l_returnflag, l_discount;
-- Q3-2: key_serialized as group by method; 77M/300M; HashMap with StringRef key
explain analyze select /*+ mpp_1phase_agg() */ sum(l_discount), l_returnflag from lineitem group by l_returnflag, l_discount, l_extendedprice limit 1 offset 75000000;


-- Q4-1: two_keys_num64_strbinpadding; 21/300M; HashMap with StringRef key
explain analyze select /*+ mpp_1phase_agg() */ sum(l_discount) from lineitem group by l_returnflag, l_linenumber;
-- Q4-2: two_keys_num64_strbinpadding; 29.9M/300M; HashMap with StringRef key
explain analyze select /*+ mpp_1phase_agg() */ sum(l_discount), l_partkey from lineitem group by l_returnflag, l_partkey limit 1 offset 29000000;


-- Q5-1: keys_128; 77/300M; HashMap with UInt128 key
explain analyze select /*+ mpp_1phase_agg() */ sum(l_linenumber), l_discount from lineitem group by l_linenumber, l_discount;
-- Q5-2: keys_128; 5M/300M; HashMap with UInt128 key
explain analyze select /*+ mpp_1phase_agg() */ sum(l_linenumber), l_discount from lineitem group by l_suppkey, l_discount limit 1 offset 4000000;


-- Q6-1: key_string; 4/300M; StringHashMap
explain analyze select /*+ mpp_1phase_agg() */ sum(l_quantity), l_shipinstruct from lineitem group by l_shipinstruct;

-- Q7-1: keys_256; 290M/300M; HashMap
explain analyze select /*+ mpp_1phase_agg() */ sum(l_suppkey) from lineitem group by l_suppkey, l_tax, l_discount, l_partkey limit 1;

Results:

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

None

ti-chi-bot added the do-not-merge/needs-linked-issue and release-note-none labels on Nov 28, 2024

ti-chi-bot commented Nov 28, 2024:

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from guo-shaoge, ensuring that each of them provides their approval before proceeding. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

ti-chi-bot added the size/XXL label and removed the do-not-merge/needs-linked-issue label on Nov 28, 2024

guo-shaoge changed the title from "Aggregator support prefetch" to "Aggregator support prefetch and new hasher" on Dec 2, 2024

guo-shaoge (Contributor, Author) commented:

/retest

Labels: release-note-none (Denotes a PR that doesn't merit a release note.), size/XXL (Denotes a PR that changes 1000+ lines, ignoring generated files.)

Development: successfully merging this pull request may close these issues:
support prefetch and new hasher for Aggregator

2 participants