Incorrect result from SELECT DISTINCT ON a distributed table (version 12.1-1) #7684
Comments
Confirmed, the problem really is with this part
@scooreman if you can patch/compile citus, I think this patch is ok:
I suppose you have a larger SQL code base to test against, so could you give it a try?
@c2main I can confirm that the fix works on our data set.
@scooreman nice to know it works! @onurctirtir or others, I haven't made a PR for that yet, can you reopen?
Similar to other bugs caused by Citus's use of ruleutils: the tree provided as input differs from the tree that ruleutils in PostgreSQL expects. As a consequence, in some places the tree structure has to be re-ordered to make it compliant, or brought "back" to the parser tree (before PostgreSQL rewrites and reorders it to optimize execution), or the target list has to be looked up to ensure it is the one "we expect". For this bug, `get_rule_sortgroupclause()` is used to check whether the target list entry's `resname` is defined, and to use it directly if it exists. No benchmarks were run; the change is not expected to have a significant impact.
@scooreman be careful, the provided patch is probably not the right fix; it's more subtle than that. Citus builds a query like the following; note the added worker_column_3 and the absence of table-qualified names.
And the results for this query are correct! See the PostgreSQL DISTINCT and ORDER BY documentation in [1] and [2]. The problem is only related to the absence of table-qualified names in the query built by Citus. So my patch is good only by accident... [1] https://www.postgresql.org/docs/current/sql-select.html#SQL-DISTINCT
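To make the name-resolution point concrete, here is a minimal Python sketch that models the documented PostgreSQL rule: in DISTINCT ON (as in ORDER BY), a bare name that matches both an output column and an input column is taken as the output column. It is only a toy model run against the three rows from the report, not Citus or PostgreSQL code; the helper `distinct_on` and its `qualified` flag are hypothetical names introduced for illustration, and keeping the first row seen per key is a modelling choice, since DISTINCT ON without ORDER BY does not define which row is kept.

```python
# Rows of the test table from the report: attribute1, attribute2, attribute3.
ROWS = [
    {"attribute1": "Phone", "attribute2": "John", "attribute3": "A"},
    {"attribute1": "Phone", "attribute2": "Eric", "attribute3": "A"},
    {"attribute1": "Tablet", "attribute2": "Eric", "attribute3": "B"},
]

# SELECT list of the reported query: output name -> input column it reads.
# Note that the output column "attribute2" is really attribute3.
SELECT_LIST = {"attribute1": "attribute1", "attribute2": "attribute3"}


def distinct_on(rows, keys, select_list, qualified):
    """Toy model of DISTINCT ON name resolution (hypothetical helper).

    qualified=True models T.col, which always means the input column.
    qualified=False models a bare name: an output column with that name
    takes precedence over an input column with the same name.
    """
    seen = set()
    result = []
    for row in rows:
        key = []
        for name in keys:
            if not qualified and name in select_list:
                key.append(row[select_list[name]])  # resolved as output column
            else:
                key.append(row[name])  # resolved as input column
        key = tuple(key)
        if key not in seen:  # keep the first row seen for each key
            seen.add(key)
            result.append(tuple(row[src] for src in select_list.values()))
    return result


keys = ["attribute1", "attribute2"]
qualified_rows = distinct_on(ROWS, keys, SELECT_LIST, qualified=True)
unqualified_rows = distinct_on(ROWS, keys, SELECT_LIST, qualified=False)
print(len(qualified_rows), len(unqualified_rows))
```

With table-qualified names the model keeps all three rows, as on the local table; with bare names the key becomes (attribute1, attribute3) and the two Phone rows collapse into one, which matches the two rows seen after distribution.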
For reference, there is a patch from August 2024 in PostgreSQL touching this area: It's really not clear what the best fix for Citus is, and having to manage PostgreSQL 14 and 15 may not be that easy or free. Maybe it would be better to backpatch more of the PostgreSQL 17 ruleutils code into ruleutils 14, 15 and 16...
The query no longer returns the correct result after distributing the table:
sandbox=# CREATE TABLE test (
attribute1 varchar(255),
attribute2 varchar(255),
attribute3 varchar(255)
);
CREATE TABLE
sandbox=# INSERT INTO test (attribute1, attribute2, attribute3)
VALUES ('Phone', 'John', 'A'),
('Phone', 'Eric', 'A'),
('Tablet','Eric', 'B');
INSERT 0 3
sandbox=# SELECT DISTINCT ON (T.attribute1, T.attribute2)
T.attribute1 as attribute1,
T.attribute3 as attribute2
FROM test T;
attribute1 | attribute2
------------+------------
Phone | A
Phone | A
Tablet | B
(3 rows)
sandbox=# SELECT create_distributed_table('test', 'attribute1');
NOTICE: Copying data from local table...
NOTICE: copying the data has completed
DETAIL: The local data in the table is no longer visible, but is still on disk.
HINT: To remove the local data, run: SELECT truncate_local_data_after_distributing_table($$"ford-dev".test$$)
create_distributed_table
(1 row)
sandbox=# SELECT DISTINCT ON (T.attribute1, T.attribute2)
T.attribute1 as attribute1,
T.attribute3 as attribute2
FROM test T;
attribute1 | attribute2
------------+------------
Phone | A
Tablet | B
(2 rows)