Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ESQL: LOOKUP JOIN produces additional null rows #117702

Open
alex-spies opened this issue Nov 28, 2024 · 2 comments · May be fixed by #117843
Open

ESQL: LOOKUP JOIN produces additional null rows #117702

alex-spies opened this issue Nov 28, 2024 · 2 comments · May be fixed by #117843
Labels
:Analytics/ES|QL AKA ESQL >bug Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo)

Comments

@alex-spies
Copy link
Contributor

alex-spies commented Nov 28, 2024

Reproducer:

curl -u elastic:password -H "Content-Type: application/json" "127.0.0.1:9200/languages_lookup" -XPUT -d '{\
  "mappings": {\
        "properties": {"language_code": {"type": "keyword"}, "language_name":{"type": "keyword"}}}, "settings": {"index": {"mode": "lookup"}}}'\

$ curl -u elastic:password -HContent-Type:application/json 'localhost:9200/languages_lookup/_doc?refresh' -d'{"language_code": "1", "language_name": "English"}'
$ curl -u elastic:password -HContent-Type:application/json 'localhost:9200/languages_lookup/_doc?refresh' -d'{"language_code": "2", "language_name": "French"}'
$ curl -u elastic:password -HContent-Type:application/json 'localhost:9200/test/_doc?refresh' -d'{"x": 1}'
$ curl -u elastic:password -HContent-Type:application/json 'localhost:9200/test/_doc?refresh' -d'{"x": 3}'

$ curl -u elastic:password -H "Content-Type: application/json" "127.0.0.1:9200/_query?format=txt" -d '{\
    "query": "FROM test | sort x | limit 10 | eval language_code = x::keyword | lookup join languages_lookup on language_code | keep x, language_code, language_name"\
}'\

       x       | language_code | language_name 
---------------+---------------+---------------
1              |1              |English        
3              |3              |null           
null           |null           |null           
null           |null           |null  

Note the 2 additional rows at the bottom. The expected output should only be the 2 first rows.

Interestingly, changing the limit 10 to limit 4 doesn't change the query's output - but changing to limit 3 and limit 2 shaves off 1 resp. 2 of the wrong rows.

Here's are the optimizer diffs (not sure if they help):

[2024-11-28T12:28:32,894][INFO ][o.e.x.e.a.Analyzer       ] [runTask-0] Rule analysis.Analyzer$ResolveTable applied
Keep[[?x, ?language_code, ?language_name]]               = Keep[[?x, ?language_code, ?language_name]]
\_LookupJoin[LEFT OUTER USING [?language_code],[],[],[]] = \_LookupJoin[LEFT OUTER USING [?language_code],[],[],[]]
  |_Eval[[TOSTRING(?x) AS language_code]]                =   |_Eval[[TOSTRING(?x) AS language_code]]
  | \_Limit[10[INTEGER]]                                 =   | \_Limit[10[INTEGER]]
  |   \_OrderBy[[Order[?x,ASC,LAST]]]                    =   |   \_OrderBy[[Order[?x,ASC,LAST]]]
  |     \_UnresolvedRelation[test]                       !   |     \_EsRelation[test][x{f}#123]
  \_UnresolvedRelation[languages_lookup]                 !   \_EsRelation[languages_lookup][LOOKUP][language_code{f}#124, language_name{f}#125]
                                                        
[2024-11-28T12:28:32,895][INFO ][o.e.x.e.a.Analyzer       ] [runTask-0] Rule analysis.Analyzer$ResolveRefs applied
Keep[[?x, ?language_code, ?language_name]]                                           ! EsqlProject[[x{f}#123, language_code{r}#117, language_name{f}#125]]
\_LookupJoin[LEFT OUTER USING [?language_code],[],[],[]]                             ! \_LookupJoin[LEFT,[language_code{r}#117],[language_code{r}#117],[language_code{f}#124]]
  |_Eval[[TOSTRING(?x) AS language_code]]                                            !   |_Eval[[TOSTRING(x{f}#123) AS language_code]]
  | \_Limit[10[INTEGER]]                                                             =   | \_Limit[10[INTEGER]]
  |   \_OrderBy[[Order[?x,ASC,LAST]]]                                                !   |   \_OrderBy[[Order[x{f}#123,ASC,LAST]]]
  |     \_EsRelation[test][x{f}#123]                                                 =   |     \_EsRelation[test][x{f}#123]
  \_EsRelation[languages_lookup][LOOKUP][language_code{f}#124, language_name{f}#125] =   \_EsRelation[languages_lookup][LOOKUP][language_code{f}#124, language_name{f}#125]
                                                                                    
[2024-11-28T12:28:32,895][INFO ][o.e.x.e.a.Analyzer       ] [runTask-0] Rule analysis.Analyzer$AddImplicitLimit applied
EsqlProject[[x{f}#123, language_code{r}#117, language_name{f}#125]]                     ! Limit[10000[INTEGER]]
\_LookupJoin[LEFT,[language_code{r}#117],[language_code{r}#117],[language_code{f}#124]] ! \_EsqlProject[[x{f}#123, language_code{r}#117, language_name{f}#125]]
  |_Eval[[TOSTRING(x{f}#123) AS language_code]]                                         !   \_LookupJoin[LEFT,[language_code{r}#117],[language_code{r}#117],[language_code{f}#124]]
  | \_Limit[10[INTEGER]]                                                                !     |_Eval[[TOSTRING(x{f}#123) AS language_code]]
  |   \_OrderBy[[Order[x{f}#123,ASC,LAST]]]                                             !     | \_Limit[10[INTEGER]]
  |     \_EsRelation[test][x{f}#123]                                                    !     |   \_OrderBy[[Order[x{f}#123,ASC,LAST]]]
  \_EsRelation[languages_lookup][LOOKUP][language_code{f}#124, language_name{f}#125]    !     |     \_EsRelation[test][x{f}#123]
                                                                                        !     \_EsRelation[languages_lookup][LOOKUP][language_code{f}#124, language_name{f}#125]
                                                                                       
[2024-11-28T12:28:32,896][INFO ][o.e.x.e.o.LogicalPlanOptimizer] [runTask-0] Rule logical.SubstituteSurrogatePlans applied
Limit[10000[INTEGER]]                                                                     = Limit[10000[INTEGER]]
\_EsqlProject[[x{f}#123, language_code{r}#117, language_name{f}#125]]                     = \_EsqlProject[[x{f}#123, language_code{r}#117, language_name{f}#125]]
  \_LookupJoin[LEFT,[language_code{r}#117],[language_code{r}#117],[language_code{f}#124]] !   \_Project[[x{f}#123, language_code{r}#117, language_name{f}#125]]
    |_Eval[[TOSTRING(x{f}#123) AS language_code]]                                         !     \_Join[LEFT,[language_code{r}#117],[language_code{r}#117],[language_code{f}#124]]
    | \_Limit[10[INTEGER]]                                                                !       |_Eval[[TOSTRING(x{f}#123) AS language_code]]
    |   \_OrderBy[[Order[x{f}#123,ASC,LAST]]]                                             !       | \_Limit[10[INTEGER]]
    |     \_EsRelation[test][x{f}#123]                                                    !       |   \_OrderBy[[Order[x{f}#123,ASC,LAST]]]
    \_EsRelation[languages_lookup][LOOKUP][language_code{f}#124, language_name{f}#125]    !       |     \_EsRelation[test][x{f}#123]
                                                                                          !       \_EsRelation[languages_lookup][LOOKUP][language_code{f}#124, language_name{f}#125]
                                                                                         
[2024-11-28T12:28:32,896][INFO ][o.e.x.e.o.LogicalPlanOptimizer] [runTask-0] Rule logical.CombineProjections applied
Limit[10000[INTEGER]]                                                                    = Limit[10000[INTEGER]]
\_EsqlProject[[x{f}#123, language_code{r}#117, language_name{f}#125]]                    ! \_Project[[x{f}#123, language_code{r}#117, language_name{f}#125]]
  \_Project[[x{f}#123, language_code{r}#117, language_name{f}#125]]                      !   \_Join[LEFT,[language_code{r}#117],[language_code{r}#117],[language_code{f}#124]]
    \_Join[LEFT,[language_code{r}#117],[language_code{r}#117],[language_code{f}#124]]    !     |_Eval[[TOSTRING(x{f}#123) AS language_code]]
      |_Eval[[TOSTRING(x{f}#123) AS language_code]]                                      !     | \_Limit[10[INTEGER]]
      | \_Limit[10[INTEGER]]                                                             !     |   \_OrderBy[[Order[x{f}#123,ASC,LAST]]]
      |   \_OrderBy[[Order[x{f}#123,ASC,LAST]]]                                          !     |     \_EsRelation[test][x{f}#123]
      |     \_EsRelation[test][x{f}#123]                                                 !     \_EsRelation[languages_lookup][LOOKUP][language_code{f}#124, language_name{f}#125]
      \_EsRelation[languages_lookup][LOOKUP][language_code{f}#124, language_name{f}#125] ! 
                                                                                        
[2024-11-28T12:28:32,897][INFO ][o.e.x.e.o.LogicalPlanOptimizer] [runTask-0] Rule logical.PushDownAndCombineLimits applied
Limit[10000[INTEGER]]                                                                  ! Project[[x{f}#123, language_code{r}#117, language_name{f}#125]]
\_Project[[x{f}#123, language_code{r}#117, language_name{f}#125]]                      ! \_Limit[10000[INTEGER]]
  \_Join[LEFT,[language_code{r}#117],[language_code{r}#117],[language_code{f}#124]]    =   \_Join[LEFT,[language_code{r}#117],[language_code{r}#117],[language_code{f}#124]]
    |_Eval[[TOSTRING(x{f}#123) AS language_code]]                                      =     |_Eval[[TOSTRING(x{f}#123) AS language_code]]
    | \_Limit[10[INTEGER]]                                                             =     | \_Limit[10[INTEGER]]
    |   \_OrderBy[[Order[x{f}#123,ASC,LAST]]]                                          =     |   \_OrderBy[[Order[x{f}#123,ASC,LAST]]]
    |     \_EsRelation[test][x{f}#123]                                                 =     |     \_EsRelation[test][x{f}#123]
    \_EsRelation[languages_lookup][LOOKUP][language_code{f}#124, language_name{f}#125] =     \_EsRelation[languages_lookup][LOOKUP][language_code{f}#124, language_name{f}#125]
                                                                                      
[2024-11-28T12:28:32,898][INFO ][o.e.x.e.o.LogicalPlanOptimizer] [runTask-0] Rule logical.ReplaceLimitAndSortAsTopN applied
Project[[x{f}#123, language_code{r}#117, language_name{f}#125]]                        = Project[[x{f}#123, language_code{r}#117, language_name{f}#125]]
\_Limit[10000[INTEGER]]                                                                = \_Limit[10000[INTEGER]]
  \_Join[LEFT,[language_code{r}#117],[language_code{r}#117],[language_code{f}#124]]    =   \_Join[LEFT,[language_code{r}#117],[language_code{r}#117],[language_code{f}#124]]
    |_Eval[[TOSTRING(x{f}#123) AS language_code]]                                      =     |_Eval[[TOSTRING(x{f}#123) AS language_code]]
    | \_Limit[10[INTEGER]]                                                             !     | \_TopN[[Order[x{f}#123,ASC,LAST]],10[INTEGER]]
    |   \_OrderBy[[Order[x{f}#123,ASC,LAST]]]                                          !     |   \_EsRelation[test][x{f}#123]
    |     \_EsRelation[test][x{f}#123]                                                 !     \_EsRelation[languages_lookup][LOOKUP][language_code{f}#124, language_name{f}#125]
    \_EsRelation[languages_lookup][LOOKUP][language_code{f}#124, language_name{f}#125] ! 
                                                                                      
[2024-11-28T12:28:32,898][INFO ][o.e.x.e.o.PhysicalPlanOptimizer] [runTask-0] Rule physical.ProjectAwayColumns applied
ProjectExec[[x{f}#123, language_code{r}#117, language_name{f}#125]]                                                                  = ProjectExec[[x{f}#123, language_code{r}#117, language_name{f}#125]]
\_LimitExec[10000[INTEGER]]                                                                                                          = \_LimitExec[10000[INTEGER]]
  \_LookupJoinExec[[language_code{r}#117],[language_code{r}#117],[language_code{f}#124],[x{f}#123, language_code{r}#117, language_na =   \_LookupJoinExec[[language_code{r}#117],[language_code{r}#117],[language_code{f}#124],[x{f}#123, language_code{r}#117, language_na
me{f}#125]]                                                                                                                          = me{f}#125]]
    |_EvalExec[[TOSTRING(x{f}#123) AS language_code]]                                                                                =     |_EvalExec[[TOSTRING(x{f}#123) AS language_code]]
    | \_TopNExec[[Order[x{f}#123,ASC,LAST]],10[INTEGER],null]                                                                        =     | \_TopNExec[[Order[x{f}#123,ASC,LAST]],10[INTEGER],null]
    |   \_ExchangeExec[[],false]                                                                                                     !     |   \_ExchangeExec[[x{f}#123],false]
    |     \_FragmentExec[filter=null, estimatedRowSize=0, reducer=[], fragment=[<>                                                   =     |     \_FragmentExec[filter=null, estimatedRowSize=0, reducer=[], fragment=[<>
TopN[[Order[x{f}#123,ASC,LAST]],10[INTEGER]]                                                                                         ! Project[[x{f}#123]]
\_EsRelation[test][x{f}#123]<>]]                                                                                                     ! \_TopN[[Order[x{f}#123,ASC,LAST]],10[INTEGER]]
    \_FragmentExec[filter=null, estimatedRowSize=0, reducer=[], fragment=[<>                                                         !   \_EsRelation[test][x{f}#123]<>]]
EsRelation[languages_lookup][LOOKUP][language_code{f}#124, language_name{f}#125]<>]]                                                 !     \_FragmentExec[filter=null, estimatedRowSize=0, reducer=[], fragment=[<>
                                                                                                                                     ! EsRelation[languages_lookup][LOOKUP][language_code{f}#124, language_name{f}#125]<>]]
                                                                                                                                    
[2024-11-28T12:28:32,899][INFO ][o.e.x.e.o.LocalPhysicalPlanOptimizer] [runTask-0] Rule local.ReplaceSourceAttributes applied
EsSourceExec[languages_lookup][language_code{f}#124, language_name{f}#125] ! EsQueryExec[languages_lookup], indexMode[lookup], query[][_doc{f}#126], limit[], sort[] estimatedRowSize[null]
                                                                          
[2024-11-28T12:28:32,902][INFO ][o.e.x.e.o.LocalLogicalPlanOptimizer] [runTask-0] Rule local.ReplaceTopNWithLimitAndSort applied
Project[[x{f}#123]]                            = Project[[x{f}#123]]
\_TopN[[Order[x{f}#123,ASC,LAST]],10[INTEGER]] ! \_Limit[10[INTEGER]]
  \_EsRelation[test][x{f}#123]                 !   \_OrderBy[[Order[x{f}#123,ASC,LAST]]]
                                               !     \_EsRelation[test][x{f}#123]
                                              
[2024-11-28T12:28:32,903][INFO ][o.e.x.e.o.LocalLogicalPlanOptimizer] [runTask-0] Rule logical.ReplaceLimitAndSortAsTopN applied
Project[[x{f}#123]]                     = Project[[x{f}#123]]
\_Limit[10[INTEGER]]                    ! \_TopN[[Order[x{f}#123,ASC,LAST]],10[INTEGER]]
  \_OrderBy[[Order[x{f}#123,ASC,LAST]]] !   \_EsRelation[test][x{f}#123]
    \_EsRelation[test][x{f}#123]        ! 
                                       
[2024-11-28T12:28:32,903][INFO ][o.e.x.e.o.LocalPhysicalPlanOptimizer] [runTask-0] Rule local.ReplaceSourceAttributes applied
ProjectExec[[x{f}#123]]                                 = ProjectExec[[x{f}#123]]
\_TopNExec[[Order[x{f}#123,ASC,LAST]],10[INTEGER],null] = \_TopNExec[[Order[x{f}#123,ASC,LAST]],10[INTEGER],null]
  \_EsSourceExec[test][x{f}#123]                        !   \_EsQueryExec[test], indexMode[standard], query[][_doc{f}#127], limit[], sort[] estimatedRowSize[null]
                                                       
[2024-11-28T12:28:32,904][INFO ][o.e.x.e.o.LocalPhysicalPlanOptimizer] [runTask-0] Rule local.PushTopNToSource applied
ProjectExec[[x{f}#123]]                                                                                  = ProjectExec[[x{f}#123]]
\_TopNExec[[Order[x{f}#123,ASC,LAST]],10[INTEGER],null]                                                  ! \_EsQueryExec[test], indexMode[standard], query[][_doc{f}#127], limit[10], sort[[FieldSort[field=x{f}#123, direction=ASC, nulls=LAST]]] estimatedRowSize[null]
  \_EsQueryExec[test], indexMode[standard], query[][_doc{f}#127], limit[], sort[] estimatedRowSize[null] ! 
                                                                                                        
[2024-11-28T12:28:32,904][INFO ][o.e.x.e.o.LocalPhysicalPlanOptimizer] [runTask-0] Rule local.InsertFieldExtraction applied
ProjectExec[[x{f}#123]]                                                                                                                                        = ProjectExec[[x{f}#123]]
\_EsQueryExec[test], indexMode[standard], query[][_doc{f}#127], limit[10], sort[[FieldSort[field=x{f}#123, direction=ASC, nulls=LAST]]] estimatedRowSize[null] ! \_FieldExtractExec[x{f}#123]<[]>
                                                                                                                                                               !   \_EsQueryExec[test], indexMode[standard], query[][_doc{f}#127], limit[10], sort[[FieldSort[field=x{f}#123, direction=ASC, nulls=LAST]]] estimatedRowSize[null]
                                                                                                       
@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Nov 28, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@craigtaverner
Copy link
Contributor

This was likely due to the bug that caused left-hand side data to be read from BOTH indexes. If the two indexes had no fields in common, no errors are produced and instead null values appear on the left (as many extra null rows as there were rows on the right). This should be fixed by #117843

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/ES|QL AKA ESQL >bug Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo)
Projects
None yet
3 participants