Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

There is no supertype when CROSS to INNER JOIN rewrite WHERE a.key=b.key-1 #21794

Open
UnamedRus opened this issue Mar 16, 2021 · 16 comments
Open
Assignees
Labels
comp-joins JOINs feature st-discussion The story requires discussion /research / expert help / design & decomposition before will be taken

Comments

@UnamedRus
Copy link
Contributor

Describe the bug
There is no supertype for types UInt64, Int64. CROSS to INNER JOIN rewrite WHERE a.key=b.key-1

How to reproduce
21.4.1.6210

SELECT *
FROM
(
    SELECT number
    FROM numbers(10)
) AS a
,
(
    SELECT number
    FROM numbers(10)
) AS b
WHERE a.number = (b.number - 1)

Query id: 6ca49d4e-e6a1-4dc0-9552-280d481be176


0 rows in set. Elapsed: 0.003 sec.

Received exception from server (version 21.4.1):
Code: 53. DB::Exception: Received from localhost:9000. DB::Exception: Type mismatch of columns to JOIN by: number: UInt64 at left, minus(b.number, 1): Int64 at right. Can't get supertype: There is no supertype for types UInt64, Int64 because some of them are signed integers and some are unsigned integers, but there is no signed integer type, that can exactly represent all required unsigned integer values.


set cross_to_inner_join_rewrite=0;

SELECT *
FROM
(
    SELECT number
    FROM numbers(10)
) AS a
,
(
    SELECT number
    FROM numbers(10)
) AS b
WHERE a.number = (b.number - 1)

Query id: ac47d703-f528-43ea-a0e7-a030a869b568

┌─number─┬─b.number─┐
│      0 │        1 │
│      1 │        2 │
│      2 │        3 │
│      3 │        4 │
│      4 │        5 │
│      5 │        6 │
│      6 │        7 │
│      7 │        8 │
│      8 │        9 │
└────────┴──────────┘

@UnamedRus UnamedRus added the bug Confirmed user-visible misbehaviour in official release label Mar 16, 2021
@UnamedRus
Copy link
Contributor Author

@vdimir

@vdimir
Copy link
Member

vdimir commented Mar 16, 2021

It's expected, because UInt64 and Int64 is too big and we will need Int128 to fit all possible range of data.

Any ideas how to handle it?

@vdimir vdimir added feature and removed bug Confirmed user-visible misbehaviour in official release labels Mar 16, 2021
@UnamedRus
Copy link
Contributor Author

Well, it works in regular WHERE condition:

│       Filter (WHERE)                                                                                                │
│       Header: number UInt64                                                                                         │
│               b.number UInt64                                                                                       │
│       Filter column: equals(number, minus(b.number, 1)) (removed)                                                   │
│       Actions: INPUT : 0 -> number UInt64 : 0                                                                       │
│                INPUT : 1 -> b.number UInt64 : 1                                                                     │
│                COLUMN Const(UInt8) -> 1 UInt8 : 2                                                                   │
│                FUNCTION minus(b.number : 1, 1 :: 2) -> minus(b.number, 1) Int64 : 3                                 │
│                FUNCTION equals(number : 0, minus(b.number, 1) :: 3) -> equals(number, minus(b.number, 1)) UInt8 : 2
SELECT *
FROM
(
    SELECT number
    FROM numbers(18446744073709551605, 10)
) AS a
,
(
    SELECT number
    FROM numbers(18446744073709551605, 10)
) AS b
WHERE a.number = b.number

Query id: 7a0424af-e83a-40d5-af42-b7764c89b90f

┌───────────────number─┬─────────────b.number─┐
│ 18446744073709551605 │ 18446744073709551605 │
│ 18446744073709551606 │ 18446744073709551606 │
│ 18446744073709551607 │ 18446744073709551607 │
│ 18446744073709551608 │ 18446744073709551608 │
│ 18446744073709551609 │ 18446744073709551609 │
│ 18446744073709551610 │ 18446744073709551610 │
│ 18446744073709551611 │ 18446744073709551611 │
│ 18446744073709551612 │ 18446744073709551612 │
│ 18446744073709551613 │ 18446744073709551613 │
│ 18446744073709551614 │ 18446744073709551614 │
└──────────────────────┴──────────────────────┘


SELECT *
FROM
(
    SELECT number
    FROM numbers(18446744073709551605, 10)
) AS a
,
(
    SELECT number
    FROM numbers(18446744073709551605, 10)
) AS b
WHERE a.number = (b.number - 1)

Query id: 56890aa1-92c0-4f0f-8f03-e17dc029560a

Ok.


So clickhouse just blindly convert a.number and (b.number - 1) to Int64 and hope that numbers just not big enough.

@UnamedRus
Copy link
Contributor Author

Any ideas how to handle it?

Can we just convert minus to plus?


SELECT *
FROM
(
    SELECT number
    FROM numbers(18446744073709551605, 10)
) AS a
,
(
    SELECT number
    FROM numbers(18446744073709551605, 10)
) AS b
WHERE (a.number + 1) = b.number

Query id: 15a48d3d-a6c2-4c15-b386-f1bf26e7e63c

┌───────────────number─┬─────────────b.number─┐
│ 18446744073709551605 │ 18446744073709551606 │
│ 18446744073709551606 │ 18446744073709551607 │
│ 18446744073709551607 │ 18446744073709551608 │
│ 18446744073709551608 │ 18446744073709551609 │
│ 18446744073709551609 │ 18446744073709551610 │
│ 18446744073709551610 │ 18446744073709551611 │
│ 18446744073709551611 │ 18446744073709551612 │
│ 18446744073709551612 │ 18446744073709551613 │
│ 18446744073709551613 │ 18446744073709551614 │
└──────────────────────┴──────────────────────┘

It probably shouldn't break more than we have now, i guess.

@UnamedRus
Copy link
Contributor Author

SELECT
    18446744073709551605 - 1 AS x,
    CAST(x, 'UInt64') AS y

Query id: d77d5d86-d32f-4f87-ade6-6f547e6426c6

┌───x─┬────────────────────y─┐
│ -12 │ 18446744073709551604 │
└─────┴──────────────────────┘

@akuzm
Copy link
Contributor

akuzm commented Mar 16, 2021

Any ideas how to handle it?

No way, probably. Hash join uses binary comparison for keys, so it requires a supertype. Logical comparison with equals can work for different types w/o conversion, so a merge join would work. By the way, does it work if you enable experimental bigint types?

@UnamedRus
Copy link
Contributor Author

By the way, does it work if you enable experimental bigint types?

No.

No way, probably. Hash join uses binary comparison for keys, so it requires a supertype.

Ok, but for example we have Int64 and UInt64 as datatypes for keys, and have equals condition on them.

So we actually don't care for any Int64 keys < 0, because we already know that we can't find corresponding UInt64 key for them.
In that case can we safely convert it to UInt64 then?

@akuzm
Copy link
Contributor

akuzm commented Mar 16, 2021

So we actually don't care for any Int64 keys < 0, because we already know that we can't find corresponding UInt64 key for them.
In that case can we safely convert it to UInt64 then?

You're right, we don't need a supertype, but only a type that can hold any common value of the key types. For UInt64 and Int64, the common values are nonnegative Int64, which fit into UInt64. Now I start remembering that this is kind of what we wanted to do with automated type conversion for USING: e.g. we have left.key = right.key, and the key types different, so we transform the condition into left.key = castOrNull(<left key type>, right.key), and it just magically works. The right keys that don't fit the left type range are converted to Null, so the join condition doesn't hold for them. Maybe we can again apply this approach here?

@vdimir
Copy link
Member

vdimir commented Mar 16, 2021

For query SELECT a FROM t1 FULL JOIN t2 USING (a) we will have column that contains all values conatenated from t1.a and t2.a, so a in result table have to hold all values

@UnamedRus
Copy link
Contributor Author

UnamedRus commented Mar 16, 2021

For query SELECT a FROM t1 FULL JOIN t2 USING (a) we will have column that contains all values conatenated from t1.a and t2.a, so a in result table have to hold all values

CH would say a is ambiguous column, so you actually should have t1.a and t2.a colums with Int64 and UInt64 type.

I am wrong.

WITH
    t1 AS
    (
        SELECT 1 AS a
    ),
    t2 AS
    (
        SELECT -1 AS a
    )
SELECT *
FROM t1
FULL OUTER JOIN t2 USING (a)

Query id: 1b264356-3d4f-451e-8bcd-3f80f7d014ec

┌─a─┐
│ 1 │
└───┘
┌──a─┐
│ -1 │
└────┘

Postgresql just convert all to numberic:

 WITH t1 as (SELECT 18446744073709551605 as a), t2 as (SELECT -1 as a) SELECT pg_typeof(a), a FROM t1 FULL JOIN t2 USING (a)
 

pg_typeof | a
-- | --
numeric | 18446744073709551605
numeric | -1

@filimonov filimonov added the comp-joins JOINs label Mar 17, 2021
@alexey-milovidov alexey-milovidov added the st-discussion The story requires discussion /research / expert help / design & decomposition before will be taken label Jun 14, 2021
@alexey-milovidov
Copy link
Member

alexey-milovidov commented Jun 14, 2021

@vdimir

We don't need "least supertype" when doing JOIN.
What we actually need is the "most subtype". That always exists.

We already applying this method for IN (subquery).
We are converting all the values with accurateCase to the most subtype and throwing off the values that are unconvertible.

Ask @kitaisreal for the details if you need.

@alexey-milovidov
Copy link
Member

@akuzm

transform the condition into left.key = castOrNull(, right.key), and it just magically works. The right keys that don't fit the left type range are converted to Null, so the join condition doesn't hold for them. Maybe we can again apply this approach here?

Absolutely. We already have accurateCastOrNull function for this purpose.

@alexey-milovidov
Copy link
Member

For query SELECT a FROM t1 FULL JOIN t2 USING (a) we will have column that contains all values conatenated from t1.a and t2.a, so a in result table have to hold all values

@vdimir Let's apply it only for INNER JOIN.

@UnamedRus
Copy link
Contributor Author

It could be applied to LEFT/RIGHT join as well.
But appropriate type should be used (from correct joined table)

@alexey-milovidov
Copy link
Member

@vdimir is it still relevant?

@vdimir
Copy link
Member

vdimir commented Jan 19, 2022

is it still relevant?

I suppose yes, I've made attempts, but not finished yet. I'm going to continue this task

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
comp-joins JOINs feature st-discussion The story requires discussion /research / expert help / design & decomposition before will be taken
Projects
None yet
Development

No branches or pull requests

5 participants