-
Notifications
You must be signed in to change notification settings - Fork 6.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Transform rhs of IN when lhs is sub-expression of primary key during key condition analysis #50078
base: master
Are you sure you want to change the base?
Conversation
@amosbird @KochetovNicolai please help to review when you have time. Thanks! |
This is an automated comment for commit 08f0118 with description of existing statuses. It's updated for the latest CI running
|
The idea seem working, but code need some changes. |
45e8c93
to
42978ff
Compare
Fail tests looks unrelated @KochetovNicolai @amosbird |
8d42445
to
c796ca9
Compare
c796ca9
to
bc98aa1
Compare
@KochetovNicolai @alexey-milovidov can you help review when have time, thanks! |
178ae44
to
7c9ed21
Compare
@KochetovNicolai Resolved the conflict with your recent PreparedSet development. It seem that now it can build set for index from subquery when enabling analyzer 👍 |
@@ -473,7 +475,7 @@ MergeTreeSetIndex::MergeTreeSetIndex(const Columns & set_elements, std::vector<K | |||
::sort(indexes_mapping.begin(), indexes_mapping.end(), | |||
[](const KeyTuplePositionMapping & l, const KeyTuplePositionMapping & r) | |||
{ | |||
return std::tie(l.key_index, l.tuple_index) < std::tie(r.key_index, r.tuple_index); | |||
return std::forward_as_tuple(l.key_index, l.rhs_chain.size(), l.lhs_chain.size(), l.tuple_index) < std::forward_as_tuple(r.key_index, r.rhs_chain.size(), r.lhs_chain.size(), r.tuple_index); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit tricky line. Looks like we prefer the shortest monotonic chain?
Let's add a comment about it (and why).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I understand, in this logic, if a key column appear in multiple positions in the lhs of IN, then we will only chose one of them. If we have a
and f(a)
, then doesn't it will be better to choose the condition on a
instead of f(a)
? Pls correct me if I'm wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a comment to explain
488b7a4
to
e8e4af5
Compare
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Consider following case:
Following query can use the primary key by apply
toStartOfInterval(xxx, toIntervalSecond(5))
to'2020-02-02 20:02:20'
However, following query cannot:
This PR try to fix the issue by trying apply the expression to all elements of the set in the rhs to create a new set for IN analysis.
Close #49928, partially fix #50956
Documentation entry for user-facing changes