Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Transform rhs of IN when lhs is sub-expression of primary key during key condition analysis #50078

Open
wants to merge 34 commits into
base: master
Choose a base branch
from

Conversation

canhld94
Copy link
Contributor

@canhld94 canhld94 commented May 22, 2023

Changelog category (leave one):

  • Performance Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

  • Transform rhs of IN when lhs is sub-expression of primary key during key condition analysis

Consider following case:

CREATE TABLE x (ts DateTime) ORDER BY toStartOfInterval(ts, toIntervalSecond(5));

Following query can use the primary key by apply toStartOfInterval(xxx, toIntervalSecond(5)) to '2020-02-02 20:02:20'

SELECT * FROM x WHERE ts = '2020-02-02 20:02:20';

However, following query cannot:

SELECT * FROM x WHERE ts IN ('2020-02-02 20:02:20');

This PR try to fix the issue by trying apply the expression to all elements of the set in the rhs to create a new set for IN analysis.

Close #49928, partially fix #50956

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

Information about CI checks: https://clickhouse.com/docs/en/development/continuous-integration/

@canhld94
Copy link
Contributor Author

@amosbird @KochetovNicolai please help to review when you have time. Thanks!

@ucasfl ucasfl added the can be tested Allows running workflows for external contributors label May 22, 2023
@robot-ch-test-poll3 robot-ch-test-poll3 added the pr-performance Pull request with some performance improvements label May 22, 2023
@robot-ch-test-poll3
Copy link
Contributor

robot-ch-test-poll3 commented May 22, 2023

This is an automated comment for commit 08f0118 with description of existing statuses. It's updated for the latest CI running
The full report is available here
The overall status of the commit is 🔴 failure

Check nameDescriptionStatus
CI runningA meta-check that indicates the running CI. Normally, it's in success or pending state. The failed status indicates some problems with the PR🟡 pending
Fast testNormally this is the first check that is ran for a PR. It builds ClickHouse and runs most of stateless functional tests, omitting some. If it fails, further checks are not started until it is fixed. Look at the report to see which tests fail, then reproduce the failure locally as described here🔴 failure
Mergeable CheckChecks if all other necessary checks are successful🟢 success
Push to DockerhubThe check for building and pushing the CI related docker images to docker hub🟢 success
Style CheckRuns a set of checks to keep the code style clean. If some of tests failed, see the related log from the report🟢 success

@canhld94
Copy link
Contributor Author

The idea seem working, but code need some changes.

@KochetovNicolai KochetovNicolai self-assigned this May 24, 2023
@canhld94
Copy link
Contributor Author

canhld94 commented Jun 7, 2023

Fail tests looks unrelated @KochetovNicolai @amosbird

@canhld94
Copy link
Contributor Author

@KochetovNicolai @alexey-milovidov can you help review when have time, thanks!

@canhld94
Copy link
Contributor Author

@KochetovNicolai Resolved the conflict with your recent PreparedSet development. It seem that now it can build set for index from subquery when enabling analyzer 👍

@@ -473,7 +475,7 @@ MergeTreeSetIndex::MergeTreeSetIndex(const Columns & set_elements, std::vector<K
::sort(indexes_mapping.begin(), indexes_mapping.end(),
[](const KeyTuplePositionMapping & l, const KeyTuplePositionMapping & r)
{
return std::tie(l.key_index, l.tuple_index) < std::tie(r.key_index, r.tuple_index);
return std::forward_as_tuple(l.key_index, l.rhs_chain.size(), l.lhs_chain.size(), l.tuple_index) < std::forward_as_tuple(r.key_index, r.rhs_chain.size(), r.lhs_chain.size(), r.tuple_index);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit tricky line. Looks like we prefer the shortest monotonic chain?
Let's add a comment about it (and why).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I understand, in this logic, if a key column appear in multiple positions in the lhs of IN, then we will only chose one of them. If we have a and f(a), then doesn't it will be better to choose the condition on a instead of f(a)? Pls correct me if I'm wrong.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a comment to explain

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
can be tested Allows running workflows for external contributors pr-performance Pull request with some performance improvements
Projects
None yet
5 participants