
Uptime calculation improvement and 1-year uptime #2750

Merged · 16 commits · Aug 31, 2023
Conversation

@louislam (Owner) commented Feb 9, 2023

Try to improve the uptime calculation performance with an aggregate table.

However, the definition of uptime will be a little bit different:

| Window   | Before        | After                        |
|----------|---------------|------------------------------|
| 24 hours | 24 hours      | 24 hours (no change)         |
| 30 days  | 30 × 24 hours | 29 days + today from 00:00   |
| 1 year   | /             | 364 days + today from 00:00  |

Assume that the heartbeat interval is 20 seconds.

Before:
The best and worst case are both 3 × 60 × 24 × 30 = 129,600 for the 30-day uptime, which means each query sums up 129,600 numbers. This query is triggered 3 times per minute per monitor.

After:
The worst case is the 1-year uptime, which sums 3 × 60 × 24 + 364 = 4,684 numbers (a full day of raw beats plus 364 daily aggregates). The best case, right at 00:00, is 29 numbers (1-month) or 364 numbers (1-year). It should be a lot faster.

Using an aggregate table was actually suggested by ChatGPT; it also suggested a time-series database, but I am not considering that for now.
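For reference, the arithmetic behind these counts (an illustrative snippet, not code from this PR):

```js
// A 20 s heartbeat interval means 3 beats per minute.
const beatsPerDay = 3 * 60 * 24;        // 4,320 beats per day

// Before: a 30-day uptime query sums every beat in the window.
const before30Days = beatsPerDay * 30;  // 129,600 values per query

// After: the worst case (1-year) is today's raw beats plus 364 daily rows.
const afterWorstCase = beatsPerDay + 364;

console.log(before30Days, afterWorstCase); // 129600 4684
```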

@louislam added this to the 2.0.0 milestone Feb 9, 2023
@chakflying (Collaborator) commented Feb 9, 2023

Just a wild suggestion: in theory, maybe we can cache the values such that each update for a particular window only does 1 read and 1 write.

  • On startup, calculate the full uptime and cache the values total_duration and uptime_duration
  • Then on each heartbeat, read the last beat that is leaving the calculation window, e.g. the beat exactly 24 hours before now
  • If that beat is up, subtract its duration from both total_duration and uptime_duration; otherwise, subtract it only from total_duration
  • Store the latest beat and add its duration to total_duration and, depending on its status, uptime_duration
  • Calculate the percentage from the updated values

Haven't tried this and don't know if there are edge cases to handle tho.

Edit: Oops, I realized that if the beat leaving the window has a longer duration than the heartbeat interval, the same beat would end up being subtracted multiple times...

(Frame 1 and Frame 2: diagrams of the two edge cases)

I guess if we can somehow handle these 2 cases it would work?
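For what it's worth, a minimal sketch of this rolling-window idea (all names here are hypothetical, not Uptime Kuma's actual API; it sidesteps the double-subtraction problem above by evicting whole beats rather than partial durations):

```js
// Rolling-window uptime cache: each heartbeat costs one push plus a few
// evictions, instead of re-summing the entire window.
class UptimeCache {
    constructor(windowSeconds) {
        this.windowSeconds = windowSeconds;
        this.beats = [];         // beats currently inside the window (FIFO)
        this.totalDuration = 0;  // seconds covered by all cached beats
        this.uptimeDuration = 0; // seconds the monitor was up
    }

    // beat = { time: unix seconds, duration: seconds, up: boolean }
    addBeat(beat) {
        this.beats.push(beat);
        this.totalDuration += beat.duration;
        if (beat.up) {
            this.uptimeDuration += beat.duration;
        }

        // Evict beats that have fully left the window. Evicting whole
        // beats (never partial durations) means a beat can only ever be
        // subtracted once, at the cost of a slightly fuzzy window edge.
        const cutoff = beat.time - this.windowSeconds;
        while (this.beats.length > 0 && this.beats[0].time < cutoff) {
            const old = this.beats.shift();
            this.totalDuration -= old.duration;
            if (old.up) {
                this.uptimeDuration -= old.duration;
            }
        }
    }

    get uptime() {
        return this.totalDuration > 0
            ? this.uptimeDuration / this.totalDuration
            : 1;
    }
}
```

The window edge being fuzzy by up to one beat's duration may or may not be acceptable here; it trades a little precision for never subtracting the same beat twice.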

@louislam (Owner, author) commented:

Good point. I also rechecked the current logic, and it seems it does not handle these edge cases correctly either, but I am not quite sure.

I think for this part, it may be good to start by writing test cases first.

```js
// The current query (assuming it is executed via redbean-node's R.getRow(),
// as elsewhere in the codebase; the opening line is not in the original comment):
const result = await R.getRow(`
    SELECT
        -- SUM all durations, trimming off the part of a beat outside the time window
        SUM(
            CASE
                WHEN (JULIANDAY(\`time\`) - JULIANDAY(?)) * 86400 < duration
                THEN (JULIANDAY(\`time\`) - JULIANDAY(?)) * 86400
                ELSE duration
            END
        ) AS total_duration,
        -- SUM all uptime durations, trimming off the part of a beat outside the time window
        SUM(
            CASE
                WHEN (status = 1 OR status = 3)
                THEN
                    CASE
                        WHEN (JULIANDAY(\`time\`) - JULIANDAY(?)) * 86400 < duration
                        THEN (JULIANDAY(\`time\`) - JULIANDAY(?)) * 86400
                        ELSE duration
                    END
            END
        ) AS uptime_duration
    FROM heartbeat
    WHERE time > ?
        AND monitor_id = ?
`, [
    startTime, startTime, startTime, startTime, startTime,
    monitorID,
]);
```

@louislam (Owner, author) commented Mar 27, 2023

The uptime calculation actually gives me a bit of a headache, because when I tried to look into it, there were a lot of weird cases to handle.

I am rethinking the time-series database option. QuestDB seems quite promising, because it can sum up a large set of data with a simple SQL query. If it really gains a lot of performance, it may be the ultimate solution for #1740 too.

But I don't know its RAM usage; I will try to import 1,000,000 heartbeat records into QuestDB and test it.

Reference from QuestDB's README:
(image: benchmark chart)

@louislam (Owner, author) commented:
After some tests, I think QuestDB is really the way to go.

For example, I tried to sum up the 30-day uptime:

QuestDB result

Oracle Cloud free instance (2 cores + 1 GB RAM)

Execution time: around 2 ms–7 ms

(image: QuestDB query result)

SQLite result

My notebook (11th-gen i7, 8 cores + 16 GB RAM)

I don't know how to make the sqlite command display the execution time, so I ran it on my PC.

Execution time: around 69 ms–75 ms

So even though the Oracle Cloud instance is weak, it is still faster than SQLite on my PC.

(image: SQLite query result)

@louislam (Owner, author) commented Apr 6, 2023

But unfortunately, the memory usage of QuestDB is too large.

Will look into these databases later:

  • RedisTimeSeries
  • InfluxDB

(image: QuestDB memory usage)

@louislam modified the milestones: 1.22.0, 1.23.0 May 15, 2023
@louislam modified the milestones: 1.23.0, Pending Jun 29, 2023
@mabed-fr commented Jun 30, 2023

> But unfortunately, the memory usage of QuestDB is too large.
>
> Will look into these databases later:
>
> • RedisTimeSeries
> • InfluxDB

Hello,

I find it strange that QuestDB is so resource-hungry, because that contradicts their own presentation.

Docker Hub introduction: "QuestDB is an open-source database designed to make time-series lightning fast and easy."

Is it possible to test QuestDB in an external Docker container?

What is the final strategy? A TSDB plus a relational database (external MariaDB or SQLite)?

Otherwise, for my part, I find RedisTimeSeries / InfluxDB / QuestDB very good, and they would also improve the graphing side.

@louislam (Owner, author) commented Jul 1, 2023

I will likely stick with SQLite/MariaDB, as the setup is easier and it won't use a lot of RAM.

I may look into a sliding-window or rolling-window algorithm later.

@louislam changed the base branch from master to 2.0.X July 1, 2023 12:21
@louislam modified the milestones: Pending, 2.0.0 Jul 1, 2023
@mhkarimi1383 (Contributor) commented:

By using PostgreSQL, we could use TimescaleDB as an extension and activate it for some tables only. Also, I love Postgres more than MySQL 🙂

@veitorg commented Jul 10, 2023

Do you already have a timeframe for when this feature will be released?

@CommanderStorm (Collaborator) commented:

> Do you already have a timeframe for when this feature will be released?

See #2720 (comment)

@louislam linked an issue Jul 18, 2023 that may be closed by this pull request
@CommanderStorm mentioned this pull request Jul 18, 2023
@louislam (Owner, author) commented Aug 19, 2023

I think most time-series databases use too much memory, which is not ideal for Uptime Kuma (I hope Uptime Kuma can stay relatively lightweight), so I am going back to my original plan: aggregate tables.

It is going well; getting the 1-year uptime takes less than 1 ms.

```
# Subtest: Worst case
    # Subtest: get1YearUptime()
    ok 2 - get1YearUptime()
      ---
      duration_ms: 0.587246
```

(image: test run output)

Also, I added support for the native Node.js test runner along with this PR:
https://nodejs.org/api/test.html
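For illustration, a minimal sketch of what a daily aggregate table could look like. The table, column, and handle names (stat_daily, db, monitorID, oneYearAgo) are hypothetical stand-ins, not the actual schema from this PR:

```js
// One small row per monitor per day: a 1-year uptime query then reads
// at most ~365 rows per monitor instead of millions of heartbeats.
await db.run(`
    CREATE TABLE IF NOT EXISTS stat_daily (
        monitor_id INTEGER NOT NULL,
        timestamp  INTEGER NOT NULL,            -- start of the UTC day (unix seconds)
        up         INTEGER NOT NULL DEFAULT 0,  -- seconds up that day
        down       INTEGER NOT NULL DEFAULT 0,  -- seconds down that day
        PRIMARY KEY (monitor_id, timestamp)
    )
`);

// The 1-year uptime becomes a single aggregation over the small table:
const row = await db.get(`
    SELECT SUM(up) AS up, SUM(down) AS down
    FROM stat_daily
    WHERE monitor_id = ? AND timestamp >= ?
`, [monitorID, oneYearAgo]);

const uptime = row.up / (row.up + row.down);
```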

@mhkarimi1383 (Contributor) commented Aug 19, 2023

> I think most time-series databases use too much memory, so I am going back to my original plan: aggregate tables.
>
> It is going well; getting the 1-year uptime takes less than 1 ms.
>
> ```
> # Subtest: Worst case
>     # Subtest: get1YearUptime()
>     ok 2 - get1YearUptime()
>       ---
>       duration_ms: 0.587246
> ```

The size of the database will get bigger and bigger soon I think 🤔

But query performance is great...

@louislam (Owner, author) commented:

> The size of the database will get bigger and bigger soon I think 🤔
>
> But query performance is great...

I tried to make these tables as small as possible; the columns are all INTs and FLOATs.

(image: aggregate table schema)

Also I will try to save some space by eliminating duplicate strings in #3595

@mhkarimi1383 (Contributor) commented:

I think one record per monitor in the aggregated tables is enough.

Can you tell me whether we need more than that? If yes, we could apply a more restrictive retention policy to them.

@mhkarimi1383 (Contributor) commented:

For that, we could use a custom CreateOrUpdate function to check whether the record already exists.
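A hedged sketch of that idea, reusing the hypothetical stat_daily table from the earlier snippet. With SQLite 3.24+, a single ON CONFLICT upsert can replace an explicit exists-check (MariaDB's equivalent is ON DUPLICATE KEY UPDATE):

```js
// Hypothetical createOrUpdate: fold one heartbeat into today's daily row.
// The ON CONFLICT clause turns the INSERT into an in-place UPDATE when a
// (monitor_id, timestamp) row already exists, so no separate read is needed.
async function createOrUpdate(db, monitorID, dayTimestamp, isUp, duration) {
    await db.run(`
        INSERT INTO stat_daily (monitor_id, timestamp, up, down)
        VALUES (?, ?, ?, ?)
        ON CONFLICT (monitor_id, timestamp) DO UPDATE SET
            up = up + excluded.up,
            down = down + excluded.down
    `, [
        monitorID,
        dayTimestamp,
        isUp ? duration : 0,
        isUp ? 0 : duration,
    ]);
}
```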

@camilonova commented:

Table size is a decent downside if performance is great.

@mhkarimi1383 (Contributor) commented:

> Table size is a decent downside if performance is great.

Big tables may make select queries slower...

@camilonova commented:

> Big tables may make select queries slower...

Isn't the whole point of a database to have big tables?

@aciducen1995 commented Nov 7, 2023

Is this WIP or already a feature? I can't find a setting to change the status page uptime to monthly etc. in the latest version.

@chakflying (Collaborator) commented:

There is no such feature available.


Successfully merging this pull request may close these issues.

Yearly uptime