CREATE FORECASTING MODEL #6861

tomhuds · 2022-11-28T14:51:16Z

Is there an existing issue for this?

I have searched the existing issues

Current Behavior

When making bulk timeseries predictions (e.g. the query below), I would expect there to be HORIZON number of rows at the end.

Currently, if horizon = 1, 0 horizon rows are shown.
Currently, if horizon = 3, 2 horizon rows are shown.

Example of bulk ts predictions query:

SELECT m.received_at as received_at_model,
t.received_at as received_at_input,
t.temp as temp_true,
m.temp as temp_pred,
m.temp_explain as temp_pred_explain
FROM mindsdb.model_v5 as m
JOIN files.dataset as t
WHERE t.received_at > '2022-08-01 07:00:00';

Expected Behavior

HORIZON rows at the end

https://www.loom.com/share/e2a37269852f4f0082e187c57e893b33

Steps To Reproduce

Lightwood staging, Mindsdb staging

https://docs.google.com/document/d/1_duUhNR_hEta0sZrQyo7A8WlQj1HjYyI5aWiPpxx9cw/edit?usp=sharing

Anything else?

No response

paxcema · 2022-11-30T20:13:37Z

Okay, this is actually expected behavior. What passes as the correct behavior in the video above has to do with the lightwood handler inside mindsdb, which will need a minor fix (skip to the bottom for that). However, here is more detail for full disclosure:

What is happening

All time series predictions in lightwood are handled as "bulk" predictions. You just take a bunch of measurements and transform some of the columns into arrays, depending on the problem definition.

At the end of this transformation, there will always be "cold-start" rows where the number of previous entries is not enough to fill the entire WINDOW-length array, so they contain None values. However, assuming the input data is "big enough", there will be at least one row with complete context for each group in the DF. Let's take one of these and analyze the current behavior:

timestamp | group | target
30-11-2022 20:08:01 | 'A' | 4.04

Let's say the above timestamp is T. The transformed DF will look like this:

timestamp | group | __ts_previous_targets | target
[T-W+1, ..., T-1, T] | 'A' | [value_at_T-W, ..., value_at_T-1] | 4.04

And the mixers will be trained to predict the value in target given the array in timestamp and the context in __ts_previous_targets (plus all other columns if deemed useful).

Notice how we can't add the actual value into the priors array, because otherwise we end up with no supervision signal. It needs to be like this. This means that when bulk training, and by extension when bulk predicting, the prediction always starts at the latest available timestamp for each row.
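To make the alignment concrete, here is a minimal sketch of the windowing step in plain Python. It is not lightwood's actual implementation: the function name `add_previous_targets` and the toy values are hypothetical, and a single group is assumed. It only illustrates that each row's context excludes the row's own target, and that cold-start rows are padded with None.

```python
from typing import List, Optional


def add_previous_targets(
    targets: List[float], window: int
) -> List[List[Optional[float]]]:
    """Build a WINDOW-length context of prior targets for every row.

    Hypothetical helper for illustration only. The context for row i
    holds the `window` values strictly before i, so the row's own target
    is never part of its input. Rows without enough history (cold start)
    are left-padded with None.
    """
    contexts = []
    for i in range(len(targets)):
        history = targets[max(0, i - window):i]  # excludes targets[i]
        padding = [None] * (window - len(history))
        contexts.append(padding + history)
    return contexts


# Toy series for a single group (made-up values).
targets = [1.0, 2.0, 3.0, 4.0, 5.0]
contexts = add_previous_targets(targets, window=3)

for context, target in zip(contexts, targets):
    print(context, "->", target)
```

The first three rows are cold-start rows containing at least one None; from the fourth row onwards the context is complete.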

For the simplest case of horizon=1, this means a bulk prediction call will forecast a target value for precisely the timestamps available in the DF. Nothing more, nothing less. If this bulk prediction corresponds to data where we already have actuals (like in the common mindsdb use case), then this is worthless. However, plug in a DF with any other future timestamps and you would get a correct forecast like any other.

If we now consider horizon=H, and make it so that the latest row in the DF matches the present, we know that H values will be predicted for this last row. The first one is aligned with the last timestamp, and the rest fall on future timestamps. In the MindsDB world, this translates into the behavior observed above, where you get H-1 out-of-sample predictions, which is not the expectation. From lightwood's point of view, however, this is not wrong.
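The H-1 count can be checked with a small sketch. The helper names `forecast_offsets` and `out_of_sample_count` are hypothetical and exist only to illustrate the alignment described here: offsets are measured in steps relative to the last row's own timestamp T, and the first forecast sits at offset 0.

```python
from typing import List


def forecast_offsets(horizon: int) -> List[int]:
    """Step offsets, relative to the row's timestamp T, covered by one row's forecast.

    The first prediction is aligned with T itself (offset 0), so a horizon
    of H covers T, T+1, ..., T+H-1.
    """
    return list(range(horizon))


def out_of_sample_count(horizon: int) -> int:
    """Number of forecast steps that fall strictly after T."""
    return sum(1 for offset in forecast_offsets(horizon) if offset > 0)


for h in (1, 3):
    print(f"horizon={h}: {out_of_sample_count(h)} out-of-sample rows")
```

This reproduces the numbers reported in the OP: 0 rows for horizon=1 and 2 rows for horizon=3.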

Solution

As seen in the previous section, there is no way we can change the alignment in lightwood because we need it to be like this in order for the supervision signal to exist and flow.

What we can do in the lightwood handler, however, is to activate row timestamp inference for all time series predictions, including DATE > '$SPECIFIC_DATE' cases like in the OP, so that you always get a new row at the very end which starts at the timestamp that comes immediately after the last passed timestamp, and which spans HORIZON rows.
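A rough sketch of what row timestamp inference could look like, assuming evenly spaced data and reusing the last observed time delta. The names `infer_next_timestamp` and `out_of_sample_timestamps` are hypothetical, the one-minute spacing is made up, and the real handler logic may differ.

```python
from datetime import datetime
from typing import List


def infer_next_timestamp(timestamps: List[datetime]) -> datetime:
    """Guess the timestamp right after the last one passed in.

    Assumption: the series is evenly spaced, so the last observed delta
    is reused. At least two timestamps are needed to measure it.
    """
    if len(timestamps) < 2:
        raise ValueError("need at least two timestamps to infer the spacing")
    delta = timestamps[-1] - timestamps[-2]
    return timestamps[-1] + delta


def out_of_sample_timestamps(
    timestamps: List[datetime], horizon: int
) -> List[datetime]:
    """Timestamps covered by an inferred row that starts right after the data ends."""
    first = infer_next_timestamp(timestamps)
    delta = first - timestamps[-1]
    return [first + k * delta for k in range(horizon)]


# Made-up history with one-minute spacing, ending at the example timestamp T.
history = [
    datetime(2022, 11, 30, 20, 6, 1),
    datetime(2022, 11, 30, 20, 7, 1),
    datetime(2022, 11, 30, 20, 8, 1),
]
future = out_of_sample_timestamps(history, horizon=3)
for ts in future:
    print(ts.isoformat())
```

With the inferred row in place, the forecast spans exactly HORIZON timestamps, all of them after the last one passed in.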

paxcema · 2022-12-13T01:24:31Z

Part of the solution will actually be Lightwood-side. This is fixed in mindsdb/lightwood#1075; the idea is to enable forced out-of-sample row inference even when the offset is not manually set. This will be made optional, as breaking the n_rows_in == n_rows_out invariant seems too strong of a change otherwise.

Upstream, the mindsdb handler can specify this flag at prediction time.

tomhuds · 2023-01-31T16:29:59Z

Aiming for EOD Tuesday.

EDIT: Current situation (as opposed to OP) is that LATEST works fine, but specific date cutoffs return all predictions from that point onwards. However, it seems like a data gathering issue within mindsdb rather than inside lightwood.

tomhuds · 2023-02-07T16:47:25Z

Finalise discussion (https://docs.google.com/spreadsheets/d/1xGsyTcfojNpsZEJ6N0GxzFrS-IwsdoREA_6xFHlw2_w/edit#gid=398055419) and make changes

tomhuds · 2023-02-14T16:45:12Z

Do in Q1B

paxcema · 2023-06-23T16:20:05Z

@tomhuds suggest moving this issue into the mindsdb repo, as all lightwood-side changes have been completed.

tomhuds · 2023-07-13T16:14:02Z

Did testing - see 'proposed' sheet, rows 62 and below:
https://docs.google.com/spreadsheets/d/1xGsyTcfojNpsZEJ6N0GxzFrS-IwsdoREA_6xFHlw2_w/edit#gid=0&range=A62

SELECT saledate, ma FROM example_db.demo_data.house_sales
WHERE type = 'house' AND bedrooms=2;

CREATE MODEL mindsdb.house_sales_model_test_v3 
FROM example_db 
    (SELECT saledate, ma FROM demo_data.house_sales
     WHERE type = 'house' AND bedrooms=2) 
PREDICT ma 
ORDER BY saledate 
WINDOW 4 
HORIZON 2;


DESCRIBE house_sales_model_test_v3;

SELECT m.saledate as date, m.ma as forecast
  FROM mindsdb.house_sales_model_test_v3 as m 
  JOIN example_db.demo_data.house_sales as t
  WHERE t.saledate = LATEST;

SELECT m.saledate as date, m.ma as forecast
  FROM mindsdb.house_sales_model_test_v3 as m 
  JOIN example_db.demo_data.house_sales as t
  WHERE t.saledate = LATEST-1;

SELECT m.saledate as date, m.ma as forecast
  FROM mindsdb.house_sales_model_test_v3 as m 
  JOIN example_db.demo_data.house_sales as t
  WHERE t.saledate >= '2015-12-31'
  AND t.saledate <= '2017-12-31';

SELECT m.saledate as date, m.ma as forecast
  FROM mindsdb.house_sales_model_test_v3 as m 
  JOIN example_db.demo_data.house_sales as t
  WHERE t.saledate = '2015-12-31';

SELECT m.saledate as date, m.ma as forecast
  FROM mindsdb.house_sales_model_test_v3 as m 
  JOIN example_db.demo_data.house_sales as t
  WHERE t.saledate = '2007-09-30';

tomhuds · 2023-08-10T00:15:12Z

Note: JOIN should be optional, given the data is obvious

paxcema · 2023-08-25T00:17:26Z

@tomhuds feels like we should create separate issues to track the recently discussed plan? This one is a bit overloaded.

paxcema · 2023-09-14T01:20:17Z

Priority TBD given exercise with ML team this week.

tomhuds · 2023-09-14T21:49:03Z

skipping for now - focus on anomaly detection v2

tomhuds added the bug Something isn't working label Nov 28, 2022

tomhuds assigned paxcema Nov 28, 2022

tomhuds transferred this issue from mindsdb/mindsdb Nov 28, 2022

This was referenced Nov 30, 2022

[fix] off-by-one length in prev TS target column mindsdb/lightwood#1058

Merged

Investigate how a user can get single predictions on timeseries models mindsdb/lightwood#1052

Closed

Ricram2 added the discussion Further discussion is required label Dec 5, 2022

paxcema mentioned this issue Dec 13, 2022

Force infer_row in bulk ts predictions mindsdb/lightwood#1075

Merged

tomhuds added enhancement New feature or request and removed bug Something isn't working labels Feb 14, 2023

tomhuds changed the title ~~[Bug]: Timeseries model returns HORIZON-1 rows when making bulk predictions~~ [Bug]: Timeseries refactor Feb 21, 2023

tomhuds mentioned this issue Feb 21, 2023

[REQUEST] Syntax for bigger than latest plus some offset #2844

Closed

tomhuds changed the title ~~[Bug]: Timeseries refactor~~ Timeseries refactor Mar 7, 2023

tomhuds added bug Something isn't working and removed enhancement New feature or request discussion Further discussion is required labels Mar 29, 2023

tomhuds assigned StpMax and unassigned paxcema Apr 4, 2023

StpMax mentioned this issue Apr 10, 2023

Add additional prediction row to TS results #5539

Merged

9 tasks

tomhuds assigned ea-rus May 10, 2023

tomhuds self-assigned this May 25, 2023

paxcema changed the title ~~Timeseries refactor~~ [TS] Unintuitive prediction behavior Jun 23, 2023

tomhuds unassigned StpMax and ea-rus Jul 7, 2023

tomhuds removed their assignment Jul 14, 2023

paxcema transferred this issue from mindsdb/lightwood Jul 14, 2023

tomhuds assigned StpMax, paxcema and tomhuds Jul 17, 2023

tomhuds changed the title ~~[TS] Unintuitive prediction behavior~~ CREATE FORECASTING MODEL Aug 22, 2023

tomhuds unassigned StpMax and tomhuds Aug 22, 2023
