CREATE FORECASTING MODEL #6861

Open
1 task done
tomhuds opened this issue Nov 28, 2022 · 11 comments
Labels
bug Something isn't working

Comments

@tomhuds
Contributor

tomhuds commented Nov 28, 2022

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

When making bulk time series predictions (e.g., the query below), I would expect HORIZON rows at the end.

  • Currently, if horizon = 1, 0 horizon rows are shown
  • Currently, if horizon = 3, 2 horizon rows are shown

Example of a bulk TS prediction query:

SELECT m.received_at as received_at_model,
t.received_at as received_at_input,
t.temp as temp_true,
m.temp as temp_pred,
m.temp_explain as temp_pred_explain
FROM mindsdb.model_v5 as m
JOIN files.dataset as t
WHERE t.received_at > '2022-08-01 07:00:00';

Expected Behavior

HORIZON rows at the end

https://www.loom.com/share/e2a37269852f4f0082e187c57e893b33

Steps To Reproduce

Lightwood staging, MindsDB staging

https://docs.google.com/document/d/1_duUhNR_hEta0sZrQyo7A8WlQj1HjYyI5aWiPpxx9cw/edit?usp=sharing

Anything else?

No response

@tomhuds tomhuds added the bug Something isn't working label Nov 28, 2022
@tomhuds tomhuds transferred this issue from mindsdb/mindsdb Nov 28, 2022
@paxcema
Member

paxcema commented Nov 30, 2022

Okay, this is actually expected behavior. What the video above presents as the correct behavior depends on the Lightwood handler inside MindsDB, which will need a minor fix (skip to the bottom for that). Here is more detail for transparency:

What is happening

All time series predictions in Lightwood are handled as "bulk" predictions: you take a set of measurements and transform some of the columns into arrays, depending on the problem definition.

After this transformation, there will always be "cold-start" rows where the number of previous entries is not enough to fill the entire WINDOW-length array, so those arrays contain Nones. However, assuming the input data is large enough, there will be at least one row with complete context for each group in the DF. Let's take one of these rows and analyze the current behavior:

timestamp | group | target
30-11-2022 20:08:01 | 'A' | 4.04

Let's say the above timestamp is T. The transformed DF will look like this:

timestamp | group | __ts_previous_targets | target
[T-W+1, ..., T-1, T] | 'A' | [value_at_T-W, ..., value_at_T-1] | 4.04

And the mixers will be trained to predict the value in target given the array in timestamp and the context in __ts_previous_targets (plus all other columns if deemed useful).
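The per-group transformation described above can be sketched in plain Python as follows. This is a minimal illustration, assuming the group's rows are already sorted by timestamp; the function name `window_rows` and the use of plain lists instead of a DataFrame are assumptions, not Lightwood's actual internals.

```python
from typing import List, Optional, Tuple

Row = Tuple[List[Optional[int]], List[Optional[float]], float]


def window_rows(timestamps: List[int], targets: List[float], window: int) -> List[Row]:
    """Build (timestamp window, previous targets, target) triples for one group.

    For the row at position t (hypothetical helper, illustration only):
      - the timestamp array covers positions t-W+1 .. t (includes the row itself),
      - the previous-targets array covers positions t-W .. t-1 (excludes it),
      - the supervision target is the value at position t.
    Positions before the start of the series are padded with None (cold start).
    """
    rows: List[Row] = []
    for t in range(len(targets)):
        ts_window = [timestamps[i] if i >= 0 else None for i in range(t - window + 1, t + 1)]
        prev_targets = [targets[i] if i >= 0 else None for i in range(t - window, t)]
        rows.append((ts_window, prev_targets, targets[t]))
    return rows


for row in window_rows([1, 2, 3, 4], [10.0, 11.0, 12.0, 13.0], window=2):
    print(row)
```

With `window=2`, the first two rows are cold-start rows because their previous-targets arrays still contain `None`; only the last two have complete context.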

Notice that we can't add the actual value to the priors array, because otherwise there would be no supervision signal. It needs to be like this. This means that in bulk training, and by extension in bulk prediction, the forecast for each row always starts at that row's latest available timestamp.
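To see why the actual value cannot go into the priors array, here is a toy check in plain Python (not Lightwood code; the helper names are hypothetical). If the context array also contained the value being predicted, a trivial "model" that copies the last array element would score zero error, so training would have nothing to learn.

```python
from typing import List


def copy_last(prev: List[float]) -> float:
    """Trivial 'model': predict the most recent value in the context array."""
    return prev[-1]


def mean_abs_error(series: List[float], window: int, leak: bool) -> float:
    """Score copy_last on every row with complete context.

    leak=False: context is positions t-W .. t-1 (the correct setup).
    leak=True:  context is positions t-W+1 .. t, i.e. it includes the target itself.
    """
    errors = []
    for t in range(window, len(series)):
        prev = series[t - window + 1 : t + 1] if leak else series[t - window : t]
        errors.append(abs(copy_last(prev) - series[t]))
    return sum(errors) / len(errors)


series = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0]
print(mean_abs_error(series, window=2, leak=True))   # 0.0: the answer is in the input
print(mean_abs_error(series, window=2, leak=False))  # 1.75
```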

For the simplest case of horizon=1, this means a bulk prediction call will forecast a target value for precisely the timestamps available in the DF, nothing more and nothing less. If this bulk prediction covers data for which we already have actuals (as in the common MindsDB use case), the output is of no practical use. However, pass in a DF with future timestamps and you get a correct forecast like any other.

If we now consider horizon=H, and make the latest row in the DF match the present, then H values will be predicted for this last row: the first one aligned with the last timestamp, and the rest for future timestamps that are yet to come. In the MindsDB world, this translates into the behavior observed above, where you get H-1 out-of-sample predictions instead of the expected H. From Lightwood's point of view, however, this is not wrong.
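The alignment above can be sketched as follows, assuming evenly spaced integer timestamps (a simplification). For the last row at timestamp T, the H predicted steps cover T, T+step, ..., T+(H-1)·step, so only H-1 of them fall after the data. Both function names are hypothetical.

```python
from typing import List


def forecast_timestamps(last_ts: int, horizon: int, step: int = 1) -> List[int]:
    """Timestamps covered by the H-step forecast of the last row.

    The first step is aligned with the row's own timestamp (the training
    alignment); the remaining H-1 steps are in the future.
    """
    return [last_ts + k * step for k in range(horizon)]


def out_of_sample_count(last_ts: int, horizon: int, step: int = 1) -> int:
    """Number of forecast steps strictly after the last observed timestamp."""
    return sum(1 for ts in forecast_timestamps(last_ts, horizon, step) if ts > last_ts)


for h in (1, 3):
    print(h, out_of_sample_count(last_ts=100, horizon=h))
# 1 0
# 3 2
```

This reproduces the counts reported at the top of the issue: horizon = 1 yields 0 out-of-sample rows and horizon = 3 yields 2.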

Solution

As seen in the previous section, there is no way to change the alignment in Lightwood, because it must stay this way for the supervision signal to exist and flow.

What we can do in the Lightwood handler, however, is activate row timestamp inference for all time series predictions, including DATE > '$SPECIFIC_DATE' cases like the one in the OP. That way you always get a new row at the very end, which starts at the timestamp immediately after the last passed timestamp and spans HORIZON rows.
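A minimal sketch of the proposed fix, assuming the series has a regular sampling interval that can be estimated from the median gap between timestamps. This is an assumption for illustration; the handler's actual inference logic may differ, and the function names are hypothetical. Appending one inferred row at `last + step` and forecasting from it yields HORIZON strictly future timestamps.

```python
from statistics import median
from typing import List


def infer_step(timestamps: List[int]) -> int:
    """Estimate the sampling interval as the median gap between consecutive timestamps.

    Requires at least two timestamps (median() raises on an empty list).
    """
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return int(median(gaps))


def future_window(timestamps: List[int], horizon: int) -> List[int]:
    """Forecast timestamps when an inferred row is appended right after the data.

    The inferred row sits at last + step and the first forecast step is aligned
    with it, so all `horizon` steps fall strictly after the last observed timestamp.
    """
    step = infer_step(timestamps)
    start = timestamps[-1] + step
    return [start + k * step for k in range(horizon)]


print(future_window([0, 10, 20, 30], horizon=3))  # [40, 50, 60]
```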

@paxcema
Member

paxcema commented Dec 13, 2022

Part of the solution will actually be Lightwood-side. Fixed in mindsdb/lightwood#1075: the idea is to enable forced out-of-sample row inference even when the offset is not set manually. This will be made optional, since always breaking n_rows_in == n_rows_out seems too strong a change.
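The trade-off can be sketched with a hypothetical `force_infer` flag (the real option name in Lightwood is not given in this thread). By default the output keeps one prediction row per input row; only when the flag is set is an extra inferred row appended, breaking `n_rows_in == n_rows_out`.

```python
from typing import List


def rows_to_predict(timestamps: List[int], step: int, force_infer: bool = False) -> List[int]:
    """Return the row timestamps a bulk prediction would produce forecasts for.

    Hypothetical helper, illustration only.
    Default: one output row per input row (n_rows_in == n_rows_out).
    force_infer=True: append one inferred row right after the last timestamp,
    so the last forecast starts strictly in the future.
    """
    rows = list(timestamps)
    if force_infer:
        rows.append(timestamps[-1] + step)
    return rows


data = [0, 10, 20, 30]
assert len(rows_to_predict(data, step=10)) == len(data)
assert rows_to_predict(data, step=10, force_infer=True)[-1] == 40
```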

Upstream, the mindsdb handler can specify this flag at prediction time.

@tomhuds
Copy link
Contributor Author

tomhuds commented Jan 31, 2023

Aiming for EOD Tuesday.

EDIT: Unlike in the OP, LATEST now works fine, but specific date cutoffs return all predictions from that point onwards. This seems to be a data-gathering issue within MindsDB rather than inside Lightwood.

@tomhuds
Copy link
Contributor Author

tomhuds commented Feb 7, 2023

@tomhuds
Copy link
Contributor Author

tomhuds commented Feb 14, 2023

Do in Q1B

@tomhuds tomhuds added enhancement New feature or request and removed bug Something isn't working labels Feb 14, 2023
@tomhuds tomhuds changed the title [Bug]: Timeseries model returns HORIZON-1 rows when making bulk predictions [Bug]: Timeseries refactor Feb 21, 2023
@tomhuds tomhuds changed the title [Bug]: Timeseries refactor Timeseries refactor Mar 7, 2023
@tomhuds tomhuds added bug Something isn't working and removed enhancement New feature or request discussion Further discussion is required labels Mar 29, 2023
@tomhuds tomhuds assigned StpMax and unassigned paxcema Apr 4, 2023
@tomhuds tomhuds self-assigned this May 25, 2023
@paxcema paxcema changed the title Timeseries refactor [TS] Unintuitive prediction behavior Jun 23, 2023
@paxcema
Member

paxcema commented Jun 23, 2023

@tomhuds suggest moving this issue into the mindsdb repo, as all lightwood-side changes have been completed.

@tomhuds tomhuds unassigned StpMax and ea-rus Jul 7, 2023
@tomhuds
Copy link
Contributor Author

tomhuds commented Jul 13, 2023

Did testing - see 'proposed' sheet, rows 62 and below:
https://docs.google.com/spreadsheets/d/1xGsyTcfojNpsZEJ6N0GxzFrS-IwsdoREA_6xFHlw2_w/edit#gid=0&range=A62

SELECT saledate, ma FROM example_db.demo_data.house_sales
WHERE type = 'house' AND bedrooms=2;

CREATE MODEL mindsdb.house_sales_model_test_v3 
FROM example_db 
    (SELECT saledate, ma FROM demo_data.house_sales
     WHERE type = 'house' AND bedrooms=2) 
PREDICT ma 
ORDER BY saledate 
WINDOW 4 
HORIZON 2;


DESCRIBE house_sales_model_test_v3;

SELECT m.saledate as date, m.ma as forecast
  FROM mindsdb.house_sales_model_test_v3 as m 
  JOIN example_db.demo_data.house_sales as t
  WHERE t.saledate = LATEST;

SELECT m.saledate as date, m.ma as forecast
  FROM mindsdb.house_sales_model_test_v3 as m 
  JOIN example_db.demo_data.house_sales as t
  WHERE t.saledate = LATEST-1;

SELECT m.saledate as date, m.ma as forecast
  FROM mindsdb.house_sales_model_test_v3 as m 
  JOIN example_db.demo_data.house_sales as t
  WHERE t.saledate >= '2015-12-31'
  AND t.saledate <= '2017-12-31';

SELECT m.saledate as date, m.ma as forecast
  FROM mindsdb.house_sales_model_test_v3 as m 
  JOIN example_db.demo_data.house_sales as t
  WHERE t.saledate = '2015-12-31';

SELECT m.saledate as date, m.ma as forecast
  FROM mindsdb.house_sales_model_test_v3 as m 
  JOIN example_db.demo_data.house_sales as t
  WHERE t.saledate = '2007-09-30';

@tomhuds tomhuds removed their assignment Jul 14, 2023
@paxcema paxcema transferred this issue from mindsdb/lightwood Jul 14, 2023
@tomhuds
Contributor Author

tomhuds commented Aug 10, 2023

Note: the JOIN should be optional, given that the data source is obvious.

@tomhuds tomhuds changed the title [TS] Unintuitive prediction behavior CREATE FORECASTING MODEL Aug 22, 2023
@paxcema
Member

paxcema commented Aug 25, 2023

@tomhuds feels like we should create separate issues to track the recently discussed plan? This one is a bit overloaded.

@paxcema
Member

paxcema commented Sep 14, 2023

Priority TBD given exercise with ML team this week.

@tomhuds
Contributor Author

tomhuds commented Sep 14, 2023

skipping for now - focus on anomaly detection v2

5 participants