Is using continuous and categorical features with ts_learner possible? If not, how can window_len in get_tabular_dls be set? #766

Open
CMobley7 opened this issue May 10, 2023 · 5 comments

Comments

@CMobley7

CMobley7 commented May 10, 2023

My apologies for the dumb question. I have a dataframe with a target variable plus continuous and categorical features. The categorical features are dynamic (they change over time). I'd like to train a time series model, such as TSTPlus, on a sliding window of these features that doesn't include the target. I plan to test this with both a categorical and a continuous target, but the examples below assume a categorical target. Unfortunately, I'm struggling to work out how to do this.

splits = TimeSplitter(valid_size=0.15)(df_classification.index)
procs = [Categorify, FillMissing, Normalize]

to = get_tabular_ds(
    df_classification,
    procs=procs,
    cat_names=cat_names,
    cont_names=cont_names,
    y_names="triggers",
    splits=splits,
)

dls = to.dataloaders(bs=64, seq_len=20, seq_first=True)

class_weights = compute_class_weight("balanced", classes=[-1, 0, 1], y=to.train.y)
class_weights = torch.tensor(class_weights, dtype=torch.float32)

learn = ts_learner(
    dls,
    TSTPlus,
    metrics=[F1Score(average="macro")],
    loss_func=CrossEntropyLossFlat(weight=class_weights),
    lr=1e-4,
)

X, y = apply_sliding_window(df_classification, window_len=20, horizon=0, x_vars=slice(None, -1), y_vars=-1)

class_weights = compute_class_weight("balanced", classes=[-1, 0, 1], y=y)
class_weights = torch.tensor(class_weights, dtype=torch.float32)

splits = TimeSplitter(valid_size=0.15)(y)
tfms  = [None, [TSClassification()]]
batch_tfms = [TSStandardize(by_sample=False, by_var=True)]
dls = get_ts_dls(X, y, splits=splits, tfms=tfms, batch_tfms=batch_tfms, inplace=False)

learn = ts_learner(
    dls,
    TSTPlus,
    metrics=[F1Score(average="macro")],
    loss_func=CrossEntropyLossFlat(weight=class_weights),
    lr=1e-4,
)

Using apply_sliding_window with window_len=20 and get_ts_dls appears to produce a data loader with the right window size, but it assumes continuous variables. get_tabular_ds allows categorical variables but doesn't have a window_len parameter. I tried converting it to a dataloader with dls = to.dataloaders(bs=64, seq_len=20, seq_first=True), but I couldn't tell whether this applied the desired window, and it fails when running

learn = ts_learner(
    dls,
    TSTPlus,
    metrics=[F1Score(average="macro")],
    loss_func=CrossEntropyLossFlat(weight=class_weights),
    lr=1e-4,
)

I get the following error:

AttributeError                            Traceback (most recent call last)
Cell In[43], line 1
----> 1 learn = ts_learner(
      2     dls,
      3     TSTPlus,
      4     metrics=[F1Score(average="macro")],
      5     loss_func=CrossEntropyLossFlat(weight=class_weights),
      6     lr=1e-4,
      7 )

File ~/.../site-packages/tsai/learner.py:549, in ts_learner(dls, arch, c_in, c_out, seq_len, d, splitter, loss_func, opt_func, lr, cbs, metrics, path, model_dir, wd, wd_bn_bias, train_bn, moms, train_metrics, valid_metrics, **kwargs)
    547     if arch is None: arch = InceptionTimePlus
    548     elif isinstance(arch, str): arch = get_arch(arch)
--> 549     model = build_ts_model(arch, dls=dls, c_in=c_in, c_out=c_out, seq_len=seq_len, d=d, **kwargs)
    550 if hasattr(model, "backbone") and hasattr(model, "head"):
    551     splitter = ts_splitter

File ~/.../site-packages/tsai/models/utils.py:147, in build_ts_model(arch, c_in, c_out, seq_len, d, dls, device, verbose, pretrained, weights_path, exclude_head, cut, init, arch_config, **kwargs)
    145 device = ifnone(device, default_device())
    146 if dls is not None:
--> 147     c_in = ifnone(c_in, dls.vars)
    148     c_out = ifnone(c_out, dls.c)
    149     seq_len = ifnone(seq_len, dls.len)
...
    172 res = [t for t in att.attrgot(k) if t is not None]
--> 173 if not res: raise AttributeError(k)
    174 return res[0] if len(res)==1 else L(res)

AttributeError: vars
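
A note on the traceback above: line 147 of build_ts_model passes dls.vars as an argument to ifnone, and Python evaluates arguments before the call is made. Judging only from the line shown, supplying c_in explicitly would therefore probably not avoid the error on loaders that lack vars. A minimal sketch of that behaviour follows. FakeTabularLoaders is a hypothetical stand-in (not a tsai or fastai class), and ifnone is re-implemented locally so the snippet runs without tsai installed.

```python
def ifnone(a, b):
    """Return b when a is None, else a (local copy of the small helper)."""
    return b if a is None else a


class FakeTabularLoaders:
    """Hypothetical stand-in for loaders that do not expose `vars`."""


def resolve_c_in(c_in, dls):
    # Same shape as the traceback line: `dls.vars` is looked up
    # before `ifnone` runs, even when `c_in` is already set.
    return ifnone(c_in, dls.vars)


def try_resolve(c_in, dls):
    # Capture the failure as a string so the outcome can be inspected.
    try:
        return resolve_c_in(c_in, dls)
    except AttributeError as err:
        return f"AttributeError: {err}"


result = try_resolve(c_in=12, dls=FakeTabularLoaders())
print(result)
```

If this reading is right, the loaders themselves need to provide vars, c, and len, which is what the time series loaders built by get_ts_dls do in the second snippet.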

How can I accomplish what I described above, if it's possible? I've thought of some inelegant workarounds:

  • using a ts_learner but dropping the categorical features entirely
  • using a tabular_learner without a window length
  • using a tabular_learner with a helper function that takes window_len and appends lagged features to the original dataframe, such as feature_1(ts-1) to feature_X(ts-window_len)

None of these is ideal. I'd rather use a ts_learner, though I plan to test tabular learners in the future, so learning how to set a window_len there would be great to know as well.
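
The last workaround mentioned above (lagged copies of each feature appended to the dataframe) can be sketched with plain pandas. This is only an illustration under assumptions: add_lagged_features and the toy column names price and regime are hypothetical, and nothing here is a tsai function. The resulting wide frame and name lists are the kind of input a tabular pipeline takes.

```python
import pandas as pd


def add_lagged_features(df, feature_cols, window_len):
    """Return a copy of df with feature(ts-1) ... feature(ts-window_len) columns."""
    lagged = {}
    for col in feature_cols:
        for lag in range(1, window_len + 1):
            # shift(lag) moves values down, so row t holds the value from t - lag.
            lagged[f"{col}(ts-{lag})"] = df[col].shift(lag)
    out = pd.concat([df, pd.DataFrame(lagged, index=df.index)], axis=1)
    # The first window_len rows have no complete history, so drop them.
    return out.iloc[window_len:].reset_index(drop=True)


# Toy frame: one continuous feature, one categorical feature, one target.
df = pd.DataFrame({
    "price": [1.0, 2.0, 3.0, 4.0, 5.0],
    "regime": ["a", "b", "a", "a", "b"],
    "triggers": [0, 1, -1, 0, 1],
})
wide = add_lagged_features(df, ["price", "regime"], window_len=2)

# Column name lists for the categorical and continuous groups.
cat_names = [c for c in wide.columns if c.startswith("regime")]
cont_names = [c for c in wide.columns if c.startswith("price")]
print(wide)
```

The column count grows as features × window_len, which is part of why this approach is awkward for long windows.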

#231 seems to indicate that this is possible now, but I didn't see any example code.

@CMobley7 CMobley7 changed the title from "Is Continuous and Categorical features with ts_learner possible? If not, how can window_len in get_tabular_dls be set?" to "Is using continuous and categorical features with ts_learner possible? If not, how can window_len in get_tabular_dls be set?" May 10, 2023
@Awe42

Awe42 commented Jun 11, 2023

Hey @CMobley7 did you ever figure this out? I would also love to see an example of how the fix in #231 should be used.

Also where is it mentioned that get_ts_dls assumes continuous variables? Do you know if the same assumption is made when using TSDataLoaders.from_dsets?

@Awe42

Awe42 commented Jun 13, 2023

For future reference, I found an example here and here.

@CMobley7
Author

Sorry for the delay, @Awe42. I'd previously looked at the links you provided. While MultiInputNet with get_mixed_dls would let you use a time series model and a tabular model together, I still don't see a way to use both continuous and categorical features with a sliding window.

The get_tabular_ds function takes a dataframe, not the X and y arrays generated by apply_sliding_window. You could write a function that applies the sliding window to a dataframe and rebuilds the dataframe with the additional lagged features, such as feature_1(ts-1) to feature_X(ts-window_len). That would let you use the tabular models in tsai, but I still don't see a way to use the time series models with categorical data, as they appear to work only with continuous data. So you could use either a tabular model alone, or a MultiInputNet that combines a time series model on just the continuous data with a tabular model on both, as mentioned before.

However, based on #231, it should be possible to use categorical variables with at least a few of the time series models, though I haven't dug deep enough into the source code to see how that could be done. Did you figure out a better way, @Awe42?

@oguiza, is there an example or gist of using a time series model with both categorical and continuous features somewhere? And is there already a function in tsai that applies a sliding window but outputs a dataframe instead of X and y arrays, along with lists of the new categorical and continuous column names for use with tabular models?
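
For reference, windowing the categorical and continuous columns side by side is straightforward outside tsai. The NumPy sketch below is an assumption-laden illustration, not tsai's API: window_cat_cont is a hypothetical helper, the categorical column is assumed to be integer-encoded already, and each window is labelled with the target at its last time step (worth checking against how apply_sliding_window treats horizon=0). It yields arrays in the (samples, variables, steps) layout that tsai's 3D inputs use. How a model would consume the categorical codes (for example through embeddings) is outside this sketch.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def window_cat_cont(cat_codes, cont_values, target, window_len):
    """Slide one window over both feature groups and label each window
    with the target at its last time step (an assumption of this sketch)."""
    # sliding_window_view appends the window axis last, so the result
    # has shape (n_windows, n_vars, window_len). The views are read-only.
    X_cat = sliding_window_view(cat_codes, window_len, axis=0)
    X_cont = sliding_window_view(cont_values, window_len, axis=0)
    y = target[window_len - 1:]
    return X_cat, X_cont, y


n_steps, window_len = 6, 3
# One categorical variable, already integer-encoded.
cat_codes = np.array([[0], [1], [0], [2], [1], [0]])
# Two continuous variables.
cont_values = np.arange(n_steps * 2, dtype=float).reshape(n_steps, 2)
target = np.array([0, 1, -1, 0, 1, 1])

X_cat, X_cont, y = window_cat_cont(cat_codes, cont_values, target, window_len)
print(X_cat.shape, X_cont.shape, y.shape)
```

Keeping the two groups in separate arrays preserves the integer dtype of the categorical codes instead of casting everything to float.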

@oguiza
Contributor

oguiza commented Jun 23, 2023

Hi @CMobley7, @Awe42,
I'm currently testing some new functionality I've recently added to tsai. It's in a module called tsai.models.multimodal. It will allow you to use:

  • static categorical features
  • static continuous features
  • observed (time-dependent - past only) categorical features
  • observed (time-dependent - past only) continuous features

You may want to test it as well with your own data.
I plan to create a tutorial if I find the tests work out well.

@cjsombric

@oguiza Have you created a tutorial around the new tsai.models.multimodal module? Or is there another update on this thread?

4 participants