
Transformer Model and TST are not converging. #634

Open
HasnainKhanNiazi opened this issue Dec 5, 2022 · 6 comments
Labels: question (Further information is requested), under review (Waiting for clarification, confirmation, etc.)

Comments


HasnainKhanNiazi commented Dec 5, 2022

I am working on a regression problem where I am using TransformerModel and TST for training. My dataset and model config can be seen below.

Dataset (both models)
Window length = 100
Features per time step = 94
I am using batch_tfms=TSStandardize(by_var=True), as shown in the original paper.

Model Config (TransformerModel)
d_model=768
n_head=12
n_layers=12
loss=MSELossFlat

Model Config (TST)
n_layers=12
d_model=768
n_heads=12
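
For context, here is a minimal sketch of this setup in tsai (not my exact pipeline: the dummy data, c_out=1, and the TSRegression transform are assumptions for illustration):

```python
# Minimal sketch of the setup above (assumptions: random dummy data in place of
# the real multi-file data; c_out=1 for a single regression target).
import numpy as np
from tsai.all import *

X = np.random.randn(1024, 94, 100).astype(np.float32)  # (samples, 94 features, window=100)
y = np.random.randn(1024).astype(np.float32)            # float regression target
splits = get_splits(y, valid_size=0.2, stratify=False)

dls = get_ts_dls(X, y, splits=splits, tfms=[None, TSRegression()],
                 batch_tfms=TSStandardize(by_var=True))

# BERT-base-sized configs, as listed above
transformer = TransformerModel(dls.vars, 1, d_model=768, n_head=12, n_layers=12)
tst = TST(dls.vars, 1, dls.len, n_layers=12, d_model=768, n_heads=12)

learn = Learner(dls, tst, loss_func=MSELossFlat(), metrics=rmse)
learn.fit_one_cycle(1, 1e-4)
```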

TransformerModel is taking around 3 hours per epoch, and the 34th epoch is currently training. The lowest validation loss I got for both TransformerModel and TST was at the 9th epoch; since then, neither model has converged.

My dataset looks like this:

| A | B | C | D | E | F | G | H |
|---|---|---|---|---|---|---|---|
| 34 | 19.5 | 19.5 | 1 | 0.1 | 0 | -35.7742 | -2.25 |
| 34 | 19.5 | 19.5 | 1 | -0.1 | 0 | -39.1072 | -2.25 |
| 34 | 19.5 | 19.5 | 1 | 0 | 0 | -38.885 | -2.5 |
| 34 | 19.5 | 19.5 | 1 | 1 | 0 | -38.6628 | -2.5 |

For obvious reasons, I am not able to post the whole dataset. Any help will be appreciated. Thanks

oguiza commented Dec 5, 2022

Hi @HasnainKhanNiazi,
Here are a few comments:

> TransformerModel is taking around 3 hours per epoch, and the 34th epoch is currently training. The lowest validation loss I got for both TransformerModel and TST was at the 9th epoch; since then, neither model has converged.

Are you using a GPU? This is a really long time per epoch, unless your dataset is huge.
Do you mean converging or diverging? Based on what you say, it seems your models may be overfitting (your key metric, MSE, grows after some time). In case of overfitting, Jeremy Howard (fastai) recommends the following steps (in order):

  • Get more data
  • Use data augmentation
  • Use a more generalizable architecture (one that uses batchnorm, etc. - you are already doing this)
  • Use regularization: weight decay, a lower learning rate, ... (see the sketch after this list)
  • Reduce architecture complexity: the model you are using is really big, with 30M+ parameters, compared to the models generally used with time series. This is fine only if you have a large number of samples.
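
To make the last two bullets concrete, here is a sketch (not tested against your data; `dls` as built in your setup, dropout values picked arbitrarily):

```python
# Illustrative sketch: a default-sized TST plus dropout and weight decay,
# instead of the 30M+ parameter BERT-sized config. `dls` as built earlier.
from tsai.all import *

model = TST(dls.vars, 1, dls.len,
            n_layers=3, d_model=128, n_heads=16,  # tsai defaults: far fewer parameters
            dropout=0.3, fc_dropout=0.5)          # TST exposes both dropout arguments
learn = Learner(dls, model, loss_func=MSELossFlat())
learn.fit_one_cycle(20, lr_max=1e-4, wd=1e-2)     # wd applies weight decay
```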

oguiza added the "question" label on Dec 5, 2022

HasnainKhanNiazi commented Dec 5, 2022

Hi @oguiza, thanks for your insights. Yes, I am using a GPU (an Nvidia A100) for training; one epoch takes 3 hours because the dataset is really huge. I don't think the model is overfitting: the training loss itself is quite large, and with overfitting the training loss should be low, not huge.

I will change the model architecture for sure. I was trying to recreate BERT for this regression problem, since BERT uses the same config I listed (d_model=768, 12 layers, 12 heads).

I am attaching an image of the training run; it may help identify the core problem.

[Screenshot from 2022-12-05 11-55-17: training and validation losses]

EDIT: I am also using MetaDataSet, as my data is distributed across multiple files. len(mdset) is 4,083,010.
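
For reference, the multi-file wiring looks roughly like this (a sketch: file_arrays is a hypothetical placeholder for my per-file loading, and the meta-dataset names follow tsai's tutorial notebook):

```python
# Rough sketch of the multi-file setup (file_arrays is a hypothetical placeholder
# for per-file (X_i, y_i) arrays; actual loading code omitted).
from tsai.all import *

dsets = [TSDatasets(X_i, y_i, tfms=[None, TSRegression()]) for X_i, y_i in file_arrays]
mdset = TSMetaDataset(dsets)                      # len(mdset) == total samples across files
splits = TimeSplitter(show_plot=False)(mdset)
mdsets = TSMetaDatasets(mdset, splits=splits)
dls = TSDataLoaders.from_dsets(mdsets.train, mdsets.valid,
                               batch_tfms=TSStandardize(by_var=True))
```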


oguiza commented Dec 5, 2022

Hi @HasnainKhanNiazi,
Looking at the losses, the model is not learning anything.
Something I'd recommend is training on a small dataset first. That way you can run multiple quick iterations until you see it start learning, and then scale up.
Looking at the large loss, it seems the issue is related to how you are scaling the data.
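
A sketch of that debugging loop (the subset size is arbitrary; X and y stand for your full arrays):

```python
# Sanity-check training on a small random subset before scaling up.
# X, y are the full arrays (shape (n, 94, 100) and (n,) here).
import numpy as np
from tsai.all import *

idx = np.random.choice(len(X), 10_000, replace=False)  # ~10k of the ~4M samples
splits = get_splits(y[idx], valid_size=0.2, stratify=False)
dls = get_ts_dls(X[idx], y[idx], splits=splits,
                 tfms=[None, TSRegression()],
                 batch_tfms=TSStandardize(by_var=True))
learn = ts_learner(dls, TSTPlus, loss_func=MSELossFlat())
learn.lr_find()                 # pick a sensible learning rate first
learn.fit_one_cycle(5, 1e-4)    # the loss should visibly drop within a few epochs
```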

HasnainKhanNiazi commented

Thanks @oguiza, I will train on smaller chunks. I will keep this issue open for now and close it once I reach a conclusion. I will also post an update here. Thanks.

HasnainKhanNiazi commented

Hi @oguiza, I have been doing some experiments with transformers, and I have implemented some basic architectures such as:

  1. CNN + Vanilla Encoders + CNN
  2. Vanilla Encoders + CNN
  3. Vanilla Encoders + MLP

All of these models are learning and the validation loss is decreasing, but when I use the same data with the TST and TSTPlus architectures, the models don't learn anything. I am not sure what could be wrong, since I do the same data preprocessing in both cases.

oguiza added the "under review" label on Feb 16, 2023

oguiza commented Mar 16, 2023

Hi @HasnainKhanNiazi,
Sorry for the late reply. Could you please paste a code snippet to reproduce the issue? I have not been able to reproduce it.
Have you tried using your approach with any of the datasets available in tsai?
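
For example, a minimal check on a bundled regression dataset could look like this (a sketch; the dataset name and get_Monash_regression_data are assumed from tsai's data utilities):

```python
# Hedged sketch: run the same pipeline on a tsai-bundled regression dataset.
from tsai.all import *

X, y, splits = get_Monash_regression_data('AppliancesEnergy', split_data=False)
dls = get_ts_dls(X, y, splits=splits, tfms=[None, TSRegression()],
                 batch_tfms=TSStandardize(by_var=True))
learn = ts_learner(dls, TSTPlus, loss_func=MSELossFlat(), metrics=rmse)
learn.fit_one_cycle(10, 1e-3)   # if this learns, the issue is in the custom data pipeline
```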
