Transformer Model and TST are not converging. #634
Hi @HasnainKhanNiazi,
Are you using a GPU? This is a really long time per epoch, unless your dataset is huge.
Hi @oguiza, thanks for your insights. Yes, I am using a GPU (an Nvidia A100) for training, and one epoch takes 3 hours because the dataset is really huge. I don't think the model is overfitting: the training loss is still quite high, and an overfitting model would drive the training loss down, not keep it that high. I will change the model architecture for sure; I was trying to recreate BERT for the regression problem, since BERT uses the same config I am using. I am attaching an image of the training run; it may help find the core problem. EDIT: I am also using MetaDataSet, as I have the data distributed across multiple files.
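For reference, tsai's multi-file mechanism is `TSMetaDataset` / `TSMetaDatasets` (presumably the "MetaDataSet" mentioned above). A minimal sketch of that setup, assuming one hypothetical `X`/`y` NumPy file pair per chunk, with the shapes from this issue (94 variables, window length 100):

```python
from tsai.all import *

n_files = 4  # hypothetical number of data chunks
dsets = []
for i in range(n_files):
    X = np.load(f'X_chunk_{i}.npy')                     # (samples, 94 vars, 100 steps)
    y = np.load(f'y_chunk_{i}.npy').astype(np.float32)  # continuous regression targets
    dsets.append(TSDatasets(X, y, tfms=[None, TSRegression()]))

# Wrap the per-file datasets so they behave as a single dataset, then split and load
metadataset = TSMetaDataset(dsets)
splits = TimeSplitter()(metadataset)
metadatasets = TSMetaDatasets(metadataset, splits=splits)
dls = TSDataLoaders.from_dsets(metadatasets.train, metadatasets.valid,
                               batch_tfms=TSStandardize(by_var=True))
```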
Hi @HasnainKhanNiazi,
Thanks @oguiza, I will train on smaller chunks. I will keep this issue open for now and close it once I reach a conclusion; I will post an update here as well. Thanks
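A quick way to run that check, sketched below with hypothetical arrays (`X`, `y` from a single chunk): if a model this large cannot even overfit a few thousand windows, the problem is more likely in the setup than in the data volume.

```python
from tsai.all import *

# X: (samples, 94, 100) array; y: float targets (a single hypothetical chunk)
X_small, y_small = X[:5000], y[:5000]
splits = TimeSplitter()(y_small)
dls = get_ts_dls(X_small, y_small, splits=splits, tfms=[None, TSRegression()],
                 batch_tfms=TSStandardize(by_var=True))
learn = ts_learner(dls, TST, loss_func=MSELossFlat(), metrics=rmse)
learn.fit_one_cycle(5, 1e-4)
```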
Hi @oguiza, I have been doing some experiments with transformers and have implemented some basic architectures.
All of these models are learning and their validation loss is decreasing, but when the same data is used with the TST and TSTPlus architectures, the models don't learn anything. I am not sure what could be wrong, since I am doing the same data preprocessing in both cases.
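To make the comparison concrete, here is a sketch of the swap being described, assuming the same `dls` as above and a single regression target (`c_out=1` is my assumption); everything except the architecture stays fixed:

```python
from tsai.all import *

for arch in [TST, TSTPlus]:
    model = arch(dls.vars, 1, dls.len, n_layers=12, d_model=768, n_heads=12)
    learn = Learner(dls, model, loss_func=MSELossFlat(), metrics=rmse)
    learn.lr_find()                   # worth re-checking the lr per architecture
    learn.fit_one_cycle(10, 1e-4)
```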
Hi @HasnainKhanNiazi,
I am working on a regression problem where I am using `TransformerModel` and `TST` for training. My dataset and model config can be seen below.

Dataset (for both models):
- Window length = 100
- Features at one time step = 94

I am using `batch_tfms=TSStandardize(by_var=True)`, as shown in the original paper as well.

Model config, TransformerModel:
- d_model=768
- n_head=12
- n_layers=12
- loss=MSELossFlat

Model config, TST:
- n_layers=12
- d_model=768
- n_heads=12
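Assuming tsai's implementations and a single regression target (`c_out=1` is my assumption), the two configurations above correspond roughly to the following. Note that `d_model=768` with 12 layers and 12 heads is BERT-base scale, far above the library defaults (`TST` defaults to `d_model=128`, 3 layers):

```python
from tsai.all import *

# c_in=94 features, seq_len=100 (window length), c_out=1 assumed
trans = TransformerModel(c_in=94, c_out=1, d_model=768, n_head=12, n_layers=12)
tst = TST(c_in=94, c_out=1, seq_len=100, n_layers=12, d_model=768, n_heads=12)
loss_func = MSELossFlat()
```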
`TransformerModel` is taking around 3 hours per epoch, and the 34th epoch is currently training, but the lowest validation loss I got for both `TransformerModel` and `TST` was at the 9th epoch; since then, neither model has converged any further.
My dataset looks like this: [dataset screenshot]
For obvious reasons, I am not able to post the whole dataset. Any help will be appreciated. Thanks