
Run on arbitrary video #34

Open
LeKiet258 opened this issue Feb 28, 2023 · 10 comments

@LeKiet258

Hi, I want to run SmoothNet on an arbitrary video, but it seems like ground truth for that video is required even for the inference phase, right?

@juxuan27
Contributor

Hi @LeKiet258 ! Thank you for your interest.
Ground truth is not needed for inference; however, you might need to process your own data following the instructions here. You can simply leave the gt values as 0 if you don't have them. If you want to run inference with visualization, you can also refer to mmhuman3d, which supports SmoothNet.
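
For example, a minimal sketch of such a processed data file with the ground truth zeroed out. The file name, key names (pred, gt), and shapes here are only placeholders for illustration; follow the data-processing instructions for the exact format:

import numpy as np

# Stand-in for your detector's output: T frames of 24 SMPL joints in 3D.
T, K, C = 300, 24, 3
pred = np.random.randn(T, K * C).astype(np.float32)  # replace with real HybrIK output

# Hypothetical key names -- match them to the repo's data-processing docs.
np.savez(
    "my_video.npz",
    pred=pred,                # noisy per-frame estimates, flattened to (T, K*C)
    gt=np.zeros_like(pred),   # ground truth unused at inference, so zeros are fine
)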

@LeKiet258
Author

Thanks for your fast reply! MMHuman3d doesn't support my model of choice yet, which is HybrIK, so I have to clone HybrIK separately to produce the output and then run SmoothNet on that output, hence my question. As for setting the gt values to 0, did you mean the pose and shape parameters? (In my case, I use the SMPL body representation.)

[screenshot attached]

LeKiet258 reopened this Feb 28, 2023
@juxuan27
Contributor

In this repo, we have not provided an inference pipeline, so you can use the evaluation pipeline for testing. Because the evaluation process needs ground truth to calculate metrics, we recommend adding ground truth in data processing. However, since you only need the inference part, you can set the ground-truth values to any random value (e.g., 0). You may still need to modify the code to obtain the smoothed output pose from the whole pipeline.
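
As a starting point for that modification, here is a rough sketch of extracting the smoothed poses yourself with a sliding window. It assumes a loaded SmoothNet-style model that maps a (batch, window, channels) tensor to a denoised tensor of the same shape; the window size, tensor layout, and model loading all depend on the repo's configs:

import numpy as np
import torch

def smooth_sequence(model, poses, window=32, stride=1):
    """Slide a window over a (T, C) pose sequence and average overlapping outputs."""
    T, C = poses.shape
    out = np.zeros((T, C), dtype=np.float32)
    counts = np.zeros((T, 1), dtype=np.float32)
    model.eval()
    with torch.no_grad():
        for start in range(0, T - window + 1, stride):
            clip = torch.from_numpy(poses[start:start + window]).float()[None]  # (1, window, C)
            smoothed = model(clip)[0].cpu().numpy()                             # (window, C)
            out[start:start + window] += smoothed
            counts[start:start + window] += 1
    return out / np.maximum(counts, 1)  # overlapping windows are averaged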

@ghost

ghost commented Feb 28, 2023

@LeKiet258 Could you share your repo? I am also trying SmoothNet with HybrIK without success.

@juxuan27 An inference pipeline would be very useful for people trying your work. Please consider adding one.

@ailingzengzzz
Contributor

Hi @nick008a ,

Thanks for the kind suggestion! We will add it in March.

@ghost

ghost commented Mar 2, 2023

@ailingzengzzz Thank you for the reply!
It would be even better if it could be used with something like HybrIK.

@LeKiet258
Author

@nick008a I haven't successfully combined HybrIK with SmoothNet yet, so I swapped HybrIK for another model in order to get it running. Sorry :(

@lucasjinreal

@LeKiet258 @nick008a @ailingzengzzz I have a similar question. In the 2D code provided, the dataset preparation for evaluation actually uses the middle position of the image as the normalization reference:

gt_data = (self.ground_truth_data_joints_2d[position]-H36M_IMG_SHAPE/2)/(H36M_IMG_SHAPE/2) # normalization

How can I do that in a real-time situation where I only have, say, 16 frames? I also noticed it normalizes with the magic number H36M_IMG_SHAPE, which is 1000, but when keypoints are detected on arbitrary videos the image shapes can vary. Does that matter or not?

@ailingzengzzz
Contributor

Hi @lucasjinreal ,

I'm sorry for the late reply. I checked this version and found that I did not include the general normalization. You can use the following functions to normalize keypoints from arbitrary videos, feed the normalized positions into SmoothNet, and then denormalize the smoothed keypoints back into image coordinates.

import numpy as np

def normalize_screen_coordinates(X, w, h):
    # Map x from [0, w] to [-1, 1] while preserving the aspect ratio
    # (y ends up in [-h/w, h/w]); depth, if present, is scaled down by 1000.
    if X.shape[-1] == 3:  # input is a 3D pose
        X_norm = X[..., :2] / w * 2 - [1, h / w]
        X_out = np.concatenate((X_norm, X[..., 2:3] / 1000), -1)
    else:
        assert X.shape[-1] == 2
        X_out = X / w * 2 - [1, h / w]
    return X_out

def image_coordinates(X, w, h):
    # Reverse the camera-frame normalization back to pixel coordinates.
    if X.shape[-1] == 3:  # input is a 3D pose
        X_norm = X[..., :2].copy()  # copy so the caller's array is not modified in place
        X_norm[..., :1] = (X_norm[..., :1] + 1) * w / 2
        X_norm[..., 1:2] = (X_norm[..., 1:2] + h / w) * w / 2
        X_out = np.concatenate((X_norm, X[..., 2:3] * 1000), -1)
    else:
        assert X.shape[-1] == 2
        X_out = (X + [1, h / w]) * w / 2
    return X_out
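
For example, a small round trip with these helpers on 2D keypoints from a w×h video; the commented-out run_smoothnet line is a hypothetical placeholder for however you invoke SmoothNet:

import numpy as np

w, h = 1280, 720
kpts = np.random.rand(16, 17, 2) * [w, h]             # stand-in for detected 2D keypoints

kpts_norm = normalize_screen_coordinates(kpts, w, h)  # into the [-1, 1] x-range
# kpts_norm = run_smoothnet(kpts_norm)                # smooth in normalized space
kpts_img = image_coordinates(kpts_norm, w, h)         # back to pixel coordinates

assert np.allclose(kpts, kpts_img)                    # round trip recovers the input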

@ywyw1

ywyw1 commented Jun 9, 2023

May I ask what method you used?
