Did any of you successfully infer the model on the TikTok dataset image & video? I re-implemented the script but the output is just a mess.