Abstract: Multimodal pre-training models have been successfully applied to the vision domain, and a large number of researches have been conducted on this basis for video action recognition of short ...
Some results have been hidden because they may be inaccessible to you
Show inaccessible results