Cross-modal Semantic Understanding for Video Moment Localization

Huyp777/CSUN

CLEAR

Cross-Modal Semantic Alignment for Moment Localization via Natural Language

We propose an end-to-end Coarse-to-fine cross-modaL sEmantic Alignment netwoRk, dubbed CLEAR, to efficiently localize target moments within a given video via diverse natural language queries.
Concretely, we first design a dual-path neural network, comprising two independent modules: the video encoding network (VEN) and the query encoding network (QEN).
Specifically, the VEN applies our proposed hierarchical semantic strategy to the input video to generate the corresponding moment candidates and model their semantic relevance. The QEN adopts a word-embedding-based bi-directional LSTM network (Bi-LSTM) to capture the semantics of the diverse given queries.
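As a rough illustration of what "moment candidates" can look like, the sketch below enumerates every contiguous span of clips up to a maximum length. The function name `enumerate_candidates`, the `max_len` parameter, and the flat enumeration itself are assumptions made for illustration; the hierarchical semantic strategy used by the VEN is not specified in this README and may differ.

```python
from typing import List, Tuple


def enumerate_candidates(num_clips: int, max_len: int) -> List[Tuple[int, int]]:
    """Return (start, end) clip-index pairs (end inclusive) spanning at most max_len clips.

    Hypothetical helper: a plain enumeration, not the VEN's actual strategy.
    """
    candidates = []
    for start in range(num_clips):
        # Exclusive upper bound on the end index for spans starting at `start`.
        stop = min(start + max_len, num_clips)
        for end in range(start, stop):
            candidates.append((start, end))
    return candidates


if __name__ == "__main__":
    # A video split into 4 clips, with candidates covering at most 2 clips each.
    for start, end in enumerate_candidates(4, 2):
        print(f"candidate: clips {start}..{end}")
```

The number of candidates grows quickly with `num_clips` and `max_len`, which is one reason a pruning step before fine-grained scoring can be attractive.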
Afterwards, we develop a multi-granularity interaction network (MIN) to achieve high-quality moment localization in a coarse-to-fine manner. More specifically, it first applies efficient coarse-grained semantic pruning to select the relevant semantic ranges and ignore irrelevant parts, and then performs fine-grained semantic fusion for accurate moment localization.
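The coarse-to-fine idea described above can be sketched in a few lines of plain Python. This is only a toy sketch under stated assumptions, not the MIN itself: clip and query features are short hand-written vectors, the scoring function is cosine similarity over mean-pooled features (a stand-in for the learned pruning and fusion modules), and the names `localize`, `range_len`, and `keep` are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mean_pool(clips: Sequence[Vector]) -> List[float]:
    """Average a non-empty list of equal-length feature vectors."""
    return [sum(dim) / len(clips) for dim in zip(*clips)]


def localize(
    clips: Sequence[Vector], query: Vector, range_len: int, keep: int
) -> Tuple[Optional[Tuple[int, int]], float]:
    """Return the best (start, end) span (end inclusive) and its score."""
    # Coarse stage: score fixed-length ranges and keep only the `keep` best.
    ranges = [
        (s, min(s + range_len, len(clips)))
        for s in range(0, len(clips), range_len)
    ]
    ranges.sort(
        key=lambda r: cosine(mean_pool(clips[r[0]:r[1]]), query), reverse=True
    )
    kept = ranges[:keep]

    # Fine stage: score every span, but only inside the kept ranges.
    best, best_score = None, float("-inf")
    for lo, hi in kept:
        for start in range(lo, hi):
            for end in range(start, hi):
                score = cosine(mean_pool(clips[start:end + 1]), query)
                if score > best_score:
                    best, best_score = (start, end), score
    return best, best_score


if __name__ == "__main__":
    toy_clips = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    toy_query = [0.0, 1.0]
    print(localize(toy_clips, toy_query, range_len=2, keep=1))
```

In this sketch the fine stage never looks at spans outside the kept ranges, so the cost of exhaustive span scoring is paid only where the coarse stage found the query to be plausible.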
We conduct extensive experiments on two benchmark datasets, ActivityNet Captions and TACoS. The experimental results show that our proposed model is more effective and more efficient than the state-of-the-art models.
A detailed description of CLEAR will be given later in the form of an authorized patent and a published paper.
An illustration of the framework of CLEAR is shown in the following figure.

Dataset

How to run

Place the data files in an appropriate path, set that path in tacos.py and activitynet_captions.py, and then run one of the following commands:

python tacos.py

or

python activitynet_captions.py
