Multi-Modal Video Dialog State Tracking in the Wild
Adnen Abdessaied*, Lei Shi, Andreas Bulling
Abstract
"We present […], a novel video dialog model operating over a generic multi-modal state tracking scheme. Current models that claim to perform multi-modal state tracking fall short in two ma[…] (1) They either track only one modality (mostly the visual input) or (2) they target synthetic datasets that do not reflect […]world in-the-wild scenarios. Our model addresses these two limitations in an attempt to close this crucial research […]modal graph structure learning method. Subsequently, the learned local graphs and features are parsed together to f[…]grained graph node features are used to enhance the hidden states of the backbone Vision-Language Model (VLM). […] achieves new state-of-the-art results on five challenging benchmarks."