One of our claims is that the multimodal fusion between the image and the question representations is a critical component. Thus, our proposed model uses a Tucker decomposition of the correlation tensor to model richer multimodal interactions in order to provide proper answers. Our best model is based on:
- a pretrained Skipthoughts for the question model,
- features from a pretrained Resnet-152 (with images of size 3x448x448) for the image model,
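
For intuition, here is a minimal, self-contained PyTorch sketch of a Mutan-style Tucker fusion. It is not the code of this repository: the class name `TuckerFusion`, the hidden dimensions, and the rank are illustrative assumptions; only the idea follows the description above (project both modalities, fuse them through a rank-constrained core tensor, then classify over the answers).

```python
# Illustrative sketch only -- names, dimensions and rank are assumptions,
# not the repository's actual implementation.
import torch
import torch.nn as nn

class TuckerFusion(nn.Module):
    """Rank-constrained Tucker (Mutan-style) fusion of question and image embeddings."""
    def __init__(self, dim_q=2400, dim_v=2048, dim_hq=310, dim_hv=310,
                 dim_mm=510, rank=10, num_answers=2000):
        super().__init__()
        self.rank = rank
        self.encode_q = nn.Linear(dim_q, dim_hq)   # question factor matrix
        self.encode_v = nn.Linear(dim_v, dim_hv)   # image factor matrix
        # The core tensor is constrained to rank R: one projection pair per slice.
        self.core_q = nn.ModuleList([nn.Linear(dim_hq, dim_mm) for _ in range(rank)])
        self.core_v = nn.ModuleList([nn.Linear(dim_hv, dim_mm) for _ in range(rank)])
        self.classifier = nn.Linear(dim_mm, num_answers)  # output factor matrix

    def forward(self, q, v):
        q = torch.tanh(self.encode_q(q))
        v = torch.tanh(self.encode_v(v))
        # z = sum_r (U_r q) * (V_r v): element-wise products summed over the R slices.
        z = sum(self.core_q[r](q) * self.core_v[r](v) for r in range(self.rank))
        return self.classifier(torch.tanh(z))

# Example: a batch of 4 question embeddings (e.g. Skipthoughts, 2400-d)
# and 4 image embeddings (e.g. ResNet-152, 2048-d).
logits = TuckerFusion()(torch.randn(4, 2400), torch.randn(4, 2048))
print(logits.shape)  # torch.Size([4, 2000])
```

The rank constraint on the core tensor is what keeps the parameter count tractable compared to a full bilinear interaction between the two projected embeddings.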
We currently provide four models:
- MLBNoAtt: a strong baseline (BayesianGRU + element-wise product, sketched after this list)
- [MLBAtt](https://arxiv.org/abs/1610.04325): the previous state-of-the-art, which adds an attention strategy
- MutanNoAtt: our proof of concept (BayesianGRU + Mutan Fusion)
- MutanAtt: the current state-of-the-art
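
For comparison with the Tucker fusion sketched earlier, the element-wise (Hadamard) product behind the MLB-style baselines can be illustrated as follows. This is again an assumed sketch (hypothetical `HadamardFusion` class, illustrative dimensions), not the MLBNoAtt implementation shipped here.

```python
# Illustrative sketch of element-wise (Hadamard) product fusion -- assumed
# names and dimensions, not the repository's MLBNoAtt implementation.
import torch
import torch.nn as nn

class HadamardFusion(nn.Module):
    def __init__(self, dim_q=2400, dim_v=2048, dim_h=1200, num_answers=2000):
        super().__init__()
        self.proj_q = nn.Linear(dim_q, dim_h)
        self.proj_v = nn.Linear(dim_v, dim_h)
        self.classifier = nn.Linear(dim_h, num_answers)

    def forward(self, q, v):
        # Project both modalities into a common space and fuse them with a
        # single element-wise product (roughly the rank-1 case of the Tucker sketch above).
        z = torch.tanh(self.proj_q(q)) * torch.tanh(self.proj_v(v))
        return self.classifier(z)
```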
## Acknowledgment
Special thanks to the authors of [MLB](https://arxiv.org/abs/1610.04325) for providing some [Torch7 code](https://github.com/jnhwkim/MulLowBiVQA), [MCB](https://arxiv.org/abs/1606.01847) for providing some [Caffe code](https://github.com/akirafukui/vqa-mcb), and our professors and friends from LIP6 for the perfect working atmosphere.