[Feature] Add FluxViT Model - Towards Deployment-Efficient Video Models #2902

Open
@sarim-next

What is the problem this feature will solve?

Current popular video training methods operate on a fixed number of tokens sampled from a predetermined spatiotemporal grid. Because video is highly redundant, this yields a suboptimal accuracy-computation trade-off. These models also cannot adapt to the varying computational budgets of downstream tasks, which hinders the use of competitive models in real-world scenarios with limited resources.

What is the feature?

This request is to add the FluxViT model described in the paper "Make Your Training Flexible: Towards Deployment-Efficient Video Models". The paper introduces a new test setting, "Token Optimization", which maximizes input information under a given computational budget by selecting the input token set from more suitably sampled videos. To support this, it proposes an augmentation tool called "Flux", which makes the sampling grid flexible and applies token selection. According to the paper, integrating Flux into video training frameworks improves model robustness at minimal additional cost. The paper reports that FluxViT achieves state-of-the-art results across various video understanding tasks at standard cost, and that it can match previous state-of-the-art models at significantly reduced cost (e.g., using only one quarter of the tokens).
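To make the idea concrete, here is a minimal sketch of selecting tokens under a fixed budget from a more densely sampled clip. It is not the paper's implementation: the function names `grid_token_count` and `select_tokens`, the 16-pixel patch size, and the random per-token scores are all assumptions chosen for illustration, and the actual selection criterion used by Flux may differ.

```python
import random


def grid_token_count(frames, height, width, patch=16, tubelet=1):
    """Number of patch tokens produced by a frames x height x width clip.

    `patch` and `tubelet` are illustrative defaults, not values from the paper.
    """
    return (frames // tubelet) * (height // patch) * (width // patch)


def select_tokens(scores, budget):
    """Return the indices of the `budget` highest-scoring tokens.

    Indices are returned in their original order so that positional
    information can still be attached to the kept tokens.
    """
    if budget >= len(scores):
        return list(range(len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:budget])


# Fixed-grid baseline: 8 frames at 224x224 with 16x16 patches.
budget = grid_token_count(8, 224, 224)

# Flexible sampling: draw twice as many frames, then keep only `budget` tokens.
candidates = grid_token_count(16, 224, 224)

# Placeholder scores; a real model would use a learned or heuristic
# importance measure here (hypothetical stand-in).
rng = random.Random(0)
scores = [rng.random() for _ in range(candidates)]

kept = select_tokens(scores, budget)
print(f"budget={budget}, candidates={candidates}, kept={len(kept)}")
```

The point of the sketch is only that the compute cost is set by `budget`, while the pool the tokens are drawn from can be larger than a fixed grid would allow.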

What alternatives have you considered?

The paper discusses two alternatives: token reduction applied to densely sampled tokens, and existing flexible-network training methods that operate at different spatial or temporal resolutions. It argues that both are suboptimal, because they either degrade in performance at high reduction rates or fail to make full use of the available token capacity under computational constraints.
