New software architecture developed by UC San Diego computer scientists improves cost and efficiency of video processing
A lot can change in a matter of (milli)seconds. Slow-motion video is often the arbiter of that precise critical moment when, say, a soccer ball crosses the goal or a race car zooms past the finish line. It can help aerospace and drone engineers better understand the rapidly flapping wings of insects, butterflies and hummingbirds and potentially mimic their motions.
Whether in sports moments, nature documentaries or animal behavior studies, a new technique developed by University of California San Diego computer scientists could have far-reaching impact. In an effort to smooth out slow-mo, they have broken new ground in a video processing technique known as video frame interpolation – a way of digitally “sandwiching” additional frames between existing ones while evening out any blur to achieve a fluid effect.
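In its simplest form, interpolating a new frame between two existing ones can be sketched as a pixel-wise blend. The snippet below is a naive illustrative baseline, not FLAVR’s method (FLAVR uses a learned deep network precisely because this kind of blend fails when objects move):

```python
import numpy as np

def blend_midframe(frame_a: np.ndarray, frame_b: np.ndarray, t: float = 0.5) -> np.ndarray:
    """Naive frame interpolation: a pixel-wise linear blend of two frames.

    This only looks right when motion between the frames is negligible;
    real interpolators must account for how pixels move from one frame
    to the next (classically via optical flow, or via learning).
    """
    blended = (1.0 - t) * frame_a.astype(np.float64) + t * frame_b.astype(np.float64)
    return blended.astype(frame_a.dtype)

# Two tiny grayscale "frames": the blended midpoint has intermediate intensity.
a = np.zeros((2, 2), dtype=np.uint8)        # all-black frame
b = np.full((2, 2), 100, dtype=np.uint8)    # brighter frame
mid = blend_midframe(a, b)                  # every pixel is 50
```

The shortcoming of this approach is visible as soon as anything moves: a ball at one position in `frame_a` and another in `frame_b` blends into two ghostly half-transparent balls, which is why motion-aware methods are needed.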
This process – which accounts for much of the cinematic science behind, for example, fake slow-mo effects – has historically been accomplished through hand-designed, computation-heavy video processing modules such as flow-warping, which warps the input images to the desired frame.
Video isn’t always predictable though, since motion patterns are not always linear. As video frames have to keep up with a user’s movement in a virtual reality setting, for instance, frames adjacent to one another can essentially slide apart, leading to what’s known as an occlusion. Occlusions lead to a “break” in the illusion of virtual reality – a visual glitch ruining the effect of existing in a virtual world.
To overcome these limitations, computer scientists led by Associate Professor of Computer Science and Engineering (CSE) Manmohan Chandraker have proposed a new video frame interpolation framework called FLAVR, or Flow-Agnostic Video Representations for Fast Frame Interpolation. This end-to-end trainable software architecture uses 3D space-time convolutions – a machine learning approach that can learn to reason about non-linear motion in video and prevent temporal-spatial glitches. Chandraker is also affiliated with the Center for Visual Computing at the UC San Diego Jacobs School of Engineering.
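The core operation behind space-time convolutions is a 3D filter that slides jointly across time and space, so each output value mixes information from neighboring frames as well as neighboring pixels. The sketch below shows that single operation on a synthetic clip; it is illustrative only – FLAVR’s actual architecture is a learned deep network of many such layers, not one hand-set filter:

```python
import numpy as np

def conv3d_valid(video: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """'Valid' 3D convolution over a (time, height, width) video volume.

    Unlike a 2D filter applied frame by frame, a 3D kernel spans several
    frames at once, so the output at each location depends on both
    spatial neighbors and temporal neighbors; stacks of learned 3D
    filters are what let space-time networks model motion directly.
    """
    kt, kh, kw = kernel.shape
    T, H, W = video.shape
    out = np.empty((T - kt + 1, H - kh + 1, W - kw + 1))
    for t in range(out.shape[0]):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                out[t, i, j] = np.sum(video[t:t + kt, i:i + kh, j:j + kw] * kernel)
    return out

# A uniform 3x3x3 averaging kernel applied to a synthetic 5-frame clip.
video = np.arange(5 * 4 * 4, dtype=np.float64).reshape(5, 4, 4)
kernel = np.full((3, 3, 3), 1.0 / 27)
smoothed = conv3d_valid(video, kernel)   # shape (3, 2, 2)
```

In a trained network the kernel weights are learned from data rather than fixed, which is how the model can pick up non-linear motion patterns instead of simply averaging frames.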
The work was done in collaboration with CSE PhD student and first author Tarun Kalluri and research scientists Du Tran and Deepak Pathak from Meta AI (formerly Facebook AI Research).
The best speed vs accuracy tradeoff
“Our work breaks new ground in video frame interpolation, wherein we do away with most of the hand-designed, computation-heavy modules like flow-warping and use a completely end-to-end trainable and deployable architecture for this purpose. As a result, we achieve huge improvements in running time, output quality and ease of deployment on hardware,” said Kalluri.
Their published results were selected as a Best Paper Finalist at the 2023 Winter Conference on Applications of Computer Vision and show a sixfold increase in speed for multi-frame interpolation as compared to current state-of-the-art methods. The results demonstrate the best speed vs. accuracy trade-off, even while requiring no additional visual data (such as optical flow or depth maps). FLAVR can also be used to apply slo-mo filters to videos captured in real time.
Chandraker said the team “consistently demonstrates superior qualitative and quantitative results compared with prior methods on popular benchmarks,” such as Vimeo-90K, Adobe-240FPS, and GoPro.
“Most importantly, FLAVR is 14% better than architectures that run at the same speed and six times faster than methods that deliver the same accuracy, leading to the best speed vs. accuracy trade-off,” he said.
Insects and birds in flight, motor racing and more
Potential applications for FLAVR include sports analytics (replays, video-assisted refereeing, player analytics, etc.), gaming and animation (generating high frame-per-second graphics at a cheaper cost), or aesthetically improving videos (such as adding slo-mo filters to videos captured from mobile phones in real time).
In sports competition and broadcasting, for example, such super slow-motion can inform crucial decision-making around ambiguous events that happen "in-between frames," such as a cricket batsman reaching inside the crease while completing a run. FLAVR can also enhance the visual quality of broadcasts, such as visualizing a fast projectile in shooting or archery, or split-second motions in motor racing.
Another application of FLAVR has already been demonstrated in the area of animal research. Chandraker and his colleagues used the technique to process insect flight-motion videos provided by Assistant Research Professor Adrian Smith of North Carolina State University. The resulting project video demonstrates FLAVR’s ability to create slo-mo flight patterns, even at the extreme rate of 960FPS.
“There might be just good visualization applications, such as for nature documentaries,” said Chandraker. “Birds-of-paradise, for instance, have evolved exquisite plumage used in complex courtship dances, and understanding bird flight is important from many different research perspectives,” from behavioral (signaling) to physiological (how flight might relate to diet, energy) and evolutionary (how unique hovering and control behaviors arise).
“We hope FLAVR can be used as a tool by various such communities to better analyze extreme-motion videos,” Chandraker said.
The code required to train and run these models is open-source and available at https://github.com/tarun005/