A curated list of research papers we are reading.
The Animation Transformer: Visual Correspondence via Segment Matching
Evan Casey, Víctor Pérez, Zhuoru Li, Harry Teitelman, Nick Boyajian, Tim Pulver, Mike Manh, William Grisaitis
Many ML tasks in the video domain rely on visual correspondences: matching the parts of different frames that depict the same content, usually at the pixel or patch level. This paper, which focuses on hand-drawn animation, computes correspondences between line-enclosed segments instead of pixels, significantly reducing the amount of computation required while improving accuracy over pixel-level approaches.
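To make segment-level correspondence concrete, here is a toy sketch that greedily pairs segment descriptors across two frames by feature distance. This is an illustration only, not the paper's method: the paper learns the features and matches them with a transformer, but the output is the same kind of segment pairing.

```python
import numpy as np

def match_segments(feats_a, feats_b):
    """Greedily match segment descriptors between two frames by L2
    distance. Assumes equally many segments in both frames and
    returns a list of (i, j) index pairs."""
    # cost[i, j] = distance between segment i in frame A and j in frame B
    cost = np.linalg.norm(feats_a[:, None] - feats_b[None, :], axis=-1)
    pairs = []
    used = set()
    # process the most confident segments (smallest best-match cost) first
    for i in np.argsort(cost.min(axis=1)):
        order = np.argsort(cost[i])
        j = next(int(k) for k in order if int(k) not in used)
        used.add(j)
        pairs.append((int(i), j))
    return pairs
```

Because a frame typically contains a few hundred segments rather than hundreds of thousands of pixels, even a dense cost matrix like this is tiny compared to pixel-level correspondence.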
Layered Neural Atlases for Consistent Video Editing
Yoni Kasten, Dolev Ofri, Oliver Wang, Tali Dekel
Traditional techniques for editing video content based on “keyframe edits” rely on optical flow, which is often inaccurate, or 3D geometry, which is not always available. This paper introduces a new 2D representation called “neural atlases,” which are 2D mosaics of the video content that can be used to magically edit the entire video at once, and can be computed in a differentiable way.
Resolution-robust Large Mask Inpainting with Fourier Convolutions
Roman Suvorov, Elizaveta Logacheva, Anton Mashikhin, Anastasia Remizova, Arsenii Ashukha, Aleksei Silvestrov, Naejin Kong, Harshith Goka, Kiwoong Park, Victor Lempitsky
This paper presents an image inpainting model built on a novel operator, the Fast Fourier Convolution, to address one of the biggest limitations of previous inpainting work: hallucinating plausible content for large missing regions of an image. The results are astonishing!
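The core trick can be sketched in a few lines: the spectral branch of a Fast Fourier Convolution mixes channels in the frequency domain, so every output pixel gets an image-wide receptive field in a single layer. This is a simplified toy, not the paper's exact operator (the real FFC stacks real and imaginary parts and applies a conv-BN-ReLU block); `w` here is just a hypothetical channel-mixing matrix.

```python
import numpy as np

def spectral_transform(x, w):
    """Toy spectral branch of a Fast Fourier Convolution:
    FFT the feature map, mix channels per frequency, transform back.
    x: (H, W, C) real features, w: (C, C) mixing matrix."""
    X = np.fft.rfft2(x, axes=(0, 1))           # (H, W//2+1, C), complex
    Y = X @ w                                  # channel mixing at every frequency
    return np.fft.irfft2(Y, s=x.shape[:2], axes=(0, 1))
```

With `w` set to the identity the layer is a no-op, which is a handy sanity check that the FFT round-trip is lossless.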
Drop the GAN: In Defense of Patches Nearest Neighbors as Single Image Generative Models
Niv Granot, Ben Feinstein, Assaf Shocher, Shai Bagon, Michal Irani
Generative adversarial networks have become the de facto method for generative modeling in the image domain. Yet they remain time-consuming and difficult to train, and often produce unpredictable artifacts that are hard to control. This paper challenges the notion that GANs are the solution to every generative problem by introducing a simple nearest-neighbor, patch-based method for generating new images from a single image, yielding orders-of-magnitude speedups.
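A minimal sketch of the patch nearest-neighbor idea, brute-force and unoptimized (the paper uses a fast coarse-to-fine matching scheme): rebuild a target layout by replacing each of its patches with the closest patch from the source image and averaging the overlaps.

```python
import numpy as np

def extract_patches(img, p):
    """Collect all overlapping p x p patches of a grayscale image as rows."""
    H, W = img.shape
    return np.stack([img[i:i + p, j:j + p].ravel()
                     for i in range(H - p + 1)
                     for j in range(W - p + 1)])

def nn_reassemble(target, source, p=4):
    """Rebuild `target` from `source` patches: each target patch is
    replaced by its L2-nearest source patch, and overlapping
    contributions are averaged."""
    src = extract_patches(source, p)
    H, W = target.shape
    out = np.zeros((H, W), dtype=float)
    weight = np.zeros((H, W), dtype=float)
    for i in range(H - p + 1):
        for j in range(W - p + 1):
            q = target[i:i + p, j:j + p].ravel()
            best = src[np.argmin(((src - q) ** 2).sum(axis=1))]
            out[i:i + p, j:j + p] += best.reshape(p, p)
            weight[i:i + p, j:j + p] += 1
    return out / weight
```

Feeding a retargeted, shuffled, or edited layout as `target` produces a new image stitched entirely from real source patches, which is why the outputs avoid GAN-style artifacts.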
StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery
Or Patashnik, Zongze Wu, Eli Shechtman, Daniel Cohen-Or, Dani Lischinski
Manipulate StyleGAN images with text. Multimodal transformers such as CLIP open up so many possibilities for text-based media editing. A new paradigm in creative tools that relies less on precise manipulation of sliders and anchor points and more on imaginative descriptions and prompts. Hello, post-slider interfaces?
Enhancing Photorealism Enhancement
Stephan R. Richter, Hassan Abu AlHaija, Vladlen Koltun
We saw NVIDIA take the first steps toward adding ConvNets as a rendering pass in video games with DLSS, which performs real-time super-resolution. This paper takes that approach to the next level by applying image-to-image GANs to G-buffers from the game engine to generate temporally consistent, photorealistic GTA V frames.
Skip-Convolutions for Efficient Video Processing
Amirhossein Habibian, Davide Abati, Taco S. Cohen, Babak Ehteshami Bejnordi
There are decades of work in image and video compression taking advantage of insights on the human perceptual system and the redundancies in videos to reduce bandwidth with techniques such as DCT coding, chroma subsampling, and motion compensation. This is one of a few recent papers that uses an idea analogous to motion compensation in the context of DNN inference on video by only operating on the residuals between frames, significantly saving compute.
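The gist can be sketched per pixel. Note the loose analogy: the paper gates convolutions on feature-map residuals and genuinely skips the skipped computation, whereas this toy evaluates the operator densely and merely shows the reuse pattern.

```python
import numpy as np

def skip_process(frames, op, thresh=0.05):
    """Apply `op` to the first frame; for each later frame, recompute
    only where the residual from the previous frame exceeds `thresh`
    and reuse the cached output elsewhere. `op` is assumed pixelwise."""
    prev_in = frames[0]
    prev_out = op(prev_in)
    outputs = [prev_out]
    for f in frames[1:]:
        mask = np.abs(f - prev_in) > thresh   # where the frame actually changed
        out = prev_out.copy()
        # a real implementation evaluates op only at masked locations;
        # here we compute densely and mask afterwards for simplicity
        out[mask] = op(f)[mask]
        outputs.append(out)
        prev_in, prev_out = f, out
    return outputs
```

On typical video, most pixels change little between consecutive frames, so the masked region (and hence the recomputed work) is a small fraction of the frame.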
Growing 3D Artefacts and Functional Machines with Neural Cellular Automata
Shyam Sudhakaran, Djordje Grbic, Siyan Li, Adam Katona, Elias Najarro, Claire Glanois, Sebastian Risi
Biology has been a consistent source of inspiration for new architectures and techniques in machine learning, from neural networks to genetic algorithms. Neural Cellular Automata (NCAs) bring ideas from morphogenesis, the process by which biological organisms self-assemble from a single cell, to the world of deep neural networks and differentiable computing. In this paper, the authors apply NCAs to Minecraft to evolve complex buildings, and even machines, from a single block!
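To make the mechanism concrete, here is a toy NCA update in NumPy: every cell applies the same small learned function to its 3x3 neighborhood. This is a sketch under simplifying assumptions, with `w` a hypothetical weight matrix standing in for the paper's per-cell update network.

```python
import numpy as np

def nca_step(grid, w):
    """One toy neural-cellular-automaton update. Each cell's next
    state is a nonlinearity of a learned mix of its 3x3 neighborhood.
    grid: (H, W, C) cell states, w: (C*9, C) shared weights."""
    H, W, C = grid.shape
    padded = np.pad(grid, ((1, 1), (1, 1), (0, 0)))  # zero states outside the grid
    # gather every cell's 3x3 neighborhood into one flat feature vector
    neigh = np.stack([padded[i:i + H, j:j + W]
                      for i in range(3) for j in range(3)], axis=-1)
    feats = neigh.reshape(H, W, C * 9)
    return np.tanh(feats @ w)  # same rule applied at every cell
```

Because the same rule runs everywhere, complex global structures can only emerge from repeated local interactions, which is exactly the morphogenesis analogy the paper builds on.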