NVIDIA researchers have demonstrated a new type of video compression technology that replaces the traditional video codec with a neural network to drastically reduce video bandwidth. The technology is presented as a potential solution for streaming video in situations where Internet availability is limited, such as using a webcam to chat with clients while on a slow Internet connection.
The new technology is made possible using NVIDIA Maxine, a cloud-AI video streaming platform for developers. According to the researchers, using AI-based video compression can strip video bandwidth usage down to 1/10th of the bandwidth that would otherwise be used by the common H.264 video codec. For users, this could result in what NVIDIA calls a ‘smoother’ experience that uses up less mobile data.
In a video explaining the technology, researchers demonstrate their AI-based video compression alongside H.264 compression with both videos limited to the same low bandwidth. With the traditional video compression, the resulting low-bandwidth video is very pixelated and blocky, but the AI-compressed video is smooth and relatively clear.
This is made possible by extracting the key points on the subject’s face, such as the position of the eyes and mouth, then sending that data to the recipient. The AI technology then reconstructs the subject’s face and animates it in real time using the keypoint data. The result is very low bandwidth usage relative to the image quality on the receiver’s end.
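To give a rough sense of why keypoint data is so light, here is a minimal Python sketch under stated assumptions. The article does not say how many keypoints are extracted or how they are encoded, so the 68-point count, the 16-bit quantization and every function name below are hypothetical choices for illustration, not NVIDIA's actual format. The sketch covers only the size of the per-frame keypoint payload; the neural reconstruction on the receiving end is not modeled.

```python
import struct

# Hypothetical values for illustration only; the article gives no such figures.
ASSUMED_KEYPOINTS = 68   # assumed number of facial keypoints per frame
ASSUMED_FPS = 30         # assumed frame rate
MAX_Q = 65535            # largest value of an unsigned 16-bit integer


def pack_keypoints(keypoints, width, height):
    """Quantize (x, y) pixel coordinates to 16 bits each and pack them into bytes."""
    payload = bytearray()
    for x, y in keypoints:
        qx = round(x / (width - 1) * MAX_Q)
        qy = round(y / (height - 1) * MAX_Q)
        payload += struct.pack("<HH", qx, qy)  # 4 bytes per keypoint
    return bytes(payload)


def unpack_keypoints(payload, width, height):
    """Recover approximate pixel coordinates from a packed payload."""
    return [
        (qx / MAX_Q * (width - 1), qy / MAX_Q * (height - 1))
        for qx, qy in struct.iter_unpack("<HH", payload)
    ]


def payload_kbps(num_keypoints, fps, bytes_per_point=4):
    """Bitrate, in kilobits per second, of the keypoint stream alone."""
    return num_keypoints * bytes_per_point * 8 * fps / 1000


if __name__ == "__main__":
    frame = [(320.0, 240.0)] * ASSUMED_KEYPOINTS  # placeholder coordinates
    packed = pack_keypoints(frame, 640, 480)
    print(len(packed), "bytes per frame")
    print(payload_kbps(ASSUMED_KEYPOINTS, ASSUMED_FPS), "kbps")
```

With these assumed values, each frame's keypoints fit in 272 bytes (68 points at 4 bytes each), and that size does not grow with the video resolution.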
AI-based compression also offers advantages that exceed the capabilities of traditional video technologies. One example is Free View, a feature in which the AI platform can rotate the subject so that they appear to be facing the recipient even when, in reality, their camera is positioned off to the side and they appear to be staring into the distance.
Likewise, the keypoints extracted from the subject’s face could be used to apply their movements to other characters, including fully animated characters, expanding beyond the AI-powered filters that have become popular in some video apps like Snapchat. Similar technology is already on the market in the form of Apple’s AI-based Animoji.
The use of artificial intelligence to modify videos isn’t new; most major video conferencing apps now include the option of replacing one’s real-life background with a different one, including intelligent AI-based background blurring. However, NVIDIA’s real-time AI-based video compression takes things to a new level by using AI to not only generate the subject in real time, but also modify them in convenient ways, such as aligning their face with a virtual front-facing camera.
The technology could usher in an era of clearer, more consistent video conferencing experiences, particularly for those on slow Internet connections, while using less data than current options. However, the demonstration has also raised concerns that largely mirror ones related to deepfake technologies — namely, the potential for exploiting such technologies to produce inauthentic content.
Artificial intelligence technology is advancing at a rapid clip and, in many cases, can be used to imperceptibly alter videos and images. Work is already underway to exceed those capabilities, however, by fully generating photo-realistic content using AI rather than modifying existing real-world content.
For example, the Allen Institute for AI recently demonstrated the latest evolution in this effort by using both images and text to create a machine learning algorithm that possesses a very basic sense of abstract reasoning. NVIDIA Research has also contributed extensively to this rapidly evolving technology, with past demonstrations including generating landscapes from sketches, generating photo-realistic portraits and even swapping facial expressions between animals.
A number of companies are working to develop counter technologies capable of detecting manipulated content by looking for markers otherwise invisible to the human eye. In 2019, Adobe Research teamed up with UC Berkeley to develop and demonstrate an AI capable of not only identifying portrait manipulations, but also automatically reversing the changes to display the original, unmodified content.
The general public doesn’t yet have access to these types of technologies, however, generally leaving them vulnerable to the manipulated media that permeates social media.