Generative A-Eye #7 - 24th Sept, 2024
A (more or less) daily newsletter featuring brief summaries of the latest papers on AI-based human image synthesis, and on closely related research.
Tuesday is splurge day at Arxiv, with 180+ posts - not so much that’s fascinating for the AI-based human synthesis field, though.
One thing I did notice elsewhere was an SD-based LivePortrait clone, at a project site called HelloMeme.
When I first pointed it out on LinkedIn today, it had a non-functional Arxiv link button. That button has, at the time of writing, been removed, though the site still has a ‘technical report coming soon’ button, and a link to an empty GitHub repo.
Curious stuff.
Okay, then…
ReLoo: Reconstructing Humans Dressed in Loose Garments from Monocular Video in the Wild
Replicating physics-based cloth behavior, long since a solved problem in CGI, requires innovative use of motion priors in order to produce a truly flexible neural recreation, rather than one that simply ‘pastes’ trained movement into the synthesis.
This new offering is a typically Frankenstein-like architecture, concatenating multiple modules and methods into an end-to-end framework. Predictably, it uses SMPL as a CGI-based bridge.
The physics of the clothing are, in this case, driven by an inferred skeleton, so some kind of interpretive framework was clearly necessary. With truly voluminous clothing of the kind featured in the project’s examples, inferring the underlying figure at the conceptual/training stages is a considerable feat.
‘While previous years have seen great progress in the 3D reconstruction of humans from monocular videos, few of the state-of-the-art methods are able to handle loose garments that exhibit large non-rigid surface deformations during articulation. This limits the application of such methods to humans that are dressed in standard pants or T-shirts. Our method, ReLoo, overcomes this limitation and reconstructs high-quality 3D models of humans dressed in loose garments from monocular in-the-wild videos. To tackle this problem, we first establish a layered neural human representation that decomposes clothed humans into a neural inner body and outer clothing. On top of the layered neural representation, we further introduce a non-hierarchical virtual bone deformation module for the clothing layer that can freely move, which allows the accurate recovery of non-rigidly deforming loose clothing.’
http://export.arxiv.org/abs/2409.15269
SUPPLEMENTARY: https://files.ait.ethz.ch/projects/reloo/resource/supp.pdf
https://moygcc.github.io/ReLoo/
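To make the ‘layered’ idea a little more concrete, here is a minimal PyTorch sketch of a two-layer representation: an inner body field plus an outer garment field whose query points are warped by a set of free, non-hierarchical ‘virtual bones’. The module names, network sizes and the exact parameterization are my own assumptions for illustration, not the authors’ code, and the body layer’s skeletal skinning is omitted.

```python
# Speculative sketch of a two-layer ("body + garment") neural representation,
# in the spirit of ReLoo's layered design. Names, sizes and the virtual-bone
# parameterization are illustrative assumptions, not the authors' code.
import torch
import torch.nn as nn

class OccupancyMLP(nn.Module):
    """Tiny coordinate MLP mapping 3D points to occupancy logits."""
    def __init__(self, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x):              # x: (N, 3)
        return self.net(x)             # (N, 1)

class LayeredHuman(nn.Module):
    """Inner body layer plus an outer garment layer deformed by K
    non-hierarchical 'virtual bones' (free rigid transforms + soft skinning)."""
    def __init__(self, num_virtual_bones=16):
        super().__init__()
        self.body = OccupancyMLP()
        self.garment = OccupancyMLP()
        # Each virtual bone: a learnable rotation (init identity) and translation.
        # (Rotations are left unconstrained here for simplicity.)
        self.vb_rot = nn.Parameter(torch.eye(3).repeat(num_virtual_bones, 1, 1))
        self.vb_trans = nn.Parameter(torch.zeros(num_virtual_bones, 3))
        # Per-point soft skinning weights over the virtual bones.
        self.skin = nn.Sequential(nn.Linear(3, 64), nn.ReLU(),
                                  nn.Linear(64, num_virtual_bones))

    def deform_garment(self, x):                         # x: (N, 3) observed space
        w = torch.softmax(self.skin(x), dim=-1)          # (N, K) skinning weights
        diff = x.unsqueeze(1) - self.vb_trans            # (N, K, 3)
        local = torch.einsum('kij,nkj->nki',
                             self.vb_rot.transpose(1, 2), diff)  # inverse-rotate per bone
        return (w.unsqueeze(-1) * local).sum(dim=1)      # (N, 3) canonical points

    def forward(self, x):
        body_occ = self.body(x)                          # body layer queried directly
                                                         # (skeletal skinning omitted here)
        garment_occ = self.garment(self.deform_garment(x))
        return body_occ, garment_occ

# Toy usage
model = LayeredHuman()
pts = torch.rand(1024, 3)
body_occ, garment_occ = model(pts)
print(body_occ.shape, garment_occ.shape)                 # torch.Size([1024, 1]) twice
```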
MIMAFace: Face Animation via Motion-Identity Modulated Appearance Feature Learning
This is an interesting project, and one that, again, seems inspired by the success of LivePortrait. However, the examples at the source site are of varying quality. In any case, this is firmly in LDM territory, since the model tested was based on SD1.5, which stubbornly remains the hobbyist’s favorite, due to the anti-litigation measures in later releases from Stability AI.
‘[A] Motion-Identity Modulated Appearance Learning Module (MIA) that modulates CLIP features at both motion and identity levels. Additionally, to tackle the semantic/color discontinuities between clips, we design an Inter-clip Affinity Learning Module (ICA) to model temporal relationships across clips. Our method achieves precise facial motion control (i.e., expressions and gaze), faithful identity preservation, and generates animation videos that maintain both intra/inter-clip temporal consistency. Moreover, it easily adapts to various modalities of driving sources’
https://mimaface2024.github.io/mimaface.github.io/
http://export.arxiv.org/abs/2409.15179
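The abstract doesn’t spell out how the CLIP features are modulated; the toy PyTorch block below shows one plausible reading, a FiLM-style scale-and-shift conditioned on concatenated motion and identity embeddings. The dimensions, the module name and the scale/shift scheme are my assumptions, not the published MIA module.

```python
# Speculative illustration of "modulating CLIP appearance features at motion
# and identity levels" as a FiLM-style block. Not the authors' code.
import torch
import torch.nn as nn

class MotionIdentityModulation(nn.Module):
    """Predict per-channel scale/shift for CLIP appearance tokens from
    concatenated motion and identity embeddings."""
    def __init__(self, clip_dim=768, motion_dim=128, id_dim=512):
        super().__init__()
        self.to_scale = nn.Linear(motion_dim + id_dim, clip_dim)
        self.to_shift = nn.Linear(motion_dim + id_dim, clip_dim)

    def forward(self, clip_tokens, motion_emb, id_emb):
        # clip_tokens: (B, N, clip_dim) appearance tokens from a CLIP image encoder
        # motion_emb:  (B, motion_dim)  e.g. an expression / gaze code
        # id_emb:      (B, id_dim)      an identity embedding of the source face
        cond = torch.cat([motion_emb, id_emb], dim=-1)
        scale = self.to_scale(cond).unsqueeze(1)         # (B, 1, clip_dim)
        shift = self.to_shift(cond).unsqueeze(1)
        return clip_tokens * (1 + scale) + shift         # modulated appearance features

# Toy usage
mia = MotionIdentityModulation()
tokens = torch.randn(2, 196, 768)                        # dummy CLIP patch tokens
out = mia(tokens, torch.randn(2, 128), torch.randn(2, 512))
print(out.shape)                                         # torch.Size([2, 196, 768])
```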
Human Hair Reconstruction with Strand-Aligned 3D Gaussians
Hair is one of the thorniest problems in neural image synthesis, and a challenge that plagued CGI for decades. Accurate hairstyle simulation involves a blend of physics, lighting and chaos theory (!) that is hard to evaluate and reproduce – at least for a close-up.
So it’s interesting to see this project use something as new as Gaussian Splatting to tackle the issue. In this case, classical hair ‘strands’ act as guide rails for attaching and aligning the splats.
‘[A] new hair modeling method that uses a dual representation of classical hair strands and 3D Gaussians to produce accurate and realistic strand-based reconstructions from multi-view data. In contrast to recent approaches that leverage unstructured Gaussians to model human avatars, our method reconstructs the hair using 3D polylines, or strands. This fundamental difference allows the use of the resulting hairstyles out-of-the-box in modern computer graphics engines for editing, rendering, and simulation.’
http://export.arxiv.org/abs/2409.14778
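As a rough illustration of the strand-aligned idea, the NumPy sketch below turns a single strand polyline into one anisotropic Gaussian per segment, elongated along the strand direction and thin across it. The scale values, the rotation-frame construction and the function name `strand_to_gaussians` are illustrative assumptions, not the paper’s exact parameterization.

```python
# Minimal sketch: convert a hair strand (3D polyline) into strand-aligned
# anisotropic Gaussians, one per segment. Illustrative only.
import numpy as np

def strand_to_gaussians(strand, thickness=5e-4):
    """strand: (P, 3) ordered polyline points for one hair strand.
    Returns per-segment Gaussian centers, rotations (3x3), and scales."""
    starts, ends = strand[:-1], strand[1:]
    centers = 0.5 * (starts + ends)                      # one Gaussian per segment
    seg = ends - starts
    length = np.linalg.norm(seg, axis=-1, keepdims=True)
    x_axis = seg / np.clip(length, 1e-8, None)           # long axis follows the strand

    # Build an orthonormal frame around each segment direction.
    ref = np.where(np.abs(x_axis[:, :1]) < 0.9, [[1.0, 0.0, 0.0]], [[0.0, 1.0, 0.0]])
    y_axis = np.cross(x_axis, ref)
    y_axis /= np.linalg.norm(y_axis, axis=-1, keepdims=True)
    z_axis = np.cross(x_axis, y_axis)

    rotations = np.stack([x_axis, y_axis, z_axis], axis=-1)    # (S, 3, 3), columns = axes
    scales = np.concatenate([0.5 * length,                     # elongated along the strand
                             np.full_like(length, thickness),  # thin across it
                             np.full_like(length, thickness)], axis=-1)
    return centers, rotations, scales

# Toy usage: a gently curved strand of 20 points
t = np.linspace(0, 1, 20)
strand = np.stack([t, 0.05 * np.sin(6 * t), np.zeros_like(t)], axis=-1)
c, R, s = strand_to_gaussians(strand)
print(c.shape, R.shape, s.shape)                         # (19, 3) (19, 3, 3) (19, 3)
```

Because the splats inherit their positions and orientations from explicit polylines, the strands themselves remain available for conventional grooming, editing and physics simulation in standard graphics pipelines, which is the advantage the authors emphasize.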
Other papers of interest today
Reflecting Reality: Enabling Diffusion Models to Produce Faithful Mirror Reflections
http://export.arxiv.org/abs/2409.14677
https://val.cds.iisc.ac.in/reflecting-reality.github.io/
JVID: Joint Video-Image Diffusion for Visual-Quality and Temporal-Consistency in Video Generation
http://export.arxiv.org/abs/2409.14149
ExFMan: Rendering 3D Dynamic Humans with Hybrid Monocular Blurry Frames and Events
https://arxiv.org/abs/2409.14103
Finally, I put another piece up at unite.ai about a recent paper that I felt did not get enough attention.
Detecting Video-conference Deepfakes With a Smartphone’s ‘Vibrate’ Function
Using a smartphone’s native ‘vibrate’ function to reveal fundamental shortcomings in a streaming deepfake system such as DeepFaceLive strikes me as a better direction than the ‘frequency’-based deepfake detection papers that have characterized the research sector’s 2024 output.
https://www.unite.ai/detecting-video-conference-deepfakes-with-a-smartphones-vibrate-function/
_________________________
My domain expertise is in AI image synthesis, and I’m the former science content head at Metaphysic.ai. I’m an AI developer, a current machine learning practitioner, and an educator. I’m also a native Brit, currently resident in Bucharest, but possibly open to relocation.
If you want to see more extensive examples of my writing on research, as well as some epic features (many of which hit big at Hacker News and garnered significant traffic), check out my portfolio website at https://martinanderson.ai.