Generative A-Eye #15 - 10th Oct, 2024
A (more or less) daily newsletter featuring brief summaries of the latest papers on AI-based human image synthesis, and of adjacent research.
arXiv published nothing yesterday (Wednesday 9th), leading to an avalanche of catch-up posts today.
I wrote one of them up as a piece for unite.ai:
Using JPEG Compression to Improve Neural Network Training
‘A new research paper from Canada has proposed a framework that deliberately introduces JPEG compression into the training scheme of a neural network, and manages to obtain better results – and better resistance to adversarial attacks.’
https://www.unite.ai/using-jpeg-compression-to-improve-neural-network-training/
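The core idea, roughly, is to expose the network to compression-degraded inputs during training rather than only at inference. As a very loose, stdlib-only sketch of that general pattern (compression as a training-time augmentation) — not the paper's actual method — here is a toy version in which a simple quantization step stands in for a real JPEG encoder (which would normally come from a library such as Pillow); all function names are hypothetical:

```python
import random

def lossy_roundtrip(pixels, quality):
    """Toy stand-in for a JPEG encode/decode round-trip.
    Lower quality -> coarser quantization of 0-255 pixel values.
    A real implementation would run an actual JPEG codec here."""
    step = max(1, (100 - quality) // 4)  # quality 100 -> step 1 (near-lossless)
    return [(p // step) * step for p in pixels]

def augment_batch(batch, q_min=30, q_max=90):
    """Degrade each training image at a randomly chosen quality,
    so the model sees compression-style artifacts during training."""
    return [lossy_roundtrip(img, random.randint(q_min, q_max)) for img in batch]
```

The augmentation would typically be applied per batch inside the training loop, with the quality range acting as a tunable hyperparameter.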
Here are the other ones that caught my eye:
MimicTalk: Mimicking a personalized and expressive 3D talking face in minutes
One of a growing number of projects that represent the potential revitalization of NeRF after 2+ years of flogging LDMs to death in the literature, and finding them intractable (in terms of post facto editing).
‘[The] first attempt that exploits the rich knowledge from a NeRF-based person-agnostic generic model for improving the efficiency and robustness of personalized TFG. To be specific, (1) we first come up with a person-agnostic 3D TFG model as the base model and propose to adapt it into a specific identity; (2) we propose a static-dynamic-hybrid adaptation pipeline to help the model learn the personalized static appearance and facial dynamic features; (3) To generate the facial motion of the personalized talking style, we propose an in-context stylized audio-to-motion model that mimics the implicit talking style provided in the reference video without information loss by an explicit style representation. The adaptation process to an unseen identity can be performed in 15 minutes, which is 47 times faster than previous person-dependent methods.’
https://mimictalk.github.io/index.html
https://arxiv.org/abs/2410.06734
Generative Portrait Shadow Removal
This offering from Adobe doesn’t entirely convince, since the de-shadowed faces are clearly diffusion output. The more interesting aspect is that there is apparently demand for this at all (beyond some obvious cases, such as the hand-on-face shadow).
‘[A] high-fidelity portrait shadow removal model that can effectively enhance the image of a portrait by predicting its appearance under disturbing shadows and highlights. Portrait shadow removal is a highly ill-posed problem where multiple plausible solutions can be found based on a single image. While existing works have solved this problem by predicting the appearance residuals that can propagate local shadow distribution, such methods are often incomplete and lead to unnatural predictions, especially for portraits with hard shadows. We overcome the limitations of existing local propagation methods by formulating the removal problem as a generation task where a diffusion model learns to globally rebuild the human appearance from scratch as a condition of an input portrait image’
https://arxiv.org/abs/2410.05525
There is another paper of interest at arXiv, called ShieldDiff, which addresses auto-censoring in diffusion models; but the keywords and images are too risky to include here. Take a look if the topic interests you:
http://export.arxiv.org/abs/2410.05309
My domain expertise is in AI image synthesis, and I’m the former science content head at Metaphysic.ai. I’m an occasional machine learning practitioner, and an educator. I’m also a native Brit, currently resident in Bucharest.
If you want to see more extensive examples of my writing on research, as well as some epic features (many of which hit big at Hacker News and garnered significant traffic), check out my portfolio website at https://martinanderson.ai.