Generative A-Eye #13 - 7th Oct, 2024
A (more or less) daily newsletter featuring brief summaries of the latest papers on AI-based human image synthesis and related research.
Once training data has been abstracted into the latent space of a diffusion model, it is difficult, and usually impossible, to get the model to spit out any one of the exact images it was trained on.
It’s notable that, despite the very small number of images needed for a LoRA or a DreamBooth checkpoint, very few of the many thousands of fine-tuned models at Civitai provide a link to the source dataset (plausible deniability, I suppose).
However, a new US paper has emerged which claims to be able to reconstruct up to 20% of the training data from checkpoints and LoRAs - more than enough to litigate, or at least to issue a complaint.
I have written about the paper at Unite.ai - take a read, if the subject interests you:
Extracting Training Data From Fine-Tuned Stable Diffusion Models
https://www.unite.ai/extracting-training-data-from-fine-tuned-stable-diffusion-models/
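To give a flavor of what the verification end of such a pipeline might look like (the paper's own method is more sophisticated, and is not reproduced here), below is a minimal Python sketch that flags generated outputs which are near-duplicates of suspected training images, using perceptual hashing. The file paths and the distance threshold are illustrative assumptions, not values from the paper:

```python
# A generic sketch of the verification step in training-data extraction:
# flag generated images that are near-duplicates of suspected training
# images via perceptual hashing. NOT the paper's method; the paths and
# threshold below are illustrative assumptions.
from PIL import Image
import imagehash

def phash(path: str) -> imagehash.ImageHash:
    """64-bit perceptual hash of an image file."""
    return imagehash.phash(Image.open(path).convert("RGB"))

generated = ["gen_000.png", "gen_001.png"]      # samples from the checkpoint
suspects  = ["suspect_a.jpg", "suspect_b.jpg"]  # possible training images

MAX_HAMMING = 8  # assumed cut-off for "near-duplicate"; smaller = stricter

for g in generated:
    g_hash = phash(g)
    for s in suspects:
        dist = g_hash - phash(s)  # Hamming distance between the two hashes
        if dist <= MAX_HAMMING:
            print(f"{g} looks like a regurgitation of {s} (distance {dist})")
```

Perceptual hashing is cheap and catches exact or lightly-altered regurgitations; catching looser semantic matches would need an embedding-based comparison instead.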
Today’s papers, from a pretty thin Monday:
Estimating Body Volume and Height Using 3D Data
I like the idea of associating volume estimation with real-world weight, and it seems that this line of research would be a boon to AI-based physics engines too.
‘Accurate body weight estimation is critical in emergency medicine for proper dosing of weight-based medications, yet direct measurement is often impractical in urgent situations. This paper presents a non-invasive method for estimating body weight by calculating total body volume and height using 3D imaging technology.’
https://arxiv.org/abs/2410.02800
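The core arithmetic here is appealingly simple: integrate the scanned mesh to get a volume, then multiply by an assumed tissue density. Below is a minimal numpy sketch under those assumptions - the density constant is mine, not the paper's, and real human body density varies with composition:

```python
# A minimal sketch of the volume-to-weight idea, assuming a watertight
# 3D body scan as a triangle mesh. The divergence-theorem volume
# computation is standard; the density value is an assumption (human
# body density is roughly 985-1062 kg/m^3), not a figure from the paper.
import numpy as np

def mesh_volume(vertices: np.ndarray, faces: np.ndarray) -> float:
    """Volume of a closed mesh as the sum of signed tetrahedron volumes
    formed by each triangle and the origin (units follow the vertices)."""
    v0, v1, v2 = (vertices[faces[:, i]] for i in range(3))
    # Signed volume of tetra (origin, v0, v1, v2) = dot(v0, cross(v1, v2)) / 6
    return float(abs(np.einsum("ij,ij->i", v0, np.cross(v1, v2)).sum()) / 6.0)

BODY_DENSITY = 1000.0  # kg/m^3; assumed average, varies with composition

def estimate_weight_kg(vertices_m: np.ndarray, faces: np.ndarray) -> float:
    """Estimated body weight from scan volume, with vertices in metres."""
    return mesh_volume(vertices_m, faces) * BODY_DENSITY
```

At that assumed density, a watertight scan enclosing 0.075 m³ would come out at roughly 75 kg - which is also why the density assumption, not the geometry, is where most of the error would live.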
ScriptViz: A Visualization Tool to Aid Scriptwriting based on a Large Movie Database
I’m not sure this particular project is fully cooked, but there’s the germ of an idea here: RAG and similar ‘live’ AI technologies as an aid to creative workflows.
‘In this paper, we develop ScriptViz to provide external visualization based on a large movie database for the screenwriting process. It retrieves reference visuals on the fly based on scripts' text and dialogue from a large movie database. The tool provides two types of control on visual elements that enable writers to 1) see exactly what they want with fixed visual elements and 2) see variances in uncertain elements. User evaluation among 15 scriptwriters shows that ScriptViz is able to present scriptwriters with consistent yet diverse visual possibilities, aligning closely with their scripts and helping their creation.’
https://virtualfilmstudio.github.io/projects/scriptviz/
https://arxiv.org/abs/2410.03224
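As a rough illustration of the retrieval idea (and emphatically not the paper's actual pipeline), here is a sketch of pulling reference frames for a scene description via CLIP text-image similarity over a pre-embedded frame database. The database below is random placeholder data, and the model choice is an assumption:

```python
# A minimal sketch of the retrieval step behind a tool like ScriptViz:
# embed a scene description with CLIP's text encoder and return the
# nearest frames from a pre-embedded movie-frame database. The paper's
# pipeline is more involved; the "database" here is random placeholder
# data and the model name is an assumption.
import torch
import open_clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _, _ = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k"
)
tokenizer = open_clip.get_tokenizer("ViT-B-32")
model = model.to(device).eval()

# Hypothetical pre-computed database: one unit-normalised CLIP image
# embedding per movie frame, plus matching frame identifiers.
frame_embeddings = torch.randn(10_000, 512, device=device)
frame_embeddings /= frame_embeddings.norm(dim=-1, keepdim=True)
frame_ids = [f"movie_frame_{i:05d}" for i in range(10_000)]

def retrieve_frames(scene_text: str, k: int = 5) -> list[str]:
    """Return the k frames whose embeddings best match the scene text."""
    tokens = tokenizer([scene_text]).to(device)
    with torch.no_grad():
        text_emb = model.encode_text(tokens)
    text_emb /= text_emb.norm(dim=-1, keepdim=True)
    scores = (text_emb @ frame_embeddings.T).squeeze(0)  # cosine similarity
    top = scores.topk(k).indices.tolist()
    return [frame_ids[i] for i in top]

print(retrieve_frames("INT. DINER - NIGHT. Two strangers share a booth."))
```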
[Combining] Text-based and Drag-based Editing for Precise and Flexible Image Editing
If this interesting paper has an associated project page or demo site, I cannot find a link to it in the abstract. However, LivePortrait and its offshoots represent a formidable standard to beat.
‘Specifically, text-based methods often fail to describe the desired modifications precisely, while drag-based methods suffer from ambiguity. To address these issues, we proposed CLIPDrag, a novel image editing method that is the first to combine text and drag signals for precise and ambiguity-free manipulations on diffusion models. To fully leverage these two signals, we treat text signals as global guidance and drag points as local information. Then we introduce a novel global-local motion supervision method to integrate text signals into existing drag-based methods by adapting a pre-trained language-vision model like CLIP.’
https://arxiv.org/abs/2410.03097
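Reading between the lines of the abstract, the objective presumably combines a global CLIP text-alignment term with a local drag-point matching term. The sketch below is my schematic guess at such a combined loss, with stand-in tensors and an arbitrary weighting rather than the paper's actual features or balancing scheme:

```python
# A schematic sketch of the global-plus-local objective the CLIPDrag
# abstract describes: a CLIP text-similarity term as global guidance,
# and a drag-style point-matching term as local supervision. Everything
# here (encoders, feature maps, weights) is a stand-in, not the
# paper's code.
import torch
import torch.nn.functional as F

def combined_edit_loss(
    image_clip_emb: torch.Tensor,  # (d,) CLIP embedding of the current edit
    text_clip_emb: torch.Tensor,   # (d,) CLIP embedding of the edit prompt
    feat_map: torch.Tensor,        # (C, H, W) intermediate diffusion features
    handle_feats: torch.Tensor,    # (N, C) features sampled at handle points
    targets: torch.Tensor,         # (N, 2) target (y, x) for each handle
    text_weight: float = 0.5,      # assumed global/local balance, not paper's
) -> torch.Tensor:
    # Global term: pull the edited image toward the text description.
    global_loss = 1.0 - F.cosine_similarity(
        image_clip_emb.unsqueeze(0), text_clip_emb.unsqueeze(0)
    ).squeeze(0)

    # Local term: features at each target location should match the
    # features originally sampled at the handle point (drag supervision).
    ys, xs = targets[:, 0].long(), targets[:, 1].long()
    target_feats = feat_map[:, ys, xs].T  # (N, C)
    local_loss = F.l1_loss(target_feats, handle_feats)

    return text_weight * global_loss + (1.0 - text_weight) * local_loss
```

In a drag-editing loop, a loss of this shape would be backpropagated through the latent at each optimization step, nudging the handle content toward the targets while the text term keeps the overall edit on-prompt.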
My domain expertise is in AI image synthesis, and I’m the former science content head at Metaphysic.ai. I’m an occasional machine learning practitioner, and an educator. I’m also a native Brit, currently resident in Bucharest.
If you want to see more extensive examples of my writing on research, as well as some epic features (many of which hit big at Hacker News and garnered significant traffic), check out my portfolio website at https://martinanderson.ai.