Spotify is going to clone podcasters’ voices — and translate them to other languages
(www.theverge.com)
I looked at your sources, or at least one of them. The problem is that, as you said, I am a layman, at least when it comes to AI. I do know how fMRI works, though.
And I stand corrected. Some of those pictures do closely resemble the originals. Impressive, although not all subjects seem to produce the same level of detail and accuracy. Unfortunately, I have no way to verify the AI side of the paper. It is mind-boggling that such images can be constructed from voxels of that size: a 1.8 mm voxel contains close to 100k neurons and even more synapses. And the fMRI signal itself is only a blood oxygen level overshoot in these areas, not a direct measurement of neural activity. It makes me wonder what constraints and tricks had to be used to generate these images. I guess combining the semantic meaning of the image with its broader layout helped: inferring the rough pixel colors (e.g. mostly blue with some gray in the middle), then inferring the semantic meaning (plane), and then combining the two.
Truly amazing, but I do remain somewhat sceptical.
The model inferred meaning in much the same way it infers meaning from text. With Stable Diffusion, short phrases can generate intricate images that match the author's intent.
The models in those studies used Stable Diffusion as the image-generation mechanism, but instead of text prompts, they were trained to take fMRI data as input.
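For anyone curious what "fMRI data instead of text prompts" could look like in practice, here is a toy sketch in Python. It is not the method from those studies. It only assumes that one linear decoder maps voxel activity to a low-level image vector and a second one maps it to a semantic vector, and that both would then be handed to the image generator in place of a text prompt. The data is synthetic, and `fit_ridge`, `mean_corr`, and all the sizes are made up for illustration.

```python
import numpy as np


def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^-1 X^T Y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ Y)


def mean_corr(a, b):
    """Average Pearson correlation between matching columns of a and b."""
    return float(np.mean([np.corrcoef(a[:, i], b[:, i])[0, 1]
                          for i in range(a.shape[1])]))


rng = np.random.default_rng(0)
n_scans, n_voxels = 200, 50       # toy sizes, far smaller than real fMRI data
latent_dim, semantic_dim = 8, 4   # hypothetical sizes of the two target vectors

# Synthetic data (an assumption): each voxel is a noisy linear mix of a
# low-level signal (rough colors/layout) and a semantic signal ("plane").
low_level = rng.normal(size=(n_scans, latent_dim))
semantic = rng.normal(size=(n_scans, semantic_dim))
mixing = rng.normal(size=(latent_dim + semantic_dim, n_voxels))
noise = 0.1 * rng.normal(size=(n_scans, n_voxels))
voxels = np.hstack([low_level, semantic]) @ mixing + noise

# Fit one decoder per target on the first 150 scans, hold out the rest.
train, test = slice(0, 150), slice(150, 200)
W_low = fit_ridge(voxels[train], low_level[train])
W_sem = fit_ridge(voxels[train], semantic[train])

# Decode both vectors for unseen scans. In a real pipeline these would
# condition the image generator; here we only check how well they match.
pred_low = voxels[test] @ W_low
pred_sem = voxels[test] @ W_sem

print("low-level decoding correlation:", mean_corr(pred_low, low_level[test]))
print("semantic decoding correlation:", mean_corr(pred_sem, semantic[test]))
```

The toy data is far cleaner than real fMRI, so the decoders recover both vectors easily here. The point is only the structure: two separate mappings from the same voxels, one for the rough look of the image and one for its meaning.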