Hallucinated LDM-generated cartoons

No matter how tempted I am with the prospect of unlimited power, I will not consume any energy field bigger than my head.
‐ Peter's Evil Overlord List

So, you want weird AI-hallucinated cartoons? Boy, do I ever have you covered. Choose your fighter!

(content warnings: these are giant 'contact sheets' of 24 images each; they're big: 3,072 × 3,072 pixels; some of them are really weird, because they're computer-generated cartoons)
page 1
page 2
page 3
page 4
page 5
page 6
page 7
page 8
page 9
page 10

These are conjured up randomly by the computer (using the v1 LDM finetune, epoch 26 -- which is still training, with over 1600 GPU hours under its belt, but how can I stop it when the thing keeps getting better?). The conditioning to the model is a bunch of embeddings picked randomly from a grab bag of about a thousand trained cartoon art concepts (representing characters, styles, directors, genres, moods/vibes, etc.) jumbled together, along with generated generic prompts ("cartoon pig", for example). Just toss the dice and see what falls out -- the results are consistently surprising and weird, and they run the entire gamut from "pure computer-generated nightmare fuel" to "really fantastic, actually".
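For the curious, here is roughly what "toss the dice" means, as a minimal sketch and not my actual pipeline. The concept tokens, the subject list, and the function names are all hypothetical placeholders (the real grab bag has about a thousand entries), and the sketch only plans the conditioning for one 24-image contact sheet; it doesn't touch the model.

```python
import random

# Hypothetical stand-ins for the trained concept embeddings; the real
# names are not listed in this post.
CONCEPTS = {
    "character": ["<char-01>", "<char-02>", "<char-03>"],
    "style": ["<style-01>", "<style-02>"],
    "director": ["<director-01>", "<director-02>"],
    "genre": ["<genre-01>", "<genre-02>"],
    "mood": ["<mood-01>", "<mood-02>", "<mood-03>"],
}
# Made-up subjects for the generic prompts ("cartoon pig" and friends).
SUBJECTS = ["pig", "robot", "cat", "wizard"]


def roll_conditioning(rng, n_concepts=4):
    """Pick a few distinct concept tokens at random and attach a generic prompt."""
    pool = [tok for toks in CONCEPTS.values() for tok in toks]
    picked = rng.sample(pool, k=min(n_concepts, len(pool)))
    prompt = f"cartoon {rng.choice(SUBJECTS)}"
    return {"prompt": prompt, "concepts": picked}


def contact_sheet_plan(seed, n_images=24):
    """Plan the conditioning for one contact sheet; same seed, same sheet."""
    rng = random.Random(seed)
    return [roll_conditioning(rng) for _ in range(n_images)]


if __name__ == "__main__":
    for entry in contact_sheet_plan(seed=26)[:3]:
        print(entry["prompt"], "+", " ".join(entry["concepts"]))
```

Seeding a private `random.Random` keeps a sheet reproducible, which is handy when one of the 24 turns out to be worth chasing.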

I say 'cartoons' when there's also the pixel art style (it's getting good, really) and a few non-cartoon things that might as well be cartoons (everything is better with a little bit of Mystery Science Theater 3000 mixed in!)

I dig the algorithmic purity and straightforwardness of LDM generated imagery -- it's an efficient way to encode ideas -- and with just a slight plan of what you're doing when you train it, you can get it to recreate pretty much anything consistently, even vague emotional states, concepts of solid drawing, specific poses for hands or mouths, etc. Now this is just random, undirected slop -- things the computer is combining aimlessly -- largely unguided and barely even touching any of the aesthetic embeddings I've trained in -- but for all of that, it's still way better than a DeviantArt-O-Matic, which is either inspiring or horrifying or probably both.

You do not want to see what these things do when you use them with intention -- or maybe you do? I sure do! I'm training control networks, I'm assembling the most improbable datasets, I'm building foundry channels for molten media. 🎵 I'm Not The Only One! 🎶

You're going to see some of the synaesthesia that I see pretty shortly. I have to create a number of samples myself, then I have to generate a bunch more with the assistance of the LDM, then I have to train a proper embedding into one of the finetunes (it'll probably be v3, v2 is about to start soon), but it'll happen, by God, like it or not. Don't let your dreams just be memes!

And I can tell you something else that you probably don't really want to know: the longer you train these models on film, the more the interpolations between samples look like actual animation. (This should be completely unsurprising to anybody that's thought about it, but I don't think very many people have thought about it yet.) Oh, for sure, interpolations like this video aren't going to be confused for fine animation (this month, anyway -- do check out the air guitar at 1:23 though). But it takes close to zero (human) effort to make, and the computer is doing the direction more or less randomly, so this is a minimum baseline. I can tell you a dozen different actionable, accomplishable things that will each improve the output meaningfully, and together would probably get pretty close to production quality -- even for this kind of meandering computer-directed animation, before you use the trained control nets with actual intention. With the control nets, and with humans selecting the key frames and every 10th frame or so, we're already there, really. I'm satisfied that there aren't any problems or limitations here that can't be overcome technically, even with just existing tools.

The bigger takeaway for me is: This model, with its mere 900M parameter U-Net, is absolutely going to learn how 'walking' works on its own, completely self-supervised -- you can see it in simple spherical interpolations -- and I wouldn't believe it if it weren't happening before my eyes. The hands required some captioning, but a walk cycle doesn't? It'll just learn that if you train it long enough on enough still images; it's awesome in both senses of the word (an increasingly common emotion, I find). Immensely proud of this little computational graph. Keep going, you can do it!
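If "spherical interpolation" is a new term: it just means walking along the arc between two vectors instead of the straight line. Here's a minimal sketch in plain Python, assuming the two endpoints are non-zero vectors of equal length (in practice they'd be latents or embeddings, which this toy version treats as flat lists). The names `slerp` and `inbetweens` are mine for illustration, not from any particular library.

```python
import math


def slerp(v0, v1, t):
    """Spherical linear interpolation between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(v0, v1))
    n0 = math.sqrt(sum(a * a for a in v0))
    n1 = math.sqrt(sum(b * b for b in v1))
    # Clamp to guard against rounding pushing the cosine outside [-1, 1].
    cos_omega = max(-1.0, min(1.0, dot / (n0 * n1)))
    omega = math.acos(cos_omega)
    s = math.sin(omega)
    if s < 1e-6:
        # Nearly parallel or anti-parallel: the arc is ill-defined, so
        # fall back to plain linear interpolation.
        return [(1.0 - t) * a + t * b for a, b in zip(v0, v1)]
    w0 = math.sin((1.0 - t) * omega) / s
    w1 = math.sin(t * omega) / s
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]


def inbetweens(key_a, key_b, n_frames=10):
    """Return n_frames + 1 vectors running from key_a to key_b inclusive."""
    return [slerp(key_a, key_b, i / n_frames) for i in range(n_frames + 1)]


if __name__ == "__main__":
    for frame in inbetweens([1.0, 0.0], [0.0, 1.0], n_frames=4):
        print([round(x, 3) for x in frame])
```

For unit-length endpoints every in-between stays on the unit sphere, which is the whole reason to prefer this over a straight lerp: the intermediate vectors keep the magnitude the model was trained to expect.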

Also, my v2 alone is going to make all of this look like utter crap in a couple of months. Imagine what anybody with resources beyond a big brain and a giant workstation containing a whole bunch of little brains can do!

On the bright side, models like these could mimic all sorts of art styles and techniques that weren't previously practical to do and make them suddenly super low-cost, and that could be a good thing. Like, if you have some artistic talent, any idea (however grandiose) for an image or animation, and a lot of patience, you could just make it happen -- weave it from voodoo basically. There are a lot of old styles that deserve revivals, reinterpretations, remixes, and reruns. And you can essentially invent your own, just combining abstract embeddings trained on your own art and your own preferences. (The abstract aesthetic embeddings are the best things ever, BTW. You can conjure up vibes just by giving them a name. If you can point to it, if you can identify something when you see it, you can train it and you will recreate it as you will. It's almost like there's a kind of sympathetic magic that actually works now.)

If you'd like to learn how to train these yourself, boy howdy can I tell you things. In fact, if you need any kind of AI system, and you aren't comfortable giving OpenAI or Alphabet access to your data -- I don't blame you -- I can probably give you the tools to Do It Yourself. Hell, there are FOSS distributed triply-parallelized training harnesses these days, and that's most of the hard part done, so you can do it yourself on commodity hardware and probably should.

About the only innocuous things you can do with this technology, I've decided, are toys and games, and I'm going to do that until I get consigned to the Torment Nexus; obviously I'm turned off by surveillance, gray-hat hacking, or propaganda at scale; AI infosec is going to be a nightmare for the foreseeable future (though if that's your thing, holy shit is there a lot of money out there right now), and I doubt instruction tuning and RLHF and RLAIF save GPT-4 from the machine equivalent of schizophrenia (if they do, so much the worse for the world).

In this day and age, if you're going to build minds, you can be Dr. Frankenstein and attempt to build the biggest baddest shoggoth, you can be Noonien Soong and try to solve the insoluble problem of impressing human ethics on a machine (you can't impress them on the trainers and operators of the machine, so even if you succeed, we still get Lore: just you watch), or you can be Geppetto and make toys. I've made my choice and you should make the same one. Toys at least are guaranteed to enrich your life and they can have surprising utility -- I bet you could even learn how to build a better shoggoth from playing with the right toys. If you're INTO that kind of thing, I mean.

Toys were also the thing that got regular people into computers in the first place (that and VisiCalc -- just a game by a different name, we're a fun species), so if you want to see AI that isn't a complete flaming pile of turds I think you know the path. I was reading some back issues of Creative Computing from the mid-1970s -- the dawn of the microcomputer revolution, the last time there was a moment remotely like this -- and many of the anxieties then were the same as now (data leaks! hacking! malevolent AI!) but the optimism of the people that saw the good future was inspiring -- and don't get me wrong, if we survive the near-term, this is going to make a whole lot of things a whole lot better. If we make it a few years, who knows, we might survive climate change using artificial intelligence, for example. I'd LIKE us to take a real stab at problems like that, anyway. Nobody is going to convince evil people not to attempt evil things here, but enough assholes with some imagination whittling good things out of 'foundation models' of dubious use might make anybody reconsider why they're so on-line all the time. Being a Geppetto might make the blow a little softer if we do get full cyberdystopia hell and all end up relegated to human zoos. We can go there with our own little breakdancing robots and kick some fascist ass along the way. You get what I'm saying? The Dr. Frankensteins are tools, the Noonien Soongs are pissing into the wind, but the Geppettos all know the score and they'll bring tasty treats to the masses before our evil overlords destroy us all. So get Geppetto-pilled, even in the true doom case we at least tried not to be complete bastards along the way.

By the way, on the language model front, I got LLaMA-30B inference running, quantized to 8 bits so it fits into two 3090s. (I have an application for an LLM that actually makes sense! That's an even easier way to get the weights from FAIR than a torrent... generations from LLaMA-30B are very definitely superior to GPT-3-davinci002 -- this is the first open source model I can confidently say that about. It kicks all kinds of ass. 65B should be pretty astonishing. Which is great because OpenAI and Sam Altman are currently on my 'worst things in the world' list -- I say that as somebody whose 401K is packed with Microsoft common. The AI arms race is going to be a bunch of creative destruction on scales heretofore only imagined, and I'm not certain the principals actually understand that.)
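The "fits into two 3090s" part is just arithmetic, and it's worth seeing why 8 bits is the magic number. A back-of-the-envelope sketch, with loud assumptions: I'm using the nominal 30 billion parameter count from the model's name, treating each RTX 3090 as 24 GiB, and counting only the weights (no activations, no KV cache, no framework overhead). The helper names are made up for illustration.

```python
N_PARAMS = 30e9   # nominal parameter count taken from the model's name (assumption)
GPU_GIB = 24      # memory per RTX 3090, treated as 24 GiB
N_GPUS = 2


def weights_gib(n_params, bits_per_param):
    """Memory needed for the weights alone, in GiB."""
    return n_params * bits_per_param / 8 / 2**30


def fits(bits_per_param, n_params=N_PARAMS, n_gpus=N_GPUS, gpu_gib=GPU_GIB):
    """True if the weights alone fit in the combined GPU memory."""
    return weights_gib(n_params, bits_per_param) <= n_gpus * gpu_gib


if __name__ == "__main__":
    for bits in (16, 8):
        need = weights_gib(N_PARAMS, bits)
        verdict = "fits" if fits(bits) else "does not fit"
        print(f"{bits}-bit: {need:.1f} GiB of weights, {verdict} in {N_GPUS * GPU_GIB} GiB")
```

At 16 bits the weights alone come to about 56 GiB, which overflows the 48 GiB across two cards; at 8 bits they come to about 28 GiB, leaving roughly 20 GiB of headroom for everything this sketch ignores.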

Ah well, what can I do? I'm running Vulcan's own dream forge on my counter, that's what I can do, and I'll be able to do more tomorrow. This hammer hits hard! If it is the end of the world, there'll be Skittles and fireworks for everyone, at least.