Pandemic Diary 6/1/2023

Warning: a wall of text, and then lots and lots of pictures!

Let us all appreciate Red Dwarf predicting the actual year 2023 (video). Kryten does his reinforcement learning with a holo-mallet.

At least four incredible things happened this week. I'm still processing all of them, and I need to triage my time pretty tightly to avoid decision paralysis; it's been years since I've had that kind of challenge. What a week. What a wonderful week. God, it's good to be alive. Happy June, everybody.

First, congratulations to (fellow Petals Projecters!) Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, and Luke Zettlemoyer over at the University of Washington. Their QLoRA paper is a clever (and super-cheap!) approach to a difficult problem. I've been a big fan of fine-tuning the Fun Size language models for a while -- look, you make do with what you have -- but doing it to even the sleekest shoggoth is expensive and takes forever -- well, until it isn't and doesn't. Great googly moogly!

I... have a lot to say about this (it's incredible), but honestly this is one of those things that sounds lunatic unless you see it in action (what's the big deal? a more efficient way to do low-rank adapter tuning?), and I still haven't really stopped jumping up and down with excitement, so stay tuned. We're going to see some things, and you're going to hold them in your hand. If you want a peek, go check out Guanaco-33B -- there's a simple demo you can just run on Hugging Face -- but it gets a lot better than that. That's just a "look what you can do, broadly, with only 33 billion parameters!" demo. When you have boutique data and a specific use case to sharpen your model toward? Mmmmm, you can make power tools, power tools with a natural-language interface that can outperform any general-purpose model at the job you built them for. It's so good.
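
For anyone wondering what low-rank adapter tuning actually does, here is a minimal pure-Python sketch of the standard LoRA forward pass, assuming the usual formulation y = Wx + (alpha/r)·B(Ax): the base weight W stays frozen and only the two small matrices A and B are trained. The function names are made up for illustration, and QLoRA's own contributions (4-bit quantization of the frozen base weights, double quantization, paged optimizers) are not shown here.

```python
from typing import List, Tuple

Matrix = List[List[float]]  # list of rows
Vector = List[float]

def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def lora_forward(W: Matrix, A: Matrix, B: Matrix, x: Vector, alpha: float) -> Vector:
    """Compute y = W x + (alpha / r) * B (A x), with r = number of rows of A.

    W: frozen base weight, shape (d_out, d_in)
    A: trainable down-projection, shape (r, d_in)
    B: trainable up-projection, shape (d_out, r); commonly initialized to zeros,
       so the adapter starts out as a no-op
    """
    r = len(A)
    base = matvec(W, x)               # frozen path, never updated
    delta = matvec(B, matvec(A, x))   # low-rank update path
    scale = alpha / r
    return [b + scale * d for b, d in zip(base, delta)]

def param_counts(d_in: int, d_out: int, r: int) -> Tuple[int, int]:
    """Trainable parameters for one layer: full fine-tuning vs. LoRA with rank r."""
    return d_out * d_in, r * (d_in + d_out)
```

For a 4096×4096 layer with rank 8, `param_counts(4096, 4096, 8)` returns (16777216, 65536): the adapter trains well under 1% of that layer's weights, which is most of why this kind of tuning is cheap.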

All the noise VC tools are currently making in the media about "AI existential risk, ooga-booga, be afraid!" is not unrelated to these things, by the way, in case that wasn't completely obvious. We live in interesting times.

The second marvelous thing that happened in this marvelous week -- and this is even geekier, I'm sorry -- is that ggml, the C library behind that llama.cpp thing, got CUDA support (and steering vectors like "no toast!", incidentally, which is also cool). It's not in the official release yet, but there are pull requests with it fully implemented, so we're off to the races. This is another thing that maybe sounds minor and weird and mundane for something 'marvelous', but there are implications.

I'm already a fan of ggml. Most of what I've been doing the last year is replicating models described by somebody else -- I make improvements, sure: improving performance, tying them together with other models or programs to make little tools, or training the models to burnished brilliance using a few fancy tricks I've picked up or (re)invented along the way, and then I move on to the next thing. But ggml, model quantization, and the tools around them make a lot of skunkworks stuff at or near the state of the art more accessible, and there's potential for finding genuinely useful advances, new things, by poking around there. For the past six weeks I've used it to play with samplers and quantitative benchmarks for LLMs up close, because there are a lot of interesting ways to produce text with a computer beyond beam search and top_p/top_k, and benchmarks for strong LLMs are becoming terrible and meaningless (perplexity seldom measures anything you care about at larger scales or after tuning, and it would be nice to bridge the gap between qualitative and quantitative metrics without throwing up your hands and asking another language model to maybe succeed at it). With tools like that, you can run fast iterative tests, or long experiments over massive numbers of outputs from genuinely capable models, and get somewhere without a massive compute cluster; a laptop will do. And if I get something useful or interesting, it's just a pull request or a few e-mails, and those who know will know, and that's the way it should be. A nice open ecosystem where useful things are still getting done. I'm just here for peace, liberty, freedom, and the rapid proliferation of artificial intelligence; unfortunately, in the world we're building right now, I think we will not have the first three without the fourth.
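
For readers who haven't poked at samplers, the top_p/top_k baseline mentioned above fits in a few lines of plain Python: scale the logits by a temperature, keep at most the k most likely tokens, then keep the smallest set whose cumulative probability reaches p, renormalize, and draw. This is a generic sketch under those assumptions, not ggml's or llama.cpp's actual sampling code, and the function names are hypothetical.

```python
import math
import random
from typing import Dict, List, Optional

def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Turn logits into probabilities; temperature must be > 0 (lower = sharper)."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def filter_top_k_top_p(probs: List[float], top_k: int = 0, top_p: float = 1.0) -> Dict[int, float]:
    """Keep the top_k most likely token ids (0 = no limit), then the smallest
    prefix of them whose cumulative probability reaches top_p; renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k > 0:
        order = order[:top_k]
    kept: List[int] = []
    cumulative = 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}

def sample(logits: List[float], temperature: float = 1.0, top_k: int = 0,
           top_p: float = 1.0, rng: Optional[random.Random] = None) -> int:
    """Draw one token id from the filtered, renormalized distribution."""
    rng = rng or random.Random()
    dist = filter_top_k_top_p(softmax(logits, temperature), top_k, top_p)
    tokens = list(dist)
    return rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]
```

The more interesting samplers people experiment with generally swap out or extend the filtering step; this only shows the common starting point.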

But that isn't quite why this is kind of big. One of the things slowing down native AI-powered consumer apps is that building anything on top of Python makes it a pain in the butt to release. (There are various solutions, of course, but they're all a pain in the butt.) Python dependency hell is one of the lower tiers, I'm pretty sure; you want to use virtual environments to keep your Python-based deep learning systems from breaking at some point down the line, and that kind of process is horrible for anybody who isn't accustomed to command-line terminals (and for many people who are).

With a light, self-contained, low/no-dependency C++ version, you can compile it, download whatever model to pair with the thing, and off you go. That's something people who aren't terminal terminal dorks might actually use. It's still not really ideal (what? you have to compile it or trust some binary?!), but it's a hell of a lot more user-friendly than "here's hoping you notice you didn't install pip in your new conda environment before you run the setup script and install the pippable stuff over whatever's in your base environment, while the conda stuff installs in the new place, so now neither environment works, merry Christmas". And yes, this means that the ideal deep learning setup has at least six different versions of Python, ten of numpy, six of torch, three of TensorFlow, and four of Hugging Face transformers installed. This is normal. This is fine.

But of course, now that there's CUDA support, you don't have to sacrifice GPU acceleration with ggml; if a GPU is available, your models will cook. Being able to build things that work cross-platform (on edge devices OR powerful servers OR web browsers), with relatively easy installation, from one source code base, with hardware acceleration where available, should help app builders a lot.

Those are two niche, technical things, but they are big in themselves and their implications may be much bigger. Maybe in a month something else happens that changes the landscape and can't be explained in a single sentence without sounding like a Martian; who knows? I promise the next two incredible things are more relatable to normal people not living inside their own heads. You have to understand, things are perfectly ordinary over here.

The third wonderful thing that happened in Wonderland this week is that we had a birthday party. Actual real-world things happening in real life with real people! My sister and my father both arrived in this world on the same calendar day, and I am blessed to have them both, so we do a double birthday bash each year. It was absolutely gorgeous outside -- we spent most of the day in the backyard -- and I felt the sun, and we had a cookout. The nieces and the brother-in-law and the dogs were there. Presents and presence were exchanged. We laughed a lot. It was beautiful.

No matter where we end up riding with Phaeton in his hijacked chariot -- whether we crash and burn or bring the dawn -- that's just where we're going; the stuff along the way is what's going to matter. It's fleeting and irreplaceable. I don't exist to make sand draw memes or write sonnets for me (although I am enjoying that tremendously, and it's probably necessary for me to do!).

The fourth thing is maybe more relatable too: those pretty pictures! Puzzle Box is still doing its thing. I don't have any reason to stop it; it insists on further practice, and it keeps spitting out better and better stuff, so I let it continue. It has to be around 2000 GPU-hours now?

Lucky epoch 13 dropped a couple of days ago, anyway, and is it ever an improvement. While there are a few special, less-obvious things I do to help this work so well, most of it is just good data and quality machine practice. If you can put together a good data set, tilted to whatever your own tastes are, and you can get at least an RTX 3090's worth of compute (there are some free-tier cloud compute options!), you can make one like this, too. Make sure your captioning game is on point, however you do it (manually, with machine captions, or a mix -- I'm using a mix): accurate captions tremendously improve the model's fidelity to its conditioning (i.e. you're a lot more likely to get exactly what you ask for). I've always prioritized accuracy right alongside good aesthetics; it's wonderfully satisfying to see something strange work with no effort beyond some careful choreography at the start and all those quintillions of FLOPs of training. There are plenty of freely distributable, high-quality datasets out there -- I've been collecting and trading datasets with other people and/or robots for a long time, and some of the public domain/CC-redistributable stuff will be going up in torrents.

How do I know when it's "done"? Who knows? I've still never seen a deep neural network that wasn't undertrained; more likely I'll just train Puzzle Box v2 until there's a better architecture for training a v3 on top. Compute cost is trivial: a single RTX 3090 draws up to 350 W, so about 8.4 kWh/day, and electricity is cheap (I, blessedly, live with abundant and super-cheap electricity). It's like a Sally Struthers commercial ("for less than 70 cents a day, you can train your own paintbot!"). At some point it'll just sort of oscillate around whatever optimum it's found, but we're still making headway, so I don't know when that will be.
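
The cost claim is easy to check. A back-of-the-envelope sketch, assuming the card runs at its full 350 W around the clock and ignoring the rest of the machine (CPU, fans, power-supply losses), which would add to the total:

```python
def energy_kwh(watts: float, hours: float) -> float:
    """Energy in kilowatt-hours for a constant draw of `watts` over `hours`."""
    return watts * hours / 1000.0

def max_price_per_kwh(daily_budget: float, kwh_per_day: float) -> float:
    """Highest electricity price per kWh that keeps the daily cost within budget."""
    return daily_budget / kwh_per_day

per_day = energy_kwh(350, 24)                # 8.4 kWh per day
ceiling = max_price_per_kwh(0.70, per_day)   # about $0.083 per kWh
whole_run = energy_kwh(350, 2000)            # about 700 kWh for ~2000 GPU-hours

print(f"{per_day:.1f} kWh/day; under 70 cents/day if power costs <= ${ceiling:.3f}/kWh")
print(f"~2000 GPU-hours at 350 W is about {whole_run:.0f} kWh")
```

So the paintbot line holds anywhere electricity costs less than roughly 8.3 cents per kWh, at least for the GPU alone.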

Wait for the loss to bottom out? That happened three epochs ago (at least, that's the current minimum), but epoch 13 is qualitatively significantly better than epoch 10, nothing appears to be overfitting, and sample diversity is tremendous, so. The thing to remember is that validation loss, while nice, is optional. The loss number matters while the output isn't good yet; right now I could compute FID against some standard dataset, but fidelity to COCO is not what I'm optimizing for. I am optimizing for light, joy, and happiness.

How do you optimize for that, you wonder? Well, you steer the boat. If the boat doesn't head where you want, you steer it where you want to go; you show the model more of what you like until you get it. I hear that there are a few people spending literally thousands of dollars a month on Midjourney generations. While I understand the magic of synthesizing imagery using the spirit of the Djinn, I can't even imagine wanting to pay that kind of money to access a model that I can't steer myself. I hope those poor people find salvation, and soon, because you can do so much better. Steer the boat!

Really, despite their completely different architectures, fine-tuning diffusion models and fine-tuning language models have a lot in common. At the end of the day they're both just big computational graphs, and you're trying to improve the quality of their output by having them learn from better, task-appropriate examples. In pre-training, scale rules all in both cases, but after that, data quality dominates, and having seen that unfold a few times, it's starting to seem to me that that's all there is. No matter what kind of deep learning model, there is this quiet latent ability under the surface that simple conditioning can't reach without appropriate fine-tuning. And once your model is well enough trained to leverage that ability, quantitative metrics begin to mean less. Why is Puzzle Box v2-e13 qualitatively better than Puzzle Box v2-e10 -- in accuracy, in aesthetic quality, in output diversity, in recontextualizability, in inpainting, in outpainting, etc.; it's just making better output, and it's not even really close -- while it reports a higher loss on the validation set? I dunno. Why do quantitative benchmarks for language models get harder to interpret, harder to compare, and less informative as you train further? I dunno that either, yet, but it's quite possibly the same general reason!

The current paradigm of broad but midwit models and narrowly superhuman tools that anybody can make is absolutely the sweet spot for this kind of thing IMO, so we should enjoy it for however long it lasts. Should the massively-multimodal models capable of driving robot killcops roll out in a year or three, we might not get so much grace from our overlords. (This is what many of the VC tools warning about "AI safety" are building or investing in now, BTW, so it's easy to see where the sincerity lies in this space. We can match the models shortly enough -- there are no secrets in linear algebra, and systems engineering is only a little trickier -- but the specialized robot hardware is more difficult. I continue whispering Voltaire's prayer -- "Lord, make my enemies ridiculous" -- and it's worked thus far.)

Anyway, enough nonsense, here are pretty pictures my model made that I think could go on the refrigerator. I should be clear: this is Stable Diffusion 1.x, trained only at 512px resolution -- only a quarter-megapixel, this is a small model! -- although many of these generations are bigger (by training on non-square aspect ratios you can expand the size of your canvas quite a bit and still get good generations). These were generated with text prompts only, no extra conditioning (you can often do even better using the aesthetic embeddings, control networks, negative prompting, etc.). I give you the prompt used with each picture. I prefer mostly short and uncomplicated prompts -- leave 'vitamin words' and 'magic spells' and bizarre strings of styles or genres or artist names for weird generations that need them; a good model should give you pleasing results from straightforward conditioning -- although many of these (I think) are quite amusing. I should say that, besides the aesthetic quality (oh, it's okay) and the accuracy (it's doing all right), the diversity of output in this one pleases me. I'm not sure how all of those things got balanced in a U-Net with roughly 860 million parameters, beyond just training through that long, long tail, but between the pretraining, all the tuners before me, and my lunatic large-scale reinforcement learning and aesthetic preference projects, it's seen the world and it knows how to press my buttons. What a beautiful thing.
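
The post doesn't spell out how the non-square training is set up. One common approach is aspect-ratio bucketing: group training images into a handful of width/height "buckets" that keep roughly the same pixel budget as 512×512 (262,144 pixels), and resize each image to the closest bucket. The sketch below assumes 64-pixel steps and sides between 256 and 1024 px; those numbers and the function names are illustrative assumptions, not Puzzle Box's actual settings.

```python
import math
from typing import List, Tuple

Bucket = Tuple[int, int]  # (width, height) in pixels

def make_buckets(base: int = 512, step: int = 64,
                 min_side: int = 256, max_side: int = 1024) -> List[Bucket]:
    """Enumerate buckets whose area stays at or under base*base pixels.

    For each short side from min_side up to base, take the largest long side
    (a multiple of `step`, capped at max_side) that fits the pixel budget,
    and add both the landscape and the portrait orientation.
    """
    max_area = base * base
    buckets = set()
    for short in range(min_side, base + 1, step):
        long_side = min(max_side, (max_area // short) // step * step)
        buckets.add((long_side, short))  # landscape
        buckets.add((short, long_side))  # portrait
    return sorted(buckets)

def nearest_bucket(width: int, height: int, buckets: List[Bucket]) -> Bucket:
    """Pick the bucket whose aspect ratio is closest to the image's (compared in log space)."""
    target = math.log(width / height)
    return min(buckets, key=lambda b: abs(math.log(b[0] / b[1]) - target))
```

With these defaults you get nine buckets, from 256×1024 through 512×512 to 1024×256. A 1920×1080 frame lands in the 640×384 bucket, which is close to its 16:9 shape while staying just under a quarter-megapixel.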

beautiful fireworks in the sky

the server room, 1985

figurative portrait of a geometric scifi android

a ruined portal in the forest

a 1980s glamor shot portrait, costume jewelry, sequin dress

a magical castle in the clouds, surrounded by trees and rainbows

a forest under the stars, azure and green

a beautiful starling

a photo of a hamster wearing an orange beanie and sunglasses holding a sign that reads HIP HOP HAMSTER

spaceships battling in deep space

underwater shot looking up at the sun

porcelain Japanese doll with sakura blossoms on her face

a cartoon pig

a pug padawan

an abstract geometric pattern (1/3)

an abstract geometric pattern (2/3)

an abstract geometric pattern (3/3)

painting of a curled tabby cat on a blue table pink flower wallpaper

a dilapidated house on the edge of a cliff overlooking a stormy sea

an ancient Greek ship flying through the sky

old VHS screenshot of a neon cityscape

an alligator in a leafy wetland

a Liberace impersonator performing in a public bathroom

an afternoon picnic in a beautiful park

schematics for a Babbage difference engine, in pencil

a temple in a serene Japanese garden

complex Bauhaus fabric construction, oil on canvas, soft pastel

a woman made of bones and gears, with crab legs for hair, nightmare fuel

a creepy hand reaching from the water

a woman wearing in a white dress in front of a glowing window

a mattress made entirely of green Jell-O

a stairway to heaven, chiaroscuro oil painting

a dump truck scuba diving in a coral reef

a person walking with their dogs in the rain

an implausible oil painting

coffee time, digital drawing

a cosmic peacock in an apple tree

engineers and scientists trying to fix GPT-4

a painting of a shaman

the excursion, in the style of Tristram James Ellis

a crystal dragon pounces from above

a portrait painting of Pee-Wee Herman by Gilbert Stuart

a soldier with a machete

the evil killer sees you in the dark

a woman hunts monsters with her pet lightning rat

vast infinite library, wide color spectrum (1/2)

vast infinite library, wide color spectrum (2/2)

woman walking in modern streets by Michelangelo

a beautiful milkshake with whipped cream and sprinkles

underwater Pride parade (1/2)

underwater Pride parade (2/2)

a steampunk palace in flames

an uncanny dark road at night, horror

an inflatable toilet in the middle of Manhattan

an ancient ship made of rusted copper

a cat sitting in a teacup

a cafe in Paris

a cassawary knight in armor

a witch casting a flame spell

sunken ruins of an ancient city, in eerie green light

a painting of a surreal dream

an axolotl

a moon over a tranquil landscape

a lamp shaped like a dog

the alchemal laboratory in my house

a beautiful oil painting

the sun shines through the canopy of leaves in the forest

the cockpit of an airplane

the NRA's chimpanzee mascot

Polychrome, the daughter of the rainbow

a church icon on the planet of the apes

a friendly purple alien appears in an apparition

you're about to be run over by a tank, bub

Mozart on a motorbike (1/2)

Mozart on a motorbike (2/2)

a steampunk android, 1890s artificial intelligence

diaphonous intricate vivid portrait of a woman in reverie

brutalist Pride month sculpture

a rococco samurai wearing a turban

1870s deep learning rig by Jules Verne and NVIDIA

lo-fi desert scenery through an open window

a grey rectangle

a Victorian house sinks to the bottom of the ocean with chains and anchors

a bird sleeping peacefully in a bed like a human (1/2)

a bird sleeping peacefully in a bed like a human (2/2)

the ancient temple of Apollo

Jeff Foxworthy in his horror film, "Are You Smarter than a Serial Killer?"

an illustration of an angry bear

music video, The Buggles, "A.I. Killed the Video Star", 2027

a demonic miasma enveloping a lost soul in a dead forest

the vaporwave mind flayer

a woman trapped in a cube of glass slowly filling with water

Arches National Park

a beautiful house, environmental dreamwave digital art

a painting of enchantment, magic, and wonder

a room in a five star hotel with an ocean view (1/2)

a room in a five star hotel with an ocean view (2/2)

a moss-covered turtle walking through an overgrown city

Lisa Frank's garden of peace and serenity

candid trail cam footage of George R. R. Martin

1970s retrofuture Mars mission rocket

an angry mobster

a clear photo of the Loch Ness monster

idk, just give me an anime girl, that's what these things make right?

a crumbling ancient Roman statue of the Incredible Hulk

over-the-top Saturday morning cartoon villain secret lair (1/2)

over-the-top Saturday morning cartoon villain secret lair (2/2)

a portrait painting of Hulk Hogan by Leonardo da Vinci

chronic depression

undead D&D artificer

a forest fire, watch out!

a courtroom sketch of a Toyota SUV on trial

a Mesopotamian city in the time of Gilgamesh

a girl with blue hair and badass sunglasses

interior still life early 20th century kitchen

a skeleton found with pirate treasure!

Donald Trump's final form

a man made of tacos

a seascape featuring a lighthouse

a family that lives solely in cyberspace

a retrofuture computer astrologer

an artificial intelligence

the litter we left after partying on the Moon

an angel over a tumultuous sea

a portrait of Count Dracula

a bright full moon over a snowy landscape

Barbie wearing a blindfold

hyperrealist charcoal ink splatter fighter

a sumptuous feast

Buddha as a bunny (1/2)

Buddha as a bunny (2/2)

a dog at the top of a mountain at sunset

a lizard samurai

the eerie moonlit jungle

photo taken inside a nuclear silo

a maximally vibrant sunset

Alphabet interrogating the chatbot before releasing it

two gravitationally interacting galaxies

Bill Gates working at his desktop

a steam locomotive in the rain

an anime powerlifter throwing a hadouken attack

a mecha princess (1/2)

a mecha princess (2/2)

a rooster that's secretly a fire-breathing dragon

a deep mysterious forest, with a slight forboding

1960s concept art of a human megapolis on Mars

will-o'-the-wisps, a cosmic autumn night

zombie cyborgs. you didn't think it was possible, did you?

the pirate king returns from the grave for revenge

the C++ programmer of median attractiveness

bodybuilders selling perfume in a mystical jungle

PALM-E

a rhinoceros beetle the size of King Kong

interior photograph of a clown church

driving Uncle Cletus's car into the swamp

the performance

Che Guevara by solarpunk Klimt, textures

alternate history WWI with British war blimps

an optical illusion... that isn't

a feeling of confined loneliness, introversion, and quiet

humans are being cloned from the inside out in strange blue sci-fi vats

a Renaissance portrait of that one asshole

an ancient mecha-angel in Atlantis

a Jedi duel depicted in art nouveau style

a solitary tree, standing tall and proud

cosmic chimp

a picture of primal fear

Yggdrasil

cinematic art deco illustration of lady who's sure she's going places

the recommendation algorithms begin to take their effect

some eldrich unholy insectoid nightmare, idk

a beautiful dog with blue hair and badass sunglasses (1/2)

a beautiful dog with blue hair and badass sunglasses (2/2)

an autumn dryad wearing a crown of leaves (1/2)

an autumn dryad wearing a crown of leaves (2/2)

a bubble riding a bicycle

dragons are a lot like kitty cats at this age (1/3)

dragons are a lot like kitty cats at this age (2/3)

dragons are a lot like kitty cats at this age (3/3)

a brave knight of ice and frost

it's a tree, except whoops, it's a mushroom

earth's 4 corner simultaneous 4 day Time Cube

a cake made of mostly good things

the really weird room in the White House that never gets used

Cortinarius sp.

a fashionable T-rex with a handbag

oil painting of a jazz band performing on the beach at sunset

the difference engine, album cover

a fantasy owl alchemist wearing armor

diorama of a post-apocalyptic city scene complete with smoke and lights

a cat in a trench coat (1/2)

a cat in a trench coat (2/2)

a hell hound, but he's actually a good boy

a forest trail with a sign that says "WELCOME TO THE WOODS"

the Furby knows her day will come

the secret garden in the sacred library

a cat that's a wizard wearing a Superman costume

a wildcat eating at Burger King wearing a crown

a fox kit, CGI animation

33mm underwater photography, a sunken PS1 controller

little gerbil friends

rockabilly soda shop, chrome and Kodachrome

a future destroyed by technology

a corgi wearing glasses sitting on a couch

a Christy girl, by H. C. Christy

a glitched horse (1/2)

a glitched horse (2/2)

UFOs intervene in the Battle of Waterloo

floral home altar, pastels

the witch's cottage

the Lisa Frank monolith in the valley of flowers

a cartoon iguana

a catgirl by Eugène Boudin

a rusty old farm truck

a lifelike reconstruction of Neanderthal man

a plate of mashed potatoes as served in Heaven

cartoon Sasquatch

Vincent van Gogh readies the sniper rifle

a radiant illustration of a beautiful scene

space war!!!

a smiling cartoon sloth

early morning

The Adventures of Sherlock Holmes, by Thomas Kinkade

the jilted lover's leap

Bill Shatner wants you to know he wasn't the Liberace impersonator

half dragon, half horse

badgers and mushrooms

an English garden, but it's in Vegas

a jack o'lantern as big as a house

a smiling sloth wearing a leather jacket, cowboy hat, bowtie and kilt, holding a staff and a book

tiny elves living on a tree branch

a man playing guitar under a coconut tree in the Philippines

a hippopotamus riding a bicycle

a solarpunk Roman gladiator in gleaming armor, digital art

gouache aquarium painting

a charming scene made of colorful clay

Queen Elizabeth II locked in a Yu-Gi-Oh duel with a white dragon

a steamy onsen

a man and his robot assistant by Moebius

a whimsical painting of a giant colossus moth

a portrait painting of Mr. T, by Norman Rockwell

Dakar Rally, 2040

the cybernetic werewolf is about to strike!

a photo of a guinea pig fishing in a stream wearing a little straw hat

a weeping willow dryad

local news TV coverage of a clown on a unicycle

the climactic fight sequence from a 1970s wuxia kung-fu action film

an underwater retrofuture sci-fi industrial extraction complex

isometric forest, pixel art

a kitty astronaut

cartoon illustration of the biohazard crew

a surfin' granny

a close up photo of a wee tiny little armchair

little baby Cthulhu

behold the world that you helped to destroy

Lake Mead in winter

your Crunchyroll husbando with crystal artifacts

a frog-faced demon

somebody caught moments before they sneeze

a mouse standing in the rain holding an umbrella (1/2)

a mouse standing in the rain holding an umbrella (2/2)

watercolor art of a cute magic lion

a house decorated for both Christmas and Halloween

a teddy bear walking down the street

a rabbit dressed in Victorian attire (1/2)

a rabbit dressed in Victorian attire (2/2)

portrait of a Mayan lady

a husband and wife watch the sunset atop a mountain

a lioness with many lion cubs

white foxes sleeping in the snow

cyberpunk Mark Zuckerberg welcomes you to the metaverse

a man serving a woman ice cream in a 1950s diner

walking among the redwoods is like being in another world

seashells on the beach

a hedgehog using a calculator (1/2)

a hedgehog using a calculator (2/2)

isometric retro computer vaporwave

technical schematics for a steampunk airship

Lee van Cleef as a sci-fi space rogue

a close-up of a wyvern

a sign that says "DEEP LEARNING"

the Eiffel Tower is obliterated by a meteor strike

a tree with leaves that look like purple balloons

a polar bear holding a lollipop on top of a building

a portrait of a dog dressed as a businessman

an illustration of an elaborate treehouse

a watercolor illustration of a chibi crystal ice queen (1/2)

a watercolor illustration of a chibi crystal ice queen (2/2)

a pirate mermaid

Merry Christmas!

a happy funny good dog

a fox under a tree wearing headphones

a hard-boiled noir detective that's a turtle in a trenchcoat

a Daliesque forgotten temple, ruined with time

a painting of a dog eating cake