I Forced an AI to Watch Santa Claus Conquers the Martians #33
Comments
I tried it out on some panels of Watchmen because that's what I had on hand. The results are better than I expected, but probably not good enough: "A close up of a piece of paper" It seems panels with a lot of text are going to be challenging, and Watchmen is a very wordy comic. |
That's quite good. If you feed it all the panels from a whole comic and replace "person" with a fleshed-out character, it may be interesting. |
I shifted this a little bit, or at least I want to try a different source for my imagery. What follows is the result of passing frames of a video through Microsoft's image description API. Paragraph breaks happen when the API failed to produce a caption.
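For anyone wanting to reproduce the paragraph-break behaviour described above, here is a minimal sketch. It assumes the captions have already been collected into a list in frame order, with `None` (or an empty string) standing in for frames where the API returned nothing. The function names are hypothetical and this is not the author's actual code.

```python
def to_sentence(caption):
    """Capitalise the first letter and make sure the caption ends with punctuation."""
    text = caption.strip()
    text = text[0].upper() + text[1:]
    if text[-1] not in ".!?":
        text += "."
    return text


def captions_to_paragraphs(captions):
    """Group captions into paragraphs, starting a new one wherever a caption is missing.

    `captions` is a list of strings in frame order; None or "" marks a failed frame
    (an assumption about how failures are recorded).
    """
    paragraphs = []
    current = []
    for caption in captions:
        if caption is None or not caption.strip():
            # A failed caption closes the current paragraph, if there is one.
            if current:
                paragraphs.append(" ".join(current))
                current = []
            continue
        current.append(to_sentence(caption))
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs


if __name__ == "__main__":
    sample = ["a man standing", None, None, "a close up of a car", "a red truck."]
    print("\n\n".join(captions_to_paragraphs(sample)))
```

Several failures in a row produce a single break rather than a stack of empty paragraphs, which seemed like the more readable choice.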
|
I'm considering:
|
There is a flow of narrative, but the abstract nature cries out for specifics, however disjointed. Is there a way to feed each result from the image description API into something else to get output that fills in the gaps between captions? |
I like that output! The first paragraph looks very repetitive, but not every sentence is actually the same. I kind of like that. Maybe just deduplicate and enlarge sentences that are actually identical, and when it alternates between two similar sentences, leave that alone for effect? (Actually, I guess that would be the natural outcome of simple deduplication—it would be more difficult to not do that.) On the other hand, something that annoyed me as a grammar enthusiast (and doesn't seem to be your fault, because it's just the text the service gives you) is that the descriptions always say "A close up of…", when it should be "A closeup of…". (Chrome's grammar checker wants to change "closeup" to "close up" when I type it, too. I think Chrome is teaching people bad grammar. 😠) |
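The deduplication idea in this comment can be sketched roughly as follows. This is one possible reading, not anything from the project itself: it collapses runs of consecutive identical captions and keeps the run length, which could then drive how much a sentence is enlarged. The function name is hypothetical.

```python
from itertools import groupby


def collapse_runs(captions):
    """Collapse consecutive identical captions into (caption, run_length) pairs.

    Only adjacent duplicates are merged, so an alternating pattern such as
    A, B, A, B is left exactly as it is.
    """
    return [(text, sum(1 for _ in run)) for text, run in groupby(captions)]


if __name__ == "__main__":
    sample = ["A", "A", "A", "B", "A", "B"]
    for text, count in collapse_runs(sample):
        print(count, text)
```

Because `groupby` only merges neighbours, the alternating case mentioned in the comment falls out for free.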
I'm calling it done. I may tweak it some more, but for the record I definitely had a 50k+ PDF before midnight. I just had one last bug that took a few minutes to figure out. I wanted to do this process on a different movie -- the sample above is from one of the Transformers movies -- but I didn't get around to finding a good digital copy of the full-length film. Instead, I went with something quicker, easier, and out of copyright: Santa Claus Conquers the Martians. I extracted 14,635 frames, then fed them to Microsoft's Cognitive Services for automatically generated captions. Then I just added some syntax to help it flow a little better and created a more book-like layout. This is less ambitious than my other ideas, but I'm pretty happy with it. Unlike those other ideas, I actually got this one finished! |
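The post does not say how the frames were extracted. One common way to do it is ffmpeg's `fps` video filter, and the sketch below only builds that command line. The tool choice, the file names, the sampling rate, and the helper name are all assumptions made for illustration.

```python
def ffmpeg_frame_command(video_path, out_dir, frames_per_second):
    """Build an ffmpeg command that writes sampled frames as numbered PNG files.

    Nothing is executed here; pass the returned list to subprocess.run()
    on a machine that has ffmpeg installed.
    """
    return [
        "ffmpeg",
        "-i", video_path,
        "-vf", f"fps={frames_per_second}",  # keep this many frames per second of video
        f"{out_dir}/frame_%06d.png",        # frame_000001.png, frame_000002.png, ...
    ]


if __name__ == "__main__":
    # Hypothetical input file and output directory.
    print(" ".join(ffmpeg_frame_command("movie.mp4", "frames", 1)))
```

The numbered file names keep the frames in order, so the captions can later be stitched together in the same sequence.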
:)
|
First output was like Gertrude Stein improving Don DeLillo. Second is more straight Beckett. I assembled a 5-page poem a while back solely out of captions from a stock photo site and it reads a lot like #1. |
I have three ideas this year, and I'm making separate issues for each. This is one of them.
A little while ago I made a Twitter bot out of the text you get when you ask Microsoft Cognitive Services to describe what it sees in an image. There are some limitations to that API, but I bet it could work on a series of images to eventually produce 50K words of text. The trick will be figuring out a meaningful series of images.
It might work to feed it a graphic novel one panel at a time, but I don't think the AI works very well with drawn images. I feel like it's just going to say "a picture of a drawing" every time.
I have yet to try it, though, so maybe that will work or maybe I'll need to think of something else.