Read Your Movies As Automatically Generated Comic Books

March 22, 2021

A research paper from Dalian University of Technology in China and City University of Hong Kong (direct PDF link) outlines a system that automatically generates comic books from videos. But how can an algorithm boil a video scene down to a still image that appropriately reflects the gravity of that scene? This impressive feat is accomplished by saving two still images per second, then segmenting the frames into scenes through region-of-interest analysis and importance ranking.
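As a rough illustration of the sampling step only, the sketch below picks which frame indices to keep so that about two stills are saved for every second of video. The function name `sample_frame_indices` and its parameters are assumptions made for this example and do not come from the paper; scene segmentation and importance ranking are not shown.

```python
def sample_frame_indices(total_frames: int, fps: float,
                         samples_per_second: float = 2.0) -> list[int]:
    """Return the frame indices to keep at a fixed sampling rate.

    Hypothetical helper: the paper's own sampling code is not public here,
    so this only mirrors the "two stills per second" idea.
    """
    if total_frames <= 0 or fps <= 0 or samples_per_second <= 0:
        return []

    duration_s = total_frames / fps                  # length of the clip in seconds
    n_samples = int(duration_s * samples_per_second)  # whole samples that fit

    indices = []
    for k in range(n_samples):
        t = k / samples_per_second                   # timestamp of the k-th still
        idx = min(int(round(t * fps)), total_frames - 1)
        indices.append(idx)
    return indices


if __name__ == "__main__":
    # A 2-second clip at 24 fps yields four stills, half a second apart.
    print(sample_frame_indices(total_frames=48, fps=24.0))
```

With a real video, the returned indices would be handed to whatever decoder is in use to pull those frames out before any further analysis.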

For its next trick, speech for each scene is processed by combining subtitle information with the audio track of the video. The audio is analyzed for emotion to determine the appropriate speech bubble type and the size of the subtitle text. Frames are even analyzed to establish which person is speaking so the bubbles can be placed properly. The system then lays out the keyframes, determining panel sizes for each page based on the region-of-interest analysis.
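To make the panel-sizing idea concrete, here is a minimal sketch that splits one row of a page among keyframes in proportion to an importance score. Both the proportional rule and the name `panel_widths` are assumptions for illustration; the paper's actual layout method is more involved and is not reproduced here.

```python
def panel_widths(importance: list[float], page_width: float) -> list[float]:
    """Split `page_width` among panels in proportion to their importance.

    Hypothetical rule: a more important keyframe gets a wider panel.
    Falls back to equal widths when every score is zero.
    """
    if not importance:
        return []
    if any(score < 0 for score in importance):
        raise ValueError("importance scores must be non-negative")

    total = sum(importance)
    if total == 0:
        # No panel stands out, so share the row evenly.
        return [page_width / len(importance)] * len(importance)

    return [page_width * score / total for score in importance]


if __name__ == "__main__":
    # The third keyframe is twice as important, so it gets half the row.
    print(panel_widths([1.0, 1.0, 2.0], page_width=800.0))
```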

The process is completed by stylizing the keyframes with flat color through quantization, for that classic cel shading look, and then populating the layouts with each frame and word balloon.
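The flat-color look can be approximated with plain uniform quantization, where each 8-bit color channel is snapped to the center of a small number of equal bins. This is a simplified sketch under that assumption, not necessarily the quantization the paper uses, and the names `quantize_channel`, `quantize_pixel`, and `quantize_image` are hypothetical.

```python
Pixel = tuple[int, int, int]


def quantize_channel(value: int, levels: int = 4) -> int:
    """Snap an 8-bit channel value to the center of one of `levels` equal bins."""
    if not 0 <= value <= 255:
        raise ValueError("channel value must be in 0..255")
    if levels < 2:
        raise ValueError("need at least two levels")

    bin_size = 256 / levels
    index = min(int(value / bin_size), levels - 1)  # which bin the value falls in
    return int(index * bin_size + bin_size / 2)     # center of that bin


def quantize_pixel(pixel: Pixel, levels: int = 4) -> Pixel:
    """Quantize the red, green, and blue channels independently."""
    r, g, b = pixel
    return (quantize_channel(r, levels),
            quantize_channel(g, levels),
            quantize_channel(b, levels))


def quantize_image(rows: list[list[Pixel]], levels: int = 4) -> list[list[Pixel]]:
    """Apply the quantization to every pixel of a row-major image."""
    return [[quantize_pixel(p, levels) for p in row] for row in rows]


if __name__ == "__main__":
    # Two similar dark tones collapse into the same flat color.
    image = [[(10, 20, 30), (40, 50, 60)]]
    print(quantize_image(image))
```

Fewer levels give flatter, more posterized regions; more levels keep more of the original shading.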

The team conducted a study with 40 users, pitting its results against previous techniques that require more human intervention, and the new system bested them in every measure. Like any great superhero, the team still sees room for improvement. In the future, they would like to improve the accuracy of keyframe selection and propose using a neural network to do so.

Thanks to [Qes] for the tip!

13 thoughts on “Read Your Movies As Automatically Generated Comic Books”

Tobasco da Gama says:

March 22, 2021 at 8:42 am

I could do without the posterize effect, but the layout and frame selection is incredible.

1. Ostracus says:
  
  March 22, 2021 at 8:56 am
  
  I could see this being used for meme generation.
  
2. UnderSampled says:
  
  March 22, 2021 at 8:59 am
  
  I was wondering if there might be a neural style transfer technique that would be better. I don’t think there are, but certainly there should be some much better filters that could be used, à la “A Scanner Darkly” (2006)
  
  1. Kevin O'Brien says:
    
    March 22, 2021 at 3:05 pm
    
    The visuals from A Scanner Darkly would be rather difficult to recreate, as it wasn’t a filter.
    Filmed frames were handed off to animators who rotoscoped every other frame and then a proprietary frame interpolation program (which, IIRC, was either developed for A Scanner Darkly or Waking Life) filled in the other 50% of the frames.
    
    (I got some interesting details on A Scanner Darkly when I met Golden Army Trio, who did the soundtrack, at a concert which was improperly booked. I was the only person who showed up and they were kind enough to chat with me for a bit before packing up.)
    
    I’m not sure how well ML methods could recreate it, but it’d be interesting to see.
    
3. scott.tx says:
  
  March 22, 2021 at 12:56 pm
  
  maybe the posterize is to increase compressibility as much as possible.
  
bluecollarcritic says:

March 22, 2021 at 9:12 am

This is impressive

Michał Margula says:

March 22, 2021 at 11:07 am

Imagine using it for personal movies from your phone.
This is really impressive and could give another outlook on well known movies.

Hirudinea says:

March 22, 2021 at 12:39 pm

Might liven up a Zoom call.

1. 𐂀 𐂅 says:
  
  March 22, 2021 at 3:13 pm
  
  Sounds like Microsoft Comic Chat on steroids.
  
  1. bat says:
    
    March 22, 2021 at 4:39 pm
    
    but speech bubbles shall have comic sans everywhere.
    
Alan says:

March 22, 2021 at 9:36 pm

There are plenty of action sequences with minimal speech, and a tendency to shake the camera (shaky cam) along with rapid editing of viewpoint. Several “Jason Bourne” action sequences are shot in this style.

Rapid cut scenes are similar to a stroboscope, and can be a problem for people with photosensitive epilepsy.

A paper reproduction of the scene might help people who find these scenes confusing, nauseating, or dangerous.

SlowEng says:

March 23, 2021 at 9:06 am

I think the description of “research paper” is a bit strong. One, the paper is not peer reviewed. Two, at a whopping 19 pages this “research paper” is very light on actual specifics and data. It seems to be a lot of hand-waving, resting on only a few short examples that seem to demonstrate it works but could just as easily have been faked or cherry-picked.

While I have some experience in AI/ML, my actual expertise is in electromagnetics. Even so, this “research paper” really does not seem anywhere near on par with traditional academic research papers.

I am very skeptical as to how valid this paper actually is.

Kyra says:

October 23, 2022 at 4:48 am

Hello – I would like to get in touch with the CEO or Director of this company. I own several films as well as a comic book company. I am interested in getting to know more and prices. Thank you.
