VFX Supervisor Erik Winquist and Animation Supervisor Paul Story talk apes, plates, solvers, water, and other constituents of their heroic work on 20th Century Studios’ all-new action-adventure, ‘Kingdom of the Planet of the Apes.’
From The Lord of the Rings to Avatar, Godzilla, and the Planet of the Apes franchise, New Zealand-based VFX studio Wētā FX has helped create some of the most spectacular and memorable big-screen experiences of the last three decades. In Kingdom of the Planet of the Apes, the fourth installment in the rebooted simian series, directed by Wes Ball, Wētā has again done yeoman’s duty to ensure that the visual effects and animation are equal to the aspirations of the storytelling.
Specifically, Wētā FX:
- was solely responsible for creating the visual effects in Kingdom, having also crafted the digital apes and environments for the previous films in the series.
- designed and built 11 new high-resolution digital characters, as well as a handful of secondary characters, all with a greater ability to speak.
- leveraged innovations from previous Apes movies and Avatar: The Way of Water, and applied them in new ways, including the use of dual-camera facial rigs to capture the detail and emotion of a performance with greater fidelity.
- utilized over 1,000 crew members to realize the VFX required for almost 1,500 shots.
Among those overseeing this gargantuan undertaking was Overall VFX Supervisor Erik Winquist, who led both the production and Wētā FX teams, and Animation Supervisor Paul Story. Here’s what they told us about their experiences.
But first, enjoy the film’s final trailer and the “Worldbuilding Featurette:”
Dan Sarto: Wētā has worked on all the films in the new franchise, beginning with Rise of the Planet of the Apes in 2011. Did that make it easier to work on this latest one?
Erik Winquist: As far as the work's concerned, it was all brand-new stuff in the sense that we had 11 new hero characters, plus a bunch of secondary characters to fill the thing out. The big thing with this installment is that it’s set 300 years after the last one, and now all the communication between the characters is spoken. There's hardly any sign language, where previously it was the other way around. Before, much of the acting we were translating was about pantomime, or what was going on in the actors’ faces.
Paul Story: Knowing that we had so much dialogue, we needed a fast, efficient way to get the performances through to our animators and make sure we had a nice consistent base for them to work from. So, we used a similar sort of rig to the one used in the previous Apes films, but we used a deep-learning solver this time around. Previously, we typically would go through an actor puppet to check that we were getting the same performance, and then translate that through to our characters. But with the advancements we've made over the last few years, we're able now, in a roundabout way, to map directly to our characters.
DS: You don't need the intermediate digital puppet?
PS: We do when we’re setting it up and, as far as the solver goes, it's still in there in the background. It sort of solves through that, but then it goes through a mapping process to get to our character. So that just takes out a whole level of tweaking and sorting out that the animator would have to do otherwise. We can go straight to our character, and we get a nice consistent base for each animator to start on, and a consistent performance for them to work from as well, which is really helpful. If you’re keyframing, each animator has their own sort of style. But when we go through a solver, we're making sure from the beginning that we have something consistent to work from.
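The pipeline Story describes, a solver that fits expression weights from a captured face and a fixed mapping that drives the character directly, can be illustrated with a toy sketch. To be clear, this is not Wētā’s actual solver: the function names, the linear blendshape model, and all shapes and numbers below are assumptions made purely to show the general idea of solving once and retargeting consistently.

```python
import numpy as np

# Illustrative sketch only, not Wētā's pipeline. The idea: a solver turns
# a captured face mesh into weights on a shared expression basis, then a
# fixed retarget mapping drives the ape rig's controls directly, so every
# animator starts from the same consistent base performance.

def solve_expression_weights(captured_mesh, neutral_mesh, basis):
    """Least-squares fit of per-frame expression weights.

    captured_mesh, neutral_mesh: (V, 3) vertex arrays
    basis: (K, V, 3) expression deltas (e.g., FACS-like shapes)
    """
    delta = (captured_mesh - neutral_mesh).ravel()    # (3V,)
    A = basis.reshape(basis.shape[0], -1).T           # (3V, K)
    weights, *_ = np.linalg.lstsq(A, delta, rcond=None)
    return weights

def map_to_character(weights, retarget_matrix):
    """Map actor expression weights onto the ape rig's controls."""
    return retarget_matrix @ weights

# Toy example: 4 vertices, 2 expression shapes, 3 ape rig controls.
rng = np.random.default_rng(0)
neutral = rng.standard_normal((4, 3))
basis = rng.standard_normal((2, 4, 3))
true_w = np.array([0.7, 0.2])
captured = neutral + np.tensordot(true_w, basis, axes=1)

w = solve_expression_weights(captured, neutral, basis)          # recovers true_w
ape_controls = map_to_character(w, np.ones((3, 2)) * 0.5)       # hypothetical mapping
```

Because the mapping is fixed rather than hand-keyed per shot, every animator inherits the same base, which is the consistency Story emphasizes.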
DS: Have there been comparable advances in terms of motion capture? I think I remember someone at Wētā saying, with regard to one of the previous Apes films, that there was now greater freedom in how mocap performances can be shot. For example, you could shoot in the forest, or wherever you needed to go, and it no longer had to be a super-controlled environment. Is that accurate?
EW: The capture side of it has obviously taken a step forward now as well. The thing that's interesting about these Apes films is that Rise came on the back of the first Avatar movie, and was able to take the performance-capture thing that Avatar had essentially invented, and really refined in the confines of a motion-capture stage, and take that outside. Fast-forward 10 years, and here we are again coming on the heels of another big Avatar film that had spent years further refining and honing the performance capture. And now we've taken it outside again.
In the previous Apes films, we were using just a single face camera, for example; now we are using a stereo pair of small cameras on that face rig. Because of that, the MoCap team is able to generate a per-frame 3D mesh of the actor's face, which feeds into the deep-learning solver. And it really just gives us a much more nuanced picture of what's going on in every facial expression. In the same way that the Avatar film used live 3D depth compositing via a pair of machine vision cameras straddling the matte box of the hero motion picture camera, we used a stereo pair of machine vision cameras, present in every setup with that motion picture camera, to get a greater field of view of what was in front of the lens.
So, we could always incorporate that if we needed to. For example, if we were out in the forest someplace and the mocap coverage was starting to get a little bit spotty, we could always fall back on using a 3D depth reconstruction of what was in front of the lens – which was seeing a lot more than the anamorphic lens that they were shooting with was able to see. We could incorporate that into the body solve for the performances, if necessary. So that gave us the confidence to go forward if things got a bit sketchy with our coverage.
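The depth-reconstruction fallback Winquist describes rests on standard rectified-stereo geometry: with two cameras a known baseline apart, a pixel's disparity between the views gives its depth. The sketch below is a textbook illustration of that relationship, not Wētā's system; the focal length, baseline, and disparity values are invented for the example.

```python
import numpy as np

# Hedged sketch of rectified-stereo depth, standing in for the
# machine-vision witness pair described above. All numbers are assumed.

def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """For a rectified stereo pair: depth = f * B / d."""
    disparity_px = np.asarray(disparity_px, dtype=float)
    return focal_px * baseline_m / disparity_px

# A wider-FOV witness pair sees past the hero anamorphic frame, so
# per-pixel disparities can recover a 3D point cloud to support the
# body solve when optical mocap coverage gets spotty.
focal_px = 1400.0    # assumed focal length in pixels
baseline_m = 0.06    # assumed 6 cm baseline between the cameras
disparities = np.array([70.0, 35.0, 14.0])   # measured in pixels

depths = depth_from_disparity(disparities, focal_px, baseline_m)
# nearer points have larger disparity: 1.2 m, 2.4 m, 6.0 m
```

The inverse relationship is why coverage degrades gracefully: nearby performers produce large, easily measured disparities, so the fallback is most reliable exactly where the actors are.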
So, if Wes and Gyula [Pados], our DP, came up with something on the spot and decided to maybe re-block a scene, we could go forward knowing that there was always a means to make sure that we had that performance in the can, so to speak, and not have to spend a huge amount of time tediously match-moving in post. Wes told me at the beginning of this thing, "Oh, yeah, we really like to run and gun. We're used to getting like 30-something setups a day." And I was like, "Whoa, put on the brakes, buddy. I want to make sure you get everything you need for the movie you're trying to make, but you need to be aware that there are requirements if we’re going to be able to pull this off in post."
I know that the plates were the bane of Wes's shooting life on this movie, but it was kind of a necessary evil for us. The environments we were capturing were all over New South Wales, Australia, and they were incredibly complex spaces with the camera moving past rows of bushes and trees, ferns, and so on. We were just out in the woods basically. And to have to paint out an actor to replace them with an ape character in a lot of those settings was very daunting. And so we wanted to make sure that we were getting really solid clean plates everywhere we went, with the intent that those plates were what people were going to see in the movie, not the performance take. Sometimes we managed to pull that off, but often the camera blocking, the cinematic language that these guys were using in the film, was very much more of a handheld aesthetic.
And many shots were long takes, as well. I can't remember how many visual effects shots in this movie are over a thousand frames, but there are a number of really long shots in this film, which, on one hand is a really great testament to the performances being able to sustain a long shot like that, but it also means that now the camera operator has to try and memorize all those camera beats and try and reproduce that in a clean plate context. Really daunting.
So, we probably had fewer clean plates in the final film than we did on the previous two films because [Dawn and War director] Matt Reeves’s camera language was more controlled, more considered – like slow pushes and slow zooms and more static stuff. Whereas here, even when our camera was on a crane, it was essentially handheld because of the way they were driving it.
PS: And once we’re into post capture, there is still a big process that we go through with the animators to make sure that we have got all those little details of the performance nuanced to our character as well. Making sure the eyes are matching – that's first and foremost in any sort of performance capture, because that's where most of the emotion is captured, in the eyes. And then of course the lip sync, going in and fine-tuning the lips to make sure that looks fine on an ape as well.
DS: It seems like, regardless of the innovations that drive the filmmaking process, it's still incredibly complicated to animate these characters so they’re believable.
PS: There's a lot that goes into it. The capture gives us that base and timing and a lot of the nuances, but then there is the fine-tuning. We need to make sure that the intent of the performance we get from the actor is portrayed properly in the ape as well. The straight capture is never going to give us that exactly. It needs our interpretation to make it work.
DS: Are there any sequences you could cite as being particularly difficult?
EW: I would say the big thing that probably stands out in the minds of an audience is all the water that’s involved. I mean, we've got a sequence that takes place above and in a river towards the middle of the film, and then, in the third act, there's a huge flood that takes place. In both cases, but especially with what happens during the flood, we very quickly get out of a plate-based context because there really was only so much we could shoot.
We couldn't practically dump that much water even if we wanted to. But the other concern was that, if we had our actors in a tank, the interaction with the water wouldn't have made any sense in the context of an ape. They have different silhouettes, for example. The other thing is that the environment, the space, once we finally get into this silo buried in a mountain, doesn't exist. There was no environment we could have meaningfully shot for that, so that becomes an entirely digital sequence.
PS: I mean, obviously, we had a base layout that was created by the layout team. And then it was our job to go in and figure out the story beats, and manipulate things around to make sure that they worked within camera and could tell the story that Wes was after. Certainly, Wes had a rough idea of what he wanted with it, but we just did a generic capture of Owen and the other apes running away from the water and escaping, and then having that little standoff with Sylva. And it was our job to go through, reconstruct that, and figure out how things connected between the different beats, as well, and animating through that.
DS: With regard to the early previs on something like this, how much are you working with a director to try and figure out what the story beats are going to be and what the environment may be? How much does knowing what you're going to need to produce the visual effects and the animation impact what you're doing on the previs side?
PS: The big thing was having that communication between me and Erik, more than with Wes, so as to make sure that we had an environment that was going to work for water and that we were working within the constraints that would allow us to finish the film.
EW: With regard to previs in particular, I came into this project knowing that Wes comes from a 3D animation background. He's very comfortable working through that. And so I figured that he was going to be the kind of filmmaker who would want to previs the whole thing out. That was the way he thought about filmmaking. I think these days Unreal Engine is his tool of choice. But I was really surprised that he was very much into working with a storyboard artist to figure out the basics of a scene.
He was not very interested in doing a lot of previs on the film. This is, I think, the third project he's done with Gyula, his DP. And the two of them have quite a shorthand now. They have a very similar aesthetic, and they're on the same page with that stuff. I talked about the water at the end, but we should also talk about the whole climbing sequence at the beginning of the movie, because that was the other scene that we had a big hand in, in terms of previs. In prep on the movie, the production engaged Halon to come on board and take a look at some key things that we knew we needed to figure out, just so that we would be able to capture enough material with our cast while we had them in Sydney.
And so they had blocked out some of the bones of this thing. But I think by the time we really got through it, and had done that capture and really got into the sequence, it was apparent that we needed to make some adjustments. And so that's where animators came in and really started sculpting what became of that sequence. That then informed what we ultimately could do down the line with pickups for bits and pieces that we needed.
PS: It was quite a big, long sequence that Halon worked out, so we had to really condense and tighten it, and create a little more energy. There were certain parts that Wes wanted to hold onto, but it was about us connecting those dots and making sure that we had, with the new capture, things that would work around that.
EW: Especially with a scene like that, you have to do previs in a way that you’ll ultimately be able to execute at the end of the day. Oftentimes you find that previs is all about cheating. “We'll just crash this thing into that thing with this thing right here.” Sometimes it feels like they only care about making one cool shot, not necessarily thinking about how the whole thing has to string together. And when it comes down to doing that kind of thing in an incredibly high-resolution setup, we’re trying to see how we can rearrange some of the beats so that we can work within a building, instead of every shot being its own custom mashup of parts.
So that was the other side of some of the things we had to slightly reconfigure and rework – what is the building, first of all? Let's come up with what the building needs to be that can support the story beats that we know we need to happen, and then let's come in and see if we can find some slightly different angles or slightly different blocking for the characters to still tell that same story, but in the context of a legitimate environment.
DS: There are so many sequels to so many franchises, some of which land better than others. What do you think is behind the continuing popularity of the Apes films?
EW: To me, it's just really solid storytelling that happens to be portrayed by amazing actors that are realized incredibly well as photoreal characters. You really get sucked into the world. When I was first approached about doing the project, my first reaction was, "I don't know, I'm not sure I'm that into doing a part four. I mean, Caesar just died. Where are we going with this?" But Wes's whole pitch was, "No, no, no, the script goes way far into the future, and we'll have a whole new set of characters, a whole new storyline." And it was the imagery that it evokes, showing us a world that's beyond where we are now in terms of the timeline. It's this really fantastic world-building that the films have accomplished.
Dan Sarto is Publisher and Editor-in-Chief of Animation World Network.