2020-07-01

Dev Blog 22 - Run, Run, Run, As Fast As You Can

Hello! I am writing this to you from Mars, because everything on Earth was making me mad, so I left. Technically, I uploaded this post 7 minutes ago - it just took a while to get to you.

Thankfully, even though the latency is several minutes from Earth to Mars, I’m still able to play against others in one of my favourite games of all time, Super Smash Bros. Melee, thanks to Project Slippi somehow adding online play with rollback.

I wonder if some kind of rollback netcode would make online multiplayer with Rolled Out more consistent? Probably not. But maybe. No it won’t. Yes it will ❤

Let’s get into the update.

Seesaws are done!!!

ComplexPlane worked his magic again, and we’ve finally got a brand new mechanic in game, up and running.

As usual, our art block:

bg1

bg2

Updated character animations!

snowcone run

morris medium walk comparison

Look at Morris’s brand new swagger on the left.

Following that, we have a few blurbs from ComplexPlane and CraftedCart.

Seesaws: Slightly Less Obvious Than You Might Imagine, But Still Pretty Straightforward

Hey everyone! I’m back to working on Rolled Out again this summer after taking a half-year or so off for school-related stuff. I previously worked on rebuilding our collision physics system, and I’m now working on implementing new mechanics for the game! For this post, I’ll try to keep it short and sweet and talk a bit about seesaws, a game mechanic which I recently implemented.

Seesaws: you know them and love them. You know what they’re like from IRL experience as a kid, and have probably seen them in other marble-rolling or platforming games as well. In Rolled Out, we create a seesaw by declaring that some otherwise-normal triangle mesh is a seesaw, and giving it an axis to rotate about. Seesaws are special from a physics perspective because the ball affects their motion and their motion affects the ball. When the ball touches a trimesh we’ve decided is a seesaw, we want the ball’s distance from the seesaw axis, as well as the ball’s velocity against the seesaw, to affect how fast the seesaw is rotating (its angular velocity). A rotating seesaw should also affect the ball: when it collides with the ball, the ball should react as if any other animated trimesh collided with it. Seesaws also affect their own animation: a seesaw’s springiness controls how much it tries to reset to a neutral angle, and its friction limits how fast it can rotate about its axis in general.

Simulating Seesaw Motion

When computing a seesaw’s motion, we don’t care about certain properties that a “real” physical seesaw would have, such as its moment of inertia. Ultimately, we just need to decide how much to add to or subtract from the seesaw’s angular velocity based on whichever criteria we decide look realistic enough. As alluded to earlier, in this simple model there are three things which will affect a seesaw’s angular velocity: springiness, friction, and ball collision. The first two are very simple to compute. For springiness, multiply how much the seesaw is rotated compared to neutral by a per-seesaw springiness value, and add this to the seesaw’s angular velocity. Friction follows in a similar vein: multiply the seesaw’s angular velocity by a per-seesaw friction constant, and add this to the seesaw’s angular velocity. Adjusting the seesaw’s angular velocity based on collision with the ball takes a bit more math.
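As a rough illustration of the springiness and friction steps, here is a minimal C++ sketch. It is not the game's actual code: the `Seesaw` struct, the `applySpringAndFriction` name, the per-tick units, and the explicit minus signs (so that positive constants pull the seesaw back toward neutral and slow it down) are all assumptions made for this example.

```cpp
#include <cassert>

// Hypothetical seesaw state; field names are assumptions for this sketch.
struct Seesaw {
    double angle;            // rotation away from neutral, in radians
    double angularVelocity;  // radians per physics tick
    double springiness;      // per-seesaw constant (assumed >= 0)
    double friction;         // per-seesaw constant (assumed in [0, 1])
};

// Applies the two "simple" contributions to the angular velocity.
void applySpringAndFriction(Seesaw& s) {
    assert(s.springiness >= 0.0);
    assert(s.friction >= 0.0 && s.friction <= 1.0);

    // Springiness: proportional to how far the seesaw is rotated from neutral.
    // The minus sign (an assumption) makes the term pull back toward neutral.
    s.angularVelocity += -s.springiness * s.angle;

    // Friction: proportional to the current angular velocity.
    // The minus sign (an assumption) makes the term oppose the rotation.
    s.angularVelocity += -s.friction * s.angularVelocity;
}
```

With positive constants, a seesaw tilted to a positive angle picks up a negative angular velocity, and friction then removes a fraction of it. A game could equally store the constants with the sign baked in and add both terms directly, as the text above describes.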

If we were simulating a real seesaw, we might think of the ball’s effect on the seesaw in terms of torque, which is related to how much force the ball applies to the seesaw. However, in this physics engine we’re not really concerned with modeling forces per se, but rather just changes in velocity, and that goes for modeling the ball as well. So instead, we use the ball’s velocity relative to the seesaw as a gauge for how much “force” it is applying to the seesaw. You might be wondering though: if the ball is sitting motionless on a seesaw, how could a velocity of zero affect the seesaw? Great question. In the real world, a ball sitting on a seesaw would push the seesaw down due to the force of gravity; in the game, the ball actually has a small velocity pointing downward before it touches the seesaw. The ball has this initial downward velocity as a direct result of simulating the force of gravity in the game.

So, the amount we add to the seesaw’s angular velocity is related to the ball’s velocity when it touches the seesaw, cool. What else do we need to do? First, we need to take the dot product of the ball’s velocity relative to the seesaw with the collision normal, and then scale the collision normal by that result. This gives us the part of the ball’s velocity that points along the normal. The gist of why we need to do this is that when the ball collides with a triangle, only the velocity going towards it should affect the seesaw, whereas the “sliding” velocity should not. The normal varies greatly depending on whether the ball collided with a triangle face, edge, or vertex. Next, we take the cross product of this vector with the vector pointing from the seesaw’s axis to the point at which the ball collided with the seesaw. In layman’s terms, this does three things for us: it makes the seesaw rotate faster if the ball hit further away from the seesaw axis, it makes the seesaw rotate faster if the ball hit the seesaw more head-on, and it tells us whether to affect the seesaw’s rotation in the clockwise or counterclockwise direction. Finally, taking the dot product of this vector with a vector representing the seesaw’s rotation axis gives us a single number to add to the seesaw’s angular velocity, which may be positive or negative depending on rotation direction. In practice, we scale this final number by a sensitivity constant, which lets us make bigger seesaws feel like they have more mass and thus accelerate slower.
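The chain of dot and cross products above can be sketched in a few lines of C++. This is only an illustration under stated assumptions, not the engine's implementation: the `Vec3` type, the helper functions, and the name `seesawAngularVelocityDelta` are hypothetical, the collision normal and rotation axis are assumed to be unit length, and the cross product is taken as lever arm × normal velocity (the usual torque order), which fixes one particular sign convention.

```cpp
#include <cassert>
#include <cmath>

// Minimal 3D vector for this sketch (hypothetical, not an engine type).
struct Vec3 { double x, y, z; };

double dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

Vec3 cross(const Vec3& a, const Vec3& b) {
    return {a.y * b.z - a.z * b.y,
            a.z * b.x - a.x * b.z,
            a.x * b.y - a.y * b.x};
}

Vec3 scale(double s, const Vec3& v) {
    return {s * v.x, s * v.y, s * v.z};
}

// Returns the amount to add to the seesaw's angular velocity.
// relVel:      ball velocity relative to the seesaw
// normal:      collision normal (assumed unit length)
// leverArm:    vector from the seesaw axis to the contact point
// axis:        seesaw rotation axis (assumed unit length)
// sensitivity: per-seesaw scale factor
double seesawAngularVelocityDelta(const Vec3& relVel, const Vec3& normal,
                                  const Vec3& leverArm, const Vec3& axis,
                                  double sensitivity) {
    assert(std::abs(dot(normal, normal) - 1.0) < 1e-9);
    assert(std::abs(dot(axis, axis) - 1.0) < 1e-9);

    // Keep only the velocity along the normal; sliding velocity is dropped.
    Vec3 normalVel = scale(dot(relVel, normal), normal);

    // Torque-like vector: grows with distance from the axis and with how
    // head-on the hit was. Operand order is an assumption of this sketch.
    Vec3 turning = cross(leverArm, normalVel);

    // Collapse to one signed number about the rotation axis.
    return sensitivity * dot(turning, axis);
}
```

For a seesaw rotating about the z axis with an upward-facing normal, a ball moving downward at the +x end gives a negative result and the same hit at the -x end gives a positive one, while any purely sideways (sliding) velocity contributes nothing.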

Maybe you don’t want to see an equation, but how about a picture?

Seesaw Affected by Ball Collision

Ball Collision with Seesaw

When the ball hits a seesaw, it should get a collision knockback as well, and this varies depending on where it hit the seesaw and how fast the seesaw is rotating. For the most part we can treat the seesaw like any other animated stage mesh in this regard, except for a small caveat: we want the seesaw’s speed to be affected by a ball collision before applying a collision response to the ball. If the ball hits a super-sensitive seesaw at high speed, we don’t want the ball to bounce off the seesaw like it’s a brick wall; we want the ball to first affect the seesaw’s velocity, and then bounce back based on the seesaw’s new velocity.

What about updating the seesaw’s angle of rotation? This is done based on the seesaw’s current angular velocity before any ball collisions occur. If we updated the seesaw’s angle after ball collisions occurred, the seesaw might sometimes look like it’s clipping into the ball.
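To make the ordering concrete, here is a toy one-dimensional C++ sketch of a single tick. Everything in it is an assumption for illustration only: velocities are measured along the collision normal, the seesaw's surface speed at the contact point is taken as angular velocity times lever arm, the bounce uses a simple restitution factor, and the names `Seesaw`, `Contact`, and `stepSeesawTick` are hypothetical. Only the order of the three steps comes from the description above.

```cpp
#include <cassert>

// Toy seesaw state for this sketch (hypothetical).
struct Seesaw {
    double angle;            // radians from neutral
    double angularVelocity;  // radians per tick
};

// Toy contact description (hypothetical).
struct Contact {
    double leverArm;     // distance from the axis to the contact point
    double sensitivity;  // how strongly the ball moves the seesaw
    double restitution;  // bounciness, assumed in [0, 1]
};

// Runs one tick and returns the ball's new velocity along the normal.
double stepSeesawTick(Seesaw& s, double ballNormalVelocity, bool touching,
                      const Contact& c) {
    assert(c.restitution >= 0.0 && c.restitution <= 1.0);

    // 1. Advance the angle using the angular velocity from BEFORE any
    //    collision this tick, so the mesh does not rotate into the ball.
    s.angle += s.angularVelocity;
    if (!touching) {
        return ballNormalVelocity;
    }

    // 2. Let the ball change the seesaw's angular velocity first.
    double surfaceVelocity = s.angularVelocity * c.leverArm;
    double relative = ballNormalVelocity - surfaceVelocity;
    s.angularVelocity += c.sensitivity * c.leverArm * relative;

    // 3. Bounce the ball off the seesaw's NEW surface velocity.
    double newSurface = s.angularVelocity * c.leverArm;
    double newRelative = ballNormalVelocity - newSurface;
    return newSurface - c.restitution * newRelative;
}
```

With a sensitivity of zero the seesaw behaves like a brick wall and the ball simply bounces back. With a higher sensitivity the seesaw gives way first, so the ball leaves with much less rebound, and the seesaw's angle only starts changing on the following tick.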

Farewell

This went a little longer than I wanted, but still not too bad hopefully. That’s all for now - take it away, CraftedCart!

Daft optimizations

So I’ve got a funny story this time round. :) If you don’t care about all the gory details, there’s a tl;dr near the bottom of this section.

Over the past few days, we’ve been doing a round of optimizations - seeing where the game is being bottlenecked and trying to get it to run better on lower end machines. So… my first thought was to shove the game through Callgrind - we must be trying to do too much work on the CPU each frame so I’ll use Callgrind to track down which places are consuming the most CPU time and see if I can make ‘em run a bit faster.

Doing less work on the game thread…

Callgrind is slow… really slow (just running it tanks the game’s framerate from about 60 FPS on my test level down to 0.6 FPS), but that’s no issue to me - I just need Callgrind to tell me, relatively speaking, what takes the most CPU time in a frame.

Call graph

If you’ve never seen one of these graphs before, what this shows is the paths that code execution takes (starting from the top and drilling down). Each rectangle represents a function, and a bigger percentage in a rectangle means that function took a larger share of the time to complete its work.

We figured most work is done inside FPlayerBall::Tick, given the game tended to slow down a lot when spawning in a ball, so I jumped there to see what exactly consumes the most time inside that function, aaand unsurprisingly most of it is within physics code. While I’m ok with shuffling physics code around a bit, actually trying to optimize it I was not comfortable with. Regardless, I poked around a bit, FORCEINLINE-ing some small but very frequently used functions, trying to avoid many smaller memory allocations, caching physics meshes and animated objects so that they wouldn’t have to be queried every frame, and what not. Stage tilting was also offloaded onto the GPU to save the render thread from having to re-figure out the transforms of everything on the stage each time you tilt the stage.

So… did any of that make a difference? Well… yes, but actually no.

At this point, we’re gonna have to take a small detour into how Unreal Engine does its updating every frame. In my mind, I had been assuming that the engine would call into game code, giving us a chance to update animations, do physics work, etc. before the engine started to render a frame. What actually happens is that both of these things - updating the game and rendering a frame - happen simultaneously.

Anyways, armed with this new knowledge, I started paying attention to not just the FPS/frametimes, but the amount of time it takes to update the game vs the amount of time it takes for the rendering thread to send stuff to the GPU vs the amount of time it takes the GPU to render a frame. Here’s a before and after:

Before
After

So these small optimizations made game updating faster by about a millisecond and a half, but it’s mostly meaningless (for me, and for our weak test system) given the GPU is the bottleneck here. If you happen to have a particularly weak CPU though, then maybe this might help just a little.
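To see why shaving time off the game thread barely moves the framerate, here is a small C++ sketch of a rough model: if game update, draw submission, and GPU rendering overlap, the frame time is bounded by whichever is slowest. The `FrameTimings` struct, both function names, and the example numbers are made up for illustration; they are not the measurements from the screenshots above, and real frame pacing is more complicated than a simple maximum.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Per-frame timings in milliseconds (hypothetical struct for this sketch).
struct FrameTimings {
    double gameMs;  // game thread: physics, animation, game logic
    double drawMs;  // render thread: preparing and submitting work
    double gpuMs;   // GPU: actually rendering the frame
};

// Rough model: overlapping stages are limited by the slowest one.
double estimatedFrameMs(const FrameTimings& t) {
    assert(t.gameMs >= 0.0 && t.drawMs >= 0.0 && t.gpuMs >= 0.0);
    return std::max({t.gameMs, t.drawMs, t.gpuMs});
}

// Names the slowest stage; ties resolve in the order Game, Draw, GPU.
std::string bottleneck(const FrameTimings& t) {
    if (t.gameMs >= t.drawMs && t.gameMs >= t.gpuMs) return "Game";
    if (t.drawMs >= t.gpuMs) return "Draw";
    return "GPU";
}
```

With invented timings of 8.0 ms game, 6.0 ms draw, and 16.5 ms GPU, cutting the game time to 6.5 ms leaves the estimated frame time at 16.5 ms, because the GPU is still the slowest stage.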

Doing less work on the GPU…

Naturally, knowing that the GPU was the bottleneck, we started focusing our attention on that. Ok.. let’s take a looksie at shader complexity then.

Shader complexity visualization

Hmm.. looking pretty good here on the whole. The ball could perhaps do with a bit of work but really this is looking pretty good. At this point, Brandon went ahead and did a bit of work on the ball shader, and while he was at it he also took the chance to reduce the poly count of the ball and Morris. The ball clocked in at 8k triangles, with Morris being a further 18k! Let’s see what Brandon did, shall we?

After simplifying shaders/reducing poly counts

A…ah.

Ooooook then, how about we look at our draw calls…

Draw calls

Excuse me??? How the heck are we averaging around 1500 mesh draw calls? That’s insane! For those unaware of what a draw call is, each draw call is a bit of work being handled by a driver before being sent off to the GPU - the fewer of these you have, the less information the CPU will have to spew off to the GPU.

At this point, I figured I’d try changing the plugin we use to generate meshes on the stage. Unreal ships with a ProceduralMeshComponent which allows us to do this, though I’ve been aware of the RuntimeMeshComponent plugin which claims to do near-enough the same except perform better. I figured maybe, just maybe it could reduce draw calls, or at the very least help in some other places.

So.. I slotted the plugin into our game (after patching it a bit because of course nothing wants to work right out of the box), replaced all uses of the ProceduralMeshComponent with the RuntimeMeshComponent, aaaand…

After switching to RuntimeMeshComponent

Ok, a slight improvement on the Draw thread, but overall no dice. Draw calls didn’t change either.

At this point I wasn’t really quite sure what to do any more. I was just flipping through the Unreal Engine documentation, seeing if there were any other tools built-in to the engine that could give me some insight. Turns out, Unreal has a GPU profiler - sounds useful.

The daft thing

So.. I whipped out the GPU profiler in the middle of gameplay to see if it’d tell me anything new.

The GPU profiler - the larger a block is, the more time it spent on the GPU

…aaaaaand it was at this point I started laughing to myself! So so so sooo much time on the GPU had been spent in that SceneCapture_BP_PlayerBallPawn_C_0 object, it was a wonder that I was still managing to hit 60FPS in normal gameplay, I thought to myself.

A bit of context: a long while ago, Brandon was experimenting with making the ball appear reflective by taking a 360deg capture of the world every frame at the ball, and projecting the captured texture onto the ball. For obvious reasons, rendering the world twice (once for the reflection, and again for display on your monitor) absolutely tanked the performance, so that feature was left disabled. As part of code cleanups, I had been unwiring various scripts that Brandon had made and replacing them with C++ implementations. The way the world capturing worked is that the ball would spawn in with the scene capture component, and Brandon would remove the component on spawn to disable it.

As I had been unwiring everything, the scene capture component then stopped being deleted on ball spawn and as such started capturing the world around it on every frame again, except that it never showed up on the ball since I also unwired the logic Brandon had to show ball reflections!

So… after not rendering the entire scene twice per frame, what are our final results?

The final statistics

Hoooh… theeeere we go!

tl;dr

The tl;dr of this is that in the past, we had been experimenting with ball reflections which required rendering the entire scene twice per frame! We then disabled this by default as, for obvious reasons, it tanked performance. As part of some code cleanup, I had accidentally re-enabled rendering the scene twice, but without showing ball reflections, so I never noticed it. After a bit of investigating, we realized that aaand put an end to it.

We’re making outstanding progress lately. Hopefully, we can keep the trend going!

Thanks for reading, and see you on the 15th.