As a programmer, you should keep in mind the performance implications of your code.

This is not just about optimizing code, but also about consciously choosing code architectures that favour performance.

Important: it doesn’t matter if things work “fine” on your computer. Your code could end up running on wildly different computers, where the performance patterns may not match those of your own. Even if you develop on a slow computer, its slowness is not necessarily a good reference: a different combination of mods, hardware, drivers, OS updates, etc. can all lead to worse performance than what your low-spec computer exhibits.

Basic optimization tips

This document assumes you already have basic knowledge of optimization. Some examples:

Knowing how optimization is done in general: measure, then tweak, then measure again to evaluate the validity of your tweaks.
Never assuming you know how your code will behave. Always measure. Measure as many times as needed to ensure validity of the numbers. Dedicate explicit effort to determine if your measurements are reliable or if, on the contrary, they are incorrectly biased by external factors (hot vs cold caches, external programs, thermal throttling, etc).
Knowing basic optimization techniques: what big O notation means and how it’s used. What cache systems are, and their possible tradeoffs (such as memory vs CPU use). Taking advantage of data locality to use the available RAM bandwidth more efficiently. Knowing when the bottleneck is I/O (such as storage or networks) and how to best work around those limitations. Etc.
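The measure, tweak, measure again cycle described above can be sketched with plain Lua. This is a minimal illustration, not a BeamNG tool: benchmark() and workload() are hypothetical names, and os.clock() (standard Lua, CPU time in seconds) stands in for whatever timer you prefer. Repeating the measurement and looking at both the best and the median sample helps to spot runs that were biased by external factors.

```lua
-- Hypothetical helper: runs `fn` in several batches and returns the best and median batch time.
local function benchmark(fn, repeats, iterations)
  local samples = {}
  for r = 1, repeats do
    local start = os.clock()
    for _ = 1, iterations do
      fn()
    end
    samples[r] = os.clock() - start
  end
  table.sort(samples)
  local best = samples[1]
  local median = samples[math.ceil(repeats / 2)]
  return best, median
end

-- Hypothetical workload, used only so there is something to measure.
local function workload()
  local sum = 0
  for i = 1, 1000 do
    sum = sum + i * i
  end
  return sum
end

benchmark(workload, 3, 100) -- warm-up: lets caches and the JIT settle before the real measurement
local best, median = benchmark(workload, 9, 1000)
print(string.format("best: %.6f s, median: %.6f s", best, median))
```

If the best and the median differ a lot, the numbers are probably being disturbed by something external, and the measurement should be repeated before drawing conclusions.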

If you don’t already have basic optimization knowledge, it’s advisable that you first dedicate some time to learning about it. The rest of this document is a collection of optimization techniques that are specific to BeamNG software, or which are harder to find information about elsewhere.

IMPORTANT: Do not blindly assume you know which part of the code is the main bottleneck. Always measure to identify where your optimization efforts will be the most effective.

For example, there’s no point shaving 0.25 milliseconds off a function if you are not planning to optimize the next function, which is bleeding 3 milliseconds per frame with an avoidable O(N^2) loop.

LuaJIT optimization: BeamNG tips

As part of the BeamNG Lua ecosystem, we use a few tools to do measurements. As a mod programmer, these will be helpful for you too:

timeprobe() function: measures the time between 2 consecutive runs.
gcprobe() function: measures the increase in GC workload between 2 consecutive runs (see Avoid garbage collection section below).
lua/common/luaProfiler.lua class: allows you to split your code into multiple sections, including sections inside repeated executions (such as loops), and measure both GC load and time on each.
lua/common/luaProfiler.lua class: also allows you to detect performance spikes (stutter) and shows the GC load and time measurements that led to them.
getAllVehicles(), vehiclesIterator(), activeVehiclesIterator() functions on the GELUA side: retrieve vehicle objects with zero GC overhead. Prefer these over be:getPlayerVehicle() and similar calls, which will reduce the framerate.

LuaJIT optimization: Generic tips

In addition to those BeamNG-specific tips, there are also generic Lua and LuaJIT optimizations you should try to follow.

LuaJIT: Loops

As a general rule, numeric for loops (for i = 1, n) will be faster than ipairs() loops, which in turn are faster than pairs() loops.

Always pick the faster loop type if you have no good reason to pick a slower variant. If you can trivially (re)design your code to work with arrays, instead of arbitrary key-value tables, then that will allow the use of ipairs(), which will be faster than pairs(), all else being equal.

As usual, if you are unsure or don’t have a lot of practice doing optimizations, you probably want to verify by measuring the improvements instead of assuming your changes are okay.
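As a minimal sketch of how such a verification could look, the following plain Lua snippet sums the same array with the three loop types and times each one with os.clock(). The helper names (sumNumericFor, sumIpairs, sumPairs, timeIt) are made up for this example, and the actual timings will depend on your machine and on whether the code ends up JIT-compiled.

```lua
local values = {}
for i = 1, 100000 do
  values[i] = i
end

local function sumNumericFor(t)
  local sum = 0
  for i = 1, #t do
    sum = sum + t[i]
  end
  return sum
end

local function sumIpairs(t)
  local sum = 0
  for _, v in ipairs(t) do
    sum = sum + v
  end
  return sum
end

local function sumPairs(t)
  local sum = 0
  for _, v in pairs(t) do
    sum = sum + v
  end
  return sum
end

-- Runs `fn` a few times and returns the elapsed CPU time plus the last result.
local function timeIt(fn, t)
  local start = os.clock()
  local result
  for _ = 1, 20 do
    result = fn(t)
  end
  return os.clock() - start, result
end

print(string.format("numeric for: %.4f s", (timeIt(sumNumericFor, values))))
print(string.format("ipairs():    %.4f s", (timeIt(sumIpairs, values))))
print(string.format("pairs():     %.4f s", (timeIt(sumPairs, values))))
```

All three functions return the same sum, so the only thing that changes between them is the cost of the iteration itself.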

LuaJIT: Local symbols

The location where a function or variable is defined has a performance impact.

Accessing a variable that’s nested deep inside some table structure will be slow, for example:

-- very slow access, AVOID THIS:
foo(myTable[4]["foobar"][myIndex][50])
bar(myTable[4]["foobar"][myIndex][50])
baz(myTable[4]["foobar"][myIndex][50])

-- faster access with a cache, DO THIS:
local myVar = myTable[4]["foobar"][myIndex][50]
foo(myVar)
bar(myVar)
baz(myVar)

In the same way, the scope of a variable has a performance effect. This makes sense once you know how Lua stores each kind of symbol: a local variable lives in a VM register (or in an upvalue, when it is captured from an enclosing function), while a global symbol is a field of the global Lua table of variables.

Local symbols are resolved when the code is compiled, so using one costs no table lookup at all. Any name that is not a visible local is treated as a global, and each use of it requires a lookup in the global table; reaching a function stored inside a global table (such as math.max) requires one further lookup. Each of those table accesses costs performance. So from the point of view of pure performance, local variables are preferable to global variables.

For example: this is why the BeamNG LUA Extensions system will make your extension available as myMod_myExtension, rather than as myMod.myExtension (saving one or more table accesses). It’s also why you’ll find local max = math.max in several official BeamNG files, as it saves one table access. Etc.
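The local max = math.max idiom mentioned above can be sketched as follows. The two functions are hypothetical and only differ in how they reach max: the first one goes through the global math table on every iteration, the second one uses a local cached once when the file loads. How much this gains in practice depends on whether the code runs interpreted or JIT-compiled, so it is worth measuring.

```lua
local max = math.max -- cached once, when the file is loaded

-- Looks up "math" in the globals and then "max" inside it, on every iteration.
local function largestGlobalLookup(t)
  local result = t[1]
  for i = 2, #t do
    result = math.max(result, t[i])
  end
  return result
end

-- Uses the cached local: no table lookups inside the loop.
local function largestLocalLookup(t)
  local result = t[1]
  for i = 2, #t do
    result = max(result, t[i])
  end
  return result
end

print(largestGlobalLookup({3, 9, 2}), largestLocalLookup({3, 9, 2}))
```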

The gains may not be huge, and any impact on code maintainability/readability is always something to consider. Sometimes it’s better to have readable code than slightly faster code. In other cases, such as very commonly used libraries or functions that are rarely modified, performance will probably take a front seat, sacrificing code maintainability in the name of framerate.

LuaJIT: References

Here’s an assorted list of links with information about how the LuaJIT interpreter works, as well as numerous optimization tips:

Load-time vs run-time performance

When following various optimization techniques, you may find yourself having to choose between making the mod faster to start up, versus making the framerate higher once the mod has loaded.

You should use common sense when choosing this balance. Normally the right choice is to move complexity to load time, provided you get better run-time performance in exchange.

For example, a high GC load (see Avoid garbage collection section below) during startup is acceptable if you can later manage to have zero GC load while the simulator is running. On the other hand, if a certain optimization makes the loading time 5 minutes longer while gaining only 0.5% of framerate in exchange, that might not be a worthwhile tradeoff.
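A small sketch of moving work to load time, under made-up data: wheelNames and the two lookup functions are hypothetical and not part of any BeamNG API. The index table is built once when the file loads (allocating a little memory at startup), so that each later lookup is a single table access instead of a linear search.

```lua
-- Hypothetical data, for illustration only.
local wheelNames = {"FL", "FR", "RL", "RR"}

-- Load time: build a name -> index map once.
local wheelIndexByName = {}
for i, name in ipairs(wheelNames) do
  wheelIndexByName[name] = i
end

-- Run time, slow variant: linear search on every call.
local function findWheelSlow(name)
  for i = 1, #wheelNames do
    if wheelNames[i] == name then
      return i
    end
  end
  return nil
end

-- Run time, fast variant: one table access, thanks to the work done at load time.
local function findWheelFast(name)
  return wheelIndexByName[name]
end

print(findWheelSlow("RL"), findWheelFast("RL"))
```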

Update rate

When writing your code, you’ll need to think about how often your code will run. Should it run once per graphics frame? Maybe once per physics tick? Once per User Interface refresh? Maybe a fixed rate of 15Hz? Etc.

Note: To learn more about the fundamental update rates available, please check the Virtual Machine’s Update Rate section.

While picking a high update rate for your code is easy from a developer perspective, this will not only have a negative impact on framerates, but it will also lead to greater chances of stutter and of unstable framerates.

The rule of thumb here is to pick the lowest possible rate that you can get away with, while still making sense for your particular application.
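One possible way to run a piece of code at a lower rate than the hook it lives in is to accumulate the elapsed time and skip most calls. The sketch below is plain Lua under stated assumptions: onFrame(dt) is a hypothetical stand-in for whichever update hook you use (with dt in seconds), expensiveWork() is a placeholder, and the 15 Hz target is an arbitrary value picked for the example.

```lua
local UPDATE_INTERVAL = 1 / 15 -- seconds between runs; arbitrary value for this example
local timeSinceLastRun = 0
local runCount = 0

-- Hypothetical placeholder for the code that does not need to run every frame.
local function expensiveWork()
  runCount = runCount + 1
end

-- Hypothetical per-frame hook; `dt` is the time in seconds since the previous call.
local function onFrame(dt)
  timeSinceLastRun = timeSinceLastRun + dt
  if timeSinceLastRun < UPDATE_INTERVAL then
    return -- not enough time has passed yet: do nothing this frame
  end
  -- Keep only the remainder, so the accumulator stays bounded even on very slow frames.
  timeSinceLastRun = timeSinceLastRun % UPDATE_INTERVAL
  expensiveWork()
end

-- Simulate one second of a 60 FPS session.
for _ = 1, 60 do
  onFrame(1 / 60)
end
print("expensiveWork() ran " .. runCount .. " times in 60 frames")
```

In this simulation the work runs roughly 15 times instead of 60. The exact count can vary by one because of floating point rounding in the accumulated time.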

For most “gameplay” purposes (such as keeping track of a score, or other similar high level concepts), following the User Interface update rate is probably enough. Or alternatively, following the graphics update rate.

Only in extremely rare cases will you need to resort to a physics update rate. Any code that is run at physics update rate will need to be written extremely carefully to avoid a heavy impact on framerate for people running a computer with the minimum hardware specs. You’ll need to apply all the knowledge included in this document, and more. You’ll also want to include only the absolute minimum code in the physics update, moving everything that’s non-essential to the graphics update.

You may notice that our official code only uses physics rates as our very last resort, when nothing else can possibly work from the point of view of mathematics.

You will also notice that, unlike what many game development guides advise, at BeamNG we avoid fixed rate calculations as much as possible. We understand that fixed rate updates can make your life easier as a programmer: it’s easier to write stable math for a stable rate than to write stable math for a constantly variable rate. However, the downside is that a fixed rate workload will not scale up nor down according to the available computer resources. A fixed rate means that you’ll need to settle for a suboptimal compromise, where low-end hardware will suffer an unnecessarily high computing cost, while high-end hardware will be unnecessarily missing the extra detail that it could be calculating. With that in mind, running on a variable rate (such as graphics framerate or user-interface rate) means that you can provide higher fidelity on high-end computers, while also being friendly to low-end computers.

Note: Writing math that can work under extreme rate variations is hard: for this reason, the reported graphics “update rate” is guaranteed to be a minimum of 20 Hz. When the computer is unable to reach 20 FPS, then the simulation will slow down as needed. This is a guarantee that helps you to program your math with a safe baseline rate.
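As an illustration of math that tolerates a variable rate, here is a sketch of time-based exponential smoothing in plain Lua. The names smooth(), simulate() and the time constant tau are assumptions made for this example, not BeamNG APIs. A naive per-frame blend (such as value = value + (target - value) * 0.1) converges faster at higher framerates, while the version below derives the blend factor from dt, so the result after a given amount of time is the same at 20 Hz and at 200 Hz.

```lua
local exp = math.exp

-- Moves `current` toward `target`. `tau` is the time constant in seconds:
-- after `tau` seconds, about 63% of the initial distance has been covered.
local function smooth(current, target, dt, tau)
  local keep = exp(-dt / tau) -- fraction of the remaining distance that is kept after dt seconds
  return target + (current - target) * keep
end

-- Hypothetical test rig: runs the smoothing at a given rate for a given duration.
local function simulate(rateHz, seconds)
  local value = 0
  local dt = 1 / rateHz
  for _ = 1, rateHz * seconds do
    value = smooth(value, 100, dt, 0.5)
  end
  return value
end

local at20Hz = simulate(20, 2)
local at200Hz = simulate(200, 2)
print(string.format("20 Hz: %.4f, 200 Hz: %.4f", at20Hz, at200Hz))
```

Both simulations land on the same value (within floating point error), which is the property you want from code hooked to a variable update rate.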

A useful tool when writing code that works on a variable update rate is the Options > Display > Limit framerate slider. You can set it to 20 FPS to test your math under worst-case conditions (20 Hz updates, if your code is hooked to graphics updates), and you can disable this limiter together with Options > Graphics > Lowest to try to reach as high a framerate as possible. A good place to achieve a high framerate is the Grid, Small, Pure level while using no traffic vehicles.

Avoid garbage collection

If you program in a language featuring garbage collection (such as Lua or JavaScript), and you are not familiar with what a garbage collector (GC) is, then please search the internet for information and learn the basics before continuing.

GC is a convenient feature that some high-level languages provide, but it can have a large impact when used in a performance-intensive environment, such as a real-time simulator. The GC will hide new/delete from you, but in exchange it will take a toll in two ways:

Lower framerate: the GC has to run to do its work, and this garbage bookkeeping workload is going to rob some framerate.
Variable framerate: the GC workload may not be evenly spread over time, but might be bunched up cyclically. This can lead to both stutter (negative spikes in framerate), as well as rubberbanding (the framerate being high for a second, low for another second, then high again, etc) which will lead to an undesirable effect of slowmotion/fastmotion.

To reduce the GC load, first you need to be able to measure it. In the case of Lua, we offer two features for this:

gcprobe() function: run it before/after a piece of code, and it will tell how many bytes of garbage that code has generated.
ctrl-shift-f > Tools menu > Log gelua profile: this will log how many bytes of garbage each GELUA extension has generated (use ~ to see the logs) during the last graphics frame.
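To illustrate what measuring around a piece of code means, here is a sketch that uses only standard Lua. It is not the implementation of gcprobe(): measureAllocatedKb() and allocating() are hypothetical names, and collectgarbage("count") (which reports the memory in use, in kilobytes) is used as a generic stand-in. The collector is paused during the measurement so that nothing is freed halfway through.

```lua
-- Hypothetical helper: returns how many KB were allocated while running `fn`.
local function measureAllocatedKb(fn)
  collectgarbage("collect")              -- start from a clean state
  collectgarbage("stop")                 -- pause the collector during the measurement
  local before = collectgarbage("count") -- memory in use, in KB
  fn()
  local after = collectgarbage("count")
  collectgarbage("restart")              -- resume normal collection
  return after - before
end

-- Hypothetical code under test: creates one new table per iteration.
local function allocating()
  local points = {}
  for i = 1, 1000 do
    points[i] = {x = i, y = i, z = i}
  end
  return points
end

print(string.format("allocated: %.1f KB", measureAllocatedKb(allocating)))
```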

Once you know how much GC load your code is generating, you need to find ways to reduce it. As a general rule, this means avoiding the creation (the allocation) of new objects.

Try to re-use objects across multiple consecutive calls to your extension hooks. For example, you might want to have a parent-scope Lua variable that gets reused, rather than generating a new object from scratch on each function call.
Use APIs that reduce the GC load. For example, you can replace the reassignment myVector = vec3(5,4,2) with the zero-garbage alternative myVector:set(5,4,2). The same goes for favouring setAdd and similar APIs that we offer for this exact purpose of GC reduction.
Use APIs that fully eliminate the GC load. For example, favour using X,Y,Z tuples (such as our functions that end in ....XYZ()) instead of vec3.
Etc.
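The list above can be sketched in plain Lua, using ordinary tables as a stand-in for vec3 (which is not available outside of BeamNG). All three function names are hypothetical. The first variant allocates a new table on every call, the second one reuses a parent-scope table, and the third one returns a tuple of numbers and therefore needs no table at all.

```lua
-- Allocating variant: creates a new table (future garbage) on every call.
local function getOffsetAllocating(x, y, z)
  return {x = x + 1, y = y + 1, z = z + 1}
end

-- Reusing variant: writes into a parent-scope table that is created only once.
local offset = {x = 0, y = 0, z = 0}
local function getOffsetReusing(x, y, z)
  offset.x, offset.y, offset.z = x + 1, y + 1, z + 1
  return offset -- callers must copy the values if they need to keep them across calls
end

-- Tuple variant: multiple return values, no table involved.
local function getOffsetXYZ(x, y, z)
  return x + 1, y + 1, z + 1
end

local a = getOffsetReusing(1, 2, 3)
local b = getOffsetReusing(4, 5, 6)
print(a == b, b.x, b.y, b.z) -- same table both times, holding the latest values
print(getOffsetXYZ(1, 2, 3))
```

The reusing variant has a catch: every caller receives the same table, so a value that must survive until the next call has to be copied out first. This is one of the maintainability costs mentioned in this section.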

There’s no fixed rule about this, and many optimizations are a tradeoff between performance and code maintainability/readability. Sometimes you may want to sacrifice short-term performance in favour of making the code easier to work with (which might in turn enable higher-level optimizations in the longer term, thanks to the code being more understandable).

Avoid trigonometry

Working in terms of angles typically leads to using sin(), cos(), tan() and all variants of such functions. These functions can be really slow, and should be avoided when possible.

Instead, consider the use of dot product, cross product and other basic vector operations. These simpler math tools can often simplify your code, completely eliminating the need for explicit use of “angles”.

It’s relatively common for programmers to be very familiar with angles, but unfamiliar with dot/cross products. So the appeal of traditional trigonometry is understandable, but it doesn’t mean it’s the best approach from a performance coding perspective.

Trigonometry functions are typically used to transform geometric concepts into angles, so the programmer can then operate in angles, only to eventually transform it all back to vectors or quaternions. Therefore, if you learn to work with vectors directly, you can skip those unnecessary back-and-forth conversions, which typically leads to simpler and faster code.
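As one example of replacing angles with vector operations, the sketch below checks whether a target lies inside a 30 degree cone in front of an object, without computing any angle at run time. It is plain Lua with ordinary {x, y, z} tables standing in for a real vector type, and isInsideCone() is a hypothetical name. The only trigonometric call happens once, at load time, to turn the cone angle into a cosine threshold.

```lua
-- Computed once at load time: cosine of the cone half-angle (30 degrees, arbitrary example value).
local HALF_CONE_COS = math.cos(math.rad(30))
local HALF_CONE_COS_SQ = HALF_CONE_COS * HALF_CONE_COS

-- Returns true if `target` is within the cone that starts at `position` and points along `forward`.
-- `forward` is assumed to be normalized (length 1).
local function isInsideCone(position, forward, target)
  local dx = target.x - position.x
  local dy = target.y - position.y
  local dz = target.z - position.z
  -- Dot product: length of the offset projected onto the forward direction.
  local along = dx * forward.x + dy * forward.y + dz * forward.z
  if along <= 0 then
    return false -- the target is behind (or exactly beside) the object
  end
  -- cos(angle) = along / distance. Comparing squared values avoids the square root:
  -- angle <= 30 degrees  <=>  along^2 >= cos(30)^2 * distance^2
  local distSq = dx * dx + dy * dy + dz * dz
  return along * along >= HALF_CONE_COS_SQ * distSq
end

local origin = {x = 0, y = 0, z = 0}
local forward = {x = 1, y = 0, z = 0}
print(isInsideCone(origin, forward, {x = 10, y = 1, z = 0})) -- roughly 6 degrees off axis
print(isInsideCone(origin, forward, {x = 1, y = 1, z = 0}))  -- 45 degrees off axis
```

An angle-based version would need acos() or atan2() on every call. Here the per-call cost is a handful of multiplications and additions.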

Avoid Euler angles

Very often, Euler angles are used as an intermediate format, before eventually being converted back into quaternions or matrices (for consumption by the core engine). It’s advisable to avoid Euler angles completely, and use quaternions or matrices.

Doing so means you can skip those back-and-forth conversions of rotations into (and then out of) Euler format. This simplifies your code, and as a bonus makes it faster too.

In addition to the performance cost of such temporary conversions, they can also lead to bugs (such as losing numerical precision due to unnecessary operations), and lead to less maintainable/readable code (for example, there are many variants of Euler angles, and you might not be sure which exact Euler format is accepted by each function).

The only exception where Euler angles might be acceptable is for display to end-users, for example in a level editor UI, or similar content-creation tools:

If you need to show angles to an artist/modder in the UI, always operate with quaternions, and convert to Euler only at the very end of your data pipeline, at the exact moment you need to render values on the screen.
If you can, show the Euler values as read-only values, not as an editable text field. Consider offering an interactive 3D gizmo to apply rotations with the mouse/keyboard (which will not be using Euler angles internally, but quaternions), rather than offering a text field with 3 numbers that the user can type into.
If you absolutely, truly need to show a read-write text field with Euler angles on your editing tool UI, review your code under the scenario of multiple consecutive saves: do the Euler angles slowly drift away from the original value without the user editing the value? If so, you need to review your code pipeline to find out the source of numerical drift, and find a solution for it.

Avoid communication between virtual machines

Sending data between virtual machines can negatively affect performance, particularly in the form of latency, lower framerate, and unnecessarily varying framerate. The combination of these is typically called “lag” by end-users nowadays (even if it’s not limited to latency).

Some advice to preserve performance as much as possible:

Avoid communication between VMs altogether if you can.
Use VLUA mailboxes if that fits your requirements.
Reduce the frequency of communications to the minimum. Sending data each gfx frame is really bad; consider sending pre-computed data once per event, or once per minute, etc. if that’s possible.

Example of VM communication optimization

Let’s assume you are writing a mod that rates how good your burnout is. It will analyze the wheelspin patterns, and show a number of points in the UI.

In your initial implementation, you are sending the wheelspin information from VLUA to GELUA each graphics frame using obj:queueGameEngineLua(). Once in GELUA, all the data is analyzed, and you generate a numeric rating value. You then send this rating to the UI on each frame using guihook.trigger().

While this approach works, it’s pretty bad in terms of performance, and there’s plenty of room for improvement:

First of all, don’t compute the burnout rating in GELUA. Instead calculate it in VLUA, where the data already exists. This avoids sending all that physics information from VLUA to GELUA.

Then, consider whether you want to show the burnout rating only at the end screen, or as a real-time indicator:

Only at end-screen? Then you only need to communicate between virtual machines a single time, at the end of the burnout.
As a real-time indicator in the UI?
- Then you may want to only update this indicator (to communicate between VMs) if it has changed since the last time.
- And depending on the visual indicator, you will want to optimize further:
  - Is it a progress bar? Then on top of that, you may want to send it only on UI update frames onGuiUpdate() rather than on graphics frames onUpdate().
  - Is it a text label showing a numeric value? Then on top of that, you may want to only update this value once per second (so the user can actually read the text before it changes again)
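The "only if it has changed" and "once per second" ideas above can be combined in a few lines. The following is a plain Lua sketch under stated assumptions: sendRating() is a hypothetical stand-in for the real cross-VM call (it only records what would have been sent), updateRating() is a made-up name, and the one second interval is the example value from the text.

```lua
local MIN_SEND_INTERVAL = 1 -- seconds between two consecutive sends
local lastSentRating = nil
local timeSinceLastSend = MIN_SEND_INTERVAL -- allows the very first value to be sent immediately
local sentMessages = {}

-- Hypothetical stand-in for the real cross-VM call: it just records the value.
local function sendRating(rating)
  sentMessages[#sentMessages + 1] = rating
end

-- Call this on every update with the latest rating and the elapsed time in seconds.
local function updateRating(rating, dt)
  timeSinceLastSend = timeSinceLastSend + dt
  if rating == lastSentRating then
    return -- nothing new to show: skip the communication entirely
  end
  if timeSinceLastSend < MIN_SEND_INTERVAL then
    return -- changed, but too soon after the previous send
  end
  sendRating(rating)
  lastSentRating = rating
  timeSinceLastSend = 0
end

-- Simulated sequence of updates.
updateRating(10, 0.5)  -- first value: sent
updateRating(10, 0.5)  -- unchanged: skipped
updateRating(20, 0.25) -- changed, but only 0.75 s since the last send: skipped
updateRating(30, 0.25) -- changed, and 1 s has passed: sent
print(#sentMessages .. " messages sent out of 4 updates")
```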

Programming documentation feedback

If you feel this programming documentation is too high level, too low level, is missing important topics, is erroneous anywhere, etc, please write a post at this thread and ping me personally by typing @stenyak so I will get notified.
