The game code is implemented in QuakeC (not the same language as C, which is used to implement the Darkplaces engine), and clients interpret a "bytecode" derived from it, so that players can play modified versions chosen by server owners without having to install a separate Xonotic for each server.
However, each server only runs fixed QuakeC code chosen by its owner (because clients can't send code, only receive it).
So I was wondering how much server performance is impacted by the server QuakeC, and whether it would be possible to recompile it to a native executable, or at least to use a just-in-time compiler.
Initially I tried to analyze which entity fields may be present in which entities, hoping to save memory. However, because of the many `find` operations, which may return a priori arbitrary entities, such an analysis would be complex and would demand many manual annotations, which would be exhausting to write for a source code as large as Xonotic's.
So I decided to simply try to change the `prvm_executeprogram.h` part (which interprets the bytecode) into something that transpiles to C. Well, not simply, because in the QcVM bytecode every variable is a global that stores some number (which may be, for instance, a float or a function index). It is not a register machine or a stack machine. So I tried the following:
* RETURN and PARMn globals are always converted to C locals (hopefully stored in registers);
* the "locals" of QuakeC are converted to C locals, except in a few functions that instead keep them in globals so that they "leak" to certain other functions (mostly hidden functions created by GMQCC to implement variadic functions);
* if a global is never modified by the QcVM code, then it's inlined IF it is not an engine global nor an autocvar;
* similarly, dynamic function calls are replaced by static calls if possible (only now did I notice that I forgot to check whether the function index is stored in an engine global; I'll fix it later. EDIT: fixed it in `maybeConstCalledFun`, generated code is identical);
* each non-builtin function is compiled to a C function "W f«n»(prvm_prog_t *prog, U i0, ..., U i«s-1»)", where s is a number between 0 and 24, and where W stores vector+error; a dynamic call needs a bitmap to know which QcVM arguments are floats and which are vectors, so that they are correctly written to the C arguments;
* most builtins call the engine, but some (like max, min) are inlined;
* field access is unmodified;
* some information, like the number of entities (used in bounds checking), may change after calling builtins, so it needs to be re-read regularly, but the C code marks the operations that access such information as "pure", so GCC will coalesce consecutive calls that aren't interrupted by builtins.
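To make the list above a bit more concrete, here is a minimal sketch in C of two of the ideas: globals turned into C locals, and a "pure" accessor for bounds checks. It is not code from QcNat.hs or Darkplaces. The types `qcval`, `qcstmt` and `prog_model`, the three opcodes, and all function names are made up for illustration, and the real VM has many more opcodes and a different layout. The `pure` attribute is a GCC/Clang extension.

```c
#include <assert.h>

/* Hypothetical, heavily simplified model of QcVM storage:
   every variable is a slot in one big global array. */
typedef union { float f; int i; } qcval;

/* Made-up opcode subset; each statement means globals[c] = globals[a] OP globals[b]. */
enum { OP_ADD_F, OP_MUL_F, OP_DONE };
typedef struct { int op, a, b, c; } qcstmt;

/* Interpreter style: every statement reads and writes the global array. */
float interp_run(qcval *globals, const qcstmt *code, int ret_slot)
{
    for (const qcstmt *s = code; s->op != OP_DONE; s++) {
        switch (s->op) {
        case OP_ADD_F: globals[s->c].f = globals[s->a].f + globals[s->b].f; break;
        case OP_MUL_F: globals[s->c].f = globals[s->a].f * globals[s->b].f; break;
        default: assert(!"unknown opcode");
        }
    }
    return globals[ret_slot].f;
}

/* Transpiled style: the same two statements (ADD, then MUL), with the
   PARMn slots, the temporary and the RETURN slot as plain C locals
   that the compiler can keep in registers. */
float f_transpiled(float parm0, float parm1)
{
    float local0 = parm0 + parm1;
    float ret = local0 * parm0;
    return ret;
}

/* Hypothetical stand-in for engine state that a builtin may change. */
typedef struct { int num_edicts; } prog_model;

/* "pure": no side effects, the result depends only on the arguments and
   on memory that is read. The compiler may then merge repeated calls
   when nothing in between can write to memory. */
__attribute__((pure)) int model_num_edicts(const prog_model *p)
{
    return p->num_edicts;
}

/* Two bounds checks with no builtin call in between: a candidate for
   reading the entity count only once. */
int both_in_range(const prog_model *p, int e1, int e2)
{
    return e1 >= 0 && e1 < model_num_edicts(p)
        && e2 >= 0 && e2 < model_num_edicts(p);
}
```

Running the interpreter on the statements `{OP_ADD_F,0,1,2}, {OP_MUL_F,2,0,3}, {OP_DONE,0,0,0}` with slots 0 and 1 set to 3 and 4 gives the same result as `f_transpiled(3, 4)`. Whether the compiler actually merges the two `model_num_edicts` calls depends on the optimization level and on what it can prove about the code in between.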
You may find the transpiler QcNat.hs here: https://gitlab.com/-/snippets/3613556, plus a kludgy Darkplaces patch that does not respect the formatting guidelines. QcNat.hs is very slow (especially the const global detection), and the resulting C code is very long.
I tried it and the game sort of worked (maybe I forgot to mark some global as mutable), but the performance was the same. (EDIT: found it: variadic functions access the PARMn's without defining them. How did I miss that? I need to adjust the transpiler.)
What do you think? Maybe the QcVM interpreter is not a bottleneck? Or is it my fault?
Thank you, developers!