[0.17.37] Platform differences in integral mathematics
The following console command gives different results on Linux and Windows:
/c game.player.print(bit32.band(1664525*2031137496+1013904223, 0xffffffff))
The example might seem a little strange -- and the multiplication should really be 32-bit with overflow -- but it does isolate the problem.
Related issue where numerical calculations caused a desync: viewtopic.php?t=62674
Whether a fix is possible is not clear to me, but I would hope for at least a workaround. I think it likely that there are floating point differences in Lua 5.2 on Linux and Windows that cannot be reconciled. But unlike Lua 5.3, there is no integral subtype, so this leaves a big gap in basic mathematical operations. I'd be more than happy if the fix was to add in some integer maths support into Factorio's mod API. Even just 32-bit multiply, add, subtract, divide, and modulo would be huge -- they would supplement the existing bit32 functions to give a fairly comprehensive set of integral operations.
I wanted to approach this as a bug rather than a mod interface request however, since my players have all been in agreement that platform differences on the console like this are a bug.
The impact for me has been huge. Desyncs and inconsistencies in mathematical operations are a major problem for mods with a substantial procedural generation element; without reliable integer mathematics we can't even write pure Lua functions to work around these problems when we identify them. It's a shame. In spite of going to great lengths to avoid all the usual desync pitfalls and provide reliable procedural generation, content creators who were going to run a community map have been unable to do so, and have moved on.
(Edited, I put bit32.rshift instead of bit32.band -- both show there is a platform difference, but bit32.band makes more sense as an example).
Last edited by H8UL on Wed May 08, 2019 6:25 pm, edited 1 time in total.
Shameless mod plugging: Ribbon Maze
Re: [0.17.37] Platform differences in integral mathematics
Would this work for you:
Code: Select all
local function normalize(n) -- keep numbers at 32 bits
  return floor(n) % 0xffffffff -- floor: assumed to be a local alias for math.floor
end
It's been in the code for a very long time and has never caused any desyncs as far as I can tell. It's used to normalize seeds for the Lua random generator.
My guess is that the main drawback is that it might be slower than your method.
Re: [0.17.37] Platform differences in integral mathematics
Just out of curiosity: what results do you get in each case?
Re: [0.17.37] Platform differences in integral mathematics
orzelek wrote: ↑Wed May 08, 2019 5:23 pm
Would this work for you: ...
There's a lot of that in my code already, and that ain't the half of it. Take a look at https://github.com/h8ul-modder/factorio ... b/cmwc.lua and cf. my new post here about why I even need to do this: viewtopic.php?f=34&t=70588
Re: [0.17.37] Platform differences in integral mathematics
H8UL wrote: ↑Wed May 08, 2019 6:03 pm
There's a lot of that in my code already, and that ain't the half of it. ...
I've read that, but I admit I haven't noticed any problems caused by that RNG behaviour.
If you multiply your seed by high enough values you will rarely end up with seeds in the problematic range.
Unless it's that way in any range?
RSO creates seeds from x/y coordinates pretty frequently and I haven't noticed any problems with layout duplication on the map.
Re: [0.17.37] Platform differences in integral mathematics
orzelek wrote: ↑Wed May 08, 2019 6:11 pm
I've read that but I admit I haven't noticed any problems caused by that rng behaviour. ...
Perhaps, but an even better way to mutate the seed into something that isn't directly related to neighbouring seeds is to put it through an LCG, and that's what has this problem. If it helps to avoid the debate of using LuaRandomGenerator: I originally wrote my RNG to be used at the data stage. I now use it in Ribbon Maze but intend to use it in the data stage in future. In such a situation I cannot rely on the provided RNG systems anyway. You'll see if you look at Serendipity that they've written out a pure Lua RNG implementation. Whether that runs into OS differences in basic maths, I am unsure.
But even if LuaRandomGenerator was amazing, I should be able to do basic maths without fear that numbers that are "too big" in some unspecified way can cause a desync.
Re: [0.17.37] Platform differences in integral mathematics
When executing:
Code: Select all
/c game.player.print(bit32.band(1664525*2031137496+1013904223, 0xffffffff))
Windows: 1079053356
Linux: 2158106711
And when executed in C (you can try it in an online repl https://repl.it/languages/c):
Code: Select all
#include <stdio.h>

int main(void) {
    /* Same expression as the console command above. */
    long a = 1664525L*2031137496+1013904223;
    long b = a & 0xFFFFFFFFL;
    int c = (int)b;
    printf("%lu\n", b);
    printf("%u\n", c);
    return 0;
}
Code: Select all
2158106711
2158106711
Re: [0.17.37] Platform differences in integral mathematics
"long" is 64 bit on mac and linux. "long" is 32 bit on windows.
"long long" is 64 bit on mac and linux. "long long" is 64 bit on windows.
Most likely it's that garbage again.
If you want to get ahold of me I'm almost always on Discord.
- TruePikachu
- Filter Inserter
- Posts: 978
- Joined: Sat Apr 09, 2016 8:39 pm
- Contact:
Re: [0.17.37] Platform differences in integral mathematics
The described math above doesn't depend on any bits past #31 -- the result is masked to just the low 32 bits, and neither addition nor multiplication have rightwards carry.
I'm currently investigating the internal math behind this, I'll have an answer in a few hours.
-----
EDIT:
Going by a copy of vanilla Lua 5.2.1 sources I have lying around, there are four methods that Lua will use to convert a Lua-side number (`lua_Number` -- `double` by default) to a 32-bit unsigned integer (`lua_Unsigned` -- `unsigned int` by default, intended to be something like `uint32_t`). It will either use MASM `fld`/`fistp` (MS-only), type-pun the `lua_Number` to `lua_Unsigned` via a `union` (I don't really want to do the long math right now to check if that implementation is correct), run a modulo via well-defined floating point operations, or just cast. The actual method lies in the macro `lua_number2unsigned` in `llimits.h`.
It sounds like (I haven't checked the Factorio binary to verify) something like the type-punning method is being used on Windows, and there's an off-by-one error in there somewhere -- I noticed that the reported Windows output is half the Linux output, rounded up (or to nearest even, I'm not sure).
----
EDIT 2:
Just wrote up a quick test of that type-pun.
Code: Select all
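#include <cstdint>
#include <iostream>

// Union used for the type-punning conversion.
union luai_Cast {
    double l_d;
    std::uint32_t l_p[2];
};

int main() {
    constexpr double foo = 3380880154433623.0; // Number that gets passed into bit32.band
    volatile union luai_Cast u;
    u.l_d = foo + 6755399441055744.0; // 10136279595489368.0
    // Note: casting an out-of-range double to an integer type is formally undefined in C++.
    std::cout << "Casting: " << static_cast<std::uint32_t>(foo) << std::endl;
    // Reads the low word of the double; assumes little-endian and relies on union punning.
    std::cout << "Type punning: " << u.l_p[0] << std::endl;
    return 0;
}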
Code: Select all
Casting: 2158106711
Type punning: 1079053356
Re: [0.17.37] Platform differences in integral mathematics
H8UL wrote: ↑Wed May 08, 2019 6:49 pm
When executing:
Code: Select all
/c game.player.print(bit32.band(1664525*2031137496+1013904223, 0xffffffff))
Windows: 1079053356
Linux: 2158106711
According to the Lua 5.2 reference, "all functions accept numeric arguments in the range (-2⁵¹,+2⁵¹)." The number you supplied (1664525*2031137496+1013904223) is outside that range. I'd file this under "Undefined behavior causes undefined behavior".
TruePikachu wrote: ↑Thu May 09, 2019 12:34 am
The described math above doesn't depend on any bits past #31
It absolutely depends on bits past 31. On Windows, the first line of main is equivalent to
Code: Select all
int32_t a = 0xC02E480A21857L; /* upper bits 0xC02E4, low 32 bits 0x80A21857 */
Even if that is defined, the printf is definitely undefined behavior, since %lu is a promise to pass a 64-bit integer, but a is a 32-bit integer.
Re: [0.17.37] Platform differences in integral mathematics
TruePikachu wrote: ↑Thu May 09, 2019 12:34 am
I'm currently investigating the internal math behind this, I'll have an answer in a few hours. ...
I was looking into that logic some time ago and thought specifically that it shouldn't be using the type punning logic because it would be a problem area. My IDE was telling me it wasn't being used, but C being C, and the Lua library being written in C, it makes *heavy* use of macros, which obfuscated the logic enough that I didn't see it was in fact using type punning on the Windows build but standard casting on the other platforms.
I just deleted the "special" cast logic and forced every platform to use simple casts + made a test for it. So, this is now fixed for the next version of 0.17.
- TruePikachu
- Filter Inserter
- Posts: 978
- Joined: Sat Apr 09, 2016 8:39 pm
- Contact:
Re: [0.17.37] Platform differences in integral mathematics
The problem is, undefined behaviour doesn't mix well with determinism. The behaviour might be undefined, yes, but it should still be identical regardless of platform.
I was speaking in terms of variables being unsigned (where, in C++ at least, overflow and underflow are well-defined). Even if it's signed math, however, assuming that signed integers wrap around the same way, you get the same result (but it's a negative number since the high bit is set). In Lua, the math is done with double-precision floats, which have full integer precision within 53 bits.
printf(3) states that `l` is a long and `ll` is a long long. There are no issues with the length modifier in the printf (only with the fact that `u` is used with a signed argument).
Re: [0.17.37] Platform differences in integral mathematics
Rseding91 wrote: ↑Thu May 09, 2019 3:27 am
... I just deleted the "special" cast logic and forced every platform to use simple casts + made a test for it. So, this is now fixed for the next version of 0.17.
That's amazing! Thank you so much!
Edit: also thanks to everyone else for their input/investigation, amazing community as always!