[0.17.37] Platform differences in integral mathematics

H8UL · Post by **H8UL** » Wed May 08, 2019 2:07 pm

The following console command gives different results on Linux and Windows:

/c game.player.print(bit32.band(1664525*2031137496+1013904223, 0xffffffff))

The example might seem a little strange -- and the multiplication should really be 32-bit with overflow -- but it does isolate the problem.

Related issue where numerical calculations caused a desync: viewtopic.php?t=62674

Whether a fix is possible is not clear to me, but I would hope for at least a workaround. I think it likely that there are floating point differences in Lua 5.2 on Linux and Windows that cannot be reconciled. But unlike Lua 5.3, Lua 5.2 has no integer subtype, so this leaves a big gap in basic mathematical operations. I'd be more than happy if the fix were to add some integer maths support to Factorio's mod API. Even just 32-bit multiply, add, subtract, divide, and modulo would be huge -- they would supplement the existing bit32 functions to give a fairly comprehensive set of integer operations.
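For illustration, a 32-bit multiply can be built from double arithmetic alone (the only number type Lua 5.2 has) by splitting each operand into 16-bit halves, so that no intermediate exceeds 2^53. The sketch below is in C++ rather than Lua so that it can be compiled and checked. `mul32` and `lcg_step` are hypothetical names, not part of Factorio's API, and the same steps should translate to `math.floor` and `%` in Lua.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: (a * b) mod 2^32 for integer-valued doubles in [0, 2^32).
// Every intermediate stays below 2^53, so each step is exact in IEEE 754 doubles.
double mul32(double a, double b) {
    const double two16 = 65536.0;
    const double two32 = 4294967296.0;
    assert(a >= 0 && a < two32 && b >= 0 && b < two32);
    const double a_hi = std::floor(a / two16);
    const double a_lo = a - a_hi * two16;
    const double b_hi = std::floor(b / two16);
    const double b_lo = b - b_hi * two16;
    // The a_hi * b_hi term is a multiple of 2^32 and drops out entirely.
    const double cross = std::fmod(a_hi * b_lo + a_lo * b_hi, two16);
    return std::fmod(a_lo * b_lo + cross * two16, two32);
}

// One LCG step using the constants from the console command above.
double lcg_step(double x) {
    return std::fmod(mul32(1664525.0, x) + 1013904223.0, 4294967296.0);
}
```

With these, `lcg_step(2031137496.0)` never leaves the exactly representable range, so the result does not depend on how an oversized double is later converted to a 32-bit integer.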

I wanted to approach this as a bug rather than a mod interface request however, since my players have all been in agreement that platform differences on the console like this are a bug.

The impact for me has been huge. Desyncs and inconsistencies in mathematical operations are a major problem for mods with a substantial procedural generation element; without reliable integer mathematics we can't even write pure Lua functions to work around these problems when we identify them. It's a shame. In spite of going to great lengths to avoid all the usual desync pitfalls and provide reliable procedural generation, content creators who were going to run a community map have been unable to do so, and have moved on.

(Edited, I put bit32.rshift instead of bit32.band -- both show there is a platform difference, but bit32.band makes more sense as an example).

orzelek · Post by **orzelek** » Wed May 08, 2019 5:23 pm

Would this work for you:

Code: Select all

local function normalize(n) -- keep numbers at 32 bits
	return floor(n) % 0xffffffff
end

It's been in the code for a very long time and has never caused any desyncs as far as I can tell. It's used to normalize seeds for the Lua random generator.
My guess is that its main drawback is that it might be slower than your method.
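One caveat, sketched here in C++ so that it can be compiled: `% 0xffffffff` reduces modulo 2^32 - 1, whereas masking with `bit32.band(n, 0xffffffff)` is meant to reduce modulo 2^32, so the two are not interchangeable. The function names below are made up for the comparison.

```cpp
#include <cassert>
#include <cmath>

// Mirrors normalize(): floor(n) % 0xffffffff, i.e. modulo 2^32 - 1.
// Restricted to n >= 0, because Lua's % and std::fmod differ for negatives.
double normalize_mod_ffffffff(double n) {
    assert(n >= 0);
    return std::fmod(std::floor(n), 4294967295.0);
}

// Reduction modulo 2^32, which is what a 32-bit mask is meant to compute.
double wrap_mod_2_32(double n) {
    assert(n >= 0);
    return std::fmod(std::floor(n), 4294967296.0);
}
```

Both results fit in 32 bits, so either is usable for seeding, but they give different values for the same input and only the second matches 32-bit wraparound.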

Merssedes · Post by **Merssedes** » Wed May 08, 2019 5:41 pm

Just out of curiosity: what results do you get in each case?

H8UL · Post by **H8UL** » Wed May 08, 2019 6:03 pm

orzelek wrote: ↑Wed May 08, 2019 5:23 pm Would this work for you:
Code: Select all
local function normalize(n) -- keep numbers at 32 bits
	return floor(n) % 0xffffffff
end
It's been in the code for a very long time and has never caused any desyncs as far as I can tell. It's used to normalize seeds for the Lua random generator.
My guess is that its main drawback is that it might be slower than your method.

There's a lot of that in my code already, and that ain't the half of it. Take a look at https://github.com/h8ul-modder/factorio ... b/cmwc.lua, and cf. my new post about why I even need to do this: viewtopic.php?f=34&t=70588

orzelek · Post by **orzelek** » Wed May 08, 2019 6:11 pm

H8UL wrote: ↑Wed May 08, 2019 6:03 pm
orzelek wrote: ↑Wed May 08, 2019 5:23 pm Would this work for you:
Code: Select all
local function normalize(n) -- keep numbers at 32 bits
	return floor(n) % 0xffffffff
end
It's been in the code for a very long time and has never caused any desyncs as far as I can tell. It's used to normalize seeds for the Lua random generator.
My guess is that its main drawback is that it might be slower than your method.
There's a lot of that in my code already, and that ain't the half of it. Take a look at https://github.com/h8ul-modder/factorio ... b/cmwc.lua, and cf. my new post about why I even need to do this: viewtopic.php?f=34&t=70588

I've read that but I admit I haven't noticed any problems caused by that rng behaviour.
If you multiply your seed through high enough values you will rarely end up with seeds in problematic range.

Unless it's that way in any range?

RSO creates seeds from x/y coordinates pretty frequently and I haven't noticed any problems with layout duplication on map.

H8UL · Post by **H8UL** » Wed May 08, 2019 6:37 pm

orzelek wrote: ↑Wed May 08, 2019 6:11 pm
H8UL wrote: ↑Wed May 08, 2019 6:03 pm
orzelek wrote: ↑Wed May 08, 2019 5:23 pm Would this work for you:
Code: Select all
local function normalize(n) -- keep numbers at 32 bits
	return floor(n) % 0xffffffff
end
It's been in the code for a very long time and has never caused any desyncs as far as I can tell. It's used to normalize seeds for the Lua random generator.
My guess is that its main drawback is that it might be slower than your method.
There's a lot of that in my code already, and that ain't the half of it. Take a look at https://github.com/h8ul-modder/factorio ... b/cmwc.lua, and cf. my new post about why I even need to do this: viewtopic.php?f=34&t=70588
I've read that but I admit I haven't noticed any problems caused by that rng behaviour.
If you multiply your seed through high enough values you will rarely end up with seeds in problematic range.

Unless it's that way in any range?

RSO creates seeds from x/y coordinates pretty frequently and I haven't noticed any problems with layout duplication on map.

Perhaps, but an even better way to mutate the seed into something that isn't directly related to neighbouring seeds is to put it through an LCG, and that's what has this problem. If it helps to avoid the debate of using LuaRandomGenerator: I originally wrote my RNG to be used at the data stage. I now use it in Ribbon Maze but intend to use it in the data stage in future. In such a situation I cannot rely on the provided RNG systems anyway. You'll see if you look at Serendipity that they've written out a pure Lua RNG implementation. Whether that runs into OS differences in basic maths, I am unsure.

But even if LuaRandomGenerator was amazing, I should be able to do basic maths without fear that if numbers are "too big" in some unspecified way, then it can cause a desync.

H8UL · Post by **H8UL** » Wed May 08, 2019 6:49 pm

Merssedes wrote: ↑Wed May 08, 2019 5:41 pm Just out of curiosity: what results do you get in each case?

When executing:

Code: Select all

/c game.player.print(bit32.band(1664525*2031137496+1013904223, 0xffffffff))

Windows: 1079053356

Linux: 2158106711

And when executed in C (you can try it in an online repl https://repl.it/languages/c):

Code: Select all

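#include <stdio.h>

int main(void) {
  /* long is 64-bit on Linux and macOS but 32-bit on Windows */
  long a = 1664525L*2031137496+1013904223;
  long b = a & 0xFFFFFFFFL;  /* keep the low 32 bits */
  int c = (int)b;
  printf("%lu\n", b);
  printf("%u\n", c);
  return 0;
}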

Output:

Code: Select all

2158106711
2158106711

So they all agree, except Windows.

Post by **Rseding91** » Wed May 08, 2019 10:13 pm

"long" is 64 bit on mac and linux. "long" is 32 bit on windows.
"long long" is 64 bit on mac and linux. "long long" is 64 bit on windows.

Most likely it's that garbage again.
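If a C-side test is meant to behave the same on every platform, fixed-width types sidestep the `long` difference. A sketch, assuming C++11 or later and a 32-bit `unsigned int`; both function names are illustrative. Unsigned arithmetic wraps modulo 2^32 by definition, so the first version needs no 64-bit intermediate at all.

```cpp
#include <cstdint>

// One LCG step in 32-bit unsigned arithmetic; overflow wraps modulo 2^32.
constexpr std::uint32_t lcg_step_u32(std::uint32_t x) {
    return 1664525u * x + 1013904223u;
}

// The same step with an explicit 64-bit intermediate and a mask,
// which is the shape of the original console command.
constexpr std::uint32_t lcg_step_masked(std::uint64_t x) {
    return static_cast<std::uint32_t>((1664525ull * x + 1013904223ull) & 0xFFFFFFFFull);
}
```

Both give 2158106711 for an input of 2031137496, independent of how wide `long` happens to be.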

TruePikachu · Post by **TruePikachu** » Thu May 09, 2019 12:34 am

Rseding91 wrote: ↑Wed May 08, 2019 10:13 pm "long" is 64 bit on mac and linux. "long" is 32 bit on windows.
"long long" is 64 bit on mac and linux. "long long" is 64 bit on windows.

Most likely it's that garbage again.

The described math above doesn't depend on any bits past #31 -- the result is masked to just the low 32 bits, and neither addition nor multiplication have rightwards carry.

I'm currently investigating the internal math behind this, I'll have an answer in a few hours.

-----
EDIT:
Going by a copy of the vanilla Lua 5.2.1 sources I have lying around, there are four methods that Lua can use to convert a Lua-side number (`lua_Number` -- `double` by default) to a 32-bit unsigned integer (`lua_Unsigned` -- `unsigned int` by default, intended to be something like `uint32_t`). It will either use MASM `fld`/`fistp` (MS-only), type-pun the `lua_Number` to `lua_Unsigned` via a `union` (I don't really want to do the long math right now to check if that implementation is correct), run a modulo via well-defined floating point operations, or just cast. The method actually used is selected by the macro `lua_number2unsigned` in `llimits.h`.

It sounds like (I haven't checked the Factorio binary to verify) something like the type-punning method is being used on Windows, and there's an off-by-one error in there somewhere -- I noticed that the reported Windows output is half the Linux output, rounded up (or to nearest even, I'm not sure).

----
EDIT 2:
Just wrote up a quick test of that type-pun.

Code: Select all

#include <cstdint>
#include <iostream>

// A double overlaid on two 32-bit words
union luai_Cast {
	double l_d;
	std::uint32_t l_p[2];
};

int main() {
	constexpr double foo = 3380880154433623.0; // Number that gets passed into bit32.band
	volatile union luai_Cast u;
	u.l_d = foo + 6755399441055744.0; // 10136279595489368.0
	std::cout << "Casting:      " << static_cast<std::uint32_t>(foo) << std::endl;
	std::cout << "Type punning: " << u.l_p[0] << std::endl; // low word on little-endian
	return 0;
}

The above code, compiled for Windows x86-64 (not that it should matter), outputs:

Code: Select all

Casting:      2158106711
Type punning: 1079053356
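As for why the result is half: a hedged reading is that adding 2^52 + 2^51 only leaves the integer in the low mantissa bits while the sum stays below 2^53. Past that point the exponent goes up by one, each mantissa step is worth 2, and the low word holds n/2 rounded to even. The sketch below uses `memcpy` rather than a union for the pun. It assumes IEEE 754 doubles with round-to-nearest, and `pun_low32` is a made-up name, not Lua's macro.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == sizeof(std::uint64_t), "expects 64-bit doubles");

// Add the magic constant, then read the low 32 bits of the double's bit pattern.
std::uint32_t pun_low32(double n) {
    assert(std::isfinite(n));
    const double shifted = n + 6755399441055744.0;  // 2^52 + 2^51
    std::uint64_t bits;
    std::memcpy(&bits, &shifted, sizeof bits);
    return static_cast<std::uint32_t>(bits);        // low 32 bits of the mantissa
}
```

For an in-range input such as 2158106711.0 the trick returns the input unchanged, and for 3380880154433623.0 it returns 1079053356, the value reported on Windows.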

DaleStan · Post by **DaleStan** » Thu May 09, 2019 3:18 am

H8UL wrote: ↑Wed May 08, 2019 6:49 pm When executing:
Code: Select all
/c game.player.print(bit32.band(1664525*2031137496+1013904223, 0xffffffff))
Windows: 1079053356

Linux: 2158106711

According to the Lua 5.2 reference, "all functions accept numeric arguments in the range (-2⁵¹,+2⁵¹)." The number you supplied (1664525*2031137496+1013904223) is outside that range. I'd file this under "Undefined behavior causes undefined behavior".
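As I read it, the same sentence in the manual goes on to say that each argument is normalized to the remainder of its division by 2^32 and truncated to an integer. A conversion that does this explicitly, before any cast, stays defined for large inputs as well. A sketch, assuming finite inputs; `to_u32` is a hypothetical name and not what Lua or Factorio actually ship.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Reduce modulo 2^32 in floating point first, then cast a value known to be in range.
std::uint32_t to_u32(double n) {
    assert(std::isfinite(n));
    const double two32 = 4294967296.0;
    double r = std::fmod(std::trunc(n), two32);  // exact; r lies in (-2^32, 2^32)
    if (r < 0) r += two32;                       // shift negatives into [0, 2^32)
    return static_cast<std::uint32_t>(r);        // in range, so the cast is defined
}
```

Because `std::fmod` is exact, this gives the low 32 bits for any integer-valued double, including ones outside (-2^51, +2^51).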

TruePikachu wrote: ↑Thu May 09, 2019 12:34 am
Rseding91 wrote: ↑Wed May 08, 2019 10:13 pm "long" is 64 bit on mac and linux. "long" is 32 bit on windows.
"long long" is 64 bit on mac and linux. "long long" is 64 bit on windows.

Most likely it's that garbage again.
The described math above doesn't depend on any bits past #31

It absolutely depends on bits past 31. On Windows, the first line of main is equivalent to

Code: Select all

int32_t a = 0xC02E4`80A21857L;

That's 52 bits. C being C, that most likely invokes undefined behavior.
Even if that is defined, the printf is definitely undefined behavior, since %lu is a promise to pass a 64-bit integer, but a is a 32-bit integer.

Post by **Rseding91** » Thu May 09, 2019 3:27 am

TruePikachu wrote: ↑Thu May 09, 2019 12:34 am I'm currently investigating the internal math behind this, I'll have an answer in a few hours.

-----
EDIT:
Going by a copy of the vanilla Lua 5.2.1 sources I have lying around, there are four methods that Lua can use to convert a Lua-side number (`lua_Number` -- `double` by default) to a 32-bit unsigned integer (`lua_Unsigned` -- `unsigned int` by default, intended to be something like `uint32_t`). It will either use MASM `fld`/`fistp` (MS-only), type-pun the `lua_Number` to `lua_Unsigned` via a `union` (I don't really want to do the long math right now to check if that implementation is correct), run a modulo via well-defined floating point operations, or just cast. The method actually used is selected by the macro `lua_number2unsigned` in `llimits.h`.

It sounds like (I haven't checked the Factorio binary to verify) something like the type-punning method is being used on Windows, and there's an off-by-one error in there somewhere

I was looking into that logic some time ago and thought specifically that it shouldn't be using the type punning logic, because it would be a problem area. My IDE was telling me it wasn't being used. But C being C, and the Lua library being written in C, it makes *heavy* use of macros, which obfuscated the logic enough that I didn't see it was in fact using type punning on the Windows build but standard casting on the other platforms.

I just deleted the "special" cast logic and forced every platform to use simple casts + made a test for it. So, this is now fixed for the next version of 0.17.

TruePikachu · Post by **TruePikachu** » Thu May 09, 2019 3:57 am

DaleStan wrote: ↑Thu May 09, 2019 3:18 am I'd file this under "Undefined behavior causes undefined behavior".

The problem is, undefined behaviour doesn't mix well with determinism. The behaviour might be undefined, yes, but it should still be identical regardless of platform.

DaleStan wrote: ↑Thu May 09, 2019 3:18 am That's 52 bits. C being C, that most likely invokes undefined behavior.

I was speaking in terms of variables being unsigned (where, in C++ at least, overflow and underflow are well-defined). Even if it's signed math, however, assuming that signed integers wrap around the same way, you get the same result (but it's a negative number since the high bit is set). In Lua, the math is done with double-precision floats, which have full integer precision within 53 bits.
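The 53-bit point can be checked directly. A small sketch with a made-up helper name: the product from the original command is below 2^53 and survives a round trip through `double`, while 2^53 + 1 does not.

```cpp
#include <cstdint>

// True if v survives conversion to double and back unchanged.
// Only meaningful for v below 2^63, so the conversion back cannot overflow.
constexpr bool fits_in_double(std::uint64_t v) {
    return static_cast<std::uint64_t>(static_cast<double>(v)) == v;
}
```

So the multiplication and addition in Lua are exact here; the value only goes wrong afterwards, when the double is converted to a 32-bit integer.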

DaleStan wrote: ↑Thu May 09, 2019 3:18 am Even if that is defined, the printf is definitely undefined behavior, since %lu is a promise to pass a 64-bit integer, but a is a 32-bit integer.

printf(3) states that `l` is a long and `ll` is a long long. There are no issues with the length modifier in the printf (only with the fact that `u` is used with a signed argument).

H8UL · Post by **H8UL** » Thu May 09, 2019 7:52 am

Rseding91 wrote: ↑Thu May 09, 2019 3:27 am
TruePikachu wrote: ↑Thu May 09, 2019 12:34 am I'm currently investigating the internal math behind this, I'll have an answer in a few hours.

-----
EDIT:
Going by a copy of vanilla Lua 5.2.1 sources I have...
I was looking into that logic some time ago and thought specifically that it shouldn't be using the type punning logic, because it would be a problem area. My IDE was telling me it wasn't being used. But C being C, and the Lua library being written in C, it makes *heavy* use of macros, which obfuscated the logic enough that I didn't see it was in fact using type punning on the Windows build but standard casting on the other platforms.

I just deleted the "special" cast logic and forced every platform to use simple casts + made a test for it. So, this is now fixed for the next version of 0.17.

That's amazing! Thank you so much!

Edit: also thanks to everyone else for their input/investigation, amazing community as always!
