Page 1 of 1

[Rseding91] [0.17.74] mass UnitGroup creation and deletion causes desync if stored in global table using #table

Posted: Mon Oct 28, 2019 3:07 pm
by Muppet9010
If you create and destroy a lot of unit groups this can cause a desync in heavy mode.

In a clean 0.17.74 vanilla game.
Enable heavy mode (toggle-heavy-mode)
Run the below script a few times (up to 5) and a desync will be reported.

Code: Select all

/c
global.groups = global.groups or {}
local surface = game.player.surface
local force = game.player.force
for i = 1, 100 do
    local unit = surface.create_entity{name="small-biter", position={10,10}, force=force}
    local group = surface.create_unit_group{position={10,10}, force=force}
    group.add_member(unit)
    global.groups[#global.groups+1] = group
end
game.print("#global.groups: " .. #global.groups)

local toDestroy = math.random(10,30)
for i=1, toDestroy do
    local groupIndex = math.random(1, #global.groups)
    local group = global.groups[groupIndex]
    if group ~= nil and group.valid then
        group.destroy()
        global.groups[groupIndex] = nil
    end
end

This issue and their identification of the Lua code triggering the desync was pointed out to me by a fan of Comfy.
If the line "global.groups[groupIndex] = nil" isn't present the desync doesn't occur. So I assume its some internal lua/c++ game engine thing?

Re: [Rseding91] [0.17.74] mass UnitGroup creation and deletion causes desync

Posted: Wed Oct 30, 2019 3:14 pm
by Rseding91
The issue is coming from directly assigning nil to table values when the entire table is numeric-keyed. Lua in its ever-smartness decides (seemingly randomly) to convert the table between a hash table and an array depending on how many nils are in the table and where they're located.

I don't have a solution to this at the moment and likely won't have one for some time. The "#" operator on sparse tables gives you back garbage values (as you're finding out) and just doesn't work on hash tables. Basically don't use the "#" operator - it's just trouble.

Re: [Rseding91] [0.17.74] mass UnitGroup creation and deletion causes desync

Posted: Wed Oct 30, 2019 3:35 pm
by BlueTemplar
Rseding91 wrote:
Wed Oct 30, 2019 3:14 pm
Basically don't use the "#" operator - it's just trouble.
FFS !

(Is it at least safe to use it during the Factorio-loading stages ?)

Re: [Rseding91] [0.17.74] mass UnitGroup creation and deletion causes desync

Posted: Thu Oct 31, 2019 9:32 am
by boskid
For now it looks that length operator (#) on its own works ok (deterministic), but issue (desync) comes from cases where global variable is save-loaded: globals are serialised using serpent.dump on server and then loaded in client but here when loading, lua may choose different internal format than server's and so length operator may give different value.
This means that length operator on tables that were assigned to global in previous ticks (or in same tick in different update phase) may desync when lua chooses hash instead of array or array instead of hash.

I am not sure if it is worth fixing because of performance of all table operations.

This two reproductions are interesting since first will give second's initialiser and second will give first's initialiser:

Code: Select all

/c global.t = {0,0,0,nil,nil,nil,nil,nil,nil,nil,nil,nil,nil,nil,nil,nil,nil,nil,nil,nil,nil,0,0,0,0,0,0,0,0,0,0,0,0}
/c game.print(serpent.dump(global.t))
and

Code: Select all

/c global.t = {[1]=0,[2]=0,[3]=0,[22]=0,[23]=0,[24]=0,[25]=0,[26]=0,[27]=0,[28]=0,[29]=0,[30]=0,[31]=0,[32]=0,[33]=0}
/c game.print(serpent.dump(global.t))

Re: [Rseding91] [0.17.74] mass UnitGroup creation and deletion causes desync

Posted: Thu Oct 31, 2019 9:42 am
by Bilka
In that case, see 74166

Re: [Rseding91] [0.17.74] mass UnitGroup creation and deletion causes desync if stored in global table using #table

Posted: Thu Jul 02, 2020 2:47 pm
by Rseding91
Boskid looked at this more and the fix would make every single usage of # from O(logN) to O(N) which I consider far worse than the rare issue you're running into.

Re: [Rseding91] [0.17.74] mass UnitGroup creation and deletion causes desync if stored in global table using #table

Posted: Thu Jul 02, 2020 3:12 pm
by boskid
Simply speaking: length operator # uses binary search (O(logN)) to quickly compute amount of non-nil contiguous elements with numeric indexes. It should be deterministic when there are no gaps. When there are gaps, it may desync since it uses sizearray as starting point and so middle points may or may not hit a gap making result belong to longer or shorter part of array. table_size goes over every element and is O(N), but ignores sizearray and should always give the same value.

Code: Select all

/c
  local a = {}
  for i = 1,1023 do
    a[i] = 1
  end
  a[512]=nil
  a[256]=nil
  a[128]=nil
  a[64]=nil
  a[32]=nil
  a[16]=nil
  a[8]=nil
  a[4]=nil
  a[2]=nil
  game.print(#a)
above example most likely will print "1" because binary search hits all the gaps.

From the opposite end, there is this example:

Code: Select all

/c
  local a = {}
  a[512] = 1
  a[768] = 1
  a[896] = 1
  a[960] = 1
  a[992] = 1
  a[1008] = 1
  a[1016] = 1
  a[1020] = 1
  a[1022] = 1
  a[1023] = 1
  game.print(#a)
where size will be 1023 because binary search hits only the non-nil elements.

Re: [Rseding91] [0.17.74] mass UnitGroup creation and deletion causes desync if stored in global table using #table

Posted: Sat Jul 04, 2020 3:05 pm
by Muppet9010
Yea i think the fact # ever found entries above the first nil is what surprised me and I assume lead to the author of this code reaching this situation where it generally "seemed" to work.
As you said its a Lua "feature" of the # operator, with your example code reproducing the same oddness in a standalone Lua demo site. Confirmed by if I lower the size to the table index to 1020, the # operator then behaves as expected, stopping at first nil value.

I was looking at it seeing if I thought it would be helpful to modders to have it documented, but as its just another odd Lua feature that would seem like the start of a long list.