How do you debug desync problems?

swni
Long Handed Inserter
Posts: 91
Joined: Sat Mar 05, 2016 1:54 am

Post by swni »

I thought I understood how synchronization worked properly... the first mod I wrote involved random numbers and caching and such complications (1400 lines of code called from control.lua) and I never experienced a single desync, but another one I wrote immediately experienced desyncs when tested in multiplayer. I was able to mostly resolve the desyncs with a small change to the code but I have no idea why one version of the code works and the other does not.

So what conditions are necessary for a desync to occur? What are examples of things you could do in a mod that would cause a desync? How can you predict whether code will work correctly before testing it? And how do you identify what is causing a desync other than trial and error?

prg
Filter Inserter
Posts: 947
Joined: Mon Jan 19, 2015 12:39 am

Post by prg »

Popular desync causes are not putting things into global and doing something dumb in on_load. Maybe post the working and non-working code so we can figure out what's going on.

Post by swni »

I didn't post my code because I think it would take a while for someone else to decipher how it works, whereas if I learned some general principles of desync debugging I could work it out myself. Also, if I knew what causes desyncs, I could avoid these problems in the future. Nonetheless, if you want to try, the mod is here: https://mods.factorio.com/mods/erst/diffuse-resources The relevant code is in lib/randxy.lua, but you might need to dig through control.lua as well. The commented-out code is the part that desyncs (specifically, the check whether xs[x] == nil gives different results on different clients), and I resolved the problem by replacing it with just "return math.random()".

Additionally, the mod sometimes desyncs when "db" (this prints a message to output) is called. This happens for example in "reroll_everything()". (I have no idea whether the desync is related to db in any way, that's just when the desync happens.)

(All data is stored in global and I don't do much in on_load other than set up event handlers.)

Post by prg »

swni wrote: All data is stored in global
Except for all the local variables you define at the start of control.lua, right?

Code:

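local msg = 0  -- script-local counter: not part of global, so it is not saved with the map
function db(s)
    local p = game.players[1]
    if p ~= nil then
        p.print('[' .. msg .. ']    ' .. s)  -- the printed text depends on the unsaved counter
        msg = msg + 1
    end
end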
Player1 starts the game, msg = 0. A debug print happens, after that msg = 1 for Player1.
Player2 joins, msg = 0 for Player2.
Another debug print happens with msg = 1 for Player1 and msg = 0 for Player2.
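
To make that sequence concrete, here is a minimal plain-Lua simulation. It is not Factorio code: new_client, db_local and db_global are hypothetical names, and the state table only stands in for the saved global data that a joining player receives. Keeping the counter in the saved table is an assumed fix, not something taken from the mod.

```lua
-- Plain-Lua model of two clients; nothing here calls the Factorio API.
-- `saved` stands in for the contents of `global` stored in the save file.
local function copy(t)
    local out = {}
    for k, v in pairs(t) do out[k] = v end
    return out
end

local function new_client(saved)
    local client = { state = copy(saved), local_log = {}, global_log = {} }
    local msg = 0 -- script-local: starts at 0 every time a client loads the script

    -- Variant from the thread: the counter lives in a script-local variable.
    function client.db_local(s)
        table.insert(client.local_log, '[' .. msg .. '] ' .. s)
        msg = msg + 1
    end

    -- Assumed fix: the counter lives in the saved table instead.
    function client.db_global(s)
        local n = client.state.msg or 0
        table.insert(client.global_log, '[' .. n .. '] ' .. s)
        client.state.msg = n + 1
    end

    return client
end

-- Player1 starts the game and one debug print happens.
local p1 = new_client({})
p1.db_local('first')
p1.db_global('first')

-- Player2 joins: it receives Player1's saved state but runs a fresh script.
local p2 = new_client(p1.state)

-- The same debug print now runs on both clients.
for _, c in ipairs({ p1, p2 }) do
    c.db_local('second')
    c.db_global('second')
end

-- Last line printed on each client, per variant.
print(p1.local_log[#p1.local_log], p2.local_log[#p2.local_log])
print(p1.global_log[#p1.global_log], p2.global_log[#p2.global_log])
```

In this model the script-local variant ends with different text on the two clients, while the variant that keeps the counter in the saved table ends with identical text.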

Post by swni »

The only effect of different values of msg is causing different print() commands to be issued, which I had assumed couldn't cause a desync (since presumably the player messages are not being recorded in the save file). I guess that is not the case? If so then that explains the problem with db.

("tf" is another local variable but notice its value is set on_load using data in global; my other local variables are all either constants or functions)

...any luck with lib/randxy.lua? I do save the results of calling RandXY in a local variable but its data is being populated with stuff from global. Thanks for looking into this!

Post by prg »

swni wrote: The only effect of different values of msg is causing different print() commands to be issued, which I had assumed couldn't cause a desync (since presumably the player messages are not being recorded in the save file). I guess that is not the case? If so then that explains the problem with db.
Load a game, open the console. All the messages ever printed are still there.
swni wrote: ("tf" is another local variable but notice its value is set on_load using data in global; my other local variables are all either constants or functions)
Mhh... might be fine, but why don't you just put tf into global, too?
swni wrote: ...any luck with lib/randxy.lua? I do save the results of calling RandXY in a local variable but its data is being populated with stuff from global. Thanks for looking into this!

Code:

local random_amount = RandXY("amount")
local random_type = RandXY("type")
When that code runs outside of an event handler, global is still empty.
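
The load order behind this can be modelled in plain Lua. This is only a sketch of the ordering described in this thread, not Factorio code: load_mod, control_lua and seed are made-up names, and the one assumption is that saved data is restored after the top level of control.lua has run and before on_load fires.

```lua
-- Plain-Lua model of the load order; `load_mod` is a hypothetical stand-in for the engine.
local function load_mod(control_lua, saved)
    local global = {}                 -- empty (but not nil) while the script's top level runs
    local handlers = control_lua(global)
    for k, v in pairs(saved) do       -- the saved data is restored only now
        global[k] = v
    end
    handlers.on_load()                -- handlers see the restored data
end

local seen = {}

-- Stand-in for control.lua: the function body plays the role of its top level.
local function control_lua(global)
    seen.top_level = global.seed      -- read too early: the table is still empty
    return {
        on_load = function()
            seen.on_load = global.seed
        end,
    }
end

load_mod(control_lua, { seed = 12345 })
print(seen.top_level, seen.on_load)   -- the top-level read is nil, the on_load read is 12345
```

Under that assumption, calls such as RandXY("amount") that consult global probably belong inside an event handler, or in a function called from one, rather than at the top level of control.lua.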

Post by swni »

prg wrote:
swni wrote: ("tf" is another local variable but notice its value is set on_load using data in global; my other local variables are all either constants or functions)
Mhh... might be fine, but why don't you just put tf into global, too?
Fair enough, better safe than sorry.
prg wrote: When that code runs outside of an event handler, global is still empty.
Well, that explains everything! I had no idea that global is populated with the saved data only after control.lua has run and before on_load is called. Confusingly, global does not start out as "nil", which would have made it clear that it is not yet valid (and I would have gotten a useful error message instead of a desync).

Thanks for figuring out the source of the problem.
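
On the point about getting a useful error message: one way to have early access fail loudly is a guard table built with a metatable. The sketch below is plain Lua for a mod's own state table. guarded_state and mark_ready are hypothetical names, and nothing here claims that Factorio's global itself can be wrapped this way.

```lua
-- Hypothetical helper, plain Lua only: a state table that raises an error
-- when it is read or written before `mark_ready` has been called.
local function guarded_state()
    local data, ready = {}, false
    local proxy = setmetatable({}, {
        __index = function(_, key)
            if not ready then
                error('state read before it was loaded: ' .. tostring(key), 2)
            end
            return data[key]
        end,
        __newindex = function(_, key, value)
            if not ready then
                error('state written before it was loaded: ' .. tostring(key), 2)
            end
            data[key] = value
        end,
    })
    -- Copies the saved values in and unlocks the proxy.
    local function mark_ready(saved)
        for k, v in pairs(saved) do data[k] = v end
        ready = true
    end
    return proxy, mark_ready
end

local state, mark_ready = guarded_state()

-- Too early: this fails loudly instead of silently seeing an empty table.
local ok, err = pcall(function() return state.xs end)
print(ok, err)

-- After the (simulated) load step, access works normally.
mark_ready({ xs = { 1, 2, 3 } })
print(#state.xs)
```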
