[0.16.9] Infinite loop while trying to save
Posted: Sat Dec 30, 2017 2:10 am
This happened while a friend and I were playing on my server (headless Linux). The server attempted to autosave and froze at 0% on the save progress bar.
The server log, the last known good save (understandably I can't provide the actual save that failed to write, but this one is from about 10 minutes earlier), and a core dump from my client (sadly I didn't manage to get one off the server) are available in the Google Drive folder linked below.
When I examined CPU usage, one thread was constantly at 100% while the others were idle. No I/O was occurring. I attached strace to this thread and observed no syscalls, so it appears to have been spinning purely in userspace.
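In case it helps anyone who wants to check for the same symptom, here is a minimal Python sketch of one way to spot a spinning thread on Linux: sample each thread's accumulated CPU time from /proc twice and see which one grew the most. The function names are hypothetical, and it assumes the usual /proc/<pid>/task/<tid>/stat layout, where utime and stime are fields 14 and 15.

```python
import os
import time


def parse_cpu_ticks(stat_line: str) -> int:
    """Return utime + stime (in clock ticks) from one /proc/<pid>/task/<tid>/stat line."""
    # The command name (field 2) is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split after the LAST ')'.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state), so fields 14 and 15 are rest[11] and rest[12].
    return int(rest[11]) + int(rest[12])


def read_thread_ticks(pid: int) -> dict[int, int]:
    """Map each thread ID of `pid` to its accumulated CPU ticks (Linux only)."""
    ticks = {}
    task_dir = f"/proc/{pid}/task"
    for tid in os.listdir(task_dir):
        try:
            with open(f"{task_dir}/{tid}/stat") as f:
                ticks[int(tid)] = parse_cpu_ticks(f.read())
        except (FileNotFoundError, ProcessLookupError):
            continue  # thread exited between listdir() and open()
    return ticks


def busiest_thread(before: dict[int, int], after: dict[int, int]) -> tuple[int, int]:
    """Return (tid, ticks_used) for the thread that used the most CPU between samples."""
    # Only compare threads present in both samples.
    deltas = {tid: after[tid] - before[tid] for tid in after if tid in before}
    tid = max(deltas, key=deltas.get)
    return tid, deltas[tid]


def find_spinning_thread(pid: int, interval: float = 1.0) -> tuple[int, float]:
    """Sample twice and return (tid, approximate CPU fraction) of the busiest thread."""
    before = read_thread_ticks(pid)
    time.sleep(interval)
    after = read_thread_ticks(pid)
    tid, used = busiest_thread(before, after)
    ticks_per_second = os.sysconf("SC_CLK_TCK")
    return tid, used / (ticks_per_second * interval)
```

A result close to 1.0 for a single thread, with the rest near zero, would match what I saw. The returned thread ID is the one to attach strace to.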
I shut down the server, and the game gave both of us the opportunity to save the game ourselves. We both did so, and we both (Linux, Windows) got the same behaviour as the server, so it's consistent across OSes.
I managed to get a stack trace off the thread in question using gdb (note this trace is from the Linux client, not the headless server):
#0 ListSizeHolder<LogisticMember, LogisticMemberNetworkTag, false>::get () at src/Util/Container/IntrusiveList.hpp:45
#1 IntrusiveList<LogisticMember, LogisticMemberNetworkTag, false>::size ()
at src/Util/Container/IntrusiveList.hpp:182
#2 LogisticPointContainerSaver<IntrusiveList<LogisticMember, LogisticMemberNetworkTag, false> >::LogisticPointContainerSaver () at src/Logistics/LogisticSaveLoadHelper.hpp:91
#3 0x00000000007e983b in LogisticNetwork::preSaveHook ()
at /tmp/factorio-euUOCP/src/Logistics/LogisticNetwork.cpp:171
#4 0x00000000007e9e77 in LogisticManager::preSaveHook ()
at /tmp/factorio-euUOCP/src/Logistics/LogisticManager.cpp:343
#5 0x00000000007e9f50 in ForceData::preSaveHook () at /tmp/factorio-euUOCP/src/Force/ForceData.cpp:805
#6 0x0000000000a8ca94 in ForceManager::preSaveHook () at /tmp/factorio-euUOCP/src/Force/ForceManager.cpp:79
#7 Map::save () at /tmp/factorio-euUOCP/src/Map/Map.cpp:1083
#8 0x0000000000a8d4f4 in Scenario::saveMap () at /tmp/factorio-euUOCP/src/Scenario/Scenario.cpp:623
#9 0x0000000000a8dcab in Scenario::saveAs () at /tmp/factorio-euUOCP/src/Scenario/Scenario.cpp:549
#10 0x0000000000b5f1b6 in ParallelScenarioSaver::doSave ()
at /tmp/factorio-euUOCP/src/Scenario/ParallelScenarioSaver.cpp:87
#11 0x0000000001489c0f in execute_native_thread_routine ()
#12 0x00007f9ef1df008a in start_thread () from /usr/lib/libpthread.so.0
#13 0x00007f9ef02a342f in clone () from /usr/lib/libc.so.6
https://drive.google.com/drive/folders/ ... sp=sharing
The core dump is gzipped.
Also included is partial-save.tmp.zip, which is the partial save the server wrote before getting stuck.
Some other misc notes:
The client did not respond to SIGTERM while it was stuck saving; I had to SIGKILL it (though I didn't try SIGINT). The server responded to a double SIGINT.
Not running any mods.
The save is huge, sorry. Let me know if there's anything else I can provide, things to try if it happens again, etc.