Friday Facts #408 - Statistics improvements, Linux adventures
-
- Fast Inserter
- Posts: 140
- Joined: Wed Apr 26, 2017 11:29 pm
- Contact:
Re: Friday Facts #408 - Statistics improvements, Linux adventures
yay for more linux! <3
Thanks for looking into my fork-save report.
-
- Inserter
- Posts: 38
- Joined: Tue May 14, 2019 12:56 am
- Contact:
Re: @raiguard RE: background saving
Greaka wrote: ↑Fri Apr 26, 2024 12:46 pm I don't know if you can take anything away from this, I just wanted to share a user story after you asked for feedback. If it's possible, it might be worth exploring if the forked Factorio instance could unload essentially everything but raw entity data or similar. I don't know much about the internals of the game, even after reading all FFFs.
From what I understand, that shouldn't/wouldn't/can't actually help. The "fork" syscall is extremely efficient, and uses copy-on-write to avoid duplicating things it doesn't have to. So if memory is not actually written to, it shouldn't use extra RAM anyway. While it MAY be possible for the forked instance to "free" things that 1. it knows it doesn't need, 2. the original will modify, and 3. are large enough to make a difference, it's unlikely that such an effort would yield useful results. And it may actually cause problems.
I'll admit that I'm a bit puzzled by the claim that it takes twice as much memory though - most of that should be libraries, graphics, etc. that aren't going to be modified by EITHER process, and thus won't be duplicated no matter what either does.
Unless the "save" instance is "optimizing" by trying to "free" things it doesn't need - that could really cause problems. I.e. it's trying to do what you suggest, but it REALLY SHOULDN'T. Which IMHO is the more likely result of trying to make it work better.
- FerrariMAF
- Long Handed Inserter
- Posts: 89
- Joined: Fri May 26, 2017 11:39 am
- Contact:
Re: Friday Facts #408 - Statistics improvements, Linux adventures
"So what do you think? Are there any other statistics improvements you can think about for 2.0?"
Yes!! Please make pollution_statistics production by force and surface!
-
- Inserter
- Posts: 23
- Joined: Fri Apr 26, 2024 12:57 pm
- Contact:
Re: Friday Facts #408 - Statistics improvements, Linux adventures
I have to Nth the idea to make the time scale and tabs in production statistics sticky. (And repeat that 5s is an almost uselessly noisy timescale for anything in the game, so it's an especially bad option to select all the time.)
I also wish there was more smoothing on the production graphs for dealing with spiky data. For instance, I have a setup that's supposed to be delivering 240/m science. But because all the assemblers I need for that are not perfectly evenly distributed in starting time, the actual numbers if I hover over the graph flicker from (say) 190 to 220 to 270 with no easy way to read what the rolling average is. I can't tell if I'm actually producing 240/m or if there's a bottleneck somewhere until it's been running awhile and I can see the 10-minute total. (For these applications, 1 minute totals are often too spiky as well.)
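The smoothing asked for here amounts to a rolling (moving) average over the most recent samples. A minimal Python sketch, assuming per-interval rate samples like the 190/220/270 readings mentioned above; the function name and window size are illustrative and not taken from the game:

```python
from collections import deque


def rolling_average(samples, window):
    """Yield the mean of the most recent `window` samples after each new sample."""
    recent = deque(maxlen=window)
    total = 0.0
    for sample in samples:
        if len(recent) == window:
            total -= recent[0]      # this value is about to be evicted by append()
        recent.append(sample)
        total += sample
        yield total / len(recent)


if __name__ == "__main__":
    # Hypothetical spiky readings whose true mean is 240/m.
    readings = [190, 220, 270, 280]
    print(list(rolling_average(readings, window=4)))
```

With a window that covers a whole production cycle, the last value settles on the underlying rate even though the individual readings jump around.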
Finally, it's needlessly hard to click on a specific item in the filter list if there are several at the same production rate. A common scenario: you're producing red, green, and blue science at the same average rate. You go to see how your blue science is doing. But in the moment that you try to click on it, a couple of red science assemblers happen to finish so it jumps up above the other two and you accidentally click it. Then you try to click it in its new spot, and it swaps with green science. And then you try to unclick green science and red science, and end up clicking on blue twice. And then you decide that you know what, it's probably fine, you don't need to know. The suggestion was made a few FFFs ago that logistics network contents should only get ordered by quantity when you open the panel, so that they don't jump around if things change while you're reading; I would apply that to production statistics too.
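The "only sort when the panel opens" idea can be sketched as a list that ranks its rows once and afterwards updates values in place. This is a hypothetical Python illustration; the class and method names are assumptions, not anything from Factorio's GUI code:

```python
class FrozenOrderList:
    """Rows are ranked once when the panel opens; later updates keep positions stable."""

    def __init__(self, rates):
        self.rates = dict(rates)
        self.order = []
        self.resort()

    def resort(self):
        # Highest rate first; ties broken by name so the order is deterministic.
        self.order = sorted(self.rates, key=lambda name: (-self.rates[name], name))

    def update(self, name, rate):
        # Change the displayed value without moving the row.
        if name not in self.rates:
            self.order.append(name)   # new items go to the end until the next resort
        self.rates[name] = rate

    def rows(self):
        return [(name, self.rates[name]) for name in self.order]


if __name__ == "__main__":
    panel = FrozenOrderList({"red": 240, "green": 240, "blue": 240})
    panel.update("red", 270)          # red spikes while the player is aiming at blue
    print(panel.rows())               # positions unchanged
    panel.resort()                    # e.g. when the panel is reopened
    print(panel.rows())
```

The row under the cursor stays put between explicit resorts, which is the behaviour the logistics-network suggestion describes.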
Definitely agree with this. Right now the power grid statistics don't distinguish between "at capacity" (or, because of rounding, near capacity) and "over capacity" at all; it's only visible by looking for spots where your power distribution/generation is suspiciously flat for long stretches. Seeing the actual value of demand spikes that are higher than my supply would be very helpful when I'm trying to plan out how much more power I need.
-
- Fast Inserter
- Posts: 145
- Joined: Mon Apr 18, 2016 8:08 pm
- Contact:
Re: Friday Facts #408 - Statistics improvements, Linux adventures
Async saving is one of those oddball things that seems like it's a workaround to a bigger problem. If saves were barely noticeable then why even bother to put the effort in for a feature that 1% of 1% of the userbase uses? Why not work towards a solution that helps all users?
I'd guess that some exploration has been done into the save system in the past, but it still seems like there is more work that could be done to improve it. One of the most obvious things that I question about the save system is why it doesn't implement a differential / incremental backup like most commercial backup software does. Backup software successfully does change tracking on large databases and can snapshot a database in minutes / hours / days, etc. If you need to reload the snapshot, you don't need to reload the entire DB; you simply reload the changes. A user could be in the middle of a game, their internet disconnects, and then they need to redownload the entire map and save state from scratch. Why? Is it not possible to have the server generate a "full" backup when it does its saves, and then simply generate a differential from the last backup? If you're on a map that's 200MB in size, and you disconnect for 5 minutes, you really shouldn't need to redownload all 200MB. If the only thing that needs to be updated is the changes since the last backup file you have, then it seems like users would greatly benefit as maps grow in size.
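To make the differential idea concrete, here is a rough Python sketch of a full-plus-diff scheme. It assumes the map can be split into independently serialized chunks keyed by an ID, which is purely an assumption for illustration; nothing in this thread describes how Factorio's save format is actually laid out, and all names below are hypothetical:

```python
import hashlib


def chunk_digests(chunks):
    """Map each chunk ID to a SHA-256 digest of its serialized bytes."""
    return {cid: hashlib.sha256(data).hexdigest() for cid, data in chunks.items()}


def make_diff(base_digests, current_chunks):
    """Collect chunks that are new or changed since the base, plus removed IDs."""
    changed = {
        cid: data
        for cid, data in current_chunks.items()
        if base_digests.get(cid) != hashlib.sha256(data).hexdigest()
    }
    removed = [cid for cid in base_digests if cid not in current_chunks]
    return {"changed": changed, "removed": removed}


def apply_diff(base_chunks, diff):
    """Rebuild the current state from a full base copy plus a diff."""
    result = dict(base_chunks)
    result.update(diff["changed"])
    for cid in diff["removed"]:
        result.pop(cid, None)
    return result


if __name__ == "__main__":
    base = {(0, 0): b"iron", (0, 1): b"copper", (1, 0): b"stone"}
    current = {(0, 0): b"iron", (0, 1): b"copper+belts", (1, 1): b"new outpost"}
    diff = make_diff(chunk_digests(base), current)
    print(sorted(diff["changed"]), diff["removed"])
    print(apply_diff(base, diff) == current)
```

A reconnecting client that still holds the base would only need to download the diff. Whether that saves much depends on how many chunks change between saves, which this sketch does not address.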
I can certainly bet it's much more complex than I make it sound, but another look into why saving takes as long as it does, and any improvements, would be greatly appreciated!
Re: @raiguard RE: background saving
If the fork() finishes and the forked process checks if it is the child and not the parent, the forking already took place. The resources are already duplicated. There is no benefit in then releasing anything, just to release everything after the memory dump of the map finished anyway.
The fork() system call doesn't really copy everything a process owns. On modern unix systems, it duplicates the small process table entry and some management data within the kernel, but process memory isn't physically duplicated. With virtual memory management, only single memory pages are copied on the fly into process private memory space if one of the processes writes something into their memory - it's copy on write. If the parent Factorio process that continues to run changes 5% of its process memory while the child is saving at the same time, only this 5% is actually copied by the kernel and allocated separately.
Windows also has mechanisms of shared memory combined with copy-on-write, however it's very Windows specific and definitely not as easy to use as a simple fork() with a little bit of process synchronization. So I guess it's not feasible to get this background saving to Windows as well. However, it would be a really great QoL feature if implemented nonetheless. Every time the autosave kicks in, the world stops spinning and my heart misses the beat until the autosave is complete. If the autosave should not complete for some reason, you will find me dead in my chair.
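The isolation that fork() provides can be seen in a few lines. This is a minimal Python sketch for Unix-like systems only, not how Factorio implements its save; the `state` dictionary and function name are stand-ins. The child keeps seeing the value from the moment of the fork even though the parent changes it afterwards:

```python
import os


def fork_snapshot_demo():
    """Fork, mutate state in the parent, and report what each process sees.

    Returns (value_seen_by_child, value_seen_by_parent).
    """
    state = {"tick": 100}              # stand-in for game state
    result_r, result_w = os.pipe()     # child -> parent: the "saved" value
    ready_r, ready_w = os.pipe()       # parent -> child: "I have mutated"

    pid = os.fork()
    if pid == 0:
        # Child: plays the role of the saving process.
        try:
            os.close(result_r)
            os.close(ready_w)
            os.read(ready_r, 1)        # block until the parent has written
            os.write(result_w, str(state["tick"]).encode())
        finally:
            os._exit(0)                # never fall back into the caller's code

    # Parent: plays the role of the game that keeps simulating.
    os.close(result_w)
    os.close(ready_r)
    state["tick"] = 999                # only the parent's view changes
    os.write(ready_w, b"x")
    child_view = int(os.read(result_r, 32).decode())
    os.waitpid(pid, 0)
    os.close(result_r)
    os.close(ready_w)
    return child_view, state["tick"]


if __name__ == "__main__":
    print(fork_snapshot_demo())
```

Note that CPython's reference counting writes to object headers, so this shows the snapshot semantics rather than the memory savings; a C++ program that only reads its data in the child would share far more pages.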
Re: Friday Facts #408 - Statistics improvements, Linux adventures
I think my single biggest frustration with the current state of the statistics page has always been locating and then selecting the resources I currently want to single out while working on diagnosing and correcting specific resource bottlenecks.
The most intense cause of problems with this for me is how, for lack of a better word, "jittery" the list can be when on the default 5s, or even on higher ones when two resources happen to be at similar production rates for extended periods. A way to either lock the entire page in place and prevent resorting based on current production rate so that they can more consistently be clicked or a filter search that allows finding resources by name would be incredibly helpful for this.
Meanwhile with power consumption, I suspect that the accumulator charge graph will be incredibly helpful. The only other main issue I tend to have with reading the power statistics is that once I migrate over to solar panels and accumulators, the constant back-and-forth on the graph makes it difficult to read at 1h timescale, due to how close the swaps are, and a completely illegible scribble beyond that.
I notice some other people suggesting a stacked chart view, and I think that would be especially helpful for the power view, as trying to get a sense for how much of total power is being used over time by each power consumer is difficult once you have so many lines overlapping each other.
Meanwhile as a Linux user myself I am incredibly excited to try out the asynchronous saving feature!
-
- Filter Inserter
- Posts: 947
- Joined: Wed Nov 25, 2015 11:44 am
- Contact:
Re: Friday Facts #408 - Statistics improvements, Linux adventures
"So what do you think? Are there any other statistics improvements you can think about for 2.0?"
I love to tinker with builds until the actual production matches the calculated maximum. It would be great to have output per second and/or more digits in the stats, both presumably through some setting.
Also, thanks for the Linux work, have been using the Linux build since I got the game (.14?) and switched on background saving when doing my k2se run last year, no real issues.
Re: Friday Facts #408 - Statistics improvements, Linux adventures
Global production statistics should be called “universal” production statistics instead, because you’re on different globes
-
- Smart Inserter
- Posts: 2768
- Joined: Tue Apr 25, 2017 2:01 pm
- Contact:
Re: @raiguard RE: background saving
jgilmore42 wrote: ↑Fri Apr 26, 2024 1:43 pm I'll admit that I'm a bit puzzled by the claim that it takes twice as much memory though - most of that should be libraries, graphics, etc. that aren't going to be modified by EITHER process, and thus won't be duplicated no matter what either does.
As far as I know, Factorio runs almost entirely in the memory, unlike most other games, which is why even slight issues with your ram are noticeable in Factorio while everything else seemingly runs just fine and you need so much more for this game as the factory grows. I imagine this is what contributes to the double ram usage, though note they were likely referring to the memory that Factorio was using for your running game, not everything else.
My Mods: Classic Factorio Basic Oil Processing | Sulfur Production from Oils | Wood to Oil Processing | Infinite Resources - Normal Yield | Tree Saplings (Redux) | Alien Biomes Tweaked | Restrictions on Artificial Tiles | New Gear Girl & HR Graphics
Re: Friday Facts #408 - Statistics improvements, Linux adventures
Thank you, raiguard, for caring about Linux! You're my hero.
Re: @raiguard RE: background saving
FuryoftheStars wrote: ↑Fri Apr 26, 2024 2:33 pm As far as I know, Factorio runs almost entirely in the memory, unlike most other games, which is why even slight issues with your ram are noticeable in Factorio while everything else seemingly runs just fine and you need so much more for this game as the factory grows.
It seems you're mixing some things up. What Factorio is sensitive to is RAM and cache access time. Not the amount. Most other games aren't optimized anywhere near as well for memory speed to matter a great deal.
Re: Friday Facts #408 - Statistics improvements, Linux adventures
"Are there any other statistics improvements you can think about for 2.0?"
Personally my biggest request is to make the contents of the search bar sticky.
If I'm using the production statistics window, most of the time I'm checking production of an item against either consumption or some target amount.
That means I'm frequently checking the production value for some item, making some changes in the factory, then coming back and checking production of the same item, then going back to the factory to make further changes (if necessary) before checking the stats window again to see whether my changes were enough, or whether I need to expand production in some other manner.
Remembering the last used timescale would also be useful, but changing the time scale is only one mouse click, rather than the mouse click to open the search bar, then 3-4 (or more) keystrokes so whatever item I want to look at is easily visible.
-
- Smart Inserter
- Posts: 2768
- Joined: Tue Apr 25, 2017 2:01 pm
- Contact:
Re: @raiguard RE: background saving
FuryoftheStars wrote: ↑Fri Apr 26, 2024 2:33 pm As far as I know, Factorio runs almost entirely in the memory, unlike most other games, which is why even slight issues with your ram are noticeable in Factorio while everything else seemingly runs just fine and you need so much more for this game as the factory grows.
Serenity wrote: ↑Fri Apr 26, 2024 2:46 pm It seems you're mixing some things up. What Factorio is sensitive to is RAM and cache access time. Not the amount. Most other games aren't optimized anywhere near as well for memory speed to matter a great deal.
No, I did not say the amount of ram consumed by Factorio has bearing on its sensitivity to ram issues. I said the fact that it runs almost entirely in the ram does. There's a difference.
-
- Manual Inserter
- Posts: 2
- Joined: Wed Jun 03, 2015 8:21 pm
- Contact:
Re: Friday Facts #408 - Statistics improvements, Linux adventures
Are you going to bring non-blocking saving to single player? I really only see the multiplayer setting. Regardless, thanks for the Linux support, though there are definitely some crashes lurking in Linux Factorio, much more so than on Windows. I haven't really been able to reproduce them, however :(
Keep up the good work!
Re: Friday Facts #408 - Statistics improvements, Linux adventures
I've used Linux for literal decades, and have been Linux-only since Vista. So I greatly appreciate Wube's and raiguard's efforts to provide and support native Linux builds. And other games that run natively on Linux (Rimworld and Starsector are two other favorites I play often).
Edit: I can't think of a single time Factorio has crashed. But also, I don't have background saving enabled, and I run with fairly minimal game settings.
Last edited by Kadet123 on Fri Apr 26, 2024 3:27 pm, edited 1 time in total.
Re: @raiguard RE: background saving
Tertius wrote: ↑Fri Apr 26, 2024 2:13 pm If the fork() finishes and the forked process checks if it is the child and not the parent, the forking already took place. The resources are already duplicated. There is no benefit in then releasing anything, just to release everything after the memory dump of the map finished anyway. The fork() system call doesn't really copy everything a process owns. On modern unix systems, it duplicates the small process table entry and some management data within the kernel, but process memory isn't physically duplicated. With virtual memory management, only single memory pages are copied on the fly into process private memory space if one of the processes writes something into their memory - it's copy on write. If the parent Factorio process that continues to run changes 5% of its process memory while the child is saving at the same time, only this 5% is actually copied by the kernel and allocated separately.
The memory cost comes from the saving fork's memory staying constant while the main game process continues to simulate and may be changing all kinds of game data. While the save is running, many memory writes in the main game process incur the time and RAM overhead of a page copy.
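As a back-of-the-envelope check on the numbers in this exchange, the extra RAM during a forked save is roughly the number of pages the running game writes to, times the page size. The Python sketch below is a simplification under stated assumptions: writes are treated as spread evenly over pages, and the 8 GiB figure and function name are made up for illustration:

```python
import math
import mmap


def extra_cow_bytes(resident_bytes, dirtied_fraction, page_size=mmap.PAGESIZE):
    """Estimate extra memory from copy-on-write while a forked save is running.

    Pages are copied whole, so even a one-byte write costs a full page.
    """
    total_pages = math.ceil(resident_bytes / page_size)
    dirtied_pages = math.ceil(total_pages * dirtied_fraction)
    return dirtied_pages * page_size


if __name__ == "__main__":
    gib = 2 ** 30
    for fraction in (0.05, 0.5, 1.0):
        extra = extra_cow_bytes(8 * gib, fraction, page_size=4096)
        print(f"{fraction:.0%} of pages written -> about {extra / gib:.2f} GiB extra")
```

If the simulation touches nearly every page before the save finishes, the estimate approaches a full second copy, which would be consistent with the "twice as much memory" reports; if it touches only 5%, the overhead stays small.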
-
- Burner Inserter
- Posts: 5
- Joined: Wed Mar 20, 2024 12:37 pm
- Contact:
Re: Friday Facts #408 - Statistics improvements, Linux adventures
@raiguard
Thanks a ton for caring for Linux-Factorio. As a native I highly appreciate the work, makes my krastorio run smoother
Are there plans to include some more performance optimizations on Linux? E.g. the mimalloc allocator (viewtopic.php?f=69&t=102492) made quite some difference for me.