Friday Facts #204 - Another day, another optimisation

Gymble · Post by **Gymble** » Fri Aug 18, 2017 7:20 pm

Very many entities will be aware ahead of time when they will next need to do work. An assembler doesn't need to do a thing for X ticks until its craft is done. An inserter is doing nothing for Y ticks until its swing is complete. Etc. Even an inserter hovering over a belt can predict to some degree when a relevant item (or empty spot) will next come up.

Except that lack of power slow down entities.

pleegwat · Post by **pleegwat** » Fri Aug 18, 2017 7:28 pm

Gymble wrote:
Very many entities will be aware ahead of time when they will next need to do work. An assembler doesn't need to do a thing for X ticks until its craft is done. An inserter is doing nothing for Y ticks until its swing is complete. Etc. Even an inserter hovering over a belt can predict to some degree when a relevant item (or empty spot) will next come up.
Except that lack of power slow down entities.

Right. Which means instead of one time variable you have one per power grid. Or several - I'm not sure if all entities slow down by the same factor? And you'd need to store which time variable to use with each entity reference.

Post by **Rseding91** » Fri Aug 18, 2017 7:31 pm

Rakshasa wrote:You guys are using... linked lists?...

Why, oh why do you use linked lists? It is supposed to be common knowledge that in almost all situations even a straight vector is better.

At the very least something like C++'s deque would be better.

Tell me how you'd get: O(1) time to turn an entity on/off and use a vector to maintain which entities are active and allow an entity to go active/inactive during the update loop as you iterate active entities.

You can't have them all

By using a linked list with the entity itself being the link you have O(1) time to turn any entity on/off and can iterate all active entities while still allowing other entities to go active/inactive during the update loop - you can't do that with a vector as mutating the vector while iterating isn't allowed.

As with virtually everything programming: you can't use absolutes - as you said "in almost all situations" -> this isn't one of them

TurkleTrenox · Post by **TurkleTrenox** » Fri Aug 18, 2017 7:39 pm

I wish i could get a good Pota.. I mean,PC

Zulan · Post by **Zulan** » Fri Aug 18, 2017 7:40 pm

Impatient wrote:you lost me this time. how did you determine what byte range to prefetch? and also, i assume, you have to cast those raw bytes into a valid object. how do you know which bytes go into what variable? is the variables sequence the same on all hardware architectures? how did you address that uncertainty? how do you make sure, the cast is correct? checksum? hardware specific casts? hm,yes,yes, i get the picture. pretty nice coding stunt. seems risky at first glance.

Software prefetching is completely agnostic to the type. It's just an instruction with a memory address that is a hint to hardware. Getting the "correct" range or individual variable types would be impossible - because the object type itself is dynamic.

In fact I figured out the optimal byte range to prefetch through experimentation - and confirmed it it sensible.

The original charts

pleegwat · Post by **pleegwat** » Fri Aug 18, 2017 7:50 pm

Rseding91 wrote:
Rakshasa wrote:You guys are using... linked lists?...

Why, oh why do you use linked lists? It is supposed to be common knowledge that in almost all situations even a straight vector is better.

At the very least something like C++'s deque would be better.
Tell me how you'd get: O(1) time to turn an entity on/off and use a vector to maintain which entities are active and allow an entity to go active/inactive during the update loop as you iterate active entities.

You can't have them all By using a linked list with the entity itself being the link you have O(1) time to turn any entity on/off and can iterate all active entities while still allowing other entities to go active/inactive during the update loop - you can't do that with a vector as mutating the vector while iterating isn't allowed.

As with virtually everything programming: you can't use absolutes - as you said "in almost all situations" -> this isn't one of them

I don't know your code of course, and I'm hampered because my experience is C not C++. Depending on the list and the type of removals, I'd wonder if you really need to delete them or can just mark them inactive. If a new entity gets placed on the map (or one gets removed), then yes the list needs to be updated. If it's just an assembler or inserter going into an indefinite wait (or coming out of it), just marking it as inactive is probably enough.

I don't have numbers to back this up, but if you have an array of 100 entities and 80 of them are inactive, iterating past the inactive entities without dereferencing the entity is probably still just as fast as iterating over a 20-element array.

Zulan · Post by **Zulan** » Fri Aug 18, 2017 7:55 pm

AlastarFrost wrote:I would approach this from a different perspective (I don't know if it is viable, but it doesn't hurt to explore it):

Looking at the factory as a graph of interacting entities, you will have tightly coupled areas (like an actual production plant with belts, splitters, inserters, production ...) that are loosely coupled (by trains, very long belts, large distance logistic networks ...).

If you now identify the tightly coupled areas, especially the ones outside the players attention (the window he is looking at), you can run each area through multiple ticks (like fast forwarding each area 500ms at a time) and get a nice hot cache for that. Then you switch to the next area. As they are coupled by things like trains, that are "slow" to fill, that will be hardly noticable.

I've had similar thoughts before. If you have a fixed set of "simulation rules" / interactions, it is certainly possible to make a good scalable parallel implementation. But we are talking about code that grows - and a multitude of Entities with very complex interactions. A consequent parallelization would mean to impose a strong limit on interactions. It's not impossible - but it certainly conflicts with dynamically developing a game(play).

Post by **kovarex** » Fri Aug 18, 2017 7:58 pm

Rakshasa wrote:You guys are using... linked lists?...

Why, oh why do you use linked lists? It is supposed to be common knowledge that in almost all situations even a straight vector is better.

At the very least something like C++'s deque would be better.

Linked lists are generally bad if the things you store there is something like int, and if you use it in a way like std::list<int>.
But our active entity linked list is intrusive. This means, that the next and previous pointers are stored directly in the entity itself without the need for indirection.
The point is, that the entity needs to be "opened" anyway to be updated, and the entity is dynamically allocated, so having it linked like this doesn't add any additional memory gets compared to something like array of pointers.
Even though, we might actually change this in the future, probably just to make the structures smaller as pointers are just too big in 64bit system

pleegwat wrote:
AssaultRaven wrote:
pleegwat wrote:A linked list of multiply-inheriting classes? I'd have expected an array of structs.
I'm surprised to. I'd have expected C++ to handle virtual function calls and inheritance at compile-time, but apparently things are setup so that some of the required class structure isn't known until run-time?
Thinking further on it it's worse: This suggests they're evaluating every object in every tick.

Even if I've got a fully beaconed up yellow assembler making copper wire, that's going to do a craft like what, every 6 ticks?
So that's 1 tick out of 6 that it needs to do work. The other 5, at most you're updating the craft progress (don't do that. Save the time it'll be done instead).

Very many entities will be aware ahead of time when they will next need to do work. An assembler doesn't need to do a thing for X ticks until its craft is done. An inserter is doing nothing for Y ticks until its swing is complete. Etc. Even an inserter hovering over a belt can predict to some degree when a relevant item (or empty spot) will next come up.

We are actually doing it for some of the entities already, the smoke for example (mentioned in the https://www.factorio.com/blog/post/fff-84). We are slowly adding more and more things into this system.

Post by **Rseding91** » Fri Aug 18, 2017 7:59 pm

pleegwat wrote:I don't know your code of course, and I'm hampered because my experience is C not C++. Depending on the list and the type of removals, I'd wonder if you really need to delete them or can just mark them inactive. If a new entity gets placed on the map (or one gets removed), then yes the list needs to be updated. If it's just an assembler or inserter going into an indefinite wait (or coming out of it), just marking it as inactive is probably enough.

I don't have numbers to back this up, but if you have an array of 100 entities and 80 of them are inactive, iterating past the inactive entities without dereferencing the entity is probably still just as fast as iterating over a 20-element array.

That approach would require dynamic memory allocation for each entity in the array to store the "is active" value. Otherwise you have to store it on the entity which means you still need to deference the entity as you iterate.

Either way, it's still slower than what we have now and doesn't handle the case when an entity is destroyed being O(N) + not able to happen during an update loop as it would invalidate the array while iterating it.

Post by **kovarex** » Fri Aug 18, 2017 8:02 pm

pleegwat wrote:
Rseding91 wrote:
Rakshasa wrote:You guys are using... linked lists?...

Why, oh why do you use linked lists? It is supposed to be common knowledge that in almost all situations even a straight vector is better.

At the very least something like C++'s deque would be better.
Tell me how you'd get: O(1) time to turn an entity on/off and use a vector to maintain which entities are active and allow an entity to go active/inactive during the update loop as you iterate active entities.

You can't have them all By using a linked list with the entity itself being the link you have O(1) time to turn any entity on/off and can iterate all active entities while still allowing other entities to go active/inactive during the update loop - you can't do that with a vector as mutating the vector while iterating isn't allowed.

As with virtually everything programming: you can't use absolutes - as you said "in almost all situations" -> this isn't one of them
I don't know your code of course, and I'm hampered because my experience is C not C++. Depending on the list and the type of removals, I'd wonder if you really need to delete them or can just mark them inactive. If a new entity gets placed on the map (or one gets removed), then yes the list needs to be updated. If it's just an assembler or inserter going into an indefinite wait (or coming out of it), just marking it as inactive is probably enough.

I don't have numbers to back this up, but if you have an array of 100 entities and 80 of them are inactive, iterating past the inactive entities without dereferencing the entity is probably still just as fast as iterating over a 20-element array.

Well, if you had arry of some structure with pointer + active, then maybe yes, but our overall goal is to turn the load of inactive entities to absolute minimum. Inactive turrets (turrets with no enemies around) take 0 time in the update, you can have as many as you like, it is still 0. When enemies come, only turrets nearby get activated. With the proposed technique, these turrets would still draw some cpu at least a little.

Post by **kovarex** » Fri Aug 18, 2017 8:05 pm

Rseding91 wrote:
pleegwat wrote:I don't know your code of course, and I'm hampered because my experience is C not C++. Depending on the list and the type of removals, I'd wonder if you really need to delete them or can just mark them inactive. If a new entity gets placed on the map (or one gets removed), then yes the list needs to be updated. If it's just an assembler or inserter going into an indefinite wait (or coming out of it), just marking it as inactive is probably enough.

I don't have numbers to back this up, but if you have an array of 100 entities and 80 of them are inactive, iterating past the inactive entities without dereferencing the entity is probably still just as fast as iterating over a 20-element array.
That approach would require dynamic memory allocation for each entity in the array to store the "is active" value. Otherwise you have to store it on the entity which means you still need to deference the entity as you iterate.

Either way, it's still slower than what we have now and doesn't handle the case when an entity is destroyed being O(N) + not able to happen during an update loop as it would invalidate the array while iterating it.

Actually, one possibility could be to have a vector of ActiveEntity poitners and remove from it by switching with the last one. The entity would know the index of its position in the vector. This way you still have O(1) removal, instead of 2 pointers, you have 1 index, and you only have to touch one entity when removing, not the 2 neighbours.

Gymble · Post by **Gymble** » Fri Aug 18, 2017 8:09 pm

Having an array of struct with each struct containing a boolean for the active state will be an issue if an entity must be activated and processed within the same update cycle.

With a linked list you just move the activated entity from the inactive list to the end of the active one. It takes no time and does not invalidate iterators.

pleegwat · Post by **pleegwat** » Fri Aug 18, 2017 8:21 pm

kovarex wrote:
Rseding91 wrote:
pleegwat wrote:I don't know your code of course, and I'm hampered because my experience is C not C++. Depending on the list and the type of removals, I'd wonder if you really need to delete them or can just mark them inactive. If a new entity gets placed on the map (or one gets removed), then yes the list needs to be updated. If it's just an assembler or inserter going into an indefinite wait (or coming out of it), just marking it as inactive is probably enough.

I don't have numbers to back this up, but if you have an array of 100 entities and 80 of them are inactive, iterating past the inactive entities without dereferencing the entity is probably still just as fast as iterating over a 20-element array.
That approach would require dynamic memory allocation for each entity in the array to store the "is active" value. Otherwise you have to store it on the entity which means you still need to deference the entity as you iterate.

Either way, it's still slower than what we have now and doesn't handle the case when an entity is destroyed being O(N) + not able to happen during an update loop as it would invalidate the array while iterating it.
Actually, one possibility could be to have a vector of ActiveEntity poitners and remove from it by switching with the last one. The entity would know the index of its position in the vector. This way you still have O(1) removal, instead of 2 pointers, you have 1 index, and you only have to touch one entity when removing, not the 2 neighbours.

Doesn't work if you're actually destroying entities outside their own update call though, because you may end up swapping an entity which acted this cycle with one which has not.

Thinking further, this also depends on how expensive these updates are. It's still an array of pointers, and there's a limit to how many of those you can prefetch before running out of cache. Which is why my initial thought was more for the 'nothing to do here, move along' ticks in ostensibly active entities. Even then though, the more fields you need to duplicate the more complex your code gets, and the smaller the gain.

malventano · Post by **malventano** » Fri Aug 18, 2017 8:22 pm

Koub wrote:
To get a bit more diversity, the measurements for this chart were done on a different CPU (i7-6700K vs i7-4790K previously)
If I may, I think the tests for optimization should be run not only on Intel, but also on AMD CPUs : we've seen recently how the architecture of the last AMD architecture has unforeseen effect on performance.

^ This. We have seen definite issues with the memory latency of recent AMD CPUs negatively impacting games - specifically those sensitive to it, and Factorio appears to be *very* sensitive to it. We've done testing on Ryzen and ThreadRipper showing relatively high latencies to the DRAM when compared to Intel CPUs.

If the Factorio devs need some testing done on these platforms, hit me up and I can try to make it happen.

Allyn Malventano
Editor, PC Perspective

iamwyza · Post by **iamwyza** » Fri Aug 18, 2017 8:24 pm

pleegwat wrote:
AssaultRaven wrote:
pleegwat wrote:A linked list of multiply-inheriting classes? I'd have expected an array of structs.
I'm surprised to. I'd have expected C++ to handle virtual function calls and inheritance at compile-time, but apparently things are setup so that some of the required class structure isn't known until run-time?
Thinking further on it it's worse: This suggests they're evaluating every object in every tick.

Even if I've got a fully beaconed up yellow assembler making copper wire, that's going to do a craft like what, every 6 ticks?
So that's 1 tick out of 6 that it needs to do work. The other 5, at most you're updating the craft progress (don't do that. Save the time it'll be done instead).

No. With mods like bob's, I can easily beacon up a single assembler making copper wire at the game's limit (1/tick). Sure not in vanilla, but easily in non-vanilla.

Rakshasa · Post by **Rakshasa** » Fri Aug 18, 2017 8:34 pm

kovarex wrote:Linked lists are generally bad if the things you store there is something like int, and if you use it in a way like std::list<int>.
But our active entity linked list is intrusive. This means, that the next and previous pointers are stored directly in the entity itself without the need for indirection.
The point is, that the entity needs to be "opened" anyway to be updated, and the entity is dynamically allocated, so having it linked like this doesn't add any additional memory gets compared to something like array of pointers.
Even though, we might actually change this in the future, probably just to make the structures smaller as pointers are just too big in 64bit system

My point was that using vectors/deque is more efficient even for larger structures than any 'int', e.g. for activate/deactivate something like std::partition does three simultaneous sequential memory access instead of random lookups of cachelines for each iteration.

The point is that if you think storing pointers and ints are the point at which vectors lose out to linked lists then you're wrong. Depending on how you access and update entries even if the size is a couple of cachelines they might still be better off in a vector or a deque.

One option is to move relevant data down to the struct stored in the container, and make the object-dependent data accessible through a pointer. If you have an active/inactive for each loop where most cases can be solved by looking at the base struct then it will likely be more efficient use a vector and std::partition than a linked list.

NeilN · Post by **NeilN** » Fri Aug 18, 2017 8:45 pm

Have the developers ever defined what they mean by "large bases" and "mega bases"? It would be useful to know (in terms of number of entities/factories/bots/belts) when reading about how these optimizations are tested and measured.

safan · Post by **safan** » Fri Aug 18, 2017 9:00 pm

i completely didn't understand anything in this fff

pleegwat · Post by **pleegwat** » Fri Aug 18, 2017 9:02 pm

NeilN wrote:Have the developers ever defined what they mean by "large bases" and "mega bases"? It would be useful to know (in terms of number of entities/factories/bots/belts) when reading about how these optimizations are tested and measured.

Excellent point. Some people might think a rocket an hour is a megabase. Others might require a rocket a minute.

AlastarFrost · Post by **AlastarFrost** » Fri Aug 18, 2017 9:24 pm

Zulan wrote:
AlastarFrost wrote:I would approach this from a different perspective (I don't know if it is viable, but it doesn't hurt to explore it):

Looking at the factory as a graph of interacting entities, you will have tightly coupled areas (like an actual production plant with belts, splitters, inserters, production ...) that are loosely coupled (by trains, very long belts, large distance logistic networks ...).

If you now identify the tightly coupled areas, especially the ones outside the players attention (the window he is looking at), you can run each area through multiple ticks (like fast forwarding each area 500ms at a time) and get a nice hot cache for that. Then you switch to the next area. As they are coupled by things like trains, that are "slow" to fill, that will be hardly noticable.
I've had similar thoughts before. If you have a fixed set of "simulation rules" / interactions, it is certainly possible to make a good scalable parallel implementation. But we are talking about code that grows - and a multitude of Entities with very complex interactions. A consequent parallelization would mean to impose a strong limit on interactions. It's not impossible - but it certainly conflicts with dynamically developing a game(play).

It looks like everything is connected tightly with everything else so that you can not break it apart, but thats actually not true.
If you want to come to a reasonable simulation, not a perfect one, everything that has a large enough buffer creates a propagation delay of every action that is big enough to extrapolate the outcome without a big error margin. Thats why I mentioned trains: If a mining facility has to fill a train before the outcome is propagated (the train starts moving), you could simulate it until the train starts right away if noone is looking, and not touch it until the train leaves. Then you could simulate in large chunks until the buffer chests run full, until the next train arrives.

A big problem is the current implementation of the power network. It seems to connect everything on a per-tick-basis, because it does not inherently buffer. But you could not manage a power network in the real world, if that was the case. So what happens? First, the network itself has a capacity. If you rapidly increase the load, you take from that capacity. Then a fast reacting power plant kicks in (usually gas turbines) to compensate for the consumption spike. If the consumption goes on, coal or other slow reacting powerplants raise their output (starting a coal engine block can take over half an hour). So if the powernetwork in factorio had an inherent capacity that you could "borrow" power from, you could run you partial factory for a few ticks on that.

Of course that would introduce some error into how machines slow down if the power gets low, but to me that looks just like a propagation delay. If you actually lay out the system to make use of propagation delays, thats not a bad thing. You are basically reducing the granularity of the simulation at the loosely coupled edges, thats not such a bad tradeoff. You just have to make sure that the power network does not tightly couple everything, but if you accept some propagation delay that is certainly possible.

Factorio Forums

Friday Facts #204 - Another day, another optimisation

Re: Friday Facts #204 - Another day, another optimisation

Re: Friday Facts #204 - Another day, another optimisation

Re: Friday Facts #204 - Another day, another optimisation

Re: Friday Facts #204 - Another day, another optimisation

Re: Friday Facts #204 - Another day, another optimisation

Re: Friday Facts #204 - Another day, another optimisation

Re: Friday Facts #204 - Another day, another optimisation

Re: Friday Facts #204 - Another day, another optimisation

Re: Friday Facts #204 - Another day, another optimisation

Re: Friday Facts #204 - Another day, another optimisation

Re: Friday Facts #204 - Another day, another optimisation

Re: Friday Facts #204 - Another day, another optimisation

Re: Friday Facts #204 - Another day, another optimisation

Re: Friday Facts #204 - Another day, another optimisation

Re: Friday Facts #204 - Another day, another optimisation

Re: Friday Facts #204 - Another day, another optimisation

Re: Friday Facts #204 - Another day, another optimisation

Re: Friday Facts #204 - Another day, another optimisation

Re: Friday Facts #204 - Another day, another optimisation

Re: Friday Facts #204 - Another day, another optimisation