Have you made this NN yet? If not, I would like to have a go. It seems you've come up with two ideas here: self-training and stacking. I'll start with self-training, which is the net at the top left.
I have a few questions about the net.
By output, did you mean the same thing as action? Shouldn't the net already know what action it is taking?
The light red layer is the logical output. Or rather, the output the network thinks its action will produce.
Where does this variable come from? Is it based on external stimuli?
Last, but not least, the yellow dot is what the AI predicts the training variable to be.
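
To pin down what I'm asking, here is a rough sketch of how I'm currently reading the self-training net. Everything in it is my own guess, not your design: the layer sizes, the weight names, the tanh activations, and the idea that the predicted outcome is computed from the hidden layer plus the chosen action are all assumptions. The only parts taken from your description are the two outputs: one for what the net thinks its action will produce (the light red layer) and one for its prediction of the training variable (the yellow dot).

```python
import numpy as np

rng = np.random.default_rng(0)

def init_net(n_in, n_hidden, n_action):
    """Random weights for a shared hidden layer and three heads (all hypothetical)."""
    return {
        "W_h": rng.normal(0, 0.1, (n_in, n_hidden)),
        "W_action": rng.normal(0, 0.1, (n_hidden, n_action)),
        # Assumption: the outcome head sees both the hidden state and the action.
        "W_outcome": rng.normal(0, 0.1, (n_hidden + n_action, n_in)),
        "W_train": rng.normal(0, 0.1, (n_hidden, 1)),
    }

def forward(net, stimulus):
    """One forward pass: returns the action and the net's two predictions."""
    hidden = np.tanh(stimulus @ net["W_h"])
    action = np.tanh(hidden @ net["W_action"])
    # "Light red layer": what the net thinks its action will produce.
    predicted_outcome = np.concatenate([hidden, action]) @ net["W_outcome"]
    # "Yellow dot": the net's guess at the training variable.
    predicted_training_var = (hidden @ net["W_train"])[0]
    return action, predicted_outcome, predicted_training_var

net = init_net(n_in=4, n_hidden=8, n_action=2)
action, outcome, train_guess = forward(net, np.ones(4))
print(action.shape, outcome.shape, float(train_guess))
```

If that is roughly right, then my questions above come down to whether `action` and your "output" are the same thing, and where the real training variable that `train_guess` gets compared against comes from.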