So far, the book is presenting a generic, elementary framework for modeling systems ("determinate machines", in his words). There are two parts: first, there's the set of states the system can take on; call the state at time t St. Second, there's a function ("transformation", mapping the state space onto itself), call it F, which describes the evolution of the state after a unit of time. This function takes you from the state at one moment to the state at the next moment.
FSt = St+1
(I'm not using parentheses for F because I think it's cleaner this way, and because I'm not worried about associativity right now.)
Specifically, Ashby uses discrete time units, insisting that we can always generalize to continuous time if we need to. It seems the only necessary constraint on F is that its domain must cover the entire state space (otherwise, there'd be states we don't know how to evolve). If the function is one-to-one, then it is "information conserving", or "reversible" (it has an inverse), as is generally the case in physics.
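To make this concrete, here's a minimal Python sketch of a determinate machine. The state set and transformation are made up by me; only the structure (a closed, single-valued F defined on every state) follows the text:

```python
# A toy "determinate machine": a finite state set {a, b, c} and a
# transformation F mapping each state to its successor.
# F must be defined on every state (no state we don't know how to evolve).
F = {"a": "b", "b": "c", "c": "a"}

def step(state):
    """Advance the system one discrete time unit: St -> St+1."""
    return F[state]

state = "a"
for _ in range(3):
    state = step(state)
print(state)  # the 3-cycle a -> b -> c returns to "a" after three steps
```

Since this particular F happens to be one-to-one, it's reversible: you could build its inverse by flipping the dictionary.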
This is all well and good for entirely closed systems which are completely internally deterministic. But in the real world, there is essentially always causality flowing into or out of a system. Also, any system can be broken down into a series of coupled subsystems, so we want our framework to have a level of recursion in its applicability. How do we handle this?
Ashby's approach is simply to have multiple time evolution functions, with our choice of function changing based on the environment. I understand this best as a strategy for breaking systems into coupled subsystems, so I will explain it that way. Say we have a completely closed system S, and we want to break it into two coupled subsystems, A and B. The state space of S can typically be described by some variables, say (a, b), and the evolution function F maps some particular values onto some other values. I will split S into A and B such that A gets (a) and B gets (b).
St = (at, bt)
FSt = St+1 = (at+1, bt+1)
S → {A, B}
At = (at)
Bt = (bt)
Our function F evolves states of S. It doesn't have enough information to evolve states of A and B, because the way each evolves depends on the state of the other. To evolve A and B, we need new functions. Ashby gives each a set of functions, parameterized by the other subsystem's state. Essentially, the part of the state that was lost in the split is just shoved onto the evolution function as a parameter. I suppose I should give A's and B's functions different names, so I will call them G(b) and H(a) respectively.
F(a1, b1) = (a2, b2)
G(b1) A1 = G(b1)(a1) = A2 = (a2)
H(a1) B1 = H(a1)(b1) = B2 = (b2)
...yeah. I hope this is easy to follow.
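In case it helps, here's a Python sketch of the split. The dynamics of F are my own invention; the point is just that G, parameterized by b, and H, parameterized by a, together reproduce what F does to the full state:

```python
# Splitting a closed system S with state (a, b) into coupled subsystems A and B.

def F(s):
    """Evolution of the whole system S; the dynamics are a toy example."""
    a, b = s
    return (a + b, b + 1)

def G(b):
    """A's evolution function, parameterized by B's current state."""
    return lambda a: a + b

def H(a):
    """B's evolution function, parameterized by A's current state."""
    return lambda b: b + 1

a1, b1 = 2, 5
a2, b2 = F((a1, b1))
# The coupled subsystems evolve to the same place as the whole system:
assert (G(b1)(a1), H(a1)(b1)) == (a2, b2)
```

Note that each subsystem's function only needs the other subsystem's state as a parameter, which is exactly the information lost in the split.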
One thing that stands out to me is the way this framework prioritizes time. What if I wanted to model, say, some water, specifically whether it is a solid, liquid, or gas, as a function of temperature? My states are {solid, liquid, gas}, and I would want a function that maps temperatures onto states. This is a model of a system which does not care at all about time, only temperature. Instead of mapping states onto states over time, I'm just mapping temperatures onto states.
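That temperature model is about the simplest function imaginable. As a Python sketch (thresholds assume standard pressure, and I'm ignoring edge cases like supercooling):

```python
# A time-free model: a map from temperature (degrees C) straight to a state
# in {solid, liquid, gas}. No notion of "next state" is involved.

def phase(temp_c):
    if temp_c <= 0:
        return "solid"
    elif temp_c < 100:
        return "liquid"
    return "gas"

print(phase(-5), phase(20), phase(150))  # solid liquid gas
```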
I can adapt Ashby's framework to look more similar to this temperature model by coming up with a function which, rather than mapping one state onto the next, maps a time onto a state. Again, call St the state at time t, with an evolution function F, and get:
F 2St = FFSt = FSt+1 = St+2
F δSt = St+δ
F tS0 = St
Given some initial state S0, we can get the state as a function of time simply by evolving it t times. If F is reversible, it can also go backwards (eg t = -1; abuse notation to your heart's content). There may be complications in generalizing to continuous time, as you'll need to know what F 0.5 means, for example.
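The global map F tS0 = St is just iteration. A Python sketch, with a toy F of my own (the cycle-of-5 dynamics are made up):

```python
# The "global" map: given an initial state S0, apply F t times to land
# directly on the state at time t.

def F(s):
    """Toy evolution function: a simple 5-cycle over {0, 1, 2, 3, 4}."""
    return (s + 1) % 5

def state_at(t, s0):
    """Return St = F^t(S0) for non-negative integer t."""
    s = s0
    for _ in range(t):
        s = F(s)
    return s

# If F is invertible, negative t could be handled by iterating F's inverse.
print(state_at(7, 0))  # 7 steps around the 5-cycle from 0 lands on 2
```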
Anyway, we now have a map which takes you directly from times to states, just like my map from temperatures to liquidity states of water. The main difference is that we also need to know an initial state, S0. It is a "global" look at the system over time, rather than Ashby's "local" look at how each individual state moves to the next state.
I think the global look is a bit more general/elementary, in a certain sense, because the local look postulates that the current state holds enough information on its own to predict the next state. This is the assumption of determinism, it seems, which is why Ashby focuses on "determinate machines". The global framework does not assume this, so it can be used to analyze the way systems respond to things other than time (eg temperature).