The model of each neuron in the network includes a nonlinear activation function that is differentiable.
Let $\mathcal{T} = \{\mathbf{x}(n), \mathbf{t}(n)\}_{n=1}^{N}$ denote the training sample.
Let $y_j(n)$ denote the function signal produced by output neuron $j$.
The error signal produced at neuron j is defined by
$$e_j(n) = d_j(n) - y_j(n) = t_j(n) - y_j(n)$$
The instantaneous error energy of neuron $j$ is defined by $E_j(n) = \tfrac{1}{2} e_j^2(n)$.
The total instantaneous error energy of the whole network is $E(n) = \sum_{j \in C} E_j(n)$, where the set $C$ contains all neurons in the output layer.
With $N$ training samples, the error energy averaged over the training sample, or empirical risk, is
$$E_{\text{av}}(N) = \frac{1}{N}\sum_{n=1}^{N} E(n) = \frac{1}{2N}\sum_{n=1}^{N}\sum_{j \in C} e_j^2(n)$$
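As a concrete illustration, the following minimal NumPy sketch computes the error signals, the per-sample error energies, and the empirical risk defined above; the array names `t` and `y` and the example values are hypothetical.

```python
import numpy as np

# Hypothetical data: N = 4 samples, 3 output neurons.
t = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])          # targets t_j(n)
y = t + 0.1 * np.random.randn(*t.shape)  # network outputs y_j(n)

e = t - y                                # error signals e_j(n)
E_n = 0.5 * np.sum(e**2, axis=1)         # instantaneous error energy E(n)
E_av = E_n.mean()                        # empirical risk E_av(N)
print(E_n, E_av)
```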
For a single layer with $D$ inputs and activation function $\sigma$, the output of neuron $j$ can be written as
$$y_j(\mathbf{x}, \mathbf{w}) = \sigma\!\left(\sum_{i=1}^{D} w_{ji} x_i + w_{j0}\right)$$
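A minimal sketch of this single-neuron computation, assuming a logistic sigmoid for $\sigma$ and hypothetical weight and input values:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.0, 2.0])   # D = 3 inputs x_i
w = np.array([0.1, 0.4, -0.2])   # weights w_ji for one neuron j
w0 = 0.05                        # bias w_j0

y = sigmoid(w @ x + w0)          # y_j(x, w)
print(y)
```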
$w_{ij}^k$ : weight for node $j$ in layer $k$ for incoming node $i$
$b_i^k$ : bias for node $i$ in layer $k$
$a_i^k$ : product sum plus bias (activation) for node $i$ in layer $k$
$o_i^k$ : output for node $i$ in layer $k$
$r_k$ : number of nodes in layer $k$
For the output layer $m$:
$$\delta_j^m = (\hat{y} - y)\, f'(a_j^m)$$
For a hidden layer $k$:
$$\delta_j^k = f'(a_j^k) \sum_{l=1}^{r_{k+1}} w_{jl}^{k+1}\, \delta_l^{k+1}$$
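A minimal NumPy sketch of these equations, assuming a two-layer network with sigmoid activations; the shapes, weights, and variable names (`a1`, `o1`, `delta1`, etc.) are hypothetical but follow the notation above.

```python
import numpy as np

def f(z):            # activation function
    return 1.0 / (1.0 + np.exp(-z))

def f_prime(z):      # derivative of the activation function
    s = f(z)
    return s * (1.0 - s)

rng = np.random.default_rng(0)

# Hypothetical network: 3 inputs, 4 hidden nodes (layer 1), 1 output node (layer m = 2).
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # w_ij^1, b_j^1
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)   # w_ij^2, b_j^2

x = np.array([0.5, -1.0, 2.0])   # input
y = np.array([1.0])              # target

# Forward pass: a_j^k = sum_i w_ij^k o_i^{k-1} + b_j^k,  o_j^k = f(a_j^k)
a1 = W1 @ x + b1
o1 = f(a1)
a2 = W2 @ o1 + b2
y_hat = f(a2)

# Output layer: delta_j^m = (y_hat - y) * f'(a_j^m)
delta2 = (y_hat - y) * f_prime(a2)

# Hidden layer: delta_j^k = f'(a_j^k) * sum_l w_jl^{k+1} delta_l^{k+1}
delta1 = f_prime(a1) * (W2.T @ delta2)

print(delta2, delta1)
```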
Let $G = (V, E)$ be a graph, $N_u$ be the neighbourhood of node $u \in V$, $x_u$ be the features of node $u$, and $e_{uv}$ be the features of edge $(u, v) \in E$.
A message passing neural network (MPNN) layer can be expressed as follows:
$$h_u = \phi\!\left(x_u, \bigoplus_{v \in N_u} \psi(x_u, x_v, e_{uv})\right),$$
where $\phi$ and $\psi$ are differentiable functions (e.g. artificial neural networks), and $\bigoplus$ is a permutation-invariant aggregation operator that can accept an arbitrary number of inputs (e.g. element-wise sum, mean, or max).
$\phi$ is the update function, and $\psi$ is the message function.
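A minimal sketch of one such layer, assuming element-wise sum as the aggregation operator $\bigoplus$ and single linear layers (with tanh) for $\psi$ and $\phi$; the graph, the edge list representation, and all array names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical graph: 4 nodes with 3-dim features, directed edges with 2-dim edge features.
x = rng.normal(size=(4, 3))                       # node features x_u
edges = [(0, 1), (1, 0), (1, 2), (2, 3), (3, 1)]  # (u, v) pairs with v in N_u
e = rng.normal(size=(len(edges), 2))              # edge features e_uv

# psi: message function, one linear layer applied to [x_u, x_v, e_uv]
W_psi = rng.normal(size=(8, 3 + 3 + 2))
def psi(xu, xv, euv):
    return np.tanh(W_psi @ np.concatenate([xu, xv, euv]))

# phi: update function, one linear layer applied to [x_u, aggregated message]
W_phi = rng.normal(size=(3, 3 + 8))
def phi(xu, m):
    return np.tanh(W_phi @ np.concatenate([xu, m]))

# One MPNN layer: h_u = phi(x_u, sum over v in N_u of psi(x_u, x_v, e_uv))
h = np.zeros_like(x)
for u in range(len(x)):
    msgs = [psi(x[u], x[v], e[i]) for i, (uu, v) in enumerate(edges) if uu == u]
    agg = np.sum(msgs, axis=0) if msgs else np.zeros(8)   # permutation-invariant sum
    h[u] = phi(x[u], agg)

print(h.shape)
```

In practice such layers are stacked, so that information can propagate beyond a node's immediate neighbourhood.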