# Neural Network

## Multilayer Perceptrons

The model of each neuron in the network includes a nonlinear activation function that is differentiable. Let $T = \{x(n), t(n)\}_{n=1}^N$ denote the training sample. Let $y_j(n)$ denote the function signal produced by output neuron $j$. The error signal produced at neuron $j$ is defined by

$$
\begin{aligned}
e_j(n) &= d_j(n) - y_j(n)\\
&= t_j(n) - y_j(n)
\end{aligned}
$$

The instantaneous error energy of neuron $j$ is defined by $E_j(n) = e_j^2(n)/2$.

The total instantaneous error energy of the whole network is $E(n) = \sum_{j \in C} E_j(n)$, where $C$ includes all neurons in the output layer.

With $N$ training samples, the error energy averaged over the training sample, or empirical risk, is

$$
\begin{aligned}
E_{av}(N) &= \frac{1}{N} \sum_{n=1}^{N} E(n)\\
&= \frac{1}{2N} \sum_{n=1}^{N} \sum_{j \in C} e_j^2(n)
\end{aligned}
$$
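As a minimal NumPy sketch of these definitions (the batch here is synthetic and the array names are illustrative):

```python
import numpy as np

# Illustrative batch: N samples, |C| output neurons (values are made up).
rng = np.random.default_rng(0)
N, C = 4, 3
t = rng.standard_normal((N, C))    # targets t_j(n)
y = rng.standard_normal((N, C))    # network outputs y_j(n)

e = t - y                          # error signals e_j(n)
E_n = 0.5 * (e**2).sum(axis=1)     # instantaneous error energy E(n)
E_av = E_n.mean()                  # empirical risk E_av(N)
print(E_av)
```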


For a single neuron $j$ with $D$ inputs, the output is the weighted sum of the inputs plus a bias, passed through the activation $\sigma$:

$$y(\mathbf{x}, \mathbf{w}) = \sigma\left(\sum_{i=1}^{D} w_{ji} x_i + w_{j0}\right)$$
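A quick sketch of this single-neuron model, assuming a logistic sigmoid for $\sigma$ (the input, weights, and bias values are made up):

```python
import numpy as np

def sigmoid(a):
    """Logistic sigmoid activation."""
    return 1.0 / (1.0 + np.exp(-a))

def neuron(x, w, w0):
    """y = sigma(sum_i w_i * x_i + w0) for a single neuron."""
    return sigmoid(np.dot(w, x) + w0)

x = np.array([0.5, -1.2, 3.0])   # D = 3 inputs
w = np.array([0.1, 0.4, -0.2])   # weights w_ji
print(neuron(x, w, w0=0.05))
```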

The backpropagation notation used below:

- $w^k_{ij}$: weight into node $j$ in layer $k$ from incoming node $i$
- $b^k_i$: bias for node $i$ in layer $k$
- $a^k_i$: weighted input sum plus bias (the activation) for node $i$ in layer $k$
- $o^k_i$: output for node $i$ in layer $k$
- $r_k$: number of nodes in layer $k$

Backpropagation computes an error term $\delta^k_j$ for each node. For the output layer ($k = m$): $\delta^m_j = (\hat{y} - y)\, f'(a^m_j)$

For a hidden layer: $\delta^k_j = f'(a^k_j) \sum_{l=1}^{r_{k+1}} w^{k+1}_{jl} \delta^{k+1}_l$
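A minimal backward-pass sketch for a two-layer network with sigmoid activations, following the $\delta$ recursion above (the layer sizes, data, and variable names are assumptions for illustration):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def dsigmoid(a):
    s = sigmoid(a)
    return s * (1.0 - s)

rng = np.random.default_rng(0)
x = rng.standard_normal(4)                           # input
y = np.array([1.0])                                  # target
W1, b1 = rng.standard_normal((5, 4)), np.zeros(5)    # hidden layer (r_1 = 5)
W2, b2 = rng.standard_normal((1, 5)), np.zeros(1)    # output layer (r_2 = 1)

# Forward pass: a^k = W^k o^{k-1} + b^k, o^k = f(a^k)
a1 = W1 @ x + b1
o1 = sigmoid(a1)
a2 = W2 @ o1 + b2
y_hat = sigmoid(a2)

# Output layer: delta^m_j = (y_hat - y) f'(a^m_j)
delta2 = (y_hat - y) * dsigmoid(a2)
# Hidden layer: delta^k_j = f'(a^k_j) sum_l w^{k+1}_{jl} delta^{k+1}_l
delta1 = dsigmoid(a1) * (W2.T @ delta2)

# Weight gradients: dE/dw^k_{ij} = delta^k_j * o^{k-1}_i
grad_W2 = np.outer(delta2, o1)
grad_W1 = np.outer(delta1, x)
print(grad_W2)
```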

## Graph Neural Network

### Message Passing Neural Networks (MPNNs)

Let $G = (V, E)$ be a graph, $N_u$ be the neighbourhood of node $u \in V$, $\mathbf{x}_u$ be the features of node $u$, and $\mathbf{e}_{uv}$ be the features of edge $(u, v) \in E$. An MPNN layer can be expressed as follows:

$$\mathbf{h}_u = \phi\left(\mathbf{x}_u, \bigoplus_{v \in N_u} \psi(\mathbf{x}_u, \mathbf{x}_v, \mathbf{e}_{uv})\right),$$

where $\phi$ and $\psi$ are differentiable functions (e.g. artificial neural networks) and $\bigoplus$ is a permutation-invariant aggregation operator that can accept an arbitrary number of inputs (e.g. element-wise sum, mean, or max). $\phi$ is the update function, and $\psi$ is the message function.
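A toy sketch of one MPNN layer, taking $\bigoplus$ to be element-wise sum and $\psi$, $\phi$ to be small one-layer networks (the graph, feature dimensions, and weight names are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy graph: 3 nodes, directed edges (u, v), 4-dim node and 2-dim edge features.
edges = [(0, 1), (1, 0), (1, 2), (2, 1)]
x = rng.standard_normal((3, 4))                    # node features x_u
e = {uv: rng.standard_normal(2) for uv in edges}   # edge features e_uv

W_psi = rng.standard_normal((4, 10))  # psi: linear map on [x_u, x_v, e_uv]
W_phi = rng.standard_normal((4, 8))   # phi: linear map on [x_u, message]

def mpnn_layer(x, edges, e):
    h = np.zeros_like(x)
    for u in range(len(x)):
        # Neighbours of u (treating the edge list as directed u -> v).
        nbrs = [v for (a, v) in edges if a == u]
        # Permutation-invariant aggregation: sum_{v in N_u} psi(x_u, x_v, e_uv)
        m = np.zeros(x.shape[1])
        for v in nbrs:
            m += np.tanh(W_psi @ np.concatenate([x[u], x[v], e[(u, v)]]))
        # Update: h_u = phi(x_u, m)
        h[u] = np.tanh(W_phi @ np.concatenate([x[u], m]))
    return h

print(mpnn_layer(x, edges, e))
```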
