# COMP305 W1-6

## Introduction

- Lawfulness, fairness, transparency
- Integrity & Confidentiality
- Accuracy
- Purpose limitation
- Data minimization
- Storage limitation
- Requirement

## ANN

Key terms:

- propagation
- spikes
- threshold
- intensity
- duration
- …

Discrete time + binary input:

- **Temporal summation**: if spikes arrive densely enough in time, their sum exceeds the excitation threshold.
- **Spatial summation**: if enough inputs arrive together, their sum exceeds the excitation threshold.
- The **refractory period** motivates the discrete-time model.
- Inputs and outputs are unified (both binary).

## McCulloch-Pitts Neuron

excitatory connection: weight +1

inhibitory connection: weight -1

\[
X^t=1 \text { if and only if } S^{t-1}=\sum w_i a_i^{t-1} \geq \theta
\text {, and } w_i>0, \forall a_i^{t-1}>0
\]

In plain terms: the neuron fires if and only if **no inhibitory connection** is activated and the weighted sum reaches the threshold.

What can we do with a MP neuron?

**Simple** logical functions: AND, OR and NOT

The function must be **linearly separable**.

Completeness: formally, MP networks are equivalent to a Turing machine.

An MP neural **network** can implement **any
multivariable propositional logic function**, with the thresholds
and weights being appropriately selected.

Furthermore, the **discrete time**, or unity
**delay** property of the model makes it even
**possible to build a sequential digital circuitry**.
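The firing condition can be sketched directly in code. The function name and the weight/threshold settings for AND, OR and NOT below are standard illustrative choices, not necessarily those from the lecture:

```python
# A minimal sketch of a McCulloch-Pitts neuron.
def mp_neuron(inputs, weights, theta):
    """Fire (return 1) iff no inhibitory input is active and the weighted sum >= theta."""
    # Any active input on an inhibitory (negative-weight) connection vetoes firing.
    if any(a > 0 and w < 0 for a, w in zip(inputs, weights)):
        return 0
    return 1 if sum(w * a for w, a in zip(weights, inputs)) >= theta else 0

# AND: two excitatory inputs, threshold 2 (both must be active)
AND = lambda x, y: mp_neuron([x, y], [1, 1], theta=2)
# OR: two excitatory inputs, threshold 1 (either suffices)
OR = lambda x, y: mp_neuron([x, y], [1, 1], theta=1)
# NOT: one inhibitory input, threshold 0 (fires only when the input is inactive)
NOT = lambda x: mp_neuron([x], [-1], theta=0)
```

XOR, by contrast, is not linearly separable, so no single MP neuron of this kind can compute it.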

## Hebb’s Rule

The simplest formulation of Hebb's rule is to increase the weight of a connection at every instant as follows:

\[ w_i^{t+1}=w_i^t+\Delta w_i^t, \] where \[ \Delta w_i^t=C a_i^t X^{t+1} \]

| Symbol | Meaning |
| --- | --- |
| \(w_i^{t}\) | weight of the connection at instant \(t\) |
| \(w_i^{t+1}\) | weight of the connection at the next instant \(t+1\) |
| \(\Delta w_i^t\) | increment by which the weight of the connection is enlarged |
| \(C\) | positive coefficient which determines the learning rate |
| \(a_i^t\) | input value from the presynaptic neuron at instant \(t\) |
| \(X^{t+1}\) | output of the postsynaptic neuron at instant \(t+1\) |

Algorithm of Hebb's Rule for a **single** neuron (NOT MP):

1. Set the neuron threshold value \(\theta\) and the learning rate \(C\).
2. Set random initial values for the weights of connections \(w_i^t\).
3. Give instant input values \(a_i^t\) by the input units.
4. Compute the instant state of the neuron \(S^t=\sum_i w_i^t a_i^t\).
5. Compute the instant output of the neuron \(X^{t+1}\): \[ X^{t+1}=g\left(S^t\right)=H\left(S^t-\theta\right)= \begin{cases}1, & S^t \geq \theta \\ 0, & S^t<\theta\end{cases} \]
6. Compute the instant corrections to the weights of connections \(\Delta w_i^t=C a_i^t X^{t+1}\).
7. Update the weights of connections \(w_i^{t+1}=w_i^t+\Delta w_i^t\).
8. Go to step 3.
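The steps above can be sketched as follows (function name, parameter values, and the use of a fixed number of epochs in place of the endless loop are my own illustrative choices):

```python
import random

def hebb_train(patterns, n_inputs, theta=1.0, C=0.1, epochs=10, seed=0):
    """Hebb's rule for a single neuron: strengthen the weight of every
    connection whose input was active when the neuron fired."""
    rng = random.Random(seed)
    w = [rng.random() for _ in range(n_inputs)]           # step 2: random initial weights
    for _ in range(epochs):
        for a in patterns:                                # step 3: present an input
            S = sum(wi * ai for wi, ai in zip(w, a))      # step 4: instant state
            X = 1 if S >= theta else 0                    # step 5: Heaviside output
            w = [wi + C * ai * X for wi, ai in zip(w, a)] # steps 6-7: Hebbian update
    return w
```

Note that the weights can only grow: nothing in the plain rule ever decreases them, which is exactly the instability that motivates the normalized (Oja's) rule below.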

## Normalized Hebb’s Rule (Oja’s Rule)

Formulation of **Normalized** Hebb's Rule (**Oja's rule**):

1. Set the neuron threshold value \(\theta\) and the learning rate \(C\).
2. Set random initial values for the weights of connections \(w_i^t\) and **normalize** them.
3. Give instant input values \(a_i^t\) by the input units.
4. Compute the instant state of the neuron \(S^t=\sum_i w_i^t a_i^t\).
5. Compute the instant output of the neuron \(X^{t+1}\): \[ X^{t+1}=g\left(S^t\right)=H\left(S^t-\theta\right)= \begin{cases}1, & S^t \geq \theta \\ 0, & S^t<\theta\end{cases} \]
6. Compute the instant corrections to the weights of connections \(\Delta w_i^t=C a_i^t X^{t+1}\).
7. Update the weights, **renormalizing** so the weight vector keeps unit length: \[ w_i^{t+1}=\frac{w_i^t+\Delta w_i^t}{\sqrt{\sum_j\left(w_j^t+\Delta w_j^t\right)^2}} \]
8. Go to step 3.
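A sketch of the normalized rule, assuming the weight vector is renormalized to unit length after every update (function name and parameters are my own):

```python
import math
import random

def oja_train(patterns, n_inputs, theta=0.5, C=0.1, epochs=20, seed=0):
    """Hebb's rule with the update renormalized so the weight vector
    keeps unit length instead of growing without bound."""
    rng = random.Random(seed)
    w = [rng.random() for _ in range(n_inputs)]
    norm = math.sqrt(sum(wi * wi for wi in w))
    w = [wi / norm for wi in w]                      # normalize the initial weights
    for _ in range(epochs):
        for a in patterns:
            S = sum(wi * ai for wi, ai in zip(w, a))
            X = 1 if S >= theta else 0
            w = [wi + C * ai * X for wi, ai in zip(w, a)]
            norm = math.sqrt(sum(wi * wi for wi in w))
            w = [wi / norm for wi in w]              # renormalize after every update
    return w
```

Unlike plain Hebbian learning, the weight vector here can only rotate toward frequently active input directions; its length is pinned at 1.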

## Kohonen’s Rule (Competitive Learning)

Hebb's Rule + **winner-takes-all**

**Unsupervised Learning** Unsupervised learning is a
type of machine learning in which the algorithm is not provided with any
pre-assigned labels or scores for the training data. As a result,
unsupervised learning algorithms must first self-discover any naturally
occurring patterns in that training data set.

A common example is clustering: an unsupervised network can group similar input patterns into clusters based on a predetermined set of criteria relating the components of the data.

Clustering can be achieved when we extend the single neuron to the network with multiple outputs.
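A minimal winner-takes-all clustering sketch (one common formulation; the function name, Euclidean-distance competition, and parameter values are my own assumptions, not necessarily the lecture's exact update):

```python
import random

def competitive_cluster(data, n_clusters, C=0.2, epochs=20, seed=1):
    """Winner-takes-all learning: only the output unit whose weight vector
    is closest to the input gets its weights moved toward that input."""
    rng = random.Random(seed)
    # initialize each unit's weight vector to a randomly chosen data point
    w = [list(rng.choice(data)) for _ in range(n_clusters)]
    for _ in range(epochs):
        for x in data:
            # the winner is the unit with minimum (squared) Euclidean distance to x
            winner = min(range(n_clusters),
                         key=lambda j: sum((wi - xi) ** 2 for wi, xi in zip(w[j], x)))
            # move only the winner toward the input; losers are left unchanged
            w[winner] = [wi + C * (xi - wi) for wi, xi in zip(w[winner], x)]
    return w
```

Each weight vector ends up near the centroid of the inputs it wins, i.e. one cluster per output unit.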

## Perceptron

A Perceptron is a neural network that changes with “experience”, using an **error-correction rule**.

- **Training set**: a set of input patterns repeatedly presented to the network during training.
- **Target output (label)**: the predefined correct output of an input pattern in the training set.

According to the rule, the weights of a response unit change when it makes an erroneous response to an input presented to the network.

The network outputs \(X_j\) are then compared to the desired outputs specified in the training set to obtain the error.

Weighted sum of inputs, output, and error: \[
S_j=\sum_{i=0}^n w_{j i} a_i
\] \[
X_j=f(S_j)
\] \[
e_j=t_j-X_j
\]

\[ \begin{gathered} \Delta w_{j i}^k=C e_j^k a_i^k\\ \\ w_{j i}^k=w_{j i}^k+\Delta w_{j i}^k \end{gathered} \]

For multiple outputs, the network performance during a training session is measured by a **root-mean-square (RMS)** error value:
\[
\mathrm{RMS}=\sqrt{\frac{\sum_{k=1}^r
\sum_{j=1}^m\left(e_j^k\right)^2}{r m}}
\]

In \(w^k_{ji}\), the superscript \(k\) indexes the training pattern, \(j\) the destination (output) neuron, and \(i\) the source (input) neuron.

The process of **weights adjustment** is called
perceptron “learning” or “training”.

**fully interconnected**：every input neuron is
connected to every output neuron.

A **single** perceptron’s output is **binary**.
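The error-correction rule above can be sketched as a short training loop. The function names and the convention of supplying the bias via a constant \(a_0 = 1\) input (the \(i = 0\) term of \(S = \sum_{i=0}^{n} w_i a_i\)) are my own illustrative choices:

```python
def perceptron_train(patterns, C=0.5, epochs=20):
    """Single-output perceptron trained with the error-correction rule:
    Delta w_i = C * (t - X) * a_i."""
    n = len(patterns[0][0])
    w = [0.0] * (n + 1)                                # w[0] is the bias weight
    for _ in range(epochs):
        for a, t in patterns:
            a = [1] + list(a)                          # prepend the constant bias input
            S = sum(wi * ai for wi, ai in zip(w, a))   # weighted sum
            X = 1 if S >= 0 else 0                     # binary output
            e = t - X                                  # error = target - output
            w = [wi + C * e * ai for wi, ai in zip(w, a)]
    return w

def perceptron_run(w, inputs):
    a = [1] + list(inputs)
    return 1 if sum(wi * ai for wi, ai in zip(w, a)) >= 0 else 0

# AND is linearly separable, so perceptron learning converges on it
and_set = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w = perceptron_train(and_set)
```

Note the weights change only when the output is wrong (\(e \neq 0\)), which is exactly the error-correction behaviour described above.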

L15: worked training process (see slides).

### L16

Convergence of Perceptron Learning Algorithm

Hidden Neurons

- The input is processed and propagated from one layer to the next, until the final result is computed.
- This process represents **forward propagation**.

### L18

The **output error** of a multilayer perceptron for the **\(k\)-th input pattern** is half of the total squared error:
\[
E^k=\frac{1}{2} \sum_{j=1}^m\left(e_j^k\right)^2=\frac{1}{2}
\sum_{j=1}^m\left(t_j^k-X_j^k\right)^2
\] Error backpropagation minimizes this error using **gradient descent**.

For gradient descent we need a differentiable activation function; use the generic sigmoidal activation function: \[ f(S)=\frac{\alpha}{1+e^{-\beta S+\gamma}}+\lambda \] Its derivative is then: \[ f^{\prime}(S)=\frac{d f}{d S}=\frac{\beta}{\alpha} \cdot(f(S)-\lambda)(\alpha+\lambda-f(S)) \]

Then update the weights: \[ w_{j i}^k=w_{j i}^k+\Delta w_{j i}^k,\qquad \Delta w_{j i}^k =-C \frac{\partial E}{\partial w_{j i}^k} \]
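A quick finite-difference check of the derivative identity \(f'(S)=\frac{\beta}{\alpha}(f(S)-\lambda)(\alpha+\lambda-f(S))\); the parameter values below are arbitrary test choices:

```python
import math

def f(S, alpha=1.0, beta=2.0, gamma=0.5, lam=-0.5):
    """Generic sigmoidal activation function from the notes."""
    return alpha / (1.0 + math.exp(-beta * S + gamma)) + lam

def f_prime(S, alpha=1.0, beta=2.0, gamma=0.5, lam=-0.5):
    """Closed form: f'(S) = (beta/alpha) * (f(S) - lambda) * (alpha + lambda - f(S))."""
    y = f(S, alpha, beta, gamma, lam)
    return (beta / alpha) * (y - lam) * (alpha + lam - y)

# compare the closed form against a central finite difference
h = 1e-6
for S in (-2.0, 0.0, 1.5):
    numeric = (f(S + h) - f(S - h)) / (2 * h)
    assert abs(numeric - f_prime(S)) < 1e-6
```

The convenience of this family is that \(f'(S)\) is expressed through \(f(S)\) itself, so backpropagation never needs to re-evaluate the exponential.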

## Genetic Algorithms

### L21

Biological background:

**Crossing-over**: exchange of genetic material between paired chromosomes during reproduction.

**Mutation** is a random change of a “letter”, i.e. a single nucleotide, in a chromosome.

Different possible settings for a trait, e.g. blue, brown, black eye
colour, are called **alleles**.

The **complete collection** of all genetic material,
that is **all chromosomes taken together**, is called the
organism’s **genome** or **genotype**.

### L22

自然演化：

**Parallelism**：Every individual in the population is
**tested independently in parallel with the others**, and
that speeds up evolution of the population.

**Adaptation**：Due to the natural selection, only those
individuals best **fitted for the current environment** do
survive. The selection results in the population as a whole best adapted
to the environment.

**Optimization**：Due to the natural selection,
**only individuals best fitted or “optimal” for the current
environment** do survive and reproduce. Thus, the selection also
performs optimization on an individual level.

### L23

#### GAs by John Holland

Holland introduced:

- a “**population**” of **binary strings** (strings of characters) which he called “**chromosomes**”;
- the “population” evolves using a kind of “natural selection” together with the genetics-inspired operators of **crossover**, **mutation**, and **inversion**;
- **bits** in a “chromosome” represent **genes**, and each “gene” is an instance of a particular “**allele**”, 0 or 1.

- The **selection operator** chooses those chromosomes in the population that will be allowed to reproduce; on average the fitter chromosomes produce more offspring than the less fit ones.
- **Crossover** exchanges **subparts** of two chromosomes.
- **Mutation** randomly changes the **allele values** at some locations in the chromosome.
- **Inversion** reverses the order of **a continuous section** of the chromosome, rearranging the order of its genes.
- **Translocation** is a mutation in which part of a chromosome is cut out and moved to a different location in the chromosome.

The **population** of chromosomes is the **evolving unit**.

*Fitness Function*

Evaluates how well a candidate solution solves the problem:

- takes a chromosome as an input;
- produces its quantitative fitness evaluation as an output.

Two important requirements:

- it should **correlate closely with the designer's goal**;
- it should be **computationally efficient**.

**Search Space**: the set of all possible solutions to the problem under consideration is called the Genetic Algorithm search space.

**Fitness Landscape**: a representation of all possible solutions along with their fitness values.

### L24

#### Basic Structure of a Genetic Algorithm

1. Randomly generate an initial population of n bit strings (“chromosomes”).
2. Evaluate the fitness of each string in the population.
3. Repeat the following steps until the next generation of n individuals is produced:
   - Select a pair of parent chromosomes from the current population according to their fitness, i.e. chromosomes with higher fitness are selected more often.
   - Apply crossover (with a given probability).
   - Apply mutation (with a given probability of occurrence).
4. Apply generational replacement.
5. Go to step 2, or terminate if the termination condition is met.
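The loop above can be sketched in Python. The one-max fitness function, parameter values, and function names below are illustrative choices of mine, not from the lecture:

```python
import random

def genetic_algorithm(fitness, l=20, n=30, pc=0.7, pm=0.01, generations=60, seed=0):
    """Classical GA: fitness-proportional (roulette wheel) selection,
    one-point crossover with probability pc, per-bit mutation with
    probability pm, and full generational replacement."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(l)] for _ in range(n)]  # step 1
    for _ in range(generations):
        fits = [fitness(c) for c in pop]                             # step 2
        nxt = []
        while len(nxt) < n:                                          # step 3
            # fitness-proportional selection of two parents
            p1, p2 = rng.choices(pop, weights=fits, k=2)
            c1, c2 = p1[:], p2[:]
            if rng.random() < pc:                                    # one-point crossover
                pt = rng.randrange(1, l)
                c1, c2 = c1[:pt] + p2[pt:], c2[:pt] + p1[pt:]
            for c in (c1, c2):                                       # per-bit mutation
                for i in range(l):
                    if rng.random() < pm:
                        c[i] ^= 1
            nxt += [c1, c2]
        pop = nxt[:n]                                                # step 4: replacement
    return max(pop, key=fitness)

# one-max: fitness is the number of 1-bits; the optimum is the all-ones string
best = genetic_algorithm(fitness=sum)
```

Here the termination condition is simply a fixed number of generations; in practice one would also stop when a fitness target is reached.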

### L25

#### Schema

A schema is a **similarity template** describing a
**subset** of strings with **similarities** at
**certain** string **positions**.

- Another name for a schema is a **building block** of a chromosome.

The **Order** of a **schema 𝐻** is
represented as **𝑂(𝐻)** , denoting the **number of
defining symbols** (non-* symbols) it contains.

The **defining length** of a **schema 𝐻**
is represented as **𝛿(𝐻)** , denoting the **maximum
distance between two defining symbols**.
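These definitions are easy to compute over schema strings written with `*` as the wildcard (function names are my own):

```python
def order(H):
    """O(H): the number of defining (non-*) symbols in the schema."""
    return sum(1 for s in H if s != '*')

def defining_length(H):
    """delta(H): the distance between the outermost defining symbols."""
    positions = [i for i, s in enumerate(H) if s != '*']
    return positions[-1] - positions[0] if positions else 0

def matches(H, chromosome):
    """A string matches a schema if it agrees at every defining position."""
    return all(s == '*' or s == c for s, c in zip(H, chromosome))
```

For example, the schema `1**0*` has order 2 (two defining symbols) and defining length 3 (positions 0 and 3).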

#### Schema Theorem

**Highly fit**, **short defining length**,
**low order** **schemas** increase
**exponentially** in **frequency** in
**successive** generations.

### L26

#### Schema Theorem

Consider a specific schema \(H\). If we use a classical GA with Roulette Wheel Selection, crossover (probability \(p_c\)) and mutation (probability \(p_m\)), we have: \[
m(H, t+1) \geq m(H, t) \cdot \frac{f(H, t)}{\bar{f}}\left(1-p_c
\frac{\delta(H)}{l-1}-O(H) \frac{p_m}{l}\right)
\]

- \(m(H,t)\) is the number of chromosomes matching schema \(H\) at generation \(t\);
- \(f(H,t)\) is the mean fitness of chromosomes matching schema \(H\) at generation \(t\);
- \(\bar{f}\) is the mean fitness of individuals in the population at generation \(t\);
- \(p_c\), \(p_m\) are the probabilities of crossover and mutation, respectively;
- \(l\) is the length of each chromosome.