Multilayer perceptron

Suggested reading: chapter 6

What is the advantage of using several layers?
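
A minimal illustration, assuming hard-threshold units and the XOR problem as the example mapping (neither is fixed by the questions here). A single threshold unit cannot reproduce XOR because XOR is not linearly separable, while two layers with hand-chosen weights can: the hidden units compute OR and AND, and the output computes "OR but not AND". The grid search only tries a finite set of weights, so it is a demonstration rather than a proof. The names `step`, `two_layer_xor` and `single_layer_solutions` are hypothetical.

```python
from itertools import product

def step(z):
    """Threshold unit: 1 if the weighted sum is positive, else 0."""
    return 1 if z > 0 else 0

XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

def two_layer_xor(x1, x2):
    """Hand-set two-layer network: hidden units compute OR and AND."""
    h_or = step(x1 + x2 - 0.5)
    h_and = step(x1 + x2 - 1.5)
    return step(h_or - h_and - 0.5)  # OR and not AND

def single_layer_solutions(grid):
    """Every (w1, w2, b) on the grid for which one threshold unit reproduces XOR."""
    found = []
    for w1, w2, b in product(grid, repeat=3):
        if all(step(w1 * x1 + w2 * x2 + b) == y for (x1, x2), y in XOR.items()):
            found.append((w1, w2, b))
    return found

grid = [i / 2 for i in range(-8, 9)]  # -4.0 .. 4.0 in steps of 0.5
print(all(two_layer_xor(x1, x2) == y for (x1, x2), y in XOR.items()))  # True
print(len(single_layer_solutions(grid)))  # 0
```

The extra layer re-represents the inputs (here as OR and AND) so that the output unit faces a linearly separable problem.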

What is the disadvantage?

How does learning with Backprop work?
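
A minimal sketch of backprop training, assuming one hidden layer of logistic sigmoid units, a single sigmoid output, the squared error E = ½Σ(y − t)², biases treated as weights from a constant input of 1, and batch gradient descent on XOR. The function names, the learning rate `lr`, the hidden-layer size and the epoch count are illustrative choices, not taken from chapter 6. The forward pass computes the hidden and output activations; the backward pass computes the output delta, sends it back through the output weights to get one delta per hidden unit, and each weight is then changed by −lr times the delta of the unit it feeds into, times the activation flowing through it.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def init_params(n_in, n_hidden, seed=0):
    rng = random.Random(seed)
    # W1[j]: weights from the inputs to hidden unit j; last entry is the bias
    W1 = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    # W2: weights from the hidden units to the single output; last entry is the bias
    W2 = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
    return W1, W2

def forward(W1, W2, x):
    xb = list(x) + [1.0]  # constant 1 feeds the hidden biases
    h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in W1]
    hb = h + [1.0]        # constant 1 feeds the output bias
    y = sigmoid(sum(w * v for w, v in zip(W2, hb)))
    return xb, hb, y

def loss(W1, W2, data):
    return 0.5 * sum((forward(W1, W2, x)[2] - t) ** 2 for x, t in data)

def gradients(W1, W2, data):
    """Backprop: dE/dW summed over all patterns in the batch."""
    g1 = [[0.0] * len(row) for row in W1]
    g2 = [0.0] * len(W2)
    for x, t in data:
        xb, hb, y = forward(W1, W2, x)
        delta_out = (y - t) * y * (1 - y)  # dE/d(net input of output)
        for j in range(len(W2)):
            g2[j] += delta_out * hb[j]
        for j, row in enumerate(W1):
            # hidden delta: output delta sent back through W2[j] (bias weight not used)
            delta_h = delta_out * W2[j] * hb[j] * (1 - hb[j])
            for i in range(len(row)):
                g1[j][i] += delta_h * xb[i]
    return g1, g2

def train(data, n_hidden=4, lr=0.5, epochs=5000, seed=0):
    W1, W2 = init_params(len(data[0][0]), n_hidden, seed)
    for _ in range(epochs):
        g1, g2 = gradients(W1, W2, data)
        for j, row in enumerate(W1):
            for i in range(len(row)):
                row[i] -= lr * g1[j][i]
        for j in range(len(W2)):
            W2[j] -= lr * g2[j]
    return W1, W2

XOR = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]
W1, W2 = train(XOR)
for x, t in XOR:
    print(x, t, round(forward(W1, W2, x)[2], 2))
```

How close the outputs get to the targets depends on the random start; with too few hidden units, XOR training can get stuck in a local minimum. Comparing `gradients` against finite differences of `loss` is a cheap way to check a backprop implementation.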

Derive the Backprop method mathematically.
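
One way to write the derivation, assuming one hidden layer, the logistic activation σ(z) = 1/(1 + e^{-z}) with σ'(z) = σ(z)(1 − σ(z)), the squared error, and biases included as weights from a constant input of 1. The notation (v for input-to-hidden weights, w for hidden-to-output weights, η for the learning rate) is chosen here and may differ from chapter 6.

```latex
% Forward pass
\begin{aligned}
a_j &= \sum_i v_{ji} x_i, & h_j &= \sigma(a_j), \\
b_k &= \sum_j w_{kj} h_j, & y_k &= \sigma(b_k), \\
E &= \tfrac{1}{2} \sum_k (y_k - t_k)^2 .
\end{aligned}

% Output weights: chain rule through y_k and b_k
\frac{\partial E}{\partial w_{kj}}
  = \frac{\partial E}{\partial y_k}\,\frac{\partial y_k}{\partial b_k}\,\frac{\partial b_k}{\partial w_{kj}}
  = (y_k - t_k)\, y_k (1 - y_k)\, h_j
  = \delta^{(o)}_k h_j,
\qquad \delta^{(o)}_k = (y_k - t_k)\, y_k (1 - y_k).

% Hidden weights: v_{ji} affects E through every output, so sum over k
\frac{\partial E}{\partial v_{ji}}
  = \sum_k \frac{\partial E}{\partial b_k}\,\frac{\partial b_k}{\partial h_j}\,\frac{\partial h_j}{\partial a_j}\,\frac{\partial a_j}{\partial v_{ji}}
  = \Big( \sum_k \delta^{(o)}_k w_{kj} \Big) h_j (1 - h_j)\, x_i
  = \delta^{(h)}_j x_i,
\qquad \delta^{(h)}_j = h_j (1 - h_j) \sum_k w_{kj} \delta^{(o)}_k .

% Gradient-descent updates
\Delta w_{kj} = -\eta\, \delta^{(o)}_k h_j, \qquad
\Delta v_{ji} = -\eta\, \delta^{(h)}_j x_i .
```

The hidden deltas are computed from the output deltas, which is why the error is said to propagate backwards through the network.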

Will performance improve if more layers are used?

How many hidden nodes are needed?

Is it always better to have as many nodes as possible?

How fast is the learning?

Can a multilayer perceptron learn all mappings?

Do you always get the same weights for the same data set?
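
A minimal sketch of why the answer is usually no, assuming the small sigmoid network and random weights below (all names and values hypothetical). Backprop starts from random initial weights, so different starts typically end in different weight sets. Even a given solution is not unique: reordering the hidden units, moving each unit's incoming weights, bias and outgoing weight together, gives different weight matrices that compute exactly the same function, so a network with n hidden units has up to n! equivalent orderings.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mlp(W1, b1, W2, b2, x):
    """One hidden sigmoid layer and a single sigmoid output."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    return sigmoid(sum(w * hj for w, hj in zip(W2, h)) + b2)

def permute_hidden(W1, b1, W2, perm):
    """Reorder hidden units, keeping each unit's in-weights, bias and out-weight together."""
    return [W1[p] for p in perm], [b1[p] for p in perm], [W2[p] for p in perm]

rng = random.Random(42)  # arbitrary seed; any random weights would do
W1 = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(3)]  # 3 hidden units, 2 inputs
b1 = [rng.uniform(-1, 1) for _ in range(3)]
W2 = [rng.uniform(-1, 1) for _ in range(3)]
b2 = rng.uniform(-1, 1)

W1p, b1p, W2p = permute_hidden(W1, b1, W2, [2, 0, 1])
inputs = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
# isclose because summing in a different order can change the last bits
same_outputs = all(
    math.isclose(mlp(W1, b1, W2, b2, x), mlp(W1p, b1p, W2p, b2, x)) for x in inputs
)
print(W1 != W1p, same_outputs)  # True True
```

This symmetry is also one reason why individual weights are hard to interpret: which hidden unit ends up encoding which feature is arbitrary.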

Can the internal representation (the weights) be intuitively interpreted?