Answers to exam 110316

1) 1b 2c 3b 4a

2)
1 correct
2 correct
3 wrong, the temperature controls the level of randomness in the activation function
4 wrong, the steepness must be set in relation to the values of the weights; learning becomes very slow in the tails of the sigmoid
5 wrong, TD-learning is a method for reinforcement learning

3) 12: 9 weights and 3 biases

4) All data points of a class must be on the same side of a hyperplane. The XOR problem cannot be separated, while the case of one class above 5 and the other class below 5 can be separated.

5) The output node gives 0.3*1.0 + (-1.2)*1.0 + 0.6*1.0 + 0.1*1.0 = -0.2.
The pattern is thus classified as negative, but should be positive.
Weight change: deltaw = 0.1*x
New weights: [0.4, -1.1, 0.7], new bias 0.2

6)
1 correct
2 wrong, it is sufficient that the step length is positive
3 wrong, it can be constant
4 wrong, connections are never directed in both directions in these nets
5 wrong, the temperature plays no role in this rule
6 correct

7) Overfitting means that the network has become too specialized to the training data. It indicates that there are too many parameters (weights) in relation to the number of data points. The generalization (performance on unseen data) can be estimated by testing on data held out from training.

8) An attractor is a stable activation state (a state in which no node changes its output). A spurious attractor is an attractor that does not correspond to a learnt pattern.

9) One other fixed point is the anti-pattern [-1 1 -1 1 -1]:
[-1 1 -1 1 -1]*W gives
first column: -4, i.e. -1
second column: 4, i.e. 1
third column: -4, i.e. -1
fourth column: 4, i.e. 1
fifth column: -4, i.e. -1
which is the same as the input.
E = -[1 1 1 1 1]*W*[1 1 1 1 1]' = +4

10) Topology-preserving maps means that neighbouring points in the input space tend to lie close in the output space. The 10-dimensional input space is mapped onto a 2D plane by 10 input nodes connected to output nodes arranged in a 2D grid (the neighbourhood grid).
The most strongly activated output node is the winner; the winner and its neighbours are moved closer to the input pattern.

11)
1 Boltzmann: energy (the negative of the consensus); Backprop: average error as a function of the weights; Hopfield: energy
2 Boltzmann: no, one seeks the optimum/maximum; Backprop: no, these give unnecessarily bad solutions; Hopfield: yes, this is where the patterns are stored
3 Boltzmann: slow reduction of the temperature; Backprop: some trick like restarting the learning or using momentum; Hopfield:

12) These are nodes that never win. One can for instance move them a little even when they do not win.

13) The value function is a map from states to values. (The value is defined as the sum of future rewards.) States go in and values come out. The point is to find the path that gives the maximal summed reward obtainable when starting in a given state and ending in another given state (the goal). This path is the path to select.

14) The weight modifications are proportional both to the node activation and to the derivative of the transfer function. With small weights the activity becomes small, and with large weights the derivative becomes small. Both make the learning very slow.

15) The input is the 30 values. These should first be normalized by subtracting the mean and dividing by the standard deviation (done separately for each sensor). All 30 inputs connect to 2 output nodes. The values of these output nodes are produced unsupervised. Training is done with Oja's modified Hebb rule, deltaW = eta*y*(x^T - y^T*W). After training, the outputs describe the plane in the input space where the variance is largest.

16) The task is a classification problem: classify the input as awake or sleepy. I use backprop. As input layer you have 7+7+7 = 21 nodes (7 means, 7 differences, 7 differences of differences*). As output you have 1 node, where 0 means sleepy and 1 means awake.
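The 21 input values could be computed as below. The exact definition of "difference" is not fixed by the answer, so using the mean absolute first and second differences per sensor is an assumption:

```python
import numpy as np

def make_features(x):
    """x: array of shape (T, 7), T time samples from 7 sensors (assumed layout).

    Returns the 21 inputs: per-sensor mean, mean absolute first difference,
    and mean absolute second difference."""
    m = x.mean(axis=0)                                  # 7 means
    d1 = np.abs(np.diff(x, axis=0)).mean(axis=0)        # 7 differences
    d2 = np.abs(np.diff(x, n=2, axis=0)).mean(axis=0)   # 7 differences of differences
    return np.concatenate([m, d1, d2])                  # 21 values
```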
In the hidden layer you have N nodes (find N by trial and error, looking for the best generalization). Training data comes from the measured data plus your own classification. Your classification is used as the supervised signal (the correct output).
*The idea is that sleepy/sleeping persons do not move as much as awake persons, so changes (differences) might separate the two classes.
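The arithmetic in answer 5 can be checked with a short numpy script, treating the bias as a weight on a constant input of 1 (as the answer's 0.1*1.0 term suggests):

```python
import numpy as np

x = np.array([1.0, 1.0, 1.0])      # the input pattern (all ones, as in the answer)
w = np.array([0.3, -1.2, 0.6])     # initial weights
b = 0.1                            # initial bias
eta = 0.1                          # learning rate, from deltaw = 0.1*x

out = w @ x + b                    # 0.3 - 1.2 + 0.6 + 0.1 = -0.2 -> classified negative

# the pattern should be positive, so add eta*x to the weights (and eta to the bias)
w = w + eta * x
b = b + eta
```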
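A quick check of answer 9, assuming the stored pattern is [1 -1 1 -1 1], the weight matrix is its outer product with zero diagonal, and the energy convention is E = -s^T W s (the exam presumably fixed these conventions):

```python
import numpy as np

p = np.array([1, -1, 1, -1, 1])   # assumed stored pattern
W = np.outer(p, p)
np.fill_diagonal(W, 0)            # zero self-connections

a = -p                            # the anti-pattern [-1 1 -1 1 -1]
net = a @ W                       # column sums: [-4, 4, -4, 4, -4]
# the sign of the net input reproduces the anti-pattern, so it is a fixed point
fixed = np.array_equal(np.sign(net), a)

s = np.ones(5, dtype=int)
E = -s @ W @ s                    # energy of the all-ones state
```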
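The training rule in answer 15 can be sketched as follows. The data here is synthetic (two strong latent directions standing in for the 30 sensor values), and the learning rate and epoch count are assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

# synthetic sensor data: two latent directions plus a little noise
latent = rng.normal(size=(500, 2))
X = latent @ rng.normal(size=(2, 30)) + 0.1 * rng.normal(size=(500, 30))
X = (X - X.mean(axis=0)) / X.std(axis=0)     # per-sensor normalization, as in the answer

W = rng.normal(scale=0.1, size=(2, 30))      # 2 output nodes fully connected to 30 inputs
eta = 0.01

for epoch in range(20):
    for x in X:
        y = W @ x                                # unsupervised outputs
        W += eta * np.outer(y, x - W.T @ y)      # Oja: deltaW = eta*y*(x^T - y^T*W)
```

After training, the rows of W span (approximately) the plane of largest variance in the input space.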