Transcriptomics: Advanced assignment
Please note this is an individual assignment.
Classify gene expression data
Solving this part will raise your grade one step.
The objective of this assignment is to identify two classes of samples (possibly "good" and "bad" samples, but we call them "1" and "2") using the method in the paper by Slonim et al. Using a set of classified samples, the "training data", you will identify which genes are informative and help distinguish the classes. This knowledge will then be applied to a test set of unclassified samples, the "patients". Which of them are in class 1?
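As a starting point, here is a minimal sketch of one way to score genes and classify a sample. It assumes that the method in Slonim et al. is the signal-to-noise statistic combined with weighted voting; check the paper for the exact definitions before relying on this. The function names, the toy data, the use of the sample standard deviation, and the choice to rank genes by absolute score are all assumptions made for illustration.

```python
from statistics import mean, stdev

def signal_to_noise(class1_values, class2_values):
    """Score one gene as (mu1 - mu2) / (sigma1 + sigma2).

    Assumes at least two values per class (stdev needs two points).
    """
    spread = stdev(class1_values) + stdev(class2_values)
    if spread == 0:
        return 0.0  # no variation at all: treat the gene as uninformative
    return (mean(class1_values) - mean(class2_values)) / spread

def train(samples, labels, n_genes_to_keep):
    """Return (gene index, weight, boundary) for the highest-scoring genes."""
    scored = []
    for g in range(len(samples[0])):
        c1 = [s[g] for s, y in zip(samples, labels) if y == "1"]
        c2 = [s[g] for s, y in zip(samples, labels) if y == "2"]
        weight = signal_to_noise(c1, c2)
        boundary = (mean(c1) + mean(c2)) / 2  # midpoint between class means
        scored.append((g, weight, boundary))
    # Assumption: rank by absolute score, so genes that are high in either
    # class can be selected.
    scored.sort(key=lambda t: abs(t[1]), reverse=True)
    return scored[:n_genes_to_keep]

def predict(model, sample):
    """Each selected gene votes weight * (value - boundary); sum the votes."""
    total = sum(w * (sample[g] - b) for g, w, b in model)
    return "1" if total > 0 else "2"

# Toy data: gene 0 is high in class 1, gene 1 is high in class 2,
# gene 2 is roughly the same in both classes.
train_x = [[5.0, 1.0, 3.0], [6.0, 2.0, 3.5], [1.0, 5.0, 3.2], [2.0, 6.0, 2.9]]
train_y = ["1", "1", "2", "2"]
model = train(train_x, train_y, n_genes_to_keep=2)
print(predict(model, [5.5, 1.5, 3.0]))
```

A positive vote total means the sample looks more like class 1, a negative total more like class 2. How many genes to keep is a parameter you will need to choose and justify yourself.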
The first column is an integer with sample id. The first sample is 1 and the last sample is 200.
The second column indicates the sample class. For training data, it says "1" or "2" here, and for the test data, which you are to classify, the unknown class is indicated by "*".
Then there are 50 more columns with "expression values" from 50 genes. Some genes are typically upregulated in class 1, and others are more likely to be high for class 2. Some genes are simply "noisy", i.e., they have the same distribution in both class 1 and class 2.
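The column layout described above could be read along the following lines. This is only a sketch: it assumes the columns are separated by whitespace and that the file has no header line, neither of which is stated here, and the function name `parse_samples` is made up for illustration.

```python
def parse_samples(lines, n_genes=50):
    """Split rows into training samples (class "1"/"2") and test samples ("*")."""
    training, test = [], []
    for line in lines:
        fields = line.split()  # assumption: whitespace-separated columns
        if not fields:
            continue  # skip blank lines
        sample_id = int(fields[0])
        label = fields[1]
        values = [float(v) for v in fields[2:]]
        if len(values) != n_genes:
            raise ValueError(f"sample {sample_id}: expected {n_genes} values")
        if label == "*":
            test.append((sample_id, values))
        elif label in ("1", "2"):
            training.append((sample_id, label, values))
        else:
            raise ValueError(f"sample {sample_id}: unknown class {label!r}")
    return training, test

# Tiny made-up example with 3 genes instead of 50.
demo = ["1 1 0.5 1.2 3.3", "2 2 1.5 0.2 2.9", "101 * 0.7 1.1 3.0"]
training, test = parse_samples(demo, n_genes=3)
print(len(training), len(test))
```

For the real data you would pass an open file object instead of the `demo` list and keep the default `n_genes=50`.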
Your solution
Hand in a prediction of which samples belong to class 1 and class 2 in a file formatted with two columns: first the sample id (in the range 101-200) and then your class prediction ("1" or "2").
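A two-column prediction file could be produced roughly as follows. The tab separator, the function name `write_predictions`, and the file name in the usage note are assumptions, since the required separator and file name are not specified here.

```python
import io

def write_predictions(predictions, out):
    """Write one "id<TAB>class" line per sample, ordered by sample id."""
    for sample_id, label in sorted(predictions.items()):
        if label not in ("1", "2"):
            raise ValueError(f"sample {sample_id}: invalid class {label!r}")
        out.write(f"{sample_id}\t{label}\n")

# Demonstration with an in-memory buffer and made-up predictions.
buffer = io.StringIO()
write_predictions({102: "2", 101: "1"}, buffer)
print(buffer.getvalue(), end="")
```

To write a real file, pass an open file handle instead of the buffer, for example `with open("predictions.txt", "w") as out: write_predictions(predictions, out)`.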
Also submit the implementation you used to run the experiment!
You can email your results to Lasse.