School of Computer Science and Communication

Transcriptomics: Advanced assignment

Please note this is an individual assignment.

Classify gene expression data

Solving this part will raise your grade one step.

The objective of this assignment is to identify two classes of samples (possibly "good" and "bad" samples, but we call them "1" and "2") using the method in the paper by Slonim et al. Using a set of classified samples, the "training data", you will identify which genes are informative and help distinguish the classes. This knowledge will then be applied to a test set of non-classified samples, the "patients". Which patients are in class 1?
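As a rough illustration of this kind of classifier, the sketch below assumes the signal-to-noise gene score and the weighted voting scheme usually associated with the Slonim et al. paper. The function names, the `n_genes` parameter, the use of the population standard deviation, and the choice to select genes by absolute score are all assumptions made for this example; check them against the paper before relying on them.

```python
import numpy as np

def signal_to_noise(X, y):
    """Per-gene score (mu1 - mu2) / (sigma1 + sigma2) and class midpoint.

    X: samples x genes array, y: array of class labels (1 or 2).
    A positive score means the gene tends to be higher in class 1.
    Assumes every gene has non-zero spread in at least one class.
    """
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    X1, X2 = X[y == 1], X[y == 2]
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    s1, s2 = X1.std(axis=0), X2.std(axis=0)
    return (mu1 - mu2) / (s1 + s2), (mu1 + mu2) / 2.0

def weighted_vote(X_train, y_train, X_test, n_genes=10):
    """Predict class 1 or 2 for each test sample by weighted voting."""
    score, midpoint = signal_to_noise(X_train, y_train)
    # Assumption: keep the n_genes genes with the largest absolute score.
    top = np.argsort(-np.abs(score))[:n_genes]
    X_test = np.asarray(X_test, dtype=float)
    # Each selected gene votes with weight score * (distance from midpoint).
    votes = score[top] * (X_test[:, top] - midpoint[top])
    return np.where(votes.sum(axis=1) > 0, 1, 2)
```

With the real data, `n_genes` would have to be chosen by you, for example by holding out part of the training samples and checking how well they are classified.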

Data

The file train+test contains samples from both training data and test data, in the form of comma-separated values, from a two-class data set.

The first column is an integer sample id. The first sample is 1 and the last sample is 200.

The second column indicates the sample class. For training data, it says "1" or "2" here, and for the test data, which you are to classify, the unknown class is indicated by "*".

Then there are 50 more columns with "expression values" from 50 genes. Some genes are typically upregulated in class 1, and others are more likely to be high in class 2. Some genes are simply "noisy", i.e., they have the same distribution in both class 1 and class 2.
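The column layout described above could be read along the following lines. This is only a sketch: it assumes the file has no header row, and the helper names `parse_samples` and `split_train_test` are made up for this example.

```python
import csv

def parse_samples(lines):
    """Parse CSV rows into (ids, labels, values).

    A label is None where the file has '*' (unknown class).
    Assumption: no header row; columns are id, class, then gene values.
    """
    ids, labels, values = [], [], []
    for row in csv.reader(lines):
        if not row:  # skip blank lines
            continue
        ids.append(int(row[0]))
        cls = row[1].strip().strip('"')
        labels.append(None if cls == "*" else int(cls))
        values.append([float(v) for v in row[2:]])
    return ids, labels, values

def split_train_test(ids, labels, values):
    """Separate labelled training rows from unlabelled test rows."""
    rows = list(zip(ids, labels, values))
    train = [(i, c, v) for i, c, v in rows if c is not None]
    test = [(i, v) for i, c, v in rows if c is None]
    return train, test

# Usage, assuming the data file is in the working directory:
# with open("train+test", newline="") as f:
#     ids, labels, values = parse_samples(f)
```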

Your solution

Hand in a prediction of which samples belong to class 1 and class 2 in a file formatted with two columns: first the sample id (in the range 101-200) and then your class prediction ("1" or "2").

Also submit the implementation you used to run the experiment!
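A minimal sketch of producing such a two-column file is shown here. The column separator is not specified on this page, so the tab character used below is an assumption, as are the function name `format_predictions` and the output file name in the usage comment.

```python
def format_predictions(ids, predictions):
    """Return one 'id<TAB>class' line per sample, sorted by sample id.

    Assumption: columns are tab-separated; adjust if another
    separator is expected.
    """
    lines = []
    for sample_id, cls in sorted(zip(ids, predictions)):
        if cls not in (1, 2):
            raise ValueError(f"sample {sample_id}: class must be 1 or 2, got {cls!r}")
        lines.append(f"{sample_id}\t{cls}")
    return "\n".join(lines) + "\n"

# Usage (hypothetical file name):
# with open("predictions.txt", "w") as f:
#     f.write(format_predictions(test_ids, predicted_classes))
```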

You can email your results to Lasse.

Copyright © Page maintainer: Lars Arvestad, arve@csc.kth.se <www-kurs@csc.kth.se>
Updated 2011-03-02