KTH, School of Computer Science and Communication (CSC), course DD2475

Information Retrieval, ir10

A course in Computer Science focusing on basic theory, models, and methods for information retrieval.

News

May 31: The course analysis can now be found under Course Analysis in the menu to the right.

March 16: Your comments and ideas are very valuable in improving and developing the course. Please take a few minutes to fill out the course survey.

March 11: Thank you for a successful poster session with interesting discussions! The grades on the project will be communicated to you via email in the coming week. When you are ready to present the remaining assignments, contact either Hedvig or Johan via email.

March 3: New information about the poster presentation has been added to the Project page. The time of the poster session has been changed to Friday, March 11, 10.00-12.00 in room 304, Teknikringen 14, level 3.

February 21: There is only one talk today at 13.15, by Jussi Karlgren. The second talk today, by Hercules Dalianis, is cancelled due to illness.

February 4: You can now reserve a time for examination of Assignment 3 (and, if you would like, for Assignments 1 and 2). Please book a 20-minute time slot for examination, using this Doodle web form. Book one time slot for each assignment you would like to present.

February 2: An article has been appended to the reading list of Lecture 11. Details about the articles can be found under Literature below.

January 10: We urge all students to complete Assignments 1 and 2 as soon as possible. Please book a 20-minute time slot for examination, using this Doodle web form. If you are unsure which assignments you have left, look in rapp.nada.kth.se/rapp. We want all students to be on time with the assignments because the examination of the project that follows them will involve a great deal of student interaction. March 11 is therefore a very hard deadline.

January 4: The projects have now been posted on the homepage. Please have a look at the instructions, and notify Hedvig via email about I) who is in your project group, II) which project you would like to work on. Several groups can work on the same project. Please plan your time so that the projects are ready (well) before March 11. Late projects have to wait until the next poster presentation (in the worst case, May 2012).

January 3: The date for the final project presentations has changed. The presentations will now take place on Friday, March 11, 8.00-12.00 in room 304.

January 3: Assignment 3 is now posted on the homepage, and will be examined February 8.

December 16: The exam results are now published in rapp.nada.kth.se/rapp and will soon be available in LADOK.

December 14: The exam, together with suggested answers, is now available on the homepage, under Written Exam in the menu. Grading is under way, and results are expected by the end of this week.

December 9: We urge all students to complete Assignments 1 and 2 before period 3 starts. Please send an email to Hedvig to set up a time for examination of the assignments. If you have already shown assignments to Johan and were asked to make completions, please contact him instead. If you are unsure which assignments you have left, look in rapp.nada.kth.se/rapp. We want all students to be on time with the assignments because the examination of the project that follows them will involve a great deal of student interaction. March 11 is therefore a very hard deadline.

December 7: Assignment 2 will be examined December 9. Please book a 20 min time slot for examination, using this Doodle web form. Read the instructions before booking a time!

December 7: The exam will only cover Manning Chapters 1-9, see Written Exam in the menu.

November 30: Sub-assignment 2.2 is now posted on the homepage.

November 24: Sub-assignment 2.1 is now posted on the homepage. Sub-assignment 2.2, the last part of Assignment 2, will be posted after the weekend. Assignment 2 will be examined December 9.

November 16: Assignment 1 can be examined on either November 18 or November 25. Please book a 20-minute time slot for examination, using this Doodle web form. Read the instructions before booking a time!

November 15: Please re-read the instructions for sub-assignment 1.3, as there was an important piece of information missing before (now indicated in blue).

November 11: Sub-assignment 1.3 will be posted on the homepage later today. We are sorry for the late announcement. Please contact Hedvig Kjellström or Johan Boye if you have any questions about the sub-assignment. Assignment 1 will be examined next Thursday, November 18, 8.00-10.00 in computer hall Grå.

November 10: On Tuesday there is a talk of interest to you: Bending the curse of dimensionality. Björn Jónsson, Reykjavík University (www.ru.is/faculty/bjorn). Room Q26, November 16, 16.00.

October 28: You can now start with Assignment 1. Please go to Computer Assignments in the menu.

October 20: CSC has an administration system, Rapp, linked with LADOK. There, you can keep track of individual examination results before they are reported into LADOK. Please go to rapp.nada.kth.se/rapp and register as soon as possible.

Learning Outcomes

After completing the course you will be able to:
  • Explain the concepts of indexing, vocabulary, normalization and dictionary in Information Retrieval
  • Define a boolean model and a vector space model, and explain the differences between them
  • Explain the differences between classification and clustering
  • Discuss the differences between different classification and clustering methods
  • Choose a suitable classification or clustering method depending on the problem constraints at hand
  • Implement classification in a boolean model and a vector space model
  • Implement a basic clustering method
  • Give an account of a basic spectral method
  • Evaluate information retrieval algorithms, and give an account of the difficulties of evaluation
  • Explain the basics of XML and Web search

Content

Basic and advanced techniques for information systems: information extraction; efficient text indexing; indexing of non-text data; Boolean and vector space retrieval models; evaluation and interface issues; XML, structure of Web search engines; clustering, classification; spectral methods, random indexing; data mining.

Literature

Required Text Book

  • C. D. Manning, P. Raghavan and H. Schütze, Introduction to Information Retrieval, Cambridge University Press, 2008.
The book can be ordered from your favorite internet bookstore, and found using ISBN 0521865719. Virtually all material from the book is also available online at nlp.stanford.edu/IR-book/information-retrieval-book.html.

Required Article

  • U. von Luxburg, A tutorial on spectral clustering, Statistics and Computing 17(4), 2007.
The article can be found online at arxiv.org/pdf/0711.0189.

Optional Books

  • I. Witten, A. Moffat and T. Bell, Managing Gigabytes, Morgan Kaufmann, 1999.
Useful as a reference for technical Information Retrieval in the first half of the course. Available online at books.google.com.
  • S. Marsland, Machine Learning: An Algorithmic Perspective, Taylor and Francis, 2009.
Useful as a reference for topics related to Machine Learning, classification, clustering and probability. Available online at books.google.com.
  • S. Chakrabarti, Mining the Web, Morgan Kaufmann, 2003.
Covers many topics in the last part of the course. Available online at books.google.com.

Other Resources

To get an idea of the state of the art in Information Retrieval research and development, take a look at the program of the annual ACM SIGIR conference.

Examination

Assignments

The examination in the course is performed through:
  • Three computer assignments (3 credits). The computer assignments are performed in groups of two students, and presented orally at the computer. Grade (normally the same for all group members): P(pass) / F(fail).
  • A written exam (3 credits). The exam is 5 hours long and takes place after the first half of the course, in December. Grade: A - F(fail).
  • A project assignment (3 credits). The projects are performed in groups of two students, and presented with a short written report, as well as an oral poster presentation. Grade (normally the same for all group members): A - F(fail).
Details about the assignments themselves can be found under Written Exam, Computer Assignments and Project in the menu.

Grading

Course grades are assigned as follows (CA = computer assignment grade, WE = written exam grade, PA = project assignment grade):
If CA = F, WE = F or PA = F, that part of the course has to be re-examined until CA = P, WE >= E and PA >= E. The course grade is the average of WE and PA, according to the following table:

WE:        A    B    C    D    E
PA = A     A    A    B    B    C
PA = B     A    B    B    C    C
PA = C     B    B    C    C    D
PA = D     B    C    C    D    D
PA = E     C    C    D    D    E
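
The grade table above can also be read as a small calculation. The sketch below is only an illustration, not an official grading tool: the numeric scale (A = 5 down to E = 1) and the rule of rounding a half-step average toward the better grade are assumptions inferred from the table entries, and the function name course_grade is hypothetical.

```python
# Assumed numeric scale; the course page itself only gives letter grades.
GRADE_POINTS = {"A": 5, "B": 4, "C": 3, "D": 2, "E": 1}
POINT_GRADES = {points: grade for grade, points in GRADE_POINTS.items()}


def course_grade(ca, we, pa):
    """Return the course grade, or None if some part must be re-examined.

    ca: computer assignment grade, "P" or "F"
    we: written exam grade, "A" to "F"
    pa: project assignment grade, "A" to "F"
    """
    # A failed part (CA = F, WE = F or PA = F) means no course grade yet.
    if ca != "P" or we not in GRADE_POINTS or pa not in GRADE_POINTS:
        return None
    total = GRADE_POINTS[we] + GRADE_POINTS[pa]
    # Average of WE and PA, with halves rounded toward the better grade
    # (assumption inferred from the table, e.g. WE = A, PA = B gives A).
    return POINT_GRADES[(total + 1) // 2]


print(course_grade("P", "A", "E"))  # prints C, matching the table
print(course_grade("P", "F", "A"))  # prints None: the exam must be retaken
```

If the table and this calculation ever disagree, the table is what applies.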

Copyright © Page maintainer: Hedvig Kjellström <hedvig@nada.kth.se>
Updated 2011-05-31