Search Engines and Information Retrieval Systems, ir12
A course in Computer Science focusing on basic theory, models, and methods for information retrieval.
If you took DD2475 ir10, and have lab assignments and/or the project left, you are very welcome to finish these during the spring of 2012. Please contact Hedvig Kjellström (see People in the meny) to let us know that you are following the course.
News
February 14: You can now book a time slot for presenting your solutions to Computer Assignment 2, using this Doodle. The presentation takes place in front of a computer in Spelhallen on February 24. All members of the group have to be present, and be prepared to answer questions on all parts of the assignment.
January 30: We have rearranged the order of lectures 3-5, to give you the theory for Assignment 2 earlier - tomorrow, we will go through ranked retrieval. Sorry about the late notice! (The reason for the old order of lectures was that index compression was a part of assignment 1 last year, and had to be covered as early as possible in the lecture series.)
January 27: There was a bug in the MegaIndex class in computer assignment 1.4, that caused merged indexes not to be saved. This bug is now corrected. Please download the new version!
January 24: You can now book a time slot for presenting your solutions to Computer Assignment 1, using this Doodle. The presentation takes place in front of a computer in Sporthallen on January 31. All members of the group have to be present, and be prepared to answer questions on all parts of the assignment.
October 21: The homepages are now up and running. The course has changed names (and course code from DD2475 to DD2476), but cover essentially the same content, with a slight shift in focus towards web search, with e.g. a deeper coverage of linked document retrieval.
Learning Outcomes
After completing the course you will be able to:
- explain the concepts of indexing, vocabulary, normalization and dictionary in Information Retrieval,
- give an account of different text similarity measures, and select a similarity measure suitable for the problem at hand,
- define a boolean model and a vector space model, and explain the differences between them,
- implement a method for ranked retrieval of a very large number of documents with hyperlinks between them,
- evaluate information retrieval algorithms, and give an account of the difficulties of evaluation,
- give an account of the structure of a Web search engine.
Content
Basic and advanced techniques for information systems: information extraction; efficient text indexing; indexing of non-text data; Boolean and vector space retrieval models; evaluation and interface issues; structure of Web search engines.
Literature
Required Text Book
- C. D. Manning, P. Raghavan and H. Schütze, Introduction to Information Retrieval, Cambridge University Press, 2008.
The book can be ordered from your favorite internet bookstore, and found using ISBN 0521865719. Virtually all material from the book is also available online at nlp.stanford.edu/IR-book/information-retrieval-book.html.
|
|
Article
- K. Avrachenkov, N. Litvak, D. Nemirovsky and N. Osipova, Monte Carlo Methods in PageRank Computation: When One Iteration is Sufficient, SIAM Journal on
Numerical Analysis 45(2), 2007.
Only Sections 1-2 of this article are covered in the course.
Useful as a reference for topics related to Machine Learning, classification, clustering and probability. Available online at
books.google.com.
- S. Chakrabarti, Mining the Web, Morgan Kaufmann, 2003.
Covers many topics in the last part of the course. Available online at
books.google.com.
Other Resources
To get an idea of state-of-the-art in Information Retrieval research and development, take a look at the program of the annual conference ACM SIGIR.
Examination
Assignments
The examination in the course is performed through:
- Two computer assignments (3 credits). The computer assignments are performed in groups of two students, and presented orally by the computer. Grade (normally the same for all group members): P(pass) / F(fail).
- A written exam (3 credits). The exam is 5 hours long and takes place after the first half of the course, in December. The written exam can not be taken before the computer assignments are graded with P(pass). Grade: A - F(fail).
- A project assignment (3 credits). The projects are performed in groups of two students, and presented with a short written report, as well as an oral poster presentation. Grade (normally the same for all group members): A - F(fail).
Details about the assignments themselves can be found under Written Exam, Computer Assignments and Project in the menu.
Grading
Course grades are assigned according to the following (CA = computer assignment grade, WE = written exam grade, PA = project assignmnent grade):
If CA = F, WE = F or PA = F, that part of the course has to be re-examined, until CA = P, WE >= E and PA >= E. The course grade is the average of WE and PA, according to the following:
| WE | A | B | C | D | E |
| PA | | | | | | |
| A | | A | A | B | B | C |
| B | | A | B | B | C | C |
| C | | B | B | C | C | D |
| D | | B | C | C | D | D |
| E | | C | C | D | D | E |
The grading criteria for the different learning outcomes are:
| Outcome |
E |
D |
C |
B |
A |
|
| Explain the concepts of indexing, vocabulary, normalization and dictionary in Information Retrieval |
X |
X |
X |
X |
X |
| Examined through laboration 1 and exam |
|
| Give an account of different text similarity measures, and select a similarity measure suitable for the problem at hand |
X |
X |
X |
X |
X |
| Examined through exam and project |
|
| Define a boolean model and a vector space model, and explain the differences between them |
X |
X |
X |
X |
X |
| Examined through laboration 1, 2 and exam |
|
| Implement a method for ranked retrieval of a very large number of documents with hyperlinks between them |
X |
X |
X |
X |
X |
| Examined through laboration 2 and exam |
|
| Evaluate information retrieval algorithms, and give an account of the difficulties of evaluation |
X |
X |
X |
X |
X |
| Examined through exam and project |
|
| Give an account of the structure of a Web search engine |
X |
X |
X |
X |
X |
| Examined through exam and project |