Search Engines and Information Retrieval Systems, ir13
A course in Computer Science focusing on basic theory, models, and methods for information retrieval.
If you took DD2476 ir12, and have lab assignments and/or the project left, you are very welcome to finish these during the spring of 2013. Please contact Hedvig Kjellström (see People in the menu) to let us know that you are following the course.
May 30: Thanks again for your effort with the projects - it was loads of fun to see your poster presentations!
May 2: Tomorrow, May 3 at 10.15 in E3, there is a guest lecture by Simon Stenström, Findwise. It is in the schedule on the homepage, but maybe not in TimeEdit? You are all very welcome!
May 2: The detailed schedule for the poster presentation is now on the homepage, please have a look at Project in the menu to the left. To find your group number, look in the file /info/DD2476/ir13/project/project_groups.txt on the CSC UNIX system.
April 25: It is not fun to have to be this explicit, but it is to be fair to the students that make an effort and do their part of the work. To ensure that all students participate actively in the projects, we have decided the following:
March 22: One more project has been added to the list, #6 from Spotify. Please adjust your project ranking lists if you want to include this project, and send the new ranking list to me before midnight today, Friday March 22. (They can only host one group, so we might have to turn down many requests. If you select this project, please provide an account of the Python experience within the group - this will be one evaluation criterion.)
March 19: We have now decided upon the deadlines for assignment grades in April and May. There will be examination sessions similar to the one this afternoon. You are welcome to present your assignment then - or this afternoon!
March 18: The projects are now posted on the homepage, see Project in the menu to the left. The contact person of each group should send Hedvig an email with a ranked list of three project choices, before Friday March 22, at 12 noon. This list will then be compiled, and the project of each group announced in /info/DD2476/ir13/project/project_groups.txt on the CSC UNIX system.
March 8: The project groups have now been formed. (The grouping is done so that all group members have approximately the same grade on Assignments 1 and 2 - increasing the chance that you are on the same ambition level in this course. We have also tried to create as good a mix of Swedish/international and male/female students as possible.) You can find a list of all groups in the file /info/DD2476/ir13/project/project_groups.txt on the CSC UNIX system. Please contact the other members of your group asap, and decide upon one person who will be the contact. This person should then send Hedvig an email. The project proposals will come out next week, so it is good if you have talked amongst yourselves by then.
March 6: For Assignment 3, it is even more important than for Assignment 2 to understand the vector representation of documents and queries. Key concepts for Task 3.1 are centroid and vector addition. Read the book! (See the schedule for chapters corresponding to Lectures 4 and 6.) The book is freely available online, so money is no excuse not to read it...
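As a reminder of the two key concepts, here is a minimal Python sketch of vector addition and the centroid for documents stored as sparse vectors. The dictionary representation, the function names `add_vectors` and `centroid`, and the example weights are assumptions made for illustration; they are not taken from the assignment code.

```python
def add_vectors(a, b):
    """Component-wise sum of two sparse vectors stored as {term: weight} dicts."""
    result = dict(a)
    for term, weight in b.items():
        result[term] = result.get(term, 0.0) + weight
    return result


def centroid(vectors):
    """Mean of a non-empty list of sparse vectors (sum of all vectors divided by their count)."""
    total = {}
    for vec in vectors:
        total = add_vectors(total, vec)
    n = len(vectors)
    return {term: weight / n for term, weight in total.items()}


# Two hypothetical document vectors with made-up term weights.
docs = [
    {"search": 2.0, "engine": 1.0},
    {"search": 1.0, "index": 3.0},
]
c = centroid(docs)
```

A term that is missing from a document counts as weight 0 in that document, which is why "engine" and "index" are halved in the centroid of these two documents.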
February 25: The description of length normalization in Task 3.1 of Assignment 3 has been updated. Now it should be a bit clearer. You do NOT have to get the exact same ranking as we did! It depends on the tf/tf-idf representation, and on the document/query length representation.
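To see why these choices can change the ranking, here is a small Python sketch of two term weightings (raw tf and tf-idf) and two length normalizations (Euclidean length and number of tokens). The idf formula log(N/df), the function names, and the toy numbers are assumptions for illustration; the assignment description defines what is actually expected.

```python
import math


def tf_vector(tokens):
    """Raw term frequencies of a token list, as a {term: count} dict."""
    counts = {}
    for term in tokens:
        counts[term] = counts.get(term, 0) + 1
    return counts


def tf_idf_vector(tokens, df, n_docs):
    """tf-idf weights, assuming idf = log(N / df) and that df holds every term."""
    return {t: tf * math.log(n_docs / df[t]) for t, tf in tf_vector(tokens).items()}


def normalize_euclidean(vec):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    length = math.sqrt(sum(w * w for w in vec.values()))
    if length == 0:
        return dict(vec)
    return {t: w / length for t, w in vec.items()}


def normalize_by_token_count(vec, n_tokens):
    """Divide every weight by the number of tokens in the document."""
    return {t: w / n_tokens for t, w in vec.items()}


# Hypothetical document and collection statistics.
tokens = ["search", "search", "engine"]
tf = tf_vector(tokens)
tfidf = tf_idf_vector(tokens, df={"search": 1, "engine": 2}, n_docs=2)
unit = normalize_euclidean(tf)
per_token = normalize_by_token_count(tf, len(tokens))
```

The two normalizations scale the same document differently, so scores computed from them are not directly comparable, and documents of different lengths can swap places in the ranking.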
February 22: Assignment 3 is now published, see Computer Assignments in the menu to the left.
February 20: Assignment 3 is expected on Friday, after Hedvig's grant application deadline. It will comprise topics from Lectures 6 and 7, and build on the code that you developed in Assignments 1 and 2. Therefore, a good preparation for Assignment 3 is to tend to all comments about the Assignment 2 code that you got at the presentation yesterday!
February 17: Clarification concerning Assignment 2:
February 10: Clarification concerning Assignment 2:
February 7: Error in Assignment 2 now corrected. The power iteration page rank top 50 list was wrong before, due to a bug. This has been changed in the assignment description, Task 2.3.
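For reference, power iteration for PageRank can be sketched in Python as follows. This is a generic sketch, not the assignment solution: the damping factor 0.85, the uniform treatment of pages without outlinks, the stopping tolerance, and the toy graph are all assumptions, and the assignment description may specify different conventions.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=1000):
    """Power iteration on a graph given as {page: [pages it links to]}."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by pages without outlinks is spread uniformly over all pages.
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            targets = links.get(v, [])
            for t in targets:
                new[t] += damping * rank[v] / len(targets)
        diff = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if diff < tol:
            break
    return rank


# Hypothetical three-page graph.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(graph)
top = sorted(ranks, key=ranks.get, reverse=True)
```

A top-50 list is then just the first 50 entries of `top`. Each iteration keeps the ranks summing to 1, which is a useful sanity check when debugging.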
February 6: If you have not yet passed the oral examination of Assignment 1, please send an email to Hedvig and Johan to book a time for presentation.
February 4: Error in Assignment 2 now corrected. The union query "november eller december" should return 474 documents, not 472. This has been changed in the assignment description, Tasks 2.2 and 2.5.
January 25: Three clarifications concerning Assignment 1:
January 22: Assignment 1 is updated with changes in the grading criteria (slightly relaxed), as well as clarifications in Tasks 1.4 and 1.5. This has also been mentioned at today's lecture.
January 15: Please register yourself in the CSC result registration system Rapp. Do this as soon as possible so that we know how many are actively following the course.
January 3: TimeEdit is now updated with the correct rooms.
December 20: Due to the unexpectedly large number of participants, we have made some changes in the schedule. Lecture 1 has been moved two hours, to January 15, 15-17, and all lectures are held in bigger rooms. The changes can be seen in the Schedule in the menu to the left. Note that TimeEdit has not been updated yet - the schedule at these pages is the correct one!
December 20: Computer assignments 1 and 2 are now published, see Computer Assignments in the menu to the left. All information about deadlines, grade requirements, etc., can be found in the assignments, and on this homepage (bottom of main page, Schedule, and Computer Assignments). Computer assignment 3 is under development and will be published in January.
November 22: The homepages are now up and running. There are two changes in this year's course: The written exam has been replaced by a more in-depth lab course, and the labs are examined individually instead of in pairs.
After completing the course you will be able to:
Basic and advanced techniques for information systems: information extraction; efficient text indexing; indexing of non-text data; Boolean and vector space retrieval models; evaluation and interface issues; structure of Web search engines.
Other Resources
To get an idea of the state of the art in Information Retrieval research and development, take a look at the program of the annual conference ACM SIGIR.
Assignments
The examination in the course is performed through:
Grading
Course grades are assigned according to the following (CA = computer assignment grade, PA = project assignment grade):
The course grade is the weighted average of CA and PA, according to the following: