Schedule: Search Engines and Information Retrieval Systems, ir12
Period 3
| Where and When | Activity | Reading | Examination |
January 17 13.15-15.00 B22 |
Lecture 1: Introduction, Boolean retrieval, course practicalities Hedvig Kjellström, Johan Boye | Manning Chapter 1, 2 | |
January 24 13.15-15.00 L44 |
Lecture 2: Term vocabulary, dictionaries and tolerant retrieval Johan Boye |
Manning Chapter 2, 3 | |
January 31 13.15-15.00 B22 |
Lecture 3: Scoring, weighting, vector space model Johan Boye |
Manning Chapter 6, 7 | |
January 31 15.00-17.00 Sporthallen |
Computer hall session Hedvig Kjellström, Johan Boye |
| Oral examination of Assignment 1 in front of computer |
February 7 10.15-12.00 E34 |
Lecture 4: Retrieval of documents with hyperlinks Johan Boye, Hedvig Kjellström |
Manning Chapter 21 Avrachenkov Sections 1-2 | |
February 14 10.15-12.00 D33 |
Lecture 5: Index construction, index compression Johan Boye |
Manning Chapter 4, 5 | |
February 21 10.15-12.00 Q22 |
Lecture 6: Evaluation, relevance feedback, query expansion Hedvig Kjellström |
Manning Chapter 8, 9 | |
February 24 13.00-15.00 Spelhallen |
Computer hall session Hedvig Kjellström, Johan Boye |
| Oral examination of Assignment 2 in front of computer |
February 28 10.15-12.00 D41 |
Lecture 7: Probabilistic information retrieval, language models Hedvig Kjellström |
Manning Chapter 11, 12 | |
March 15 14.00-19.00 V34 |
Written exam |
| Written exam |
Period 4
| Where and When | Activity | Reading | Examination |
March 21 13.15-15.00 L31 |
Lecture 8: Classification Hedvig Kjellström |
Manning Chapter 13.1-4, 14.1-3, 15.3 | |
March 28 13.15-15.00 L31 |
Lecture 9: Clustering, web crawling Hedvig Kjellström |
Manning Chapter 16.1-4, 17.1-3, 19, 20 | |
April 11 15.15-17.00 L44 |
Lecture 10: Some useful additions to a search engine, Random Indexing Viggo Kann |
Magnus Sahlgren: An Introduction to Random Indexing (2005)
http://www.sics.se/~mange/papers/RI_intro.pdf | |
April 18 13.15-15.00 B24 |
Lecture 11: Guest lectures Simon Stenström, Findwise
Information Retrieval and Findability
The presentation will describe the process of building a search system rather than a search engine. There are many search engines out there, but a search engine is just one small part of a Findability solution. By a Findability solution, we at Findwise mean both using the full potential of search technology and focusing on the four other critical dimensions of Findability: Business, Users, Information and Organization. This presentation will, however, focus on the technical parts, such as the architecture of a search system. It will explain what we did to handle a real-world search scenario at Uppsala University, starting from unstructured files and ending up with a usable search system with an understandable user interface. You can see the result at http://search.uu.se/en/.
Oscar Täckström, SICS
Sentiment Analysis
In the last ten years, sentiment analysis has grown from a rather obscure sub-field of natural language processing into a highly productive and diverse field of research with a growing business impact. The basic assumption of all approaches to sentiment analysis is that we can learn something about people's attitudes towards other people, things and ideas by looking at what they write or say about them. The explosive growth of online media, such as blogs, micro-blogs and online fora, therefore provides a rich source of data from which we can learn more about people's attitudes and preferences. This knowledge is fundamental to, for example, brand management and business intelligence, but may also find use in sociology and political science.
In this lecture, I will give an overview of some important tasks and current approaches to sentiment analysis, focusing on the role of linguistic representations and tools from machine learning. Different levels of analysis require different tools, and I will spend some time discussing how to decide on the appropriate level and which tools to use for each level. Specifically, I will discuss how one can create and use polarity lexicons and the limits of lexicon-based approaches, and, if time permits, I will describe some recent work on using graphical models to model fine-grained sentiment in product reviews. |
April 25 13.15-15.00 B24 |
Lecture 12: Guest lectures Hercules Dalianis, SU DSV
Some applications using clinical corpora to assist the clinician, her managers and clinical research
Today a large number of Electronic Patient Records (EPRs) are produced for legal reasons, but they are very seldom reused, whether for clinical research or for business (hospital) intelligence. Moreover, the clinician’s daily work in documenting patient status is not always properly supported. Hospital management needs key, real-time information about health care processes. At the same time, patients have become more demanding customers who want to be involved in their own health care process. We aim to support these demands.
Clinical documentation is an abundant source of valuable information that can be used for this purpose; however, clinical corpora contain protected health information and must be stored securely. In Sweden alone (with a population of 10 million), 4-10 million pages of patient records are produced each year.
We have studied the Stockholm EPR Corpus, a huge clinical document collection written in Swedish, containing over one million patient records. The document collection is distributed over 900 clinics in the Stockholm area and spans the three years 2006-2008. We have used this clinical corpus as a knowledge base to develop a set of tools that can serve as basic building blocks for future health engineering tools. We have been assisted by physicians who have interpreted the content of the clinical text for us, annotated the clinical text, and, together with their colleagues, set requirements for these tools. We have identified four groups of users in the health domain: physicians, clinical researchers, hospital management and patients. We will show examples of these tools and the benefits they will bring to health care.
1) For physicians: automatic ICD-10 assignment; 2) for clinical researchers: comorbidity networks; 3) for hospital management: ICD-10 validation and adverse event detection; and finally 4) for patients: automatic text summarization.
Project homepage:
http://dsv.su.se/forskning/health/.
Magnus Rosell, Recorded Future
Text Clustering Exploration
Text clustering can be used to explore the contents of a text set. We have developed a visualization method that aids such exploration and implemented it in a tool called Infomat. It presents the representation matrix directly in two dimensions. When the order of texts and words is changed, for instance by clustering, distributional patterns appear that indicate similarities between texts and words. We have used Infomat to explore a set of free text answers about occupation from a questionnaire given to over 40 000 Swedish twins. The questionnaire also contained a closed answer regarding smoking. We compared several clusterings of the text answers to the closed answer, regarded as a categorization, by means of clustering evaluation. A recurring text cluster of high quality led us to formulate the hypothesis that "farmers smoke less than the average", which we could later verify by reading previous studies. This hypothesis generation method could be used on any set of texts that is coupled with data restricted to a limited number of possible values. |
May 16 10.00-12.00 Fantum, Lindstedtsv 24, floor 5 |
Project presentations |
| Written report hand-in Oral presentation in front of poster |