bild

Skolan för
elektroteknik
och datavetenskap

Sidan är under uppdatering.

Essay Project Proposals

Themes: Security, protocols and systems, programming languages, algorithms and complexity

Security

Title: Fuzzing

Theme: Security

Subject: Fuzzing is a technique for software vulnerability testing. The idea is to provide applications with ranges of valid or invalid input and in this way try to cause the application to act in an unexpected or faulty manner. An efficient fuzzer will give good coverage at small cost, i.e. it will generate as many and as diverse faults as possible while using as few and as small tests as possible.

The inputs may be randomly generated, or they may be guided by knowledge of the protocols and applications, and by heuristics, i.e. test generation strategies that are known to often give good results for the type of application under stress. Often, some form of grammar is used to generate the well-formed inputs and otherwise guide the test generation process.

Examples of fuzzers suitable for this project:

Peach, http://peachfuzzer.com/
SPIKE, http://www.blackhat.com/presentations/bh-usa-02/bh-us-02-aitel-spike.ppt
Sulley, see www.fuzzing.org

CAVEAT: *Never* attempt to perform vulnerability or penetration testing without proper authorization!

Possible essay projects:

Document and evaluate Peach. Understand and document the algorithm for test generation.
Develop and evaluate a fuzzer module in one of the major fuzzers below for some simple application protocol
Develop a fuzzing lab for a future 4th year course on software security

References:

Sutton, Greene, Amini: Fuzzing: Brute Force Vulnerability Discovery. Addison-Wesley 2007
Myers, Sandler, Badgett, Thomas: The Art of Software Testing. Wiley 200
www.fuzzing.org
Lectures on fuzzing at Polytechnic Institute of New York: http://cryptocity.squarespace.com/fuzzing/

Title: WebGoat, WebGoat Lesson Development

Theme: Security

Subject: WebGoat is a J2EE web application developed by OWASP, the Open Web Application Security Project, to teach web application security. The intention is to provide a safe but realistic teaching environment for students to experiment with and learn various types of attacks on web applications. Currently, WebGoat has lessons for many common attack types, such as SQL injection attack, cross-site scripting (xss) attacks, and much else.

Possible essay projects:

Evaluate WebGoat as a computer security teaching tool (for students with background in didactics)
Contribute a lesson or a patch to the WebGoat framework

References:

Stuttard, Pinto: The Web Application Hacker's handbook: Discovering and Explointing Security Flaws, Wiley 2007
Hope, Walther: Web Security Testing Cookbook: Systematic Techniques to Find Problems Fast, O'Reilly 2008
WebGoat documentation and web application security material available from www.owasp.org

Supervisor: Mads Dam

Title: Security Monitoring

Theme: Security

Subject: Security monitoring is a technique used to enforce security policies, for instance for access control. A very general and flexible approach to security monitoring uses automata to represent security policies. For instance, a security automaton with two states *unchecked* and *checked* can be used to define the policy that boarding of an aircraft should be allowed only after checkin has been properly completed.

In a recently completed EU project S3MS (www.s3ms.org) we have explored the use of security automata and automata inlining for Java and J2ME and a prototype inliner has been developed. A lot of open ends are left, however, which may be suitable for small student projects.

Possible essay projects:

Study the S3MS inlining tool, test and fix some bugs, propose and implement some minor extension such as a new data type, or adapt to a new application by adding new API calls
Examine the ConSpec policy format and study how suitable it is to define security policies for a not too large Java application of your choice. Does it contribute anything to WebGoat, for instance?
Propose some simple language extensions to make ConSpec more usable in practice. Or, study background information on web application security (see e.g. owasp.org) to perform a preliminary evaluation of how suitable security automata are to protect against a range of common web application attacks

References:

I. Aktug and K. Naliuka, ConSpec: A Formal Language for Policy Specification, Proc. FLACOS '07
http://www.csc.kth.se/~landreas/inlining/
S3MS inlining report: http://www.csc.kth.se/~irem/S3MS/Inliner/downloads/Inliner.pdf
http://www.csc.kth.se/~irem/S3MS/Inliner/

Supervisor: Mads Dam

Title: Secure Multiparty Computation Made Simple

Theme: Security

Subject: Secure multiparty computation (SMC) is the problem of securely distributing a computation among several, mutually distrusting parties, so that the computation nonetheless can be completed with strong guarantees that secrets are not revealed, and that, if one of the parties try to cheat by not completing a computation step properly, that party will be discovered. General results by Ben-Or, Goldreich, Widgerson and others show that any computation can be secretly shared but the contructions are complex and not very efficient. A recent paper by Maurer [Maurer-06] gives a particularly simple construction which is nonetheless powerful and may be suitable for essay work for mathematically inclined students.

Possible essay projects:

Review a selected paper in the SMC area and reprove the results.
Study the Maurer paper and test the proposed approach by implementing the algorithms (to the extent time allows).

References:

Maurer: Secure multi-party computation made simple. Discrete Appl. Math. vol 154, pp 370-381. DOI: http://dx.doi.org/10.1016/j.dam.2005.03.020

Supervisor: Mads Dam

Title: Secure execution of python

Theme: Security

Subject: Executing code in a controlled environment, a sandbox, is a way of handling the problem of executing untrusted code. People download web pages containing Javascript code that is run in the browser, for example. The Java programming language was designed with a specific security solution that has been refined that allows you to place restrictions on what a program may do when you execute it. There are at least three approaches to the problem: restrict the language so that code cannot do dangerous things, restrict the environment in which it executes so that any damage is limited or monitor what it does an allow or deny it access to resources during run-time.

Python is becoming an increasingly popular language. Investigate various solutions to safe execution of python code. Test some approaches and make a recommendation for how python programs might be sandboxed if language support for python were to be added to Kattis.

References:

A starting point. How can I run an untrusted Python script safely (i.e. Sandbox)
Kattis

Supervisor: Mikael Goldmann

Title: Hashing schemes

Theme: Algorithms and data structures

Subject: Implementations of dictionary data structures often rely on binary trees or hash tables. Hashing requires collision handling for which there are several strategies. The most common variants of are hashing with chaining and open addressing. Both come with an average case constant time lookup but no strong guarantees on worst case performance. Newer schemes such as perfect hashing and cuckoo hashing are aimed at having good worst case performance as well.

Study several hashing schemes, including at least one of the two newer ones mentioned above. Devise benchmarks for testing their efficiency under various mixes of dictionary operations apply the benchmarks to dictionary implementations. Find out how dictionaries are supported in some common computer languages, and hence what kind of performance one might expect.

References:

Supervisor: Mikael Goldmann

Title: Proof of work

Theme: Security

Subject: One way to try to avoid DOS attacks, brute force attacks or having someone use a mail server for mass email (SPAM) is to use a proof of work protocol. Essentially it involves the server not processing a client request right away. Instead it sends a puzzle to the client and the client needs to return a solution before the request is processed. The idea is that checking the solution should be much simpler than solving the problem, thereby limiting the rate at which the server would process requests from the client while not overloading the server with the task of checking solutions.

Study, present and implement a proof of work protocol. Explain assumptions and examine performance.

References:

Supervisor: Mikael Goldmann

Title: Secure multipatry computation

Theme: Security

Subject: In the design of secure cryptographic protocols, one important aspect is to define what it would mean for the protocol to be secure — without a definition of security, it is of course impossible to prove that a protocol is secure. The notion of an ideal functionality can be intuitively described by an example. Consider the following sealed bid auction: There is a trusted person to whom all others can send a bid over a secure channel. Once the auction is over the winner is the one who handed in the highest bid, and that person pays the value of the second highest bid. The trusted person announces the winner and the value of the second highest bid and the bidders do not learn anything else about the other bids. This would be our ideal functionality.

Now, let us say that we want to remove the trusted person and instead use cryptography. We then want to prove that the cryptographic protocol behaves the same as the ideal functionality, i.e., that (for all practiacal purposes) the outcome is the same and the information leaked to the participants is the same.

Viff is a python framework for implementing secure multiparty computation. Use it to implement some simple examples. Document your examples and the properties of Viff.

References:

VIFF, the Virtual Ideal Functionality Framework

Supervisor: Mikael Goldmann

Title: Cognitive authentication schemes

Theme: Security

Subject: Cognitive authentication schemes are an alternative to password authentication. Passwords have the advantage of being simple to use as they only require you to remember a secret, but they are vulnerable to eavesdropping attacks such as key loggers that might be installed on a computer that you do not yourself control. As an alternative, there are schemes relying on the user to memorize a few images and the system then prompting the user with a set of images, some of which are among the one the user has memorized. A simplified description would be that the system presents the user with a matrix of images and the user responds by telling the system which rows contains images from the user's set. Repeating this a number of times decreases the likelihood that an attacker would guess the correct rows each time. After, say, 10 successful challenges the user is authenticated.

Study such a scheme and either implement it and do usability tests or study and implement attacks on such a scheme. The report should describe the studied system, your tests and your findings.

References:

Daphna Weinshall: Cognitive Authentication Schemes Safe Against Spyware (Short Paper) (this paper should be available from KTH's network)
Golle and Wagner: Cryptanalysis of a Cognitive Authentication Scheme
Sonia Chiasson Robert Biddle P.C. van Oorschot: A Second Look at the Usability of Click-Based Graphical Passwords

Supervisor: Mikael Goldmann

Protocols and Systems

Title: Google Wave

Theme: Protocols and systems

Subject: Google Wave is both a web application hosted by Google and a protocol for real-time communication and collaboration. Essentially, the wave protocol maintains a distributed tree which may contain text and various media types, and which users grow in collaboration. In real-time operation a wave is a form of chat, except that it offers a very dynamic and flexible form of collaboration. Waves may also be stored and replayed at a later time and be embedded in e.g. a blog.

Possible essay projects:

Document the Google Wave architecture, develop a simple wave gadget using the API's provided, and test the gadget in Googles wave sandbox
Document the Wave Federation architecture, the Wave Federation Protocol and data models.
Operational transforms is used by the Federation Protocol to ensure that users have a consistent view of the wave, despite the fact that updates are performed by different providers at different times. Document the state of the art in operational transforms and evaluate the approach taken by the Google Wave design.

References:

Google Wave introduction wave.google.com
The Wave Federation Protocol www.waveprotocol.org
Nichols, Curtis, Dixon, Lamping: High-latency, low-bandwidth windowing in the Jupiter collaboration system. Proc. Symp. User Interface Software and Technology 1995
Sun, Ellis: Operational transformation in real-time group editors: issues, algorithms, and achievements. Proc. 1998 ACM conference on Computer Supported Collaborative Work.

Supervisor: Mads Dam

Title: Data Serialization, parsing, deparsing - Protobuf, Thrift, Avro

Theme: Protocols and systems

Subject: Many applications need to convert structured data to serialized form, for instance for data communication. Serialization tools such as Protobuf, Thrift, or Hadoop Avro take as input a description of a data structure, or data schema. The tools then produce methods to encode to and decode from binary representation, including things like getters and setters to access individual fields.

Possible essay projects:

Document one of the tools, its data structure definition mechanism, and the encoding/decoding algorithm
Evaluate one of the tools on some example application
Compare two of the tools
Use a fuzzer (see project proposals on fuzzing) to test one of the tools
Contribute a patch or addon to one of the tools
Formalize the syntax of one of the tools data structure/schema definition language. Generate lexer/parser for it using one of the common lexer/parser generators.

References:

Protobuf: http://code.google.com/p/protobuf/
Thrift: http://incubator.apache.org/thrift/
Avro: http://hadoop.apache.org/avro/docs/current/

Supervisor: Mads Dam

Title: Hadoop

Theme: Protocols and systems

Subject: Apache Hadoop is a large and complex framework for building large scale, reliable distributed systems including database support for large tables in the style of Google's bigtable, a distributed file system, systems management tools, and an implementation of Google's MapReduce framework for processing large amounts of data in parallel. Each of the Hadoop subprojects easily offers study material for several essay projects, and can cater for both students that are practically and theoretically inclined.

Project ideas:

Study and document the architecture and design of one of Hadoop's subsystems such as HBase, HDFS, MapReduce, or Pig. The ambitious might want to join one of the subprojects anf contribute a patch or something.

References:

The Hadoop project homepage http://hadoop.apache.org/
White: Hadoop: The definitive guide. O'Reilly 2009
Dean, Ghemawat: MapReduce: Simplified Data Processing on Large Clusters. Proc. OSDI'04.
Chang et al: Bigtable: a distributed storage system for structured data. Proc. OSDI'06
Olston et al: Pig Latin: A Not-So-Foreign Language for Data Processing. Proc. SigMod'08

Supervisor: Mads Dam

Title: Mechanical Turk

Theme: Protocols and systems

Subject: Mechanical Turk (or Mturk) is a work infrastructure, provided by Amazon, that allow people to perform quick tasks over the web for a small payment. The tasks (called "HITs", "Human Intelligence Tasks") are typically things that computers do poorly, like tagging pictures or assessing the effectiveness of commercial slogans. The people that perform the tasks (the "Workers"), sign in and choose a HIT they like and are qualified to do. The business that has created the task (the "Requester") checks the quality of the performed work, and pays the helper if the work is well done.

Possible essay project:

Assess the MTurk infrastructure. Sign up as a worker and perform a number of HITs. What kind of tasks are appropriate as HITs? Suggest improvements and extensions. The Mechanical Turk has been critized for being an unethical "virtual sweatshop". What does this mean exactly? Formulate your own view and argue for it.

References:

https://www.mturk.com/mturk/welcome

Supervisor: Johan Boye

Programming Languages

Title: A Prototype Implementation of Featherweight Java in Prolog

Theme: Programming languages

Subject: Featherweight Java is a minimal version of Java designed for research into Java like languages. The key features of Featherweight Java - syntax, type system, computation rules - are formalized by simple rules that can be implemented in a logic programming language such as Prolog (or Java, or a functional language of your choice. Prolog is convenient for the purpose, though) without too much effort. One or more projects can be defined focusing on one or more of parsing, type checking, and interpreting (computing).

Project ideas:

Document the Featherweight Java language, give examples of how the typing and computation rules work, give a high-level implementation of either or both the typing and computation rules in Prolog, demonstrate the implementation on one or two small example programs.
Extend Featherweight Java with one or two primitive datatypes. Propose rules for typing and computation and give examples.

References:

Igarashi, Pierce, Wadler: Featherweight Java: A minimal core calculus for Java and GJ. ACM Transactions on Programming Languages and Systems 2001.
Clocksin, Mellish: Programming in Prolog. Springer 2003

Supervisor: Mads Dam

Title: Software model checking using BLAST or SPIN

Theme: Programming languages

Subject: Model checking is based on the idea of automatically traversing a program execution graph to determine some desired property such as: Is there a way for program line 47 to be executed? Will execution eventually terminate? Does the equation x = y^2 - tmp always hold? Since these problems are generally undecidable, answers cannot be guaranteed, neither can they be guaranteed to be obtainable in reasonable time. Even so, with good engineering and using suitable abstractions, good results can nonetheless often be obtained. Examples of model checkers are BLAST, for C programs, and SPIN which is mainly applied to communications software or protocol modeling.

Project ideas:

The goal of an essay project on software model checking would be to learn a model checking tool such as BLAST or SPIN and apply it to a selected application. Many examples exist in the literature which can serve a suitable starting points. It would be possible, for instance, to start from a worked example and augment it step by step by adding some additional features. It is important to keep in mind that complexity often increases very quickly, so by adding features simplifications of the existing code/models will very often be needed. So small increments are highly recommended.

References:

SPIN: http://spinroot.com/spin/whatispin.html
BLAST: http://mtc.epfl.ch/software-tools/blast/index-epfl.php

Supervisor: Mads Dam

Title: Call by contract and SPEC#

Theme: Programming languages

Subject: In the call-by-contract paradigm, methods, objects, and classes are equipped with contracts - formal assertions - that describe how they are expected to behave. For instance, methods may be equipped with pre- and postconditions to determine properties that should be established upon method entry, resp. exit. Much work has been been done to realize such contract-oriented programming frameworks over the past 10-15 years. A well-known example for Java is the Java Modeling Language, JML, for which a number of prototype tools have been developed. See the reference below. Another example is SPEC# which extends the C# programming language with contracts. A related project is Code Contracts, also at Microsoft Research, which uses Microsoft's common intermediate language format to support contracts in a more language-independent way.

Project ideas:

Projects can be defined based on one of the JML tools, SPEC#, or Code Contracts. Many other tools with similar functionality exists, including some developed here. A suitable project will be to select one of the above tools and learn it well enough to apply it to a selected application. Given the time available for the work, it is better to start from something modest and add complexity, than the other way around. One possible outcome for the ambitious might be to produce a lab suitable for a future advanced programming languages course.

References:

JML: http://www.eecs.ucf.edu/~leavens/JML/
SPEC#: http://research.microsoft.com/en-us/projects/specsharp/
Code contracts: http://research.microsoft.com/en-us/projects/contracts/default.aspx

Supervisor: Mads Dam

Title: Makumba

Theme: Web development frameworks/Programming languages

Subject: Makumba is a web development framework for programmers who know HTML and some query language like SQL. A novice programmer can start working with Makumba after a training of 1-2 hours. Makumba renders good scalability for teams and projects: a 10-person team (with membership changing each year) developed an intranet with several thousand features over the course of 8 years. Some of the rules of Makumba applications are written in Java, but the bulk of the application is done in MQL (Makumba's query language, a subset of Hibernate HQL) and HTML. Makumba is available in JSP (Java Server Pages) as a "tag library". Since most aspects of development (viewing data, entering and modifying data, authentication, authorization) can be done in Makumba using queries or query fragments, Makumba is said to be query-centric.

Possible essay projects:

Examine the Makumba architecture, program a small application and suggest improvements to the Makumba architecture and language (tag library, data definition language, query language, query fragments, etc).
Program a small application with Makumba and a with a web development framework of your choice (e.g. Ruby on Rails) that you were not familiar with. Compare the two applications in regard to speed of development, code intuitivity, code scalability, team scalability (i.e. how the application would fare if maintained by a larger group whose members change often)
Makumba was designed in the web 1.0 days. Suggest how Makumba can be improved for web 2.0, beyond the <mak:section> tag
Suggest how Makumba could be re-implemented on top of more modern web development platforms such as Java Server Faces, without losing its query-centric and novice-friendliness principles
Consider how a query-centric approach like Makumba's can be used to program a non-web (GUI) application.

References:

Bogdan, C., Mayer, R., Makumba: the Role of Technology or the Sustainability of Amateur Programming Practice and Community, in Proceedings of the Fourth Communities and Technologies Conference, Penn State, June 2009
www.makumba.org
http://rubyonrails.org/
http://java.sun.com/products/jsp/
http://java.sun.com/javaee/javaserverfaces/

Supervisor: Cristian Bogdan

Language Technology

Title: Linguistic steganography

Theme: Language technology

Subject: Steganography is the technique of hiding secret messages. For instance, if one writes a message with lemon juice on a piece of paper, the paper looks blank until it is heated, when the message magically re-appears. Another example is the technique (used during the second world war) of shrinking a text to a tiny dot, and then hiding the dot on top of a full stop in an innocuous letter.

A recent twist of the same idea is linguistic steganography, where secret machine-readable data is encoded as a seemingly innocuous natural-language text. This can be done either by generating the natural-language text from scratch in such a way that it encodes the secret message, or alternatively by modifying an existing text. In the latter case, the encoding is made by making cleverly substituting sentences, words or characters in the original text.

Possible essay project:

Design and describe a method of generating steganograms of secret messages. Preferably, the steganograms should look like realistic texts, e.g. spam mails, or news items one might find in "Metro" or "Aftonbladet".

References:

Towards linguistic steganography: A systematic investigation of approaches, systems and issues. http://richard.bergmair.eu/pub/towlingsteg-rep-inoff-b5.pdf

Supervisor: Johan Boye

Title: Automatic translation

Theme: Language technology

Subject: Online tools for automatic translation, like Google Translate and BabelFish, are getting increasingly better. But how good are they? And what methods do they use?

Possible essay project:

Try out an available translation program, like Google Translate of BabelFish, on a number of Swedish texts of different genres. Compare their proper English translations with the results from the translation programs. (Also try translation from English to Swedish). Analyze the strong and weak points of the programs. Which kinds of texts are most amenable to automatic translation? Is it possible to figure out details of the translation algorithms by analyzing their mistakes?

References:

D. Jurafsky and J. Martin. Speech and Language Processing, chapter 21. Prentice-Hall, 2000.
Statistical machine translation. http://www.statmt.org/
Google Translate http://www.google.se/language_tools?hl=sv
BabelFish http://babelfish.yahoo.com/

Supervisor: Johan Boye

Title: Co-reference resolution experiment

Theme: Language technology

Subject: Words or expressions in a text that refer to the same object are said to "co-refer". Finding co-referring expressions is an important subproblem in some natural-language applications, such as information retrieval. As an example, consider the following text snippet:

"När Babbage övergav sin första maskin tappade staten förtroendet för honom och beslöt att minska förlusterna genom att dra sig ur projektet. De hade redan satsat 17 470 pund, nog med pengar för att bygga ett par slagskepp." (from Simon Singh, "Kodboken", page 86, Norsteds förlag, 2000).

In this passage "Babbage" and "honom" are co-referring expressions, as are "staten" and "De".

Possible essay project:

Design and perform a co-reference experiment where you let humans find co-referring expressions in a number of sentences or paragraphs. Compare the results with the results you would obtain from one or two of the algorithms for automatic co-reference resolution described in the literature. (Note: You don't have implement the algorithm(s), but you can execute them "by hand"). Use the comparison to point out strong and weak points in the algorithm(s), and if possible suggest improvements.

References:

D. Jurafsky and J. Martin. Speech and Language Processing, chapter 18.1. Prentice-Hall, 2000.
C. Kennedy and B. Boguraev. Anaphora for everyone: Pronominal anaphora resolution without a parser. http://www.research.ibm.com/talent/documents/bran_anaphora_coling96.pdf

Supervisor: Johan Boye

Title: Detecting plagiarism

Theme: Language technology

Subject: Plagiarism is the fraudulent act of copying parts of someone else's text and presenting it as one's own work. Plagiarism in schools and universities is a bigger problem than one might think (e.g. see the statistics presented at http://www.plagiarism.org/). Consequently, there are several tools on the market that attempts to detect plagiarism. The question is how good these tools are.

Possible essay projects:

Experiment with a freely available program for plagiarism detection, such as the free version of PlagiarismDetect at http://www.plagiarismdetect.com. (Several commercial programs also have a 15-day free trial period.) Try the program on examples. Do the programs give you more than what you would get from simply typing sentences from the text into Google? Can you foil the programs by making some syntactical changes in the text? Is it perhaps even possible to design an algorithm that automatically transforms a text as to avoid being caught by plagiarism detectors? Use your results to suggest improvements to the plagiarism detection methods.
Design your own plagiarism detection method, implement it and conduct the experiments described in the previous bullet point.

References:

http://www.plagiarism.org
http://www.plagiarismdetect.com

Supervisor: Johan Boye

Title: Automatic classification of opinions

Theme: Language technology

Subject: Many companies worry about their public image. Therefore it is of great interest to be able to automatically sift through blogs and articles and count the number of occurrences where the a certain product is mentioned in a positive way, and compare it to the number of times it is mentioned in a negative way. It order to do this rapidly and to a low cost, it would be preferable to use a simple classification method, provided it gives sufficiently accurate results. One possibility is to use a "bag-of-words" approach, i.e. only take into account which words occur in the text, not their grammatical relations.

Possible essay project:

Use Weka or some other open-source machine learning toolkit to build a bag-of-words classifier for movie reviews, wine reviews or something similar. Each review in the training set could be classified with "positive", "negative", or "neutral". Try out various machine learning methods and see which one works best.

References:

I. Witten and E. Frank. Data Mining, 2nd edition. Elsevier, 2005.
Weka toolkit at http://www.cs.waikato.ac.nz/ml/weka/

Supervisor: Johan Boye

Title: Wizard-of-Oz studies

Theme: Language technology

Subject: A Wizard-of-Oz study is an experiment where people are asked to communicate with a system which they are led to believe are fully automatic, but is actually operated by a human being. In particular, this methodology has been extensively used when developing natural-language interfaces.

Possible essay project:

Describe the Wizard-of-Oz methodology, its possible areas of application, its strengths and weaknesses, ethical considerations, examples, etc.

References:

J. Kelly. An empirical methodology for writing user-friendly natural language computer applications. Proceedings of ACM SIG-CHI '83 Human Factors in Computing systems (Boston, 12-15 December 1983), New York: ACM, pp. 193-196. http://delivery.acm.org/10.1145/810000/801609/p193-kelley.pdf?key1=801609&key2=8487553621&coll=portal&dl=ACM&CFID=4504974&CFTOKEN=88992333 . The first description of the methodology.
N. Dahlbäck, A. Jånsson and L. Ahrenberg. Wizard of Oz studies -- why and how. 1st international conference on Intelligent user interfaces, 1993. http://www.ida.liu.se/~arnjo/papers/kbs.pdf
M. Wiren, R. Eklund, F. Engberg and J. Westerberg. Experiences of an In-Service Wizard-of-Oz Data Collection for the Deployment of a Call-Routing Application. Workshop on Bridging the Gap: Academic and Industrial Research in Dialog Technologies, 2007 http://delivery.acm.org/10.1145/1560000/1556336/p56-wiren.pdf?key1=1556336&key2=7037553621&coll=GUIDE&dl=GUIDE&CFID=71342487&CFTOKEN=63281950

Supervisor: Johan Boye

Artificial Intelligence

Title: Self-learning game player

Theme: Artificial intelligence

Subject: Reinforcement learning is a technique of letting computer programs learn what actions to take in various situations simply by trying out (some of) the possible actions and observing the result. Thus, reinforcement learning is "unsupervised" -- the program learns by interacting with the environment and is never explicitly told which action is the best one.

A game-playing program might use reinforcement learning to acquire its strategic knowledge. A way of bootstrapping that knowledge is to let the program play against itself. It is an interesting exercise to find out how good a program can become just by using this technique.

Possible essay project:

Implement a self-learning game player using some reinformcement learning method (e.g. Q-learning) for a simple board game (e.g. "luffarschack", "fyra i rad", "flip" http://www.cheapass.com/free/games/flip.html). Investigate how fast the program learns using various parameter settings, and how well it fares against an opponent which uses a different strategy.

References:

R. Sutton and R. Barto. Reinforcement learning. The MIT Press, 1998.

Supervisor: Johan Boye

Title: Is artificial intelligence possible?

Theme: Artificial intelligence

Subject: The question whether or not a machine can be intelligent has been hotly debated since the inception of the term "artificial intelligence" (and indeed well before that). Some of the most outspoken opponents have been the American philosopher John Searle, known for his "Chinese Room" thought experiment, and the British matematician Roger Penrose.

Possible essay project:

Summarize the arguments that have been put forward for and against the statement "Artificial intelligence is possible". (Note that this requires a definition of the term "artificial intelligence"; several definitions are possible). Formulate your own view and argue for it.

References:

A. Turing: "Computing machinery and intelligence", http://www.abelard.org/turpap/turpap.php (originally published in "Mind" 1950). The seminal paper in artificial intelligence; very readable. Contains the first description of the "Turing test" (called the "imitation game" in the article).
J. Searle: "Minds, brains and programs". http://web.archive.org/web/20071210043312/http://members.aol.com/NeoNoetics/MindsBrainsPrograms.html (originally published in Behavioral and Brain Sciences). Contains the original Chinese room argument.
S. Russell and P. Norvig. "Artificial intelligence -- A modern approach", chapters 1 and 26. Prentice-Hall, 1995.

Supervisor: Johan Boye

Algorithms and Problems

Title: Multi-skill call centers

Theme: Problem solving

Subject: Modern call centers often handle a large variety of calls. When a customer calls, he is first asked to give information concerning his reason for calling. This is usually made either via a push-button interface, or an interface that employs automatic speech recognition. Once the type of the call has been identified, the call is routed to a queue serviced by dedicated service agents.

On the other hand, many service agents have multiple skills and are capable of handling more than one type of call. Thus, in general the relation between call types and service agents is a many-to-many relation (i.e. calls of a given type can be handled by several different agents, and a given agent can handle serveral different call types).

Due to the ever-increasing proliferation of such multi-skill call centers, their organisation and properties has become an active research area. Two of the problems studied by computer scientists and mathematicians is the staffing an routing problems:

How many agents are needed, given quality constraints such as "The expected queue time must not exceed 20 seconds" or "The probability that a customer must wait for more than 1 minute must be smaller than 1%"?
How many skills, and what combination of skills, should the different agents have?
How should calls be routed in order to minimize waiting times for customers, and to minimize the time service agents are idle?

Possible essay project:

Survey and describe the state-of-the-art in this active research field, in particular with respect to the staffing and routing problems.

References:

G. Koole. Call center mathematics. http://www.cs.vu.nl/~koole/ccmath/book.pdf
Koole, G and Pot, A. (2008) An overview of routing and staffing algorithms in multi-skill customer contact centers. http://www.cs.vu.nl/~koole/articles/report06a/art.pdf
Wallace R. and Whitt W. (2005) A staffing algorithm for call centers with skill-based routing. Manufacturing and service operations management, Vol 7, issue 4, 276-294. http://pages.stern.nyu.edu/~gjanakir/WhittPaper.pdf

Supervisor: Johan Boye

Title: Page Ranking Algorithms for the Web

Theme: Graph Algorithms

Subject: The GOOGLE search engine orders the pages found according to their PageRank, described in [1]. For a large graph, the computation of PageRank is a serious undertaking, and efficient algorithms are an active area for research.

Project ideas: A survey of methods are given in [3]. The essay shall explain the ideas behind the ranking, its mathematical model of the web, and discuss algorithms to compute the ranking, with [1] and [3] as background. Paper [2] proposes a pre-ordering which is a standard graph algorithm to find strongly connected graphs, which is also used in solving large, sparse linear systems of equations. The essay could review [2] in light of [1] and [3], and/or report on the group’s own experiments carried out with test matrices, see [2].

References:

[1] S.Brin, L.Page, R.Motwami, and T.Winograd, "The PageRank citation ranking: Bringing order to the web". Technical report, Computer Science Department, Stanford University, 1998, http://ilpubs.stanford.edu:8090/422/

[2] http://dsp.vscht.cz/konference_matlab/matlab05/prispevky/pultarova/pultarova.pdf

[3] A.N. Langville, C.D. Meyer, "A Survey of Eigenvector Methods for Web Information Retrieval", meyer.math.ncsu.edu/Meyer/PS_Files/Survey.pdf

Supervisor: Jesper Oppelstrup

Computer Graphics

Titel: Partikelsystem, Flockbeteende

Tema: Datorgrafik

Ämne: Partikelsystem bygger på modellering och visualisering genom att användning av ett större eller mindre antal partiklar som kan ge bilder av t.ex. fyrverkerier, rök, vattenkaskader etc. Tekniken kan också användas för att modellera flockbeteenden hos t.ex. fåglar som tilldelas vissa egenskaper för att hålla ihop, undvika kollisioner etc.

Möjliga kandidatprojekt:

Visualisering av rök med hjälp av partikelsystem
Visualisering av fyrverkerier med hjälp av partikelsystem
Flockbeteenden hos fåglar, fiskar etc

Referenser:

W. Reeves, Particle systems – a technique for modelling a class of fuzzy objects, Transaction on Graphics, April 1983
C. Reynolds, Boids, Background and Update, http://www.red3d.com/cwr/boids/

Handledare: Lars Kjelldahl

Titel: Visualisering av hår

Tema: Datorgrafik

Att visualisera hår är en utmaning för den som arbetar med datorgrafik. Bakgrunden är att det normalt är många hårstrån – huvudet på en människa har kanske 100 000 hårstrån. Hår är normalt dessutom delvis transparent och vidare kan man vara intresserad av att skapa rörliga sekvenser med hår (animering) och då blir det ännu fler problem att hantera.

Möjliga kandidatprojekt:

Översikt över problem och metoder för att rendera hår

Referenser:
Det finns artiklar som är mindre tekniska än nedanstående länkar

Erik Sintorn’s Homepage, Chalmers, http://www.ce.chalmers.se/~d00sint/Erik/Homepage.html
SIGGRAPH 2004 Course Slides, http://www.rhythm.com/~ivan/hairRender.html

Handledare: Lars Kjelldahl

Titel: Borttagning av skymda ytor

Tema: Datorgrafik

Ämne: Att ta bort skymda ytor är ett klassiskt problem inom datorgrafik. Problemet är alltså att man har en geometrisk modell av 3D-objekt, men ingen information om vad som är synligt när man betraktar objekten från en viss punkt. Vid utritningen av objekten på skärmen måste man ta reda på vad som syns och ska ritas ut och vad som inte syns och alltså inte ska ritas ut.

Möjliga kandidatprojekt:

Jämförelse av två algoritmer för borttagning av skymda ytor
Grundtekniker för borttagning av skymda ytor
Studium av en algoritm för borttagning av skymda ytor

Referenser:
Mycket finns skrivet och kan relativt enkelt hittas. En klassisk översikt ger fortfarande mycket information om grunderna:
I. E. Sutherland, R. F. Sproull, R. A. Schumacker, A Characterization of Ten Hidden-Surface Algorithms, ACM Computing Surveys, March 1974

Handledare: Lars Kjelldahl

Titel: Grafik med begränsade resurser

Tema: Datorgrafik

Ämne: Mycket dagens utveckling går mot bärbar utrustning, bl.a. i form av mobiltelefoner. Grafik blir allt vanligare och självklart men de bärbara utrustningarna innebär vissa problem med små skärmar, begränsad minnes- och processorkapacitet, batterikapacitet mm. I framtiden kan vissa av dessa problem minskas eller elimineras, men t.ex. den begränsade skärmstorleken är svårare.

Möjliga kandidatprojekt:

Tekniker för begränsad skärmstorlek hos bärbar utrustning
Grafikprogramvaror för bärbar utrustning

Referenser:
K. Pulli, T. Aarnio, V. Miettinen, K. Roimela, J. Vaarala, Mobile 3D Graphics with OpenGL ES and M3G, Morgan Kaufmann, 2008
Stanfordkurs om Iphone-programmering, http://www.stanford.edu/class/cs193p/cgi-bin/drupal/

Handledare: Lars Kjelldahl

Titel: Multidimensionell visualisering

Tema: Datorgrafik

Ämne: Vår vanliga värld är tredimensionell vilket vi använder när 3D-data ska presenteras. Det blir svårare när data har flera dimensioner. Man kan t.ex. använda färg för att ange ytterligare en dimension. Det här är en del av området informationsvisualisering och där arbetar man med presentationsformer som parallella koordinater, punktdiagram (scatter plots), mm
Exempel på system finns såsom Gapminder, Qlikview mm

Möjliga kandidatprojekt:

Visualisering av multidimensionella data med ”parallell coordinates”
Visualisering av multidimensionella data med ”scatter plots” (bubbeldiagram)
Visualisering av energidata

Referenser:

B. Spence, Information Visualization, design for interaction, Pearson, second edition, 2007
M. Jern, Interaktiv analytisk visualisering av SCB’s Energianvändning och CO2 utsläpp per invånare för samtliga kommuner i Sverige

Handledare: Lars Kjelldahl

Titel: Perceptuella aspekter på datorgrafik och visualisering

Tema: Datorgrafik

Ämne: Perception spelar roll för vad man ska visualisera i samband med datorgrafik. En vanligt förekommande standardteknik är att rita objekt på långt avstånd med lägre upplösning. Den tekniken kallas för ”level of detail” (LOD).
Det finns också andra tekniker för att utnyttja vad det mänskliga synsystemet inte uppfattar.

Möjliga kandidatprojekt:

Undersökning av begreppet LOD
Översikt av perceptuella aspekter på datorgrafik och visualisering
Tekniska aspekter på vad vi inte ser i animering

Referenser:

E. Angel, Interactive Computer Graphics, Pearson, 2006
L. Kjelldahl, A survey of some perceptual features for computer graphics and visualization, SIGRAD conference, 2003

Handledare: Lars Kjelldahl

Titel: Textur och ytdetaljer

Tema: Datorgrafik

Ämne: Normalt när man ritar ut datorgrafikbilder gör man geometriska modeller av de objekt som ska ritas ut. När man har mycket detaljer som t.ex. på en tegelvägg blir det praktiskt svårt (tidsödande) att göra modeller av alla detaljer. Då använder man i stället en slags tapet (textur) som man klistrar på den yta där detaljerna ska vara. Tekniken kallas för texture mapping och finns i olika varianter förutom vanlig standard texture mapping, finns bump mapping, environment mapping mm.

Möjliga kandidatprojekt:

Avbildningsproblem vid texture mapping
Tekniker för bump mapping

Referenser:

E. Angel, Interactive Computer Graphics, Pearson, 2006

Handledare: Lars Kjelldahl

Semantic Web

Title: Nepomuk

Theme: Semantic Technologies / Collaboration Technology

Subject: Nepomuk is the "social semantic desktop" and stems from an European project which put together state-of-the-art semantic technologies "to develop a comprehensive solution for extending the personal desktop into a collaboration environment which supports both the personal information management and the sharing and exchange across social and organizational relations". Nepomuk produced software (of different characters) for two platforms, the Java platform (an Eclipse application called PSEW) and Linux (the Nepomuk components are distributed with KDE).

Possible essay projects:

Use Nepomuk PSEW (Java/Eclipse) to crawl data from a desktop, and document the semantic technologies behind it. Use the Personal Information Management Ontology (PIMO) to "describe your world" and document how Nepomuk can help in semantically-enhancing your desktop. Develop a small semantically-enhanced application on top of Nepomuk APIs and comment on that experience.
Use Nepomuk KDE on a Linux machine and document the semantic technologies behind it. Use Personal Information Management Ontology (PIMO) or other Nepomuk ontologies to "describe your world" and document how Nepomuk can help in semantically-enhancing your desktop. Develop a small semantically-enhanced application on top of Linux Nepomuk APIs and comment on that experience.
Comparison between Nepomuk PSEW, Nepomuk KDE and related technologies like Gnowsis.
Comparison in PSEW between Nepomuk-simple and PIMO browser

References:

http://www.semanticdesktop.org/
Nepomuk, http://nepomuk.semanticdesktop.org/xwiki/bin/view/Main1/, see especially Publications and Deliverables
http://dev.nepomuk.semanticdesktop.org/
http://www.gnowsis.org/
http://www.kdenews.org/2009/12/10/exploring-new-nepomuk-features-mandriva-linux-2010
Slashdot articles on Nepomuk

Supervisor: Cristian Bogdan

Title: Itsme

Theme: Semantic Technologies / Collaboration Technology

Subject: Itsme is aiming to find new personal computing metaphors (based on Stories and Venues) as the desktop is considered unsuitable for today's personal computing. In the process, itsme developed very interesting technologies, partly based on Nepomuk and partly developed in-house. The back-end runs on Linux. The technology can be currently tried out via an emulator.

Possible essay projects:

Take part in the itsme community, document and discuss their technology approach for a new personal computing metaphor. Install itsme (or work with its emulator) and use their APIs to develop it further at either the front-end or the back-end, and comment on the process and on the involved technology
Technology comparison between itsme and Nepomuk.

References:

www.itsme.it
http://www.itsme.it/wp-content/uploads/2009/03/itsme_cards-1.pdf
Giorgio De Michelis, Marco Loregian, "From CSCW to new workstations: The itsme project," cts, pp.xvii-xix, 2009 International Symposium on Collaborative Technologies and Systems, 2009
Nepomuk, http://nepomuk.semanticdesktop.org/xwiki/bin/view/Main1/, see especially Publications and Deliverables
http://dev.nepomuk.semanticdesktop.org/

Supervisor: Cristian Bogdan

Title: Aether

Theme:Semantic Technologies / Collaboration Technology

Subject: Awareness in computer supported collaborative work (CSCW) is defined as "understanding the activity of others in the context of one's own activity". The Aether engine attempts to provide awareness between users of digital objects interlinked in a "semantic network". One core idea of Aether is that information a user action "percolates" through the semantic network until i reaches users that are interested in the respective action. Percolation is a computationally intensive process. Although Aether was defined in 1997, the compuational challenges for its implementation led to the first implementation in a realistic usage environment to only come in 2008. The current Aether implementation site is a collaborative Interactive Development Environment called Parade with 20 users working with copies of an intranet containing about 3000 files each (therefore the semantic network size is over 60000). To process such large amounts of data for percolation, the implementation uses a relational database. The implementation also aims to explain to the user how percolation took place, for users to understand why they are notified of certain actions.

Possible essay projects:

Test and document the Aether implementation in Parade, comment on the design decisssions made to optimize the percolation, and propose alternatives to or improvements of the percolation alghorithm.
Compare Aether percolation with Nepomuk "spread of activation" and consider how SoA could be used instead of the current Aether percolation algorithm.

References:

Sandor O., Bogdan, C. and Bowers, J. (1997) Aether, an awareness engine for CSCW, in proceedings of ECSCW 1997
Gay, M. (2009) Aether mediated awareness for the ParaDe development environment, master thesis, INSA Lyon
Nepomuk, http://nepomuk.semanticdesktop.org/xwiki/bin/view/Main1/, see especially Publications and Deliverables
http://dev.nepomuk.semanticdesktop.org/
Troussov, A., Levner. E., Bogdan, C., Judge, J., Botvich, D., Spreading Activation Methods, in Shawkat A., Xiang, Y. (eds). Dynamic and Advanced Data Mining for Progressing Technological Development, 2009 IGI Global, USA
http://www.alphaworks.ibm.com/tech/galaxy

Supervisor: Cristian Bogdan

Software Tools

Title: Software Performance Measurement / Improvement by SlowSpotter

Theme: Software tools

Subject: Performance monitoring is an issue for through-put bound applications and becomes ever more important with distributed computing and more complex computer architecures. Efficient use of the memory hierarchy is the key to efficient execution. But the road is long between the lines in your C++ code and the machine cycles. How is it possible to relate actual execution traces to the source code, so bottlenecks can be identified and avoided?

Project ideas: The essay project shall present experience from learning performance monitoring technology and trying the SlowSpotter tool (free trial version), developed for Solaris and Linux, on a few applications. The tool suggested grew out of research at Uppsala University and SUN. It has a companion ThreadSpotter for multi-threaded codes. If possible, we will make it available to the project.

References:

[1] http://en.wikipedia.org/wiki/Acumem_SlowSpotter

[2] http://www.acumem.com

Supervisor: Jesper Oppelstrup

Simulation

Title: The TimeWarp “Operating System”

Theme: Parallel Discrete Event Simulation

Subject: In discrete event simulation, the processes all operate synchronously, but the time for the next event, and which processes to be active at that time, is not known before execution of “this” event. Parallelization by distributing processes over processors is therefore of little use. The TimeWarp idea is to carry on by guesswork, post-fact to check if the sequence was right, and to back up when necessary. A similar approach is taken in compiling optimized serial code for branches such as loop terminations: the data for the most probable set of instructions to be executed next will be pre-fetched.

Project ideas: The TimeWarp was promoted by Jefferson & al. in [1] and a number of implementations followed, e.g. the GTW software, [2]. TimeWarp type algorithms have been proposed more recently for continuous system simulation with several time-scales, in particular for computer game physics engines to deal with realistic many soft-body interactions. The essay project shall describe the TimeWarp algorithm, and discuss the pitfalls, with the paper [3] as inspiration, and find material about and discuss also more recent applications such as many-body simulation.

References:

[1] D.Jefferson & al, "Time Warp Operating System", ACM Symp. on Operating Systems Principles, ACM, Austin TX., (1987), http://doi.acm.org/10.1145/41457.37508

[2] http://www.cc.gatech.edu/computing/pads/teddoc.html

[3] D. M. Nicol, X. Liu, "The dark side of risk (what your mother never told you about Time Warp)", ACM SIGSIM Simulation Digest archive 27, 1, (1997), http://doi.acm.org/10.1145/268823.268920

Supervisor: Jesper Oppelstrup

Performance optimization

Title: Automatic tuning of linear algebra routines

Theme: Performance optimization

Subject: Numerical linear algebra for (non-sparse) matrices and vectors is at the heart of many scientific computations, and development of efficient, reliable software has been intensive since 1970. The Basic Linear Algebra Subprograms, and higher level algorithms collected in e.g. the LAPACK and ScaLAPACK libraries, are now used by all numerical software such as MATLAB. Tuning of the algorithms to the properties of the memory hierarchy is necessary to obtain efficient software. This can be done “manually”, using vendor information, at great cost. For some architectures, like NVIDIA GPU, properties of the hardware are not disclosed. An alternative is an empirical search over the possible variations to find the best parameter settings for a representative set of tasks.

Project ideas: The essay shall discuss the empirical approach to library optimization, with reference to [2]. Experiments with the ATLAS software for optimization of LAPACK on Solaris, Linux, or Windows are encouraged. Paper [3] describes manual optimization on NVIDIA GPU, especially the section on how the hardware properties were measured. Paper [1] is an introduction to BLAS, etc.

References:

[1] L.S.Blackford & al., "An Updated Set of Basic Linear Algebra Subprograms (BLAS)", ACM Transactions on Mathematical Software, 28(2):135-151, June 2002. http://www.cs.utsa.edu/~whaley/papers/blast-toms.pdf

[2] R.C.Whaley, A.Petitet and J.Dongarra, "Automated Empirical Optimization of Software and the ATLAS project", Parallel Computing, 27(1-2):3-35, 2001, http://www.cs.utsa.edu/~whaley/papers.html

[3] V.Volkov, J.W.Demmel, "Benchmarking GPUs to Tune Dense Linear Algebra", SuperComputing2008, Austin, Texas, IEEE, http://portal.acm.org/citation.cfm?id=1413402 Supervisor: Jesper Oppelstrup

Copyright © Sidansvarig: Mads Dam <mfd@kth.se>
Uppdaterad 2010-01-18