Please let Kilian Risse know if you want to give a seminar or receive email announcements about upcoming seminars.
Jonas Haglund (KTH) | 13:15 in zoom
Many peripherals in computers can read and write main memory (RAM) by means of a hardware unit called a direct memory access controller (DMAC). DMACs are programmed by device drivers. A device driver is a piece of software in the operating system that configures a specific peripheral. Device drivers tend to contain bugs for a number of reasons: lacking hardware documentation, difficulty of testing, time pressure, and non-trivial hardware and operating system interfaces.
If a device driver does not configure its DMAC correctly, the DMAC might read or write undesirable memory regions (maliciously or by accident). This can break memory isolation among applications, leading to leaks of data and/or manipulation of data and code. The goal of my PhD work is to contribute a solution to this problem. I will describe techniques for formally verifying memory isolation of DMACs, and how these verification results can be used to write software that configures a DMAC such that the DMAC respects memory isolation. This involves a specific case study, and a framework/tool (nearing the end of its development) in an interactive theorem prover which can be used to describe DMACs and verify that DMACs respect memory isolation.
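To make the configuration step concrete, here is a minimal, purely illustrative sketch of how a driver might program a DMAC; the register layout and names are hypothetical and not the hardware studied in this work. The point is that memory isolation hinges entirely on the addresses and lengths the driver writes into the device.

#include <stdint.h>

/* Hypothetical memory-mapped DMAC register block (illustrative only). */
typedef struct {
    volatile uint32_t src;   /* physical source address */
    volatile uint32_t dst;   /* physical destination address */
    volatile uint32_t len;   /* number of bytes to transfer */
    volatile uint32_t ctrl;  /* bit 0 starts the transfer */
} dmac_regs;

/* The driver configures a transfer by filling in the registers and
 * setting the start bit. If dst or len reaches outside the memory
 * region assigned to the peripheral's client, the transfer breaks
 * memory isolation; this is exactly the property the verification targets. */
static void dmac_start_copy(dmac_regs *dmac, uint32_t src,
                            uint32_t dst, uint32_t len)
{
    dmac->src  = src;
    dmac->dst  = dst;
    dmac->len  = len;
    dmac->ctrl = 1u;
}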
Amirmahmood Ahmadian (KTH) | 14:00 in room 1537
Our world today relies heavily on software systems. They are used in almost every aspect of our society, and securing them is thus a crucial and important task. Today, software is used in various settings such as the web or mobile; it is built from different components and relies on interaction with other software like database systems or servers. This results in complex and heterogeneous software systems which are implemented and deployed under different assumptions about security policies and attacker capabilities. Therefore, when investigating the security of a program in a particular setting, one should consider many different factors, such as the relation between its building components, the setting-specific threats, and the relevant attacker models and security policies. In this seminar I will present my research on the security of software systems in two particular settings. First, I will talk about dynamic policies and the effects of changing the security policy during the execution of a program. I will discuss different attacker models and investigate their effects on our definition of security. In the second part of the presentation, I will discuss the security of programs that rely on trusted execution environments (TEEs) for secure data storage and computation. The security guarantees of a TEE, the relevant attacker capabilities, and their effect on the security of programs developed for TEEs will be the main discussion points of this part. Finally, I will also talk about the future direction of my research. You are welcome to attend the seminar virtually or in person: https://kth-se.zoom.us/j/68555815435
Andreas Lindner (KTH) | 13:15 in zoom
To ensure correct and secure execution of any software, we inevitably have to consider its interactions with the executing hardware. One example of this is the configuration of security features like memory protection, where misconfiguration compromises memory isolation. Another example is the proper management of hardware resources to avoid side-channel leakage or guarantee safe execution times. The main complications here stem from the complexity of modern computer hardware and the assembly interface language, which is characterized by numerous implicit effects.
For this reason, it is important to employ formal reasoning and automate as much of the software analysis as possible, in order to remove the human factor as far as possible. This is especially the case for critical software that directly interacts with the real world, where high assurance of operational safety is of utmost importance and missed reasoning steps could be fatal.
In this talk, I will present the overall picture of my thesis research and what remains to complete it. The work centres on our theorem-prover-powered binary code analysis framework HolBA and ranges from functional verification through side-channel model validation to execution-time analysis. While I include summaries of previous work, most of the emphasis lies on the ongoing project on verifying execution-time bounds and the specific symbolic execution we developed for it.
You can also join this seminar in the room 4523.
Among the daily tasks of database administrators (DBAs), analysing query workloads to identify schema issues and improve performance is crucial. Although DBAs can easily pinpoint queries that repeatedly cause performance issues, it remains challenging to automatically identify subsets of queries that share certain properties (a pattern) and simultaneously foster some target measure, such as execution time. Patterns are defined on combinations of query clauses, environment variables, database alerts and metrics, and help answer questions like: what makes SQL queries slow? What makes I/O communication high? Automatically discovering these patterns in a huge search space and providing them as hypotheses for localizing issues and root causes is important in the context of explainable AI. To tackle this, we introduce an original approach rooted in Subgroup Discovery. We show how to instantiate and develop this generic data-mining framework to identify potential causes of SQL workload issues. We believe that such a data-mining technique is not trivial for DBAs to apply. As such, we also provide a visualization tool for interactive knowledge discovery. We analyse a one-week workload from hundreds of databases at our company, make both the dataset and the source code available, and experimentally show that insightful hypotheses can be discovered.
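For readers unfamiliar with Subgroup Discovery, a rough sketch of the underlying idea (generic, and not necessarily the exact quality function used in the paper): a candidate pattern $s$, covering $n_s$ of the $N$ queries, is scored by trading off its size against how far it shifts a numeric target such as execution time, for instance $q(s) = (n_s/N)^{\alpha}\,(\mu_s - \mu)$, where $\mu_s$ is the mean execution time inside the subgroup, $\mu$ the global mean, and $\alpha$ controls the size/deviation trade-off; the search then enumerates combinations of query clauses, environment variables, and alerts, and reports the top-scoring patterns as hypotheses.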
Vincent Lohse | 13:30 in 4523
C# provides static analysis libraries for template-based code analysis and code fixing. These libraries have been used by the open-source community to create numerous NuGet packages for different use cases. However, due to the unstructured vastness of these packages, it is difficult to find the ones required for a project, and new analyzers and fixers take time and effort to create. Therefore, this thesis proposes a neural network which firstly imitates existing fixers and secondly extrapolates to fixes of unseen diagnostics. To do so, the state of the art of static analysis NuGet packages is examined and further used to generate a dataset of diagnostics and corresponding code fixes comprising 24,622 data points. Since many C# fixers apply formatting changes, all formatting is preserved in the dataset. Furthermore, since the fixers also apply identifier changes, the tokenization of the dataset is varied between splitting identifiers by camel case and preserving them. The neural network uses a sequence-to-sequence learning approach with the Transformer model; it takes file context, diagnostic message, and location as input and predicts a diff as output. It is capable of imitating 46.3% of the fixes, normalized by diagnostic type, and for data points with unseen diagnostics it is able to extrapolate to 11.9% of normalized data points. For both experiments, splitting identifiers by camel case produces the best results. Lastly, it is found that a higher proportion of formatting tokens in the input has a minimal positive impact on prediction success rates, whereas the proportion of formatting in the output has no impact on success rates.
In this seminar, I will mainly present the formal verification of an I/O device and its driver. I/O devices are the critical components that allow a computing system to communicate with the external environment. We propose an abstract model of I/O devices and their drivers to describe the expected results of their execution, where the communication between devices is made explicit and the device-to-device information flow is analyzed. In order to handle general I/O functionalities, we propose a refinement-based approach that concretizes a correct-by-construction abstract model into an actual hardware device and its driver. As an example, we formalize the Serial Peripheral Interface (SPI) together with a driver. In the HOL4 interactive theorem prover, we verified the refinement between these models by establishing a weak bisimulation. We show how this result can be used to establish both functional correctness and information flow security, both for single devices and when devices are connected in an end-to-end fashion. Finally, I will introduce other ongoing research work of my doctoral study.
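As a reminder of the notion used (stated generically, not in the paper's exact HOL4 formulation): a relation $R$ between abstract and concrete states is a weak bisimulation if, whenever $s \mathrel{R} t$ and $s \xrightarrow{a} s'$ for an observable action $a$, there is a matching $t \Rightarrow \xrightarrow{a} \Rightarrow t'$ (the same action up to internal steps) with $s' \mathrel{R} t'$, and symmetrically; establishing such a relation is what allows properties proved on the abstract model to be transferred to the concrete device and driver.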
Christian Lidström (KTH) | 13:00 in zoom
In this seminar I will present the progress of my research project, which is focused on automated verification of safety-critical embedded software. Embedded systems today are in many cases incredibly large and complex, for example in the automotive industry, and consist of hundreds of smaller subsystems, many of which are of a safety-critical nature. In order to cope with the increasing safety requirements in areas such as autonomous driving, traditional means of ensuring correctness, e.g. testing, are not sufficient. Deductive verification is another method of ensuring correctness, by statically verifying that software adheres to some formal properties. For this reason, we develop methods and tools for the automation of such techniques, in order to make them usable by engineers in industry. At the same time, we develop a contract theory in which properties over large systems can be decomposed and verified at the level of software, according to established principles of system design. Finally, I will also discuss the future directions of the project.
Automatic program repair (APR) aims to automatically fix software bugs. A decade of research has generated a rich body of work on automatic program repair. Despite great success, APR research still suffers from a low-precision problem in patch generation. In this talk, I will focus on my previous work towards improving the precision of automatic program repair and increasing the external validity of program repair research.
Prof. Tegawendé F. Bissyandé from University of Luxembourg (https://bissyande.github.io/) is invited as the opponent of He Ye's 50% Ph.D. seminar.
Tjark Weber (Uppsala University) | 13:15 in zoom
Interactive theorem proving makes it possible to verify complex systems with respect to strong specifications, and with utmost rigor. Proof assistants become the guardians of truth, but who guards the guardians? Even widely used proof assistants are not immune to proofs of False. Of particular concern is the definitional mechanism, which should provide convenience and expressiveness, but must not allow inconsistencies.
We consider higher-order logic with overloading (the logic implemented by Isabelle/HOL) and show that definitions are both model- and proof-theoretically conservative (i.e., they are not needed to prove statements that do not depend on the newly defined symbol). It follows that definitions are not a source of inconsistency. We have formalized some of our results in HOL4, uncovering and correcting mistakes in earlier proofs.
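In rough textbook terms (a paraphrase, not the paper's exact statement): proof-theoretic conservativity means that for a theory $T$, a definition $d$ of a new symbol, and a formula $\varphi$ not mentioning that symbol, $T \cup \{d\} \vdash \varphi$ implies $T \vdash \varphi$; in particular, taking $\varphi = \mathrm{False}$ shows that a definition can never turn a consistent theory into an inconsistent one.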
This is joint work with Arve Gengelbach and Johannes Åman Pohjola.
Automatic program repair is about fixing bugs in software without human intervention. With the recent advancement in the deep learning field, along with empirical studies that show that computer languages are as natural as natural languages, techniques from natural language processing are also used on source code. In this talk, I will focus on my previous work that uses machine learning to fix test suite bugs, build errors, and software vulnerabilities.
David Balbas Gutierrez (KTH) | 15:00 in zoom
The Learning with Errors (LWE) problem consists of distinguishing linear equations with noise from uniform values. LWE enjoys a hardness reduction from worst-case lattice problems and allows for versatile, though inefficient, cryptographic constructions. To improve efficiency, Ring-LWE replaces linear equations with noisy ring products. Nowadays, Ring-LWE and its variants are extensively used in the construction of post-quantum secure cryptosystems.
In this talk, we introduce both problems and related mathematical background, such as lattices. Then, we sketch the hardness proof of LWE and explain how to extend it to Ring-LWE. Finally, we present some weaknesses and survey open problems.
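For reference, the standard decision version of the problem: for a secret $s \in \mathbb{Z}_q^n$, the adversary must distinguish samples $(a_i, \langle a_i, s\rangle + e_i \bmod q)$, with $a_i$ uniform in $\mathbb{Z}_q^n$ and $e_i$ drawn from a small error distribution, from pairs $(a_i, u_i)$ with $u_i$ uniform in $\mathbb{Z}_q$. Ring-LWE replaces the inner products by ring products $a_i \cdot s + e_i$ in a ring such as $\mathbb{Z}_q[x]/(x^n + 1)$, so that one sample packs $n$ correlated equations.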
Assessment can aim both to judge how well students have fulfilled the intended learning outcomes of a course and to guide and motivate the students' learning. Assessment can also aim at providing students with timely and useful feedback to support their learning process. How the assessment is planned and carried out can, therefore, be said to influence the students' learning. Introductory programming courses for non-computer-science majors are often given to large groups of students, and to be able to provide individual feedback, teaching assistants (TAs), typically older students, are often employed to assist the faculty in these courses. Understanding students', TAs', and teachers' experiences and use of the assessment could give us insights into how to further improve these courses in order to enhance learning among the students. During this seminar, Emma will give a brief background to her research and present the results from some of the studies she has conducted, where the TAs' and students' perspectives have been in focus. She will also present her ongoing research projects, where the teachers' perspective is being explored. Emma will also outline her vision for her thesis and talk about how she sees the different projects relating to her overall thesis aims.
Rapidly evolving digital technologies make the world a fascinating place to live. Innovative automated systems break conventional paradigms to connect otherwise unconnected services and devices. Our society is increasingly dependent on systems like the IoT (Internet of Things) and the Web, relying on a wide variety of applications and connected “things”, from heart pacemakers, baby monitors, and surveillance cameras to cars, industrial and military robots, and large-scale systems like smart cities. The complexity and heterogeneity of these systems, along with our society's critical reliance on the Web and IoT, pose a number of questions pertaining to security and privacy. Unfortunately, the power of Web and IoT applications can be abused by malicious actors.
In this talk, we discuss how popular Web and IoT app platforms are susceptible to several novel classes of attacks that violate user confidentiality, integrity, and availability, resulting in massive exfiltration and modification of sensitive information. We consider the role and the potential of methods and techniques from computer security, programming languages, and formal methods to discover and fix these vulnerabilities. We suggest short-term countermeasures based on fine-grained access control and present long-term countermeasures based on tracking the flow of information in Web and IoT applications.
Sarra Habchi | 13:00 in zoom
Test flakiness is identified as a major issue that compromises the regression testing of complex software systems. Flaky tests manifest non-deterministic behaviour and send confusing signals that break developers' trust in test suites. Both industrial reports and research studies have highlighted the negative impact of flakiness on software quality and developers' productivity. Nonetheless, we still lack knowledge about the manifestation and management of flaky tests in practice. In this talk, I will present an empirical investigation of the sources of flaky tests and the strategies adopted by practitioners to cope with them. In support of the identified practices, I will present an approach that statically identifies flaky tests by analysing their code. This approach stands out from current detection techniques as it does not require test execution logging and analysis, which are costly time- and computation-wise.
When developing complex software and systems, contracts provide a means for controlling the complexity by dividing the responsibilities among the components of the system in a hierarchical fashion. In specific application areas, dedicated contract theories formalise the notion of contract and the operations on contracts in a manner that best supports the development of systems in that area. At the other end, contract meta-theories attempt to provide a systematic view of the various contract theories by axiomatising their desired properties. However, there exists a noticeable gap between the most well-known contract meta-theory of Benveniste et al., which focuses on the design of embedded and cyber-physical systems, and the established way of using contracts when developing general software, following Meyer's design-by-contract methodology. At the core of this gap appears to be the notion of procedure: while it is a central unit of composition in software development, the meta-theory does not suggest an obvious way of treating procedures as components.
In this seminar, I present a first step towards a contract theory that takes procedures as the basic building block, and is at the same time an instantiation of the meta-theory. To this end, we propose an abstract contract theory for sequential programming languages with procedures, based on denotational semantics. We show that, on the one hand, the specification of contracts of procedures in Hoare logic, and their procedure-modular verification, can be cast naturally in the framework of our abstract contract theory. On the other hand, we also show our contract theory to fulfil the axioms of the meta-theory. In this way, we give further evidence for the utility of the meta-theory, and prepare the ground for combining our instantiation with other, already existing instantiations.
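For orientation, the classical software-side picture the theory has to capture (textbook form, not the abstract theory itself): a procedure contract in Hoare logic is a pre/postcondition pair $(P, Q)$, the procedure body is verified once against it, $\{P\}\ \mathit{body}_p\ \{Q\}$, and procedure-modular verification then reasons about each call through the contract alone, $\{P\}\ \mathbf{call}\ p\ \{Q\}$ (ignoring parameters and recursion here).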
This is practice for upcoming job talks aimed at a general computer science audience. Feedback is welcome. Dynamic linear algebra (algorithmic techniques for matrices that change over time) lies at the core of many applications, from continuous optimization, to efficient graph algorithms, to machine learning. Yet, until recently, the full power of dynamic linear algebra was not known and exploited in most applications. In this talk, I will describe several new advances in using these techniques and outline the limits of what can be done with them. I will overview progress on longstanding problems in dynamic shortest-path data structures, regression algorithms, optimal transport, and other problems, using dynamic linear algebra.
IoT platforms enable users to connect various smart devices and online services via reactive apps running on the cloud. Unintended or malicious interactions between the different (even benign) apps of a user can cause severe security and safety risks, yet we still lack a semantic framework for understanding interactions between IoT apps. On the other hand, modern Web apps have a complex design and many interactions between parts of the codebase, which makes them difficult to understand and analyze. The last decade has seen a proliferation of code-reuse attacks in the context of web applications. These attacks stem from Object Injection Vulnerabilities (OIVs), which enable attacker-controlled data to abuse legitimate code fragments within a web application's codebase to perform malicious computations, such as remote code execution.
The seminar will be based on two of the published papers: (a) Friendly Fire: Cross-App Interactions in IoT Platforms. This paper proposes a semantic framework capturing the essence of cross-app interactions in IoT platforms. The framework generalizes and connects syntactic enforcement mechanisms to bisimulation-based notions of security, thus providing a baseline for formulating soundness criteria for these enforcement mechanisms. We define and implement static analyses for enforcing cross-app security and safety, and prove them sound with respect to our semantic conditions. We also leverage real-world apps to validate the practical benefits of our tools. (b) SerialDetector: Principled and Practical Exploration of Object Injection Vulnerabilities for the Web. This paper presents the first systematic approach for detecting and exploiting OIVs in .NET applications, including the framework and libraries. We develop a taint-based dataflow analysis that discovers OIV patterns in .NET assemblies automatically. We then use these patterns to perform security analysis of real-world .NET apps against OIVs, including reporting three CVEs that lead to remote code execution. We describe an OIV exploitation methodology, from framework-agnostic principles to the analysis of specific vulnerabilities corresponding to key threat models of Web apps.
The talk will also include my future research plans.
Lucas Berent | 13:00 in zoom
The Minimum Circuit Size Problem (MCSP) has gained a lot of scientific interest recently: it is neither known to be in P nor known to be NP-complete, which inspires the question of whether MCSP might be a candidate for an NP-intermediate problem. We review some fundamental notions around MCSP as well as the latest results on the hardness and approximability of the problem, and give a brief outlook on possible future results.
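For context, the problem itself is easy to state: given the full truth table of a Boolean function $f\colon \{0,1\}^n \to \{0,1\}$ (an input of length $2^n$) and a size parameter $s$, decide whether some Boolean circuit of size at most $s$ computes $f$. Membership in NP is immediate, since a small circuit can be guessed and checked against the truth table, but neither a polynomial-time algorithm nor an NP-hardness proof is known.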
Autonomous systems have progressed from theory to application, especially in the last decade, thanks to recent technological evolution. The number of autonomous systems in our surroundings is increasing rapidly. Since these systems in most cases handle privacy-sensitive data, privacy concerns are increasing at a similar rate. However, privacy research has not been in sync with these developments. Moreover, the systems are heterogeneous in nature and continuously evolving, which makes the privacy problem even more challenging. The domain poses some unique privacy challenges which are not always possible to solve using existing solutions from other related fields. In this thesis, we identify open privacy challenges of autonomous systems and then propose solutions to some of the most prominent ones. We investigate the privacy challenges in the context of smart-home-based systems, including Ambient Assisted Living (AAL) systems, as well as autonomous vehicles. In the case of smart homes, we propose a framework to enhance the privacy of owners during ownership changes of IoT devices and conduct a systematic literature review to identify the privacy challenges of home-based health monitoring systems. For autonomous vehicles, we quantify, improve, and tune the privacy-utility trade-off of the image de-identification process. Our investigation reveals that there is a lack of consideration when it comes to the privacy of autonomous systems and that there are several open research questions in the domain regarding, for instance, privacy-preserving data management, quantification of the privacy-utility trade-off, and compliance with privacy laws. Since the field is evolving, this work can be seen as a step towards privacy-preserving autonomous systems. The identified privacy concerns and their corresponding solutions presented in this thesis will help the research community identify and address existing privacy concerns of autonomous systems. Solving these concerns will encourage end-users to adopt the systems and enjoy their benefits without worrying about privacy.
Cyber situation awareness is about ensuring that individuals have a fair appreciation of the circumstances that can affect their chances of reaching their goals within the cyber domain. One example of such a goal would be cyber defense, e.g., the protection of some computer network.
According to current research, gaps within the field of cyber situation awareness include a lack of understanding of the types and quality of information sources that contribute to awareness, of the importance and effects of human-automation interaction, and of the socio-technical dynamics of cyber defense teams.
During the seminar, the context of cyber situation awareness will be presented, along with accomplished results that aim to narrow some of the identified research gaps. Further, the remaining planned activities within the doctoral project, as well as a thesis disposition draft, will be presented.
Stefan Varga is a PhD student at KTH, now giving his 80% seminar.
Testing is one of the crucial parts of the software development life cycle: it guarantees quality and identifies defects. Software testing is a costly process in terms of labor and time, thus creating a need for test automation techniques. In this seminar, we will go through graph-based machine learning approaches for test automation.
The seminar will be based on two of my published papers: (i) Using graph kernels and a Support Vector Machine (SVM) to automate metamorphic testing. Metamorphic testing is an emerging technique which solves both the oracle problem and the test case generation problem by testing special forms of software requirements known as metamorphic requirements. The paper also explains data augmentation techniques that generate synthetic data for program analysis using mutation testing. (ii) Using graph neural networks (GNNs) to predict program similarity. Program similarity is estimated by analyzing the labelled control flow graphs (CFGs) of the programs. The graph similarity of a program pair produced by our methodology could be applied to several related software engineering problems (such as software plagiarism, clone identification, and code refactoring).
The talk will also include my future research plans.
Software may be written in different languages, but in the end, machine code describes how the computer executes programs. While you can verify source code according to some source language semantics, compilation to machine code can introduce discrepancies in functionality and lay bare details which were abstracted away on the source-code level.
Many different methods exist to obtain verified machine code: verified compilers, translation validation, and direct verification of machine code. My research focuses on an approach which facilitates inclusion of the many mechanisms that exist on a bare-metal level instead of restricting the scope of analysis. Some examples of these mechanisms are unstructured and dynamic control flow, fine-grained concurrent and parallel execution, caching, hardware-based isolation as well as interaction with peripherals. The goal is also to include new domains in existing analysis frameworks, such as reasoning about new machine architectures and translation validation from new source languages.
In this presentation, I will give an overview of my research and present one result in particular: a novel program logic for treating unstructured code. I will also argue why it is particularly well suited to being extended for more advanced uses.
Cryptographic protocols are essential for the security of current and future communication infrastructures, including 5G and IoT. Engineering secure protocols is, however, a highly delicate and error-prone task. Engineers need improved systematic methods and supporting tools to analyze their constructions. There are two main approaches to such methods: symbolic methods rooted in formal verification and computational methods rooted in reduction-based proofs. Symbolic methods are often more abstract and hence easier to mechanize, whereas computational methods promise cryptographic soundness. The reason is that computational methods use a bit as their smallest modelling unit, whereas symbolic methods often represent composite objects such as keys and nonces as atomic symbols. Further, symbolic methods often model probabilities in a binary fashion: either an event can occur, or it cannot.
In this talk, I will present my research project in this area. Specifically, while much current work applied to practical key establishment protocols targets, with few exceptions, a single protocol of a system, our aim is to explore systems of protocols. We also aim to explore whether combinations of the symbolic and computational approaches can lead to simplified analysis methods.
Embedded systems today consist of large and complex software. In many cases, such as in the automotive industry, large parts of the systems are also safety-critical, meaning that undesired behaviour may result in large environmental or economic cost, or even loss of human life. One way to detect such behaviour is to employ the use of formal methods.
In this talk, I will present my research project, which is focused on automated deductive verification of embedded systems. Deductive verification is a formal method in which we prove that code behaves according to a specification for all possible executions, providing a higher degree of confidence in the correctness of software than standard industrial methods such as testing. However, this comes at the cost of requiring that specifications be provided in a formal language, and of requiring substantial expertise in the tools used to this end. For this reason, we believe that the verification process needs to be automated to a large extent. I will discuss what is needed for automation to be feasible and what the possible solutions are, as well as some preliminary results. In particular, I will present work on an abstract contract theory for procedural programs, based on denotational semantics. With this contract theory, we attempt to bridge the gap between established contract theories for system design and the procedural reasoning performed in deductive verification. The talk will also include my future research plans.
Xhevahire Tërnava (Sorbonne Université) | 14:15 in zoom
Most modern software-intensive systems, ranging from small-scale embedded systems to large-scale enterprise systems to ultra-large systems of systems, are variability-intensive. On the one hand, variability is largely studied only in the context of product lines, but the majority of these systems, such as real open-source object-oriented systems, are not part of a product line. On the other hand, the implemented variability of these systems, as an anticipated change, is neither explicit nor documented. Therefore, to effectively manage and comprehend it, software engineers need to be aware of where in the code assets variability is implemented. In this talk, I will present research work on building (i) a tooled framework, called vm-dsl, to manually document and trace the variability of Scala- and Java-based variability-intensive systems, and (ii) a tooled approach, called symfinder, to automatically identify, visualize, and comprehend the variability of any real open-source Java- or C++-based variability-intensive system.
Previous years' seminars.