Please let Kilian Risse know if you want to give a seminar or receive email announcements about upcoming seminars.
Jonas Haglund (KTH) | 13:15 in zoom
Many peripherals in computers can read and write main memory (RAM) by means of a hardware unit called a direct memory access controller (DMAC). DMACs are programmed by device drivers. A device driver is a piece of software in the operating system that configures a specific peripheral. Device drivers tend to contain bugs for a number of reasons: lacking hardware documentation, difficulty of testing, time pressure, and non-trivial hardware and operating system interfaces.
If a device driver does not configure its DMAC correctly, the DMAC might read or write undesirable memory regions (maliciously or by accident). This can break memory isolation among applications, leading to leaks of data and/or manipulation of data and code. The goal of my PhD work is to contribute a solution to this problem. I will describe techniques for formally verifying memory isolation of DMACs, and how these verification results can be used to write software that configures a DMAC such that the DMAC respects memory isolation. This involves a specific case study, and a framework/tool (nearing the end of its development) in an interactive theorem prover which can be used to describe DMACs and verify that DMACs respect memory isolation.
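To make the configuration step concrete, here is a minimal, purely illustrative sketch of how a driver might program a DMAC; the register layout and names are hypothetical and not the hardware studied in this work. The point is that memory isolation hinges entirely on the addresses and lengths the driver writes into the device.

#include <stdint.h>

/* Hypothetical memory-mapped DMAC register block (illustrative only). */
typedef struct {
    volatile uint32_t src;   /* physical source address */
    volatile uint32_t dst;   /* physical destination address */
    volatile uint32_t len;   /* number of bytes to transfer */
    volatile uint32_t ctrl;  /* bit 0 starts the transfer */
} dmac_regs;

/* The driver configures a transfer by filling in the registers and
 * setting the start bit. If dst or len reaches outside the memory
 * region assigned to the peripheral's client, the transfer breaks
 * memory isolation; this is exactly the property the verification targets. */
static void dmac_start_copy(dmac_regs *dmac, uint32_t src,
                            uint32_t dst, uint32_t len)
{
    dmac->src  = src;
    dmac->dst  = dst;
    dmac->len  = len;
    dmac->ctrl = 1u;
}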
Amirmahmood Ahmadian (KTH) | 14:00 in room 1537
Our world today relies heavily on software systems. They are used in almost every aspect of our society, and securing them is thus a crucial and important task. Today, software is used in various settings such as the web or mobile; it is built from different components and relies on interaction with other software like database systems or servers. This results in complex and heterogeneous software systems which are implemented and deployed under different assumptions about security policies and attacker capabilities. Therefore, when investigating the security of a program in a particular setting, one should consider many different factors, such as the relation between its building components, the setting-specific threats, and the relevant attacker models and security policies. In this seminar I will present my research on the security of software systems in two particular settings. First, I will talk about dynamic policies and the effects of changing the security policy during the execution of a program. I will discuss different attacker models and investigate their effects on our definition of security. In the second part of the presentation, I will discuss the security of programs that rely on trusted execution environments (TEEs) for secure data storage and computation. The security guarantees of a TEE, the relevant attacker capabilities, and their effect on the security of programs developed for TEEs will be the main discussion points of this part. Finally, I will also talk about the future direction of my research. You are welcome to attend the seminar virtually or in person: https://kth-se.zoom.us/j/68555815435
Andreas Lindner (KTH) | 13:15 in zoom
To ensure correct and secure execution of any software, we inevitably have to consider its interactions with the executing hardware. One example of this is the configuration of security features like memory protection, where misconfiguration compromises memory isolation. Another example is the proper management of hardware resources to avoid side-channel leakage or guarantee safe execution times. The main complications here stem from the complexity of modern computer hardware and the assembly interface language, which is characterized by numerous implicit effects.
For this reason, it is important to employ formal reasoning and automate as much of the software analysis as possible, in order to remove the human factor as far as possible. This is especially the case for critical software that directly interacts with the real world, where high assurance of operational safety is of utmost importance and missed reasoning steps could be fatal.
In this talk, I will present the overall picture of my thesis research and what remains to complete it. The work centres on our theorem-prover-powered binary code analysis framework HolBA and ranges from functional verification through side-channel model validation to execution-time analysis. While I include summaries of previous work, most of the emphasis lies on the ongoing project on verifying execution-time bounds and the specific symbolic execution we developed for it.
You can also join this seminar in the room 4523.
Among the daily tasks of database administrators (DBAs), analysing query workloads to identify schema issues and improve performance is crucial. Although DBAs can easily pinpoint queries that repeatedly cause performance issues, it remains challenging to automatically identify subsets of queries that share certain properties (a pattern) and simultaneously foster some target measure, such as execution time. Patterns are defined on combinations of query clauses, environment variables, database alerts and metrics, and help answer questions like: what makes SQL queries slow? What makes I/O communication high? Automatically discovering these patterns in a huge search space and providing them as hypotheses for localizing issues and root causes is important in the context of explainable AI. To tackle this, we introduce an original approach rooted in Subgroup Discovery. We show how to instantiate and develop this generic data-mining framework to identify potential causes of SQL workload issues. We believe that such a data-mining technique is not trivial for DBAs to apply. As such, we also provide a visualization tool for interactive knowledge discovery. We analyse a one-week workload from hundreds of databases at our company, make both the dataset and the source code available, and experimentally show that insightful hypotheses can be discovered.
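For readers unfamiliar with Subgroup Discovery, a rough sketch of the underlying idea (generic, and not necessarily the exact quality function used in the paper): a candidate pattern $s$, covering $n_s$ of the $N$ queries, is scored by trading off its size against how far it shifts a numeric target such as execution time, for instance $q(s) = (n_s/N)^{\alpha}\,(\mu_s - \mu)$, where $\mu_s$ is the mean execution time inside the subgroup, $\mu$ the global mean, and $\alpha$ controls the size/deviation trade-off; the search then enumerates combinations of query clauses, environment variables, and alerts, and reports the top-scoring patterns as hypotheses.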
Vincent Lohse | 13:30 in 4523
C# provides static analysis libraries for template-based code analysis and code fixing. These libraries have been used by the open-source community to create numerous NuGet packages for different use cases. However, due to the unstructured vastness of these packages, it is difficult to find the ones required for a project, and new analyzers and fixers take time and effort to create. Therefore, this thesis proposes a neural network which firstly imitates existing fixers and secondly extrapolates to fixes of unseen diagnostics. To do so, the state of the art of static analysis NuGet packages is examined and further used to generate a dataset of diagnostics and corresponding code fixes comprising 24,622 data points. Since many C# fixers apply formatting changes, all formatting is preserved in the dataset. Furthermore, since the fixers also apply identifier changes, the tokenization of the dataset is varied between splitting identifiers by camel case and preserving them. The neural network uses a sequence-to-sequence learning approach with the Transformer model; it takes file context, diagnostic message, and location as input and predicts a diff as output. It is capable of imitating 46.3% of the fixes, normalized by diagnostic type, and for data points with unseen diagnostics it is able to extrapolate to 11.9% of normalized data points. For both experiments, splitting identifiers by camel case produces the best results. Lastly, it is found that a higher proportion of formatting tokens in the input has a minimal positive impact on prediction success rates, whereas the proportion of formatting in the output has no impact on success rates.
In this seminar, I will mainly present the formal verification of an I/O device and its driver. I/O devices are the critical components that allow a computing system to communicate with the external environment. We propose an abstract model of I/O devices and their drivers to describe the expected results of their execution, where the communication between devices is made explicit and the device-to-device information flow is analyzed. In order to handle general I/O functionalities, we propose a refinement-based approach that concretizes a correct-by-construction abstract model into an actual hardware device and its driver. As an example, we formalize the Serial Peripheral Interface (SPI) together with a driver. In the HOL4 interactive theorem prover, we verified the refinement between these models by establishing a weak bisimulation. We show how this result can be used to establish both functional correctness and information flow security, both for single devices and when devices are connected in an end-to-end fashion. Finally, I will introduce other ongoing research work of my doctoral study.
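As a reminder of the notion used (stated generically, not in the paper's exact HOL4 formulation): a relation $R$ between abstract and concrete states is a weak bisimulation if, whenever $s \mathrel{R} t$ and $s \xrightarrow{a} s'$ for an observable action $a$, there is a matching $t \Rightarrow \xrightarrow{a} \Rightarrow t'$ (the same action up to internal steps) with $s' \mathrel{R} t'$, and symmetrically; establishing such a relation is what allows properties proved on the abstract model to be transferred to the concrete device and driver.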
Christian Lidström (KTH) | 13:00 in zoom
In this seminar I will present the progress of my research project, which is focused on automated verification of safety-critical embedded software. Embedded systems today are in many cases incredibly large and complex, for example in the automotive industry, and consist of hundreds of smaller subsystems, many of which are of a safety-critical nature. In order to cope with the increasing safety requirements in areas such as autonomous driving, traditional means of ensuring correctness, e.g. testing, are not sufficient. Deductive verification is another method of ensuring correctness, by statically verifying that software adheres to some formal properties. For this reason, we develop methods and tools for the automation of such techniques, in order to make them usable by engineers in industry. At the same time, we develop a contract theory in which properties over large systems can be decomposed and verified at the level of software, according to established principles of system design. Finally, I will also discuss the future directions of the project.
Automatic program repair (APR) aims to automatically fix software bugs. A decade of research has generated a rich body of work on automatic program repair. Despite great success, APR research still suffers from a low-precision problem in patch generation. In this talk, I will focus on my previous work towards improving the precision of automatic program repair and increasing the external validity of program repair research.
Prof. Tegawendé F. Bissyandé from University of Luxembourg (https://bissyande.github.io/) is invited as the opponent of He Ye's 50% Ph.D. seminar.
Tjark Weber (Uppsala University) | 13:15 in zoom
Interactive theorem proving makes it possible to verify complex systems with respect to strong specifications, and with utmost rigor. Proof assistants become the guardians of truth, but who guards the guardians? Even widely used proof assistants are not immune to proofs of False. Of particular concern is the definitional mechanism, which should provide convenience and expressiveness, but must not allow inconsistencies.
We consider higher-order logic with overloading (the logic implemented by Isabelle/HOL) and show that definitions are both model- and proof-theoretically conservative (i.e., they are not needed to prove statements that do not depend on the newly defined symbol). It follows that definitions are not a source of inconsistency. We have formalized some of our results in HOL4, uncovering and correcting mistakes in earlier proofs.
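In rough textbook terms (a paraphrase, not the paper's exact statement): proof-theoretic conservativity means that for a theory $T$, a definition $d$ of a new symbol, and a formula $\varphi$ not mentioning that symbol, $T \cup \{d\} \vdash \varphi$ implies $T \vdash \varphi$; in particular, taking $\varphi = \mathrm{False}$ shows that a definition can never turn a consistent theory into an inconsistent one.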
This is joint work with Arve Gengelbach and Johannes Åman Pohjola.
Automatic program repair is about fixing bugs in software without human intervention. With the recent advancement in the deep learning field, along with empirical studies that show that computer languages are as natural as natural languages, techniques from natural language processing are also used on source code. In this talk, I will focus on my previous work that uses machine learning to fix test suite bugs, build errors, and software vulnerabilities.
David Balbas Gutierrez (KTH) | 15:00 in zoom
The Learning with Errors (LWE) problem consists of distinguishing linear equations with noise from uniform values. LWE enjoys a hardness reduction from worst-case lattice problems and allows for versatile, though inefficient, cryptographic constructions. To improve efficiency, Ring-LWE replaces linear equations with noisy ring products. Nowadays, Ring-LWE and its variants are extensively used in the construction of post-quantum secure cryptosystems.
In this talk, we introduce both problems and related mathematical background, such as lattices. Then, we sketch the hardness proof of LWE and explain how to extend it to Ring-LWE. Finally, we present some weaknesses and survey open problems.
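For reference, the standard decision version of the problem: for a secret $s \in \mathbb{Z}_q^n$, the adversary must distinguish samples $(a_i, \langle a_i, s\rangle + e_i \bmod q)$, with $a_i$ uniform in $\mathbb{Z}_q^n$ and $e_i$ drawn from a small error distribution, from pairs $(a_i, u_i)$ with $u_i$ uniform in $\mathbb{Z}_q$. Ring-LWE replaces the inner products by ring products $a_i \cdot s + e_i$ in a ring such as $\mathbb{Z}_q[x]/(x^n + 1)$, so that one sample packs $n$ correlated equations.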
Assessment can aim both to judge how well students have fulfilled the intended learning outcomes of a course and to guide and motivate the students' learning. Assessment can also aim at providing students with timely and useful feedback to support their learning process. How the assessment is planned and carried out can, therefore, be said to influence the students' learning. Introductory programming courses for non-computer-science majors are often given to large groups of students, and to be able to provide individual feedback, teaching assistants (TAs), typically older students, are often employed to assist the faculty in these courses. Understanding students', TAs', and teachers' experiences and use of the assessment could give us insights into how to further improve these courses in order to enhance learning among the students. During this seminar, Emma will give a brief background to her research and present the results from some of the studies she has conducted, where the TAs' and students' perspectives have been in focus. She will also present her ongoing research projects, where the teachers' perspective is being explored. Emma will also outline her vision for her thesis and talk about how she sees the different projects relating to her overall thesis aims.
Rapidly evolving digital technologies make the world a fascinating place to live. Innovative automated systems break conventional paradigms to connect otherwise unconnected services and devices. Our society is increasingly dependent on systems like the IoT (Internet of Things) and the Web, relying on a wide variety of applications and connected “things”, from heart pacemakers, baby monitors, and surveillance cameras to cars, industrial and military robots, and large-scale systems like smart cities. The complexity and heterogeneity of these systems, along with our society's critical reliance on the Web and IoT, pose a number of questions pertaining to security and privacy. Unfortunately, the power of Web and IoT applications can be abused by malicious actors.
In this talk, we discuss how popular Web and IoT app platforms are susceptible to several novel classes of attacks that violate user confidentiality, integrity, and availability, resulting in massive exfiltration and modification of sensitive information. We consider the role and the potential of methods and techniques from computer security, programming languages, and formal methods to discover and fix these vulnerabilities. We suggest short-term countermeasures based on fine-grained access control and present long-term countermeasures based on tracking the flow of information in Web and IoT applications.
Sarra Habchi | 13:00 in zoom
Test flakiness is identified as a major issue that compromises the regression testing of complex software systems. Flaky tests manifest non-deterministic behaviour and send confusing signals that break developers' trust in test suites. Both industrial reports and research studies have highlighted the negative impact of flakiness on software quality and developers' productivity. Nonetheless, we still lack knowledge about the manifestation and management of flaky tests in practice. In this talk, I will present an empirical investigation of the sources of flaky tests and the strategies adopted by practitioners to cope with them. In support of the identified practices, I will present an approach that statically identifies flaky tests by analysing their code. This approach stands out from current detection techniques as it does not require test execution logging and analysis, which are costly time- and computation-wise.
When developing complex software and systems, contracts provide a means for controlling the complexity by dividing the responsibilities among the components of the system in a hierarchical fashion. In specific application areas, dedicated contract theories formalise the notion of contract and the operations on contracts in a manner that best supports the development of systems in that area. At the other end, contract meta-theories attempt to provide a systematic view of the various contract theories by axiomatising their desired properties. However, there exists a noticeable gap between the most well-known contract meta-theory of Benveniste et al., which focuses on the design of embedded and cyber-physical systems, and the established way of using contracts when developing general software, following Meyer's design-by-contract methodology. At the core of this gap appears to be the notion of procedure: while it is a central unit of composition in software development, the meta-theory does not suggest an obvious way of treating procedures as components.
In this seminar, I present a first step towards a contract theory that takes procedures as the basic building block, and is at the same time an instantiation of the meta-theory. To this end, we propose an abstract contract theory for sequential programming languages with procedures, based on denotational semantics. We show that, on the one hand, the specification of contracts of procedures in Hoare logic, and their procedure-modular verification, can be cast naturally in the framework of our abstract contract theory. On the other hand, we also show our contract theory to fulfil the axioms of the meta-theory. In this way, we give further evidence for the utility of the meta-theory, and prepare the ground for combining our instantiation with other, already existing instantiations.
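For orientation, the classical software-side picture the theory has to capture (textbook form, not the abstract theory itself): a procedure contract in Hoare logic is a pre/postcondition pair $(P, Q)$, the procedure body is verified once against it, $\{P\}\ \mathit{body}_p\ \{Q\}$, and procedure-modular verification then reasons about each call through the contract alone, $\{P\}\ \mathbf{call}\ p\ \{Q\}$ (ignoring parameters and recursion here).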
This is practice for upcoming job talks aimed at a general computer science audience. Feedback is welcome. Dynamic linear algebra (algorithmic techniques for matrices that change over time) lies at the core of many applications, from continuous optimization, to efficient graph algorithms, to machine learning. Yet, until recently, the full power of dynamic linear algebra was not known and exploited in most applications. In this talk, I will describe several new advances in using these techniques and outline the limits of what can be done with them. I will overview progress on longstanding problems in dynamic shortest-path data structures, regression algorithms, optimal transport, and other problems, using dynamic linear algebra.
IoT platforms enable users to connect various smart devices and online services via reactive apps running on the cloud. Unintended or malicious interactions between the different (even benign) apps of a user can cause severe security and safety risks, yet we still lack a semantic framework for understanding interactions between IoT apps. On the other hand, modern Web apps have a complex design and many interactions between parts of the codebase, which makes them difficult to understand and analyze. The last decade has seen a proliferation of code-reuse attacks in the context of web applications. These attacks stem from Object Injection Vulnerabilities (OIVs), which enable attacker-controlled data to abuse legitimate code fragments within a web application's codebase to perform malicious computations, such as remote code execution.
The seminar will be based on two of the published papers: (a) Friendly Fire: Cross-App Interactions in IoT Platforms. This paper proposes a semantic framework capturing the essence of cross-app interactions in IoT platforms. The framework generalizes and connects syntactic enforcement mechanisms to bisimulation-based notions of security, thus providing a baseline for formulating soundness criteria for these enforcement mechanisms. We define and implement static analyses for enforcing cross-app security and safety, and prove them sound with respect to our semantic conditions. We also leverage real-world apps to validate the practical benefits of our tools. (b) SerialDetector: Principled and Practical Exploration of Object Injection Vulnerabilities for the Web. This paper presents the first systematic approach for detecting and exploiting OIVs in .NET applications, including the framework and libraries. We develop a taint-based dataflow analysis that discovers OIV patterns in .NET assemblies automatically. We then use these patterns to perform security analysis of real-world .NET apps against OIVs, including reporting three CVEs that lead to remote code execution. We describe an OIV exploitation methodology, from framework-agnostic principles to the analysis of specific vulnerabilities corresponding to key threat models of Web apps.
The talk will also include my future research plans.
Lucas Berent | 13:00 in zoom
The Minimum Circuit Size Problem (MCSP) has gained a lot of scientific interest recently: it is neither known to be in P nor known to be NP-complete, which inspires the question of whether MCSP might be a candidate for an NP-intermediate problem. We review some fundamental notions around MCSP as well as the latest results on the hardness and approximability of the problem, and give a brief outlook on possible future results.
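For context, the problem itself is easy to state: given the full truth table of a Boolean function $f\colon \{0,1\}^n \to \{0,1\}$ (an input of length $2^n$) and a size parameter $s$, decide whether some Boolean circuit of size at most $s$ computes $f$. Membership in NP is immediate, since a small circuit can be guessed and checked against the truth table, but neither a polynomial-time algorithm nor an NP-hardness proof is known.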
Autonomous systems have progressed from theory to application, especially in the last decade, thanks to recent technological evolution. The number of autonomous systems in our surroundings is increasing rapidly. Since these systems in most cases handle privacy-sensitive data, privacy concerns are increasing at a similar rate. However, privacy research has not been in sync with these developments. Moreover, the systems are heterogeneous in nature and continuously evolving, which makes the privacy problem even more challenging. The domain poses some unique privacy challenges which are not always possible to solve using existing solutions from other related fields. In this thesis, we identify open privacy challenges of autonomous systems and then propose solutions to some of the most prominent ones. We investigate the privacy challenges in the context of smart-home-based systems, including Ambient Assisted Living (AAL) systems, as well as autonomous vehicles. In the case of smart homes, we propose a framework to enhance the privacy of owners during ownership changes of IoT devices and conduct a systematic literature review to identify the privacy challenges of home-based health monitoring systems. For autonomous vehicles, we quantify, improve, and tune the privacy-utility trade-off of the image de-identification process. Our investigation reveals that there is a lack of consideration when it comes to the privacy of autonomous systems and that there are several open research questions in the domain regarding, for instance, privacy-preserving data management, quantification of the privacy-utility trade-off, and compliance with privacy laws. Since the field is evolving, this work can be seen as a step towards privacy-preserving autonomous systems. The identified privacy concerns and their corresponding solutions presented in this thesis will help the research community identify and address existing privacy concerns of autonomous systems. Solving these concerns will encourage end-users to adopt the systems and enjoy their benefits without worrying about privacy.
Cyber situation awareness is about ensuring that individuals have a fair appreciation of the circumstances that can affect their chances of reaching their goals within the cyber domain. One example of such a goal would be cyber defense, e.g., the protection of some computer network.
According to current research, gaps within the field of cyber situation awareness include a lack of understanding of the types and quality of information sources that contribute to awareness, of the importance and effects of human-automation interaction, and of the socio-technical dynamics of cyber defense teams.
During the seminar, the context of cyber situation awareness will be presented, along with accomplished results that aim to narrow some of the identified research gaps. Further, the remaining planned activities within the doctoral project, as well as a thesis disposition draft, will be presented.
Stefan Varga is a PhD student at KTH, now giving his 80% seminar.
Testing is one of the crucial parts of the software development life cycle: it guarantees quality and identifies defects. Software testing is a costly process in terms of labor and time, thus creating a need for test automation techniques. In this seminar, we will go through graph-based machine learning approaches for test automation.
The seminar will be based on two of my published papers: (i) Using graph kernels and a Support Vector Machine (SVM) to automate metamorphic testing. Metamorphic testing is an emerging technique which solves both the oracle problem and the test case generation problem by testing special forms of software requirements known as metamorphic requirements. The paper also explains data augmentation techniques that generate synthetic data for program analysis using mutation testing. (ii) Using graph neural networks (GNNs) to predict program similarity. Program similarity is estimated by analyzing the labelled control flow graphs (CFGs) of the programs. The graph similarity of a program pair produced by our methodology could be applied to several related software engineering problems (such as software plagiarism, clone identification, and code refactoring).
The talk will also include my future research plans.
Software may be written in different languages, but in the end, machine code describes how the computer executes programs. While you can verify source code according to some source language semantics, compilation to machine code can introduce discrepancies in functionality and lay bare details which were abstracted away on the source-code level.
Many different methods exist to obtain verified machine code: verified compilers, translation validation, and direct verification of machine code. My research focuses on an approach which facilitates inclusion of the many mechanisms that exist on a bare-metal level instead of restricting the scope of analysis. Some examples of these mechanisms are unstructured and dynamic control flow, fine-grained concurrent and parallel execution, caching, hardware-based isolation as well as interaction with peripherals. The goal is also to include new domains in existing analysis frameworks, such as reasoning about new machine architectures and translation validation from new source languages.
In this presentation, I will give an overview of my research and present one result in particular: a novel program logic for treating unstructured code. I will also argue why it is particularly well suited to being extended for more advanced uses.
Cryptographic protocols are essential for the security of current and future communication infrastructures, including 5G and IoT. Engineering secure protocols is, however, a highly delicate and error-prone task. Engineers need improved systematic methods and supporting tools to analyze their constructions. There are two main approaches to such methods: symbolic methods rooted in formal verification and computational methods rooted in reduction-based proofs. Symbolic methods are often more abstract and hence easier to mechanize, whereas computational methods promise cryptographic soundness. The reason is that computational methods use a bit as their smallest modelling unit, whereas symbolic methods often represent composite objects such as keys and nonces as atomic symbols. Further, symbolic methods often model probabilities in a binary fashion: either an event can occur, or it cannot.
In this talk, I will present my research project in this area. Specifically, while much current work applied to practical key establishment protocols targets, with few exceptions, a single protocol of a system, our aim is to explore systems of protocols. We also aim to explore whether combinations of the symbolic and computational approaches can lead to simplified analysis methods.
Embedded systems today consist of large and complex software. In many cases, such as in the automotive industry, large parts of the systems are also safety-critical, meaning that undesired behaviour may result in large environmental or economic cost, or even loss of human life. One way to detect such behaviour is to employ the use of formal methods.
In this talk, I will present my research project, which is focused on automated deductive verification of embedded systems. Deductive verification is a formal method in which we prove that code behaves according to a specification for all possible executions, providing a higher degree of confidence in the correctness of software than standard industrial methods such as testing. However, this comes at the cost of requiring that specifications be provided in a formal language, and of requiring substantial expertise in the tools used to this end. For this reason, we believe that the verification process needs to be automated to a large extent. I will discuss what is needed for automation to be feasible and what the possible solutions are, as well as some preliminary results. In particular, I will present work on an abstract contract theory for procedural programs, based on denotational semantics. With this contract theory, we attempt to bridge the gap between established contract theories for system design and the procedural reasoning performed in deductive verification. The talk will also include my future research plans.
Xhevahire Tërnava (Sorbonne Université) | 14:15 in zoom
Most modern software-intensive systems, ranging from small-scale embedded systems to large-scale enterprise systems to ultra-large systems of systems, are variability-intensive. On the one hand, variability is largely studied only in the context of product lines, but the majority of these systems, such as real open-source object-oriented systems, are not part of a product line. On the other hand, the implemented variability of these systems, as an anticipated change, is neither explicit nor documented. Therefore, to effectively manage and comprehend it, software engineers need to be aware of where in the code assets variability is implemented. In this talk, I will present research work on building (i) a tooled framework, called vm-dsl, to manually document and trace the variability of Scala- and Java-based variability-intensive systems, and (ii) a tooled approach, called symfinder, to automatically identify, visualize, and comprehend the variability of any real open-source Java- or C++-based variability-intensive system.
Previous years' seminars.