Please let Kilian Risse know if you want to give a seminar or receive email announcements about upcoming seminars.
Jonas Haglund (KTH) | 13:15 in zoom
Many peripherals in computers can read and write main memory (RAM) by means of a hardware unit called a direct memory access controller (DMAC). DMACs are programmed by device drivers. A device driver is a piece of software in the operating system that configures a specific peripheral. Device drivers tend to contain bugs for a number of reasons: lacking hardware documentation, difficulty of testing, time pressure, and non-trivial hardware and operating system interfaces.
If a device driver does not configure its DMAC correctly, the DMAC might read or write undesirable memory regions (maliciously or by accident). This can break memory isolation among applications, leading to leaks of data and/or manipulation of data and code. The goal of my PhD work is to contribute a solution to this problem. I will describe techniques for formally verifying memory isolation of DMACs, and how these verification results can be used to write software that configures a DMAC such that the DMAC respects memory isolation. This involves a specific case study, and a framework/tool (nearing the end of its development) in an interactive theorem prover which can be used to describe DMACs and verify that DMACs respect memory isolation.
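To make the configuration step concrete, here is a minimal, purely illustrative sketch of how a driver might program a DMAC; the register layout and names are hypothetical and not the hardware studied in this work. The point is that memory isolation hinges entirely on the addresses and lengths the driver writes into the device.

#include <stdint.h>

/* Hypothetical memory-mapped DMAC register block (illustrative only). */
typedef struct {
    volatile uint32_t src;   /* physical source address */
    volatile uint32_t dst;   /* physical destination address */
    volatile uint32_t len;   /* number of bytes to transfer */
    volatile uint32_t ctrl;  /* bit 0 starts the transfer */
} dmac_regs;

/* The driver configures a transfer by filling in the registers and
 * setting the start bit. If dst or len reaches outside the memory
 * region assigned to the peripheral's client, the transfer breaks
 * memory isolation; this is exactly the property the verification targets. */
static void dmac_start_copy(dmac_regs *dmac, uint32_t src,
                            uint32_t dst, uint32_t len)
{
    dmac->src  = src;
    dmac->dst  = dst;
    dmac->len  = len;
    dmac->ctrl = 1u;
}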
Amirmahmood Ahmadian (KTH) | 14:00 in room 1537
Our world today relies heavily on software systems. They are used in almost every aspect of our society, and securing them is thus a crucial and important task. Today, software is used in various settings such as the web or mobile; it is built from different components and relies on interaction with other software like database systems or servers. This results in complex and heterogeneous software systems which are implemented and deployed under different assumptions about security policies and attacker capabilities. Therefore, when investigating the security of a program in a particular setting, one should consider many different factors, such as the relation between its building components, the setting-specific threats, and the relevant attacker models and security policies. In this seminar I will present my research on the security of software systems in two particular settings. First, I will talk about dynamic policies and the effects of changing the security policy during the execution of a program. I will discuss different attacker models and investigate their effects on our definition of security. In the second part of the presentation, I will discuss the security of programs that rely on trusted execution environments (TEEs) for secure data storage and computation. The security guarantees of a TEE, the relevant attacker capabilities, and their effect on the security of programs developed for TEEs will be the main discussion points of this part. Finally, I will also talk about the future direction of my research. You are welcome to attend the seminar virtually or in person: https://kth-se.zoom.us/j/68555815435
Andreas Lindner (KTH) | 13:15 in zoom
To ensure correct and secure execution of any software, we inevitably have to consider its interactions with the executing hardware. One example of this is the configuration of security features like memory protection, where misconfiguration compromises memory isolation. Another example is the proper management of hardware resources to avoid side-channel leakage or guarantee safe execution times. The main complications here stem from the complexity of modern computer hardware and the assembly interface language, which is characterized by numerous implicit effects.
For this reason, it is important to employ formal reasoning and automate as much of the software analysis as possible, in order to remove the human factor as far as possible. This is especially the case for critical software that directly interacts with the real world, where high assurance of operational safety is of utmost importance and missed reasoning steps could be fatal.
In this talk, I will present the overall picture of my thesis research and what remains to complete it. The work centres on our theorem-prover-powered binary code analysis framework HolBA and ranges from functional verification through side-channel model validation to execution-time analysis. While I include summaries of previous work, most of the emphasis lies on the ongoing project on verifying execution-time bounds and the specific symbolic execution we developed for it.
You can also join this seminar in the room 4523.
Among the daily tasks of database administrators (DBAs), analysing query workloads to identify schema issues and improve performance is crucial. Although DBAs can easily pinpoint queries that repeatedly cause performance issues, it remains challenging to automatically identify subsets of queries that share certain properties (a pattern) and simultaneously foster some target measure, such as execution time. Patterns are defined on combinations of query clauses, environment variables, database alerts and metrics, and help answer questions like: what makes SQL queries slow? What makes I/O communication high? Automatically discovering these patterns in a huge search space and providing them as hypotheses for localizing issues and root causes is important in the context of explainable AI. To tackle this, we introduce an original approach rooted in Subgroup Discovery. We show how to instantiate and develop this generic data-mining framework to identify potential causes of SQL workload issues. We believe that such a data-mining technique is not trivial for DBAs to apply. As such, we also provide a visualization tool for interactive knowledge discovery. We analyse a one-week workload from hundreds of databases at our company, make both the dataset and the source code available, and experimentally show that insightful hypotheses can be discovered.
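For readers unfamiliar with Subgroup Discovery, a rough sketch of the underlying idea (generic, and not necessarily the exact quality function used in the paper): a candidate pattern $s$, covering $n_s$ of the $N$ queries, is scored by trading off its size against how far it shifts a numeric target such as execution time, for instance $q(s) = (n_s/N)^{\alpha}\,(\mu_s - \mu)$, where $\mu_s$ is the mean execution time inside the subgroup, $\mu$ the global mean, and $\alpha$ controls the size/deviation trade-off; the search then enumerates combinations of query clauses, environment variables, and alerts, and reports the top-scoring patterns as hypotheses.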
Vincent Lohse | 13:30 in 4523
C# provides static analysis libraries for template-based code analysis and code fixing. These libraries have been used by the open-source community to create numerous NuGet packages for different use cases. However, due to the unstructured vastness of these packages, it is difficult to find the ones required for a project, and new analyzers and fixers take time and effort to create. Therefore, this thesis proposes a neural network which firstly imitates existing fixers and secondly extrapolates to fixes of unseen diagnostics. To do so, the state of the art of static analysis NuGet packages is examined and further used to generate a dataset of diagnostics and corresponding code fixes comprising 24,622 data points. Since many C# fixers apply formatting changes, all formatting is preserved in the dataset. Furthermore, since the fixers also apply identifier changes, the tokenization of the dataset is varied between splitting identifiers by camel case and preserving them. The neural network uses a sequence-to-sequence learning approach with the Transformer model; it takes file context, diagnostic message, and location as input and predicts a diff as output. It is capable of imitating 46.3% of the fixes, normalized by diagnostic type, and for data points with unseen diagnostics it is able to extrapolate to 11.9% of normalized data points. For both experiments, splitting identifiers by camel case produces the best results. Lastly, it is found that a higher proportion of formatting tokens in the input has a minimal positive impact on prediction success rates, whereas the proportion of formatting in the output has no impact on success rates.
In this seminar, I will mainly present the formal verification of an I/O device and its driver. I/O devices are the critical components that allow a computing system to communicate with the external environment. We propose an abstract model of I/O devices and their drivers to describe the expected results of their execution, where the communication between devices is made explicit and the device-to-device information flow is analyzed. In order to handle general I/O functionalities, we propose a refinement-based approach that concretizes a correct-by-construction abstract model into an actual hardware device and its driver. As an example, we formalize the Serial Peripheral Interface (SPI) together with a driver. In the HOL4 interactive theorem prover, we verified the refinement between these models by establishing a weak bisimulation. We show how this result can be used to establish both functional correctness and information flow security, both for single devices and when devices are connected in an end-to-end fashion. Finally, I will introduce other ongoing research work of my doctoral study.
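As a reminder of the notion used (stated generically, not in the paper's exact HOL4 formulation): a relation $R$ between abstract and concrete states is a weak bisimulation if, whenever $s \mathrel{R} t$ and $s \xrightarrow{a} s'$ for an observable action $a$, there is a matching $t \Rightarrow \xrightarrow{a} \Rightarrow t'$ (the same action up to internal steps) with $s' \mathrel{R} t'$, and symmetrically; establishing such a relation is what allows properties proved on the abstract model to be transferred to the concrete device and driver.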
Christian Lidström (KTH) | 13:00 in zoom
In this seminar I will present the progress of my research project, which is focused on automated verification of safety-critical embedded software. Embedded systems today are in many cases incredibly large and complex, for example in the automotive industry, and consist of hundreds of smaller subsystems, many of which are of a safety-critical nature. In order to cope with the increasing safety requirements in areas such as autonomous driving, traditional means of ensuring correctness, e.g. testing, are not sufficient. Deductive verification is another method of ensuring correctness, by statically verifying that software adheres to some formal properties. For this reason, we develop methods and tools for the automation of such techniques, in order to make them usable by engineers in industry. At the same time, we develop a contract theory in which properties over large systems can be decomposed and verified at the level of software, according to established principles of system design. Finally, I will also discuss the future directions of the project.
Automatic program repair (APR) aims to automatically fix software bugs. A decade of research has generated a rich body of work on automatic program repair. Despite great success, APR research still suffers from a low-precision problem in patch generation. In this talk, I will focus on my previous work towards improving the precision of automatic program repair and increasing the external validity of program repair research.
Prof. Tegawendé F. Bissyandé from University of Luxembourg (https://bissyande.github.io/) is invited as the opponent of He Ye's 50% Ph.D. seminar.
Tjark Weber (Uppsala University) | 13:15 in zoom
Interactive theorem proving makes it possible to verify complex systems with respect to strong specifications, and with utmost rigor. Proof assistants become the guardians of truth, but who guards the guardians? Even widely used proof assistants are not immune to proofs of False. Of particular concern is the definitional mechanism, which should provide convenience and expressiveness, but must not allow inconsistencies.
We consider higher-order logic with overloading (the logic implemented by Isabelle/HOL) and show that definitions are both model- and proof-theoretically conservative (i.e., they are not needed to prove statements that do not depend on the newly defined symbol). It follows that definitions are not a source of inconsistency. We have formalized some of our results in HOL4, uncovering and correcting mistakes in earlier proofs.
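In rough textbook terms (a paraphrase, not the paper's exact statement): proof-theoretic conservativity means that for a theory $T$, a definition $d$ of a new symbol, and a formula $\varphi$ not mentioning that symbol, $T \cup \{d\} \vdash \varphi$ implies $T \vdash \varphi$; in particular, taking $\varphi = \mathrm{False}$ shows that a definition can never turn a consistent theory into an inconsistent one.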
This is joint work with Arve Gengelbach and Johannes Åman Pohjola.
Automatic program repair is about fixing bugs in software without human intervention. With the recent advancement in the deep learning field, along with empirical studies that show that computer languages are as natural as natural languages, techniques from natural language processing are also used on source code. In this talk, I will focus on my previous work that uses machine learning to fix test suite bugs, build errors, and software vulnerabilities.
David Balbas Gutierrez (KTH) | 15:00 in zoom
The Learning with Errors (LWE) problem consists of distinguishing linear equations with noise from uniform values. LWE enjoys a hardness reduction from worst-case lattice problems and allows for versatile, though inefficient, cryptographic constructions. To improve efficiency, Ring-LWE replaces linear equations with noisy ring products. Nowadays, Ring-LWE and its variants are extensively used in the construction of post-quantum secure cryptosystems.
In this talk, we introduce both problems and related mathematical background, such as lattices. Then, we sketch the hardness proof of LWE and explain how to extend it to Ring-LWE. Finally, we present some weaknesses and survey open problems.
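For reference, the standard decision version of the problem: for a secret $s \in \mathbb{Z}_q^n$, the adversary must distinguish samples $(a_i, \langle a_i, s\rangle + e_i \bmod q)$, with $a_i$ uniform in $\mathbb{Z}_q^n$ and $e_i$ drawn from a small error distribution, from pairs $(a_i, u_i)$ with $u_i$ uniform in $\mathbb{Z}_q$. Ring-LWE replaces the inner products by ring products $a_i \cdot s + e_i$ in a ring such as $\mathbb{Z}_q[x]/(x^n + 1)$, so that one sample packs $n$ correlated equations.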
Assessment can aim both to judge how well students have fulfilled the intended learning outcomes of a course and to guide and motivate the students' learning. Assessment can also aim at providing students with timely and useful feedback to support their learning process. How the assessment is planned and carried out can, therefore, be said to influence the students' learning. Introductory programming courses for non-computer-science majors are often given to large groups of students, and to be able to provide individual feedback, teaching assistants (TAs), typically older students, are often employed to assist the faculty in these courses. Understanding students', TAs', and teachers' experiences and use of the assessment could give us insights into how to further improve these courses in order to enhance learning among the students. During this seminar, Emma will give a brief background to her research and present the results from some of the studies she has conducted, where the TAs' and students' perspectives have been in focus. She will also present her ongoing research projects, where the teachers' perspective is being explored. Emma will also outline her vision for her thesis and talk about how she sees the different projects relating to her overall thesis aims.
Rapidly evolving digital technologies make the world a fascinating place to live. Innovative automated systems break conventional paradigms to connect otherwise unconnected services and devices. Our society is increasingly dependent on systems like the IoT (Internet of Things) and the Web, relying on a wide variety of applications and connected “things”, from heart pacemakers, baby monitors, and surveillance cameras to cars, industrial and military robots, and large-scale systems like smart cities. The complexity and heterogeneity of these systems, along with our society's critical reliance on the Web and IoT, pose a number of questions pertaining to security and privacy. Unfortunately, the power of Web and IoT applications can be abused by malicious actors.
In this talk, we discuss how popular Web and IoT app platforms are susceptible to several novel classes of attacks that violate user confidentiality, integrity, and availability, resulting in massive exfiltration and modification of sensitive information. We consider the role and the potential of methods and techniques from computer security, programming languages, and formal methods to discover and fix these vulnerabilities. We suggest short-term countermeasures based on fine-grained access control and present long-term countermeasures based on tracking the flow of information in Web and IoT applications.
Sarra Habchi | 13:00 in zoom
Test flakiness is identified as a major issue that compromises the regression testing of complex software systems. Flaky tests manifest non-deterministic behaviour and send confusing signals that break developers' trust in test suites. Both industrial reports and research studies have highlighted the negative impact of flakiness on software quality and developers' productivity. Nonetheless, we still lack knowledge about the manifestation and management of flaky tests in practice. In this talk, I will present an empirical investigation of the sources of flaky tests and the strategies adopted by practitioners to cope with them. In support of the identified practices, I will present an approach that statically identifies flaky tests by analysing their code. This approach stands out from current detection techniques as it does not require test execution logging and analysis, which are costly time- and computation-wise.
When developing complex software and systems, contracts provide a means for controlling the complexity by dividing the responsibilities among the components of the system in a hierarchical fashion. In specific application areas, dedicated contract theories formalise the notion of contract and the operations on contracts in a manner that best supports the development of systems in that area. At the other end, contract meta-theories attempt to provide a systematic view of the various contract theories by axiomatising their desired properties. However, there exists a noticeable gap between the most well-known contract meta-theory of Benveniste et al., which focuses on the design of embedded and cyber-physical systems, and the established way of using contracts when developing general software, following Meyer's design-by-contract methodology. At the core of this gap appears to be the notion of procedure: while it is a central unit of composition in software development, the meta-theory does not suggest an obvious way of treating procedures as components.
In this seminar, I present a first step towards a contract theory that takes procedures as the basic building block, and is at the same time an instantiation of the meta-theory. To this end, we propose an abstract contract theory for sequential programming languages with procedures, based on denotational semantics. We show that, on the one hand, the specification of contracts of procedures in Hoare logic, and their procedure-modular verification, can be cast naturally in the framework of our abstract contract theory. On the other hand, we also show our contract theory to fulfil the axioms of the meta-theory. In this way, we give further evidence for the utility of the meta-theory, and prepare the ground for combining our instantiation with other, already existing instantiations.
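For orientation, the classical software-side picture the theory has to capture (textbook form, not the abstract theory itself): a procedure contract in Hoare logic is a pre/postcondition pair $(P, Q)$, the procedure body is verified once against it, $\{P\}\ \mathit{body}_p\ \{Q\}$, and procedure-modular verification then reasons about each call through the contract alone, $\{P\}\ \mathbf{call}\ p\ \{Q\}$ (ignoring parameters and recursion here).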
This is practice for upcoming job talks aimed at a general computer science audience. Feedback is welcome. Dynamic linear algebra (algorithmic techniques for matrices that change over time) lies at the core of many applications, from continuous optimization, to efficient graph algorithms, to machine learning. Yet, until recently, the full power of dynamic linear algebra was not known and exploited in most applications. In this talk, I will describe several new advances in using these techniques and outline the limits of what can be done with them. I will overview progress on longstanding problems in dynamic shortest-path data structures, regression algorithms, optimal transport, and other problems, using dynamic linear algebra.
IoT platforms enable users to connect various smart devices and online services via reactive apps running on the cloud. Unintended or malicious interactions between the different (even benign) apps of a user can cause severe security and safety risks, yet we still lack a semantic framework for understanding interactions between IoT apps. On the other hand, modern Web apps have a complex design and many interactions between parts of the codebase, which makes them difficult to understand and analyze. The last decade has seen a proliferation of code-reuse attacks in the context of web applications. These attacks stem from Object Injection Vulnerabilities (OIVs), which enable attacker-controlled data to abuse legitimate code fragments within a web application's codebase to perform malicious computations, such as remote code execution.
The seminar will be based on two of the published papers: (a) Friendly Fire: Cross-App Interactions in IoT Platforms. This paper proposes a semantic framework capturing the essence of cross-app interactions in IoT platforms. The framework generalizes and connects syntactic enforcement mechanisms to bisimulation-based notions of security, thus providing a baseline for formulating soundness criteria for these enforcement mechanisms. We define and implement static analyses for enforcing cross-app security and safety, and prove them sound with respect to our semantic conditions. We also leverage real-world apps to validate the practical benefits of our tools. (b) SerialDetector: Principled and Practical Exploration of Object Injection Vulnerabilities for the Web. This paper presents the first systematic approach for detecting and exploiting OIVs in .NET applications, including the framework and libraries. We develop a taint-based dataflow analysis that discovers OIV patterns in .NET assemblies automatically. We then use these patterns to perform security analysis of real-world .NET apps against OIVs, including reporting three CVEs that lead to remote code execution. We describe an OIV exploitation methodology, from framework-agnostic principles to the analysis of specific vulnerabilities corresponding to key threat models of Web apps.
The talk will also include my future research plans.
Lucas Berent | 13:00 in zoom
The Minimum Circuit Size Problem (MCSP) has gained a lot of scientific interest recently: it is neither known to be in P nor known to be NP-complete, which inspires the question of whether MCSP might be a candidate for an NP-intermediate problem. We review some fundamental notions around MCSP as well as the latest results on the hardness and approximability of the problem, and give a brief outlook on possible future results.
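For context, the problem itself is easy to state: given the full truth table of a Boolean function $f\colon \{0,1\}^n \to \{0,1\}$ (an input of length $2^n$) and a size parameter $s$, decide whether some Boolean circuit of size at most $s$ computes $f$. Membership in NP is immediate, since a small circuit can be guessed and checked against the truth table, but neither a polynomial-time algorithm nor an NP-hardness proof is known.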
Autonomous systems have progressed from theory to application, especially in the last decade, thanks to recent technological evolution. The number of autonomous systems in our surroundings is increasing rapidly. Since these systems in most cases handle privacy-sensitive data, privacy concerns are increasing at a similar rate. However, privacy research has not been in sync with these developments. Moreover, the systems are heterogeneous in nature and continuously evolving, which makes the privacy problem even more challenging. The domain poses some unique privacy challenges which are not always possible to solve using existing solutions from other related fields. In this thesis, we identify open privacy challenges of autonomous systems and then propose solutions to some of the most prominent ones. We investigate the privacy challenges in the context of smart-home-based systems, including Ambient Assisted Living (AAL) systems, as well as autonomous vehicles. In the case of smart homes, we propose a framework to enhance the privacy of owners during ownership changes of IoT devices and conduct a systematic literature review to identify the privacy challenges of home-based health monitoring systems. For autonomous vehicles, we quantify, improve, and tune the privacy-utility trade-off of the image de-identification process. Our investigation reveals that there is a lack of consideration when it comes to the privacy of autonomous systems and that there are several open research questions in the domain regarding, for instance, privacy-preserving data management, quantification of the privacy-utility trade-off, and compliance with privacy laws. Since the field is evolving, this work can be seen as a step towards privacy-preserving autonomous systems. The identified privacy concerns and their corresponding solutions presented in this thesis will help the research community identify and address existing privacy concerns of autonomous systems. Solving these concerns will encourage end-users to adopt the systems and enjoy their benefits without worrying about privacy.
Cyber situation awareness is about ensuring that individuals have a fair appreciation of the circumstances that can affect their chances of reaching their goals within the cyber domain. One example of such a goal would be cyber defense, e.g., the protection of some computer network.
According to current research, gaps within the field of cyber situation awareness include a lack of understanding of the types and quality of information sources that contribute to awareness, of the importance and effects of human-automation interaction, and of the socio-technical dynamics of cyber defense teams.
During the seminar, the context of cyber situation awareness will be presented, along with accomplished results that aim to narrow some of the identified research gaps. Further, the remaining planned activities within the doctoral project, as well as a thesis disposition draft, will be presented.
Stefan Varga is a PhD student at KTH, now giving his 80% seminar.
Testing is one of the crucial parts of the software development life cycle: it guarantees quality and identifies defects. Software testing is a costly process in terms of labor and time, thus creating a need for test automation techniques. In this seminar, we will go through graph-based machine learning approaches for test automation.
The seminar will be based on two of my published papers: (i) Using graph kernels and a Support Vector Machine (SVM) to automate metamorphic testing. Metamorphic testing is an emerging technique which solves both the oracle problem and the test case generation problem by testing special forms of software requirements known as metamorphic requirements. The paper also explains data augmentation techniques that generate synthetic data for program analysis using mutation testing. (ii) Using graph neural networks (GNNs) to predict program similarity. Program similarity is estimated by analyzing the labelled control flow graphs (CFGs) of the programs. The graph similarity of a program pair produced by our methodology could be applied to several related software engineering problems (such as software plagiarism, clone identification, and code refactoring).
The talk will also include my future research plans.
Software may be written in different languages, but in the end, machine code describes how the computer executes programs. While you can verify source code according to some source language semantics, compilation to machine code can introduce discrepancies in functionality and lay bare details which were abstracted away on the source-code level.
Many different methods exist to obtain verified machine code: verified compilers, translation validation, and direct verification of machine code. My research focuses on an approach which facilitates inclusion of the many mechanisms that exist on a bare-metal level instead of restricting the scope of analysis. Some examples of these mechanisms are unstructured and dynamic control flow, fine-grained concurrent and parallel execution, caching, hardware-based isolation as well as interaction with peripherals. The goal is also to include new domains in existing analysis frameworks, such as reasoning about new machine architectures and translation validation from new source languages.
In this presentation, I will give an overview of my research and present one result in particular: a novel program logic for treating unstructured code. I will also argue why it is particularly well suited to being extended for more advanced uses.
Cryptographic protocols are essential for the security of current and future communication infrastructures, including 5G and IoT. Engineering secure protocols is, however, a highly delicate and error-prone task. Engineers need improved systematic methods and supporting tools to analyze their constructions. There are two main approaches to such methods: symbolic methods rooted in formal verification and computational methods rooted in reduction-based proofs. Symbolic methods are often more abstract and hence easier to mechanize, whereas computational methods promise cryptographic soundness. The reason is that computational methods use a bit as their smallest modelling unit, whereas symbolic methods often represent composite objects such as keys and nonces as atomic symbols. Further, symbolic methods often model probabilities in a binary fashion: either an event can occur, or it cannot.
In this talk, I will present my research project in this area. Specifically, while much current work applied to practical key establishment protocols targets, with few exceptions, a single protocol of a system, our aim is to explore systems of protocols. We also aim to explore whether combinations of the symbolic and computational approaches can lead to simplified analysis methods.
Embedded systems today consist of large and complex software. In many cases, such as in the automotive industry, large parts of the systems are also safety-critical, meaning that undesired behaviour may result in large environmental or economic cost, or even loss of human life. One way to detect such behaviour is to employ the use of formal methods.
In this talk, I will present my research project, which is focused on automated deductive verification of embedded systems. Deductive verification is a formal method in which we prove that code behaves according to a specification for all possible executions, providing a higher degree of confidence in the correctness of software than standard industrial methods such as testing. However, this comes at the cost of requiring that specifications be provided in a formal language, and of requiring substantial expertise in the tools used to this end. For this reason, we believe that the verification process needs to be automated to a large extent. I will discuss what is needed for automation to be feasible and what the possible solutions are, as well as some preliminary results. In particular, I will present work on an abstract contract theory for procedural programs, based on denotational semantics. With this contract theory, we attempt to bridge the gap between established contract theories for system design and the procedural reasoning performed in deductive verification. The talk will also include my future research plans.
Xhevahire Tërnava (Sorbonne Université) | 14:15 in zoom
Most modern software-intensive systems, ranging from small-scale embedded systems to large-scale enterprise systems to ultra-large systems of systems, are variability-intensive. On the one hand, variability is largely studied only in the context of product lines, but the majority of these systems, such as real open-source object-oriented systems, are not part of a product line. On the other hand, the implemented variability of these systems, as an anticipated change, is neither explicit nor documented. Therefore, to effectively manage and comprehend it, software engineers need to be aware of where in the code assets variability is implemented. In this talk, I will present research work on building (i) a tooled framework, called vm-dsl, to manually document and trace the variability of Scala- and Java-based variability-intensive systems, and (ii) a tooled approach, called symfinder, to automatically identify, visualize, and comprehend the variability of any real open-source Java- or C++-based variability-intensive system.
Previous years' seminars.