Stanford Software Seminar

The Stanford Software Seminar is usually held on Mondays in various rooms in the Gates building. Talks are open to anyone.

To subscribe to the seminar mailing list, visit its mailman page or send an email to software-research-join@lists.stanford.edu from the email address you wish to subscribe. Likewise, to unsubscribe, send an email to software-research-leave@lists.stanford.edu from the subscribed email address. In either case, the subject and body of your email will be ignored.

You can also subscribe to the seminar's Google calendar.

Upcoming Talks
Date Monday, June 5 2017, 3-4pm
Place Gates 104
Speaker Adam Chlipala, MIT
Title A Case for Integrating Proof Assistants into Daily Programming
Abstract Computer proof assistants are IDEs for developing rigorous mathematical proofs that are checked algorithmically. They are just beginning to be accepted as useful for "real work" in domains from systems infrastructure to applications. Still, most observers think of a mechanized proof as a kind of penance that programmers can pay to be especially virtuous. Are machine-checked proofs just an extra cost, bolted on to a standard software-development process, to increase assurance? I will argue that, instead, mechanized proofs are a key enabler of new patterns of abstraction and modularity, and these new patterns deserve to permeate the whole life cycle of development.
I will draw examples from a few different efforts I'm involved with, applying the Coq proof assistant. One reach target, for the different aspects combined, is to synthesize efficient and secure Internet servers from specifications. Key challenge areas included persistent data management, parsing of network wire formats, and cryptography. I will describe how we are able to generate performance-competitive code automatically from reasonable specifications in each domain -- and how to do it in a way that generates proofs of correctness. The resulting programmer experience is something between "program synthesis" and "programming with well-designed libraries." The paradigm we're pushing is code libraries that include specification notations and procedures that know how to compile those notations efficiently, with proofs.
Speaker Bio Adam Chlipala has been on the faculty in computer science at MIT since 2011. He did his undergrad at Carnegie Mellon and his PhD at Berkeley. His research focus is applying mechanized logical reasoning to the programming process (both software and hardware) to improve how we implement, compile, specify, verify, and synthesize code. Much of his work uses the Coq proof assistant, about which he has written a popular book, "Certified Programming with Dependent Types." Current projects involve verification and synthesis for processors, file systems, cryptography, databases, and Internet servers. He is designer and implementer of the Ur/Web programming language, a domain-specific functional language for building modern Web applications, which has a few commercial users.
Date Monday, May 15 2017, 3:00pm - 4:00pm
Place Gates 415
Speaker Yu Feng, UT Austin
Title Component-based Synthesis of Table Consolidation and Transformation Tasks from Examples
Abstract Programs that manipulate tabular data are extremely important in an era where data analytics has gained enormous popularity. Since raw data is rarely in a form that is immediately amenable to an analytics or visualization task, data scientists typically spend over 80% of their time performing tedious data preparation tasks. Such tasks include consolidating multiple data sources into a single table, reshaping data from one format into another, or adding new rows or columns to an existing table.
In this talk, I will present a novel component-based synthesis algorithm that marries the power of type-directed search with lightweight SMT-based deduction and partial evaluation. Given a set of components together with their over-approximate first-order specification, our method first generates a program sketch over a subset of the components and checks its feasibility using an SMT solver. Since a program sketch typically represents many concrete programs, the use of SMT-based deduction greatly increases the scalability of the algorithm. Once a feasible program sketch is found, our algorithm completes the sketch in a bottom-up fashion, using partial evaluation to further increase the power of deduction for rejecting partially-filled program sketches.
We apply the proposed synthesis methodology to automate a large class of data preparation tasks that commonly arise in data science. We have evaluated our synthesis algorithm on dozens of data wrangling and consolidation tasks obtained from on-line forums, and we show that our approach can automatically solve a large class of problems encountered by R users.
Speaker Bio Yu Feng is a fifth-year Ph.D. student at UT Austin advised by Isil Dillig, and he was a member of the STAMP group led by Alex Aiken. Yu has interests spanning programming languages, security, and software engineering. For instance, he has published on Android malware detection (FSE'14, NDSS'17), program synthesis (POPL'17, PLDI'17), and program analysis (OOPSLA'15, APLAS'15).
Date Friday, May 12 2017, 2-3pm
Place Gates 415
Speaker Loris D'Antoni, University of Wisconsin-Madison
Title Adventures in program repair
Abstract Programmers constantly face errors that are confusing and hard to fix. In particular, inexperienced programmers, who cannot address commonly occurring errors, have to resort to online help-forums for finding corrections to their buggy programs. I will present three ideas that leverage advances in program synthesis to automatically assist unskilled programmers that face commonly occurring errors.
First, I will talk about "emulative program repair", a technique for automatically learning how to fix errors directly from programmers. Given examples of bug fixes from real programmers, we synthesize "rules" that generalize such bug fixes and use them to automatically repair programs that contain similar bugs. Second, I will talk about "repair via direct manipulation", a technique that allows the programmer to express what a repaired program should do by directly manipulating intermediate runtime values of a program. Using the manipulated values, we automatically synthesize programs compliant with the user intent. Third, I will talk about "program repair under uncertainty", where program inputs are drawn from a probability distribution. I will show how programs that do not satisfy a given probabilistic postcondition can be efficiently repaired by combining techniques from program synthesis and computational learning theory.
Speaker Bio Loris D'Antoni is an Assistant Professor in the MadPL (Madison Programming Languages and Software Engineering) Group at the University of Wisconsin-Madison. He received his PhD from the University of Pennsylvania in 2015, where he worked under the supervision of Rajeev Alur. His dissertation "Programming using Automata and Transducers" won the Morris and Dorothy Rubinoff Award. Loris's research interests lie in formal methods and program synthesis, with applications to networking, personalized education, and data science. Loris is currently investigating how formal methods can be used to provide personalized feedback to computer science students, to automatically program networks, and to remove bias from machine learning classifiers.
Past Talks
Date Thursday, April 20 2017, 2:00pm - 3:00pm
Place Gates 463A
Speaker Michael Pradel, Assistant Professor, TU Darmstadt
Title Systematic Black-Box Analysis of Collaborative Web Applications
Abstract Web applications, such as collaborative editors that allow multiple clients to concurrently interact on a shared resource, are difficult to implement correctly. Existing techniques for analyzing concurrent software do not scale to such complex systems or do not consider multiple interacting clients. This paper presents Simian, the first fully automated technique for systematically analyzing multi-client web applications. Naively exploring all possible interactions between a set of clients of such applications is practically infeasible. Simian obtains scalability for real-world applications by using a two-phase black-box approach. The application code remains unknown to the analysis and is first explored systematically using a single client to infer potential conflicts between client events triggered in a specific context. The second phase synthesizes multi-client interactions targeted at triggering misbehavior that may result from the potential conflicts, and reports an inconsistency if the clients do not converge to a consistent state. We evaluate the analysis on three widely used systems, Google Docs, Firepad, and ownCloud Documents, where it reports a variety of inconsistencies, such as incorrect formatting and misplaced text fragments. Moreover, we find that the two-phase approach runs 10x faster compared to exhaustive exploration, making systematic analysis practically applicable.
(Joint work with Marina Billes and Anders Møller. Paper at PLDI'17.)
Speaker Bio Michael Pradel is an assistant professor at TU Darmstadt, which he joined after a PhD at ETH Zurich and a post-doc at UC Berkeley. His research interests span software engineering and programming languages, with a focus on tools and techniques for building reliable, efficient, and secure software. In particular, he is interested in dynamic program analysis, test generation, concurrency, performance profiling, and JavaScript-based web applications.
Date Monday, March 6 2017, 2:00pm - 3:00pm
Place Gates 415
Speaker Thomas Ball, Principal Researcher and Research Manager, Microsoft Research
Title Physical Computing for Everyone
Abstract Thanks to Moore’s Law, embeddable microcontroller-based devices continue to get cheaper, faster, and include more integrated sensors and networking options. In 2016, the BBC and a host of technical partners, including Microsoft, delivered such a physical computing device, the micro:bit, to every 5th grader in the UK. Microsoft Research helped to make the micro:bit easy to program for novices. The non-profit Micro:bit Education Foundation (microbit.org), of which Microsoft is a founding partner, was recently created to take the micro:bit global. Over the last year, Microsoft has invested in a new web-based programming platform for physical computing, called PXT, with the micro:bit being the first target (pxt.microbit.org).
In this talk, I’ll describe the design and implementation of PXT, focusing specifically on its web-based approach to physical computing. PXT supports rapid script development and testing within the confines of a modern web browser, via a novel combination of Blockly, TypeScript and hardware simulation. A browser-based compilation toolchain targets both the Thumb and AVR instruction sets and links against pre-compiled C++ code. PXT uses a bespoke C++ runtime from Lancaster University that provides a set of useful abstractions, including events, a message bus, and fibers.
Speaker Bio Thomas (Tom) Ball is a principal researcher and manager at Microsoft Research. In 1999, Tom initiated the SLAM software model-checking project with Sriram Rajamani. This led to the creation of the Static Driver Verifier tool for finding defects in Windows device drivers. Tom is a 2011 ACM Fellow for “contributions to software analysis and defect detection.” As a manager, he has nurtured research areas such as automated theorem proving, program testing/verification, and empirical software engineering. His current focus is CS education and the PXT platform for physical computing.
Date Wednesday, September 21, 4:00pm - 5:00pm
Place Gates 104
Speaker Tom Henzinger, IST Austria
Title The Quest for Average Response Time
Abstract Responsiveness, the requirement that every request to a system be eventually handled, is one of the fundamental liveness properties of a reactive system and lies at the heart of all methods for specifying and verifying liveness. Average response time is a quantitative measure for the responsiveness requirement used commonly in performance evaluation. The static computation of average response time has proved remarkably elusive even for finite-state models of reactive systems. We present, for the first time, a robust formalism that allows the specification and computation of quantitative temporal properties including average response time. The formalism is based on nested weighted automata, which can serve as monitors for measuring the response time of a reactive system. We show that the average response time can be computed in exponential space for nondeterministic finite-state models of reactive systems and in polynomial time for probabilistic finite-state models.
This work is joint with Krishnendu Chatterjee and Jan Otop.
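To make the quantity in this abstract concrete, the following is a minimal sketch of a monitor that measures average response time over one finite trace. It is not the nested weighted automata construction from the talk; the event names "req" and "grant", the function name, and the convention that a grant answers every pending request are assumptions made only for illustration.

```python
from typing import List, Optional


def average_response_time(trace: List[str]) -> Optional[float]:
    """Average number of steps between each request and the next grant.

    Assumption: a "grant" event answers all requests that are pending.
    Requests that are never granted are left out of the average.
    """
    pending: List[int] = []   # positions of requests still waiting
    total_wait = 0            # sum of waiting times of answered requests
    answered = 0              # number of answered requests

    for position, event in enumerate(trace):
        if event == "req":
            pending.append(position)
        elif event == "grant":
            total_wait += sum(position - start for start in pending)
            answered += len(pending)
            pending = []

    return total_wait / answered if answered else None


trace = ["req", "idle", "grant", "req", "req", "idle", "idle", "grant"]
print(average_response_time(trace))  # waits of 2, 4 and 3 steps give 3.0
```

A runtime monitor like this only observes a single execution; the talk concerns computing the same measure statically over all executions of a finite-state model.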
Date Thursday, January 21, 3:00pm - 4:00pm
Place Gates 415
Speaker Julian Shun, UC Berkeley
Title A Framework for Processing Large Graphs in Shared Memory
Abstract In this talk, I will discuss Ligra, a shared-memory graph processing framework that has two very simple routines, one for mapping over edges and one for mapping over vertices. The routines can be applied to any subset of the vertices, which makes the framework useful for many graph traversal algorithms that operate on subsets of the vertices. Based on recent ideas used in a very fast algorithm for breadth-first search, the routines automatically adapt to the density of vertex sets. Ligra is able to express a broad class of graph algorithms including BFS, betweenness centrality, eccentricity estimation, connectivity, PageRank, and single-source shortest paths. The algorithms expressed using this framework are very simple and concise, and perform almost as well as highly optimized code. Furthermore, they get good speedups on a modern 40-core machine and are sometimes much more efficient than previously reported results using graph frameworks on machines with many more cores. I will also discuss Ligra+, an extension of Ligra that uses graph compression to reduce space usage and improve parallel performance.
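The two routines described in this abstract can be sketched roughly as follows. This is a sequential Python illustration, not Ligra's actual parallel C++ interface: the names edge_map, vertex_map and bfs, the dense_threshold parameter, and the cutoff of one twentieth of the edges are assumptions. The graph is assumed to be undirected, so a vertex's adjacency list also serves as its list of in-neighbours.

```python
from typing import Callable, Dict, List, Optional, Set

Graph = Dict[int, List[int]]  # adjacency lists of an undirected graph


def edge_map(graph: Graph, frontier: Set[int],
             update: Callable[[int, int], bool],
             cond: Callable[[int], bool],
             dense_threshold: Optional[int] = None) -> Set[int]:
    """Apply update(u, v) to edges (u, v) with u in frontier and cond(v) true.

    Returns the set of targets v for which update returned True.
    """
    if dense_threshold is None:
        num_edges = sum(len(nbrs) for nbrs in graph.values())
        dense_threshold = num_edges // 20  # hypothetical cutoff
    work = len(frontier) + sum(len(graph[u]) for u in frontier)
    out: Set[int] = set()
    if work > dense_threshold:
        # Dense frontier: every eligible vertex scans its in-neighbours.
        for v in graph:
            if not cond(v):
                continue
            for u in graph[v]:
                if u in frontier and update(u, v):
                    out.add(v)
                if not cond(v):
                    break  # v no longer needs updates
    else:
        # Sparse frontier: every frontier vertex scans its out-edges.
        for u in frontier:
            for v in graph[u]:
                if cond(v) and update(u, v):
                    out.add(v)
    return out


def vertex_map(frontier: Set[int], f: Callable[[int], bool]) -> Set[int]:
    """Apply f to each vertex in frontier and keep those where it is True."""
    return {v for v in frontier if f(v)}


def bfs(graph: Graph, root: int,
        dense_threshold: Optional[int] = None) -> Dict[int, Optional[int]]:
    """Breadth-first search that returns a parent pointer for each vertex."""
    parent: Dict[int, Optional[int]] = {v: None for v in graph}
    parent[root] = root

    def update(u: int, v: int) -> bool:
        if parent[v] is None:
            parent[v] = u
            return True
        return False

    def cond(v: int) -> bool:
        return parent[v] is None

    frontier = {root}
    while frontier:
        frontier = edge_map(graph, frontier, update, cond, dense_threshold)
    return parent
```

For example, bfs({0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2], 4: []}, 0) assigns vertex 0 as the parent of vertices 1 and 2 and leaves the unreachable vertex 4 without a parent. The switch between the two branches is the part that adapts to the density of the vertex set.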
Speaker Bio Julian Shun (http://www.eecs.berkeley.edu/~jshun) is currently a Miller postdoc at UC Berkeley. He obtained his Ph.D. in Computer Science from Carnegie Mellon University, and his undergraduate degree in Computer Science from UC Berkeley. He is interested in developing large-scale parallel algorithms for graph processing, and parallel text algorithms and data structures. He is also interested in designing methods for writing deterministic parallel programs and benchmarking parallel programs.
Date Tuesday, September 22, 11:00am-12:00pm
Place Gates 415
Speaker Nate Foster, Cornell University
Title NetKAT: Semantic Foundations for Networks
Abstract Formal specification and verification of computer networks has become a reality in recent years, with the emergence of domain-specific programming languages and automated verification tools. But the design of these languages and tools has been largely ad hoc, driven more by the needs of applications and the capabilities of hardware than by any foundational principles. This talk will present NetKAT, a language for programming networks based on a well-studied mathematical foundation, Kleene Algebra with Tests (KAT). The first part of the talk will describe the design of the language, including primitives for filtering, modifying, and forwarding packets; operators for combining programs in parallel and in sequence; and a Kleene star operator for iterating programs. The next part of the talk will explore the semantic underpinnings of the language, developing an equational deductive reasoning system as well as a formal connection to finite automata. The third part of the talk will show how NetKAT can be compiled to low-level configurations for network devices. Lastly, I will discuss a recent extension of NetKAT with new features designed to support probabilistic programming.
NetKAT is joint work with colleagues at Cornell, Facebook, Inhabited Type, Princeton, Samsung, UCL, and UMass Amherst.
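The primitives and operators listed in the abstract can be illustrated with a small sketch in which a policy is a function from one packet to a set of packets. This simplifies the language considerably (the published semantics works over packet histories, and this is not the NetKAT implementation); all names below, including pkt, filter_eq, modify, union, seq and star, are hypothetical.

```python
from typing import Callable, FrozenSet, Tuple

Packet = FrozenSet[Tuple[str, int]]             # a packet as a set of (field, value)
Policy = Callable[[Packet], FrozenSet[Packet]]  # a policy maps a packet to packets


def pkt(**fields: int) -> Packet:
    """Build a packet from keyword arguments, e.g. pkt(sw=1, port=2)."""
    return frozenset(fields.items())


def filter_eq(field: str, value: int) -> Policy:
    """Filtering: keep the packet if field == value, otherwise drop it."""
    return lambda p: frozenset({p}) if (field, value) in p else frozenset()


def modify(field: str, value: int) -> Policy:
    """Modification: overwrite one header field."""
    def run(p: Packet) -> FrozenSet[Packet]:
        rest = {(f, v) for f, v in p if f != field}
        return frozenset({frozenset(rest | {(field, value)})})
    return run


def union(p: Policy, q: Policy) -> Policy:
    """Parallel composition: the outputs of both policies."""
    return lambda packet: p(packet) | q(packet)


def seq(p: Policy, q: Policy) -> Policy:
    """Sequential composition: feed every output of p into q."""
    return lambda packet: frozenset(out for mid in p(packet) for out in q(mid))


def star(p: Policy) -> Policy:
    """Kleene star: apply p zero or more times (assumes finitely many packets)."""
    def run(packet: Packet) -> FrozenSet[Packet]:
        seen = {packet}
        frontier = {packet}
        while frontier:
            frontier = {out for mid in frontier for out in p(mid)} - seen
            seen |= frontier
        return frozenset(seen)
    return run


# Forward port 1 to port 2 and port 2 to port 3, repeated any number of times.
hop = union(seq(filter_eq("port", 1), modify("port", 2)),
            seq(filter_eq("port", 2), modify("port", 3)))
reachable = star(hop)(pkt(port=1))
print(sorted(dict(p)["port"] for p in reachable))  # [1, 2, 3]
```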
Speaker Bio Nate Foster is an Assistant Professor of Computer Science at Cornell University. The goal of his research is developing programming languages and tools for building reliable systems. He received a PhD in Computer Science from the University of Pennsylvania in 2009, an MPhil in History and Philosophy of Science from Cambridge University in 2008, and a BA in Computer Science from Williams College in 2001. His awards include a Sloan Research Fellowship, an NSF CAREER Award, a Most Influential POPL Paper Award, a Tien '72 Teaching Award, a Google Research Award, a Yahoo! Academic Career Enhancement Award, and the Morris and Dorothy Rubinoff Award.
Date Friday, June 19, 11:00am-12:00pm
Place Gates 415
Speaker Martin Vechev, ETH Zurich
Title Machine Learning for Programming
Abstract The increased availability of massive codebases (“Big Code”) creates an exciting opportunity for new kinds of programming tools based on probabilistic models. Enabled by these models, tomorrow’s tools will provide probabilistically likely solutions to programming tasks that are difficult or impossible to solve with traditional techniques. I will present a new approach for building such tools based on structured prediction with graphical models, and in particular, conditional random fields. These are powerful machine learning techniques popular in computer vision – by connecting these techniques to programs, our work enables new applications not previously possible.
As an example, I will discuss JSNice (http://jsnice.org), a system that automatically de-minifies JavaScript programs by predicting statistically likely variable names and types. Since its release a year ago, JSNice has become a popular tool in the JavaScript community and is regularly used by thousands of developers worldwide.
Speaker Bio Martin Vechev (http://www.srl.inf.ethz.ch/vechev.php) is a tenure-track assistant professor at the Department of Computer Science, ETH Zurich. Previously, he was a Research Staff Member at the IBM T.J. Watson Research Center, New York (2007-2011). He obtained his PhD from Cambridge University in 2008. His research interests are in program analysis, program synthesis, application of machine learning to programming languages, and concurrency.
Date Thursday, June 11, 2:00pm-3:00pm
Place Gates 104
Speaker Ofri Ziv
Title Composing Concurrency Control
Abstract Concurrency control poses significant challenges when composing computations over multiple data-structures (objects) with different concurrency-control implementations. We formalize the usually desired requirements (serializability, abort-safety, deadlock-safety, and opacity) as well as stronger versions of these properties that enable composition. We show how to compose synchronization protocols in a way which preserves these properties. Our approach generalizes well-known synchronization protocols (such as two-phase-locking and two-phase-commit) and leads to new synchronization protocols. We apply this theory to show how we can safely compose optimistic and pessimistic concurrency control and demonstrate the practical value of such a composition. For example, we show how we can execute a transaction that accesses two objects, one controlled by an STM and another by locking.
Date Thursday, June 4, 1:00pm-2:00pm
Place Gates 104
Speaker Mayur Naik, Georgia Tech
Title Petablox: Declarative Program Analysis for Big Code
Abstract Most software development today leverages the world's massive collection of open source software. There is significant room for program analyses to similarly leverage Big Code, the collective knowledge amassed from analyzing existing programs, to automatically infer or predict salient behaviors and vulnerabilities in new programs. We present Petablox, a framework for automatically synthesizing use-cases of arbitrary declarative program analyses for Big Code tasks such as efficiently finding good abstractions, transferring analysis results across programs, and adapting analyses to user feedback. Despite their diversity, all these tasks entail solving large instances of MaxSAT, the maximum satisfiability problem which comprises a mix of hard (inviolable, logical) constraints and soft (violable, probabilistic) constraints. We describe demand-driven, compositional, and learning-based MaxSAT optimizations in Petablox for scaling these tasks to large code bases.
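As a small illustration of the MaxSAT formulation mentioned in the abstract, the sketch below solves a tiny instance by brute force: hard clauses must hold, and the total weight of satisfied soft clauses is maximized. It is not how Petablox solves these problems (exhaustive enumeration cannot scale), and the variable names, weights and the solve_maxsat function are hypothetical.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple

Literal = Tuple[str, bool]   # (variable, required truth value)
Clause = List[Literal]       # a disjunction of literals


def holds(clause: Clause, assignment: Dict[str, bool]) -> bool:
    """A clause holds if at least one of its literals is satisfied."""
    return any(assignment[var] == polarity for var, polarity in clause)


def solve_maxsat(variables: List[str], hard: List[Clause],
                 soft: List[Tuple[int, Clause]]
                 ) -> Optional[Tuple[int, Dict[str, bool]]]:
    """Return (best weight, assignment), or None if the hard clauses conflict."""
    best: Optional[Tuple[int, Dict[str, bool]]] = None
    for bits in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, bits))
        if not all(holds(clause, assignment) for clause in hard):
            continue  # hard constraints are inviolable
        weight = sum(w for w, clause in soft if holds(clause, assignment))
        if best is None or weight > best[0]:
            best = (weight, assignment)
    return best


# Hard rule: alias implies race. Soft: prefer alias (weight 3), prefer no race (weight 2).
result = solve_maxsat(
    ["alias", "race"],
    hard=[[("alias", False), ("race", True)]],
    soft=[(3, [("alias", True)]), (2, [("race", False)])],
)
print(result)  # (3, {'alias': True, 'race': True})
```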
Speaker Bio Mayur Naik has been an Assistant Professor in Computer Science at Georgia Tech since 2011. His research interests are in programming languages and software engineering, with a current emphasis on program analysis techniques and systems for improving software quality and programmer productivity on modern computing platforms such as parallel, mobile, and cloud computing. He holds a Ph.D. in Computer Science from Stanford University (2007) and was a research scientist at Intel Labs, Berkeley from 2008 to 2011.
Date Friday, May 8, 11:00-12:00
Place Gates 415
Speaker Philip Wadler, University of Edinburgh
Title A Practical Theory of Language-Integrated Query
Abstract Language-integrated query is receiving renewed attention, in part because of its support through Microsoft's LINQ framework. We present a theory of language-integrated query based on quotation and normalisation of quoted terms. Our technique supports abstraction over values and predicates, composition of queries, dynamic generation of queries, and queries with nested intermediate data. Higher-order features prove useful even for constructing first-order queries. We prove that normalisation always succeeds in translating any query of flat relation type to SQL. We present experimental results confirming our technique works, even in situations where Microsoft's LINQ framework either fails to produce an SQL query or, in one case, produces an avalanche of SQL queries.
Date Monday, March 23, 2:00-3:00
Place Gates 415
Speaker Per Stenström, Chalmers University of Technology, Sweden
Title Efficient Statistical-based Cache Compression
Abstract Low utilization of on-chip cache capacity limits performance and wastes energy because of the long latency, the limited bandwidth, and the energy consumption associated with off-chip memory accesses. Value replication, in which the same value appears in multiple memory locations, is an important source of low capacity utilization. While cache compression techniques in the past manage to code frequent values densely, they trade off a high compression ratio for low decompression latency, thus missing opportunities to utilize on-chip cache capacity more effectively. This talk presents, for the first time, a detailed design-space exploration of statistical-based cache compression. We show that more aggressive, statistical-based compression approaches, such as Huffman, that have been excluded in the past due to the processing overhead for compression and decompression, are prime candidates for cache and memory compression. We first find that the overhead of statistics acquisition to generate new codewords is low because value locality varies little over time and across applications so new encodings need to be generated rarely, making it possible to off-load it to software routines. We then show that the high compression ratio obtained by Huffman-based cache compression makes it possible to enjoy the performance benefits of 4X larger last-level caches at a power consumption that is about 50% lower than that of 4X larger caches.
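The idea of coding frequent values densely can be illustrated with a textbook Huffman construction over the words of one cache line. This is a software sketch only: the hardware design from the talk is not reproduced here, and the 32-bit word size, the example values and the huffman_code function are assumptions.

```python
import heapq
from collections import Counter
from itertools import count
from typing import Dict, List


def huffman_code(values: List[int]) -> Dict[int, str]:
    """Build a prefix-free bit string for each distinct value in values."""
    freq = Counter(values)
    if len(freq) == 1:
        return {next(iter(freq)): "0"}  # a single symbol still needs one bit
    tie = count()  # unique tie-breaker so the heap never compares dicts
    heap = [(n, next(tie), {v: ""}) for v, n in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        n1, _, codes1 = heapq.heappop(heap)  # two least frequent subtrees
        n2, _, codes2 = heapq.heappop(heap)
        merged = {v: "0" + c for v, c in codes1.items()}
        merged.update({v: "1" + c for v, c in codes2.items()})
        heapq.heappush(heap, (n1 + n2, next(tie), merged))
    return heap[0][2]


# A hypothetical cache line of eleven 32-bit words with heavy value replication.
line = [0] * 6 + [0xFFFFFFFF] * 3 + [7, 42]
code = huffman_code(line)
compressed_bits = sum(len(code[v]) for v in line)
print(compressed_bits, "bits instead of", 32 * len(line))  # 18 bits instead of 352
```

The count ignores the space needed to store the code table itself; the abstract's observation that encodings rarely need to be regenerated is what would let one table be shared across many lines.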
Date Friday, February 27, 3:00-4:00
Place Gates 104
Speaker Armando Solar-Lezama, MIT
Title Making synthesis practical. Are we there yet?
Abstract In this talk, I will describe recent advances in our ability to synthesize programs that satisfy a specification, and some of the new applications that have been enabled by these advances. The first part of the talk will focus on the Sketch synthesis system and the algorithms that allow it to scale to challenging synthesis problems. The second part of the talk will focus on applications of synthesis to program optimization, automated tutoring and the development of the synthesizer itself.
Speaker Bio Armando Solar-Lezama is an associate professor without tenure at MIT where he leads the Computer Aided Programming Group. His research interests include software synthesis and its applications, as well as high-performance computing, information flow security and probabilistic programming.
Date Tuesday, July 22, 3:00-4:00
Place Gates 104
Speaker Martin Monperrus, University of Lille
Title On the Search Space of Automatic Software Repair
Abstract Automatic software repair consists of fixing software bugs automatically. A fix is a small point in a huge space of all possible program modifications. In this talk, I will talk about the topology of this search space, and how one can exploit it to speed up automatic software repair. The talk is based on content from:
  • Mining Software Repair Models for Reasoning on the Search Space of Automated Program Fixing (Matias Martinez, Martin Monperrus), In Empirical Software Engineering, Springer, 2013.
  • Do the Fix Ingredients Already Exist? An Empirical Inquiry into the Redundancy Assumptions of Program Repair Approaches (Matias Martinez, Westley Weimer, Martin Monperrus), In Proceedings of the International Conference on Software Engineering, 2014.
  • A Critical Review of "Automatic Patch Generation Learned from Human-Written Patches": Essay on the Problem Statement and the Evaluation of Automatic Software Repair (Martin Monperrus), In Proceedings of the International Conference on Software Engineering, 2014.
Speaker Bio The speaker is an Associate Professor at the University of Lille (France) and a member of INRIA's research group SPIRALS. In 2008-2011, he was a research associate at the Darmstadt University of Technology (Germany), and received his Ph.D. from the University of Rennes (France) in 2008.
Date Monday, July 21, 3:00-4:00
Place Gates 104
Speaker Emina Torlak, UC Berkeley
Title Programming for Everyone: From Solvers to Solver-Aided Languages and Beyond
Abstract In this talk, I will present a new approach to constructing programs, which exploits advances in constraint solving to make programming easier for experts and more accessible to everyone else. The approach is based on two observations. First, much of everyday programming involves the use of domain-specific languages (DSLs) that are embedded, in the form of APIs and interpreters, into modern host languages (for example, JavaScript, Scala or Racket). Second, productivity tools based on constraint solvers (such as verification or synthesis) work best when specialized to a given domain. Rosette is a new kind of host language, designed for easy creation of DSLs that are equipped with solver-based tools. These Solver-Aided DSLs (SDSLs) use Rosette's symbolic virtual machine (SVM) to automate hard programming tasks, including verification, debugging, synthesis, and programming with angelic oracles. The SVM works by compiling SDSL programs to logical constraints understood by SMT solvers, and then translating the solver's output to counterexamples (in the case of verification), traces (in the case of angelic execution), or code snippets (in the case of synthesis and debugging). Rosette has hosted several new SDSLs, including imperative SDSLs for data-parallel and spatial programming; a functional SDSL for specifying executable semantics of secure stack machines; and a declarative SDSL for web scraping by example.
Speaker Bio
Date Thursday, May 29, 2:00-3:00
Place Gates 104
Speaker Atif Memon, University of Maryland
Title The First Decade of GUI Ripping: Extensions, Applications, and Broader Impacts
Abstract This seminar provides a retrospective examination of GUI Ripping---reverse engineering a workflow model of the graphical user interface of a software application---born a decade ago out of recognition of the severe need for improving the then largely manual state-of-the-practice of functional GUI testing. In these last 10 years, GUI ripping has turned out to be an enabler for much research, both within our group at Maryland and other groups. Researchers have found new and unique applications of GUI ripping, ranging from measuring human performance to re-engineering legacy user interfaces. GUI ripping has also enabled large-scale experimentation involving millions of test cases, thereby helping to understand the nature of GUI faults and characteristics of test cases to detect them. It has resulted in large multi-institutional Government-sponsored research projects on test automation and benchmarking. GUI ripping tools have been ported to many platforms, including Java AWT and Swing, iOS, Android, UNO, Microsoft Windows, and web. In essence, the technology has transformed the way researchers and practitioners think about the nature of GUI testing, no longer considered a manual activity; rather, thanks largely to GUI Ripping, automation has become the primary focus of current GUI testing techniques.
Speaker Bio Atif M. Memon is an Associate Professor at the Department of Computer Science, University of Maryland, where he founded and heads the Event Driven Software Lab (EDSL). Researchers at the EDSL study issues of design, development, quality assurance, and maintenance of such software applications. He designed and developed the model-based GUI testing software GUITAR, which operates on Android, iPhone, Java Swing, .NET, Java SWT, UNO, MS Windows, and web systems, and leverages a resource cloud for test automation. He has published over 100 research articles on the topic of event driven systems, software testing, and software engineering. He is the founder of the International Workshop on TESTing Techniques & Experimentation Benchmarks for Event-Driven Software (TESTBEDS). He also helped develop the workshop on Experimental Evaluation of Software and Systems in Computer Science (EVALUATE).

He is the Serial Editor of Advances in Computers, published by Elsevier. This series, since its first volume in 1960 and now the oldest series still being published, covers new developments in computer technology. He is an elected member of the Steering Committee of the International Conference on Software Testing, Verification and Validation (ICST), the largest conference on software testing. He serves on various editorial boards, including that of the Journal of Software Testing, Verification, and Reliability. He has served on numerous National Science Foundation panels and program committees, including ICSE, FSE, ICST, WWW, ASE, ICSM, and WCRE. He is currently serving on a National Academy of Sciences panel as an expert in the area of Computer Science and Information Technology, for the Pakistan-U.S. Science and Technology Cooperative Program, sponsored by United States Agency for International Development (USAID).

He has a joint appointment in the University's Institute for Advanced Computer Studies (UMIACS). He received his Ph.D. from the University of Pittsburgh in Computer Science in 2001, and started at the University of Maryland soon thereafter. In 2007 he was a visiting scholar at the Institute of Software at the Chinese Academy of Sciences, and a visiting researcher at Tata Research Development and Design Centre.

Date Thursday, May 15, 4:00-5:00
Place Gates 104
Speaker Adam Chlipala, MIT
Title Bedrock: A Platform for Practical Proof-Carrying Code with Rich Policies
Abstract Today more than ever, the computing ecosystem is full of examples of execution of untrusted code. From mobile applications, to JavaScript-based Web applications in browsers, to virtual-machine images run by cloud providers, users from consumers to hosting providers are running programs produced by untrusted third parties. It is essential to enforce security policies on these pieces of untrusted code, but the common techniques of today are often too weak, and they often impose high run-time costs in power and other resources, which matter especially for resource-constrained mobile devices. One alternative paradigm is proof-carrying code (PCC), where developers distribute software with machine-checkable proofs of adherence to policies. In this talk, I will give an overview of Bedrock, a platform to make PCC more practical along a number of dimensions.

Ideally we want to apply PCC with rich behavioral policies, but, without careful tool design, it can be intractable to build rigorous conformance proofs for realistic programs. Past PCC frameworks supporting rich policies have required on the order of 100 lines of manual proof per 1 line of assembly code. Bedrock, a library for the Coq proof assistant, brings this ratio closer to 1-to-1. We can do mostly automated assembly-level proofs of functional correctness for tricky program modules that mix features like heap-allocated data structures, function pointers, and concurrency. In this domain, the challenge is not just the algorithmic design of verification tools, but also the generation of machine-checkable proofs from first principles. I will focus on three broad challenges that are important to scale up the approach: "horizontal" modularity in the form of program decomposition into modules, "vertical" modularity via verified compilers, and effective proof automation for individual program modules written in particular languages.

Speaker Bio Adam Chlipala is an assistant professor in computer science at MIT. He completed his PhD in CS at Berkeley in 2007 and his BS in CS at CMU in 2003. His research interests, broadly speaking, are in applied logic for software development tools. Much of his group's work uses the Coq theorem-proving software, about which he has written a popular book "Certified Programming with Dependent Types." Current projects in theorem-proving deal with verification of compilers and low-level imperative code. He also designed the Ur/Web domain-specific programming language for modern Web applications, which applies advanced type system ideas in a form practical enough to be used by several production Web applications.
Date Friday March 7, 2:00-3:00
Place Gates 392
Speaker Mooly Sagiv, Tel Aviv University
Title VeriCon: Towards Verifying Controller Programs in Software-Defined Networks
Abstract Software-defined networking (SDN) is a new paradigm for operating and managing computer networks. SDN enables logically-centralized control over network devices through a software "controller" program that operates independently from the network hardware.

We present VeriCon, the first system for verifying network-wide invariants of SDN controller programs. VeriCon uses first-order logic to define admissible network topologies and desired network-wide invariants (e.g., routing correctness, correct access control, and consistency of the controller's data structures).

VeriCon either confirms the correctness of the controller program on ALL admissible network topologies or outputs a concrete example that violates an invariant, and so can be used for debugging controller code.

We show that VeriCon, which implements classical Floyd-Hoare-Dijkstra deductive verification, is practical for a large repertoire of controller programs. In addition, as VeriCon is compositional, in the sense that it checks the correctness of each network event independently against the specified invariants, it can scale to handle complex systems.

We view VeriCon as a first step en route to practical mechanisms for verifying network-wide invariants of controller code. This is joint work with Thomas Ball and Nikolaj Bjorner (MSR), Aaron Gember (Wisc), Shachar Itzhaky (TAU), Aleksandr Karbyshev (TUM), Michael Schapira and Asaf Valdarsky (HUJI).

Speaker Bio
Date Friday Jan 17, 3-4pm
Place Gates 498
Speaker Stavros Aronis, Uppsala University
Title Optimal Dynamic Partial Order Reduction
Abstract Stateless model checking is a powerful technique for program verification, which however suffers from an exponential growth in the number of explored executions. A successful technique for reducing this number, while still maintaining complete coverage, is Dynamic Partial Order Reduction (DPOR). We present a new DPOR algorithm, which is the first to be provably optimal in that it always explores the minimal number of executions. It is based on a novel class of sets, called source sets, which replace the role of persistent sets in previous algorithms. First, we show how to modify an existing DPOR algorithm to work with source sets, resulting in an efficient and simple-to-implement algorithm. Second, we extend this algorithm with a novel mechanism, called wakeup trees, that makes it possible to achieve optimality. We have implemented both algorithms in a stateless model checking tool for Erlang programs. Experiments show that source sets significantly increase the performance and that wakeup trees incur only a small overhead in both time and space.
Speaker Bio
Date 3:00-4:00, Monday, November 18
Place Gates 415
Speaker Junfeng Yang, Columbia University
Title Determinism Is Not Enough: Making Parallel Programs Reliable with Stable Multithreading
Abstract Our accelerating computational demand and the rise of multicore hardware have made parallel programs, especially shared-memory multithreaded programs, increasingly pervasive and critical. Yet, these programs remain extremely difficult to write, test, analyze, debug, and verify. Conventional wisdom has attributed these difficulties to nondeterminism (i.e., repeated executions of the same program on the same input may show different behaviors), and researchers have recently dedicated much effort to bringing determinism into multithreading. In this talk, I argue that determinism is not as useful as commonly perceived: it is neither sufficient nor necessary for reliability. We present our view on why multithreaded programs are difficult to get right, describe a promising approach we call stable multithreading to dramatically improve reliability, and summarize our last four years of research on building and applying stable multithreading systems. (More details are at http://www.cs.columbia.edu/~junfeng/.)
Speaker Bio
Date 3:00-4:00, Monday, December 2
Place Gates 415
Speaker Todd Millstein, UCLA
Title Toward a "Safe" Semantics for Multithreaded Programming Languages
Abstract "Safe" programming languages enforce fundamental language abstractions, thereby providing strong guarantees for all program executions and obviating large classes of subtle and dangerous errors. Modern languages have embraced the compelling programmability benefits of (memory and type) safety despite the additional run-time overhead. Unfortunately, recent work to standardize multithreading semantics in mainstream programming languages is reversing this trend. While a significant improvement over prior informally-specified semantics, the current standards allow a small programming error or omission to violate program safety in ways that are difficult to understand, detect, and correct.

In this talk I will argue that a safe multithreaded programming language should support the simple interleaving semantics of threads known as sequential consistency (SC). I'll then describe the results of our research over the past few years, which challenges the idea that the SC semantics is inconsistent with high performance. Perhaps surprisingly, restricting a modern compiler's optimizations to respect SC introduces minimal runtime overhead. While modern hardware relies upon important optimizations that potentially violate SC, a small extension to such hardware can preserve the SC semantics while retaining the lion's share of the benefit of these optimizations. Further, various factors will conspire to lower the cost of SC on stock hardware in the coming years.

This work is joint with Dan Marino (Symantec Research Labs, formerly UCLA), Abhay Singh (U Michigan, Ann Arbor), Madan Musuvathi (Microsoft Research, Redmond), and Satish Narayanasamy (U Michigan, Ann Arbor).

Speaker Bio Todd Millstein is an Associate Professor in the Computer Science Department at the University of California, Los Angeles. Todd received his Ph.D. and M.S. from the University of Washington and his A.B. from Brown University, all in Computer Science. Todd received an NSF CAREER award in 2006, an IBM Faculty Award in 2008, an ACM SIGPLAN Most Influential PLDI Paper Award in 2011, and an IEEE Micro Top Picks selection in 2012.
Date 2:00-3:00, Monday, September 16
Place 104 Gates
Speaker Mohsen Lesani, UCLA
Title On Testing and Verification of Transactional Memory Algorithms
Abstract A transactional memory (TM) is an object composed of a set of base objects such as registers and locks. The safety of the TM algorithm is the result of the safety of the composed base objects and the logic of the algorithm. We define a language that captures the type of the base objects and the algorithm. We define a history semantics for the language that characterizes the set of histories that a program can result in. Based on the history semantics, we propose techniques for both testing and verification of TM algorithms. First, we identify two problems that lead to violation of opacity, a safety criterion for TM. We present an automatic tool that, given a violating history, finds program traces that result in that history. We show that the well-known TM algorithms DSTM and McRT don't satisfy opacity. DSTM suffers from a write-skew anomaly, while McRT suffers from a write-exposure anomaly. Second, we present a program logic with novel propositions about execution order and linearization orders of base objects. We prove that our inference rules are sound, i.e., if we can derive that a program satisfies a property, then every history of the program satisfies the property. Our logic is composable as it can be augmented with new inference rules to support reasoning about new object types. We have used our logic to prove that the TL2 TM algorithm satisfies opacity. We are formalizing our logic and proofs in PVS.
Speaker Bio Mohsen Lesani is a PhD candidate at UCLA advised by Professor Palsberg. He has research experience with IBM Research, Oracle (Sun) Labs, HP Labs and EPFL. He is interested in the design, implementation, testing and verification of synchronization algorithms.
Date 4:00-5:00, Tuesday, May 28
Place 415 Gates
Speaker Vijay Ganesh
Title SMT Solvers for Software Reliability and Security
Abstract SMT solvers increasingly play a central role in the construction of reliable and secure software, regardless of whether such reliability/security is ensured through formal methods, program analysis or testing. This dramatic influence of SMT solvers on software engineering as a discipline is a recent phenomenon, largely attributable to impressive gains in solver efficiency and expressive power.

In my talk, I will motivate the need for SMT solvers, sketch out their research story thus far, and then describe my contributions to solver research. Specifically, I will talk about two SMT solvers that I designed and implemented, namely, STP and HAMPI, currently being used in 100+ research projects. I will talk about real-world applications enabled by my solvers, and the techniques I developed that helped make them efficient.

Time permitting, I will also talk about some theoretical results in the context of SMT solving.

Speaker Bio Vijay Ganesh has been an assistant professor at the University of Waterloo, Canada, since September 2012. Prior to that he was a Research Scientist at MIT, and completed his PhD in computer science at Stanford University.
Date 2:00-3:00, Monday, May 20
Place Gates 104
Speaker Cristian Cadar, Imperial College London
Title Safe Software Updates via Multi-version Execution
Abstract Software systems are constantly evolving, with new versions and patches being released on a continuous basis. Unfortunately, software updates present a high risk, with many releases introducing new bugs and security vulnerabilities. We tackle this problem using a simple but effective multi-version based approach. Whenever a new update becomes available, instead of upgrading the software to the new version, we run the new version in parallel with the old one; by carefully coordinating their executions and selecting the behavior of the more reliable version when they diverge, we create a more secure and dependable multi-version application. We implemented this technique in Mx, a system targeting Linux applications running on multi-core processors, and show that it can be applied successfully to several real applications, such as Lighttpd and Redis.
Date 3:00-4:00, Monday, May 6
Place Gates 392
Speaker Rupak Majumdar, Max Planck Institute
Title Static Provenance Verification for Message-Passing Programs
Abstract Provenance information records the source and ownership history of an object. We study the problem of static provenance tracking in concurrent programs in which several principals execute concurrent processes and exchange messages over unbounded but unordered channels. The provenance of a message, roughly, is a function of the sequence of principals that have transmitted the message in the past. The provenance verification problem is to statically decide, given a message passing program and a set of allowed provenances, whether the provenance of all messages in all possible program executions belongs to the allowed set. We formalize the provenance verification problem abstractly in terms of well-structured provenance domains, and show a general decidability result for it. In particular, we show that if the provenance of a message is a sequence of principals who have sent the message, and a provenance query asks if the provenance lies in a regular set, the problem is decidable and EXPSPACE-complete. We describe an implementation of our technique to check provenances of messages in Firefox extensions. (Joint work with Roland Meyer and Zilong Wang)
Speaker Bio
Date 2:30-3:30, Thurs., Dec. 6
Place 392 Gates
Speaker Christoph Kirsch, University of Salzburg
Title Distributed Queues: Faster Pools and Better Queues
Abstract Designing and implementing high-performance concurrent data structures whose access performance scales on multicore hardware is difficult. An emerging remedy to scalability issues is to relax the sequential semantics of the data structure and exploit the resulting potential for parallel access in relaxed implementations. However, a major obstacle in the adoption of relaxed implementations is the belief that their behavior becomes unpredictable. We therefore aim at relaxing existing implementations systematically for better scalability and performance without incurring a cost in predictability. We present distributed queues (DQ), a new family of relaxed concurrent queue implementations. DQ implement bounded or unbounded out-of-order relaxed queues with a strict (i.e., linearizable) emptiness check. Our comparison of DQ against existing pool, and strict and relaxed queue implementations reveals that DQ outperform and outscale the state-of-the-art implementations. We also empirically show that the shorter execution time of queue operations of fast but relaxed implementations such as DQ (i.e., the degree of reordering through overlapping operations) may offset the effect of semantic relaxations (i.e., the degree of reordering through relaxation), making them appear to behave as FIFO as, or sometimes even more FIFO than, strict but slow implementations.

This is joint work with A. Haas, T.A. Henzinger, M. Lippautz, H. Payer, A. Sezgin, and A. Sokolova.

Speaker Bio Christoph Kirsch is a full professor and holds a chair at the Department of Computer Sciences of the University of Salzburg, Austria. Since 2008 he has also been a visiting scholar at the Department of Civil and Environmental Engineering of the University of California, Berkeley. He received his Dr.Ing. degree from Saarland University, Saarbruecken, Germany, in 1999 while at the Max Planck Institute for Computer Science. From 1999 to 2004 he worked as Postdoctoral Researcher at the Department of Electrical Engineering and Computer Sciences of the University of California, Berkeley. His research interests are in concurrent programming and systems, virtual execution environments, and embedded software. Dr. Kirsch co-invented the Giotto and HTL languages, and leads the JAviator UAV project for which he received an IBM faculty award in 2007. He co-founded the International Conference on Embedded Software (EMSOFT), was elected ACM SIGBED chair in 2011, and is currently associate editor of ACM TODAES.
Date 2:00-3:00, July 16, 2012
Place Gates 260
Speaker Jesse Tov, Harvard
Title Practical Programming with Substructural Types
Abstract Substructural logics remove from classical logic rules for reordering, duplication, or dropping of assumptions. Because propositions in such a logic may no longer be freely copied or ignored, this suggests understanding propositions in substructural logics as representing resources rather than truth. For the programming language designer, substructural logics thus provide a framework for considering type systems that can track the changing states of logical and physical resources.

While several substructural type systems have been proposed and implemented, many of these have targeted substructural types at a particular purpose, rather than offering them as a general facility. The more general substructural type systems have been theoretical in nature and too unwieldy for practical use. This talk presents the design of a general purpose language with substructural types, and discusses several language design problems that had to be solved in order to make substructural types useful in practice.

Speaker Bio
Date 2:00-3:00, July 18, 2012
Place Gates 260
Speaker Aditya Thakur, U. Wisconsin
Title A Deductive Algorithm for Symbolic Abstraction with Applications to SMT
Abstract This talk presents connections between logic and abstract interpretation. In particular, I will present a new algorithm for the problem of "symbolic abstraction": Given a formula \phi in a logic L and an abstract domain A, the symbolic abstraction of \phi is the best abstract value in A that over-approximates the meaning of \phi. When \phi represents a concrete transformer, algorithms for symbolic abstraction can be used to automatically synthesize the corresponding abstract transformer. Furthermore, if the symbolic abstraction of \phi is bottom, then \phi is proved unsatisfiable.

The bottom line is that our algorithm is "dual-use": (i) it can be used by an abstract interpreter to compute abstract transformers, and (ii) it can be used in an SMT (Satisfiability Modulo Theories) solver to determine whether a formula is satisfiable.

The key insight behind the algorithm is that Stålmarck's method for satisfiability checking of propositional-logic formulas can be explained using concepts from the field of abstract interpretation. This insight then led to the discovery of the connection between Stålmarck's method and symbolic abstraction, and the extension of Stålmarck's method to richer logics, such as quantifier-free linear real arithmetic.

This is joint work with Prof. Thomas Reps.

Speaker Bio
Date 3:00-4:00, May 24, 2012
Place Gates 260
Speaker Nataliya Guts, University of Maryland
Title Polymonads: Reasoning and Inference
Abstract Many useful programming constructions can be expressed as monads. Examples include probabilistic modeling, functional reactive programming, parsing, and information flow tracking, not to mention effectful functionality like state and I/O. In our previous work [SGLH11], we presented a type-based rewriting algorithm to make programming with arbitrary monads as easy as using built-in support for state and I/O. Developers write programs using monadic values of type m t as if they were of type t, and our algorithm inserts the necessary binds, units, and monad-to-monad morphisms so that the program typechecks.

A number of other programming idioms resemble monads but deviate from the standard monad binding mechanism. Examples include parameterized monads, monads for effects, and information flow state tracking. Our present work aims to provide support for formal reasoning and lightweight programming for such constructs. We present a new expressive paradigm, polymonads, including the equivalent of monad and morphism laws. Polymonads subsume conventional monads and all other examples mentioned above. On the practical side, we provide an extension of our type inference rewriting algorithm to support lightweight programming with polymonads.

[SGLH11] N. Swamy, N. Guts, D. Leijen, M. Hicks. Lightweight Monadic Programming in ML. In ICFP, 2011.

Speaker Bio
Date 3:00-4:00, November 7, 2011
Place Gates 260
Speaker Anupam Datta, CMU
Title Policy Auditing over Incomplete Logs
Abstract We present the design, implementation and evaluation of an algorithm that checks audit logs for compliance with privacy and security policies. The algorithm, which we name reduce, addresses two fundamental challenges in compliance checking that arise in practice. First, in order to be applicable to realistic policies, reduce operates on policies expressed in a first-order logic that allows restricted quantification over infinite domains. We build on ideas from logic programming to identify the restricted form of quantified formulas. The resulting logic is more expressive than prior logics used for compliance-checking, including propositional temporal logics and metric first-order temporal logic, and, in contrast to these logics, can express all 84 disclosure-related clauses in the HIPAA Privacy Rule. Second, since audit logs are inherently incomplete (they may not contain sufficient information to determine whether a policy is violated or not), reduce proceeds iteratively: in each iteration, it provably checks as much of the policy as possible over the current log and outputs a residual policy that can only be checked when the log is extended with additional information. We prove correctness, termination, time and space complexity results for reduce. We implement reduce and evaluate it by checking simulated audit logs for compliance with the HIPAA Privacy Rule. Our experimental results demonstrate that the algorithm is fast enough to be used in practice.
Speaker Bio Anupam Datta is an Assistant Research Professor at Carnegie Mellon University where he is affiliated with CyLab, the Electrical and Computer Engineering Department, and (by courtesy) the Computer Science Department. His research focuses on foundations of security and privacy. One area of focus has been on programming language methods for compositional security. His work on Protocol Composition Logic and the Logic of Secure Systems has uncovered new principles for compositional security and has been applied successfully to find attacks in and verify properties of a number of practical cryptographic protocols and secure systems. A second area of focus has been on formalizing and enforcing privacy policies. He has worked on a Logic of Privacy that formalizes concepts from contextual integrity, a philosophical theory of privacy as a right to appropriate flows of personal information. His group has produced the first complete formalization of the HIPAA Privacy Rule using this logic and developed principled audit mechanisms for enforcing policies expressed in the logic.

Dr. Datta has co-authored a book and over 30 publications in conferences and journals on these topics. He serves on the Steering Committee of the IEEE Computer Security Foundations Symposium. He has served as Program Co-Chair of the 2011 Formal Aspects of Security and Trust Workshop and the 2008 Formal and Computational Cryptography Workshop. Dr. Datta obtained MS and PhD degrees from Stanford University and a BTech from IIT Kharagpur, all in Computer Science.

Date 3:00-4:00, Oct 4, 2011
Place 498 Gates
Speaker David Basin, ETH Zurich
Title Policy Monitoring in First-order Temporal Logic
Abstract In security and compliance, it is often necessary to ensure that agents and systems comply with complex policies. An example from financial reporting is the requirement that every transaction t of a customer c, who has within the last 30 days been involved in a suspicious transaction t', must be reported as suspicious within 2 days. We present an approach to monitoring such policies formulated in an expressive fragment of metric first-order temporal logic. We also report on case studies in security and compliance monitoring and use these to evaluate both the suitability of this fragment for expressing complex, realistic policies and the efficiency of our monitoring algorithm.
Speaker Bio David Basin is a full professor and has held the chair for Information Security at the Department of Computer Science, ETH Zurich since 2003. He is also the director of the ZISC, the Zurich Information Security Center.

He received his bachelor's degree in mathematics from Reed College in 1984, his Ph.D. from Cornell University in 1989, and his Habilitation from the University of Saarbrücken in 1996. His appointments include a postdoctoral research position at the University of Edinburgh (1990 - 1991), and afterwards he led a subgroup, within the programming logics research group, at the Max-Planck-Institut für Informatik (1992 - 1997). From 1997 - 2002 he was a full professor at the University of Freiburg where he held the chair for software engineering.

His research focuses on information security, in particular methods and tools for modeling, building, and validating secure and reliable systems.

Date 3:00-4:00 pm, June 10
Place Gates 104
Speaker Koushik Sen, UC Berkeley
Title Specifying and Checking Correctness of Parallel Programs
Abstract The spread of multicore processors and manycore graphics processing units has greatly increased the need for parallel correctness tools. Reasoning about parallel multi-threaded programs is significantly more difficult than for sequential programs due to non-determinism. We believe that the only way to tackle this complexity is to separate reasoning about parallelism correctness (i.e., that a parallel program gives the same outcome despite thread interleavings) from reasoning about functional correctness (i.e., that the program produces the correct outcome on a thread interleaving). In this talk, I will describe two fundamental techniques for separating the parallelization correctness aspect of a program from its functional correctness. The first idea consists of extending programming languages with constructs for writing specifications, called bridge assertions, that focus on relating outcomes of two parallel executions differing only in thread-interleavings. The second idea consists of allowing a programmer to use a non-deterministic sequential program as the specification of a parallel one. For functional correctness, it is then enough to check the sequential program. For parallelization correctness, it is sufficient to check the deterministic behavior of the parallel program with respect to the non-deterministic sequential program. To check parallel correctness, we have developed a new scalable automated method for testing and debugging, called active testing. Active testing combines the power of imprecise program analysis with the precision of software testing to quickly discover concurrency bugs and to reproduce discovered bugs on demand.
Speaker Bio Koushik Sen is an assistant professor in the Department of Electrical Engineering and Computer Sciences at the University of California, Berkeley. His research interest lies in Software Engineering, Programming Languages, and Formal methods. He is interested in developing software tools and methodologies that improve programmer productivity and software quality. He is best known for his work on directed automated random testing and concolic testing. He has received an NSF CAREER Award in 2008, a Haifa Verification Conference (HVC) Award in 2009, an IFIP TC2 Manfred Paul Award for Excellence in Software: Theory and Practice in 2010, and a Sloan Foundation Fellowship in 2011. He has won three ACM SIGSOFT Distinguished Paper Awards. He received the C.L. and Jane W-S. Liu Award in 2004, the C. W. Gear Outstanding Graduate Award in 2005, and the David J. Kuck Outstanding Ph.D. Thesis Award in 2007 from the UIUC Department of Computer Science. He holds a B.Tech from Indian Institute of Technology, Kanpur, and M.S. and Ph.D. in CS from University of Illinois at Urbana-Champaign.
Date 1:00-2:00, Friday June 3
Place 463a Gates
Speaker Andreas Zeller, Saarland University
Title Mining Precise Specifications
Abstract Recent advances in software validation and verification make it possible to widely automate the check whether a specification is satisfied. This progress is hampered, though, by the persistent difficulty of writing specifications. Are we facing a "specification crisis"? By mining specifications from existing systems, we can alleviate this burden, reusing and extending the knowledge of 60 years of programming, and bridging the gap between formal methods and real-world software. But mining specifications has its challenges: We need good usage examples to learn expected behavior; we need to cope with the approximations of static and dynamic analysis; and we need specifications that are readable and relevant to users. In this talk, I present the state of the art in specification mining, its challenges, and its potential, up to a vision of seamless integration of specification and programming.
Speaker Bio Andreas Zeller is a full professor for Software Engineering at Saarland University in Saarbrücken, Germany. His research concerns the analysis of large software systems and their development process; his students are funded by companies like Google, Microsoft, or SAP. In June 2011, Zeller will be inducted as Fellow of the ACM for his contributions to automated debugging and mining software archives.
Date 2:00-3:00, Monday, Feb. 7
Place Gates 104
Speaker Michael Franz, UC Irvine
Title Recent Advances In Compiler Research - Firefox's TraceMonkey and Beyond
Abstract Common to practically all compilers built over the past 50 years has been the concept of the "control flow graph", a model of a program that a compiler builds and then traverses while generating target code. Even just-in-time and embedded compilers use such control flow graphs, although they tend to make the unit of compilation smaller than traditional batch compilers (e.g., one method at a time rather than one class at a time). Trace Compilation, to which we have made significant contributions, represents a radical departure from this long established convention. A novel intermediate representation, the Trace Tree, is constructed lazily on-demand while the program is simultaneously executed, incrementally compiled, and optimized. The advantage of this technique is that the compiler doesn't expend any resources on parts of the program that are not frequently executed; traditional compilers construct control-flow graphs for unimportant and even for unreachable parts of a program and need to prune such graphs later. Our specific approach to trace compilation is now in the process of being adopted widely across and beyond academia. Working with the Mozilla Foundation, we incorporated our technique into the Firefox browser, starting with version 3.5. By incorporating our invention, Mozilla was able to raise Firefox's JavaScript performance by a surprising factor of 7. Our Trace Compilation technique is now being used daily by several hundred million users around the globe. Other groups of researchers that are now using trace compilation include Oracle, Adobe, Google, and Microsoft, and we are collaborating with several of these projects. In a second project, we are investigating compiler-generated software diversity as a defense mechanism against software attacks.
Our solution is centered on an "App Store" containing a diversification engine (a "multicompiler") that automatically generates a unique version of every program each time that a downloader requests it. All the different versions of the same program behave in exactly the same way from the perspective of the end-user, but they implement their functionality in subtly different ways. As a result, any specific attack will succeed only on a small fraction of targets. An attacker would require a large number of different attacks and would have no way of knowing a priori which specific attack will succeed on which specific target. Equally importantly, our approach makes it much more difficult for an attacker to generate attack vectors by way of reverse engineering of security patches.
Speaker Bio Prof. Michael Franz is a Professor of Computer Science in UCI's Donald Bren School of Information and Computer Sciences, a Professor of Electrical Engineering and Computer Science (by courtesy) in UCI's Henry Samueli School of Engineering, and the director of UCI's Secure Systems and Software Laboratory. He is currently also a visiting Professor of Informatics at ETH Zurich, the Swiss Federal Institute of Technology, from which he previously received the Dr. sc. techn. (advisor: Niklaus Wirth) and the Dipl. Informatik-Ing. ETH degrees.
Date 11:00-12:00, Tuesday, December 7th
Place Gates 463a
Speaker Hongseok Yang, University of London
Title Automatic Program Analysis of Overlaid Data Structures
Abstract We call a data structure overlaid if a node in the structure includes links for multiple data structures and these links are intended to be used at the same time. These overlaid data structures are frequently used in systems code, when implementing multiple types of indexing structures over the same set of nodes. For instance, the deadline IO scheduler of Linux has a queue whose node has links for a doubly-linked list as well as those for a red-black tree. The doubly-linked list here is used to record the order in which nodes are inserted into the queue, and the red-black tree provides an efficient indexing structure on the sector fields of the nodes.
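
To make the definition concrete, here is a minimal Python sketch of an overlaid node. It is an illustration only, not the Linux code: the names `Request` and `Queue` are hypothetical, and a plain unbalanced binary search tree stands in for the red-black tree to keep the example short.

```python
class Request:
    """One node that belongs to two data structures at the same time."""

    def __init__(self, sector):
        self.sector = sector
        # Links for the doubly-linked list (records insertion order).
        self.prev = None
        self.next = None
        # Links for the search tree (indexes nodes by sector).
        self.left = None
        self.right = None


class Queue:
    """Hypothetical queue that overlays a list and a tree on the same nodes."""

    def __init__(self):
        self.head = None  # first node in insertion order
        self.tail = None  # last node in insertion order
        self.root = None  # root of the tree keyed on sector

    def insert(self, node):
        # Append the node to the doubly-linked list.
        node.prev = self.tail
        if self.tail is None:
            self.head = node
        else:
            self.tail.next = node
        self.tail = node

        # Insert the same node into the (unbalanced) search tree.
        if self.root is None:
            self.root = node
            return
        cur = self.root
        while True:
            if node.sector < cur.sector:
                if cur.left is None:
                    cur.left = node
                    return
                cur = cur.left
            else:
                if cur.right is None:
                    cur.right = node
                    return
                cur = cur.right

    def in_arrival_order(self):
        """Walk the list links."""
        out = []
        node = self.head
        while node is not None:
            out.append(node.sector)
            node = node.next
        return out

    def in_sector_order(self):
        """Walk the tree links (in-order traversal)."""
        out = []

        def walk(node):
            if node is not None:
                walk(node.left)
                out.append(node.sector)
                walk(node.right)

        walk(self.root)
        return out
```

Inserting requests for sectors 30, 10, and 20 leaves one set of three nodes that can be read in arrival order through the list links and in sector order through the tree links. An analysis has to track both shapes over the same memory cells.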

In this talk, I will describe an automatic program analysis of these overlaid data structures. The focus of the talk will be on two main issues: to represent such data structures effectively and to build an efficient yet precise program analyser, which can prove the memory safety of realistic examples, such as the Linux deadline IO scheduler. During the talk, I will explain how we addressed the first issue by the combination of standard classical conjunction and separating conjunction from separation logic. Also, I will describe how we used a meta-analysis and the dynamic insertion of ghost instructions in solving the second issue. If time permits, I will give a demo of the tool.

This is joint work with Oukseh Lee and Rasmus Petersen.

Speaker Bio
Date 2:00-3:00, October 25
Place Gates 104
Speaker Patrick Eugster, Purdue
Title Distributed Event-based Programming in Java
Abstract The abstraction of "event" has been used for years to reason about concurrent and distributed programs, and is being increasingly used as a programming paradigm. Developing distributed event-based applications is currently challenging for programmers, though, as it involves integrating a number of technologies in addition to dealing with an abstraction that cuts across more traditional programming paradigms.

EventJava is an extension of the mainstream Java language aimed at simplifying the development of a wide range of event-based applications. In this talk, we first provide an overview of select features of the EventJava language framework and its implementation. Then we present a performance evaluation from different viewpoints. We conclude with an outlook on future work.

Speaker Bio
Date 1:00-2:00, Wednesday, October 13
Place Gates 219
Speaker Erik Meijer, Microsoft
Title Fundamentalist Functional Programming
Abstract In 1984, John Hughes wrote a seminal paper titled "Why Functional Programming Matters", in which he eloquently explained the value of pure and lazy functional programming. Due to the increasing importance of the Web and the advent of many-core machines, in the quarter of a century since the paper was written, the problems associated with imperative languages and their side effects have become increasingly evident.

This talk argues that fundamentalist functional programming -- that is, radically eliminating all side effects from programming languages, including strict evaluation -- is what it takes to conquer the concurrency and parallelism dragon. Programmers must embrace pure, lazy functional programming -- with all effects apparent in the type system of the host language using monads.

A radical paradigm shift is the answer, but does that mean that all current programmers will be lost along the way? Fortunately not! By design, LINQ is based on monadic principles, and the success of LINQ proves that the world does not fear the monads.
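
As a rough illustration of what "monadic principles" means for queries, the sketch below writes a small query over lists using only two operations, `unit` and `bind`. It is written in Python rather than C#, the function names are hypothetical, and it is not LINQ itself; `bind` here plays the role that the `SelectMany` operator plays in LINQ.

```python
def unit(x):
    """Wrap a single value in the list monad."""
    return [x]


def bind(xs, f):
    """Apply f to every element of xs and flatten the resulting lists."""
    return [y for x in xs for y in f(x)]


def pairs_with_even_sum(xs, ys):
    """A query built only from bind and unit: all (x, y) with x + y even."""
    return bind(xs, lambda x:
                bind(ys, lambda y:
                     unit((x, y)) if (x + y) % 2 == 0 else []))
```

For example, `pairs_with_even_sum([1, 2], [3, 4])` evaluates to `[(1, 3), (2, 4)]`. The nested `bind` calls correspond to nested `from` clauses in a query expression, and returning an empty list corresponds to a `where` clause that rejects the pair.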

Speaker Bio Erik Meijer is an accomplished programming-language designer who has worked on a wide range of languages, including Haskell, Mondrian, X#, C#, and Visual Basic. He runs the Cloud Languages Team in the Business Platform Division at Microsoft, where his primary focus has been to remove the impedance mismatch between databases and programming languages in the context of the Cloud. One of the fruits of these efforts is LINQ, which not only adds a native querying syntax to .NET languages, such as C# and Visual Basic, but also allows developers to query data sources other than tables, such as objects or XML. Most recently, Erik has been working on and preaching the virtues of fundamentalist functional programming in the new age of concurrency and many-core. Some people might recognize him from his brief stint as the "Head in the Box" on Microsoft VBTV. These days, you can regularly watch Erik's interviews on the "Expert to Expert" and "Going Deep" series on Channel 9.
Date 2:00-3:00, September 27
Place Gates 104
Speaker Sorin Lerner, UC San Diego
Title Strategies for Building Correct Optimizations
Abstract Program analyses and optimizations are at the core of many optimizing compilers, and their correctness in this setting is critically important because it affects the correctness of any code that is compiled. However, designing correct analyses and optimizations is difficult, error prone and time consuming. This talk will present several inter-related approaches for building correct analyses and optimizations. I will start with an approach based on a domain-specific language for expressing optimizations, combined with a checker for verifying the correctness of optimizations written in this language. This first approach still requires the programmer to write down the optimizations in full detail. I will then move on to several techniques which instead attempt to synthesize correct optimizations from a higher-level description. In particular, I will present an approach that discovers correct optimization opportunities by exploring the application of equality axioms on the program being optimized. Finally, I will present a technique that synthesizes generally applicable and correct optimization rules from concrete examples of code before and after some transformations have been performed.
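
The idea of finding optimizations by exploring equality axioms can be sketched in a few lines of Python. This is a naive enumeration under assumed names (`rewrites`, `neighbors`, `optimize`) and a hand-picked set of four axioms. It is not the representation or search strategy from the talk, and nothing here is verified; it only shows the shape of the approach: generate expressions equal to the input, then pick the cheapest.

```python
def rewrites(e):
    """Yield expressions equal to e by applying one axiom at the root."""
    if isinstance(e, tuple):
        op, a, b = e
        if op == "add" and b == 0:
            yield a                  # a + 0 = a
        if op == "mul" and b == 1:
            yield a                  # a * 1 = a
        if op == "mul" and b == 2:
            yield ("add", a, a)      # a * 2 = a + a
        if op in ("add", "mul"):
            yield (op, b, a)         # commutativity


def neighbors(e):
    """Apply one axiom at the root or somewhere inside either operand."""
    yield from rewrites(e)
    if isinstance(e, tuple):
        op, a, b = e
        for a2 in neighbors(a):
            yield (op, a2, b)
        for b2 in neighbors(b):
            yield (op, a, b2)


def size(e):
    """Cost model (an assumption): number of nodes in the expression."""
    return 1 + size(e[1]) + size(e[2]) if isinstance(e, tuple) else 1


def optimize(e, limit=1000):
    """Breadth-first exploration of expressions equal to e; return the smallest."""
    seen = {e}
    frontier = [e]
    while frontier and len(seen) < limit:
        next_frontier = []
        for cur in frontier:
            for n in neighbors(cur):
                if n not in seen:
                    seen.add(n)
                    next_frontier.append(n)
        frontier = next_frontier
    return min(seen, key=size)
```

Expressions are nested tuples such as `("mul", ("add", "x", 0), 1)`, which `optimize` reduces to `"x"`. Because every step applies an equality, any expression in `seen` is a legal replacement for the input, so correctness rests on the axioms rather than on each individual optimization.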
Speaker Bio Sorin Lerner is an Assistant Professor in the Department of Computer Science and Engineering at the University of California, San Diego. He received his PhD in 2006 from the University of Washington, under the supervision of Craig Chambers. Before that, he received an undergraduate degree in Computer Engineering from McGill University, Montreal. His research interests lie in programming language and analysis techniques for making software systems easier to write, maintain and understand, including static program analysis, domain specific languages, compilation, formal methods and automated theorem proving. Lerner works actively at the interface of Programming Languages and Software Engineering, and frequently publishes at POPL/PLDI and ICSE/FSE. Sorin Lerner was the co-chair of the 2010 ACM SIGPLAN-SIGSOFT PASTE workshop, and is the recipient of an NSF CAREER Award (2007) and of the 2003 PLDI Best Paper Award.
Date 2:00-3:00, Wednesday, September 15
Place Gates 392
Speaker Madan Musuvathi, Microsoft Research
Title A Probabilistic Algorithm for Finding Concurrency Errors
(or How to Crash Your Program in the First Few Runs)
Abstract Unexpected thread interleavings can lead to concurrency errors that are hard to find, reproduce, and debug. In this talk, I will present a probabilistic algorithm for finding such errors. The algorithm works by randomly perturbing the timing of threads and event handlers at runtime. Every run of the algorithm finds every concurrency bug in the program with some (reasonably large) probability. Repeated runs can be used to reduce the chance of missing bugs to any desired amount. The algorithm scales to large programs and, in many cases, finds bugs in the first few runs of a program. A tool implementing this algorithm has been used to improve concurrency testing at Microsoft for over a year.
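
The effect of randomizing thread timing can be seen in a toy simulation. The Python sketch below is not the algorithm from the talk: it models two "threads" as generators, uses a uniformly random scheduler (an assumption made for simplicity), and the names `increment` and `run_once` are hypothetical. Each thread performs a non-atomic increment, so some interleavings lose an update.

```python
import random


def increment(shared):
    """A non-atomic increment: read the counter, yield, then write it back."""
    local = shared["counter"]       # step 1: read
    yield                           # the scheduler may switch threads here
    shared["counter"] = local + 1   # step 2: write


def run_once(rng):
    """Run two incrementing threads under a random schedule; return the counter."""
    shared = {"counter": 0}
    runnable = [increment(shared), increment(shared)]
    while runnable:
        thread = rng.choice(runnable)   # pick the next thread at random
        try:
            next(thread)                # execute one step of that thread
        except StopIteration:
            runnable.remove(thread)     # the thread has finished
    return shared["counter"]


if __name__ == "__main__":
    rng = random.Random()
    results = [run_once(rng) for _ in range(20)]
    print("runs that lost an update:", results.count(1), "of", len(results))
```

A result of 2 means the increments did not interleave; a result of 1 means an update was lost. In this model half of the random schedules expose the bug. In general, if each run exposes a bug with probability at least p, then k independent runs all miss it with probability at most (1 - p)^k, which is why repeated runs can drive the chance of missing a bug as low as desired.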

I will also describe the relationship between this algorithm and the dimension theory of partial-orders, and how results from this field can be used to further improve the algorithm.

Speaker Bio Madan Musuvathi is a researcher at Microsoft Research interested in software verification, program analysis, and systems. Recently, he has focused on the scalable analysis of concurrent systems. He received his Ph.D. at Stanford University in 2004 and has been at Microsoft Research since.
Date 1:00-2:00, August 24
Place Gates 104
Speaker Noam Rinetzky, Queen Mary University of London
Title Verifying Linearizability with Hindsight
Abstract We present a proof of safety and linearizability of a highly-concurrent optimistic set algorithm. The key step in our proof is the Hindsight Lemma, which allows a thread to infer the existence of a global state in which its operation can be linearized based on limited local atomic observations about the shared state. The Hindsight Lemma allows us to avoid one of the most complex and non-intuitive steps in reasoning about highly concurrent algorithms: considering the linearization point of an operation to be in a different thread than the one executing it.

The Hindsight Lemma assumes that the algorithm maintains certain simple invariants which are resilient to interference, and which can themselves be verified using purely thread-local proofs. As a consequence, the lemma allows us to unlock a perhaps-surprising intuition: a high degree of interference makes non-trivial highly-concurrent algorithms in some cases much easier to verify than less concurrent ones.

Joint work with Peter W. O'Hearn (Queen Mary University of London), Martin T. Vechev (IBM T.J. Watson Research Center), Eran Yahav (IBM T.J. Watson Research Center), and Greta Yorsh (IBM T.J. Watson Research Center).

Presented at the 29th Annual ACM SIGACT-SIGOPS Symposium on Principles of Distributed Computing (PODC'10).

Link to paper: http://www.eecs.qmul.ac.uk/~maon/pubs/PODC10-hindsight.pdf

Speaker Bio
Date 2:00-3:00, Monday, August 16
Place Gates 104
Speaker Greta Yorsh, IBM Research
Title Specializing Memory Management for Concurrent Data Structures
Abstract Memory reclamation plays a central role in the design of concurrent data structures. The main challenge is to equip a particular concurrent data structure with its own custom memory reclamation, in a way that is both correct and efficient. This problem arises frequently in environments that do not support automatic memory management, but it is also relevant in the case where we want to obtain a more efficient concurrent data structure. Unfortunately, despite various proposals, the most prevalent methodologies today, such as hazard pointers, are not well understood, and applying them is still an ad-hoc, error-prone, difficult, and time-consuming manual process.

We propose a systematic approach to specialization of memory reclamation to a particular concurrent data structure. We start with a concurrent algorithm that is proven to behave correctly assuming automatic memory reclamation. We apply a sequence of correctness-preserving transformations to both the memory reclamation scheme and the algorithm. These transformations rely on invariants of the algorithm, computed by standard analyses, and clearly illustrate why a given transformation is applied and under what conditions it can be applied safely. We demonstrate our approach by systematically deriving correct and efficient custom memory reclamation for state-of-the-art concurrent data structure algorithms, including several variations of concurrent stack, queue, and set algorithms.

(joint work in progress with Martin Vechev and Eran Yahav)

Speaker Bio
Date CHANGED: 1:00-2:00 Friday, August 6
Place CHANGED: Gates 463a
Speaker Cindy Rubio Gonzalez, University of Wisconsin
Title Error Propagation Analysis for File Systems
Abstract Unchecked errors are especially pernicious in operating system file management code. Transient or permanent hardware failures are inevitable, and error-management bugs at the file system layer can cause silent, unrecoverable data corruption. Furthermore, even when developers have the best of intentions, inaccurate documentation can mislead programmers and cause software to fail in unexpected ways.

We propose an interprocedural static analysis that tracks errors as they propagate through file system code. Our implementation detects overwritten, out-of-scope, and unsaved unchecked errors. Analysis of four widely-used Linux file system implementations (CIFS, ext3, IBM JFS and ReiserFS), a relatively new file system implementation (ext4), and shared virtual file system (VFS) code uncovers 312 confirmed error propagation bugs. Our flow- and context-sensitive approach produces more precise results than related techniques while providing better diagnostic information, including possible execution paths that demonstrate each bug found.

Additionally, we use our error-propagation analysis framework to identify the error codes returned by system calls across 52 Linux file systems. We examine mismatches between documented and actual error codes returned by 42 Linux file-related system calls. Comparing analysis results with Linux manual pages reveals over 1,700 undocumented error-code instances affecting all file systems and system calls examined.

Speaker Bio
Date 2:00-3:00, Thursday, July 22
Place Gates 104
Speaker Byron Cook, Microsoft Research, Cambridge
Title New methods for proving temporal properties of infinite-state systems
Abstract I will describe some new methods of proving temporal properties of infinite-state programs. Our approach takes advantage of the fact that linear-temporal properties can often be proved more efficiently using proof techniques usually associated with the branching-time logic CTL. The caveat is that, in certain instances, nondeterminism in the system's transition relation can cause CTL methods to report counterexamples that are spurious in LTL. To address this problem we describe an algorithm that, as it attempts to apply CTL proof methods, finds and then removes problematic nondeterminism via an analysis on the spurious counterexamples. We must also develop CTL symbolic model checking tools for infinite-state systems.
Speaker Bio Dr. Byron Cook is a Principal Researcher at Microsoft Research in Cambridge, UK as well as Professor of Computer Science at Queen Mary, University of London. He is one of the developers of the Terminator program termination proving tool, as well as the SLAM software model checker. Before joining Microsoft Research he was a developer in the Windows OS kernel group. See research.microsoft.com/~bycook/ for more information.
Date 1:30-2:30, Tuesday, June 1
Place Gates 463a
Speaker Sanjit Seshia, UC Berkeley
Title Integrating Induction and Deduction for Verification and Synthesis
Abstract Even with impressive advances in formal methods over the last few decades, some problems in automatic verification and synthesis remain challenging. Examples include the verification of quantitative properties of software such as execution time, and certain program synthesis problems. In this talk, I will present a new approach to automatic verification and synthesis based on a combination of inductive methods (learning from examples), and deductive methods (based on logical inference and constraint solving).

Our approach integrates verification techniques such as satisfiability solving and theorem proving (SAT/SMT), numerical simulation, and fixpoint computation with inductive inference methods including game-theoretic online learning, learning Boolean functions and learning polyhedra. My talk will illustrate this combination of inductive and deductive reasoning for three problems: (i) program synthesis applied to malware deobfuscation; (ii) the verification of execution time properties of embedded software, and (briefly) (iii) the synthesis of switching logic for hybrid systems.

Speaker Bio Sanjit A. Seshia is an assistant professor in the Department of Electrical Engineering and Computer Sciences at the University of California, Berkeley. He received an M.S. and a Ph.D. in Computer Science from Carnegie Mellon University, and a B.Tech. in Computer Science and Engineering from the Indian Institute of Technology, Bombay. His research interests are in dependable computing and computational logic, with a current focus on applying automated formal methods to problems in embedded systems, computer security, and electronic design automation. He has received a Presidential Early Career Award for Scientists and Engineers (PECASE), an Alfred P. Sloan Research Fellowship, and the School of Computer Science Distinguished Dissertation Award at Carnegie Mellon University.
Date 2:00-3:00, Tuesday, May 11th
Place Gates 104
Speaker Mark Wegman, IBM Research
Title Managing Businesses that Design
Abstract Software development is fundamentally a design process. The quality of the eventual outcome depends on how well people can come together to create a pleasing design. Different organizations may be vastly better or worse than others in how they create designs. Understanding what an organization does well is similar in many ways to debugging a program. You instrument it, subject to concerns about privacy. The instrumentation can be done via the tools that we use to build software, as that's what people use in the organization. Given the needs of the organization to change, those tools may also need to change. The analysis they do on the software artifacts may change as well. This is new work in an attempt to define a vision of a potential new science on the management of design. It should be noted that we are not advocating that the best management is more intrusive management -- sometimes the best management recognizes that to accomplish what is needed people need to take more risks up front.
Speaker Bio
Date 1:30-2:30, Wednesday, May 12
Place Gates 104
Speaker John Field, IBM Research
Title The Thorn Programming Language: Robust Distributed Scripting
Abstract Scripting languages enjoy great popularity due to their support for rapid and exploratory development. They typically have lightweight syntax, weak data privacy, dynamic typing, and powerful aggregate data types. The price of these features comes later in the software life cycle. Scripts are hard to evolve and compose, and often slow. An additional weakness of m