Distributed, Parallel, and Concurrent Programming

Scheme Dialects for Distributed Programming

Germain, Guillaume and Feeley, Marc and Monnier, Stefan. Concurrency oriented programming in Termite Scheme. 2006 Workshop on Scheme and Functional Programming. September 2006. [PDF]

Termite Scheme is a variant of Scheme intended for distributed computing. It offers a simple and powerful concurrency model, inspired by the Erlang programming language, which is based on a message-passing model of concurrency. Our system is well suited for building custom protocols and abstractions for distributed computation. Its open network model allows for the building of non-centralized distributed applications. The possibility of failure is reflected in the model, and ways to handle failure are available in the language. We exploit the existence of first class continuations in order to allow the expression of high-level concepts such as process migration. We describe the Termite model and its implications, how it compares to Erlang, and describe sample applications built with Termite. We conclude with a discussion of the current implementation and its performance.
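The Erlang-style model Termite adopts (isolated processes, asynchronous sends, blocking receives on a per-process mailbox) can be sketched in a few lines. The sketch below uses Python threads and queues as stand-ins for Termite processes and mailboxes; the names (`Process`, `send`, `recv`) are illustrative, not Termite's actual API.

```python
import queue
import threading

class Process:
    def __init__(self, body):
        self.mailbox = queue.Queue()            # each process owns one mailbox
        self.thread = threading.Thread(target=body, args=(self,))
        self.thread.start()

def send(proc, msg):
    proc.mailbox.put(msg)                       # asynchronous: never blocks

def recv(proc):
    return proc.mailbox.get()                   # blocks until a message arrives

# A tiny "echo server": replies to (sender, payload) pairs.
def echo(self):
    sender, payload = recv(self)
    send(sender, ("echo", payload))

# Client side: a bare mailbox suffices to receive the reply.
class Mailbox:
    def __init__(self):
        self.mailbox = queue.Queue()

server = Process(echo)
me = Mailbox()
send(server, (me, "hello"))
reply = recv(me)
server.thread.join()
```

In Termite the same pattern works across machine boundaries, because processes are addressed by location-transparent process identifiers rather than shared queues.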

Piérard, Adrien and Feeley, Marc. Towards a portable and mobile Scheme interpreter. Proceedings of the Scheme and Functional Programming Workshop. 2007. [PDF]

The transfer of program data between the nodes of a distributed system is a fundamental operation. It usually requires some form of data serialization. For a functional language such as Scheme it is clearly desirable to also allow the unrestricted transfer of functions between nodes. With the goal of developing a portable implementation of the Termite system we have designed the Mobit Scheme interpreter which supports unrestricted serialization of Scheme objects, including procedures and continuations. Mobit is derived from an existing Scheme in Scheme fast interpreter. We demonstrate how macros were valuable in transforming the interpreter while preserving its structure and maintainability. Our performance evaluation shows that the run time speed of Mobit is comparable to existing Scheme interpreters.

Fuchs, Matthew. Dreme: for Life in the Net. 1995. [PDF] [PS]

This dissertation makes four contributions towards supporting distributed, multiuser applications over open networks. (1) Dreme, a distributed dialect of the Scheme language in which all first-class language objects are mobile in the network. In particular, various distributed topologies, such as client/server and peer-to-peer, can be created by migrating closures with overlapping scopes around the network, with correct inter-process communication assured by Scheme's lexical scoping rules and network-wide addressing. Threads of control are passed around through first-class distributed continuations. (2) A user interface toolkit for coordinating events in multi-threaded, multi-user applications by organizing continuation callbacks into nested lexical scopes. Each event has certain attributes, such as synchronous/asynchronous, and certain events create new scopes with new events. Continuation callbacks allow both synchronous events, which return values to their callers, and asynchronous ones. Event-handling code needn't be spread throughout the application, as it is with event-loop-based applications. (3) A distributed garbage collection algorithm that collects all cycles on an open network. The basic algorithm depends on maintaining the inverse reference graph (IRG) among network nodes (i.e., if a->b is in the regular graph, b->a is in the IRG). A single IRG traversal from any object determines the status of each object touched. Communication is decentralized (any object can choose to determine its status), garbage is touched O(1) times (in the absence of failures), the algorithm is fault-tolerant, and it can handle malicious or faulty neighbors. Each operation uses messages linear in the size of the IRG; overlapping operations perform like parallel quicksort. (4) An approach to using the Standard Generalized Markup Language (SGML) over the network to support distributed GUIs, intelligent clients, and mobile agents. SGML is a meta-grammar for creating domain-specific document markup languages to which a variety of semantics (display, reading/writing databases, etc.) can be applied. The document, its grammar, and some semantics are retrieved over the network. Applications normally create interfaces directly out of graphic objects to communicate with the user; however, if the interface has some semantics (and is parsable), a computational agent can interpret the interface and talk directly to the application on behalf of the human.
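The IRG idea behind the collector can be illustrated with a small sketch: mirror every edge a->b as b->a, and a single backward traversal from any object decides its status, including objects on cycles. This is a deliberate simplification (a static graph, no failures, no messages); the graph and names below are hypothetical.

```python
# Decide whether `obj` is garbage by traversing the inverse reference graph
# (IRG): if no root is encountered walking backward along inverse edges,
# nothing in the live graph can reach obj, so it is garbage.
def is_garbage(obj, irg, roots):
    seen, stack = set(), [obj]
    while stack:
        node = stack.pop()
        if node in roots:
            return False              # a root refers (transitively) to obj: live
        if node in seen:
            continue
        seen.add(node)
        stack.extend(irg.get(node, ()))
    return True                       # backward search found no root: garbage

# Regular graph: r -> a, plus a detached cycle c1 <-> c2.
# IRG mirrors each edge: a->b in the graph becomes b->a here.
irg = {"a": ["r"], "c1": ["c2"], "c2": ["c1"]}
roots = {"r"}

live = is_garbage("a", irg, roots)    # reachable via root r
dead = is_garbage("c1", irg, roots)   # the cycle is collected despite the loop
```

Note how the cycle terminates the traversal through the `seen` set, which is what lets this approach collect cyclic garbage that plain reference counting cannot.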

Bawden, Alan. Implementing distributed systems using linear naming. 1993. [PDF]

Linear graph reduction is a simple computational model in which the cost of naming things is explicitly represented. The key idea is the notion of linearity. A name is linear if it is only used once, so with linear naming you cannot create more than one outstanding reference to an entity. As a result, linear naming is cheap to support and easy to reason about. Programs can be translated into the linear graph reduction model such that linear names in the program are implemented directly as linear names in the model. Nonlinear names are supported by constructing them out of linear names. The translation thus exposes those places where the program uses names in expensive, nonlinear ways. Two applications demonstrate the utility of using linear graph reduction. First, in the area of distributed computing, linear naming makes it easy to support cheap cross-network references and highly portable data structures; linear naming also facilitates demand-driven migration of tasks and data around the network without requiring explicit guidance from the programmer. Second, linear graph reduction reveals a new characterization of the phenomenon of state. Systems in which state appears are those which depend on certain global system properties. State is not a localizable phenomenon, which suggests that our usual object-oriented metaphor for state is flawed.
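The use-once discipline at the heart of linear naming can be sketched as a reference that is consumed by its single use. This is only an illustration of the invariant, not Bawden's model; the class and method names are made up.

```python
# A sketch of a linear reference: it may be consumed exactly once. Because at
# most one outstanding reference ever exists, transferring it (e.g. across a
# network) needs no reference counting and no synchronization with other holders.
class LinearRef:
    def __init__(self, value):
        self._value = value
        self._used = False

    def consume(self):
        if self._used:
            raise RuntimeError("linear reference used twice")
        self._used = True
        value, self._value = self._value, None    # drop our copy on use
        return value

r = LinearRef([1, 2, 3])
data = r.consume()          # the single permitted use succeeds
try:
    r.consume()             # a second use violates linearity
except RuntimeError as e:
    err = str(e)
```

Nonlinear (shared) names are then the expensive case: they must be built out of linear ones, which is exactly where the translation exposes hidden costs.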

Moreau, Luc and De Roure, David and Foster, Ian. NeXeme: A distributed Scheme based on Nexus. European Conference on Parallel Processing. August 1997. [PDF] [PS]

The remote service request, a form of remote procedure call, and the global pointer, a global naming mechanism, are two features at the heart of Nexus, a library for building distributed systems. NeXeme is an extension of Scheme that fully integrates both concepts in a mostly-functional framework, hence providing an expressive language for distributed computing. This paper presents a semantics for this Scheme extension, and also describes a NeXeme implementation, including its distributed garbage collector.

Moreau, Luc and Queinnec, Christian. Design and Semantics of Quantum: A Language to Control Resource Consumption in Distributed Computing. Usenix Conference on Domain-Specific Languages (DSL'97). October 1997. [PDF] [PS]

This paper describes the semantics of Quantum, a language that was specifically designed to control resource consumption of distributed computations, such as mobile agent style applications. In Quantum, computations can be driven by mastering their resource consumption. Resources can be understood as processors cycles, geographical expansion, bandwidth or duration of communications, etc. We adopt a generic view by saying that computations need energy to be performed. Quantum relies on three new primitives that deal with energy. The first primitive creates a tank of energy associated with a computation. Asynchronous notifications inform the user of energy exhaustion and computation termination. The other two primitives allow us to implement suspension and resumption of computations by emptying a tank and by supplying more energy to a tank. The semantics takes the form of an abstract machine with explicit parallelism and energy-related primitives.
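Quantum's energy discipline can be sketched by metering an interruptible computation: it runs only while its tank holds energy, suspends on exhaustion, and resumes when more energy is supplied. Python generators model suspendable computations here; the names (`Tank`, `run`, `refill`) are illustrative, not Quantum's primitives.

```python
# One yield == one unit of work; the generator is the metered computation.
def counter():
    n = 0
    while True:
        n += 1
        yield n

class Tank:
    def __init__(self, computation, energy):
        self.computation = computation
        self.energy = energy        # the tank of energy for this computation
        self.last = None
        self.exhausted = False

    def run(self):
        self.exhausted = False
        while self.energy > 0:      # each step consumes one unit
            self.energy -= 1
            self.last = next(self.computation)
        self.exhausted = True       # in Quantum, an asynchronous notification

    def refill(self, energy):       # resumption: supply more energy
        self.energy += energy

t = Tank(counter(), energy=3)
t.run()                   # performs 3 steps, then suspends on exhaustion
after_first = t.last
t.refill(2)
t.run()                   # 2 more steps
after_second = t.last
```

Suspension is thus just "emptying the tank" and resumption "supplying energy", exactly the two operations the paper builds its control primitives from.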

Moreau, Luc. NeXeme: A distributed Scheme based on Nexus (Reference Manual and User's Guide). 1997. [PDF] [PS]

The remote service request, a form of remote procedure call, and the global pointer, a global naming mechanism, are two features at the heart of Nexus, a library for building distributed systems. NeXeme is an extension of Scheme that fully integrates both concepts in a mostly-functional framework, hence providing an expressive language for distributed computing. This document is both NeXeme reference manual and user's guide.

Queinnec, Christian. DMeroon: A Distributed Class-based Causally-Coherent Data Model - General documentation - Revision: 1.64. 1998. [PS]

DMeroon provides a data model above a coherently distributed shared memory. DMeroon allows multiple users to statically or dynamically create new classes hierarchically organized, to dynamically instantiate these classes, and to dynamically and coherently share the resulting instances over a network. DMeroon automatically takes care of representation and alignment, marshaling and unmarshaling objects, migrating and sharing objects, and local and global garbage collections. This document describes DMeroon, its design philosophy, its architecture and principles of implementation, and its bindings with various languages. It also presents some internal details of DMeroon and some applications built above it. This document tries to present an outline of DMeroon: in places it describes features which are not implemented, while in other places there are implemented features that are not documented. I packed it up so that interested people could get an idea of it and, perhaps, be induced to pursue my effort or definitively convince me of its little value. I have a lot of lectures to prepare in the coming months and will not be able to devote much time to DMeroon.

Queinnec, Christian. Distributed generic functions. Proc. 1997 France-Japan Workshop on Object-Based Parallel and Distributed Computing. October 1997. [PDF] [PS]

The network now gives the opportunity to combine code and data from everywhere in the world. However, the dominant paradigm is the client/server model, where immobile objects with static interfaces can only be used as prescribed by their proprietary site. While this constraint corresponds to understandable industrial programming practices, it negates the point of view of dynamic clients that collect interesting objects and want to confer new behaviors on these collected objects. How to enrich objects "from the outside", that is, without access to their source code and hence without re-compilation of their defining classes, is the problem addressed by this paper. Generic functions, à la CLOS, separate classes from behaviors, i.e., methods. Roughly speaking, a generic function is a bag of methods; when a generic function is invoked, the nature (type, class or structure) of its argument(s) triggers the choice of an appropriate method. Methods are no longer the exclusive property of classes; they are regular functions anyone may define outside class definitions. This paper describes how generic functions may be conveniently, and not so inefficiently, implemented in a distributed world. Section 1 presents some of the constraints of distributed systems. Section 2 recalls the framework of generic functions as well as how we extend them to the distributed world. Significantly, we address the problem of mutual recursion over a bag of methods to which new methods may be adjoined at run time. We also propose a new feature: call-former-method. Section 3 discusses implementation, and Section 4 discusses the incorporation of these results in a real system.
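The "bag of methods" view, including run-time adjunction and call-former-method, can be sketched in a few lines. The sketch below dispatches on the class of a single argument; all names are made up, and the distribution aspects of the paper are not modeled.

```python
# A generic function as a bag of methods: dispatch walks the argument's class
# precedence list; add_method may be called at any time, and each new method
# receives the method it shadows, modeling call-former-method.
class Generic:
    def __init__(self):
        self.methods = {}                 # class -> wrapped method

    def add_method(self, cls, fn):
        former = self.methods.get(cls)    # remember the shadowed method, if any
        self.methods[cls] = lambda x: fn(x, former)

    def __call__(self, x):
        for cls in type(x).__mro__:       # most specific class first
            if cls in self.methods:
                return self.methods[cls](x)
        raise TypeError("no applicable method")

describe = Generic()
describe.add_method(object, lambda x, former: "some object")
describe.add_method(int, lambda x, former: f"int {x}")

r1 = describe(42)          # the int method applies
r2 = describe("hi")        # falls back to the object method

# Adjoin a method at run time that defers to the former one:
describe.add_method(
    int,
    lambda x, former: ("even " + former(x)) if x % 2 == 0 else former(x))
r3 = describe(42)
r4 = describe(7)
```

Since methods live outside class definitions, objects collected from elsewhere gain new behavior without recompiling their defining classes, which is the paper's motivating scenario.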

Queinnec, Christian. Bribes de DMeroon. May 1996.

This article is a fragmentary overview of DMeroon, an unfinished research project aiming at the writing of a multilingual library providing a dynamic, distributed, shared memory model. Nothing in this article is original except the use of French. Its content comes from 17 and 18, amended with disjointed reflections motivated by the writing of the programs composing this library.

Queinnec, Christian. DMeroon: overview of a distributed class-based causally-coherent data model. International Workshop on Parallel Symbolic Languages and Systems. 1995. [PDF] [PS]

DMeroon is a library of C functions that provides a data model above a coherently distributed shared memory. DMeroon allows users to statically or dynamically create new classes, to dynamically instantiate these classes and to dynamically and coherently share the resulting instances over a network. DMeroon automatically takes care of representation and alignment, migrating and sharing objects, local and global garbage collections. This document provides an overview of DMeroon.

Queinnec, Christian and De Roure, David. Design of a concurrent and distributed language. US/Japan Workshop on Parallel Symbolic Computing. 1992. [PDF] [PS]

This paper presents a new dialect of Scheme aimed towards concurrency and distribution. It offers a few primitives, including first-class continuations, with very simple semantics. Numerous examples are given showing how to program the classical concurrent control operators such as future, pcall and either. The implementation is sketched and presented along the lines of a metacircular interpreter.

Queinnec, Christian. A concurrent and distributed extension of Scheme. International Conference on Parallel Architectures and Languages Europe. 1992. [PDF] [PS]

The Lisp family of languages has traditionally been a privileged domain for linguistic experiments; this paper presents a new dialect offering concurrency and distribution. This dialect, nicknamed CD-Scheme, has been designed on top of Scheme with as few added features as possible, to allow great expressiveness while retaining the original consistency and simplicity of Scheme. We explicitly show how common parallel constructs can be written in regular CD-Scheme. A denotational semantics is also presented that expresses the detailed meaning of assignment, data mutation, and continuations in the presence of concurrency and distribution. This semantics offers a basis for understanding new proposals of concurrent or distributed features and may be used to justify compiler optimizations or implementation techniques. The proposed set of features of CD-Scheme can also be used to extend languages other than Scheme.

Piquer, José and Queinnec, Christian. TransPive: A distributed Lisp system. 1992. [PDF]

This paper presents an overview of a distributed Lisp system, called TransPive, designed to run on a loosely-coupled multi-processor system. The main goal of the system is to provide a transparent distributed programming environment, sharing data structures using remote pointers, as a platform to prototype distributed algorithms and applications. The TransPive message-passing primitives create the remote pointers, with local caches for the object's value (to improve the performance of read accesses). A protocol to invalidate the caches is invoked when a modifier is applied. A distributed garbage collector is included in TransPive to reclaim remotely referenced objects. A simple mechanism to distribute the computation over the system is also provided. All the mechanisms have been implemented without any operating system support (such as virtual memory), and so they are portable to any reliable message-passing distributed environment. In particular, the first version has been implemented on a Transputer network. The paper is an overview of the remote pointer mechanism, the memory structures and protocols used to share data, the garbage collector, and the remote process creation primitives.

Jagannathan, Suresh. Communication-passing style for coordination languages. International Conference on Coordination Languages and Models. 1997. [PDF] [PS]

Coordination languages for parallel and distributed systems specify mechanisms for creating tasks and communicating data among them. These languages typically assume that (a) once a task begins execution on some processor, it will remain resident on that processor throughout its lifetime, and (b) communicating shared data among tasks is through some form of message-passing and data migration. In this paper, we investigate an alternative approach to understanding coordination. Communication-passing style (CmPS) refers to a coordination semantics in which data communication is always undertaken by migrating the continuation of the task requiring the data to the processor where the data resides. Communication-passing style is closely related to continuation-passing style (CPS), a useful transformation for compiling functional languages. Just as CPS eliminates implicit call-return sequences, CmPS eliminates implicit inter-processor data communication and synchronization requests. In a CmPS-transformed program, only continuations (i.e., control contexts) are transmitted across machines; all synchronization and data communication occurs locally. Besides providing significant optimization opportunities, CmPS is a natural representation for implementations on networks of workstations. This paper presents an operational semantics for a coordination language that supports first-class distributed data repositories. The computation sublanguage considered is an untyped call-by-value functional language similar to pure Scheme. Optimizations and implementation issues that arise from using a CmPS-driven coordination language are also described.
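The core CmPS move, shipping the continuation of a task to the node where the data lives instead of fetching the data, can be simulated in a few lines. Nodes are simulated as objects with a local store and a queue of migrated continuations; everything here (`Node`, `spawn`, `run`) is a hypothetical illustration, not the paper's formal semantics.

```python
# Each node holds local data and a queue of continuations shipped to it.
class Node:
    def __init__(self, name, store):
        self.name = name
        self.store = store        # this node's local data repository
        self.inbox = []           # continuations migrated to this node

    def spawn(self, k):
        self.inbox.append(k)

    def run(self):
        results = []
        while self.inbox:
            k = self.inbox.pop(0)
            results.append(k(self))   # a continuation only touches local data
        return results

a = Node("a", {"x": 10})
b = Node("b", {"y": 32})

# A task starts on node a, reads a's x locally, then needs b's y: rather than
# requesting y over the network, it migrates its own continuation to b.
def task(node):
    x = node.store["x"]                    # local read on a
    def rest(node2):                       # the continuation of the task
        return x + node2.store["y"]        # local read on b
    b.spawn(rest)                          # ship control, not data
    return "migrated"

a.spawn(task)
a.run()              # runs the first half of the task on a
result = b.run()
```

As in the paper, only control contexts cross machine boundaries; every data access and synchronization in the sketch is local to the node where it executes.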

Cejtin, Henry and Jagannathan, Suresh and Kelsey, Richard. Higher-order distributed objects. 1995. [PDF] [PS]

We describe a distributed implementation of Scheme that permits efficient transmission of higher-order objects such as closures and continuations. The integration of distributed communication facilities within a higher-order programming language engenders a number of new abstractions and paradigms for distributed computing. Among these are user-specified load-balancing and migration policies for threads, incrementally linked distributed computations, and parameterized client-server applications. To our knowledge, this is the first distributed dialect of Scheme (or a related language) that addresses lightweight communication abstractions for higher-order objects.

Dwyer, Rex A and Dybvig, R Kent. A SCHEME for distributed processes. 1981.

Distributed Garbage Collection

Moreau, Luc. Correctness of a distributed-memory model for Scheme. European Conference on Parallel Processing. August 1996. [PDF] [PS]

We propose a high-level approach to program distributed applications; it is based on the annotation future by which the programmer specifies which expressions may be evaluated remotely in parallel. We present the CEKDS-Machine, an abstract machine with a distributed memory, able to evaluate Scheme-like future-based programs. In this paper, we focus on the issue of task migration and prove that task migration is transparent to the user, i.e. task migration does not change the observable behaviour of programs.

Moreau, Luc and DeRoure, David. A Distributed Garbage Collector for NeXeme. 1997. [PDF] [PS]

The remote service request, a form of remote procedure call, and the global pointer, a global naming mechanism, are two features at the heart of Nexus, a library to build distributed systems. NeXeme is an extension of Scheme that fully integrates both concepts in a mostly-functional framework. This short paper describes the distributed garbage collector that we implemented in NeXeme.

Moreau, Luc. A distributed garbage collector with diffusion tree reorganisation and mobile objects. Proceedings of the third ACM SIGPLAN international conference on Functional programming. September 1998. [PDF] [PS]

We present a new distributed garbage collection algorithm that is able to reorganise diffusion trees and to support mobile objects. It has a modular design comprising three components: a reliable transport mechanism, a reference-counting based distributed garbage collector for non-mobile objects, and an extra layer that provides mobility. The algorithm is formalised by an abstract machine and is proved to be correct. The safety property ensures that an object may not be reclaimed as long as it is referred to locally or remotely. The liveness property guarantees that unreachable objects will eventually be reclaimed. The mobility property certifies that messages are always forwarded towards more recent mobile object positions.

Moreau, Luc. Hierarchical distributed reference counting. Proceedings of the First ACM SIGPLAN International Symposium on Memory Management (ISMM'98). October 1998. [PDF] [PS]

Massively distributed computing is a challenging problem for garbage collection algorithm designers as it raises the issue of scalability. The high number of hosts involved in a computation can require large tables for reference listing, whereas the lack of information sharing between hosts in a same locality can entail redundant GC traffic. In this paper, we argue that a conceptual hierarchical organisation of massive distributed computations can solve this problem. By conceptual hierarchical organisation, we mean that processors are still able to communicate in a peer to peer manner using their usual communication mechanism, but GC messages will be routed as if processors were organised in hierarchy. We present an extension of a distributed reference counting algorithm that uses such a hierarchical organisation. It allows us to bound table sizes by the number of hosts in a domain, and it allows us to share GC information between hosts in a same locality in order to reduce cross-network GC traffic.
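The hierarchical idea, routing GC messages through a per-domain gateway so that an object's owner sees at most one reference per domain rather than one per host, can be sketched with plain reference counting. The classes and the flat owner-side counter below are illustrative simplifications of the algorithm in the paper.

```python
# The owner of an object counts remote references and reclaims the object
# when the count drops to zero.
class Owner:
    def __init__(self):
        self.count = 0
        self.reclaimed = False

    def inc(self):
        self.count += 1

    def dec(self):
        self.count -= 1
        if self.count == 0:
            self.reclaimed = True     # no remote references remain

# A domain gateway aggregates the references held by hosts in its domain and
# holds a single upstream reference at the owner on their behalf, bounding the
# owner's table by the number of domains, not the number of hosts.
class Gateway:
    def __init__(self, owner):
        self.owner = owner
        self.local = 0

    def inc(self):
        if self.local == 0:
            self.owner.inc()          # first reference in this domain
        self.local += 1

    def dec(self):
        self.local -= 1
        if self.local == 0:
            self.owner.dec()          # last reference in this domain dropped

owner = Owner()
g1, g2 = Gateway(owner), Gateway(owner)
g1.inc(); g1.inc()       # two hosts in domain 1 hold references
g2.inc()                 # one host in domain 2
count_seen = owner.count # owner sees one reference per domain
g1.dec(); g1.dec()
g2.dec()                 # last reference anywhere: object reclaimed
```

Duplications and drops inside a domain never leave the gateway, which is exactly the cross-network GC traffic the hierarchical scheme eliminates.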

Other Topics in Distributed Computing

Cowley, Anthony and Taylor, CJ. Distributed Software Transactional Memory. [PDF]

This report describes an implementation of a distributed software transactional memory (DSTM) system in PLT Scheme. The system is built using PLT Scheme's Unit construct to encapsulate the various concerns of the system, and allow for multiple communication layer backends. The front-end API exposes true parallel processing to PLT Scheme programmers, as well as cluster-based computing using a shared namespace for transactional variables. The ramifications of the availability of such a system are considered in the novel context of highly dynamic robot swarm programming scenarios. In robotics programming scenarios, difficulty with expressing complex distributed computing patterns often supersedes raw performance in importance. In fact, for many applications the data to be shared among networked peers is relatively small in size, but the manner in which data sharing is expressed leads to tremendous inefficiencies both at development time and runtime. In an effort to maintain focus on behavior specification, we reduce the emphasis on messaging protocols typically found in distributed robotics software, while providing even greater flexibility in terms of how data is mixed and matched as it moves over the network.

Kimball, Aaron and Grossman, Dan. Software transactions meet first-class continuations. 8th Annual Workshop on Scheme and Functional Programming. 2007. [PDF]

Software transactions are a promising technology that makes writing correct and efficient shared-memory multithreaded programs easier, but adding transactions to programming languages requires defining and implementing how they interact with existing language features. In this work, we consider how transactions interact with first-class continuations. We demonstrate that different idiomatic uses of continuations require different transactional semantics, so a language supporting transactions and call-with-current-continuation should provide programmers with a way to control these semantics. We present a design meeting this need, addressing both escaping from and reentering the dynamic extent of a transaction. We have implemented our design by modifying Scheme48. We present the most interesting details of the implementation and its performance on some small benchmarks.
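For readers unfamiliar with software transactions, the basic mechanism the paper builds on can be sketched as a log-based transaction over shared boxes: writes are buffered, and the commit succeeds only if every value read is unchanged, retrying otherwise. The interaction with call/cc that the paper actually studies is not modeled here, and all names are made up.

```python
# A shared, versioned mutable cell.
class Box:
    def __init__(self, value):
        self.value = value
        self.version = 0

class Transaction:
    def __init__(self):
        self.reads = {}     # box -> version observed at read time
        self.writes = {}    # box -> buffered value, applied only on commit

    def read(self, box):
        if box in self.writes:          # see our own buffered write
            return self.writes[box]
        self.reads[box] = box.version
        return box.value

    def write(self, box, value):
        self.writes[box] = value

    def commit(self):
        if any(box.version != v for box, v in self.reads.items()):
            return False                # conflict detected: caller retries
        for box, value in self.writes.items():
            box.value = value
            box.version += 1
        return True

def atomic(body):
    while True:                         # retry loop until a clean commit
        tx = Transaction()
        result = body(tx)
        if tx.commit():
            return result

acct = Box(100)
atomic(lambda tx: tx.write(acct, tx.read(acct) + 20))
```

The paper's question is what should happen when a continuation captured inside `body` is invoked after the transaction has committed or aborted, a situation this sketch simply cannot express.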

Epardaud, Stéphane. Mobile reactive programming in ULM. 2004. [PDF]

We present the embedding of ULM in Scheme and an implementation of a compiler and virtual machine for it. ULM is a core programming model that allows multi-threaded and distributed programming via strong mobility with a deterministic semantics. We present the multi-threading and distributed primitives of ULM step by step using examples. The introduction of mobility in a Scheme language raises questions about the semantics of variables with respect to migration. We expose the problems and offer two solutions alongside ULM’s network references. We also present our implementation of the compiler, virtual machine and the concurrent threading library written in Scheme.

Wittenberger, J. ASKEMOS: A distributed settlement. 2002. [PDF]

This paper presents Askemos, an autonomous, distributed operating system on top of peer-to-peer networks which significantly raises the level of abstraction in comparison with today's operating systems. Askemos addresses safe, secure and correct (forge-proof) information processing while securing intellectual property in an innovative way. Askemos defines a virtual machine at the document level, defined in terms of abstract trees and pure functional transformations of them, both described in XML. This virtual machine has no physical representation at any single machine. Instead, it works distributed among independent components which appear as if they observed it. To achieve that effect, the participating machines compute the process steps of the virtual machine independently and vote among each other about the correct result. To prevent illegal attacks, there exists no concept of unique resources such as superuser rights or unique name spaces.

Dionne, Carl and Feeley, Marc and Desbien, Jocelyn. A Taxonomy of Distributed Debuggers Based on Execution Replay. PDPTA. 1996. [PDF]

This paper presents a taxonomy of parallel and distributed debuggers based on execution replay. Programming of distributed and parallel systems is a complex task. Amongst the many factors contributing to this complexity, the nondeterminacy of these systems is an important one. Execution replay is a technique developed to facilitate the debugging of nondeterministic programs. Execution replay has very broad applications and not every algorithm is applicable in every situation. This taxonomy provides a precise classification of replay debuggers using nine criteria. From this classification, it is easier to determine a debugger's scope of application, outline its strengths and weaknesses and compare it with others. This taxonomy is illustrated and validated using a collection of existing replay debuggers.

Feeley, Marc. Lazy remote procedure call and its implementation in a parallel variant of C. International Workshop on Parallel Symbolic Languages and Systems. 1995. [PDF] [PS]

Lazy task creation (LTC) is an efficient approach for executing divide and conquer parallel programs that has been used in the implementation of Multilisp's future construct. Unfortunately it requires a specialized memory management scheme, in particular for stack frames, which makes it hard to use in the context of conventional languages. We have designed a variant of LTC which has a stack management discipline that is compatible with the semantics of conventional languages. This mechanism, which we call lazy remote procedure call, has been used to implement a parallel variant of C. A first prototype of our system has been ported to shared-memory multiprocessors and network of workstations. Experimental results on a Cray T3D multiprocessor show that good performance can be achieved on several symbolic programs.

Vitek, Jan and Serrano, Manuel and Thanos, Dimitri. Security and communication in mobile object systems. International Workshop on Mobile Object Systems. 1996. [PDF]

Sumii, Eijiro. An implementation of transparent migration on standard Scheme. Proceedings of the Workshop on Scheme and Functional Programming, Technical Report 00-368, Rice University. September 2000. [PDF]

I present a handy (though somewhat restrictive) way to implement mobile computation à la Telescript on top of standard Scheme.

Moreau, Luc and Queinnec, Christian. On the finiteness of resources in distributed computing. 1997. [PDF]

Millions of computers are now connected together by the Internet. At a fast pace, applications are taking advantage of these new capabilities, and are becoming parallel and distributed, e.g. applets on the WWW or agent technology. As we live in a world with finite resources, an important challenge is to be able to control computations in such an environment. For instance, a user might like to suspend a computation because another one seems to be more promising. In this paper, we present a paradigm that allows the programmer to monitor and control computations, whether parallel or distributed, by mastering their resource consumption.

Moreau, Luc and Queinnec, Christian. Distributed computations driven by resource consumption. Proceedings of the 1998 International Conference on Computer Languages (Cat. No. 98CB36225). 1998. [PDF]

Millions of computers are now connected together by the Internet. At a fast pace, applications are taking advantage of these new capabilities, and are becoming parallel and distributed, e.g. applets on the WWW or agent technology. As we live in a world with finite resources, an important challenge is to be able to control computations in such an environment. For instance, a user might like to suspend a computation because another one seems to be more promising. In this paper, we present a paradigm that allows the programmer to monitor and control computations, whether parallel or distributed, by mastering their resource consumption. We describe an implementation on top of the thread library PPCR and the message-passing library Nexus.

Philbin, James and Jagannathan, Suresh and Mirani, Rajiv. Virtual topologies: A new concurrency abstraction for high-level parallel languages. International Workshop on Languages and Compilers for Parallel Computing. 1995. [PDF] [PS]

We present a new concurrency abstraction and implementation technique for high-level (symbolic) parallel languages that allows significant programmer control over load-balancing and mapping of fine-grained lightweight threads. Central to our proposal is the notion of a virtual topology. A virtual topology defines a relation over a collection of virtual processors, and a mapping of those processors to a set of physical processors; processor topologies configured as trees, graphs, butterflies, and meshes are some well-known examples. A virtual topology need not have any correlation with a physical one; it is intended to capture the interconnection structure best suited for a given algorithm. We consider a virtual processor to be an abstraction that defines scheduling, migration and load-balancing policies for the threads it executes. Thus, virtual topologies are intended to provide a simple, expressive and efficient high-level framework for defining complex thread/processor mappings that abstracts low-level details of a physical interconnection.
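A virtual topology can be sketched as a collection of virtual processors with a fixed mapping onto fewer physical processors; threads are scheduled on virtual processors, and the topology determines both neighborhood structure and physical placement. The ring example and block mapping below are hypothetical choices for illustration.

```python
# A virtual processor carries its own scheduling queue and a mapping to a
# physical processor; the topology is just the arrangement of these objects.
class VirtualProcessor:
    def __init__(self, vid, physical):
        self.vid = vid
        self.physical = physical    # physical processor this vp maps to
        self.queue = []             # threads scheduled on this vp

    def schedule(self, thread):
        self.queue.append(thread)

def ring(n_virtual, n_physical):
    # Block mapping: contiguous runs of virtual processors share a physical one.
    return [VirtualProcessor(i, i * n_physical // n_virtual)
            for i in range(n_virtual)]

vps = ring(8, 2)                    # 8 virtual processors on 2 physical ones

# An algorithm written against the ring places a thread on vp 3's successor,
# without knowing (or caring) which physical processor that lands on.
vps[(3 + 1) % 8].schedule("thread-A")
placement = vps[4].physical
```

Because the algorithm only ever names virtual processors, the same program can be remapped onto a different physical machine by changing `ring`'s mapping alone.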

Jagannathan, Suresh. TS/Scheme: Distributed data structures in Lisp. 1994. [PDF] [PS]

We describe a parallel object-oriented dialect of Scheme called TS/Scheme that provides a simple and expressive interface for building asynchronous parallel programs. The main component in TS/Scheme's coordination framework is an abstraction that serves the role of a distributed data structure. Distributed data structures are an extension of conventional data structures insofar as many tasks may simultaneously access and update their contents according to a well-defined serialization protocol. The semantics of these structures also specifies that consumers which attempt to access an as-of-yet undefined element are to block until a producer provides a value. TS/Scheme permits the construction of two basic kinds of distributed data structures, those accessed by content, and those accessed by name. These structures can be further specialized and composed to yield a number of other synchronization abstractions. Our intention is to provide an efficient medium for expressing concurrency and synchronization that is amenable to modular programming, and which can be used to succinctly and efficiently describe a variety of diverse concurrency paradigms useful for parallel symbolic computing.
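The blocking-read semantics of a TS/Scheme-style structure accessed by name can be sketched with a condition variable: a consumer reading an as-yet-undefined element blocks until a producer supplies it. This is an illustrative single-machine analogue, not TS/Scheme's API.

```python
import threading

# A distributed-data-structure sketch, accessed by name: get() on a missing
# name blocks; put() defines the name and wakes all blocked consumers.
class NamedStructure:
    def __init__(self):
        self.slots = {}
        self.cond = threading.Condition()

    def put(self, name, value):
        with self.cond:
            self.slots[name] = value
            self.cond.notify_all()      # wake consumers waiting on any name

    def get(self, name):
        with self.cond:
            while name not in self.slots:   # block until a producer defines it
                self.cond.wait()
            return self.slots[name]

ds = NamedStructure()
out = []

consumer = threading.Thread(target=lambda: out.append(ds.get("answer")))
consumer.start()             # blocks: "answer" is not yet defined
ds.put("answer", 42)         # the producer supplies the value
consumer.join()
```

Structures accessed by content (tuple-space style) follow the same pattern with a matching predicate in place of the dictionary lookup.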

Jagannathan, Suresh. Locality abstractions for parallel and distributed computing. International Workshop on Theory and Practice of Parallel Programming. 1994. [PDF] [PS]

Temporal and spatial locality are significant concerns in the design and implementation of any realistic parallel or distributed computing system. Temporal locality is concerned with relations among objects that share similar lifetimes and birth dates; spatial locality is concerned with relations among objects that share information. Exploiting temporal locality can lead to improved memory behavior; exploiting spatial locality can lead to improved communication behavior. Linguistic, compiler, and runtime support for locality issues is especially important for unstructured symbolic computations in which lifetimes and sharing properties of objects are not readily apparent. Language abstractions for spatial and temporal locality include mechanisms for grouping related threads of control, allowing programs flexibility to map computations onto virtual processors, reusing dynamic contexts efficiently, and permitting asynchronous garbage collection across multiple processors. These abstractions give users and implementations a large degree of mobility to exploit inherent locality properties found within many dynamic parallel applications. We have investigated a number of these abstractions within a high-level language framework and within compilers targeted for such a framework. In this paper, we discuss several of these abstractions and justify their importance.

Jagannathan, Suresh. Expressing Fine-Grained Parallelism Using Distributed Data Structures. [PDF] [PS]

Despite the fact that distributed data structures have long been advocated as a powerful tool for specifying communication and synchronization constraints in explicitly parallel languages, their utility in expressing fine-grained parallelism has thus far been limited. In this paper, we consider semantic analysis techniques and runtime support mechanisms that can be used to make distributed data structures an efficient device in this regard. We consider a compiler/runtime system for a higher-order expression-based parallel language supporting distributed data structures that relies on (1) a type inference system to answer questions regarding how shared distributed objects should be represented and partitioned, and (2) a high-level runtime kernel to effectively manage and schedule dynamically instantiated lightweight threads of control. We argue that a tightly integrated system of this kind can be used to efficiently implement a variety of fine-grained parallel applications.

Jagannathan, Suresh. Customization of first-class tuple-spaces in a higher-order language. Parle'91 Parallel Architectures and Languages Europe. 1991. [PDF]

A distributed data structure is an object which permits many producers to augment or modify its contents, and many consumers simultaneously to access its component elements. Synchronization is implicit in data structure access: a process that requests an element which has not yet been generated blocks until a producer creates it. In this paper, we describe a parallel programming language (called TS) whose fundamental communication device is a significant generalization of the tuple-space distributed data structure found in the Linda coordination language. Our sequential base language is a dialect of Scheme. Beyond the fact that TS is derived by incorporating a tuple-space coordination language into a higher-order computation language (i.e., Scheme), TS differs from other tuple-space languages in two important ways: (1) tuple-spaces are first-class objects: they may be dynamically created, bound to names, passed as arguments to (or returned as results from) functions, and built into other data structures or tuples; (2) the behavior of tuple-spaces may be customized: a tuple-space is manipulated via a policy closure that specifies its operational characteristics. The representation of policy closures takes significant advantage of Scheme's support for higher-order functions; there is no fundamental extension to Scheme needed in order to support them. We argue that first-class customizable tuple-spaces provide an expressive and flexible medium for building parallel programs in higher-order languages.
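What "first-class tuple-spaces" buys is that a space can itself be placed inside a tuple deposited in another space, e.g. as a reply channel. A minimal Linda-style sketch, in Python with invented names (`out`, `inp`; `None` as a wildcard), not TS's actual operations:

```python
import threading

class TupleSpace:
    """A first-class tuple space, Linda-style: `out` deposits a tuple;
    `inp` removes and returns the first tuple matching a pattern,
    blocking until one exists. A sketch, not TS's real interface."""

    def __init__(self):
        self._tuples = []
        self._cond = threading.Condition()

    def out(self, tup):
        with self._cond:
            self._tuples.append(tup)
            self._cond.notify_all()

    def _find(self, pattern):
        for t in self._tuples:
            if len(t) == len(pattern) and all(
                    p is None or p == v for p, v in zip(pattern, t)):
                return t
        return None

    def inp(self, *pattern):
        with self._cond:
            self._cond.wait_for(lambda: self._find(pattern) is not None)
            t = self._find(pattern)
            self._tuples.remove(t)
            return t

# A space travels inside a tuple deposited in another space:
jobs, results = TupleSpace(), TupleSpace()
jobs.out(("square", 7, results))            # reply space inside the tuple
_, n, reply_to = jobs.inp("square", None, None)
reply_to.out(("answer", n * n))
answer = results.inp("answer", None)
print(answer)  # ('answer', 49)
```

The paper's policy closures would go one step further, parameterizing how `out`/`inp` store and match tuples per space; here the policy is fixed.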

Jagannathan, Suresh. Optimizing Analysis for First-Class Tuple-Spaces. 1991. [PDF] [PS]

This paper considers the design and optimization of a simple asynchronous parallel language that uses first-class tuple-spaces as its main communication and process creation device. Our proposed kernel language differs from other tuple-space languages insofar as tuple-spaces are treated as true first-class objects. Moreover, we develop a formal framework for constructing an optimizing preprocessor for such a language. The semantic analysis is based on an inference engine that statically computes the set of tuples (and their structural attributes) that can occupy any given tuple-space. The inference system is non-trivial insofar as it operates in the presence of higher-order functions and non-flat data structures (e.g., lists). The result of the inference procedure can be used to customize the representation of tuple-space objects.

Queinnec, Christian. Marshaling/demarshaling as a compilation/interpretation process. Proceedings 13th International Parallel Processing Symposium and 10th Symposium on Parallel and Distributed Processing. IPPS/SPDP 1999. 1999. [PDF]

Queinnec, Christian. Sérialisation-désérialisation en DMeroon (Serialization and deserialization in DMeroon). November 1997. [PDF] [PS]

Queinnec, Christian. Sharing mutable objects and controlling groups of tasks in a concurrent and distributed language. International Workshop on Theory and Practice of Parallel Programming. 1994. [PDF] [PDF]

This paper presents: (i) an operational semantics, based on a functional framework, for a concurrent and distributed extension of the Scheme programming language, (ii) a coherency protocol taking care of shared mutable objects, and (iii) a new coherency protocol to imperatively control hierarchical groups of cooperating tasks. These two protocols require neither broadcast, nor FIFO communications, nor a centralized machine; they make it possible to manage an unbounded number of shared mutable values and groups of tasks. This paper also advocates the use of a functional continuation-based presentation for these protocols.

Extension of Scheme for Parallel and Concurrent Programming

Ciabrini, Damien. Debugging Scheme fair threads. 2004. [PDF]

There are two main policies for scheduling thread-based concurrent programs: preemptive scheduling and cooperative scheduling. The former is known to be difficult to debug, because it is usually non-deterministic and can lead to data races or difficult thread synchronization. We believe the latter is a better model when it comes to debugging programs. In this paper, we discuss the debugging of Scheme Fair Threads, that are based on cooperative scheduling and synchronous reactive programming. In this approach, thread communication and synchronization is achieved by means of special primitives called signals, which ease the debugging process. We present the tools we have implemented to deal with the main types of concurrent bugs that can arise in this special programming framework.

Tinker, Pete and Katz, Morry. Parallel execution of sequential Scheme with ParaTran. Proceedings of the 1988 ACM Conference on LISP and Functional Programming. 1988

This paper describes a system called ParaTran for executing sequential Scheme in parallel. It supports arbitrary side effects without requiring user annotations. The ParaTran runtime system detects and corrects data dependency violations using an automatic history and rollback mechanism. ParaTran is first described by analogy with Time Warp, a system for distributed simulation; this description is followed by a discussion of ParaTran's implementation and presentation of preliminary results.

Halstead Jr, Robert H. Multilisp: A language for concurrent symbolic computation. 1985. [PDF]

Multilisp is a version of the Lisp dialect Scheme extended with constructs for parallel execution. Like Scheme, Multilisp is oriented toward symbolic computation. Unlike some parallel programming languages, Multilisp incorporates constructs for causing side effects and for explicitly introducing parallelism. The potential complexity of dealing with side effects in a parallel context is mitigated by the nature of the parallelism constructs and by support for abstract data types: a recommended Multilisp programming style is presented which, if followed, should lead to highly parallel, easily understandable programs. Multilisp is being implemented on the 32-processor Concert multiprocessor; however, it is ultimately intended for use on larger multiprocessors. The current implementation, called Concert Multilisp, is complete enough to run the Multilisp compiler itself and has been run on Concert prototypes including up to eight processors. Concert Multilisp uses novel techniques for task scheduling and garbage collection. The task scheduler helps control excessive resource utilization by means of an unfair scheduling policy; the garbage collector uses a multiprocessor algorithm based on the incremental garbage collector of Baker.
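Multilisp's central construct, `(future X)`, evaluates X concurrently with the parent and returns a placeholder whose value is "touched" on demand. Python's `concurrent.futures` is a close analogue, with an explicit `.result()` standing in for Multilisp's implicit touch; the tree-summing example below is ours, not the paper's.

```python
from concurrent.futures import ThreadPoolExecutor

pool = ThreadPoolExecutor()

def ptree_sum(tree):
    """Sum the leaves of a nested tuple, forking a future per left
    subtree while the parent descends the right subtree."""
    if not isinstance(tree, tuple):
        return tree
    left = pool.submit(ptree_sum, tree[0])   # like (future (ptree_sum left))
    right = ptree_sum(tree[1])
    return left.result() + right             # touch: block until ready

total = ptree_sum(((1, 2), (3, (4, 5))))
pool.shutdown()
print(total)  # 15
```

The difference the abstract is pointing at: in Multilisp the touch is implicit in any strict use of the value, so sequential code can be annotated with `future` without otherwise changing it.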

Kranz, David A and Halstead Jr, Robert H and Mohr, Eric. Mul-T: A high-performance parallel Lisp. Proceedings of the ACM SIGPLAN 1989 Conference on Programming language design and implementation. 1989. [PDF]

Mul-T is a parallel Lisp system, based on Multilisp's future construct, that has been developed to run on an Encore Multimax multiprocessor. Mul-T is an extended version of the Yale T system and uses the T system's ORBIT compiler to achieve "production quality" performance on stock hardware - about 100 times faster than Multilisp. Mul-T shows that futures can be implemented cheaply enough to be useful in a production-quality system. Mul-T is fully operational, including a user interface that supports managing groups of parallel tasks.

Mohr, Eric and Kranz, David A and Halstead Jr, Robert H. Lazy task creation: A technique for increasing the granularity of parallel programs. Proceedings of the 1990 ACM conference on LISP and functional programming. 1990. [PDF]

Many parallel algorithms are naturally expressed at a fine level of granularity, often finer than a MIMD parallel system can exploit efficiently. Most builders of parallel systems have looked to either the programmer or a parallelizing compiler to increase the granularity of such algorithms. In this paper we explore a third approach to the granularity problem by analyzing two strategies for combining parallel tasks dynamically at run-time. We reject the simpler load-based inlining method, where tasks are combined based on dynamic load level, in favor of the safer and more robust lazy task creation method, where tasks are created only retroactively as processing resources become available. These strategies grew out of work on Mul-T, an efficient parallel implementation of Scheme, but could be used with other applicative languages as well. We describe our Mul-T implementations of lazy task creation for two contrasting machines, and present performance statistics which show the method's effectiveness. Lazy task creation allows efficient execution of naturally expressed algorithms of a substantially finer grain than possible with previous parallel Lisp systems.
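The idea behind lazy task creation can be illustrated with a work-stealing sketch: a potential task is recorded as a cheap thunk on its owner's deque and is promoted to a real task only if an idle processor steals it; otherwise the owner pops it and runs it inline. This is our illustration of the retroactive-creation idea, not Mul-T's actual mechanism (which promotes saved continuations, not thunks).

```python
import collections
import threading

task_deque = collections.deque()     # the owner's deque of potential tasks
lock = threading.Lock()
results = []

def future(thunk):
    """Record a potential task cheaply; spawn nothing yet."""
    with lock:
        task_deque.append(thunk)

def owner_work():
    """The owner inlines its newest potential tasks (LIFO end)."""
    while True:
        with lock:
            if not task_deque:
                break
            thunk = task_deque.pop()
        results.append(("owner", thunk()))

def thief_work():
    """An idle processor promotes the oldest thunk (FIFO end) to a task."""
    with lock:
        if not task_deque:
            return
        thunk = task_deque.popleft()
    results.append(("thief", thunk()))

for i in range(4):
    future(lambda i=i: i * i)
t = threading.Thread(target=thief_work)
t.start(); t.join()                  # one task is stolen...
owner_work()                         # ...the rest are run inline
values = sorted(r for _, r in results)
print(values)  # [0, 1, 4, 9]
```

Stealing from the opposite end of the deque is what makes the scheme "safe": the oldest, typically largest pieces of work migrate, while the owner keeps its cache-warm recent work.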

Halstead Jr, Robert H. Implementation of Multilisp: Lisp on a multiprocessor. Proceedings of the 1984 ACM Symposium on LISP and functional programming. 1984

Multilisp is an extension of Lisp (more specifically, of the Lisp dialect Scheme) with additional operators and additional semantics to deal with parallel execution. It is being implemented on the 32-processor Concert multiprocessor. The current implementation is complete enough to run the Multilisp compiler itself, and has been run on Concert prototypes including up to four processors. Novel techniques are used for task scheduling and garbage collection. The task scheduler helps control excessive resource utilization by means of an unfair scheduling policy: the garbage collector uses a multiprocessor algorithm modeled after the incremental garbage collector of Baker. A companion paper discusses language design issues relating to Multilisp.

Halstead Jr, Robert H and Loaiza, Juan R. Exception Handling in Multilisp.. ICPP. August 1985

Halstead Jr, Robert H. Overview of Concert Multilisp: a multiprocessor symbolic computing system. 1987. [PDF]

Multilisp is a parallel programming language derived from the Scheme dialect of Lisp by addition of the future construct. Multilisp has been implemented on Concert, a shared-memory multiprocessor that uses a novel RingBus interconnection. Concert currently has 28 MC68000 processors, with a design goal of 32 processors. Several application programs have been developed and measured using Concert Multilisp. Experience with these programs has contributed to tuning the Multilisp language design and will ultimately contribute to the design of a parallel architecture streamlined for high performance on Multilisp programs.

Miller, James S. MultiScheme: a Parallel Processing System Based on MIT Scheme. September 1987

Queinnec, Christian. Crystal Scheme, A Language for Massively Parallel Machines. Second Symposium on High Performance Computing. 1991. [PDF]

Massively parallel computers are built out of thousands of conventional but powerful processors with independent memories. Very simple topologies, mainly based on physical neighbourhood, link these processors. The paper discusses extensions to the Scheme language in order to master such machines. Parallelism is introduced by allowing the arguments of functions to be concurrently evaluated. Migration across the different processors is achieved through a remote evaluation mechanism. Since first-class continuations pose some semantic problems with respect to concurrency, we propose a neat semantics for them and then show how to build, in the language itself, advanced concurrent constructs such as futures. Finally, we comment on simulations, with various topologies and migration policies, which let us appraise our linguistic choices and confirm the viability of the model. Massively parallel computers [Hewitt 80, Dally et al. 89, Germain et al. 90] are large ensembles comprising thousands of conventional but powerful processors equipped with independent memories. Their total throughput confers on them tremendous computing potential, but they still remain to be tamed. Such machines usually have a crystalline structure where a processor is only connected to its neighbours. No direct global addressing is possible within such a machine: information can only flow from processor to processor on a neighbourhood basis. For the same reason, there is no shared memory. Various topologies exist, mainly hypertori [Hewitt 80]. Only simple topologies achieve scalability, i.e. the possibility of making a machine grow by simple addition of processors without logical or physical discontinuity. Realizing a chip which can be directly interconnected in three dimensions now seems attainable [Béchennec 89] and confers much value on the whole approach. We follow two ideas, similar to those expressed by [Chien & Dally 89]. First, programming should be relatively easy, i.e. the mental model of the computation on so many processors with so many processes must be very simple to remain understandable; second, the language must allow the expression of sufficient concurrency to utilize the machine. Similarly to Halstead [Halstead 85], we choose to stick to an already familiar algorithmic language: Scheme [Rees & Clinger 86]. We only slightly modify or extend it with the needed capabilities. This approach, growing from a small, powerful and semantically clean basis, is our preferred way to easily experiment with new features. We also try, following the Scheme philosophy, to introduce the minimal capabilities that satisfy our goals. The modified semantics of Scheme allows the terms composing functional applications to be evaluated concurrently. This induces programming style changes as well as a modification of the meaning of continuations. Since concurrency is only introduced on a local basis, a remote evaluation mechanism [Stamos & Gifford 90] makes computations flow over the whole machine. Computations migrate but objects still reside on their birth site. We retain the full capabilities of Scheme and, in particular, forbid neither side-effects nor first-class continuations. We thus stick to a very simple, even naïve, model: it is our aim to demonstrate that this model is viable through a series of simulations with varying topologies and migration strategies. The paper is organized in two main parts. The first part (section 1) describes the linguistic side and presents the semantics of our modifications to Scheme. The second part collects and comments on the results of simulations made on the machines presented in section 2. We establish the viability of the previous linguistic choices in section 3. Future (section 4) and related (section 5) work concludes the paper.

Queinnec, Christian. PolyScheme: A Semantics for a Concurrent Scheme. High Performance and Parallel Computing in Lisp Workshop, Twickenham, England. 1990. [PDF]

The Scheme language does not fully specify the semantics of combination: the evaluation order of the terms composing a combination is left indeterminate. We investigate in this paper a different semantics for Scheme where the terms of combinations are evaluated concurrently. The resulting semantics models a language with concurrent threads sharing a common workspace. The semantics is given in terms of denotational semantics and uses resumptions as well as a choice operator, oneof, which mimics a scheduler. An alternate definition for this operator reveals the classical powerdomains. The main interest of this operator is to offer a formalization that can be read from an operational point of view while keeping a firm theoretical base. Scheme also offers first-class continuations with indefinite extent; we examine several semantics for continuations with respect to concurrency. Each of these semantics is a natural extension of the sequential case of regular Scheme. Still, they strongly differ in their observed behaviours. The resulting language, named PolyScheme, offers many of the features of current concurrent Lisp (or Scheme) dialects thanks to the sole extension of its combination semantics and without any explicit specialized construct dealing with concurrency.

Philbin, James. An overview of the STING operating system. Proceedings of the NEC Software Conference. 1992. [PDF]

Sting is an operating system designed to serve as a highly efficient substrate for modern programming languages. Sting includes support for: various models of parallelism and synchronization; lazy, normal, and eager evaluation; automatic storage management; and topology mapping. Sting is different from other operating systems in several respects. The basic concurrency objects in Sting, virtual processors and lightweight threads, are first class. They provide the user with a level of expressiveness and control unavailable in other operating systems. Thread scheduling and migration are completely customizable. This allows Sting to be used in real-time, interactive, and batch oriented operating environments. Sting also provides a novel system of storage management, including a new kind of parallel garbage collection. Finally, Sting introduces several new optimizations which significantly improve locality of reference and reduce storage requirements, thereby increasing efficiency.

Philbin, James. Customizable policy management in the STING operating system. US/Japan Workshop on Parallel Symbolic Computing. 1992. [PDF]

Sting is an operating system designed to serve as a highly efficient substrate for modern programming languages. It is designed to run on both parallel and distributed systems. This paper describes one of the important contributions of Sting - customizable policy management at both the kernel and user level of the operating system. Two well defined interfaces separate control issues from policy issues. These interfaces allow different, customized policy management modules to be implemented without changing the rest of the system. Customizable policy management makes Sting suitable for many different operating environments including real time, interactive, and computationally intensive. It also allows the user to choose or implement a policy management strategy that is best suited to a particular program.

Philbin, James. STING: An Operating System for Modern Languages. May 1993

Jagannathan, Suresh and Philbin, Jim. A customizable substrate for concurrent languages. 1992. [PDF] [PS]

We describe an approach to implementing a wide range of concurrency paradigms in high-level (symbolic) programming languages. The focus of our discussion is sting, a dialect of Scheme, that supports lightweight threads of control and virtual processors as first-class objects. Given the significant degree to which the behavior of these objects may be customized, we can easily express a variety of concurrency paradigms and linguistic structures within a common framework without loss of efficiency. Unlike parallel systems that rely on operating system services for managing concurrency, sting implements concurrency management entirely in terms of Scheme objects and procedures. It, therefore, permits users to optimize the runtime behavior of their applications without requiring knowledge of the underlying runtime system. This paper concentrates on (a) the implications of the design for building asynchronous concurrency structures, (b) organizing large-scale concurrent computations, and (c) implementing robust programming environments for symbolic computing.

Jagannathan, Suresh and Philbin, Jim. A foundation for an efficient multi-threaded scheme system. Proceedings of the 1992 ACM conference on LISP and functional programming. 1992. [PDF]

We have built a parallel dialect of Scheme called STING that differs from its contemporaries in a number of important respects. STING is intended to be used as an operating system substrate for modern parallel programming languages. The basic concurrency management objects in STING are first-class lightweight threads of control and virtual processors (VPs). Unlike high-level concurrency structures, STING threads and VPs are not encumbered by complex synchronization protocols. Threads and VPs are manipulated in the same way as any other Scheme structure. STING separates thread policy decisions from thread implementation ones. Implementations of different parallel languages built on top of STING can define their own scheduling and migration policies without requiring modification to the runtime system or the provided interface. Process migration and scheduling can be customized by applications on a per-VP basis. The semantics and implementation of threads minimize the cost of thread creation and put a premium on storage locality. The storage management policies in STING lead to better cache and page utilization, and allow users to experiment with a variety of different execution regimes, from fully delayed to completely eager evaluation.
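The policy/mechanism separation that STING describes (a virtual processor delegates scheduling decisions to a replaceable policy object) can be sketched compactly. The class and method names below are invented for illustration; STING's real interface is richer and its threads preemptive.

```python
# A virtual processor asks a pluggable policy object which thread to
# run next; swapping the policy changes scheduling without touching
# the VP "mechanism". A sketch of the separation, not STING's API.

class FifoPolicy:
    def pick(self, ready):
        return ready.pop(0)          # oldest thread first

class LifoPolicy:
    def pick(self, ready):
        return ready.pop()           # newest thread first

class VirtualProcessor:
    def __init__(self, policy):
        self.policy = policy         # customizable per VP
        self.ready = []

    def spawn(self, thunk):
        self.ready.append(thunk)

    def run(self):
        log = []
        while self.ready:
            log.append(self.policy.pick(self.ready)())
        return log

vp = VirtualProcessor(LifoPolicy())  # choose a policy per VP
for name in "abc":
    vp.spawn(lambda name=name: name)
order = vp.run()
print(order)  # ['c', 'b', 'a']
```

A language implementation built on such a substrate supplies its own policy class instead of modifying the runtime, which is the point the abstract makes.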

Moreau, Luc and Ribbens, Daniel. The semantics of pcall and fork in the presence of first-class continuations and side-effects. 1995. [PDF]

We present the semantics of the annotations pcall and fork for parallel evaluation of Scheme. Annotated programs are proved to be behaviourally indistinguishable from their non-annotated counterparts even in the presence of first-class continuations and side-effects. The semantics takes the form of an abstract machine, which can be regarded as a guideline for an implementation.

Taura, Kenjiro and Yonezawa, Akinori. Schematic: A concurrent object-oriented extension to scheme. France-Japan Workshop on Object-Based Parallel and Distributed Computation. 1995. [PDF]

A concurrent object-oriented extension to the programming language Scheme, called Schematic, is described. Schematic supports familiar constructs often used in typical parallel programs (future and higher-level macros such as plet and pbegin), which are actually defined atop a very small number of fundamental primitives. In this way, Schematic achieves both convenience for typical concurrent programming and simplicity and flexibility of the language kernel. Schematic also supports concurrent objects which exhibit more natural and intuitive behavior than the "bare" (unprotected) shared memory, and permit more concurrency than the traditional Actor model. Schematic will be useful for intensive parallel applications on parallel machines or networks of workstations, concurrent GUI programming, distributed programming over networks, and even concurrent shell programming.

Masuhara, Hidehiko. Architecture Design and Compilation Techniques Using Partial Evaluation in Reflective Concurrent Object-Oriented Languages. 1999. [PDF]

Parallel and distributed programs often have hardware- or problem-specific optimizations for improving qualities of the program such as efficiency and robustness. Those optimizations, unfortunately, degrade portability and re-usability as they are intertwined with the original algorithm description. Reflective languages, which provide the application programmer an extensible and abstract implementation of the language, can describe such optimizations as extensions to the language. The separation of optimization descriptions gains portability and re-usability of both application programs and optimizations. However, the interpretive execution model of reflective languages imposes a large amount of performance overhead, which sometimes outweighs the benefits of optimizations. Previous reflective languages prohibit some operations from being modified via reflection, so as to reduce the amount of interpretation overhead. The imperfection of this approach is that it still leaves a considerable amount of overhead, and it yields a less flexible, unclear reflective architecture. This dissertation investigates the design and compilation framework of meta-interpreters and meta-objects in an object-oriented concurrent language, ABCL/R3. By using partial evaluation to compile reflective programs, ABCL/R3 achieves a flexible and lucid reflective architecture and efficient execution at the same time. We design full-fledged meta-interpreters by examining several concurrent programming examples. A newly proposed delegation mechanism makes it possible to define modular and scope-controlled extensions to meta-interpreters. We design meta-objects by exploiting the notion of reader/writer methods in the concurrent object-oriented language Schematic, so that they can be effectively partially evaluated. The compilation frameworks of meta-interpreters and meta-objects basically translate concurrent object definitions into a sequential program, apply a partial evaluator for a sequential language, and generate a program in a (non-reflective) concurrent object-oriented language, in which base-level and meta-level objects are collapsed to single-level objects. The efficiency of the generated programs is demonstrated by several benchmark programs, in which our compiler exhibits performance close to non-reflective languages.

Masuhara, Hidehiko and Yonezawa, Akinori. Design and partial evaluation of meta-objects for a concurrent reflective language. European Conference on Object-Oriented Programming. 1998. [PDF]

Customizable meta-objects are a powerful abstraction for extending language features and implementation mechanisms, but interpretive execution suffers from severe performance penalty. Some of this penalty can be reduced by applying partial evaluation to meta-interpreters, but partial evaluation of meta-objects in existing concurrent object-oriented languages is ineffective. This paper proposes a new meta-object design for our reflective language ABCL/R3. It yields meta-objects that can be optimized effectively using partial evaluation. The crux of the design is the separation of state-related operations from other operations, and this separation is accomplished by using reader/writer methods in our concurrent object-oriented language called Schematic. Our benchmark trials show that non-trivial programs with partially evaluated meta-objects run more than six times faster than ones that are interpreted by meta-objects. In addition, a partially evaluated program that uses a customized meta-object runs as efficiently as a program that is manually rewritten so as to have the same functionality without using meta-objects.

Threading and Parallel Programming with Continuations

Serrano, Manuel and Boussinot, Frédéric and Serpette, Bernard. Scheme FairThreads. 2nd International Lisp Conference. October 2002

Serrano, Manuel and Boussinot, Frédéric and Serpette, Bernard. Scheme fair threads. Proceedings of the 6th ACM SIGPLAN international conference on Principles and practice of declarative programming. 2004

Gasbichler, Martin and Sperber, Michael. Processes vs. user-level threads in Scsh. 2002. [PDF]

The new version of scsh enables concurrent system programming with portable user-level threads. In scsh, threads behave like processes in many ways. Each thread receives its own set of process resources. Like Unix processes, forked threads can inherit resources from the parent thread. To store these resources scsh uses preserved thread fluids, a special kind of fluid variables. The paper gives a detailed description of an efficient implementation for thread-local process resources. Scsh also provides an interface to the fork system calls which avoids common pitfalls which arise with a user-level thread system. Scsh contains a binding for fork that forks "only the current thread."
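The "preserved thread fluid" idea (each thread sees its own binding, and a forked thread inherits a snapshot of its parent's value) can be sketched as follows. Plain `threading.local` does not inherit across threads, so the sketch copies explicitly at fork time; `ThreadFluid` and `fork_thread` are invented names, not scsh's API.

```python
import threading

class ThreadFluid:
    """A per-thread binding with a default, sketching a thread fluid."""
    def __init__(self, default):
        self._tls = threading.local()
        self._default = default

    def get(self):
        return getattr(self._tls, "value", self._default)

    def set(self, value):
        self._tls.value = value

def fork_thread(fluids, target):
    """Fork a thread that inherits the current values of `fluids`."""
    snapshot = [(f, f.get()) for f in fluids]   # captured in the parent
    def run():
        for f, v in snapshot:
            f.set(v)                            # installed in the child
        target()
    t = threading.Thread(target=run)
    t.start()
    return t

cwd = ThreadFluid("/")     # a process resource, e.g. working directory
cwd.set("/tmp")
seen = []
fork_thread([cwd], lambda: seen.append(cwd.get())).join()
print(seen)  # ['/tmp']
```

After the fork, parent and child can change their copies independently, which is exactly the process-like behavior of resources the abstract describes.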

Haynes, Christopher T. and Friedman, Daniel P.. Abstracting timed preemption with engines. 1987. [PDF]

The need for a programming language abstraction for timed preemption is argued, and several possibilities for such an abstraction are presented. One, called engines, is adopted. Engines are an abstraction of bounded computation, not a process abstraction in the usual sense. However, in conjunction with first-class continuations, engines allow a language to be extended with time-sharing implementations for a variety of process abstraction facilities. We present a direct implementation of hiaton streams. Engine nesting refers to the initiation of an engine computation by an already running engine. We consider the need for engine nesting and show how it may be accomplished in a manner that charges a parent engine for the computation of its offspring. We conclude by discussing the importance of simple and general abstractions such as engines.

Haynes, Christopher T and Friedman, Daniel P. Engines build process abstractions. Proceedings of the 1984 ACM Symposium on LISP and functional programming. 1984. [PDF]

Engines are a new programming language abstraction for timed preemption. In conjunction with first class continuations, engines allow the language to be extended with a time sharing implementation of process abstraction facilities. To illustrate engine programming techniques, we implement a round-robin process scheduler. The importance of simple but powerful primitives such as engines is discussed.
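The engine abstraction can be sketched with generators standing in for continuations: an engine is a suspended computation that yields at every "tick"; running it with a fuel budget either completes it or hands back the paused computation for later resumption. This illustrates the interface, not Scheme's timer-and-continuation implementation.

```python
def engine_run(gen, fuel):
    """Run generator `gen` for at most `fuel` ticks.
    Returns ('done', value) if it finished, else ('expired', gen)
    where `gen` is the paused computation, resumable later."""
    try:
        for _ in range(fuel):
            next(gen)
    except StopIteration as stop:
        return ("done", stop.value)
    return ("expired", gen)

def count_to(n):
    for _ in range(n):
        yield            # one tick of bounded computation
    return n

state, engine = engine_run(count_to(10), 4)   # fuel runs out
assert state == "expired"
state, value = engine_run(engine, 100)        # resume where it stopped
print(state, value)  # done 10
```

A round-robin scheduler like the paper's falls out directly: keep a queue of engines, run each with a small fuel budget, and re-enqueue the ones that come back expired.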

Hieb, Robert and Dybvig, R Kent. Continuations and concurrency. Proceedings of the second ACM SIGPLAN symposium on Principles & practice of parallel programming. 1990

Continuations have proven to be useful for implementing a variety of control structures, including exception handling facilities and breadth-first searching algorithms. However, traditional continuations are not useful in the presence of concurrency, because the notion of the rest of the computation represented by a continuation does not in general make sense. This paper presents a new type of continuation, called a process continuation, that may be used to control tree-structured concurrency. Just as a traditional continuation represents the rest of a computation from a given point in the computation, a process continuation represents the rest of a subcomputation, or process, from a given point in the subcomputation. Process continuations allow nonlocal exits to arbitrary points in the process tree and allow the capture of a subtree of a computation as a composable continuation for later use. Even in the absence of multiple processes, the precise control achievable with process continuations makes them more useful than traditional continuations.

Wand, Mitchell. Continuation-based multiprocessing. Proceedings of the 1980 ACM conference on LISP and functional programming. 1980. [PDF]

ABSTRACT: Any multiprocessing facility must include three features: elementary exclusion, data protection, and process saving. While elementary exclusion must rest on some hardware facility (e.g. a test-and-set instruction), the other two requirements are fulfilled by features already present in applicative languages. Data protection may be obtained through the use of procedures (closures or funargs), and process saving may be obtained through the use of the CATCH operator. The use of CATCH, in particular, allows an elegant treatment of process saving which is much more tractable than the ones found in the operating systems literature. We demonstrate these techniques by writing the kernel and some modules for a multiprocessing system. The kernel is very small. Many functions which one would normally expect to find inside the kernel are completely decentralized. We consider the implementation of other schedulers, interrupts, and the implications of these ideas for language design.
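The abstract's claim that data protection follows from closures alone is easy to demonstrate. A hypothetical Python sketch (names illustrative): the counter cell is reachable only through the two procedures returned, so no other task can observe or corrupt it except via those entry points.

```python
def make_protected_counter():
    count = 0                      # private state, invisible outside the closure
    def increment():
        nonlocal count
        count += 1
        return count
    def read():
        return count
    return increment, read

inc, read = make_protected_counter()
inc()
inc()
```

The same idea in Scheme uses `lambda` over a `let`-bound cell; the point in both cases is that lexical scoping, not any kernel mechanism, enforces the protection.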

Hieb, Robert and Dybvig, R Kent and Anderson, Claude W. Subcontinuations. January 1994. [PDF] [PS]

Continuations have proven to be useful for implementing a variety of control structures, including exception handling facilities and breadth-first searching algorithms. However, traditional continuations are not useful in the presence of concurrency, because the notion of the rest of the computation represented by a continuation does not in general make sense. Traditional continuations can also be difficult to use in nonconcurrent settings, since their global nature is sometimes problematic. This article presents a new type of continuation, called a subcontinuation. Just as a traditional continuation represents the rest of a computation from a given point in the computation, a subcontinuation represents the rest of a subcomputation from a given point in the subcomputation. Subcontinuations may be used to control tree-structured concurrency by allowing nonlocal exits to arbitrary points in a process tree and allowing the capture of a subtree of a computation as a composable continuation for later use. In the absence of concurrency the localized control achievable with subcontinuations makes them more useful than traditional continuations.

Shivers, Olin. Continuations and threads: Expressing machine concurrency directly in advanced languages. Proceedings of the Second ACM SIGPLAN Workshop on Continuations. January 1997. [PDF]

It is well known [Wand] that concurrency can be expressed within languages that provide a continuation type. However, a number of misconceptions persist regarding the relationship between threads and continuations. I discuss the proper relationship between these two objects, and present a model for directly expressing concurrency using continuations. The model is designed to support systems programming, and has several novel features: it is synchronous, preemptable, and fully virtualisable, allowing schedulers to be written by unprivileged users that are indistinguishable from top-level schedulers that actually control access to the hardware resources.

Moreau, Luc. Programming in a Parallel Functional Language with Continuations. February 1992. [PDF]

In this article, we show that the methodology of functional programming with continuations can be applied to build parallel applications. This approach is made possible by operators for parallelism that are transparent. The definition of these operators rests on a notion of metacontinuation representing a left-to-right sequential evaluation order. A functional language with transparent operators for parallelism is defined by translation into a functional language with four CCS-style primitives for parallelism.

Moreau, Luc. An operational semantics for a parallel functional language with continuations. International Conference on Parallel Architectures and Languages Europe. June 1992. [PDF]

Explicit parallelism can be introduced in Scheme by adding the constructs fork, pcall and future. Katz and Weise gave an implementation where those constructs are transparent even when first class continuations are used. In this paper, we formalise this work by giving an operational semantics for a functional language with first class continuations and transparent constructs for parallelism. We introduce a concept of higher order continuation that we call metacontinuation which preserves sequential properties of continuations in a parallel language.

Moreau, Luc and Ribbens, Daniel. Sound rules for parallel evaluation of a functional language callcc. Proceedings of the conference on Functional programming languages and computer architecture. June 1993. [PDF]

Observationally equivalent programs are programs which are indistinguishable in all contexts, as far as their termination property is concerned. In this paper, we present rules preserving observational equivalence, for the parallel evaluation of programs using call/cc. These rules allow the capture of continuations in any applicative context and they prevent the whole computation from being aborted when a continuation is applied in the extent of the call/cc by which it was reified. As a consequence, these results prove that one can design a functional language with first-class continuations which has transparent constructs for parallelism.

Moreau, Luc. The PCKS-machine: An abstract machine for sound evaluation of parallel functional programs with first-class continuations. European Symposium on Programming. April 1994. [PDF]

The PCKS-machine is an abstract machine that evaluates parallel functional programs with first-class continuations. Parallelism is introduced by the construct pcall, which provides a fork-and-join type of parallelism. To the best of our knowledge, the PCKS-machine is the first implementation of such a language that is proved to have a transparent construct for parallelism: every program using such a construct returns the same result as in the absence of this construct. This machine is also characterised by the non-speculative invocation of continuations whose interest is illustrated in an application.
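The fork-and-join behaviour of `pcall` can be sketched as follows. This is a hypothetical Python illustration (names are mine, not the PCKS-machine's): argument expressions run concurrently, and the call proceeds only once all have returned, which is why the construct is transparent for programs that do not capture continuations.

```python
import threading

def pcall(f, *thunks):
    """Evaluate argument thunks concurrently, join, then apply f."""
    results = [None] * len(thunks)
    def run(i, thunk):
        results[i] = thunk()
    threads = [threading.Thread(target=run, args=(i, t))
               for i, t in enumerate(thunks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()            # join: every argument must be a value before the call
    return f(*results)

total = pcall(lambda x, y: x + y, lambda: 20, lambda: 22)
```

What the paper adds beyond this naive picture is precisely the hard part: keeping the construct transparent when a thunk captures or invokes a first-class continuation, which the sketch above does not handle.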

Moreau, Luc. Sound evaluation of parallel functional programs with first-class continuations. June 1995. [PDF]

The interpreter continuation is the computation that remains to be performed after evaluating a given expression. Some programming languages provide the programmer with two facilities to act on the interpreter continuation: capture and invocation. The capture of a continuation consists in packaging up the current interpreter continuation as a first-class object so that it can be manipulated like any other object. The invocation of a continuation discards the current interpreter continuation and resumes the computation with the invoked continuation. The constructs fork and pcall explicitly indicate where parallel evaluations can proceed. These constructs are expected to be transparent: a program using such constructs is supposed to return the same result as in their absence. Traditionally, the semantics of continuations is formulated in a sequential framework; the addition of parallelism to a language with first-class continuations requires specifying a new semantics for continuations. This is the issue addressed in our dissertation, and these are its major contributions. * We set up a semantics for first-class continuations such that the meaning of a parallel program is the same as the sequential meaning of the program without the constructs for parallelism. This semantics is formalised in a reduction system that is an extension of the λ-calculus. * The proposed semantics is proved to be sound with respect to the sequential semantics. The soundness criterion used in this proof is the notion of observational equivalence. * We implement this semantics on an abstract machine that models a MIMD architecture with a shared memory, and we prove the correctness of this implementation.

Moreau, Luc. A Parallel Functional Language with First-Class Continuations: Programming Style and Semantics. 1995. [PDF]

We present an operational semantics for a functional language with first-class continuations and transparent constructs for parallelism, fork and pcall. The sequential semantics of programs with first-class continuations is preserved when parallel evaluation is allowed, by verifying whether some expressions have returned a value before applying a continuation. These expressions are the ones that are evaluated before this continuation is applied in a left-to-right sequential order. An implementation is proposed using a notion of higher-order continuation that we call metacontinuation. This semantics is costless when first-class continuations are not used. Several programs also illustrate the programming style that can be adopted in such a language.

Moreau, Luc. Non-speculative and upward invocation of continuations in a parallel language. Colloquium on Trees in Algebra and Programming. May 1995. [PDF] [PDF]

A method of preserving the sequential semantics in parallel programs with first-class continuations is to invoke continuations non-speculatively. This method, which prevents a continuation from being invoked as long as its invocation can infringe the sequential semantics, reduces parallelism by the severe conditions that it imposes, especially on upward uses. In this paper, we present new conditions for invoking continuations in an upward way and both preserving the sequential semantics and providing parallelism. This new approach is formalised in the PCKS-machine, which is proved to be correct by showing that it has the same observational equivalence theory as the sequential semantics.

Futures and Multi-Lisp

Feeley, Marc. An efficient and general implementation of futures on large scale shared-memory multiprocessors. April 1993. [PDF] [PS]

This thesis describes a high-performance implementation technique for Multilisp's "future" parallelism construct. This method addresses the non-uniform memory access (NUMA) problem inherent in large scale shared-memory multiprocessors. The technique is based on lazy task creation (LTC), a dynamic task partitioning mechanism that dramatically reduces the cost of task creation and consequently makes it possible to exploit fine grain parallelism. In LTC, idle processors get work to do by "stealing" tasks from other processors. A previously proposed implementation of LTC is the shared-memory (SM) protocol. The main disadvantage of the SM protocol is that it requires the stack to be cached suboptimally on cache-incoherent machines. This thesis proposes a new implementation technique for LTC that allows full caching of the stack: the message-passing (MP) protocol. Idle processors ask for work by sending "work request" messages to other processors. After receiving such a message a processor checks its private stack and task queue and sends back a task if one is available. The message passing protocol has the added benefits of a lower task creation cost and simpler algorithms. Extensive experiments evaluate the performance of both protocols on large shared-memory multiprocessors: a 90 processor GP1000 and a 32 processor TC2000. The results show that the MP protocol is consistently better than the SM protocol. The difference in performance is as high as a factor of two when a cache is available and a factor of 1.2 when a cache is not available. In addition, the thesis shows that the semantics of the Multilisp language does not have to be impoverished to attain good performance. The laziness of LTC can be exploited to support at virtually no cost several programming features including: the Katz-Weise continuation semantics with legitimacy, dynamic scoping, and fairness.
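The message-passing (MP) protocol the thesis proposes can be simulated in a few lines. The following is a hypothetical single-threaded Python sketch (class and message names are illustrative, not from the implementation): an idle processor sends a work-request message, and the recipient replies with a task from its private queue if one is available, so the queue itself is never shared.

```python
from collections import deque

class Processor:
    def __init__(self, pid):
        self.pid = pid
        self.tasks = deque()       # private task queue, never touched by peers
        self.mailbox = deque()     # incoming messages

    def request_work(self, other):
        """An idle processor asks another processor for work."""
        other.mailbox.append(('work-request', self))

    def serve_requests(self):
        """Answer pending work requests from our own private queue."""
        while self.mailbox:
            kind, requester = self.mailbox.popleft()
            if kind == 'work-request' and self.tasks:
                # reply with the oldest task: coarser-grained work travels
                requester.tasks.append(self.tasks.popleft())

p0, p1 = Processor(0), Processor(1)
p0.tasks.extend(['task-a', 'task-b', 'task-c'])

p1.request_work(p0)    # p1 is idle: ask p0 for work
p0.serve_requests()    # p0 replies with one task
```

Because only messages cross processor boundaries, each stack and task queue can be fully cached locally, which is the property the thesis exploits on cache-incoherent machines.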

Moreau, Luc. The semantics of future in the presence of first-class continuations and side-effects. November 1995. [PS]

We present the first semantics of future in a Scheme-like language which has both side-effects and first-class continuations. Correctness is established by proving that programs annotated by future have the same observable behaviour as their non-annotated counterparts, even though evaluation may be parallel.

Moreau, Luc. The semantics of Scheme with future. Proceedings of the first ACM SIGPLAN international conference on Functional programming. May 1996. [PDF]

We present the formal semantics of future in a Scheme-like language which has both side-effects and first-class continuations. Correctness is established by proving that programs annotated by future have the same observable behaviour as their non-annotated counterparts, even though evaluation may be parallel.

Moreau, Luc. Continuing Into the Future: the Return. 8th International Conference in Systems Research Informatics and Cybernetics (InterSymp'96). August 1996. [PDF]

future is an annotation that indicates which expressions of a program may be evaluated in parallel. By definition, future is transparent, i.e. annotated programs return the same results as in the absence of annotations. In such a framework, the interaction of parallelism and first-class continuations has been considered a delicate matter for a long time. Indeed, unrestricted parallelism and first-class continuations may lead to non-deterministic programs, which is contradictory to the notion of annotation. In this paper, we overview the formal semantics of future and first-class continuations. The semantics is an abstract machine that models a parallel computer with a shared memory.

Forin, Alessandro. Futures. 1990

Katz, Morry and Weise, Daniel. Continuing into the future: on the interaction of futures and first-class continuations. Proceedings of the 1990 ACM conference on LISP and functional programming. June 1990

Flanagan, Cormac and Felleisen, Matthias. The semantics of future and its use in program optimization. Proceedings of the 22nd ACM SIGPLAN-SIGACT symposium on Principles of programming languages. 1995. [PDF] [PS]

The future annotations of MultiLisp provide a simple method for taming the implicit parallelism of functional programs. Past research concerning futures has focused on implementation issues. In this paper we present a series of operational semantics for an idealized functional language with futures, with varying degrees of intensionality. We develop a set-based analysis algorithm from the most intensional semantics and use that algorithm to perform touch optimization on programs. Experiments with the Gambit compiler indicate that this optimization substantially reduces program execution time.

Flanagan, Cormac and Felleisen, Matthias. Well-founded touch optimization for futures. 1994. [PDF] [PS]

The future annotations of MultiLisp provide a simple method for taming the implicit parallelism of functional programs, but require touch operations at all placeholder-strict positions of program operations to ensure proper synchronization between threads. These touch operations contribute substantially to a program's execution time. We use an operational semantics of future, developed in a previous paper, to derive a program analysis algorithm and an optimization algorithm based on the analysis that removes provably-redundant touch operations. Experiments with the Gambit compiler indicate that this optimization significantly reduces the overhead imposed by touch operations.

Flanagan, Cormac and Felleisen, Matthias. The semantics of Future. 1994. [PDF] [PS]

The future annotation introduced by MultiLisp provides a simple method for taming the implicit parallelism of functional programs. Prior research on futures has concentrated on implementation and design issues, and has largely ignored the development of a semantic characterization of futures. This paper presents four operational semantics for an idealized functional language with futures with varying degrees of intensionality. The first semantics defines future to be a semantically-transparent annotation. The second semantics interprets a future expression as a potentially parallel task. The third semantics explicates the coordination of parallel tasks and the need for touch operations on placeholder-strict arguments to certain primitive operations by introducing placeholder objects. The fourth and last semantics is a low-level refinement of the third semantics, which explicates just enough information to permit the smooth derivation of program analyses. The paper includes proofs showing the equivalence of these semantics.
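The placeholder objects and touch operations of the third semantics can be sketched operationally. Below is a hypothetical Python illustration (class and function names are mine): `future` spawns a task that will eventually determine a placeholder, and `touch`, required at placeholder-strict positions, waits for the value and passes non-placeholders through unchanged.

```python
import threading

class Placeholder:
    """An undetermined value: filled in exactly once by a parallel task."""
    def __init__(self):
        self._done = threading.Event()
        self._value = None
    def determine(self, value):
        self._value = value
        self._done.set()

def future(thunk):
    """Spawn a task computing thunk(); return its placeholder immediately."""
    ph = Placeholder()
    threading.Thread(target=lambda: ph.determine(thunk())).start()
    return ph

def touch(x):
    """Strict positions must touch: wait for a placeholder's value;
    on a non-placeholder, touch is the identity."""
    if isinstance(x, Placeholder):
        x._done.wait()
        return x._value
    return x

f = future(lambda: 6 * 7)
result = touch(f) + 1      # `+` is placeholder-strict, so we touch first
```

The touch optimizations of the companion papers amount to proving, statically, that certain arguments can never be placeholders, so their `touch` calls can be deleted.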

Other Topics in Parallel and Concurrent Programming

Flatt, Matthew and Findler, Robert Bruce. Kill-safe synchronization abstractions. 2004. [PDF] [PDF]

When an individual task can be forcefully terminated at any time, cooperating tasks must communicate carefully. For example, if two tasks share an object, and if one task is terminated while it manipulates the object, the object may remain in an inconsistent or frozen state that incapacitates the other task. To support communication among terminable tasks, language run-time systems (and operating systems) provide kill-safe abstractions for inter-task communication. No kill-safe guarantee is available, however, for abstractions that are implemented outside the run-time system. In this paper, we show how a run-time system can support new kill-safe abstractions without requiring modification to the run-time system, and without requiring the run-time system to trust any new code. Our design frees the run-time implementor to provide only a modest set of synchronization primitives in the trusted computing base, while still allowing tasks to communicate using sophisticated abstractions.

Chen, Pee-Hong and Friedman, Daniel P. Prototyping data flow by translation into Scheme. August 1983. [PDF]

We consider language modeling using Scheme, an applicative order, lexically scoped dialect of Lisp. In Lisp, programs are represented by list structures. By representing programs of other languages as list structures with carefully placed keywords it is possible to define the keywords as either functions or syntactic extensions. Proper consideration of the lexical scoping yields even more control than would be expected. As an example of this approach we prototype Data Flow, a concurrent computation system, by expanding each actor or arc into a function invocation and by expanding a network into a Scheme function that is compiled and executed. This approach avoids the familiar interpretation step used in most prototyping approaches, yielding improved performance.
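The translation approach, compiling the network rather than interpreting it, can be sketched as follows. This is a hypothetical Python illustration (the actor names are mine, not the paper's): each data flow actor becomes a function, and "compiling" a network composes those functions into a single callable, so no interpretation loop runs at execution time.

```python
def actor_add(a, b):
    """A data flow actor with two input arcs and one output arc."""
    return a + b

def actor_double(x):
    """An actor that doubles the token on its input arc."""
    return 2 * x

def compile_network():
    # Network: (a, b) --> add --> double --> output.
    # The arcs of the network become ordinary function composition.
    def network(a, b):
        return actor_double(actor_add(a, b))
    return network

net = compile_network()
value = net(3, 4)
```

In the paper the expansion is done with Scheme macros over the list representation of the network, which is then compiled like any other Scheme code; the Python sketch only shows the shape of the resulting composed program.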

Boucher, Dominique and Feeley, Marc. Construction parallele de l'automate LR(0): Une application de Multilisp a la compilation. June 1994. [PDF] [PS]

We are concerned here with parallelizing LALR(1) parser generators, such as YACC and Bison, and with their implementation in Multilisp [Halstead, 1985], a parallel dialect of Scheme chosen for the symbolic nature of the problem. Building such parsers usually requires three distinct phases [Aho, Sethi and Ullman, 1986]: 1. the construction of the LR(0) automaton, 2. the computation of reduction contexts, also called lookahead sets, and 3. the construction of the parse tables.

Feeley, Marc and Turcotte, Marcel and Lapalme, Guy. Using Multilisp for solving constraint satisfaction problems: an application to nucleic acid 3D structure determination. 1994. [PDF] [PDF] [PS]

This paper describes and evaluates a parallel program for determining the three- dimensional structure of nucleic acids. A parallel constraint satisfaction algorithm is used to search a discrete space of shapes. Using two realistic data sets, we compare a previous sequential version of the program written in Miranda to the new sequential and parallel versions written in C, Scheme, and Multilisp, and explain how these new versions were designed to attain good absolute performance. Critical issues were the performance of floating-point operations, garbage collection, load balancing, and contention for shared data. We found that speedup was dependent on the data set. For the first data set, nearly linear speedup was observed for up to 64 processors whereas for the second the speedup was limited to a factor of 16.

Feeley, Marc. A message passing implementation of lazy task creation. US/Japan Workshop on Parallel Symbolic Computing. 1992. [PDF] [PS]

This paper describes an implementation technique for Multilisp's future construct aimed at large shared-memory multiprocessors. The technique is a variant of lazy task creation. The original implementation of lazy task creation described in [Mohr, 1991] relies on efficient shared memory to distribute tasks between processors. In contrast, we propose a task distribution method based on a message passing paradigm. Its main advantages are that it is simpler to implement, has a lower cost for locally run tasks, and allows full caching of the stack on cache incoherent machines. Benchmarks on a 32 processor BBN TC2000 show that our method is more efficient than the original implementation by as much as a factor of 2.

Feeley, Marc and Miller, James S. A parallel virtual machine for efficient Scheme compilation. Proceedings of the 1990 ACM conference on LISP and functional programming. 1990. [PDF] [PS]

Programs compiled by Gambit, our Scheme compiler, achieve performance as much as twice that of the fastest available Scheme compilers. Gambit is easily ported, while retaining its high performance, through the use of a simple virtual machine (PVM). PVM allows a wide variety of machine-independent optimizations and it supports parallel computation based on the future construct. PVM conveys high-level information bidirectionally between the machine-independent front end of the compiler and the machine-dependent back end, making it easy to implement a number of common back end optimizations that are difficult to achieve for other virtual machines. PVM is similar to many real computer architectures and has an option to efficiently gather dynamic measurements of virtual machine usage. These measurements can be used in performance prediction for ports to other architectures as well as design decisions related to proposed optimizations and object representations.

Jagannathan, Suresh and Philbin, James. High-level abstractions for efficient concurrent systems. Programming Languages and System Architectures. April 1994. [PDF] [PS]

Parallel symbolic algorithms exhibit characteristics that make their efficient implementation on current multiprocessor platforms difficult: data is generated dynamically and often has irregular shape and density; data sets typically consist of objects of many different types and structures; the natural unit of concurrency is often much smaller than stock hardware can efficiently support; efficient scheduling, migration, and load-balancing strategies vary widely among algorithms; and a sensible decomposition of the program into parallel threads of control often cannot be achieved by mere examination of the source text.

Weeks, Stephen and Jagannathan, Suresh and Philbin, James. A concurrent abstract interpreter. 1994. [PDF] [PS]

Abstract interpretation has long been regarded as a promising optimization and analysis technique for high-level languages. In this article, we describe an implementation of a concurrent abstract interpreter. The interpreter evaluates programs written in an expressive parallel language that supports dynamic process creation, first-class locations, list data structures and higher-order procedures. Synchronization in the input language is mediated via first-class shared locations. The analysis computes intra- and inter-thread control and dataflow information. The interpreter is implemented on top of Sting, a multi-threaded dialect of Scheme that serves as a high-level operating system for modern programming languages.

Jagannathan, Suresh and Weeks, Stephen. Analyzing stores and references in a parallel symbolic language. June 1994. [PDF] [PS]

We describe an analysis of a parallel language in which processes communicate via first-class mutable shared locations. The sequential core of the language defines a higher-order strict functional language with list data structures. The parallel extensions permit processes and shared locations to be dynamically created; synchronization among processes occurs exclusively via shared locations. The analysis is defined by an abstract interpretation on this language. The interpretation is efficient and useful, facilitating a number of important optimizations related to synchronization, processor/thread mapping, and storage management.