Computer Science

Faculty Research Summaries | Chairman's Introduction

Yali Amit

Development of models and algorithms for automated analysis of visual and acoustic signals by computer.

The primary constraints on these models are

  1. incorporating the types of invariance exhibited by the human visual and auditory systems,
  2. addressing the issues of variability through the appropriate statistical models,
  3. limited training set sizes
  4. ensuring computational efficiency,
  5. portability into neural network architectures which provide models for how the biological system carries out these tasks.

In this framework a system for learning, detecting and recognizing two-dimensional views of objects in visual scenes has been developed together with an associated network architecture. The main question at this point is finding the appropriate framework for extending this system to deal with hundreds or thousands of possible object classes, together with a well designed and efficient mechanism for analyzing complex visual scenes containing a large number of these objects. How can the object models and classifiers be learned sequentially as more training data is encountered? How to start processing a scene in the absence of prior knowledge regarding which objects are present?

Further information can be found at http://galton.uchicago.edu/~amit.

László Babai

I work in the fields of theoretical computer science and discrete mathematics; more specifically in computational complexity theory, algorithms, combinatorics, and finite groups, with an emphasis on the interactions between these fields. Asymptotic questions and probabilistic methods are common features in my work in each of these areas. A very recent example: methods of the complexity theories of Boolean circuits and branching programs have been brought to bear on the analysis of a popular random sampling technique in computational group theory.

David Beazley

My research interests are primarily in the areas of systems, compilers, and software development tools. I also tend to focus on applications related to software development problems in scientific computing.

Most of my recent research has focused on problems related to software development in mixed-language environments with a focus on the integration of interpreted scripting languages with compiled languages such as C,C++, and Fortran. The most visible project in this area has been my work on SWIG, a compiler I have developed to simplify the integration of existing software in C and C++ with a variety of high-level scripting languages including Python, Perl, Ruby, and Tcl/Tk. More recently, I have started a project that focuses on the problem of mixed-mode debugging of interpreted and compiled software. Specifically, I have developed an integrated debugging module that allows fatal errors in compiled code to be handled as exceptions in scripting languages—making it much easier for a developer to obtain useful diagnostics.

In addition to my work on software development tools, I have worked on a variety of large-scale scientific simulation codes for massively parallel computers and clusters. One problem faced by the users of these systems is difficulty of managing large quantities of simulation data. Thus, I am generally interested in the problem of making such applications easier to use. In the past, I have worked on integrated data analysis and visualization systems for large parallel machines. More recently, I have developed an easy to embed web server library that makes it easy for a scientist to add a remote monitoring capability to existing software.

Todd Dupont

The main thrust of my research is the construction, analysis, and evaluation of numerical methods for partial differential equations ( PDE 's), but I also have had interests in related areas such as the construction of mathematical models for physical and biological systems.

Approximate solution of PDE 's is frequently computationally expensive, even for problems that are conceptually simple. I have been studying ways of using adaptivity to make some of these calculations more efficient and robust. For time-dependent problems the use of meshes that move smoothly with time can be of significant value in producing high quality solutions to difficult problems. Although a general solution to the question of how to use such meshes has not yet been found, there are many situations that I have looked at with my students in which such procedures can be both effective and simple.

The computation of free surface flows is important in several of the projects that I am working on at the moment. These involve the formation of drops under various conditions, modeling of the flow of a fluid over a solid surface, and two fluid flows.

Robert Findler

Broadly, my research focuses on improving software reliability for large software systems. To date, I've focused in three specific areas: behavioral software contracts, program development environments, and the introductory computer science course.

Behavioral contract checking has long proven effective in improving software reliability. Essentially, behavioral contracts provide the programmer with the means to specify important invariants of the program's execution, which can then be checked (typically as the program executes). Crucially, the contract checker must be able to assign blame to the guilty party for any contract violation it discovers. I've been involved in improving the state of the art in behavioral contract checking: making available in languages with advancedfeatures (eg, higher-order functions, classes & objects), and studyingproperties of the contract checker itself (eg, non-interference).

My interest in contract checking was motivated by my experience developing DrScheme, a program development environment for Scheme (http://www.drscheme.org/). DrScheme provides several novel features for Scheme developers. For example, it supports bound variable renaming, (correct) online syntax coloring, an algebraic stepper, and tools for exploring dependencies in large systems. In addition, it is extensible, allowing third-parties to develop new tools for DrScheme without intervention from the core team.

The original motivation for building DrScheme was to support the introductory computer science curriculum. Developing a well-tailored programming environment for beginning programmers posed several interesting research questions. For example, DrScheme supports a hierarchy of programming languages, designed to grow with the student as they grow as programmers. By eliminating aspects of the programming language in the early stages of the curriculum, DrScheme is able to give very good error messages for common mistakes, significantly lowering the frustration beginners experience in their first programming course.

Beyond a well-designed programming environment, students need a well-designed curriculum. I helped develop a curriculum that focuses on software design, eschewing the more traditional "tinker until it works" approach to the introductory course. The curriculum (http://www.htdp.org) focuses on the relationship between the data that a program processes and the structure of the data. Exploiting that relationship gives students a good handle on the structure of their programs leading to a deeper understanding of programming.

In sum, it is my belief that the tools and techniques for building large, reliable software systems are nearly within our grasp, and it is my hope that I'll be able move us closer to that goal.

Lance Fortnow

I study computational complexity theory, trying to understand the power and limitations of efficient computation. My major accomplishments include

helping to show the tremendous power of interactions between powerful untrustworthy provers and probabilistic verifiers and time-space tradeoff lower bounds for solving Boolean formula satisfiability on general purpose computers.

More and more computation complexity has had applications to other areas of science and I have participated in research applying complexity theory to quantum computing, game theory, bioinformatics, learning theory and cryptography.

Most recently I have worked on utilizing complexity in electronic commerce. We can view financial markets as large parallel computation devices and are working on determining what a market can efficiently compute and how we can design securities, auctions and other mechanisms to make the markets more productive.

Ian Foster

In my research, I seek to develop tools and techniques that allow people to use high-performance computing technologies to do qualitatively new things. This involves investigations of parallel and distributed languages, algorithms, and communication; and also focused work on applications. I am particularly interested in using high-performance networking to incorporate remote compute and information resources into local computational environments.

Current Projects: Globus: Software infrastructure for high-performance computing in wide-area internet worked environments, including resource location, scheduling, security, data access, and communication.

Smart instruments: "Computer in the loop" enhancement of scientific instruments, such as high-brilliance X-ray sources.

Computational science: Parallel algorithms and modeling in climate modeling and astrophysics.

Petaflops computing: Programming model and systems software issues for future extreme-scale computer systems.

John Goldsmith

I am interested in problems of natural language processing, with special interest in the problem of unsupervised learning of grammatical structure. Given a large sample of a natural language, what machine learning techniques will be capable of inferring the linguistic structure inherent in the data, whether that structure is syntactic, semantic, or morphological? What constraints on the hypothesis space will be necessary and what will be sufficient to permit the inference of the grammar of the language? I am also interested in the application of the techniques that arise in this work to other domains, such as bioinformatics.

Robert Kirby

I am currently working on developing an algorithm and associated software for computing a wide class of finite element basis functions. Rather than having to hand code each particular order of each family of elements of interest, this approach will allow me to specify the finite element space at a mathematical level of abstraction: a function space over a domain plus a set of linear functionals. Thus a single code can tabulate any order of most of the widely known elements -- Lagrange, nonconforming, H(div), H(curl), etc. This is part of a larger picture of automating the process of generating finite element code, rather through a domain-specific language for multilinear forms or some sort of generative programming.

I am continuing work with the ITR/Climate Systems Center to develop a high-order hydrodynamics solver as part of a next-generation climate model. We are using an arbitrary-order discontinuous Galerkin method that will work on a general set of conservation laws. As a side note, this code will also allow us to explore several issues relating to postprocessing away spurious oscillations arising in most higher-order hyperbolic solvers.

Additionally, I am working to extend earlier work on a posteriori error estimates for mixed finite element methods from elliptic equations to some specific parabolic problems arising in porous media applications.

Stuart A. Kurtz

I am currently working on several concrete problems in computational biology, including oligo micro-array design, pattern matching, and database structure.

Other current interests include the biological computing, the mathematical theory of computation, logic, and randomness.

Gina-Anne Levow

My research interests range over natural language processing, spoken language systems, and human-computer interface. I am currently pursuing two main avenues of research that integrate these topics.

One project explores the use of prosody - pitch, loudness, duration, and pause - in automatic understanding of discourse and dialogue. The work includes detection and improved acoustic modeling of spoken corrections in human-computer dialogue to improve error recovery in spoken language systems. This research also extends to recognition of topical structure in narrative and dialogue. The second area is cross-language multi-modal document retrieval, automatically identifying related text and spoken documents in a wide range of languages, including English, French, German, Arabic, and Mandarin Chinese. This work exploits minimal, simple linguistic resources and statistical techniques to enable rapid application to new languages, and a multi-scale translation and retrieval strategy to improve overall retrieval effectiveness. The two avenues of research come together in work on fine-grained search and presentation of spoken documents using textual and prosodic features to identify story and topic boundaries.

David MacQueen

My research interests cover the general area of programming languages and systems, including formal semantics of programming languages, type and module systems, theoretically-founded language design, and language implementation. My main focus is on functional programming languages and the ML family of languagues in particular, but I am interested in common foundations for understanding both functional and object-oriented languages. My long-term goal is the broader application of modern programming language theory and technology in systems and application programming.

I was one of the principle designers of the Standard ML language, and with Andrew Appel of Princeton was the author of the Standard ML of New Jersey compiler (SML/NJ), which continues to be the most widely used implementation of the language. I continue active involvement the development of the Standard ML of New Jersey compiler in collaboration with John Reppy, Matthias Blume of Toyotal Technological Institute in Chicago, and Zhong Shao of Yale. SML/NJ serves as both a tool for building systems and as a testbed for language research. Several new experimental extensions of the language are being implemented, including record concatenation and existential types. I am also working with John Reppy, Robby Findler, and Matthias Blume to develop a principled approach to "component-based" programming using the Standard ML language infrastructure.

Ketan Mulmuley

I am working in two areas: (1) Design of fast geometric algorithms using randomization; (2) Lower bounds in complexity theory using algebro-geometric methods. A few years ago I had proved a weaker analogue and implication of the $P\not=NC$ conjecture in a restricted but natural PRAM model without bit operations.

Currently I am developing geometric complexity theory, which provides an algebro-geometric approach to fundamental problems in complexity theory. It is based on geometric invariant theory and algebraic geometry of equivariant compactifications of honogeneous spaces.

Svetlozar Nestorov

My research interests include databases, data mining, information integration, semistructured data (XML), and web technologies. In particular, I am interested in the problem of integrating automatically many heterogeneous, disparate information sources. My approach to solving this problem involves data mining each information source in order to discover its structure and contents, and then translating and integrating the discovered structure and contents to a global model (expressed in XML). Since this integration process produces a large amount of XML data, I am interested in developing novel data mining techniques for such data.

Partha Niyogi

Partha Niyogi has been involved in a variety of problems at the interface of learning, language, and speech.

One project in speech and auditory pattern recognition attempts to construct detectors for acoustic classes of sounds (speech and bird songs), and understand their psychological basis through psychophysical experiments. This is joint with Yali Amit, Howard Nusbaum, and graduate students and postdocs.

A second project in language acquisition and change attempts to create a computational framework within which one can coherently examine the relationship between how languages are learned and transmitted and how they change over generational time.

This is joint with visitor Natalia Komarova.

Some theoretical questions in learning theory have been examined in each of the above domains.

Michael J. O'Donnell

I am investigating the informational structure of sound. Most acoustical studies in the past have emphasized the production, transmission or perception of sound, neglecting the central question of its natural data structure. Improved foundations for modeling sound will lead to more transparent synthesis techniques, more insightful analysis techniques, and more effective use of sound along with graphics in human/computer interfaces.

I am also investigating strategic applications of digital signatures. Many applications of digital signatures, such as the Internet's Domain Name System depend on chains of trust, which are no stronger than their weakest links. I am seeking purely end-to-end uses of signatures, where trust develops cumulatively and robustly. Example: a system of self-assigned self-verifying domain names could be deployed very easily, reducing the bitter and costly contention over domain names.

John Reppy

My primary area of research is the design and implementation of advanced programming languages.

It is my belief that such languages provide the best hope for increasing the quality and reliability of software, while also improving programmer productivity. My current work in this area is focused on the Moby programming language, which is a higher-order typed language with support for object-oriented and

concurrent programming. The Moby project, which is joint work with Kathleen Fisher of AT&T Labs --- Research, provides a testbed for exploring ideas in language design and implementation. Using this testbed, we have explored the relationship between module and class mechanisms; language interoperability; and compiler transformations for functional languages. We are currently exploring compiler and runtime system support for supporting lightweight concurrency.

My work on language design has always been motivated by practical programming problems.

For example, work on a multi-threaded GUI toolkit motivated the design of Concurrent ML. For our work in Moby, we are exploring distributed programming and 3D computer graphics as motivating application areas.

Anne Rogers

My work lies in the intersection of programming languages, compilers, and systems. In particular, I develop abstractions and tools for computations that are difficult to implement efficiently, but are essential for progress in other areas. Some examples include Hancock

(joint work with Kathleen Fisher of AT&T Labs), which simplifies the process of analyzing massive data streams, and Olden, which simplifies the implementation of a class of computationally-intensive scientific applications.

My current research focuses on developing abstractions and tools for collecting, organizing, and processing medical data. Stuart Kurtz of Computer Science, David Lovinger of the Department of Medicine, and I are leading a project, called Baton, designed to improve the process of transferring overnight responsibilities for patients in the hospital. Baton integrates much of the information that represents

the intern's view of the patient into a single source that can be accessed using a hand-held device. An integrated view will reduce the time interns spend managing data and improve the transfer of information between interns.

L. Ridgway Scott

My research is divided into three areas: numerical solution of differential equations, bioinformatics and parallel computing. In the first area, work is being done regarding finite element methods for simulating non-Newtonian fluids. In bioinformatics, research is focused on gene micro-arrays (mis-)hybridization and on protein-protein interactions. Finally, we study software tools to support parallel scientific computing and develop

scalable algorithms for scientific simulation.

Janos Simon

My main research area is computational complexity -- estimating the amount of resources (such as memory, time, number of algebraic operations, or interprocess communication) that are needed to compute functions. One tries to get good upper bounds by exhibiting efficient algorithms and to develop mathematical methods to prove lower bounds.

I am both interested in lower bound techniques, especially for parallel and for probabilistic models, and in combinatorial algorithms. I have also worked in distributed computing, in particular the study of fault tolerant distributed computations.

More recently, I have been working also with computational problems suggested by Biology.

Finally, as editor of an electronic journal, I have an interest in electronic publishing, as well as the impact of the Internet--both in research about algorithms for information processing, and in societal impact.

Robert I. Soare

Robert Soare's main research area is mathematical logic and particularly the theory of computability. The theory of computable function emerged during the 1930's with the primitive recursive functions used in Godel's incompleteness theorem, and then the full definitions by Godel [1934], Turing [1936] and others of computable functions which played a significant role in later development of computing devices. Turing [1939] introduced the notion of an "oracle Turing machine" for defining relative computability of a set A from a set B. Sets are placed into equivalence classes called "degrees" if each is computable from the other, i.e. if they code the same amount of information. Since Post [1944] much work has been done on the structure of these degrees, particularly those containing a set which is computably enumerable (c.e.), namely which can be listed in a computable fashion. The degrees are used to classify problems in mathematics as being solvable or unsolvable, such as Hilbert's 10 th problem on Diophantine equations. They are also used to compare the information content of one unsolvable problem in mathematics or computer science to another. For example, the problem of deciding whether a program computes a total function is of strictly greater degree than that of the halting problem.

The first part of Soare's work has been on degrees and particularly the c.e. degrees, C. Major questions concern the first order theory and classifying the algebraic structure of C. For example, Slaman and Soare [Annals of Mathematics, 1998] unified and extended virtually all the major results and methods about C over the last thirty years to solve the extension of embeddings problem for C, namely to give an algorithm to decide whether a given finite embedding into C can be extended to a second given embedding. This also gives a decision procedure for a large part of the Sigma_2 theory of C. Soare is working on related problems.

The second part of Soare's work is on E, the structure of the c.e. sets. Post [1944] initiated the study of the relationship between the degree of a c.e. set and its structural properties in E. Harrington and Soare [Proc. Nat'l. Acad. Sci., 1991] answered a question of Post by producing an E-definable property which guarantees incompleteness. On the other hand, Harrington and Soare have invented a new method [ JAMS, 1996] for generating automorphisms of E. Automorphisms and definable properties tend to work in opposite directions closing the gap of our lack of knowledge. Soare and his students are working on a number of questions about automorphisms and definable properties.

The third part deals with computability and its interaction with model theory and algebraic structures. From basic theorems of model theory we know when a given theory has prime or saturated models. For example, when does a computable theory have computable prime or saturated models? Soare has worked with Lachlan on degrees of models of the theory Peano arithmetic, and also for full (true) arithmetic, to decide how such a model differs from a mere uniform upper bound for arithmetic sets. Computability has also played a strong role in effective analyses of algebraic structures. For example, when does a structure have an isomorphic copy in every degree, but no computable copy? Soare and another student are working on such questions.

Soare has recently been working on connections between computability theory and differential geometry. In a recent reprint topologists Alex Nabutovsky and Shmuel Weinberger (NW), begin, "In this paper we would like to expose some of the astonishing richness of the space of Riemannian metrics on a smooth manifold, up to reparametrization." Part of this richness is due to the connection with computability, and in particular with computably enumerable (c.e.) sets. NW prove roughly that for any c.e. set A (i.e. one recognized by a Turing machine) there is a sequence of manifolds (indeed hypersurfaces) \Gamma_n such that n\in A iff \Gamma_n determines a local minimum, and that the depth of the local minimum is roughly equal to the halting time of the Turing machine. Combining this with a recent result of Soare about the existence of a sequence of c.e. sets {A_n: n \in N} with certain properties on the halting times of A_n relative to A_{n+1}, NW obtain unexpected results about the depth and distribution of local minima and their fractal nature. Soare is exploring with a graduate student other connections between computability and geometry.

Finally Soare is writing a new book on the theory of computability. His previous book with Springer Verlag, 1987, became the standard reference in mathematics and computer science for computability and computably enumerable sets. That book is now outdated with respect to notation, terminology, and new results. The new book is intended as a replacement, updating the initial chapters, omitting many later chapters and writing ones on new results, and adding chapters on the relation of computability to model theory and to computational complexity in computer science. The new book will be called, "The Theory of Computability and Computably Enumerable Sets."

Rick Stevens

I am interested in the development of innovative tools and techniques that enable computational scientists to solve large-scale problems more effectively on the most advanced high-performance computers. Specifically, my research focuses on three principal areas: collaborative visualization environments, high-performance computer architectures, and performance modeling.

In the area of collaborative visualization, I am exploring the use of virtual reality in the visualization of scientific data and processes. My efforts include improving displays, recording, and playback of virtual reality experiences; developing new methods for tracking and control and close coupling with parallel supercomputers; and devising new ways of collaborating in virtual environments. Of particular interest to me is teleimmersion -- strategies for synthesizing networking and multimedia technologies to enhance the development of wide-area wide-area collaborative computational science.

In the area of high-performance computers, I am studying approaches to computing at the Petaflops Scale, focusing on analysis, modeling, and simulation tools for these ultra-high-performance computers. I am also particularly interested in algorithm and software for multithreaded computer architectures and for hierarchical processor and memory architectures.

In a related area, I am investigating analytic performance models that will help researchers understand the performance relationship between high-performance computer systems and scientific applications. My goal is to enable scientific simulations to achieve the very high performance potential of next-generation computer architectures with deep memory hierarchies.

Eric Vigoda

My broad areas of interest are Randomized Algorithms, Statistical Physics, and Discrete Mathematics. I am particularly interested in simple stochastic processes, such as Markov chains and random walks. A recent example of our work is the first polynomial time algorithm for approximating the permanent of an arbitrary non-negative matrix.

Senior Research Associate

Jessie Pinkham

I'm interested in problems of natural language processing; my work has focused on the modeling of English and French syntax. My most recent project was a large-scale machine translation project involving English, French, and Spanish, accomplished at Microsoft Research, which involved automatic learning of translations across detailed logical structures for each language based on grammars developed by computational linguists working on each language separately.