COSC85/185-S96: Computing with DNA

lecture summaries below

Selected links:

Using DNA to Solve SAT, by Richard Lipton, January 1995.
An introductory version of "On the Computational Power of DNA". See lecture 1 summary below.
On the Computational Power of DNA, by Boneh, Dunworth, Lipton and Sgall, October 1995.
"We show how DNA based computers can be used to solve the satisfiability problem for boolean circuits. Furthermore, we show how DNA computers can solve optimization problems directly without first solving several decision problems..." (see lecture 2 summary below)
Breaking DES Using a Molecular Computer, by Boneh, Dunworth, and Lipton. (See lecture 2 summary below.) Some details about the biology.

Grab bag:

Bionet newsgroup on information theory.
Description of talks at the Princeton workshop '95
Interesting, brief summaries of the results presented at DNA Based Computers: A Mini DIMACS Workshop, April 1995
An annotated on-line bibliography on Molecular Computation maintained by Don Beaver at PSU.
On Constructing a Molecular Computer. Adleman.
Suggestions on how to expand earlier results. Non-DNA molecular computing, "combinatorial chemistry", rational drug design. Some discussion of practical considerations (by the man who actually did a biological computation!)

Lecture Summaries

first meeting: We briefly summarized Adleman's approach to solving Hamilton Path problems -- representing the edges and vertices of a given graph by strings of DNA so that the strings will join into larger strings representing paths and cycles. Mixing such strings, then using electrophoresis to select, among the strings representing paths that contain every vertex, the shortest ones yields a Hamilton path if there is one.
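
To make the two selection steps concrete, here is a minimal Python sketch that models strands as vertex sequences. The graph, the names, and the walk-enumeration stand-in for biochemical mixing are all invented for illustration; they are not from Adleman's paper.

    # Toy digraph, invented for illustration.
    vertices = ["A", "B", "C", "D"]
    edges = {("A", "B"), ("B", "C"), ("C", "D"), ("B", "D"), ("A", "C")}

    def mix(length):
        """Stand-in for the mixing step: all walks with `length` vertices,
        built by repeatedly joining compatible edge strands."""
        walks = [[v] for v in vertices]
        for _ in range(length - 1):
            walks = [w + [u] for w in walks for u in vertices
                     if (w[-1], u) in edges]
        return walks

    # Selection 1 ("electrophoresis"): keep strands of exactly n vertices.
    candidates = mix(len(vertices))
    # Selection 2: keep strands that contain every vertex (hence no repeats).
    print([w for w in candidates if set(w) == set(vertices)])
    # [['A', 'B', 'C', 'D']]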

In more detail we discussed Using DNA to Solve Satisfiability, by Lipton. The idea is to generate DNA for all possible assignments to the variables (by mixing strands that glue together in the right ways, similar to Adleman's method) and then to select only those that satisfy the formula. To select the assignments satisfying "X OR Y", for instance, extract those with X true into one tube, extract those with Y true into another, and take the union of the two sets (pour the two test tubes together). To select "X AND Y", first extract those with X true, then from the result extract those with Y true. Composing these operations handles any Boolean formula.
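
A small Python sketch of these tube operations, with a tube modeled as a set of assignments; the variable names and the two-variable example are invented for illustration:

    from itertools import product

    def all_assignments(variables):
        """One strand per assignment of the variables."""
        return {tuple(zip(variables, bits))
                for bits in product([False, True], repeat=len(variables))}

    def extract(tube, var, value):
        """Keep the strands in which `var` has the given value."""
        return {a for a in tube if dict(a)[var] == value}

    tube = all_assignments(["X", "Y"])
    # "X OR Y": extract X-true and Y-true separately, then pour together.
    x_or_y = extract(tube, "X", True) | extract(tube, "Y", True)
    # "X AND Y": extract X-true, then extract Y-true from the result.
    x_and_y = extract(extract(tube, "X", True), "Y", True)
    print(len(x_or_y), len(x_and_y))  # 3 1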


second meeting: On the Computational Power of DNA, Boneh et al. This paper is a little more technical and develops some more powerful ideas. The basic new operation is extension: given a tube of DNA, each strand can be extended by attaching some given strand at one end. For instance, to construct DNA strands representing all 2^N N-bit strings, start with a "null" sequence in a tube T and repeat the following: duplicate the current tube to get tubes T and T'. Extend the strings in T by DNA representing 0 and those in T' by DNA representing 1. Pour T' back into T and continue. After N steps, each N-bit string will occur exactly once.
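
In Python, with a tube modeled as a set of strings, the duplicate-and-extend loop looks like this (a sketch of the bookkeeping, not the chemistry):

    def all_bit_strings(n):
        tube = {""}                       # start with a "null" strand in T
        for _ in range(n):
            t0 = {s + "0" for s in tube}  # extend the strands in T by 0
            t1 = {s + "1" for s in tube}  # extend the duplicate T' by 1
            tube = t0 | t1                # pour T' back into T
        return tube

    print(sorted(all_bit_strings(3)))
    # ['000', '001', '010', '011', '100', '101', '110', '111']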

Extension can also be used to solve circuit satisfiability. Given a circuit, consider the corresponding directed acyclic graph where each interior node represents a logic gate and each leaf represents a variable or its negation. The "root" node of the circuit is the value computed. Rather than mixing and separating to filter assignments away, the approach records each gate's value, as follows. Start with all possible 2^N inputs in a test tube. Let, say, "x1,x2,x3,...,xN" denote a typical input string. Consider the gates sequentially in order of increasing distance from the leaves. After consideration of the Ith gate (for each I), each string will have been extended to "x1,x2,...,xN,g1,g2,...,gI", where gI is the value of the Ith gate when the circuit is given input "x1,...,xN". This is accomplished as follows. If the Ith gate has inputs xJ and xK, for instance, separate the contents of the current tube into 4 tubes T00, T01, T10, T11, depending on the values of xJ and xK. Extend the strings in tube T00 with DNA representing f(0,0), where f is the function computed by the Ith gate. Similarly extend the other tubes. Now pour all tubes back into tube T. When all gates have been computed, extract the strings in which the final gate value (the root, i.e., the circuit's output) represents 1.
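
The separate/extend/pour loop can be simulated in a few lines of Python. The circuit below is invented for illustration, and the four-way separation collapses into one comprehension because each strand's extension depends only on its own two input bits:

    from itertools import product

    # Strands are bit tuples: the N inputs first, then one bit per gate.
    # Each gate is (boolean function, positions of its two inputs).
    # Example circuit: g1 = x1 AND x2; g2 = g1 OR (NOT x3) -- the root.
    gates = [
        (lambda a, b: a & b, (0, 1)),       # g1 reads x1, x2
        (lambda a, b: a | (1 - b), (3, 2)), # g2 reads g1 (position 3), x3
    ]

    tube = set(product((0, 1), repeat=3))   # all 2^3 inputs
    for f, (j, k) in gates:
        # Separate by the bits at positions j and k, extend each of the
        # four tubes with the gate's value on that pair, pour back.
        tube = {s + (f(s[j], s[k]),) for s in tube}

    print(sorted(s[:3] for s in tube if s[-1] == 1))  # satisfying inputs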

Boneh et al. observe that this idea has other uses. Optimization problems such as finding a satisfying assignment with a maximum number of true variables can be solved by applying the technique with longer subsequences representing 1s than 0s, and then using electrophoresis to select the longest strands at the end of the process. Similar approaches work for MAX-CLIQUE, etc. An exercise: given a formula in conjunctive normal form (an "and" of a bunch of clauses, each clause being an "or" of several variables and negations of variables), can you figure out how to find the maximum number of clauses that can be satisfied simultaneously by one assignment?
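
A sketch of the weighting trick (the encodings and the surviving assignments are invented for illustration): encode each true variable with a longer subsequence, so the maximum-weight assignment is simply the longest strand.

    ONE, ZERO = "GGGGGG", "CCC"   # invented encodings; 1s are longer

    def strand(assignment):
        """Strand length grows with the number of true variables."""
        return "".join(ONE if bit else ZERO for bit in assignment)

    # Suppose these assignments survived the satisfiability filtering;
    # "electrophoresis" then selects the longest strand.
    satisfying = [(0, 1), (1, 0), (1, 1)]
    print(max(satisfying, key=lambda a: len(strand(a))))  # (1, 1)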

Random sampling of satisfying assignments is straightforward as well, because the process produces exactly one copy of each satisfying assignment; by standard results, this means that estimating the *number* of satisfying assignments is also possible. To complexity theorists this is quite interesting, because the class #P ("sharp-P") of hard counting problems (such as counting satisfying assignments) is notoriously intractable. (If one can count satisfying assignments, then any problem in the polynomial hierarchy can be solved with a polynomial number of such queries --- Toda and Ogiwara, "Counting Classes Are at Least as Hard as the Polynomial-Time Hierarchy", Proc. 6th Structure in Complexity Theory Conference, 1991.)

A more direct approach solves problems K levels up in the polynomial hierarchy with O(K) extra steps. For instance, suppose a problem is in the second level of the polynomial hierarchy; specifically, suppose the language encoding the problem can be expressed as L = { X : for all Y, there exists Z such that P(X,Y,Z) }, where P is a predicate computable in time polynomial in the size of X. For instance, L = graphs such that, for every two-coloring of the graph's edges, one of the two colors contains a clique of a given size. Even for complete graphs, this problem (in that case essentially computing a "Ramsey number") is notoriously hard.

Boneh et al. use the ability to "complement" (in the set-theoretic sense) the strands of DNA in the tube to solve such problems. For the L above, the method works as follows (assuming we have a circuit to compute P). Using the previous methods, generate a tube with those (X,Y,Z) inputs that satisfy P. Now chop off the "Z" portion of each string, leaving in the tube (X,Y) pairs. Now generate a tube that has exactly those (X,Y) pairs not in the current tube by starting with all possible (X,Y) pairs and filtering out those matching the DNA in the first tube. This leaves in the tube pairs (X,Y) such that there is no Z satisfying P(X,Y,Z). Now chop off the "Y" part of each pair. This leaves the strings "X" such that there exists a "Y" such that for all Z, P(X,Y,Z) is false. Now complement this set of X's. The result is exactly those X's in L.
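
Here is the whole chop/complement pipeline as a Python sketch over sets, with an invented one-bit predicate P standing in for the circuit:

    from itertools import product

    # Invented toy predicate over single bits, standing in for P's circuit.
    P = lambda x, y, z: (x and z) or (not y)
    X = Y = Z = (0, 1)

    sat = {(x, y, z) for x, y, z in product(X, Y, Z) if P(x, y, z)}
    xy = {(x, y) for (x, y, z) in sat}   # chop off the Z portion
    no_z = set(product(X, Y)) - xy       # complement: (X,Y) with no good Z
    bad_x = {x for (x, y) in no_z}       # chop off the Y portion
    L = set(X) - bad_x                   # complement again
    print(L)  # {1}: the x's with, for all y, some z making P(x,y,z) true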

Finally, it appears that the method gives an approach to cracking DES. The circuit that computes DES, given the 56-bit key K and a 64-bit plain text P, produces a cypher-text C. If a (P,C) pair is known, the key can be found as follows. Compute the circuit simultaneously for all 2^56 keys on the P for which C is known, using the previous technique that appends the circuit gate values to each input. At the end, each strand will encode a key, the intermediate gate values, and the output. Select the strand that encodes the correct output C and recover the key from it. The paper estimates that this can be done in 3-4 months. Conventional techniques are faster (requiring 2^43 or so operations, maybe one week on a fast computer) but require roughly 2^43 known plain-text/cypher pairs, so they more or less require access to a device that encrypts under the key being cracked.
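
As a toy illustration of the key search, here is a sketch with a 4-bit XOR "cipher" invented as a stand-in (the real attack runs the actual DES circuit over all 2^56 keys; only the selection logic is the same):

    from itertools import product

    def toy_encrypt(key, plaintext):      # NOT DES -- a 4-bit stand-in
        return tuple(k ^ p for k, p in zip(key, plaintext))

    plaintext = (1, 0, 1, 1)
    secret_key = (0, 1, 1, 0)
    ciphertext = toy_encrypt(secret_key, plaintext)

    # One strand per key: run the "circuit" on every key at once, then
    # select the strand whose appended output matches the known C.
    tube = {key + toy_encrypt(key, plaintext)
            for key in product((0, 1), repeat=4)}
    print([s[:4] for s in tube if s[4:] == ciphertext])  # [(0, 1, 1, 0)]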

Since only the first few Ramsey numbers are known, I propose the following challenge: use these techniques to compute some new Ramsey numbers.

Last modified: Sat Nov 2 05:35:52 EST 2002

Neal Young <Neal.Young@dartmouth.edu>