pydna.assembly

Assembly of sequences by homologous recombination.

Should also be useful for related techniques such as Gibson assembly and fusion PCR. Given a list of sequences (Dseqrecords), all sequences are analyzed for shared homology longer than the set limit.

A graph is constructed where each overlapping region form a node and sequences separating the overlapping regions form edges.

            -- A --
catgatctacgtatcgtgt     -- B --
            atcgtgtactgtcatattc
                        catattcaaagttct



--x--> A --y--> B --z-->   (Graph)

Nodes:

A : atcgtgt
B : catattc

Edges:

x : catgatctacgt
y : actgt
z : aaagttct

The NetworkX package is used to trace linear and circular paths through the graph.

class pydna.assembly.Assembly(frags: List[Dseqrecord], limit: int = 25, algorithm: Callable[[str, str, int], List[Tuple[int, int, int]]] = common_sub_strings)[source]

Bases: object

Assembly of a list of linear DNA fragments into linear or circular constructs. The Assembly is meant to replace the Assembly method as it is easier to use. Accepts a list of Dseqrecords (source fragments) to initiate an Assembly object. Several methods are available for analysis of overlapping sequences, graph construction and assembly.

Parameters:
  • fragments (list) – a list of Dseqrecord objects.

  • limit (int, optional) – The shortest shared homology to be considered

  • algorithm (function, optional) – The algorithm used to determine the shared sequences.

  • max_nodes (int) – The maximum number of nodes in the graph. This can be tweaked to manage sequences with a high number of shared sub sequences.

Examples

>>> from pydna.assembly import Assembly
>>> from pydna.dseqrecord import Dseqrecord
>>> a = Dseqrecord("acgatgctatactgCCCCCtgtgctgtgctcta")
>>> b = Dseqrecord("tgtgctgtgctctaTTTTTtattctggctgtatc")
>>> c = Dseqrecord("tattctggctgtatcGGGGGtacgatgctatactg")
>>> x = Assembly((a,b,c), limit=14)
>>> x
Assembly
fragments....: 33bp 34bp 35bp
limit(bp)....: 14
G.nodes......: 6
algorithm....: common_sub_strings
>>> x.assemble_circular()
[Contig(o59), Contig(o59)]
>>> x.assemble_circular()[0].seq.watson
'acgatgctatactgCCCCCtgtgctgtgctctaTTTTTtattctggctgtatcGGGGGt'
assemble_linear(**kwargs)
assemble_circular(**kwargs)