pydna.assembly
Assembly of sequences by homologous recombination.
This module should also be useful for related techniques such as Gibson assembly and fusion PCR. Given a list of sequences (Dseqrecord objects), all sequences are analyzed for shared homology at least as long as the set limit.
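A minimal sketch of the homology search follows. It assumes an algorithm that matches the `Callable[[str, str, int], List[Tuple[int, int, int]]]` type in the class signature and returns tuples of (start in first sequence, start in second sequence, length). The name `shared_substrings` is hypothetical. This naive diagonal scan is not pydna's `common_sub_strings`. It only illustrates how to find maximal exact matches that are at least `limit` characters long.

```python
from typing import List, Tuple


def shared_substrings(a: str, b: str, limit: int) -> List[Tuple[int, int, int]]:
    """Hypothetical helper: maximal exact matches of at least `limit` characters.

    Returns (start_in_a, start_in_b, length) tuples. The comparison is
    case-insensitive. Naive O(len(a) * len(b)) scan, for illustration only.
    """
    a, b = a.lower(), b.lower()
    matches = []
    # Every alignment of a against b is one diagonal: offset = start_in_b - start_in_a.
    for offset in range(-(len(a) - 1), len(b)):
        i = max(0, -offset)  # position in a
        j = i + offset       # position in b
        run_start = None     # start (in a) of the current run of matching characters
        while i < len(a) and j < len(b):
            if a[i] == b[j]:
                if run_start is None:
                    run_start = i
            else:
                # A mismatch closes the run; keep it if it is long enough.
                if run_start is not None and i - run_start >= limit:
                    matches.append((run_start, run_start + offset, i - run_start))
                run_start = None
            i += 1
            j += 1
        # A run that reaches the end of either sequence is closed here.
        if run_start is not None and i - run_start >= limit:
            matches.append((run_start, run_start + offset, i - run_start))
    # Longest matches first.
    return sorted(matches, key=lambda m: (-m[2], m[0], m[1]))


print(shared_substrings("catgatctacgtatcgtgt", "atcgtgtactgtcatattc", 7))
# [(12, 0, 7)]
```

The only 7 bp match between the first two sequences of the diagram below is `atcgtgt`, which is node A.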
A graph is constructed in which each overlapping region forms a node and the sequences separating the overlapping regions form edges.
            -- A --
catgatctacgtatcgtgt     -- B --
            atcgtgtactgtcatattc
                        catattcaaagttct

--x--> A --y--> B --z-->   (Graph)

Nodes:
A : atcgtgt
B : catattc

Edges:
x : catgatctacgt
y : actgt
z : aaagttct
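The diagram can be read back as a sequence. Walk the edges in order and append each edge sequence followed by the node it leads to, and the result is the merged product. The sketch below spells out the path x, A, y, B, z. It makes two assumptions: plain dictionaries stand in for a NetworkX graph, and hypothetical `begin`/`end` sentinel nodes mark the free ends.

```python
from typing import Dict, List, Tuple

# Nodes: the shared overlap sequences from the diagram.
nodes: Dict[str, str] = {"A": "atcgtgt", "B": "catattc"}

# Edges: (source node, target node, separating sequence).
# "begin" and "end" are hypothetical sentinel nodes for the free ends
# of a linear product; they carry no sequence of their own.
edges: List[Tuple[str, str, str]] = [
    ("begin", "A", "catgatctacgt"),  # x
    ("A", "B", "actgt"),             # y
    ("B", "end", "aaagttct"),        # z
]


def spell_path(path: List[Tuple[str, str, str]], nodes: Dict[str, str]) -> str:
    """Concatenate each edge sequence followed by the sequence of the node it enters."""
    parts = []
    for _source, target, separator in path:
        parts.append(separator)
        parts.append(nodes.get(target, ""))  # sentinels contribute nothing
    return "".join(parts)


product = spell_path(edges, nodes)
print(product)  # catgatctacgtatcgtgtactgtcatattcaaagttct
```

All three input sequences occur in the 39 bp product. Each shared region appears once, which is why overlaps are modelled as nodes rather than edges.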
The NetworkX package is used to trace linear and circular paths through the graph.
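NetworkX provides functions such as `networkx.all_simple_paths` and `networkx.simple_cycles` for this kind of search. The sketch below shows the idea without that dependency, using a stdlib depth-first search over an adjacency dict. The graphs are hypothetical, and node names stand for overlap regions. This is an illustration, not the code pydna runs. A linear construct corresponds to a path between two free ends, and a circular construct corresponds to a cycle.

```python
from typing import Dict, List

Graph = Dict[str, List[str]]  # adjacency list: node -> successor nodes


def simple_paths(graph: Graph, source: str, target: str) -> List[List[str]]:
    """All source-to-target paths that visit no node twice (iterative DFS)."""
    found: List[List[str]] = []
    stack = [(source, [source])]
    while stack:
        node, path = stack.pop()
        for nxt in graph.get(node, []):
            if nxt == target:
                found.append(path + [nxt])
            elif nxt not in path:
                stack.append((nxt, path + [nxt]))
    return found


def simple_cycles(graph: Graph) -> List[List[str]]:
    """Cycles that visit no node twice, each reported once, starting at its smallest node."""
    cycles: List[List[str]] = []
    for start in sorted(graph):
        stack = [(start, [start])]
        while stack:
            node, path = stack.pop()
            for nxt in graph.get(node, []):
                if nxt == start:
                    cycles.append(path)
                # Only extend through nodes larger than start so that each cycle
                # is found exactly once (from its smallest node).
                elif nxt > start and nxt not in path:
                    stack.append((nxt, path + [nxt]))
    return cycles


# Hypothetical graphs: node names stand for overlap regions.
linear = {"begin": ["A"], "A": ["B"], "B": ["end"]}
circular = {"O1": ["O2"], "O2": ["O3"], "O3": ["O1"]}
print(simple_paths(linear, "begin", "end"))  # [['begin', 'A', 'B', 'end']]
print(simple_cycles(circular))               # [['O1', 'O2', 'O3']]
```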
- class pydna.assembly.Assembly(frags: List[Dseqrecord], limit: int = 25, algorithm: Callable[[str, str, int], List[Tuple[int, int, int]]] = common_sub_strings)
Bases: object
Assembly of a list of linear DNA fragments into linear or circular constructs. The Assembly class is meant to replace the Assembly method, as it is easier to use. An Assembly object is initialized with a list of Dseqrecord objects (the source fragments). Several methods are available for analyzing overlapping sequences, constructing the graph, and performing the assembly.
- Parameters:
frags (list) – A list of Dseqrecord objects.
limit (int, optional) – The shortest shared homology to be considered, in bp (default 25).
algorithm (function, optional) – The algorithm used to determine the shared sequences (default common_sub_strings).
max_nodes (int) – The maximum number of nodes in the graph. This can be tweaked to manage sequences with a high number of shared subsequences.
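The `algorithm` parameter accepts a callable matching the `Callable[[str, str, int], List[Tuple[int, int, int]]]` annotation in the signature. As a hypothetical example, `terminal_overlap` below reports only an overlap between the end of the first sequence and the start of the second. That is the arrangement used in Gibson assembly. The sketch makes two assumptions: the callable receives the two sequences and the limit in that order, and each tuple is (start in first, start in second, length). Both should be checked against `common_sub_strings` before relying on this.

```python
from typing import List, Tuple


def terminal_overlap(a: str, b: str, limit: int) -> List[Tuple[int, int, int]]:
    """Hypothetical algorithm: only the longest suffix of `a` that is a prefix of `b`.

    Returns [(start_in_a, 0, length)] or [] if no overlap reaches `limit`.
    The comparison is case-insensitive.
    """
    a, b = a.lower(), b.lower()
    # Try the longest possible overlap first; stop at `limit` (but never at 0,
    # because a[-0:] would be the whole string).
    for length in range(min(len(a), len(b)), max(limit, 1) - 1, -1):
        if a[-length:] == b[:length]:
            return [(len(a) - length, 0, length)]
    return []


print(terminal_overlap("acgatgctatactgCCCCCtgtgctgtgctcta",
                       "tgtgctgtgctctaTTTTTtattctggctgtatc", 14))  # [(19, 0, 14)]
```

Under those assumptions, it would be passed as `Assembly(frags, limit=14, algorithm=terminal_overlap)`. Because internal matches are ignored, the graph stays small. However, homology that is not at a fragment end would be missed.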
Examples
>>> from pydna.assembly import Assembly
>>> from pydna.dseqrecord import Dseqrecord
>>> a = Dseqrecord("acgatgctatactgCCCCCtgtgctgtgctcta")
>>> b = Dseqrecord("tgtgctgtgctctaTTTTTtattctggctgtatc")
>>> c = Dseqrecord("tattctggctgtatcGGGGGtacgatgctatactg")
>>> x = Assembly((a,b,c), limit=14)
>>> x
Assembly
fragments....: 33bp 34bp 35bp
limit(bp)....: 14
G.nodes......: 6
algorithm....: common_sub_strings
>>> x.assemble_circular()
[Contig(o59), Contig(o59)]
>>> x.assemble_circular()[0].seq.watson
'acgatgctatactgCCCCCtgtgctgtgctctaTTTTTtattctggctgtatcGGGGGt'
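The 59 bp circular product in the example follows from simple arithmetic. The fragments total 33 + 34 + 35 = 102 bp, and subtracting the three junction overlaps of 14, 15 and 14 bp gives 59 bp. The pure-Python sketch below rebuilds the same Watson strand without pydna. It assumes the fragments join head to tail in the order given and that each junction uses the longest terminal overlap. `circular_join` and `longest_terminal_overlap` are hypothetical helpers, not part of pydna.

```python
from typing import List


def longest_terminal_overlap(left: str, right: str, limit: int) -> int:
    """Length of the longest suffix of `left` equal to a prefix of `right` (case-insensitive), or 0."""
    left, right = left.lower(), right.lower()
    for n in range(min(len(left), len(right)), max(limit, 1) - 1, -1):
        if left[-n:] == right[:n]:
            return n
    return 0


def circular_join(frags: List[str], limit: int) -> str:
    """Join fragments head to tail in the given order, closing the circle from the last back to the first."""
    product = []
    for k, frag in enumerate(frags):
        nxt = frags[(k + 1) % len(frags)]
        overlap = longest_terminal_overlap(frag, nxt, limit)
        if overlap == 0:
            raise ValueError(f"no overlap of at least {limit} bp after fragment {k}")
        # Keep each fragment up to (not including) the overlap it shares with the next one.
        product.append(frag[: len(frag) - overlap])
    return "".join(product)


a = "acgatgctatactgCCCCCtgtgctgtgctcta"
b = "tgtgctgtgctctaTTTTTtattctggctgtatc"
c = "tattctggctgtatcGGGGGtacgatgctatactg"
seq = circular_join([a, b, c], limit=14)
print(len(seq), seq)
# 59 acgatgctatactgCCCCCtgtgctgtgctctaTTTTTtattctggctgtatcGGGGGt
```

Because the product is circular, the start position is arbitrary. Starting at fragment `a` happens to give the same string as `seq.watson` in the doctest.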
- assemble_linear(**kwargs)
- assemble_circular(**kwargs)