pydna.all

This module provide most pydna functionality in the local namespace.

Example

>>> from pydna.all import *
>>> Dseq("aaa")
Dseq(-3)
aaa
ttt
>>> Dseqrecord("aaa")
Dseqrecord(-3)
>>> from pydna.all import __all__
>>> __all__
['Anneal', 'pcr', 'Assembly', 'genbank', 'Genbank', 'download_text', 'Dseqrecord', 'Dseq', 'read', 'read_primer', 'parse', 'parse_primers', 'ape', 'primer_design', 'assembly_fragments', 'circular_assembly_fragments', 'eq', 'gbtext_clean', 'PrimerList']
>>>

class pydna.all.Anneal(primers, template, limit=13, **kwargs)[source]

Bases: object

The Anneal class has the following important attributes:

forward_primers

Description of forward_primers.

Type:: list

reverse_primers

Description of reverse_primers.

Type:: list

template

A copy of the template argument. Primers annealing sites has been added as features that can be visualized in a seqence editor such as ApE.

Type:: Dseqrecord

limit

The limit of PCR primer annealing, default is 13 bp.

Type:: int, optional

property products

report(): returns a short report describing if or where primer anneal on the template.

pydna.all.pcr(*args, **kwargs) → Amplicon[source]

pcr is a convenience function for the Anneal class to simplify its usage, especially from the command line. If more than one or no PCR product is formed, a ValueError is raised.

args is any iterable of Dseqrecords or an iterable of iterables of Dseqrecords. args will be greedily flattened.

Parameters:

args (iterable containing sequence objects) – Several arguments are also accepted.
limit (int = 13, optional) – limit length of the annealing part of the primers.

Notes

sequences in args could be of type:

string
Seq
SeqRecord (or subclass)
Dseqrecord (or sublcass)

The last sequence will be assumed to be the template while all preceeding sequences will be assumed to be primers.

This is a powerful function, use with care!

Returns:: product – An pydna.amplicon.Amplicon object representing the PCR product. The direction of the PCR product will be the same as for the template sequence.
Return type:: Amplicon

Examples

>>> from pydna.dseqrecord import Dseqrecord
>>> from pydna.readers import read
>>> from pydna.amplify import pcr
>>> from pydna.primer import Primer
>>> template = Dseqrecord("tacactcaccgtctatcattatctactatcgactgtatcatctgatagcac")
>>> from Bio.SeqRecord import SeqRecord
>>> p1 = Primer("tacactcaccgtctatcattatc")
>>> p2 = Primer("cgactgtatcatctgatagcac").reverse_complement()
>>> pcr(p1, p2, template)
Amplicon(51)
>>> pcr([p1, p2], template)
Amplicon(51)
>>> pcr((p1,p2,), template)
Amplicon(51)
>>>

class pydna.all.Assembly(frags: List[Dseqrecord], limit: int = 25, algorithm: Callable[[str, str, int], List[Tuple[int, int, int]]] = common_sub_strings)[source]

Bases: object

Assembly of a list of linear DNA fragments into linear or circular constructs. The Assembly is meant to replace the Assembly method as it is easier to use. Accepts a list of Dseqrecords (source fragments) to initiate an Assembly object. Several methods are available for analysis of overlapping sequences, graph construction and assembly.

Parameters:

fragments (list) – a list of Dseqrecord objects.
limit (int, optional) – The shortest shared homology to be considered
algorithm (function, optional) – The algorithm used to determine the shared sequences.
max_nodes (int) – The maximum number of nodes in the graph. This can be tweaked to manage sequences with a high number of shared sub sequences.

Examples

>>> from pydna.assembly import Assembly
>>> from pydna.dseqrecord import Dseqrecord
>>> a = Dseqrecord("acgatgctatactgCCCCCtgtgctgtgctcta")
>>> b = Dseqrecord("tgtgctgtgctctaTTTTTtattctggctgtatc")
>>> c = Dseqrecord("tattctggctgtatcGGGGGtacgatgctatactg")
>>> x = Assembly((a,b,c), limit=14)
>>> x
Assembly
fragments....: 33bp 34bp 35bp
limit(bp)....: 14
G.nodes......: 6
algorithm....: common_sub_strings
>>> x.assemble_circular()
[Contig(o59), Contig(o59)]
>>> x.assemble_circular()[0].seq.watson
'acgatgctatactgCCCCCtgtgctgtgctctaTTTTTtattctggctgtatcGGGGGt'

assemble_linear(**kwargs)

assemble_circular(**kwargs)

pydna.all.genbank(accession: str = 'CS570233.1', *args, **kwargs) → GenbankRecord[source]

Download a genbank nuclotide record.

This function takes the same paramenters as the :func:pydna.genbank.Genbank.nucleotide method. The email address stored in the pydna_email environment variable is used. The easiest way set this permanantly is to edit the pydna.ini file. See the documentation of pydna.open_config_folder()

if no accession is given, a very short Genbank entry is used as an example (see below). This can be useful for testing the connection to Genbank.

Please note that this result is also cached by default by settings in the pydna.ini file. See the documentation of pydna.open_config_folder()

LOCUS       CS570233                  14 bp    DNA     linear   PAT 18-MAY-2007
DEFINITION  Sequence 6 from Patent WO2007025016.
ACCESSION   CS570233
VERSION     CS570233.1
KEYWORDS    .
SOURCE      synthetic construct
  ORGANISM  synthetic construct
            other sequences; artificial sequences.
REFERENCE   1
  AUTHORS   Shaw,R.W. and Cottenoir,M.
  TITLE     Inhibition of metallo-beta-lactamase by double-stranded dna
  JOURNAL   Patent: WO 2007025016-A1 6 01-MAR-2007;
            Texas Tech University System (US)
FEATURES             Location/Qualifiers
     source          1..14
                     /organism="synthetic construct"
                     /mol_type="unassigned DNA"
                     /db_xref="taxon:32630"
                     /note="This is a 14bp aptamer inhibitor."
ORIGIN
        1 atgttcctac atga
//

class pydna.all.Genbank(users_email: str, *, tool: str = 'pydna')[source]

Bases: object

Class to facilitate download from genbank. It is easier and quicker to use the pydna.genbank.genbank() function directly.

Parameters:: users_email (string) – Has to be a valid email address. You should always tell Genbanks who you are, so that they can contact you.

Examples

>>> from pydna.genbank import Genbank
>>> gb=Genbank("bjornjobb@gmail.com")
>>> rec = gb.nucleotide("LP002422.1")   # <- entry from genbank
>>> print(len(rec))
1

nucleotide(**kwargs)

pydna.all.download_text(*args, **kwargs)

class pydna.all.Dseqrecord(record, *args, circular=None, n=5e-14, **kwargs)[source]

Bases: SeqRecord

Dseqrecord is a double stranded version of the Biopython SeqRecord [1] class. The Dseqrecord object holds a Dseq object describing the sequence. Additionally, Dseqrecord hold meta information about the sequence in the from of a list of SeqFeatures, in the same way as the SeqRecord does.

The Dseqrecord can be initialized with a string, Seq, Dseq, SeqRecord or another Dseqrecord. The sequence information will be stored in a Dseq object in all cases.

Dseqrecord objects can be read or parsed from sequences in FASTA, EMBL or Genbank formats. See the pydna.readers and pydna.parsers modules for further information.

There is a short representation associated with the Dseqrecord. Dseqrecord(-3) represents a linear sequence of length 2 while Dseqrecord(o7) represents a circular sequence of length 7.

Dseqrecord and Dseq share the same concept of length. This length can be larger than each strand alone if they are staggered as in the example below.

<-- length -->
GATCCTTT
     AAAGCCTAG

Parameters:

record (string, Seq, SeqRecord, Dseq or other Dseqrecord object) – This data will be used to form the seq property
circular (bool, optional) – True or False reflecting the shape of the DNA molecule
linear (bool, optional) – True or False reflecting the shape of the DNA molecule

Examples

>>> from pydna.dseqrecord import Dseqrecord
>>> a=Dseqrecord("aaa")
>>> a
Dseqrecord(-3)
>>> a.seq
Dseq(-3)
aaa
ttt
>>> from pydna.seq import Seq
>>> b=Dseqrecord(Seq("aaa"))
>>> b
Dseqrecord(-3)
>>> b.seq
Dseq(-3)
aaa
ttt
>>> from Bio.SeqRecord import SeqRecord
>>> c=Dseqrecord(SeqRecord(Seq("aaa")))
>>> c
Dseqrecord(-3)
>>> c.seq
Dseq(-3)
aaa
ttt

References

classmethod from_string(record: str = '', *args, circular=False, n=5e-14, **kwargs)[source]: docstring.

classmethod from_SeqRecord(record: SeqRecord, *args, circular=None, n=5e-14, **kwargs)[source]

property circular: The circular property can not be set directly. Use looped()

m()[source]: This method returns the mass of the DNA molecule in grams. This is calculated as the product between the molecular weight of the Dseq object and the

extract_feature(n)[source]

Extracts a feature and creates a new Dseqrecord object.

Parameters:: n (int) – Indicates the feature to extract

Examples

>>> from pydna.dseqrecord import Dseqrecord
>>> a=Dseqrecord("atgtaa")
>>> a.add_feature(2,4)
>>> b=a.extract_feature(0)
>>> b
Dseqrecord(-2)
>>> b.seq
Dseq(-2)
gt
ca

add_feature(x=None, y=None, seq=None, type_='misc', strand=1, *args, **kwargs)[source]

Add a feature of type misc to the feature list of the sequence.

Parameters:

x (int) – Indicates start of the feature
y (int) – Indicates end of the feature

Examples

>>> from pydna.seqrecord import SeqRecord
>>> a=SeqRecord("atgtaa")
>>> a.features
[]
>>> a.add_feature(2,4)
>>> a.features
[SeqFeature(SimpleLocation(ExactPosition(2), ExactPosition(4), strand=1), type='misc', qualifiers=...)]

seguid()[source]

Url safe SEGUID for the sequence.

This checksum is the same as seguid but with base64.urlsafe encoding instead of the normal base64. This means that the characters + and / are replaced with - and _ so that the checksum can be part of a URL.

Examples

>>> from pydna.dseqrecord import Dseqrecord
>>> a = Dseqrecord("aa")
>>> a.seguid()
'ldseguid=TEwydy0ugvGXh3VJnVwgtxoyDQA'

looped()[source]

Circular version of the Dseqrecord object.

The underlying linear Dseq object has to have compatible ends.

Examples

>>> from pydna.dseqrecord import Dseqrecord
>>> a=Dseqrecord("aaa")
>>> a
Dseqrecord(-3)
>>> b=a.looped()
>>> b
Dseqrecord(o3)
>>>

See also

pydna.dseq.Dseq.looped

tolinear()[source]

Returns a linear, blunt copy of a circular Dseqrecord object. The underlying Dseq object has to be circular.

This method is deprecated, use slicing instead. See example below.

Examples

>>> from pydna.dseqrecord import Dseqrecord
>>> a=Dseqrecord("aaa", circular = True)
>>> a
Dseqrecord(o3)
>>> b=a[:]
>>> b
Dseqrecord(-3)
>>>

terminal_transferase(nucleotides='a')[source]: docstring.

format(f='gb')[source]

Returns the sequence as a string using a format supported by Biopython SeqIO [2]. Default is “gb” which is short for Genbank.

Examples

>>> from pydna.dseqrecord import Dseqrecord
>>> x=Dseqrecord("aaa")
>>> x.annotations['date'] = '02-FEB-2013'
>>> x
Dseqrecord(-3)
>>> print(x.format("gb"))
LOCUS       name                       3 bp    DNA     linear   UNK 02-FEB-2013
DEFINITION  description.
ACCESSION   id
VERSION     id
KEYWORDS    .
SOURCE      .
  ORGANISM  .
            .
FEATURES             Location/Qualifiers
ORIGIN
        1 aaa
//

References

write(filename=None, f='gb')[source]

Writes the Dseqrecord to a file using the format f, which must be a format supported by Biopython SeqIO for writing [3]. Default is “gb” which is short for Genbank. Note that Biopython SeqIO reads more formats than it writes.

Filename is the path to the file where the sequece is to be written. The filename is optional, if it is not given, the description property (string) is used together with the format.

If obj is the Dseqrecord object, the default file name will be:

<obj.locus>.<f>

Where <f> is “gb” by default. If the filename already exists and AND the sequence it contains is different, a new file name will be used so that the old file is not lost:

<obj.locus>_NEW.<f>

References

find(other)[source]

find_aminoacids(other)[source]

>>> from pydna.dseqrecord import Dseqrecord
>>> s=Dseqrecord("atgtacgatcgtatgctggttatattttag")
>>> s.seq.translate()
Seq('MYDRMLVIF*')
>>> "RML" in s
True
>>> "MMM" in s
False
>>> s.seq.rc().translate()
Seq('LKYNQHTIVH')
>>> "QHT" in s.rc()
True
>>> "QHT" in s
False
>>> slc = s.find_aa("RML")
>>> slc
slice(9, 18, None)
>>> s[slc]
Dseqrecord(-9)
>>> code = s[slc].seq
>>> code
Dseq(-9)
cgtatgctg
gcatacgac
>>> code.translate()
Seq('RML')

find_aa(other)

>>> from pydna.dseqrecord import Dseqrecord
>>> s=Dseqrecord("atgtacgatcgtatgctggttatattttag")
>>> s.seq.translate()
Seq('MYDRMLVIF*')
>>> "RML" in s
True
>>> "MMM" in s
False
>>> s.seq.rc().translate()
Seq('LKYNQHTIVH')
>>> "QHT" in s.rc()
True
>>> "QHT" in s
False
>>> slc = s.find_aa("RML")
>>> slc
slice(9, 18, None)
>>> s[slc]
Dseqrecord(-9)
>>> code = s[slc].seq
>>> code
Dseq(-9)
cgtatgctg
gcatacgac
>>> code.translate()
Seq('RML')

map_trace_files(pth, limit=25)[source]

linearize(*enzymes)[source]

Similar to :func:cut.

Throws an exception if there is not excactly one cut i.e. none or more than one digestion products.

no_cutters(batch: RestrictionBatch = None)[source]: docstring.

unique_cutters(batch: RestrictionBatch = None)[source]: docstring.

once_cutters(batch: RestrictionBatch = None)[source]: docstring.

twice_cutters(batch: RestrictionBatch = None)[source]: docstring.

n_cutters(n=3, batch: RestrictionBatch = None)[source]: docstring.

cutters(batch: RestrictionBatch = None)[source]: docstring.

number_of_cuts(*enzymes)[source]: The number of cuts by digestion with the Restriction enzymes contained in the iterable.

cas9(RNA: str)[source]: docstring.

reverse_complement()[source]

Reverse complement.

Examples

>>> from pydna.dseqrecord import Dseqrecord
>>> a=Dseqrecord("ggaatt")
>>> a
Dseqrecord(-6)
>>> a.seq
Dseq(-6)
ggaatt
ccttaa
>>> a.reverse_complement().seq
Dseq(-6)
aattcc
ttaagg
>>>

See also

pydna.dseq.Dseq.reverse_complement

rc()

Reverse complement.

Examples

>>> from pydna.dseqrecord import Dseqrecord
>>> a=Dseqrecord("ggaatt")
>>> a
Dseqrecord(-6)
>>> a.seq
Dseq(-6)
ggaatt
ccttaa
>>> a.reverse_complement().seq
Dseq(-6)
aattcc
ttaagg
>>>

See also

pydna.dseq.Dseq.reverse_complement

synced(ref, limit=25)[source]

This method returns a new circular sequence (Dseqrecord object), which has been rotated in such a way that there is maximum overlap between the sequence and ref, which may be a string, Biopython Seq, SeqRecord object or another Dseqrecord object.

The reason for using this could be to rotate a new recombinant plasmid so that it starts at the same position after cloning. See the example below:

Examples

>>> from pydna.dseqrecord import Dseqrecord
>>> a=Dseqrecord("gaat", circular=True)
>>> a.seq
Dseq(o4)
gaat
ctta
>>> d = a[2:] + a[:2]
>>> d.seq
Dseq(-4)
atga
tact
>>> insert=Dseqrecord("CCC")
>>> recombinant = (d+insert).looped()
>>> recombinant.seq
Dseq(o7)
atgaCCC
tactGGG
>>> recombinant.synced(a).seq
Dseq(o7)
gaCCCat
ctGGGta

upper()[source]

Returns an uppercase copy. >>> from pydna.dseqrecord import Dseqrecord >>> my_seq = Dseqrecord(“aAa”) >>> my_seq.seq Dseq(-3) aAa tTt >>> upper = my_seq.upper() >>> upper.seq Dseq(-3) AAA TTT >>>

Returns:: Dseqrecord object in uppercase
Return type:: Dseqrecord

See also

pydna.dseqrecord.Dseqrecord.lower

lower()[source]

>>> from pydna.dseqrecord import Dseqrecord
>>> my_seq = Dseqrecord("aAa")
>>> my_seq.seq
Dseq(-3)
aAa
tTt
>>> upper = my_seq.upper()
>>> upper.seq
Dseq(-3)
AAA
TTT
>>> lower = my_seq.lower()
>>> lower
Dseqrecord(-3)
>>>

Returns:: Dseqrecord object in lowercase
Return type:: Dseqrecord

See also

pydna.dseqrecord.Dseqrecord.upper

orfs(minsize=300)[source]: docstring.

orfs_to_features(minsize=300)[source]: docstring.

copy_gb_to_clipboard()[source]: docstring.

copy_fasta_to_clipboard()[source]: docstring.

figure(feature=0, highlight='\x1b[48;5;11m', plain='\x1b[0m')[source]: docstring.

shifted(shift)[source]

Circular Dseqrecord with a new origin <shift>.

This only works on circular Dseqrecords. If we consider the following circular sequence:

GAAAT <-- watson strand

CTTTA <-- crick strand

The T and the G on the watson strand are linked together as well as the A and the C of the of the crick strand.

if shift is 1, this indicates a new origin at position 1:

new origin at the | symbol:

G|AAAT

C|TTTA

new sequence:

AAATG

TTTAC

Examples

>>> from pydna.dseqrecord import Dseqrecord
>>> a=Dseqrecord("aaat",circular=True)
>>> a
Dseqrecord(o4)
>>> a.seq
Dseq(o4)
aaat
ttta
>>> b=a.shifted(1)
>>> b
Dseqrecord(o4)
>>> b.seq
Dseq(o4)
aata
ttat

cut(*enzymes)[source]

Digest a Dseqrecord object with one or more restriction enzymes.

returns a list of linear Dseqrecords. If there are no cuts, an empty list is returned.

See also Dseq.cut() :param enzymes: A Bio.Restriction.XXX restriction object or iterable of such. :type enzymes: enzyme object or iterable of such objects

Returns:: Dseqrecord_frags – list of Dseqrecord objects formed by the digestion
Return type:: list

Examples

>>> from pydna.dseqrecord import Dseqrecord
>>> a=Dseqrecord("ggatcc")
>>> from Bio.Restriction import BamHI
>>> a.cut(BamHI)
(Dseqrecord(-5), Dseqrecord(-5))
>>> frag1, frag2 = a.cut(BamHI)
>>> frag1.seq
Dseq(-5)
g
cctag
>>> frag2.seq
Dseq(-5)
gatcc
    g

apply_cut(left_cut, right_cut)[source]

class pydna.all.Dseq(watson: str | bytes, crick: str | bytes | None = None, ovhg=None, circular=False, pos=0)[source]

Bases: Seq

Dseq holds information for a double stranded DNA fragment.

Dseq also holds information describing the topology of the DNA fragment (linear or circular).

Parameters:

watson (str) – a string representing the watson (sense) DNA strand.
crick (str, optional) – a string representing the crick (antisense) DNA strand.
ovhg (int, optional) – A positive or negative number to describe the stagger between the watson and crick strands. see below for a detailed explanation.
linear (bool, optional) – True indicates that sequence is linear, False that it is circular.
circular (bool, optional) – True indicates that sequence is circular, False that it is linear.

Examples

Dseq is a subclass of the Biopython Seq object. It stores two strings representing the watson (sense) and crick(antisense) strands. two properties called linear and circular, and a numeric value ovhg (overhang) describing the stagger for the watson and crick strand in the 5’ end of the fragment.

The most common usage is probably to create a Dseq object as a part of a Dseqrecord object (see pydna.dseqrecord.Dseqrecord).

There are three ways of creating a Dseq object directly listed below, but you can also use the function Dseq.from_full_sequence_and_overhangs() to create a Dseq:

Only one argument (string):

>>> from pydna.dseq import Dseq
>>> Dseq("aaa")
Dseq(-3)
aaa
ttt

The given string will be interpreted as the watson strand of a blunt, linear double stranded sequence object. The crick strand is created automatically from the watson strand.

Two arguments (string, string):

>>> from pydna.dseq import Dseq
>>> Dseq("gggaaat","ttt")
Dseq(-7)
gggaaat
   ttt

If both watson and crick are given, but not ovhg an attempt will be made to find the best annealing between the strands. There are limitations to this. For long fragments it is quite slow. The length of the annealing sequences have to be at least half the length of the shortest of the strands.

Three arguments (string, string, ovhg=int):

The ovhg parameter is an integer describing the length of the crick strand overhang in the 5’ end of the molecule.

The ovhg parameter controls the stagger at the five prime end:

dsDNA       overhang

  nnn...    2
nnnnn...

 nnnn...    1
nnnnn...

nnnnn...    0
nnnnn...

nnnnn...   -1
 nnnn...

nnnnn...   -2
  nnn...

Example of creating Dseq objects with different amounts of stagger:

>>> Dseq(watson="agt", crick="actta", ovhg=-2)
Dseq(-7)
agt
  attca
>>> Dseq(watson="agt",crick="actta",ovhg=-1)
Dseq(-6)
agt
 attca
>>> Dseq(watson="agt",crick="actta",ovhg=0)
Dseq(-5)
agt
attca
>>> Dseq(watson="agt",crick="actta",ovhg=1)
Dseq(-5)
 agt
attca
>>> Dseq(watson="agt",crick="actta",ovhg=2)
Dseq(-5)
  agt
attca

If the ovhg parameter is specified a crick strand also needs to be supplied, otherwise an exception is raised.

>>> Dseq(watson="agt", ovhg=2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/pydna_/dsdna.py", line 169, in __init__
    else:
ValueError: ovhg defined without crick strand!

The shape of the fragment is set by circular = True, False

Note that both ends of the DNA fragment has to be compatible to set circular = True.

>>> Dseq("aaa","ttt")
Dseq(-3)
aaa
ttt
>>> Dseq("aaa","ttt",ovhg=0)
Dseq(-3)
aaa
ttt
>>> Dseq("aaa","ttt",ovhg=1)
Dseq(-4)
 aaa
ttt
>>> Dseq("aaa","ttt",ovhg=-1)
Dseq(-4)
aaa
 ttt
>>> Dseq("aaa", "ttt", circular = True , ovhg=0)
Dseq(o3)
aaa
ttt

>>> a=Dseq("tttcccc","aaacccc")
>>> a
Dseq(-11)
    tttcccc
ccccaaa
>>> a.ovhg
4

>>> b=Dseq("ccccttt","ccccaaa")
>>> b
Dseq(-11)
ccccttt
    aaacccc
>>> b.ovhg
-4
>>>

Coercing to string

>>> str(a)
'ggggtttcccc'

A Dseq object can be longer that either the watson or crick strands.

<-- length -->
GATCCTTT
     AAAGCCTAG

<-- length -->
      GATCCTTT
AAAGCCCTA

The slicing of a linear Dseq object works mostly as it does for a string.

>>> s="ggatcc"
>>> s[2:3]
'a'
>>> s[2:4]
'at'
>>> s[2:4:-1]
''
>>> s[::2]
'gac'
>>> from pydna.dseq import Dseq
>>> d=Dseq(s, circular=False)
>>> d[2:3]
Dseq(-1)
a
t
>>> d[2:4]
Dseq(-2)
at
ta
>>> d[2:4:-1]
Dseq(-0)


>>> d[::2]
Dseq(-3)
gac
ctg

The slicing of a circular Dseq object has a slightly different meaning.

>>> s="ggAtCc"
>>> d=Dseq(s, circular=True)
>>> d
Dseq(o6)
ggAtCc
ccTaGg
>>> d[4:3]
Dseq(-5)
CcggA
GgccT

The slice [X:X] produces an empty slice for a string, while this will return the linearized sequence starting at X:

>>> s="ggatcc"
>>> d=Dseq(s, circular=True)
>>> d
Dseq(o6)
ggatcc
cctagg
>>> d[3:3]
Dseq(-6)
tccgga
aggcct
>>>

See also

pydna.dseqrecord.Dseqrecord

trunc = 30

classmethod quick(watson: str, crick: str, ovhg=0, circular=False, pos=0)[source]

classmethod from_string(dna: str, *args, circular=False, **kwargs)[source]

classmethod from_representation(dsdna: str, *args, **kwargs)[source]

classmethod from_full_sequence_and_overhangs(full_sequence: str, crick_ovhg: int, watson_ovhg: int)[source]

Create a linear Dseq object from a full sequence and the 3’ overhangs of each strand.

The order of the parameters is like this because the 3’ overhang of the crick strand is the one on the left side of the sequence.

Parameters:

full_sequence (str) – The full sequence of the Dseq object.
crick_ovhg (int) – The overhang of the crick strand in the 3’ end. Equivalent to Dseq.ovhg.
watson_ovhg (int) – The overhang of the watson strand in the 5’ end.

Returns:

A Dseq object.

Return type:

Dseq

Examples

>>> Dseq.from_full_sequence_and_overhangs('AAAAAA', crick_ovhg=2, watson_ovhg=2)
Dseq(-6)
  AAAA
TTTT
>>> Dseq.from_full_sequence_and_overhangs('AAAAAA', crick_ovhg=-2, watson_ovhg=2)
Dseq(-6)
AAAAAA
  TT
>>> Dseq.from_full_sequence_and_overhangs('AAAAAA', crick_ovhg=2, watson_ovhg=-2)
Dseq(-6)
  AA
TTTTTT
>>> Dseq.from_full_sequence_and_overhangs('AAAAAA', crick_ovhg=-2, watson_ovhg=-2)
Dseq(-6)
AAAA
  TTTT

mw() → float[source]

This method returns the molecular weight of the DNA molecule in g/mol. The following formula is used:

MW = (A x 313.2) + (T x 304.2) +
     (C x 289.2) + (G x 329.2) +
     (N x 308.9) + 79.0

upper() → DseqType[source]

Return an upper case copy of the sequence.

>>> from pydna.dseq import Dseq
>>> my_seq = Dseq("aAa")
>>> my_seq
Dseq(-3)
aAa
tTt
>>> my_seq.upper()
Dseq(-3)
AAA
TTT

Returns:: Dseq object in uppercase
Return type:: Dseq

See also

pydna.dseq.Dseq.lower

lower() → DseqType[source]

Return a lower case copy of the sequence.

>>> from pydna.dseq import Dseq
>>> my_seq = Dseq("aAa")
>>> my_seq
Dseq(-3)
aAa
tTt
>>> my_seq.lower()
Dseq(-3)
aaa
ttt

Returns:: Dseq object in lowercase
Return type:: Dseq

See also

pydna.dseq.Dseq.upper

find(sub: _SeqAbstractBaseClass | str | bytes, start=0, end=_sys.maxsize) → int[source]

This method behaves like the python string method of the same name.

Returns an integer, the index of the first occurrence of substring argument sub in the (sub)sequence given by [start:end].

Returns -1 if the subsequence is NOT found.

Parameters:

sub (string or Seq object) – a string or another Seq object to look for.
start (int, optional) – slice start.
end (int, optional) – slice end.

Examples

>>> from pydna.dseq import Dseq
>>> seq = Dseq("atcgactgacgtgtt")
>>> seq
Dseq(-15)
atcgactgacgtgtt
tagctgactgcacaa
>>> seq.find("gac")
3
>>> seq = Dseq(watson="agt",crick="actta",ovhg=-2)
>>> seq
Dseq(-7)
agt
  attca
>>> seq.find("taa")
2

reverse_complement() → Dseq[source]

Dseq object where watson and crick have switched places.

This represents the same double stranded sequence.

Examples

>>> from pydna.dseq import Dseq
>>> a=Dseq("catcgatc")
>>> a
Dseq(-8)
catcgatc
gtagctag
>>> b=a.reverse_complement()
>>> b
Dseq(-8)
gatcgatg
ctagctac
>>>

rc() → Dseq

Dseq object where watson and crick have switched places.

This represents the same double stranded sequence.

Examples

>>> from pydna.dseq import Dseq
>>> a=Dseq("catcgatc")
>>> a
Dseq(-8)
catcgatc
gtagctag
>>> b=a.reverse_complement()
>>> b
Dseq(-8)
gatcgatg
ctagctac
>>>

shifted(shift: int) → DseqType[source]: Shifted version of a circular Dseq object.

looped() → DseqType[source]

Circularized Dseq object.

This can only be done if the two ends are compatible, otherwise a TypeError is raised.

Examples

>>> from pydna.dseq import Dseq
>>> a=Dseq("catcgatc")
>>> a
Dseq(-8)
catcgatc
gtagctag
>>> a.looped()
Dseq(o8)
catcgatc
gtagctag
>>> a.T4("t")
Dseq(-8)
catcgat
 tagctag
>>> a.T4("t").looped()
Dseq(o7)
catcgat
gtagcta
>>> a.T4("a")
Dseq(-8)
catcga
  agctag
>>> a.T4("a").looped()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/pydna/dsdna.py", line 357, in looped
    if type5 == type3 and str(sticky5) == str(rc(sticky3)):
TypeError: DNA cannot be circularized.
5' and 3' sticky ends not compatible!
>>>

tolinear() → DseqType[source]

Returns a blunt, linear copy of a circular Dseq object. This can only be done if the Dseq object is circular, otherwise a TypeError is raised.

This method is deprecated, use slicing instead. See example below.

Examples

>>> from pydna.dseq import Dseq
>>> a=Dseq("catcgatc", circular=True)
>>> a
Dseq(o8)
catcgatc
gtagctag
>>> a[:]
Dseq(-8)
catcgatc
gtagctag
>>>

five_prime_end() → Tuple[str, str][source]

Returns a tuple describing the structure of the 5’ end of the DNA fragment

Examples

>>> from pydna.dseq import Dseq
>>> a=Dseq("aaa", "ttt")
>>> a
Dseq(-3)
aaa
ttt
>>> a.five_prime_end()
('blunt', '')
>>> a=Dseq("aaa", "ttt", ovhg=1)
>>> a
Dseq(-4)
 aaa
ttt
>>> a.five_prime_end()
("3'", 't')
>>> a=Dseq("aaa", "ttt", ovhg=-1)
>>> a
Dseq(-4)
aaa
 ttt
>>> a.five_prime_end()
("5'", 'a')
>>>

See also

pydna.dseq.Dseq.three_prime_end

three_prime_end() → Tuple[str, str][source]

Returns a tuple describing the structure of the 5’ end of the DNA fragment

>>> from pydna.dseq import Dseq
>>> a=Dseq("aaa", "ttt")
>>> a
Dseq(-3)
aaa
ttt
>>> a.three_prime_end()
('blunt', '')
>>> a=Dseq("aaa", "ttt", ovhg=1)
>>> a
Dseq(-4)
 aaa
ttt
>>> a.three_prime_end()
("3'", 'a')
>>> a=Dseq("aaa", "ttt", ovhg=-1)
>>> a
Dseq(-4)
aaa
 ttt
>>> a.three_prime_end()
("5'", 't')
>>>

See also

pydna.dseq.Dseq.five_prime_end

watson_ovhg() → int[source]: Returns the overhang of the watson strand at the three prime.

fill_in(nucleotides: None | str = None) → Dseq[source]

Fill in of five prime protruding end with a DNA polymerase that has only DNA polymerase activity (such as exo-klenow [4]) and any combination of A, G, C or T. Default are all four nucleotides together.

Parameters:: nucleotides (str) –

Examples

>>> from pydna.dseq import Dseq
>>> a=Dseq("aaa", "ttt")
>>> a
Dseq(-3)
aaa
ttt
>>> a.fill_in()
Dseq(-3)
aaa
ttt
>>> b=Dseq("caaa", "cttt")
>>> b
Dseq(-5)
caaa
 tttc
>>> b.fill_in()
Dseq(-5)
caaag
gtttc
>>> b.fill_in("g")
Dseq(-5)
caaag
gtttc
>>> b.fill_in("tac")
Dseq(-5)
caaa
 tttc
>>> c=Dseq("aaac", "tttg")
>>> c
Dseq(-5)
 aaac
gttt
>>> c.fill_in()
Dseq(-5)
 aaac
gttt
>>>

References

transcribe() → Seq[source]

Transcribe a DNA sequence into RNA and return the RNA sequence as a new Seq object.

Following the usual convention, the sequence is interpreted as the coding strand of the DNA double helix, not the template strand. This means we can get the RNA sequence just by switching T to U.

>>> from Bio.Seq import Seq
>>> coding_dna = Seq("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG")
>>> coding_dna
Seq('ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG')
>>> coding_dna.transcribe()
Seq('AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG')

The sequence is modified in-place and returned if inplace is True:

>>> sequence = MutableSeq("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG")
>>> sequence
MutableSeq('ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG')
>>> sequence.transcribe()
MutableSeq('AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG')
>>> sequence
MutableSeq('ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG')

>>> sequence.transcribe(inplace=True)
MutableSeq('AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG')
>>> sequence
MutableSeq('AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG')

As Seq objects are immutable, a TypeError is raised if transcribe is called on a Seq object with inplace=True.

Trying to transcribe an RNA sequence has no effect. If you have a nucleotide sequence which might be DNA or RNA (or even a mixture), calling the transcribe method will ensure any T becomes U.

Trying to transcribe a protein sequence will replace any T for Threonine with U for Selenocysteine, which has no biologically plausible rational.

>>> from Bio.Seq import Seq
>>> my_protein = Seq("MAIVMGRT")
>>> my_protein.transcribe()
Seq('MAIVMGRU')

translate(table='Standard', stop_symbol='*', to_stop=False, cds=False, gap='-') → Seq[source]: Translate..

mung() → Dseq[source]

Simulates treatment a nuclease with 5’-3’ and 3’-5’ single strand specific exonuclease activity (such as mung bean nuclease [5])

    ggatcc    ->     gatcc
     ctaggg          ctagg

     ggatcc   ->      ggatc
    tcctag            cctag

>>> from pydna.dseq import Dseq
>>> b=Dseq("caaa", "cttt")
>>> b
Dseq(-5)
caaa
 tttc
>>> b.mung()
Dseq(-3)
aaa
ttt
>>> c=Dseq("aaac", "tttg")
>>> c
Dseq(-5)
 aaac
gttt
>>> c.mung()
Dseq(-3)
aaa
ttt

References

T4(nucleotides=None) → Dseq[source]

Fill in five prime protruding ends and chewing back three prime protruding ends by a DNA polymerase providing both 5’-3’ DNA polymerase activity and 3’-5’ nuclease acitivty (such as T4 DNA polymerase). This can be done in presence of any combination of the four A, G, C or T. Removing one or more nucleotides can facilitate engineering of sticky ends. Default are all four nucleotides together.

Parameters:: nucleotides (str) –

Examples

>>> from pydna.dseq import Dseq
>>> a=Dseq("gatcgatc")
>>> a
Dseq(-8)
gatcgatc
ctagctag
>>> a.T4()
Dseq(-8)
gatcgatc
ctagctag
>>> a.T4("t")
Dseq(-8)
gatcgat
 tagctag
>>> a.T4("a")
Dseq(-8)
gatcga
  agctag
>>> a.T4("g")
Dseq(-8)
gatcg
   gctag
>>>

t4(nucleotides=None) → Dseq

Fill in five prime protruding ends and chewing back three prime protruding ends by a DNA polymerase providing both 5’-3’ DNA polymerase activity and 3’-5’ nuclease acitivty (such as T4 DNA polymerase). This can be done in presence of any combination of the four A, G, C or T. Removing one or more nucleotides can facilitate engineering of sticky ends. Default are all four nucleotides together.

Parameters:: nucleotides (str) –

Examples

>>> from pydna.dseq import Dseq
>>> a=Dseq("gatcgatc")
>>> a
Dseq(-8)
gatcgatc
ctagctag
>>> a.T4()
Dseq(-8)
gatcgatc
ctagctag
>>> a.T4("t")
Dseq(-8)
gatcgat
 tagctag
>>> a.T4("a")
Dseq(-8)
gatcga
  agctag
>>> a.T4("g")
Dseq(-8)
gatcg
   gctag
>>>

exo1_front(n=1) → DseqType[source]: 5’-3’ resection at the start (left side) of the molecule.

exo1_end(n=1) → DseqType[source]: 5’-3’ resection at the end (right side) of the molecule.

no_cutters(batch: RestrictionBatch | None = None) → RestrictionBatch[source]: Enzymes in a RestrictionBatch not cutting sequence.

unique_cutters(batch: RestrictionBatch | None = None) → RestrictionBatch[source]: Enzymes in a RestrictionBatch cutting sequence once.

once_cutters(batch: RestrictionBatch | None = None) → RestrictionBatch: Enzymes in a RestrictionBatch cutting sequence once.

twice_cutters(batch: RestrictionBatch | None = None) → RestrictionBatch[source]: Enzymes in a RestrictionBatch cutting sequence twice.

n_cutters(n=3, batch: RestrictionBatch | None = None) → RestrictionBatch[source]: Enzymes in a RestrictionBatch cutting n times.

cutters(batch: RestrictionBatch | None = None) → RestrictionBatch[source]: Enzymes in a RestrictionBatch cutting sequence at least once.

seguid() → str[source]: SEGUID checksum for the sequence.

isblunt() → bool[source]

isblunt.

Return True if Dseq is linear and blunt and false if staggered or circular.

Examples

>>> from pydna.dseq import Dseq
>>> a=Dseq("gat")
>>> a
Dseq(-3)
gat
cta
>>> a.isblunt()
True
>>> a=Dseq("gat", "atcg")
>>> a
Dseq(-4)
 gat
gcta
>>> a.isblunt()
False
>>> a=Dseq("gat", "gatc")
>>> a
Dseq(-4)
gat
ctag
>>> a.isblunt()
False
>>> a=Dseq("gat", circular=True)
>>> a
Dseq(o3)
gat
cta
>>> a.isblunt()
False

cas9(RNA: str) → Tuple[slice, ...][source]: docstring.

terminal_transferase(nucleotides='a') → Dseq[source]: docstring.

cut(*enzymes: EnzymesType) → Tuple[DseqType, ...][source]

Returns a list of linear Dseq fragments produced in the digestion. If there are no cuts, an empty list is returned.

Parameters:: enzymes (enzyme object or iterable of such objects) – A Bio.Restriction.XXX restriction objects or iterable.
Returns:: frags – list of Dseq objects formed by the digestion
Return type:: list

Examples

>>> from pydna.dseq import Dseq
>>> seq=Dseq("ggatccnnngaattc")
>>> seq
Dseq(-15)
ggatccnnngaattc
cctaggnnncttaag
>>> from Bio.Restriction import BamHI,EcoRI
>>> type(seq.cut(BamHI))
<class 'tuple'>
>>> for frag in seq.cut(BamHI): print(repr(frag))
Dseq(-5)
g
cctag
Dseq(-14)
gatccnnngaattc
    gnnncttaag
>>> seq.cut(EcoRI, BamHI) ==  seq.cut(BamHI, EcoRI)
True
>>> a,b,c = seq.cut(EcoRI, BamHI)
>>> a+b+c
Dseq(-15)
ggatccnnngaattc
cctaggnnncttaag
>>>

cutsite_is_valid(cutsite: Tuple[Tuple[int, int], _AbstractCut | None]) → bool[source]: Returns False if: - Cut positions fall outside the sequence (could be moved to Biopython) - Overhang is not double stranded - Recognition site is not double stranded or is outside the sequence - For enzymes that cut twice, it checks that at least one possibility is valid

get_cutsites(*enzymes: EnzymesType) → List[Tuple[Tuple[int, int], _AbstractCut | None]][source]

Returns a list of cutsites, represented represented as ((cut_watson, ovhg), enz):

cut_watson is a positive integer contained in [0,len(seq)), where seq is the sequence that will be cut. It represents the position of the cut on the watson strand, using the full sequence as a reference. By “full sequence” I mean the one you would get from str(Dseq).
ovhg is the overhang left after the cut. It has the same meaning as ovhg in the Bio.Restriction enzyme objects, or pydna’s Dseq property.
enz is the enzyme object. It’s not necessary to perform the cut, but can be
used to keep track of which enzyme was used.

Cuts are only returned if the recognition site and overhang are on the double-strand part of the sequence.

Parameters:: enzymes (Union[_RestrictionBatch,list[_AbstractCut]]) –
Return type:: list[tuple[tuple[int,int], _AbstractCut]]

Examples

>>> from Bio.Restriction import EcoRI
>>> from pydna.dseq import Dseq
>>> seq = Dseq('AAGAATTCAAGAATTC')
>>> seq.get_cutsites(EcoRI)
[((3, -4), EcoRI), ((11, -4), EcoRI)]

cut_watson is defined with respect to the “full sequence”, not the watson strand:

>>> dseq = Dseq.from_full_sequence_and_overhangs('aaGAATTCaa', 1, 0)
>>> dseq
Dseq(-10)
 aGAATTCaa
ttCTTAAGtt
>>> dseq.get_cutsites([EcoRI])
[((3, -4), EcoRI)]

Cuts are only returned if the recognition site and overhang are on the double-strand part of the sequence.

>>> Dseq('GAATTC').get_cutsites([EcoRI])
[((1, -4), EcoRI)]
>>> Dseq.from_full_sequence_and_overhangs('GAATTC', -1, 0).get_cutsites([EcoRI])
[]

left_end_position() → Tuple[int, int][source]

The index in the full sequence of the watson and crick start positions.

full sequence (str(self)) for all three cases is AAA

AAA              AA               AAT
 TT             TTT               TTT
Returns (0, 1)  Returns (1, 0)    Returns (0, 0)

right_end_position() → Tuple[int, int][source]

The index in the full sequence of the watson and crick end positions.

full sequence (str(self)) for all three cases is AAA

` AAA AA AAA TT TTT TTT Returns (3, 2) Returns (2, 3) Returns (3, 3) `

get_cut_parameters(cut: Tuple[Tuple[int, int], _AbstractCut | None] | None, is_left: bool) → Tuple[int, int, int][source]

For a given cut expressed as ((cut_watson, ovhg), enz), returns a tuple (cut_watson, cut_crick, ovhg).

cut_watson: see get_cutsites docs
cut_crick: equivalent of cut_watson in the crick strand
ovhg: see get_cutsites docs

The cut can be None if it represents the left or right end of the sequence. Then it will return the position of the watson and crick ends with respect to the “full sequence”. The is_left parameter is only used in this case.

apply_cut(left_cut: Tuple[Tuple[int, int], _AbstractCut | None], right_cut: Tuple[Tuple[int, int], _AbstractCut | None]) → Dseq[source]

Extracts a subfragment of the sequence between two cuts.

For more detail see the documentation of get_cutsite_pairs.

Parameters:

left_cut (Union[tuple[tuple[int,int], _AbstractCut], None]) –
right_cut (Union[tuple[tuple[int,int], _AbstractCut], None]) –

Return type:

Dseq

Examples

>>> from Bio.Restriction import EcoRI
>>> from pydna.dseq import Dseq
>>> dseq = Dseq('aaGAATTCaaGAATTCaa')
>>> cutsites = dseq.get_cutsites([EcoRI])
>>> cutsites
[((3, -4), EcoRI), ((11, -4), EcoRI)]
>>> p1, p2, p3 = dseq.get_cutsite_pairs(cutsites)
>>> p1
(None, ((3, -4), EcoRI))
>>> dseq.apply_cut(*p1)
Dseq(-7)
aaG
ttCTTAA
>>> p2
(((3, -4), EcoRI), ((11, -4), EcoRI))
>>> dseq.apply_cut(*p2)
Dseq(-12)
AATTCaaG
    GttCTTAA
>>> p3
(((11, -4), EcoRI), None)
>>> dseq.apply_cut(*p3)
Dseq(-7)
AATTCaa
    Gtt

>>> dseq = Dseq('TTCaaGAA', circular=True)
>>> cutsites = dseq.get_cutsites([EcoRI])
>>> cutsites
[((6, -4), EcoRI)]
>>> pair = dseq.get_cutsite_pairs(cutsites)[0]
>>> pair
(((6, -4), EcoRI), ((6, -4), EcoRI))
>>> dseq.apply_cut(*pair)
Dseq(-12)
AATTCaaG
    GttCTTAA

Returns pairs of cutsites that render the edges of the resulting fragments.

A fragment produced by restriction is represented by a tuple of length 2 that may contain cutsites or None:

Two cutsites: represents the extraction of a fragment between those two cutsites, in that orientation. To represent the opening of a circular molecule with a single cutsite, we put the same cutsite twice.

None, cutsite: represents the extraction of a fragment between the left edge of linear sequence and the cutsite.

cutsite, None: represents the extraction of a fragment between the cutsite and the right edge of a linear sequence.

Parameters:: cutsites (list[tuple[tuple[int,int], _AbstractCut]]) –
Return type:: list[tuple[tuple[tuple[int,int], _AbstractCut]|None],tuple[tuple[int,int], _AbstractCut]|None]

Examples

>>> from Bio.Restriction import EcoRI
>>> from pydna.dseq import Dseq
>>> dseq = Dseq('aaGAATTCaaGAATTCaa')
>>> cutsites = dseq.get_cutsites([EcoRI])
>>> cutsites
[((3, -4), EcoRI), ((11, -4), EcoRI)]
>>> dseq.get_cutsite_pairs(cutsites)
[(None, ((3, -4), EcoRI)), (((3, -4), EcoRI), ((11, -4), EcoRI)), (((11, -4), EcoRI), None)]

>>> dseq = Dseq('TTCaaGAA', circular=True)
>>> cutsites = dseq.get_cutsites([EcoRI])
>>> cutsites
[((6, -4), EcoRI)]
>>> dseq.get_cutsite_pairs(cutsites)
[(((6, -4), EcoRI), ((6, -4), EcoRI))]

pydna.all.read(data, ds=True)[source]

This function is similar the parse() function but expects one and only one sequence or and exception is thrown.

Parameters:

data (string) – see below
ds (bool) – Double stranded or single stranded DNA, if True return Dseqrecord objects, else Bio.SeqRecord objects.

Returns:

contains the first Dseqrecord or SeqRecord object parsed.

Return type:

Dseqrecord

Notes

The data parameter is similar to the data parameter for parse().

See also

parse

pydna.all.read_primer(data)[source]: Use this function to read a primer sequence from a string or a local file. The usage is similar to the parse_primer() function.

pydna.all.parse(data, ds=True)[source]

Return all DNA sequences found in data.

If no sequences are found, an empty list is returned. This is a greedy function, use carefully.

Parameters:

data (string or iterable) –
The data parameter is a string containing:
1. an absolute path to a local file. The file will be read in text mode and parsed for EMBL, FASTA and Genbank sequences. Can be a string or a Path object.
2. a string containing one or more sequences in EMBL, GENBANK, or FASTA format. Mixed formats are allowed.
3. data can be a list or other iterable where the elements are 1 or 2
ds (bool) – If True double stranded Dseqrecord objects are returned. If False single stranded Bio.SeqRecord [6] objects are returned.

Returns:

contains Dseqrecord or SeqRecord objects

Return type:

list

References

See also

read

pydna.all.parse_primers(data)[source]: docstring.

pydna.all.ape(*args, **kwargs)[source]: docstring.

pydna.all.primer_design(template, fp=None, rp=None, limit=13, target_tm=55.0, tm_func=_tm_default, estimate_function=None, **kwargs)[source]

This function designs a forward primer and a reverse primer for PCR amplification of a given template sequence.

The template argument is a Dseqrecord object or equivalent containing the template sequence.

The optional fp and rp arguments can contain an existing primer for the sequence (either the forward or reverse primer). One or the other primers can be specified, not both (since then there is nothing to design!, use the pydna.amplify.pcr function instead).

The limit argument is the minimum length of the primer. The default value is 13.

If one of the primers is given, the other primer is designed to match in terms of Tm. If both primers are designed, they will be designed to target_tm

tm_func is a function that takes an ascii string representing an oligonuceotide as argument and returns a float. Some useful functions can be found in the pydna.tm module, but can be substituted for a custom made function.

estimate_function is a tm_func-like function that is used to get a first guess for the primer design, that is then used as starting point for the final result. This is useful when the tm_func function is slow to calculate (e.g. it relies on an external API, such as the NEB primer design API). The estimate_function should be faster than the tm_func function. The default value is None. To use the default tm_func as estimate function to get the NEB Tm faster, you can do: primer_design(dseqr, target_tm=55, tm_func=tm_neb, estimate_function=tm_default).

The function returns a pydna.amplicon.Amplicon class instance. This object has the object.forward_primer and object.reverse_primer properties which contain the designed primers.

Parameters:

template (pydna.dseqrecord.Dseqrecord) – a Dseqrecord object. The only required argument.
fp (pydna.primer.Primer, optional) – optional pydna.primer.Primer objects containing one primer each.
rp (pydna.primer.Primer, optional) – optional pydna.primer.Primer objects containing one primer each.
target_tm (float, optional) – target tm for the primers, set to 55°C by default.
tm_func (function) – Function used for tm calculation. This function takes an ascii string representing an oligonuceotide as argument and returns a float. Some useful functions can be found in the pydna.tm module, but can be substituted for a custom made function.

Returns:

result

Return type:

Amplicon

Examples

>>> from pydna.dseqrecord import Dseqrecord
>>> t=Dseqrecord("atgactgctaacccttccttggtgttgaacaagatcgacgacatttcgttcgaaacttacgatg")
>>> t
Dseqrecord(-64)
>>> from pydna.design import primer_design
>>> ampl = primer_design(t)
>>> ampl
Amplicon(64)
>>> ampl.forward_primer
f64 17-mer:5'-atgactgctaacccttc-3'
>>> ampl.reverse_primer
r64 18-mer:5'-catcgtaagtttcgaacg-3'
>>> print(ampl.figure())
5atgactgctaacccttc...cgttcgaaacttacgatg3
                     ||||||||||||||||||
                    3gcaagctttgaatgctac5
5atgactgctaacccttc3
 |||||||||||||||||
3tactgacgattgggaag...gcaagctttgaatgctac5
>>> pf = "GGATCC" + ampl.forward_primer
>>> pr = "GGATCC" + ampl.reverse_primer
>>> pf
f64 23-mer:5'-GGATCCatgactgct..ttc-3'
>>> pr
r64 24-mer:5'-GGATCCcatcgtaag..acg-3'
>>> from pydna.amplify import pcr
>>> pcr_prod = pcr(pf, pr, t)
>>> print(pcr_prod.figure())
      5atgactgctaacccttc...cgttcgaaacttacgatg3
                           ||||||||||||||||||
                          3gcaagctttgaatgctacCCTAGG5
5GGATCCatgactgctaacccttc3
       |||||||||||||||||
      3tactgacgattgggaag...gcaagctttgaatgctac5
>>> print(pcr_prod.seq)
GGATCCatgactgctaacccttccttggtgttgaacaagatcgacgacatttcgttcgaaacttacgatgGGATCC
>>> from pydna.primer import Primer
>>> pf = Primer("atgactgctaacccttccttggtgttg", id="myprimer")
>>> ampl = primer_design(t, fp = pf)
>>> ampl.forward_primer
myprimer 27-mer:5'-atgactgctaaccct..ttg-3'
>>> ampl.reverse_primer
r64 32-mer:5'-catcgtaagtttcga..atc-3'

pydna.all.assembly_fragments(f, overlap=35, maxlink=40)[source]

This function return a list of pydna.amplicon.Amplicon objects where primers have been modified with tails so that the fragments can be fused in the order they appear in the list by for example Gibson assembly or homologous recombination.

Given that we have two linear pydna.amplicon.Amplicon objects a and b

we can modify the reverse primer of a and forward primer of b with tails to allow fusion by fusion PCR, Gibson assembly or in-vivo homologous recombination. The basic requirements for the primers for the three techniques are the same.

 _________ a _________
/                     \
agcctatcatcttggtctctgca
                  |||||
                 <gacgt
agcct>
|||||
tcggatagtagaaccagagacgt

                        __________ b ________
                       /                     \
                       TTTATATCGCATGACTCTTCTTT
                                         |||||
                                        <AGAAA
                       TTTAT>
                       |||||
                       AAATATAGCGTACTGAGAAGAAA

agcctatcatcttggtctctgcaTTTATATCGCATGACTCTTCTTT
||||||||||||||||||||||||||||||||||||||||||||||
tcggatagtagaaccagagacgtAAATATAGCGTACTGAGAAGAAA
\___________________ c ______________________/

Design tailed primers incorporating a part of the next or previous fragment to be assembled.

agcctatcatcttggtctctgca
|||||||||||||||||||||||
                gagacgtAAATATA

|||||||||||||||||||||||
tcggatagtagaaccagagacgt

                       TTTATATCGCATGACTCTTCTTT
                       |||||||||||||||||||||||

                ctctgcaTTTATAT
                       |||||||||||||||||||||||
                       AAATATAGCGTACTGAGAAGAAA

PCR products with flanking sequences are formed in the PCR process.

agcctatcatcttggtctctgcaTTTATAT
||||||||||||||||||||||||||||||
tcggatagtagaaccagagacgtAAATATA
                \____________/

                   identical
                   sequences
                 ____________
                /            \
                ctctgcaTTTATATCGCATGACTCTTCTTT
                ||||||||||||||||||||||||||||||
                gagacgtAAATATAGCGTACTGAGAAGAAA

The fragments can be fused by any of the techniques mentioned earlier to form c:

agcctatcatcttggtctctgcaTTTATATCGCATGACTCTTCTTT
||||||||||||||||||||||||||||||||||||||||||||||
tcggatagtagaaccagagacgtAAATATAGCGTACTGAGAAGAAA

The first argument of this function is a list of sequence objects containing Amplicons and other similar objects.

At least every second sequence object needs to be an Amplicon

This rule exists because if a sequence object is that is not a PCR product is to be fused with another fragment, that other fragment needs to be an Amplicon so that the primer of the other object can be modified to include the whole stretch of sequence homology needed for the fusion. See the example below where a is a non-amplicon (a linear plasmid vector for instance)

 _________ a _________           __________ b ________
/                     \         /                     \
agcctatcatcttggtctctgca   <-->  TTTATATCGCATGACTCTTCTTT
|||||||||||||||||||||||         |||||||||||||||||||||||
tcggatagtagaaccagagacgt                          <AGAAA
                                TTTAT>
                                |||||||||||||||||||||||
                          <-->  AAATATAGCGTACTGAGAAGAAA

     agcctatcatcttggtctctgcaTTTATATCGCATGACTCTTCTTT
     ||||||||||||||||||||||||||||||||||||||||||||||
     tcggatagtagaaccagagacgtAAATATAGCGTACTGAGAAGAAA
     \___________________ c ______________________/

In this case only the forward primer of b is fitted with a tail with a part a:

agcctatcatcttggtctctgca
|||||||||||||||||||||||
tcggatagtagaaccagagacgt

                       TTTATATCGCATGACTCTTCTTT
                       |||||||||||||||||||||||
                                        <AGAAA
         tcttggtctctgcaTTTATAT
                       |||||||||||||||||||||||
                       AAATATAGCGTACTGAGAAGAAA

PCR products with flanking sequences are formed in the PCR process.

agcctatcatcttggtctctgcaTTTATAT
||||||||||||||||||||||||||||||
tcggatagtagaaccagagacgtAAATATA
                \____________/

                   identical
                   sequences
                 ____________
                /            \
                ctctgcaTTTATATCGCATGACTCTTCTTT
                ||||||||||||||||||||||||||||||
                gagacgtAAATATAGCGTACTGAGAAGAAA

The fragments can be fused by for example Gibson assembly:

agcctatcatcttggtctctgcaTTTATAT
||||||||||||||||||||||||||||||
tcggatagtagaacca

                             TCGCATGACTCTTCTTT
                ||||||||||||||||||||||||||||||
                gagacgtAAATATAGCGTACTGAGAAGAAA

to form c:

agcctatcatcttggtctctgcaTTTATATCGCATGACTCTTCTTT
||||||||||||||||||||||||||||||||||||||||||||||
tcggatagtagaaccagagacgtAAATATAGCGTACTGAGAAGAAA

The first argument of this function is a list of sequence objects containing Amplicons and other similar objects.

The overlap argument controls how many base pairs of overlap required between adjacent sequence fragments. In the junction between Amplicons, tails with the length of about half of this value is added to the two primers closest to the junction.

>       <
Amplicon1
         Amplicon2
         >       <

         ⇣

>       <-
Amplicon1
         Amplicon2
        ->       <

In the case of an Amplicon adjacent to a Dseqrecord object, the tail will be twice as long (1*overlap) since the recombining sequence is present entirely on this primer:

Dseqrecd1
         Amplicon1
         >       <

         ⇣

Dseqrecd1
         Amplicon1
       -->       <

Note that if the sequence of DNA fragments starts or stops with an Amplicon, the very first and very last prinmer will not be modified i.e. assembles are always assumed to be linear. There are simple tricks around that for circular assemblies depicted in the last two examples below.

The maxlink arguments controls the cut off length for sequences that will be synhtesized by adding them to primers for the adjacent fragment(s). The argument list may contain short spacers (such as spacers between fusion proteins).

Example 1: Linear assembly of PCR products (pydna.amplicon.Amplicon class objects) ------

>       <         >       <
Amplicon1         Amplicon3
         Amplicon2         Amplicon4
         >       <         >       <

                     ⇣
                     pydna.design.assembly_fragments
                     ⇣

>       <-       ->       <-                      pydna.assembly.Assembly
Amplicon1         Amplicon3
         Amplicon2         Amplicon4     ➤  Amplicon1Amplicon2Amplicon3Amplicon4
        ->       <-       ->       <

Example 2: Linear assembly of alternating Amplicons and other fragments

>       <         >       <
Amplicon1         Amplicon2
         Dseqrecd1         Dseqrecd2

                     ⇣
                     pydna.design.assembly_fragments
                     ⇣

>       <--     -->       <--                     pydna.assembly.Assembly
Amplicon1         Amplicon2
         Dseqrecd1         Dseqrecd2     ➤  Amplicon1Dseqrecd1Amplicon2Dseqrecd2

Example 3: Linear assembly of alternating Amplicons and other fragments

Dseqrecd1         Dseqrecd2
         Amplicon1         Amplicon2
         >       <       -->       <

                     ⇣
             pydna.design.assembly_fragments
                     ⇣
                                                  pydna.assembly.Assembly
Dseqrecd1         Dseqrecd2
         Amplicon1         Amplicon2     ➤  Dseqrecd1Amplicon1Dseqrecd2Amplicon2
       -->       <--     -->       <

Example 4: Circular assembly of alternating Amplicons and other fragments

                 ->       <==
Dseqrecd1         Amplicon2
         Amplicon1         Dseqrecd1
       -->       <-
                     ⇣
                     pydna.design.assembly_fragments
                     ⇣
                                                   pydna.assembly.Assembly
                 ->       <==
Dseqrecd1         Amplicon2                    -Dseqrecd1Amplicon1Amplicon2-
         Amplicon1                       ➤    |                             |
       -->       <-                            -----------------------------

------ Example 5: Circular assembly of Amplicons

>       <         >       <
Amplicon1         Amplicon3
         Amplicon2         Amplicon1
         >       <         >       <

                     ⇣
                     pydna.design.assembly_fragments
                     ⇣

>       <=       ->       <-
Amplicon1         Amplicon3
         Amplicon2         Amplicon1
        ->       <-       +>       <

                     ⇣
             make new Amplicon using the Amplicon1.template and
             the last fwd primer and the first rev primer.
                     ⇣
                                                   pydna.assembly.Assembly
+>       <=       ->       <-
 Amplicon1         Amplicon3                  -Amplicon1Amplicon2Amplicon3-
          Amplicon2                      ➤   |                             |
         ->       <-                          -----------------------------

Parameters:

f (list of pydna.amplicon.Amplicon and other Dseqrecord like objects) – list Amplicon and Dseqrecord object for which fusion primers should be constructed.
overlap (int, optional) – Length of required overlap between fragments.
maxlink (int, optional) – Maximum length of spacer sequences that may be present in f. These will be included in tails for designed primers.

Returns:

seqs –

[Amplicon1,
 Amplicon2, ...]

Return type:

list of pydna.amplicon.Amplicon and other Dseqrecord like objects pydna.amplicon.Amplicon objects

Examples

>>> from pydna.dseqrecord import Dseqrecord
>>> from pydna.design import primer_design
>>> a=primer_design(Dseqrecord("atgactgctaacccttccttggtgttgaacaagatcgacgacatttcgttcgaaacttacgatg"))
>>> b=primer_design(Dseqrecord("ccaaacccaccaggtaccttatgtaagtacttcaagtcgccagaagacttcttggtcaagttgcc"))
>>> c=primer_design(Dseqrecord("tgtactggtgctgaaccttgtatcaagttgggtgttgacgccattgccccaggtggtcgtttcgtt"))
>>> from pydna.design import assembly_fragments
>>> # We would like a circular recombination, so the first sequence has to be repeated
>>> fa1,fb,fc,fa2 = assembly_fragments([a,b,c,a])
>>> # Since all fragments are Amplicons, we need to extract the rp of the 1st and fp of the last fragments.
>>> from pydna.amplify import pcr
>>> fa = pcr(fa2.forward_primer, fa1.reverse_primer, a)
>>> [fa,fb,fc]
[Amplicon(100), Amplicon(101), Amplicon(102)]
>>> fa.name, fb.name, fc.name = "fa fb fc".split()
>>> from pydna.assembly import Assembly
>>> assemblyobj = Assembly([fa,fb,fc])
>>> assemblyobj
Assembly
fragments....: 100bp 101bp 102bp
limit(bp)....: 25
G.nodes......: 6
algorithm....: common_sub_strings
>>> assemblyobj.assemble_linear()
[Contig(-231), Contig(-166), Contig(-36)]
>>> assemblyobj.assemble_circular()[0].seguid()
'cdseguid=85t6tfcvWav0wnXEIb-lkUtrl4s'
>>> (a+b+c).looped().seguid()
'cdseguid=85t6tfcvWav0wnXEIb-lkUtrl4s'
>>> print(assemblyobj.assemble_circular()[0].figure())
 -|fa|36
|     \/
|     /\
|     36|fb|36
|           \/
|           /\
|           36|fc|36
|                 \/
|                 /\
|                 36-
|                    |
 --------------------
>>>

pydna.all.circular_assembly_fragments(f, overlap=35, maxlink=40)[source]

pydna.all.eq(*args, **kwargs)[source]

Compare two or more DNA sequences for equality.

Compares two or more DNA sequences for equality i.e. if they represent the same double stranded DNA molecule.

Parameters:

args (iterable) – iterable containing sequences args can be strings, Biopython Seq or SeqRecord, Dseqrecord or dsDNA objects.
circular (bool, optional) – Consider all molecules circular or linear
linear (bool, optional) – Consider all molecules circular or linear

Returns:

eq – Returns True or False

Return type:

bool

Notes

Compares two or more DNA sequences for equality i.e. if they represent the same DNA molecule.

Two linear sequences are considiered equal if either:

They have the same sequence (case insensitive)
One sequence is the reverse complement of the other

Two circular sequences are considered equal if they are circular permutations meaning that they have the same length and:

One sequence can be found in the concatenation of the other sequence with itself.
The reverse complement of one sequence can be found in the concatenation of the other sequence with itself.

The topology for the comparison can be set using one of the keywords linear or circular to True or False.

If circular or linear is not set, it will be deduced from the topology of each sequence for sequences that have a linear or circular attribute (like Dseq and Dseqrecord).

Examples

>>> from pydna.dseqrecord import Dseqrecord
>>> from pydna.utils import eq
>>> eq("aaa","AAA")
True
>>> eq("aaa","AAA","TTT")
True
>>> eq("aaa","AAA","TTT","tTt")
True
>>> eq("aaa","AAA","TTT","tTt", linear=True)
True
>>> eq("Taaa","aTaa", linear = True)
False
>>> eq("Taaa","aTaa", circular = True)
True
>>> a=Dseqrecord("Taaa")
>>> b=Dseqrecord("aTaa")
>>> eq(a,b)
False
>>> eq(a,b,circular=True)
True
>>> a=a.looped()
>>> b=b.looped()
>>> eq(a,b)
True
>>> eq(a,b,circular=False)
False
>>> eq(a,b,linear=True)
False
>>> eq(a,b,linear=False)
True
>>> eq("ggatcc","GGATCC")
True
>>> eq("ggatcca","GGATCCa")
True
>>> eq("ggatcca","tGGATCC")
True

pydna.all.gbtext_clean(gbtext)[source]

This function takes a string containing one genbank sequence in Genbank format and returns a named tuple containing two fields, the gbtext containing a string with the corrected genbank sequence and jseq which contains the JSON intermediate.

Examples

>>> s = '''LOCUS       New_DNA      3 bp    DNA   CIRCULAR SYN        19-JUN-2013
... DEFINITION  .
... ACCESSION
... VERSION
... SOURCE      .
...   ORGANISM  .
... COMMENT
... COMMENT     ApEinfo:methylated:1
... ORIGIN
...         1 aaa
... //'''
>>> from pydna.readers import read
>>> read(s)  
/home/bjorn/anaconda3/envs/bjorn36/lib/python3.6/site-packages/Bio/GenBank/Scanner.py:1388: BiopythonParserWarning: Malformed LOCUS line found - is this correct?
:'LOCUS       New_DNA      3 bp    DNA   CIRCULAR SYN        19-JUN-2013\n'
  "correct?\n:%r" % line, BiopythonParserWarning)
Traceback (most recent call last):
  File "/home/bjorn/python_packages/pydna/pydna/readers.py", line 48, in read
    results = results.pop()
IndexError: pop from empty list

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/bjorn/python_packages/pydna/pydna/readers.py", line 50, in read
    raise ValueError("No sequences found in data:\n({})".format(data[:79]))
ValueError: No sequences found in data:
(LOCUS       New_DNA      3 bp    DNA   CIRCULAR SYN        19-JUN-2013
DEFINITI)
>>> from pydna.genbankfixer import gbtext_clean
>>> s2, j2 = gbtext_clean(s)
>>> print(s2)
LOCUS       New_DNA                    3 bp ds-DNA     circular SYN 19-JUN-2013
DEFINITION  .
ACCESSION
VERSION
SOURCE      .
ORGANISM  .
COMMENT
COMMENT     ApEinfo:methylated:1
FEATURES             Location/Qualifiers
ORIGIN
        1 aaa
//
>>> s3 = read(s2)
>>> s3
Dseqrecord(o3)
>>> print(s3.format())
LOCUS       New_DNA                    3 bp    DNA     circular SYN 19-JUN-2013
DEFINITION  .
ACCESSION   New_DNA
VERSION     New_DNA
KEYWORDS    .
SOURCE
  ORGANISM  .
            .
COMMENT
            ApEinfo:methylated:1
FEATURES             Location/Qualifiers
ORIGIN
        1 aaa
//

class pydna.all.PrimerList(initlist: ~typing.Iterable = None, path: (<class 'str'>, <class 'pathlib.Path'>) = None, *args, identifier: str = "p", **kwargs)[source]

Bases: UserList

Read a text file with primers.

The primers can be of any format readable by the parse_primers function. Lines beginning with # are ignored. Path defaults to the path given by the pydna_primers environment variable.

The primer list does not accept new primers. Use the assign_numbers_to_new_primers method and paste the new primers at the top of the list.

The primer list remembers the numbers of accessed primers. The indices of accessed primers are stored in the .accessed property.

property accessed: docstring.

assign_numbers(lst: list)[source]

Find new primers in lst.

Returns a string containing new primers with their assigned numbers. This string can be copied and pasted to the primer text file.

pydna_code_from_list(lst: list)[source]: Pydna code for a list of primer objects.

open_folder()[source]: Open folder where primer file is located.

code(lst: list): Pydna code for a list of primer objects.