Dseq holds information for a double stranded DNA fragment.
Dseq also holds information describing the topology of
the DNA fragment (linear or circular).
Parameters:
watson (str) – a string representing the watson (sense) DNA strand.
crick (str, optional) – a string representing the crick (antisense) DNA strand.
ovhg (int, optional) – A positive or negative number describing the stagger between the
watson and crick strands. See below for a detailed explanation.
linear (bool, optional) – True indicates that sequence is linear, False that it is circular.
circular (bool, optional) – True indicates that sequence is circular, False that it is linear.
Examples
Dseq is a subclass of the Biopython Seq object. It stores two
strings representing the watson (sense) and crick (antisense) strands,
two properties called linear and circular, and a numeric value ovhg
(overhang) describing the stagger between the watson and crick strands
at the 5’ end of the fragment.
The most common usage is probably to create a Dseq object as a
part of a Dseqrecord object (see pydna.dseqrecord.Dseqrecord).
There are three ways of creating a Dseq object directly, listed below. You can also
use the function Dseq.from_full_sequence_and_overhangs() to create a Dseq.
Only one argument (string):
The given string will be interpreted as the watson strand of a
blunt, linear double stranded sequence object. The crick strand
is created automatically from the watson strand.
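To make "created automatically" concrete, here is a minimal plain-Python sketch of deriving the crick strand of a blunt fragment as the reverse complement of the watson strand. The helper name reverse_complement is hypothetical and is not pydna's API; it only handles unambiguous a, c, g and t.

```python
# Hypothetical helper, not part of pydna: plain-Python reverse complement.
COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def reverse_complement(strand: str) -> str:
    """Return the reverse complement of a DNA string written 5'->3'."""
    return strand.translate(COMPLEMENT)[::-1]


watson = "catcgatc"
crick = reverse_complement(watson)  # 5'->3' sequence of the other strand

print(watson)       # catcgatc
print(crick[::-1])  # gtagctag (printed 3'->5' so it lines up under watson)
```

The crick strand is stored 5'->3' but displayed reversed, which is why the second printed line pairs base by base with the first.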
Two arguments (string, string):
If both watson and crick are given, but not ovhg, an attempt
will be made to find the best annealing between the strands.
There are limitations to this. For long fragments it is quite
slow. The annealing region has to be at least half the length
of the shorter of the two strands.
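The following is a rough sketch of how such an annealing search could work, assuming the best annealing is the longest stretch shared by the watson strand and the reverse complement of the crick strand. The function name best_annealing_ovhg and the brute-force search are illustrative assumptions, not pydna's implementation; the nested loops also show why this kind of search gets slow for long fragments.

```python
# Illustrative sketch only; names and algorithm are assumptions, not pydna code.
COMPLEMENT = str.maketrans("acgt", "tgca")


def rc(strand: str) -> str:
    return strand.translate(COMPLEMENT)[::-1]


def best_annealing_ovhg(watson: str, crick: str) -> int:
    """Guess ovhg from the longest common substring of watson and rc(crick)."""
    crick_rc = rc(crick)  # crick read in the same direction as watson
    best_len, best_i, best_j = 0, 0, 0
    for i in range(len(watson)):
        for j in range(len(crick_rc)):
            k = 0
            while (i + k < len(watson) and j + k < len(crick_rc)
                   and watson[i + k] == crick_rc[j + k]):
                k += 1
            if k > best_len:
                best_len, best_i, best_j = k, i, j
    # The annealing region must cover at least half of the shorter strand.
    if best_len < min(len(watson), len(crick)) / 2:
        raise ValueError("Could not anneal the two strands.")
    # Positive: crick protrudes on the left; negative: watson protrudes.
    return best_j - best_i


print(best_annealing_ovhg("gggaaat", "ttt"))  # -3
```

A real implementation would also have to decide what to do when several equally long matches exist; this sketch simply keeps the first one found.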
Three arguments (string, string, ovhg=int):
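The ovhg parameter is an integer describing the length of the
crick strand overhang at the 5’ end of the molecule.
It controls the stagger at the five prime end. A positive value means
that the crick strand protrudes, a negative value means that the watson
strand protrudes, and zero gives a blunt end.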
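As an illustration of the stagger, the sketch below draws a duplex for several ovhg values in plain Python. The function render_duplex is a hypothetical helper written for this example; it does not check base pairing and is not how pydna builds its repr.

```python
# Hypothetical helper for illustration; not part of pydna.
def render_duplex(watson: str, crick: str, ovhg: int) -> str:
    """Draw watson over crick (crick shown 3'->5') with the given stagger."""
    top = " " * max(ovhg, 0) + watson          # watson shifts right if ovhg > 0
    bottom = " " * max(-ovhg, 0) + crick[::-1]  # crick shifts right if ovhg < 0
    return top + "\n" + bottom


for ovhg in (2, 1, 0, -1, -2):
    print(f"ovhg = {ovhg}")
    print(render_duplex("nnnnn", "nnnnn", ovhg))
    print()
```

With ovhg = 2 the top strand is indented by two positions, so the crick strand sticks out at the left end; with ovhg = -2 the bottom strand is indented instead.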
If the ovhg parameter is specified a crick strand also
needs to be supplied, otherwise an exception is raised.
>>> Dseq(watson="agt", ovhg=2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/pydna_/dsdna.py", line 169, in __init__
    else:
ValueError: ovhg defined without crick strand!
The shape of the fragment is set by circular = True or circular = False.
Note that both ends of the DNA fragment have to be compatible to set
circular = True.
A linear Dseq object can be circularized with looped().
This can only be done if the two ends are compatible;
otherwise a TypeError is raised.
Examples
>>> from pydna.dseq import Dseq
>>> a = Dseq("catcgatc")
>>> a
Dseq(-8)
catcgatc
gtagctag
>>> a.looped()
Dseq(o8)
catcgatc
gtagctag
>>> a.T4("t")
Dseq(-8)
catcgat
 tagctag
>>> a.T4("t").looped()
Dseq(o7)
catcgat
gtagcta
>>> a.T4("a")
Dseq(-8)
catcga
  agctag
>>> a.T4("a").looped()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/pydna/dsdna.py", line 357, in looped
    if type5 == type3 and str(sticky5) == str(rc(sticky3)):
TypeError: DNA cannot be circularized.
5' and 3' sticky ends not compatible!
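The condition shown in the traceback (same end type, and the 5' sticky end equal to the reverse complement of the 3' sticky end) can be sketched in plain Python. The functions left_end, right_end and can_circularize are hypothetical names, and the (watson, crick, ovhg) triples are read off the printed fragments in the example above; this is an approximation of the check, not pydna's code.

```python
# Illustrative sketch; helper names are assumptions, not pydna's API.
COMPLEMENT = str.maketrans("acgt", "tgca")


def rc(strand: str) -> str:
    return strand.translate(COMPLEMENT)[::-1]


def left_end(watson, crick, ovhg):
    """End type and single-stranded sequence at the left end."""
    if ovhg < 0:
        return "5'", watson[:-ovhg]   # watson protrudes
    if ovhg > 0:
        return "3'", crick[-ovhg:]    # crick protrudes
    return "blunt", ""


def right_end(watson, crick, ovhg):
    """End type and single-stranded sequence at the right end."""
    right = len(watson) - len(crick) + ovhg
    if right < 0:
        return "5'", crick[:-right]   # crick protrudes
    if right > 0:
        return "3'", watson[-right:]  # watson protrudes
    return "blunt", ""


def can_circularize(watson, crick, ovhg):
    type5, sticky5 = left_end(watson, crick, ovhg)
    type3, sticky3 = right_end(watson, crick, ovhg)
    return type5 == type3 and sticky5 == rc(sticky3)


print(can_circularize("catcgatc", "gatcgatg", 0))  # True, blunt ends
print(can_circularize("catcgat", "gatcgat", -1))   # True, as after T4("t")
print(can_circularize("catcga", "gatcga", -2))     # False, as after T4("a")
```

In the last case the left overhang is "ca" while the reverse complement of the right overhang is "tc", so the ends cannot anneal.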
Fill in five prime protruding ends with a DNA polymerase
that has only DNA polymerase activity (such as exo-Klenow [1])
in the presence of any combination of A, G, C or T. The default is
all four nucleotides together.
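A minimal sketch of the fill-in reaction, assuming the fragment is described by its watson strand, crick strand (both 5'->3') and ovhg. The function fill_in below is a stand-alone illustration with a hypothetical signature, not the Dseq method itself: each recessed 3' end is extended along the protruding 5' template until a required nucleotide is missing.

```python
# Illustrative sketch; not the pydna implementation.
COMPLEMENT = str.maketrans("acgt", "tgca")


def rc(strand: str) -> str:
    return strand.translate(COMPLEMENT)[::-1]


def fill_in(watson, crick, ovhg, nucleotides="gatc"):
    """Return (watson, crick, ovhg) after filling in 5' protruding ends."""
    allowed = set(nucleotides.lower())

    def extension(template):
        added = ""
        for base in rc(template):  # bases in the order the polymerase adds them
            if base not in allowed:
                break              # polymerase stalls when a nucleotide is missing
            added += base
        return added

    # Left end: a protruding watson 5' end is the template for the crick 3' end.
    left_template = watson[:-ovhg] if ovhg < 0 else ""
    # Right end: a protruding crick 5' end is the template for the watson 3' end.
    right = len(watson) - len(crick) + ovhg
    right_template = crick[:-right] if right < 0 else ""

    crick_added = extension(left_template)
    watson_added = extension(right_template)
    return watson + watson_added, crick + crick_added, ovhg + len(crick_added)


print(fill_in("caaa", "cttt", -1))         # ('caaag', 'ctttg', 0)
print(fill_in("caaa", "cttt", -1, "tac"))  # ('caaa', 'cttt', -1)
```

In the second call no g is available, so neither end can be extended and the fragment is returned unchanged.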
Transcribe a DNA sequence into RNA and return the RNA sequence as a new Seq object.
Following the usual convention, the sequence is interpreted as the
coding strand of the DNA double helix, not the template strand. This
means we can get the RNA sequence just by switching T to U.
As Seq objects are immutable, a TypeError is raised if
transcribe is called on a Seq object with inplace=True.
Trying to transcribe an RNA sequence has no effect.
If you have a nucleotide sequence which might be DNA or RNA
(or even a mixture), calling the transcribe method will ensure
any T becomes U.
Trying to transcribe a protein sequence will replace any
T for Threonine with U for Selenocysteine, which has no
biologically plausible rationale.
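The behaviour described above amounts to a plain character substitution. The sketch below mimics it on ordinary Python strings; the function transcribe here is a stand-in written for illustration and is not the Seq method.

```python
# Stand-in for illustration; operates on plain strings, not Seq objects.
def transcribe(sequence: str) -> str:
    """Switch T to U (and t to u), leaving every other character alone."""
    return sequence.replace("T", "U").replace("t", "u")


print(transcribe("ATGGCCATT"))  # AUGGCCAUU
print(transcribe("AUGGCCAUU"))  # AUGGCCAUU (RNA is unchanged)
print(transcribe("MKT"))        # MKU (protein input: Thr becomes Sec, not meaningful)
```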
Fill in five prime protruding ends and chew back
three prime protruding ends with a DNA polymerase that provides both
5’-3’ DNA polymerase activity and 3’-5’ nuclease activity
(such as T4 DNA polymerase). This can be done in the presence of any
combination of the four nucleotides A, G, C or T. Removing one or more nucleotides
can facilitate engineering of sticky ends. The default is all four nucleotides together.
Returns False if:
- Cut positions fall outside the sequence (could be moved to Biopython)
- Overhang is not double stranded
- Recognition site is not double stranded or is outside the sequence
- For enzymes that cut twice, it checks that at least one possibility is valid
Returns a list of cutsites, represented as ((cut_watson, ovhg), enz):
- cut_watson is a non-negative integer in [0, len(seq)), where seq is the sequence
that will be cut. It represents the position of the cut on the watson strand, using the full
sequence as a reference. By “full sequence” I mean the one you would get from str(Dseq).
- ovhg is the overhang left after the cut. It has the same meaning as ovhg in
the Bio.Restriction enzyme objects, or pydna’s Dseq property.
- enz is the enzyme object. It is not needed to perform the cut, but can be
used to keep track of which enzyme was used.
Cuts are only returned if the recognition site and overhang are on the double-strand
part of the sequence.
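To show what this cutsite representation looks like, here is a small plain-Python sketch for a linear, fully double-stranded sequence. The function find_cutsites is hypothetical, and the enzyme is described by hand (site GAATTC, cut one base into the site, ovhg of -4, as for EcoRI) instead of using a Bio.Restriction object; a plain string stands in for enz.

```python
# Illustrative sketch; find_cutsites is a hypothetical helper, not pydna's API.
def find_cutsites(seq, site, cut_offset, ovhg, enz):
    """List ((cut_watson, ovhg), enz) for every occurrence of site in seq."""
    seq_upper = seq.upper()
    cutsites = []
    start = seq_upper.find(site)
    while start != -1:
        # cut_watson is counted from the start of the full sequence
        cutsites.append(((start + cut_offset, ovhg), enz))
        start = seq_upper.find(site, start + 1)
    return cutsites


cutsites = find_cutsites("aaGAATTCaaGAATTCaa", "GAATTC", 1, -4, "EcoRI")
print(cutsites)  # [((3, -4), 'EcoRI'), ((11, -4), 'EcoRI')]
```

This sketch ignores circular sequences, non-palindromic sites and single-stranded regions, all of which the real method has to handle.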
For a given cut expressed as ((cut_watson, ovhg), enz), returns
a tuple (cut_watson, cut_crick, ovhg).
cut_watson: see get_cutsites docs
cut_crick: equivalent of cut_watson in the crick strand
ovhg: see get_cutsites docs
The cut can be None if it represents the left or right end of the sequence.
Then it will return the position of the watson and crick ends with respect
to the “full sequence”. The is_left parameter is only used in this case.
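For a cut that is not None, the relationship between the three values can be sketched as follows. This assumes that cut_crick equals cut_watson minus ovhg, which is consistent with the Bio.Restriction sign convention (a negative ovhg is a 5' overhang), and that positions wrap around on circular sequences. The function cut_parameters is a hypothetical stand-alone helper and does not cover the None case.

```python
# Illustrative sketch under the stated assumptions; not pydna's implementation.
def cut_parameters(cut, seq_len, circular=False):
    """Return (cut_watson, cut_crick, ovhg) for a ((cut_watson, ovhg), enz) cut."""
    (cut_watson, ovhg), _enz = cut
    cut_crick = cut_watson - ovhg  # assumption: ovhg < 0 puts the crick cut downstream
    if circular:
        cut_crick %= seq_len       # wrap around the origin
    return cut_watson, cut_crick, ovhg


# An EcoRI-like cut in a linear 18 bp sequence: G^AATTC on watson, CTTAA^G on crick
print(cut_parameters(((3, -4), "EcoRI"), seq_len=18))                # (3, 7, -4)
# The same kind of cut near the origin of a 6 bp circular sequence
print(cut_parameters(((4, -4), "EcoRI"), seq_len=6, circular=True))  # (4, 2, -4)
```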
Returns pairs of cutsites that render the edges of the resulting fragments.
A fragment produced by restriction is represented by a tuple of length 2 that
may contain cutsites or None:
- Two cutsites: represents the extraction of a fragment between those two
cutsites, in that orientation. To represent the opening of a circular
molecule with a single cutsite, we put the same cutsite twice.
- None, cutsite: represents the extraction of a fragment between the left
edge of a linear sequence and the cutsite.
- cutsite, None: represents the extraction of a fragment between the cutsite
and the right edge of a linear sequence.
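The pairing rules above can be sketched in a few lines of plain Python. The function cutsite_pairs is a hypothetical helper that assumes the cutsites are already sorted by position; it pads a linear sequence with None at both ends and closes a circular one by repeating the first cutsite.

```python
# Illustrative sketch; cutsite_pairs is a hypothetical helper, not pydna's API.
def cutsite_pairs(cutsites, circular=False):
    """Pair consecutive cutsites into the edges of the resulting fragments."""
    if not cutsites:
        return []
    if circular:
        padded = [*cutsites, cutsites[0]]  # close the circle
    else:
        padded = [None, *cutsites, None]   # None marks a sequence end
    return list(zip(padded, padded[1:]))


a = ((3, -4), "EcoRI")
b = ((11, -4), "EcoRI")

print(cutsite_pairs([a, b]))             # three fragments from a linear sequence
print(cutsite_pairs([a], circular=True)) # one fragment: the opened circle
```

For the linear case this gives (None, a), (a, b) and (b, None); for a circular molecule with a single cutsite it gives (a, a).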