pydna.dseq

Provides the Dseq class for handling double stranded DNA sequences.

Dseq is a subclass of Bio.Seq.Seq. The Dseq class is mostly useful as a part of the pydna.dseqrecord.Dseqrecord class which can hold more meta data.

The Dseq class support the notion of circular and linear DNA topology.

class pydna.dseq.Dseq(watson: str | bytes, crick: str | bytes | None = None, ovhg=None, circular=False, pos=0)[source]

Bases: Seq

Dseq holds information for a double stranded DNA fragment.

Dseq also holds information describing the topology of the DNA fragment (linear or circular).

Parameters:
  • watson (str) – a string representing the watson (sense) DNA strand.

  • crick (str, optional) – a string representing the crick (antisense) DNA strand.

  • ovhg (int, optional) – A positive or negative number to describe the stagger between the watson and crick strands. see below for a detailed explanation.

  • linear (bool, optional) – True indicates that sequence is linear, False that it is circular.

  • circular (bool, optional) – True indicates that sequence is circular, False that it is linear.

Examples

Dseq is a subclass of the Biopython Seq object. It stores two strings representing the watson (sense) and crick(antisense) strands. two properties called linear and circular, and a numeric value ovhg (overhang) describing the stagger for the watson and crick strand in the 5’ end of the fragment.

The most common usage is probably to create a Dseq object as a part of a Dseqrecord object (see pydna.dseqrecord.Dseqrecord).

There are three ways of creating a Dseq object directly listed below, but you can also use the function Dseq.from_full_sequence_and_overhangs() to create a Dseq:

Only one argument (string):

>>> from pydna.dseq import Dseq
>>> Dseq("aaa")
Dseq(-3)
aaa
ttt

The given string will be interpreted as the watson strand of a blunt, linear double stranded sequence object. The crick strand is created automatically from the watson strand.

Two arguments (string, string):

>>> from pydna.dseq import Dseq
>>> Dseq("gggaaat","ttt")
Dseq(-7)
gggaaat
   ttt

If both watson and crick are given, but not ovhg an attempt will be made to find the best annealing between the strands. There are limitations to this. For long fragments it is quite slow. The length of the annealing sequences have to be at least half the length of the shortest of the strands.

Three arguments (string, string, ovhg=int):

The ovhg parameter is an integer describing the length of the crick strand overhang in the 5’ end of the molecule.

The ovhg parameter controls the stagger at the five prime end:

dsDNA       overhang

  nnn...    2
nnnnn...

 nnnn...    1
nnnnn...

nnnnn...    0
nnnnn...

nnnnn...   -1
 nnnn...

nnnnn...   -2
  nnn...

Example of creating Dseq objects with different amounts of stagger:

>>> Dseq(watson="agt", crick="actta", ovhg=-2)
Dseq(-7)
agt
  attca
>>> Dseq(watson="agt",crick="actta",ovhg=-1)
Dseq(-6)
agt
 attca
>>> Dseq(watson="agt",crick="actta",ovhg=0)
Dseq(-5)
agt
attca
>>> Dseq(watson="agt",crick="actta",ovhg=1)
Dseq(-5)
 agt
attca
>>> Dseq(watson="agt",crick="actta",ovhg=2)
Dseq(-5)
  agt
attca

If the ovhg parameter is specified a crick strand also needs to be supplied, otherwise an exception is raised.

>>> Dseq(watson="agt", ovhg=2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/pydna_/dsdna.py", line 169, in __init__
    else:
ValueError: ovhg defined without crick strand!

The shape of the fragment is set by circular = True, False

Note that both ends of the DNA fragment has to be compatible to set circular = True.

>>> Dseq("aaa","ttt")
Dseq(-3)
aaa
ttt
>>> Dseq("aaa","ttt",ovhg=0)
Dseq(-3)
aaa
ttt
>>> Dseq("aaa","ttt",ovhg=1)
Dseq(-4)
 aaa
ttt
>>> Dseq("aaa","ttt",ovhg=-1)
Dseq(-4)
aaa
 ttt
>>> Dseq("aaa", "ttt", circular = True , ovhg=0)
Dseq(o3)
aaa
ttt
>>> a=Dseq("tttcccc","aaacccc")
>>> a
Dseq(-11)
    tttcccc
ccccaaa
>>> a.ovhg
4
>>> b=Dseq("ccccttt","ccccaaa")
>>> b
Dseq(-11)
ccccttt
    aaacccc
>>> b.ovhg
-4
>>>

Coercing to string

>>> str(a)
'ggggtttcccc'

A Dseq object can be longer that either the watson or crick strands.

<-- length -->
GATCCTTT
     AAAGCCTAG

<-- length -->
      GATCCTTT
AAAGCCCTA

The slicing of a linear Dseq object works mostly as it does for a string.

>>> s="ggatcc"
>>> s[2:3]
'a'
>>> s[2:4]
'at'
>>> s[2:4:-1]
''
>>> s[::2]
'gac'
>>> from pydna.dseq import Dseq
>>> d=Dseq(s, circular=False)
>>> d[2:3]
Dseq(-1)
a
t
>>> d[2:4]
Dseq(-2)
at
ta
>>> d[2:4:-1]
Dseq(-0)


>>> d[::2]
Dseq(-3)
gac
ctg

The slicing of a circular Dseq object has a slightly different meaning.

>>> s="ggAtCc"
>>> d=Dseq(s, circular=True)
>>> d
Dseq(o6)
ggAtCc
ccTaGg
>>> d[4:3]
Dseq(-5)
CcggA
GgccT

The slice [X:X] produces an empty slice for a string, while this will return the linearized sequence starting at X:

>>> s="ggatcc"
>>> d=Dseq(s, circular=True)
>>> d
Dseq(o6)
ggatcc
cctagg
>>> d[3:3]
Dseq(-6)
tccgga
aggcct
>>>
trunc = 30
classmethod quick(watson: str, crick: str, ovhg=0, circular=False, pos=0)[source]
classmethod from_string(dna: str, *args, circular=False, **kwargs)[source]
classmethod from_representation(dsdna: str, *args, **kwargs)[source]
classmethod from_full_sequence_and_overhangs(full_sequence: str, crick_ovhg: int, watson_ovhg: int)[source]

Create a linear Dseq object from a full sequence and the 3’ overhangs of each strand.

The order of the parameters is like this because the 3’ overhang of the crick strand is the one on the left side of the sequence.

Parameters:
  • full_sequence (str) – The full sequence of the Dseq object.

  • crick_ovhg (int) – The overhang of the crick strand in the 3’ end. Equivalent to Dseq.ovhg.

  • watson_ovhg (int) – The overhang of the watson strand in the 5’ end.

Returns:

A Dseq object.

Return type:

Dseq

Examples

>>> Dseq.from_full_sequence_and_overhangs('AAAAAA', crick_ovhg=2, watson_ovhg=2)
Dseq(-6)
  AAAA
TTTT
>>> Dseq.from_full_sequence_and_overhangs('AAAAAA', crick_ovhg=-2, watson_ovhg=2)
Dseq(-6)
AAAAAA
  TT
>>> Dseq.from_full_sequence_and_overhangs('AAAAAA', crick_ovhg=2, watson_ovhg=-2)
Dseq(-6)
  AA
TTTTTT
>>> Dseq.from_full_sequence_and_overhangs('AAAAAA', crick_ovhg=-2, watson_ovhg=-2)
Dseq(-6)
AAAA
  TTTT
mw() float[source]

This method returns the molecular weight of the DNA molecule in g/mol. The following formula is used:

MW = (A x 313.2) + (T x 304.2) +
     (C x 289.2) + (G x 329.2) +
     (N x 308.9) + 79.0
upper() DseqType[source]

Return an upper case copy of the sequence.

>>> from pydna.dseq import Dseq
>>> my_seq = Dseq("aAa")
>>> my_seq
Dseq(-3)
aAa
tTt
>>> my_seq.upper()
Dseq(-3)
AAA
TTT
Returns:

Dseq object in uppercase

Return type:

Dseq

lower() DseqType[source]

Return a lower case copy of the sequence.

>>> from pydna.dseq import Dseq
>>> my_seq = Dseq("aAa")
>>> my_seq
Dseq(-3)
aAa
tTt
>>> my_seq.lower()
Dseq(-3)
aaa
ttt
Returns:

Dseq object in lowercase

Return type:

Dseq

find(sub: _SeqAbstractBaseClass | str | bytes, start=0, end=_sys.maxsize) int[source]

This method behaves like the python string method of the same name.

Returns an integer, the index of the first occurrence of substring argument sub in the (sub)sequence given by [start:end].

Returns -1 if the subsequence is NOT found.

Parameters:
  • sub (string or Seq object) – a string or another Seq object to look for.

  • start (int, optional) – slice start.

  • end (int, optional) – slice end.

Examples

>>> from pydna.dseq import Dseq
>>> seq = Dseq("atcgactgacgtgtt")
>>> seq
Dseq(-15)
atcgactgacgtgtt
tagctgactgcacaa
>>> seq.find("gac")
3
>>> seq = Dseq(watson="agt",crick="actta",ovhg=-2)
>>> seq
Dseq(-7)
agt
  attca
>>> seq.find("taa")
2
reverse_complement() Dseq[source]

Dseq object where watson and crick have switched places.

This represents the same double stranded sequence.

Examples

>>> from pydna.dseq import Dseq
>>> a=Dseq("catcgatc")
>>> a
Dseq(-8)
catcgatc
gtagctag
>>> b=a.reverse_complement()
>>> b
Dseq(-8)
gatcgatg
ctagctac
>>>
rc() Dseq

Dseq object where watson and crick have switched places.

This represents the same double stranded sequence.

Examples

>>> from pydna.dseq import Dseq
>>> a=Dseq("catcgatc")
>>> a
Dseq(-8)
catcgatc
gtagctag
>>> b=a.reverse_complement()
>>> b
Dseq(-8)
gatcgatg
ctagctac
>>>
shifted(shift: int) DseqType[source]

Shifted version of a circular Dseq object.

looped() DseqType[source]

Circularized Dseq object.

This can only be done if the two ends are compatible, otherwise a TypeError is raised.

Examples

>>> from pydna.dseq import Dseq
>>> a=Dseq("catcgatc")
>>> a
Dseq(-8)
catcgatc
gtagctag
>>> a.looped()
Dseq(o8)
catcgatc
gtagctag
>>> a.T4("t")
Dseq(-8)
catcgat
 tagctag
>>> a.T4("t").looped()
Dseq(o7)
catcgat
gtagcta
>>> a.T4("a")
Dseq(-8)
catcga
  agctag
>>> a.T4("a").looped()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/pydna/dsdna.py", line 357, in looped
    if type5 == type3 and str(sticky5) == str(rc(sticky3)):
TypeError: DNA cannot be circularized.
5' and 3' sticky ends not compatible!
>>>
tolinear() DseqType[source]

Returns a blunt, linear copy of a circular Dseq object. This can only be done if the Dseq object is circular, otherwise a TypeError is raised.

This method is deprecated, use slicing instead. See example below.

Examples

>>> from pydna.dseq import Dseq
>>> a=Dseq("catcgatc", circular=True)
>>> a
Dseq(o8)
catcgatc
gtagctag
>>> a[:]
Dseq(-8)
catcgatc
gtagctag
>>>
five_prime_end() Tuple[str, str][source]

Returns a tuple describing the structure of the 5’ end of the DNA fragment

Examples

>>> from pydna.dseq import Dseq
>>> a=Dseq("aaa", "ttt")
>>> a
Dseq(-3)
aaa
ttt
>>> a.five_prime_end()
('blunt', '')
>>> a=Dseq("aaa", "ttt", ovhg=1)
>>> a
Dseq(-4)
 aaa
ttt
>>> a.five_prime_end()
("3'", 't')
>>> a=Dseq("aaa", "ttt", ovhg=-1)
>>> a
Dseq(-4)
aaa
 ttt
>>> a.five_prime_end()
("5'", 'a')
>>>
three_prime_end() Tuple[str, str][source]

Returns a tuple describing the structure of the 5’ end of the DNA fragment

>>> from pydna.dseq import Dseq
>>> a=Dseq("aaa", "ttt")
>>> a
Dseq(-3)
aaa
ttt
>>> a.three_prime_end()
('blunt', '')
>>> a=Dseq("aaa", "ttt", ovhg=1)
>>> a
Dseq(-4)
 aaa
ttt
>>> a.three_prime_end()
("3'", 'a')
>>> a=Dseq("aaa", "ttt", ovhg=-1)
>>> a
Dseq(-4)
aaa
 ttt
>>> a.three_prime_end()
("5'", 't')
>>>
watson_ovhg() int[source]

Returns the overhang of the watson strand at the three prime.

fill_in(nucleotides: None | str = None) Dseq[source]

Fill in of five prime protruding end with a DNA polymerase that has only DNA polymerase activity (such as exo-klenow [1]) and any combination of A, G, C or T. Default are all four nucleotides together.

Parameters:

nucleotides (str) –

Examples

>>> from pydna.dseq import Dseq
>>> a=Dseq("aaa", "ttt")
>>> a
Dseq(-3)
aaa
ttt
>>> a.fill_in()
Dseq(-3)
aaa
ttt
>>> b=Dseq("caaa", "cttt")
>>> b
Dseq(-5)
caaa
 tttc
>>> b.fill_in()
Dseq(-5)
caaag
gtttc
>>> b.fill_in("g")
Dseq(-5)
caaag
gtttc
>>> b.fill_in("tac")
Dseq(-5)
caaa
 tttc
>>> c=Dseq("aaac", "tttg")
>>> c
Dseq(-5)
 aaac
gttt
>>> c.fill_in()
Dseq(-5)
 aaac
gttt
>>>

References

transcribe() Seq[source]

Transcribe a DNA sequence into RNA and return the RNA sequence as a new Seq object.

Following the usual convention, the sequence is interpreted as the coding strand of the DNA double helix, not the template strand. This means we can get the RNA sequence just by switching T to U.

>>> from Bio.Seq import Seq
>>> coding_dna = Seq("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG")
>>> coding_dna
Seq('ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG')
>>> coding_dna.transcribe()
Seq('AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG')

The sequence is modified in-place and returned if inplace is True:

>>> sequence = MutableSeq("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG")
>>> sequence
MutableSeq('ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG')
>>> sequence.transcribe()
MutableSeq('AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG')
>>> sequence
MutableSeq('ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG')
>>> sequence.transcribe(inplace=True)
MutableSeq('AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG')
>>> sequence
MutableSeq('AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG')

As Seq objects are immutable, a TypeError is raised if transcribe is called on a Seq object with inplace=True.

Trying to transcribe an RNA sequence has no effect. If you have a nucleotide sequence which might be DNA or RNA (or even a mixture), calling the transcribe method will ensure any T becomes U.

Trying to transcribe a protein sequence will replace any T for Threonine with U for Selenocysteine, which has no biologically plausible rational.

>>> from Bio.Seq import Seq
>>> my_protein = Seq("MAIVMGRT")
>>> my_protein.transcribe()
Seq('MAIVMGRU')
translate(table='Standard', stop_symbol='*', to_stop=False, cds=False, gap='-') Seq[source]

Translate..

mung() Dseq[source]

Simulates treatment a nuclease with 5’-3’ and 3’-5’ single strand specific exonuclease activity (such as mung bean nuclease [2])

    ggatcc    ->     gatcc
     ctaggg          ctagg

     ggatcc   ->      ggatc
    tcctag            cctag

>>> from pydna.dseq import Dseq
>>> b=Dseq("caaa", "cttt")
>>> b
Dseq(-5)
caaa
 tttc
>>> b.mung()
Dseq(-3)
aaa
ttt
>>> c=Dseq("aaac", "tttg")
>>> c
Dseq(-5)
 aaac
gttt
>>> c.mung()
Dseq(-3)
aaa
ttt

References

T4(nucleotides=None) Dseq[source]

Fill in five prime protruding ends and chewing back three prime protruding ends by a DNA polymerase providing both 5’-3’ DNA polymerase activity and 3’-5’ nuclease acitivty (such as T4 DNA polymerase). This can be done in presence of any combination of the four A, G, C or T. Removing one or more nucleotides can facilitate engineering of sticky ends. Default are all four nucleotides together.

Parameters:

nucleotides (str) –

Examples

>>> from pydna.dseq import Dseq
>>> a=Dseq("gatcgatc")
>>> a
Dseq(-8)
gatcgatc
ctagctag
>>> a.T4()
Dseq(-8)
gatcgatc
ctagctag
>>> a.T4("t")
Dseq(-8)
gatcgat
 tagctag
>>> a.T4("a")
Dseq(-8)
gatcga
  agctag
>>> a.T4("g")
Dseq(-8)
gatcg
   gctag
>>>
t4(nucleotides=None) Dseq

Fill in five prime protruding ends and chewing back three prime protruding ends by a DNA polymerase providing both 5’-3’ DNA polymerase activity and 3’-5’ nuclease acitivty (such as T4 DNA polymerase). This can be done in presence of any combination of the four A, G, C or T. Removing one or more nucleotides can facilitate engineering of sticky ends. Default are all four nucleotides together.

Parameters:

nucleotides (str) –

Examples

>>> from pydna.dseq import Dseq
>>> a=Dseq("gatcgatc")
>>> a
Dseq(-8)
gatcgatc
ctagctag
>>> a.T4()
Dseq(-8)
gatcgatc
ctagctag
>>> a.T4("t")
Dseq(-8)
gatcgat
 tagctag
>>> a.T4("a")
Dseq(-8)
gatcga
  agctag
>>> a.T4("g")
Dseq(-8)
gatcg
   gctag
>>>
exo1_front(n=1) DseqType[source]

5’-3’ resection at the start (left side) of the molecule.

exo1_end(n=1) DseqType[source]

5’-3’ resection at the end (right side) of the molecule.

no_cutters(batch: RestrictionBatch | None = None) RestrictionBatch[source]

Enzymes in a RestrictionBatch not cutting sequence.

unique_cutters(batch: RestrictionBatch | None = None) RestrictionBatch[source]

Enzymes in a RestrictionBatch cutting sequence once.

once_cutters(batch: RestrictionBatch | None = None) RestrictionBatch

Enzymes in a RestrictionBatch cutting sequence once.

twice_cutters(batch: RestrictionBatch | None = None) RestrictionBatch[source]

Enzymes in a RestrictionBatch cutting sequence twice.

n_cutters(n=3, batch: RestrictionBatch | None = None) RestrictionBatch[source]

Enzymes in a RestrictionBatch cutting n times.

cutters(batch: RestrictionBatch | None = None) RestrictionBatch[source]

Enzymes in a RestrictionBatch cutting sequence at least once.

seguid() str[source]

SEGUID checksum for the sequence.

isblunt() bool[source]

isblunt.

Return True if Dseq is linear and blunt and false if staggered or circular.

Examples

>>> from pydna.dseq import Dseq
>>> a=Dseq("gat")
>>> a
Dseq(-3)
gat
cta
>>> a.isblunt()
True
>>> a=Dseq("gat", "atcg")
>>> a
Dseq(-4)
 gat
gcta
>>> a.isblunt()
False
>>> a=Dseq("gat", "gatc")
>>> a
Dseq(-4)
gat
ctag
>>> a.isblunt()
False
>>> a=Dseq("gat", circular=True)
>>> a
Dseq(o3)
gat
cta
>>> a.isblunt()
False
cas9(RNA: str) Tuple[slice, ...][source]

docstring.

terminal_transferase(nucleotides='a') Dseq[source]

docstring.

cut(*enzymes: EnzymesType) Tuple[DseqType, ...][source]

Returns a list of linear Dseq fragments produced in the digestion. If there are no cuts, an empty list is returned.

Parameters:

enzymes (enzyme object or iterable of such objects) – A Bio.Restriction.XXX restriction objects or iterable.

Returns:

frags – list of Dseq objects formed by the digestion

Return type:

list

Examples

>>> from pydna.dseq import Dseq
>>> seq=Dseq("ggatccnnngaattc")
>>> seq
Dseq(-15)
ggatccnnngaattc
cctaggnnncttaag
>>> from Bio.Restriction import BamHI,EcoRI
>>> type(seq.cut(BamHI))
<class 'tuple'>
>>> for frag in seq.cut(BamHI): print(repr(frag))
Dseq(-5)
g
cctag
Dseq(-14)
gatccnnngaattc
    gnnncttaag
>>> seq.cut(EcoRI, BamHI) ==  seq.cut(BamHI, EcoRI)
True
>>> a,b,c = seq.cut(EcoRI, BamHI)
>>> a+b+c
Dseq(-15)
ggatccnnngaattc
cctaggnnncttaag
>>>
cutsite_is_valid(cutsite: Tuple[Tuple[int, int], _AbstractCut | None]) bool[source]

Returns False if: - Cut positions fall outside the sequence (could be moved to Biopython) - Overhang is not double stranded - Recognition site is not double stranded or is outside the sequence - For enzymes that cut twice, it checks that at least one possibility is valid

get_cutsites(*enzymes: EnzymesType) List[Tuple[Tuple[int, int], _AbstractCut | None]][source]

Returns a list of cutsites, represented represented as ((cut_watson, ovhg), enz):

  • cut_watson is a positive integer contained in [0,len(seq)), where seq is the sequence that will be cut. It represents the position of the cut on the watson strand, using the full sequence as a reference. By “full sequence” I mean the one you would get from str(Dseq).

  • ovhg is the overhang left after the cut. It has the same meaning as ovhg in the Bio.Restriction enzyme objects, or pydna’s Dseq property.

  • enz is the enzyme object. It’s not necessary to perform the cut, but can be

    used to keep track of which enzyme was used.

Cuts are only returned if the recognition site and overhang are on the double-strand part of the sequence.

Parameters:

enzymes (Union[_RestrictionBatch,list[_AbstractCut]]) –

Return type:

list[tuple[tuple[int,int], _AbstractCut]]

Examples

>>> from Bio.Restriction import EcoRI
>>> from pydna.dseq import Dseq
>>> seq = Dseq('AAGAATTCAAGAATTC')
>>> seq.get_cutsites(EcoRI)
[((3, -4), EcoRI), ((11, -4), EcoRI)]

cut_watson is defined with respect to the “full sequence”, not the watson strand:

>>> dseq = Dseq.from_full_sequence_and_overhangs('aaGAATTCaa', 1, 0)
>>> dseq
Dseq(-10)
 aGAATTCaa
ttCTTAAGtt
>>> dseq.get_cutsites([EcoRI])
[((3, -4), EcoRI)]

Cuts are only returned if the recognition site and overhang are on the double-strand part of the sequence.

>>> Dseq('GAATTC').get_cutsites([EcoRI])
[((1, -4), EcoRI)]
>>> Dseq.from_full_sequence_and_overhangs('GAATTC', -1, 0).get_cutsites([EcoRI])
[]
left_end_position() Tuple[int, int][source]

The index in the full sequence of the watson and crick start positions.

full sequence (str(self)) for all three cases is AAA

AAA              AA               AAT
 TT             TTT               TTT
Returns (0, 1)  Returns (1, 0)    Returns (0, 0)
right_end_position() Tuple[int, int][source]

The index in the full sequence of the watson and crick end positions.

full sequence (str(self)) for all three cases is AAA

` AAA               AA                   AAA TT                TTT                  TTT Returns (3, 2)    Returns (2, 3)       Returns (3, 3) `

get_cut_parameters(cut: Tuple[Tuple[int, int], _AbstractCut | None] | None, is_left: bool) Tuple[int, int, int][source]

For a given cut expressed as ((cut_watson, ovhg), enz), returns a tuple (cut_watson, cut_crick, ovhg).

  • cut_watson: see get_cutsites docs

  • cut_crick: equivalent of cut_watson in the crick strand

  • ovhg: see get_cutsites docs

The cut can be None if it represents the left or right end of the sequence. Then it will return the position of the watson and crick ends with respect to the “full sequence”. The is_left parameter is only used in this case.

apply_cut(left_cut: Tuple[Tuple[int, int], _AbstractCut | None], right_cut: Tuple[Tuple[int, int], _AbstractCut | None]) Dseq[source]

Extracts a subfragment of the sequence between two cuts.

For more detail see the documentation of get_cutsite_pairs.

Parameters:
Return type:

Dseq

Examples

>>> from Bio.Restriction import EcoRI
>>> from pydna.dseq import Dseq
>>> dseq = Dseq('aaGAATTCaaGAATTCaa')
>>> cutsites = dseq.get_cutsites([EcoRI])
>>> cutsites
[((3, -4), EcoRI), ((11, -4), EcoRI)]
>>> p1, p2, p3 = dseq.get_cutsite_pairs(cutsites)
>>> p1
(None, ((3, -4), EcoRI))
>>> dseq.apply_cut(*p1)
Dseq(-7)
aaG
ttCTTAA
>>> p2
(((3, -4), EcoRI), ((11, -4), EcoRI))
>>> dseq.apply_cut(*p2)
Dseq(-12)
AATTCaaG
    GttCTTAA
>>> p3
(((11, -4), EcoRI), None)
>>> dseq.apply_cut(*p3)
Dseq(-7)
AATTCaa
    Gtt
>>> dseq = Dseq('TTCaaGAA', circular=True)
>>> cutsites = dseq.get_cutsites([EcoRI])
>>> cutsites
[((6, -4), EcoRI)]
>>> pair = dseq.get_cutsite_pairs(cutsites)[0]
>>> pair
(((6, -4), EcoRI), ((6, -4), EcoRI))
>>> dseq.apply_cut(*pair)
Dseq(-12)
AATTCaaG
    GttCTTAA
get_cutsite_pairs(cutsites: List[Tuple[Tuple[int, int], _AbstractCut | None]]) List[Tuple[None | Tuple[Tuple[int, int], _AbstractCut | None], None | Tuple[Tuple[int, int], _AbstractCut | None]]][source]

Returns pairs of cutsites that render the edges of the resulting fragments.

A fragment produced by restriction is represented by a tuple of length 2 that may contain cutsites or None:

  • Two cutsites: represents the extraction of a fragment between those two cutsites, in that orientation. To represent the opening of a circular molecule with a single cutsite, we put the same cutsite twice.

  • None, cutsite: represents the extraction of a fragment between the left edge of linear sequence and the cutsite.

  • cutsite, None: represents the extraction of a fragment between the cutsite and the right edge of a linear sequence.

Parameters:

cutsites (list[tuple[tuple[int,int], _AbstractCut]]) –

Return type:

list[tuple[tuple[tuple[int,int], _AbstractCut]|None],tuple[tuple[int,int], _AbstractCut]|None]

Examples

>>> from Bio.Restriction import EcoRI
>>> from pydna.dseq import Dseq
>>> dseq = Dseq('aaGAATTCaaGAATTCaa')
>>> cutsites = dseq.get_cutsites([EcoRI])
>>> cutsites
[((3, -4), EcoRI), ((11, -4), EcoRI)]
>>> dseq.get_cutsite_pairs(cutsites)
[(None, ((3, -4), EcoRI)), (((3, -4), EcoRI), ((11, -4), EcoRI)), (((11, -4), EcoRI), None)]
>>> dseq = Dseq('TTCaaGAA', circular=True)
>>> cutsites = dseq.get_cutsites([EcoRI])
>>> cutsites
[((6, -4), EcoRI)]
>>> dseq.get_cutsite_pairs(cutsites)
[(((6, -4), EcoRI), ((6, -4), EcoRI))]