pydna.dseqrecord

This module provides the Dseqrecord class, for handling double stranded DNA sequences. The Dseqrecord holds sequence information in the form of a pydna.dseq.Dseq object. The Dseq and Dseqrecord classes are subclasses of Biopythons Seq and SeqRecord classes, respectively.

The Dseq and Dseqrecord classes support the notion of circular and linear DNA topology.

class pydna.dseqrecord.Dseqrecord(record, *args, circular=None, n=5e-14, **kwargs)[source]

Bases: SeqRecord

Dseqrecord is a double stranded version of the Biopython SeqRecord [1] class. The Dseqrecord object holds a Dseq object describing the sequence. Additionally, Dseqrecord hold meta information about the sequence in the from of a list of SeqFeatures, in the same way as the SeqRecord does.

The Dseqrecord can be initialized with a string, Seq, Dseq, SeqRecord or another Dseqrecord. The sequence information will be stored in a Dseq object in all cases.

Dseqrecord objects can be read or parsed from sequences in FASTA, EMBL or Genbank formats. See the pydna.readers and pydna.parsers modules for further information.

There is a short representation associated with the Dseqrecord. Dseqrecord(-3) represents a linear sequence of length 2 while Dseqrecord(o7) represents a circular sequence of length 7.

Dseqrecord and Dseq share the same concept of length. This length can be larger than each strand alone if they are staggered as in the example below.

<-- length -->
GATCCTTT
     AAAGCCTAG

Parameters:

record (string, Seq, SeqRecord, Dseq or other Dseqrecord object) – This data will be used to form the seq property
circular (bool, optional) – True or False reflecting the shape of the DNA molecule
linear (bool, optional) – True or False reflecting the shape of the DNA molecule

Examples

>>> from pydna.dseqrecord import Dseqrecord
>>> a=Dseqrecord("aaa")
>>> a
Dseqrecord(-3)
>>> a.seq
Dseq(-3)
aaa
ttt
>>> from pydna.seq import Seq
>>> b=Dseqrecord(Seq("aaa"))
>>> b
Dseqrecord(-3)
>>> b.seq
Dseq(-3)
aaa
ttt
>>> from Bio.SeqRecord import SeqRecord
>>> c=Dseqrecord(SeqRecord(Seq("aaa")))
>>> c
Dseqrecord(-3)
>>> c.seq
Dseq(-3)
aaa
ttt

References

classmethod from_string(record: str = '', *args, circular=False, n=5e-14, **kwargs)[source]: docstring.

classmethod from_SeqRecord(record: SeqRecord, *args, circular=None, n=5e-14, **kwargs)[source]

property circular: The circular property can not be set directly. Use looped()

m()[source]: This method returns the mass of the DNA molecule in grams. This is calculated as the product between the molecular weight of the Dseq object and the

extract_feature(n)[source]

Extracts a feature and creates a new Dseqrecord object.

Parameters:: n (int) – Indicates the feature to extract

Examples

>>> from pydna.dseqrecord import Dseqrecord
>>> a=Dseqrecord("atgtaa")
>>> a.add_feature(2,4)
>>> b=a.extract_feature(0)
>>> b
Dseqrecord(-2)
>>> b.seq
Dseq(-2)
gt
ca

add_feature(x=None, y=None, seq=None, type_='misc', strand=1, *args, **kwargs)[source]

Add a feature of type misc to the feature list of the sequence.

Parameters:

x (int) – Indicates start of the feature
y (int) – Indicates end of the feature

Examples

>>> from pydna.seqrecord import SeqRecord
>>> a=SeqRecord("atgtaa")
>>> a.features
[]
>>> a.add_feature(2,4)
>>> a.features
[SeqFeature(SimpleLocation(ExactPosition(2), ExactPosition(4), strand=1), type='misc', qualifiers=...)]

seguid()[source]

Url safe SEGUID for the sequence.

This checksum is the same as seguid but with base64.urlsafe encoding instead of the normal base64. This means that the characters + and / are replaced with - and _ so that the checksum can be part of a URL.

Examples

>>> from pydna.dseqrecord import Dseqrecord
>>> a = Dseqrecord("aa")
>>> a.seguid()
'ldseguid=TEwydy0ugvGXh3VJnVwgtxoyDQA'

looped()[source]

Circular version of the Dseqrecord object.

The underlying linear Dseq object has to have compatible ends.

Examples

>>> from pydna.dseqrecord import Dseqrecord
>>> a=Dseqrecord("aaa")
>>> a
Dseqrecord(-3)
>>> b=a.looped()
>>> b
Dseqrecord(o3)
>>>

See also

pydna.dseq.Dseq.looped

tolinear()[source]

Returns a linear, blunt copy of a circular Dseqrecord object. The underlying Dseq object has to be circular.

This method is deprecated, use slicing instead. See example below.

Examples

>>> from pydna.dseqrecord import Dseqrecord
>>> a=Dseqrecord("aaa", circular = True)
>>> a
Dseqrecord(o3)
>>> b=a[:]
>>> b
Dseqrecord(-3)
>>>

terminal_transferase(nucleotides='a')[source]: docstring.

format(f='gb')[source]

Returns the sequence as a string using a format supported by Biopython SeqIO [2]. Default is “gb” which is short for Genbank.

Examples

>>> from pydna.dseqrecord import Dseqrecord
>>> x=Dseqrecord("aaa")
>>> x.annotations['date'] = '02-FEB-2013'
>>> x
Dseqrecord(-3)
>>> print(x.format("gb"))
LOCUS       name                       3 bp    DNA     linear   UNK 02-FEB-2013
DEFINITION  description.
ACCESSION   id
VERSION     id
KEYWORDS    .
SOURCE      .
  ORGANISM  .
            .
FEATURES             Location/Qualifiers
ORIGIN
        1 aaa
//

References

write(filename=None, f='gb')[source]

Writes the Dseqrecord to a file using the format f, which must be a format supported by Biopython SeqIO for writing [3]. Default is “gb” which is short for Genbank. Note that Biopython SeqIO reads more formats than it writes.

Filename is the path to the file where the sequece is to be written. The filename is optional, if it is not given, the description property (string) is used together with the format.

If obj is the Dseqrecord object, the default file name will be:

<obj.locus>.<f>

Where <f> is “gb” by default. If the filename already exists and AND the sequence it contains is different, a new file name will be used so that the old file is not lost:

<obj.locus>_NEW.<f>

References

find(other)[source]

find_aminoacids(other)[source]

>>> from pydna.dseqrecord import Dseqrecord
>>> s=Dseqrecord("atgtacgatcgtatgctggttatattttag")
>>> s.seq.translate()
Seq('MYDRMLVIF*')
>>> "RML" in s
True
>>> "MMM" in s
False
>>> s.seq.rc().translate()
Seq('LKYNQHTIVH')
>>> "QHT" in s.rc()
True
>>> "QHT" in s
False
>>> slc = s.find_aa("RML")
>>> slc
slice(9, 18, None)
>>> s[slc]
Dseqrecord(-9)
>>> code = s[slc].seq
>>> code
Dseq(-9)
cgtatgctg
gcatacgac
>>> code.translate()
Seq('RML')

find_aa(other)

>>> from pydna.dseqrecord import Dseqrecord
>>> s=Dseqrecord("atgtacgatcgtatgctggttatattttag")
>>> s.seq.translate()
Seq('MYDRMLVIF*')
>>> "RML" in s
True
>>> "MMM" in s
False
>>> s.seq.rc().translate()
Seq('LKYNQHTIVH')
>>> "QHT" in s.rc()
True
>>> "QHT" in s
False
>>> slc = s.find_aa("RML")
>>> slc
slice(9, 18, None)
>>> s[slc]
Dseqrecord(-9)
>>> code = s[slc].seq
>>> code
Dseq(-9)
cgtatgctg
gcatacgac
>>> code.translate()
Seq('RML')

map_trace_files(pth, limit=25)[source]

linearize(*enzymes)[source]

Similar to :func:cut.

Throws an exception if there is not excactly one cut i.e. none or more than one digestion products.

no_cutters(batch: RestrictionBatch = None)[source]: docstring.

unique_cutters(batch: RestrictionBatch = None)[source]: docstring.

once_cutters(batch: RestrictionBatch = None)[source]: docstring.

twice_cutters(batch: RestrictionBatch = None)[source]: docstring.

n_cutters(n=3, batch: RestrictionBatch = None)[source]: docstring.

cutters(batch: RestrictionBatch = None)[source]: docstring.

number_of_cuts(*enzymes)[source]: The number of cuts by digestion with the Restriction enzymes contained in the iterable.

cas9(RNA: str)[source]: docstring.

reverse_complement()[source]

Reverse complement.

Examples

>>> from pydna.dseqrecord import Dseqrecord
>>> a=Dseqrecord("ggaatt")
>>> a
Dseqrecord(-6)
>>> a.seq
Dseq(-6)
ggaatt
ccttaa
>>> a.reverse_complement().seq
Dseq(-6)
aattcc
ttaagg
>>>

See also

pydna.dseq.Dseq.reverse_complement

rc()

Reverse complement.

Examples

>>> from pydna.dseqrecord import Dseqrecord
>>> a=Dseqrecord("ggaatt")
>>> a
Dseqrecord(-6)
>>> a.seq
Dseq(-6)
ggaatt
ccttaa
>>> a.reverse_complement().seq
Dseq(-6)
aattcc
ttaagg
>>>

See also

pydna.dseq.Dseq.reverse_complement

synced(ref, limit=25)[source]

This method returns a new circular sequence (Dseqrecord object), which has been rotated in such a way that there is maximum overlap between the sequence and ref, which may be a string, Biopython Seq, SeqRecord object or another Dseqrecord object.

The reason for using this could be to rotate a new recombinant plasmid so that it starts at the same position after cloning. See the example below:

Examples

>>> from pydna.dseqrecord import Dseqrecord
>>> a=Dseqrecord("gaat", circular=True)
>>> a.seq
Dseq(o4)
gaat
ctta
>>> d = a[2:] + a[:2]
>>> d.seq
Dseq(-4)
atga
tact
>>> insert=Dseqrecord("CCC")
>>> recombinant = (d+insert).looped()
>>> recombinant.seq
Dseq(o7)
atgaCCC
tactGGG
>>> recombinant.synced(a).seq
Dseq(o7)
gaCCCat
ctGGGta

upper()[source]

Returns an uppercase copy. >>> from pydna.dseqrecord import Dseqrecord >>> my_seq = Dseqrecord(“aAa”) >>> my_seq.seq Dseq(-3) aAa tTt >>> upper = my_seq.upper() >>> upper.seq Dseq(-3) AAA TTT >>>

Returns:: Dseqrecord object in uppercase
Return type:: Dseqrecord

See also

pydna.dseqrecord.Dseqrecord.lower

lower()[source]

>>> from pydna.dseqrecord import Dseqrecord
>>> my_seq = Dseqrecord("aAa")
>>> my_seq.seq
Dseq(-3)
aAa
tTt
>>> upper = my_seq.upper()
>>> upper.seq
Dseq(-3)
AAA
TTT
>>> lower = my_seq.lower()
>>> lower
Dseqrecord(-3)
>>>

Returns:: Dseqrecord object in lowercase
Return type:: Dseqrecord

See also

pydna.dseqrecord.Dseqrecord.upper

orfs(minsize=300)[source]: docstring.

orfs_to_features(minsize=300)[source]: docstring.

copy_gb_to_clipboard()[source]: docstring.

copy_fasta_to_clipboard()[source]: docstring.

figure(feature=0, highlight='\x1b[48;5;11m', plain='\x1b[0m')[source]: docstring.

shifted(shift)[source]

Circular Dseqrecord with a new origin <shift>.

This only works on circular Dseqrecords. If we consider the following circular sequence:

GAAAT <-- watson strand

CTTTA <-- crick strand

The T and the G on the watson strand are linked together as well as the A and the C of the of the crick strand.

if shift is 1, this indicates a new origin at position 1:

new origin at the | symbol:

G|AAAT

C|TTTA

new sequence:

AAATG

TTTAC

Examples

>>> from pydna.dseqrecord import Dseqrecord
>>> a=Dseqrecord("aaat",circular=True)
>>> a
Dseqrecord(o4)
>>> a.seq
Dseq(o4)
aaat
ttta
>>> b=a.shifted(1)
>>> b
Dseqrecord(o4)
>>> b.seq
Dseq(o4)
aata
ttat

cut(*enzymes)[source]

Digest a Dseqrecord object with one or more restriction enzymes.

returns a list of linear Dseqrecords. If there are no cuts, an empty list is returned.

See also Dseq.cut() :param enzymes: A Bio.Restriction.XXX restriction object or iterable of such. :type enzymes: enzyme object or iterable of such objects

Returns:: Dseqrecord_frags – list of Dseqrecord objects formed by the digestion
Return type:: list

Examples

>>> from pydna.dseqrecord import Dseqrecord
>>> a=Dseqrecord("ggatcc")
>>> from Bio.Restriction import BamHI
>>> a.cut(BamHI)
(Dseqrecord(-5), Dseqrecord(-5))
>>> frag1, frag2 = a.cut(BamHI)
>>> frag1.seq
Dseq(-5)
g
cctag
>>> frag2.seq
Dseq(-5)
gatcc
    g

apply_cut(left_cut, right_cut)[source]