pydna.seqrecord

A subclass of the Biopython SeqRecord class.

Has a number of extra methods and uses the pydna._pretty_str.pretty_str class instread of str for a nicer output in the IPython shell.

class pydna.seqrecord.SeqRecord(seq, *args, id='id', name='name', description='description', **kwargs)[source]

Bases: SeqRecord

A subclass of the Biopython SeqRecord class.

Has a number of extra methods and uses the pydna._pretty_str.pretty_str class instread of str for a nicer output in the IPython shell.

classmethod from_Bio_SeqRecord(sr: SeqRecord)[source]: Creates a pydnaSeqRecord from a Biopython SeqRecord.

property locus: Alias for name property.

property accession: Alias for id property.

property definition: Alias for description property.

reverse_complement(*args, **kwargs)[source]: Return the reverse complement of the sequence.

rc(*args, **kwargs): Return the reverse complement of the sequence.

isorf(table=1)[source]

Detect if sequence is an open reading frame (orf) in the 5’-3’.

direction.

Translation tables are numbers according to the NCBI numbering [1].

Parameters:: table (int) – Sets the translation table, default is 1 (standard code)
Returns:: True if sequence is an orf, False otherwise.
Return type:: bool

Examples

>>> from pydna.seqrecord import SeqRecord
>>> a=SeqRecord("atgtaa")
>>> a.isorf()
True
>>> b=SeqRecord("atgaaa")
>>> b.isorf()
False
>>> c=SeqRecord("atttaa")
>>> c.isorf()
False

References

translate()[source]: docstring.

add_colors_to_features_for_ape()[source]

Assign colors to features.

compatible with the ApE editor.

add_feature(x=None, y=None, seq=None, type_='misc', strand=1, *args, **kwargs)[source]

Add a feature of type misc to the feature list of the sequence.

Parameters:

x (int) – Indicates start of the feature
y (int) – Indicates end of the feature

Examples

>>> from pydna.seqrecord import SeqRecord
>>> a=SeqRecord("atgtaa")
>>> a.features
[]
>>> a.add_feature(2,4)
>>> a.features
[SeqFeature(SimpleLocation(ExactPosition(2),
                           ExactPosition(4),
                           strand=1),
            type='misc',
            qualifiers=...)]

list_features()[source]

Print ASCII table with all features.

Examples

>>> from pydna.seq import Seq
>>> from pydna.seqrecord import SeqRecord
>>> a=SeqRecord(Seq("atgtaa"))
>>> a.add_feature(2,4)
>>> print(a.list_features())
+-----+---------------+-----+-----+-----+-----+------+------+
| Ft# | Label or Note | Dir | Sta | End | Len | type | orf? |
+-----+---------------+-----+-----+-----+-----+------+------+
|   0 | L:ft2         | --> | 2   | 4   |   2 | misc |  no  |
+-----+---------------+-----+-----+-----+-----+------+------+

extract_feature(n)[source]

Extract feature and return a new SeqRecord object.

Parameters:

n (int) –
extract (Indicates the feature to) –

Examples

>>> from pydna.seqrecord import SeqRecord
>>> a=SeqRecord("atgtaa")
>>> a.add_feature(2,4)
>>> b=a.extract_feature(0)
>>> b
SeqRecord(seq=Seq('gt'), id='ft2', name='part_name',
          description='description', dbxrefs=[])

sorted_features()[source]

Return a list of the features sorted by start position.

Examples

>>> from pydna.seqrecord import SeqRecord
>>> a=SeqRecord("atgtaa")
>>> a.add_feature(3,4)
>>> a.add_feature(2,4)
>>> print(a.features)
[SeqFeature(SimpleLocation(ExactPosition(3), ExactPosition(4),
                           strand=1),
            type='misc', qualifiers=...),
 SeqFeature(SimpleLocation(ExactPosition(2), ExactPosition(4),
                           strand=1),
            type='misc', qualifiers=...)]
>>> print(a.sorted_features())
[SeqFeature(SimpleLocation(ExactPosition(2), ExactPosition(4),
                           strand=1),
            type='misc', qualifiers=...),
 SeqFeature(SimpleLocation(ExactPosition(3), ExactPosition(4),
                           strand=1),
            type='misc', qualifiers=...)]

seguid()[source]

Return the url safe SEGUID [2] for the sequence.

This checksum is the same as seguid but with base64.urlsafe encoding instead of the normal base 64. This means that the characters + and / are replaced with - and _ so that the checksum can be a part of and URL or a filename.

Examples

>>> from pydna.seqrecord import SeqRecord
>>> a=SeqRecord("gattaca")
>>> a.seguid() # original seguid is +bKGnebMkia5kNg/gF7IORXMnIU
'lsseguid=tp2jzeCM2e3W4yxtrrx09CMKa_8'

References

comment(newcomment='')[source]: docstring.

datefunction()[source]: docstring.

stamp(now=datefunction, tool='pydna', separator=' ', comment='')[source]

Add seguid checksum to COMMENTS sections

The checksum is stored in object.annotations[“comment”]. This shows in the COMMENTS section of a formatted genbank file.

For blunt linear sequences:

SEGUID <seguid>

For circular sequences:

cSEGUID <seguid>

Fore linear sequences which are not blunt:

lSEGUID <seguid>

Examples

>>> from pydna.seqrecord import SeqRecord
>>> a = SeqRecord("aa")
>>> a.stamp()
'lsseguid=gBw0Jp907Tg_yX3jNgS4qQWttjU'
>>> a.annotations["comment"][:41]
'pydna lsseguid=gBw0Jp907Tg_yX3jNgS4qQWttj'

lcs(other, *args, limit=25, **kwargs)[source]

Return the longest common substring between the sequence.

and another sequence (other). The other sequence can be a string, Seq, SeqRecord, Dseq or DseqRecord. The method returns a SeqFeature with type “read” as this method is mostly used to map sequence reads to the sequence. This can be changed by passing a type as keyword with some other string value.

Examples

>>> from pydna.seqrecord import SeqRecord
>>> a = SeqRecord("GGATCC")
>>> a.lcs("GGATCC", limit=6)
SeqFeature(SimpleLocation(ExactPosition(0),
                          ExactPosition(6), strand=1),
                          type='read',
                          qualifiers=...)
>>> a.lcs("GATC", limit=4)
SeqFeature(SimpleLocation(ExactPosition(1),
                          ExactPosition(5), strand=1),
                          type='read',
                          qualifiers=...)
>>> a = SeqRecord("CCCCC")
>>> a.lcs("GGATCC", limit=6)
SeqFeature(None)

gc()[source]: Return GC content.

cai(organism='sce')[source]: docstring.

rarecodons(organism='sce')[source]: docstring.

startcodon(organism='sce')[source]: docstring.

stopcodon(organism='sce')[source]: docstring.

express(organism='sce')[source]: docstring.

copy()[source]: docstring.

dump(filename, protocol=None)[source]: docstring.

class pydna.seqrecord.ProteinSeqRecord(seq, *args, id='id', name='name', description='description', **kwargs)[source]

Bases: SeqRecord

reverse_complement(*args, **kwargs)[source]: Return the reverse complement of the sequence.

rc(*args, **kwargs): Return the reverse complement of the sequence.

isorf(*args, **kwargs)[source]

Detect if sequence is an open reading frame (orf) in the 5’-3’.

direction.

Translation tables are numbers according to the NCBI numbering [3].

Parameters:: table (int) – Sets the translation table, default is 1 (standard code)
Returns:: True if sequence is an orf, False otherwise.
Return type:: bool

Examples

>>> from pydna.seqrecord import SeqRecord
>>> a=SeqRecord("atgtaa")
>>> a.isorf()
True
>>> b=SeqRecord("atgaaa")
>>> b.isorf()
False
>>> c=SeqRecord("atttaa")
>>> c.isorf()
False

References

gc()[source]: Return GC content.

cai(*args, **kwargs)[source]: docstring.

rarecodons(*args, **kwargs)[source]: docstring.

startcodon(*args, **kwargs)[source]: docstring.

stopcodon(*args, **kwargs)[source]: docstring.

express(*args, **kwargs)[source]: docstring.