pydna.utils

Miscellaneous functions.

pydna.utils.three_frame_orfs(dna: str, limit: int = 100, startcodons: tuple = ('ATG',), stopcodons: tuple = ('TAG', 'TAA', 'TGA'))

Find open reading frames (ORFs) in the three reading frames of dna. ORFs found in different frames may overlap.
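The three-frame scan can be sketched as below. This is a hypothetical illustration, not pydna's implementation: the name toy_three_frame_orfs, the (frame, start, stop) return format, and the reading of limit as a minimum ORF length in nucleotides are all assumptions.

```python
def toy_three_frame_orfs(dna, limit=100, startcodons=("ATG",),
                         stopcodons=("TAG", "TAA", "TGA")):
    """Hypothetical sketch: scan frames 0, 1 and 2 of dna for ORFs.

    Returns (frame, start, stop) tuples with 0-based positions; stop is
    exclusive and includes the stop codon. Only ORFs of at least `limit`
    nucleotides are kept (an assumption about what `limit` means).
    """
    dna = dna.upper()
    orfs = []
    for frame in (0, 1, 2):
        # Only complete codons are considered in each frame.
        codons = [dna[i:i + 3] for i in range(frame, len(dna) - 2, 3)]
        start = None
        for idx, codon in enumerate(codons):
            if start is None and codon in startcodons:
                start = idx  # first start codon opens an ORF
            elif start is not None and codon in stopcodons:
                begin = frame + 3 * start
                end = frame + 3 * (idx + 1)
                if end - begin >= limit:
                    orfs.append((frame, begin, end))
                start = None  # look for the next start codon
    return orfs
```

Because each frame is scanned independently, ORFs reported for different frames can overlap on the sequence.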

pydna.utils.shift_location(original_location, shift, lim)

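Return a new location in which original_location is shifted by shift positions. Positions wrap around modulo lim, the length of the (circular) sequence, so a location that ends up spanning the origin is split into parts.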
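Ignoring strands and multi-part locations, the wrap-around can be sketched on a single half-open interval. shift_interval is a hypothetical helper, not part of pydna; it assumes lim is the length of a circular sequence and that the interval is no longer than lim.

```python
def shift_interval(start, end, shift, lim):
    """Hypothetical sketch: shift the half-open interval [start, end) by
    `shift` on a circular sequence of length `lim`.

    Returns a list of (start, end) parts; an interval that ends up spanning
    the origin is split in two, similar to a Biopython CompoundLocation.
    """
    length = end - start
    new_start = (start + shift) % lim  # wrap the start position (also for negative shifts)
    new_end = new_start + length
    if new_end <= lim:
        return [(new_start, new_end)]
    # The shifted interval crosses the origin: split it at position lim/0.
    return [(new_start, lim), (0, new_end - lim)]
```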

pydna.utils.shift_feature(feature, shift, lim)

Return a new feature with shifted location.

pydna.utils.smallest_rotation(s)

Smallest rotation of a string.

Algorithm described in Duval, Jean Pierre. 1983. Factorizing Words over an Ordered Alphabet. Journal of Algorithms 4 (4) (December 1): 363–381, and in Algorithms on strings and sequences based on Lyndon words, David Eppstein 2011. https://gist.github.com/dvberkel/1950267

Examples

>>> from pydna.utils import smallest_rotation
>>> smallest_rotation("taaa")
'aaat'
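A sketch of the least-rotation computation built on Duval's Lyndon factorization, in its common textbook form scanning s + s. It is not necessarily line for line what pydna does, and least_rotation is a hypothetical name.

```python
def least_rotation(s: str) -> str:
    """Lexicographically smallest rotation of s via a Duval-style scan of s + s."""
    ss = s + s
    n = len(ss)
    i, ans = 0, 0
    while i < n // 2:
        ans = i  # candidate start of the smallest rotation
        j, k = i + 1, i
        # Extend the current run of repeated Lyndon words starting at i.
        while j < n and ss[k] <= ss[j]:
            if ss[k] < ss[j]:
                k = i
            else:
                k += 1
            j += 1
        # Skip past the Lyndon factors found in this run.
        while i <= k:
            i += j - k
    return ss[ans:ans + n // 2]
```

The scan runs in linear time, unlike comparing all len(s) rotations directly.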

pydna.utils.cai(seq: str, organism: str = 'sce', weights: dict = _weights)

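Return the codon adaptation index (CAI) of seq for organism (default 'sce', Saccharomyces cerevisiae), computed from the codon weights in weights.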
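The CAI is the geometric mean of the relative adaptiveness values w of the codons in a sequence, that is exp of the mean of ln w. A minimal sketch with a hypothetical toy_cai and a single weight table (pydna's weights argument presumably holds one table per organism code); skipping codons absent from the table is an assumption.

```python
import math


def toy_cai(seq, weights):
    """Hypothetical sketch: geometric mean of the relative adaptiveness
    weights of the in-frame codons of seq.

    `weights` maps codons to values in (0, 1]; codons missing from it
    (or with weight 0) are skipped.
    """
    seq = seq.upper()
    logs = []
    for i in range(0, len(seq) - 2, 3):
        w = weights.get(seq[i:i + 3])
        if w:
            logs.append(math.log(w))
    # Geometric mean computed in log space to avoid underflow on long sequences.
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")
```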

pydna.utils.rarecodons(seq: str, organism='sce')

Locate the codons in seq that are rare in organism (default 'sce', Saccharomyces cerevisiae).
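Locating rare codons amounts to reading seq codon by codon in frame and checking each codon against a per-organism set. A hypothetical sketch: the set rare is supplied by the caller, and returning slice objects is an assumption, not pydna's documented return type.

```python
def toy_rarecodons(seq, rare):
    """Hypothetical sketch: slices of the in-frame codons of seq found in `rare`."""
    s = seq.upper()
    # Step through complete codons in frame 0 and keep the rare ones.
    return [slice(i, i + 3) for i in range(0, len(s) - 2, 3) if s[i:i + 3] in rare]
```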

pydna.utils.express(seq: str, organism='sce')

Not implemented yet.

pydna.utils.open_folder(pth)

Open the folder pth in the operating system's file manager.

pydna.utils.rc(sequence: StrOrBytes) -> StrOrBytes

Reverse complement.

Accepts mixed DNA/RNA sequences.

pydna.utils.complement(sequence: str)

Complement.

Accepts mixed DNA/RNA sequences.
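Complement and reverse complement of mixed DNA/RNA text can be sketched with a single translation table. This covers str only (pydna's rc also accepts bytes), the names toy_complement and toy_rc are hypothetical, and complementing A to T (DNA output) is an assumption; characters outside the table pass through unchanged.

```python
# Complement table for DNA (T) and RNA (U), upper and lower case.
# Assumption: A/a is complemented to T/t, so the output is written as DNA.
_COMPLEMENT = str.maketrans("ACGTUacgtu", "TGCAAtgcaa")


def toy_complement(sequence: str) -> str:
    """Complement each base, preserving case; other characters are kept."""
    return sequence.translate(_COMPLEMENT)


def toy_rc(sequence: str) -> str:
    """Reverse complement: complement, then read the result backwards."""
    return toy_complement(sequence)[::-1]
```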

pydna.utils.memorize(filename)

Cache functions and classes. memorize(filename) returns a decorator; filename identifies the cache.

See pydna.download.
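A persistent memoizing decorator can be sketched with the standard library shelve module. memorize_sketch is hypothetical: using repr of the arguments as the cache key is an assumption and is not how pydna builds its keys.

```python
import functools
import shelve


def memorize_sketch(filename):
    """Hypothetical sketch: cache results of the decorated callable on disk."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # shelve keys must be strings; repr of the call is a simple stand-in.
            key = repr((func.__qualname__, args, sorted(kwargs.items())))
            with shelve.open(filename) as cache:
                if key not in cache:
                    cache[key] = func(*args, **kwargs)  # compute once, store pickled
                return cache[key]
        return wrapper
    return decorator
```

Usage: decorate a slow function (for example a download) with @memorize_sketch("some_cache_file") so repeated calls with the same arguments read from disk instead.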

pydna.utils.identifier_from_string(s: str) -> str

Return a valid Python identifier derived from the argument s, or an empty string if none can be formed.
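One way to derive an identifier from arbitrary text is sketched below. toy_identifier is hypothetical; its rules (whitespace to underscores, other disallowed characters dropped, a leading underscore for digits and keywords) are assumptions about the general approach.

```python
import keyword
import re


def toy_identifier(s: str) -> str:
    """Hypothetical sketch: turn s into a valid identifier or ''."""
    s = re.sub(r"\s+", "_", s.strip())    # whitespace runs become underscores
    s = re.sub(r"[^0-9a-zA-Z_]", "", s)   # drop characters not allowed in identifiers
    if s and (s[0].isdigit() or keyword.iskeyword(s)):
        s = "_" + s                        # identifiers cannot start with a digit or be keywords
    return s
```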

pydna.utils.flatten(*args) -> List

Flatten an iterable of iterables.

Flattening stops at str, bytes, bytearray and any of the pydna or Biopython sequence objects, which are kept as single items.
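A recursive sketch of this behaviour. toy_flatten is hypothetical and only treats str, bytes and bytearray as atoms; the pydna and Biopython sequence types are left out to keep it self-contained.

```python
from collections.abc import Iterable

# Types that are iterable but must be kept whole. pydna additionally
# treats its own and Biopython sequence objects this way (omitted here).
ATOMS = (str, bytes, bytearray)


def toy_flatten(*args):
    """Hypothetical sketch: recursively flatten nested iterables into a list."""
    out = []
    for item in args:
        if isinstance(item, Iterable) and not isinstance(item, ATOMS):
            out.extend(toy_flatten(*item))  # descend into nested containers
        else:
            out.append(item)
    return out
```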

pydna.utils.seq31(seq)

Turn a three letter code protein sequence into one with one letter code.

The single input argument ‘seq’ should be a protein sequence using three letter codes, as a Python string.

This function returns the amino acid sequence as a string using the one letter amino acid codes, separated by two spaces as in the example below. Output follows the IUPAC standard (including ambiguous characters B for “Asx”, J for “Xle” and X for “Xaa”, and also U for “Sel” and O for “Pyl”), plus an asterisk for a terminator given as “Ter”.

Any unrecognized three letter code (including possible gap characters) is changed into ‘X’.

Examples

>>> from Bio.SeqUtils import seq3
>>> seq3("MAIVMGRWKGAR*")
'MetAlaIleValMetGlyArgTrpLysGlyAlaArgTer'
>>> from pydna.utils import seq31
>>> seq31('MetAlaIleValMetGlyArgTrpLysGlyAlaArgTer')
'M  A  I  V  M  G  R  W  K  G  A  R  *'
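The conversion is essentially a lookup of each three-letter chunk in a table. A self-contained sketch with a hypothetical toy_seq31 that reproduces the two-space separator shown above; passing sep="" gives a compact string, similar to what Biopython's Bio.SeqUtils.seq1 returns.

```python
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    # IUPAC ambiguity codes, rare amino acids and the terminator
    "Asx": "B", "Glx": "Z", "Xle": "J", "Xaa": "X", "Sel": "U",
    "Pyl": "O", "Ter": "*",
}


def toy_seq31(seq: str, sep: str = "  ") -> str:
    """Hypothetical sketch: three letter codes to one letter codes."""
    # Normalise case ("MET" -> "Met") and split into complete triplets.
    codes = [seq[i:i + 3].title() for i in range(0, len(seq) - 2, 3)]
    # Unknown triplets become "X".
    return sep.join(THREE_TO_ONE.get(code, "X") for code in codes)
```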

pydna.utils.randomRNA(length, maxlength=None)

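Return a random RNA sequence (a string of G, A, U and C) of length nucleotides. If maxlength is given and larger than length, the length is chosen at random between length and maxlength.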

pydna.utils.randomDNA(length, maxlength=None)

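Return a random DNA sequence (a string of G, A, T and C) of length nucleotides. If maxlength is given and larger than length, the length is chosen at random between length and maxlength.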

pydna.utils.randomORF(length, maxlength=None)

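Return a random open reading frame as a DNA string that begins with a start codon and ends with a stop codon. Its size is set by length, or chosen at random between length and maxlength when maxlength is given.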

pydna.utils.randomprot(length, maxlength=None)

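Return a random protein sequence in one letter amino acid codes, of length residues. If maxlength is given and larger than length, the length is chosen at random between length and maxlength.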
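The four random-sequence helpers differ mainly in their alphabet. A generic sketch with a hypothetical toy_random_seq; drawing the length from a triangular distribution between length and maxlength is an assumption, and the rng parameter is added only to make results reproducible.

```python
import random


def toy_random_seq(length, maxlength=None, alphabet="GATC", rng=random):
    """Hypothetical sketch: random string over `alphabet`.

    If maxlength is larger than length, the length is drawn between the two
    (triangular distribution, an assumption).
    """
    if maxlength and maxlength > length:
        length = round(rng.triangular(length, maxlength))
    return "".join(rng.choice(alphabet) for _ in range(length))
```

Usage: toy_random_seq(30, alphabet="GAUC") for RNA, or alphabet="ACDEFGHIKLMNPQRSTVWY" for a protein.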

pydna.utils.eq(*args, **kwargs)

Compare two or more DNA sequences for equality.

Compares two or more DNA sequences for equality i.e. if they represent the same double stranded DNA molecule.

Parameters:
  • args (iterable) – the sequences to compare; these can be strings, Biopython Seq or SeqRecord, Dseqrecord or dsDNA objects.

  • circular (bool, optional) – If True, treat all molecules as circular; if False, as linear.

  • linear (bool, optional) – If True, treat all molecules as linear; if False, as circular.

Returns:

eq – True if all sequences represent the same DNA molecule, otherwise False.

Return type:

bool

Notes

Compares two or more DNA sequences for equality i.e. if they represent the same DNA molecule.

Two linear sequences are considered equal if either:

  1. They have the same sequence (case insensitive)

  2. One sequence is the reverse complement of the other

Two circular sequences are considered equal if they are circular permutations of each other, meaning that they have the same length and either:

  1. One sequence can be found in the concatenation of the other sequence with itself.

  2. The reverse complement of one sequence can be found in the concatenation of the other sequence with itself.

The topology for the comparison can be set by setting one of the keywords linear or circular to True or False.

If circular or linear is not set, it will be deduced from the topology of each sequence for sequences that have a linear or circular attribute (like Dseq and Dseqrecord).
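The comparison rules in these notes can be sketched for plain strings. toy_eq is hypothetical: it takes the topology only from its circular keyword and does not deduce it from Dseq or Dseqrecord objects as eq does.

```python
def _rc(s: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return s.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def toy_eq(*seqs: str, circular: bool = False) -> bool:
    """Hypothetical sketch of the linear and circular equality rules."""
    first = seqs[0].upper()  # comparison is case insensitive
    for other in (s.upper() for s in seqs[1:]):
        if circular:
            # A rotation of `other` (or of its reverse complement) occurs in other + other.
            doubled = other + other
            same = len(first) == len(other) and (first in doubled or _rc(first) in doubled)
        else:
            same = other == first or other == _rc(first)
        if not same:
            return False
    return True
```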

Examples

>>> from pydna.dseqrecord import Dseqrecord
>>> from pydna.utils import eq
>>> eq("aaa","AAA")
True
>>> eq("aaa","AAA","TTT")
True
>>> eq("aaa","AAA","TTT","tTt")
True
>>> eq("aaa","AAA","TTT","tTt", linear=True)
True
>>> eq("Taaa","aTaa", linear = True)
False
>>> eq("Taaa","aTaa", circular = True)
True
>>> a=Dseqrecord("Taaa")
>>> b=Dseqrecord("aTaa")
>>> eq(a,b)
False
>>> eq(a,b,circular=True)
True
>>> a=a.looped()
>>> b=b.looped()
>>> eq(a,b)
True
>>> eq(a,b,circular=False)
False
>>> eq(a,b,linear=True)
False
>>> eq(a,b,linear=False)
True
>>> eq("ggatcc","GGATCC")
True
>>> eq("ggatcca","GGATCCa")
True
>>> eq("ggatcca","tGGATCC")
True

pydna.utils.cuts_overlap(left_cut, right_cut, seq_len)
pydna.utils.location_boundaries(loc: SimpleLocation | CompoundLocation)
pydna.utils.locations_overlap(loc1: SimpleLocation | CompoundLocation, loc2: SimpleLocation | CompoundLocation, seq_len)