pydna.utils
Miscellaneous functions.
- pydna.utils.three_frame_orfs(dna: str, limit: int = 100, startcodons: tuple = ('ATG',), stopcodons: tuple = ('TAG', 'TAA', 'TGA'))
Overlapping orfs in three frames.
- pydna.utils.smallest_rotation(s)
Smallest rotation of a string.
Algorithm described in Duval, Jean-Pierre. 1983. Factorizing Words over an Ordered Alphabet. Journal of Algorithms 4 (4) (December 1): 363–381, and in Algorithms on strings and sequences based on Lyndon words, David Eppstein 2011. https://gist.github.com/dvberkel/1950267
Examples
>>> from pydna.utils import smallest_rotation
>>> smallest_rotation("taaa")
'aaat'
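As an illustration of the idea behind the cited references, the following is a minimal sketch of a linear-time smallest-rotation search using Duval-style Lyndon factorization of the doubled string. The name `least_rotation_sketch` is hypothetical, and this is not necessarily how pydna implements `smallest_rotation`.

```python
def least_rotation_sketch(s: str) -> str:
    """Return the lexicographically smallest rotation of s (illustrative only)."""
    if not s:
        return s
    n = len(s)
    doubled = s + s  # every rotation of s is a length-n window of s + s
    i = 0            # start of the Lyndon factor currently being examined
    start = 0        # start of the best rotation found so far
    while i < n:
        start = i
        j, k = i + 1, i
        # Extend the current factor while it remains a repeated Lyndon word.
        while j < 2 * n and doubled[k] <= doubled[j]:
            if doubled[k] < doubled[j]:
                k = i      # strictly larger character: restart the comparison
            else:
                k += 1     # equal character: keep matching the period
            j += 1
        # Skip over all complete copies of the factor just found.
        while i <= k:
            i += j - k
    return doubled[start:start + n]


print(least_rotation_sketch("taaa"))
```

The last factor that starts inside the first copy of the string marks the beginning of the smallest rotation, so a single pass over the doubled string is enough.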
- pydna.utils.identifier_from_string(s: str) -> str
Return a valid Python identifier based on the argument s, or an empty string.
- pydna.utils.flatten(*args) -> List
Flattens an iterable of iterables down to str, bytes, bytearray or any of the pydna or Biopython seq objects.
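The flattening described above can be sketched as follows. This is a simplified illustration that treats only `str`, `bytes` and `bytearray` as atoms; it does not handle the pydna or Biopython sequence objects that the real function also stops at, and the name `flatten_sketch` is hypothetical.

```python
from collections.abc import Iterable

# Types that are iterable but should be kept whole rather than split up.
ATOMS = (str, bytes, bytearray)


def flatten_sketch(*args):
    """Flatten arbitrarily nested iterables into a flat list (illustrative only)."""
    result = []
    stack = list(reversed(args))  # explicit stack avoids deep recursion
    while stack:
        item = stack.pop()
        if isinstance(item, ATOMS) or not isinstance(item, Iterable):
            result.append(item)
        else:
            # Push children in reverse so they are popped in original order.
            stack.extend(reversed(list(item)))
    return result


print(flatten_sketch([1, [2, "ab", [b"c"]]], 3))
```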
- pydna.utils.seq31(seq)
Turn a three letter code protein sequence into one with one letter code.
The single input argument ‘seq’ should be a protein sequence using three letter codes, as a python string.
This function returns the amino acid sequence as a string using the one letter amino acid codes. Output follows the IUPAC standard (including ambiguous characters B for “Asx”, J for “Xle” and X for “Xaa”, and also U for “Sel” and O for “Pyl”), with the terminator “Ter” given as an asterisk.
Any unknown three letter code (including possible gap characters) is changed into ‘X’.
Examples
>>> from Bio.SeqUtils import seq3
>>> seq3("MAIVMGRWKGAR*")
'MetAlaIleValMetGlyArgTrpLysGlyAlaArgTer'
>>> from pydna.utils import seq31
>>> seq31('MetAlaIleValMetGlyArgTrpLysGlyAlaArgTer')
'M A I V M G R W K G A R *'
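To make the three-letter to one-letter conversion concrete, here is a minimal self-contained sketch. The lookup table, the `sep` parameter, the handling of trailing characters and the name `three_to_one_sketch` are assumptions made for illustration; pydna's own implementation may differ in these details.

```python
# Three-letter to one-letter amino acid codes, including the ambiguous codes
# and the terminator. Both "Sec" and "Sel" are accepted for selenocysteine.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    "Asx": "B", "Glx": "Z", "Xle": "J", "Xaa": "X",
    "Sec": "U", "Sel": "U", "Pyl": "O", "Ter": "*",
}


def three_to_one_sketch(seq: str, sep: str = "") -> str:
    """Convert a three-letter-code protein string to one-letter codes."""
    usable = len(seq) - len(seq) % 3  # assumption: ignore a trailing partial code
    # Normalise case so that "MET", "met" and "Met" are all recognised.
    codes = [seq[i:i + 3].title() for i in range(0, usable, 3)]
    # Unknown codes (for example gap characters) fall back to "X".
    return sep.join(THREE_TO_ONE.get(code, "X") for code in codes)


print(three_to_one_sketch("MetAlaIleValMetGlyArgTrpLysGlyAlaArgTer"))
```

Passing `sep=" "` yields space-separated letters, similar in spirit to the spaced output shown in the example above.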
- pydna.utils.eq(*args, **kwargs)
Compare two or more DNA sequences for equality, i.e. whether they represent the same double stranded DNA molecule.
- Parameters:
- Returns:
eq – Returns True or False
- Return type:
bool
Notes
Two linear sequences are considered equal if either:
- They have the same sequence (case insensitive)
- One sequence is the reverse complement of the other
Two circular sequences are considered equal if they are circular permutations, meaning that they have the same length and either:
- One sequence can be found in the concatenation of the other sequence with itself.
- The reverse complement of one sequence can be found in the concatenation of the other sequence with itself.
The topology for the comparison can be set by passing either of the keywords linear or circular as True or False.
If circular or linear is not set, it will be deduced from the topology of each sequence for sequences that have a linear or circular attribute (like Dseq and Dseqrecord).
Examples
>>> from pydna.dseqrecord import Dseqrecord
>>> from pydna.utils import eq
>>> eq("aaa","AAA")
True
>>> eq("aaa","AAA","TTT")
True
>>> eq("aaa","AAA","TTT","tTt")
True
>>> eq("aaa","AAA","TTT","tTt", linear=True)
True
>>> eq("Taaa","aTaa", linear = True)
False
>>> eq("Taaa","aTaa", circular = True)
True
>>> a=Dseqrecord("Taaa")
>>> b=Dseqrecord("aTaa")
>>> eq(a,b)
False
>>> eq(a,b,circular=True)
True
>>> a=a.looped()
>>> b=b.looped()
>>> eq(a,b)
True
>>> eq(a,b,circular=False)
False
>>> eq(a,b,linear=True)
False
>>> eq(a,b,linear=False)
True
>>> eq("ggatcc","GGATCC")
True
>>> eq("ggatcca","GGATCCa")
True
>>> eq("ggatcca","tGGATCC")
True
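The comparison rules in the Notes can be sketched for two plain strings as follows. This is a simplified illustration, not pydna's implementation: it handles exactly two sequences, supports only unambiguous A, C, G and T, and takes the topology from an explicit `circular` flag instead of deducing it from the objects. The names `reverse_complement` and `same_molecule_sketch` are hypothetical.

```python
# Complement table for unambiguous DNA bases in either case.
_COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def same_molecule_sketch(a: str, b: str, circular: bool = False) -> bool:
    """Check whether a and b describe the same double stranded molecule."""
    a, b = a.lower(), b.lower()  # the comparison is case insensitive
    rc_b = reverse_complement(b)
    if not circular:
        # Linear: identical, or one is the reverse complement of the other.
        return a == b or a == rc_b
    if len(a) != len(b):
        return False
    # Circular: every rotation of a appears as a window of a + a.
    doubled = a + a
    return b in doubled or rc_b in doubled


print(same_molecule_sketch("Taaa", "aTaa"))
print(same_molecule_sketch("Taaa", "aTaa", circular=True))
```

The length check matters in the circular case: without it, any short substring of the doubled sequence would be wrongly reported as a match.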