attc

integron_finder.attc.find_attc_max(integrons, replicon, distance_threshold, model_attc_path, max_attc_size, min_attc_size, evalue_attc=1.0, circular=True, out_dir='.', cpu=1)

Look for attC sites with the cmsearch --max option, which removes all heuristic filters. Because this option makes the search much slower, we only run it in the region around a hit. We call this mode local_max or eagle_eyes.

Default hit

                 attC
__________________-->____-->_________-->_____________
______<--------______________________________________
         intI
              ^-------------------------------------^
             Search-space with --local_max

Updated hit

                 attC          ***         ***
__________________-->____-->___-->___-->___-->_______
______<--------______________________________________
         intI
Parameters:
  • integrons (list of Integron objects) – the integrons, which may or may not contain attC or intI.
  • replicon (Bio.Seq.SeqRecord object.) – replicon where the integrons were found (genomic fasta file).
  • distance_threshold (int) – the maximal distance between 2 elements to aggregate them.
  • evalue_attc (float) – E-value threshold; hits with an E-value above it are filtered out.
  • model_attc_path (str) – path to the attC model (covariance model).
  • max_attc_size (int) – maximum value for the attC size.
  • min_attc_size (int) – minimum value for the attC size.
  • circular (bool) – True if replicon is circular, False otherwise.
  • out_dir (str) – the directory in which to write results; used indirectly by called functions such as infernal.local_max() or infernal.expand().
  • cpu (int) – the number of CPUs to use when calling local_max.
Returns:

Return type:

pd.DataFrame object
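
To make the local_max idea more concrete, here is a minimal sketch of how a cmsearch command line with --max could be assembled for a file holding only the region around a hit. This is not IntegronFinder's own code: the helper name build_cmsearch_max_cmd and the file names are hypothetical, and only the cmsearch options used (--max, --cpu, -E, --tblout) are real Infernal flags.

```python
def build_cmsearch_max_cmd(model_path, fasta_path, tblout_path, evalue=1.0, cpu=1):
    """Return the argument list for a cmsearch run without heuristic filters.

    Hypothetical helper for illustration only. fasta_path is assumed to
    contain just the sub-sequence around a hit, so the slow --max search
    stays local instead of scanning the whole replicon.
    """
    return [
        "cmsearch",
        "--max",                 # turn off all heuristic filters (slow)
        "--cpu", str(cpu),       # number of worker threads
        "-E", str(evalue),       # report hits with an E-value at or below this
        "--tblout", tblout_path, # tabular summary of the hits
        model_path,              # covariance model file
        fasta_path,              # sequence file to search
    ]


cmd = build_cmsearch_max_cmd("attc.cm", "window.fa", "window.tbl", evalue=1.0, cpu=2)
```

The resulting list can be passed to subprocess.run(cmd, check=True) on a machine where Infernal is installed.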

integron_finder.attc.search_attc(attc_df, keep_palindromes, dist_threshold, replicon_size)

Parse the attC data set (sorted by start position) for the given replicon and return a list of arrays. Each array is composed of attC sites that are on the same strand and separated by a distance less than dist_threshold.

Parameters:
  • attc_df (pandas.DataFrame) –
  • keep_palindromes (bool) – True if the palindromes must be kept in attc result, False otherwise
  • dist_threshold (int) – the maximal distance between 2 elements to aggregate them
  • replicon_size (int) – the size of the replicon in base pairs
Returns:

a list of attC sites found on the replicon

Return type:

list of pandas.DataFrame objects
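
The grouping rule described for search_attc (same strand, gap below dist_threshold) can be sketched as follows. This is a simplified illustration under stated assumptions, not the library's implementation: the function name cluster_sites and the column names start, end and strand are hypothetical, palindrome handling is left out, and the wrap-around step assumes a circular replicon whenever replicon_size is given.

```python
import pandas as pd


def cluster_sites(attc_df, dist_threshold, replicon_size=None):
    """Group sites on the same strand whose gap is below dist_threshold.

    Hypothetical sketch: expects columns 'start', 'end' and 'strand'.
    If replicon_size is given, the replicon is treated as circular and
    the last and first clusters of a strand are merged when they are
    close enough across the origin.
    """
    clusters = []
    for _, strand_df in attc_df.groupby("strand", sort=True):
        strand_df = strand_df.sort_values("start").reset_index(drop=True)
        # Distance from the end of the previous site to the start of this one.
        gaps = strand_df["start"] - strand_df["end"].shift()
        # A new cluster starts wherever the gap reaches the threshold
        # (the first row has a NaN gap, which compares as False).
        labels = (gaps >= dist_threshold).cumsum()
        groups = [group for _, group in strand_df.groupby(labels)]
        if replicon_size is not None and len(groups) > 1:
            first_start = groups[0]["start"].iloc[0]
            last_end = groups[-1]["end"].iloc[-1]
            # Gap measured across the origin of the circular replicon.
            if first_start + replicon_size - last_end < dist_threshold:
                last = groups.pop()
                groups[0] = pd.concat([last, groups[0]])
        clusters.extend(groups)
    return clusters


sites = pd.DataFrame({
    "start": [100, 300, 5000, 200],
    "end": [160, 360, 5060, 260],
    "strand": ["+", "+", "+", "-"],
})
linear = cluster_sites(sites, dist_threshold=1000)
circular = cluster_sites(sites, dist_threshold=1000, replicon_size=5500)
```

With these toy coordinates, the linear call keeps the site at 5000 in its own cluster, while the circular call merges it with the two sites near the origin because the gap across the origin is only 540 bp.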