infernal

integron_finder.infernal.expand(replicon, window_beg, window_end, max_elt, df_max, circular, dist_threshold, model_attc_path, max_attc_size=200, min_attc_size=40, evalue_attc=1.0, search_left=False, search_right=False, out_dir='.', cpu=1)

For a given element, the search can extend to the left-hand side (for instance, if the integrase is on the right), to the right-hand side (the opposite situation), or to both sides (only an integrase or only attC sites were found).

Parameters:
  • replicon (a Bio.Seq.SeqRecord object.) – The Replicon to annotate
  • window_beg (int) – start of the window in which to search for attC sites (position of protein)
  • window_end (int) – end of the window in which to search for attC sites (position of protein)
  • max_elt (pandas.DataFrame object) –

    DataFrame with columns:

    Accession_number cm_attC  cm_debut  cm_fin   pos_beg   pos_end sens   evalue
    

    Each row is one occurrence of an attC site.

  • df_max (pandas.DataFrame object) –

    DataFrame with columns:

    Accession_number cm_attC  cm_debut  cm_fin   pos_beg   pos_end sens   evalue
    

    Each row is one occurrence of an attC site.

  • circular (bool) – True if replicon topology is circular otherwise False.
  • dist_threshold (int) – Two elements are aggregated if the distance between them is dist_threshold [4kb] or less
  • max_attc_size (int) – The maximum value for the attC size
  • min_attc_size (int) – The minimum value for the attC size
  • model_attc_path (str) – the path to the attc model file
  • evalue_attc (float) – evalue threshold to filter out hits above it
  • search_left (bool) – trigger the local_max search on the left of the already detected element
  • search_right (bool) – trigger the local_max search on the right of the already detected element
  • out_dir (str) – The path to directory where to write results
  • cpu (int) – the number of CPUs used by expand
Returns:

a copy of max_elt with attC hits

Return type:

pandas.DataFrame object
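
The dist_threshold rule can be illustrated with a minimal sketch. It is not the integron_finder implementation: the helper name aggregate_by_distance is hypothetical, hits are plain (pos_beg, pos_end) tuples rather than DataFrame rows, the gap is assumed to be measured from the end of one hit to the start of the next, and circular replicons are ignored.

```python
def aggregate_by_distance(hits, dist_threshold=4000):
    """Group hits whose gap to the previous hit is at most dist_threshold.

    hits: iterable of (pos_beg, pos_end) tuples, a hypothetical plain-Python
    stand-in for the rows of max_elt.
    """
    clusters = []
    for beg, end in sorted(hits):
        # Assumption: distance = start of this hit - end of the previous hit.
        if clusters and beg - clusters[-1][-1][1] <= dist_threshold:
            clusters[-1].append((beg, end))
        else:
            clusters.append([(beg, end)])
    return clusters


# Illustrative positions only, not real results.
example_hits = [(100, 160), (3000, 3060), (9000, 9060)]
example_clusters = aggregate_by_distance(example_hits)
```

With the 4 kb default, the first two example hits (gap of 2840 bp) fall in one cluster and the third (gap of 5940 bp) starts a new one.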

integron_finder.infernal.find_attc(replicon_path, replicon_id, cmsearch_path, out_dir, model_attc, incE=1.0, cpu=1)

Call cmsearch to find attC sites in a single replicon.

Parameters:
  • replicon_path (str) – the path of the fasta file representing the replicon to analyse.
  • replicon_id (str) – the id of the replicon to analyse.
  • cmsearch_path (str) – the path to the cmsearch executable.
  • out_dir (str) – the path to the directory where cmsearch outputs will be stored.
  • model_attc (str) – path to the attC model (covariance model).
  • incE (float) – consider sequences <= this E-value threshold as significant (to get the alignment with -A)
  • cpu (int) – the number of cpu used by cmsearch.
Returns:

None. The results are written to disk.

Raises:

RuntimeError – when the cmsearch run fails.
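
As a rough sketch of what such a call might look like, the snippet below assembles a cmsearch command line from the parameters listed above and runs it with the standard library. The helper names (build_cmsearch_command, run_cmsearch) and the output file names are assumptions made for illustration; the exact options and file names used by find_attc may differ. Only cmsearch options that exist in Infernal are used (--cpu, -o, --tblout, -A, --incE).

```python
import os
import subprocess


def build_cmsearch_command(replicon_path, replicon_id, cmsearch_path, out_dir,
                           model_attc, incE=1.0, cpu=1):
    """Return the argument list for one cmsearch run (nothing is executed)."""
    # Hypothetical output file names, chosen for this sketch only.
    prefix = os.path.join(out_dir, replicon_id)
    return [
        cmsearch_path,
        "--cpu", str(cpu),
        "-o", prefix + "_attc.out",           # human-readable output
        "--tblout", prefix + "_attc.tblout",  # tabular list of hits
        "-A", prefix + "_attc.sto",           # alignment of significant hits
        "--incE", str(incE),                  # inclusion E-value threshold
        model_attc,
        replicon_path,
    ]


def run_cmsearch(command):
    """Run the command and raise RuntimeError if cmsearch cannot run or fails."""
    try:
        completed = subprocess.run(command, capture_output=True, text=True)
    except OSError as err:
        raise RuntimeError(f"could not execute {command[0]}: {err}") from err
    if completed.returncode != 0:
        raise RuntimeError(f"cmsearch failed: {completed.stderr.strip()}")
```

Keeping command construction separate from execution makes the command easy to inspect or log before anything is run.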

integron_finder.infernal.local_max(replicon, window_beg, window_end, model_attc_path, strand_search='both', evalue_attc=1.0, max_attc_size=200, min_attc_size=40, cmsearch_bin='cmsearch', out_dir='.', cpu_nb=1)
Parameters:
  • replicon (Bio.Seq.SeqRecord object.) – The replicon to analyse
  • window_beg (int) – Start of window to search for attc (position of protein).
  • window_end (int) – End of window to search for attc (position of protein).
  • model_attc_path (str) – The path to the covariance model for attC (e.g. attc_4.cm) used by cmsearch to find attC sites
  • strand_search (str) –

    The strand on which to look for attC sites. Available values:

    • 'top': only search the top (Watson) strand of target sequences.
    • 'bottom': only search the bottom (Crick) strand of target sequences.
    • 'both': search both strands.
  • evalue_attc (float) – evalue threshold to filter out hits above it
  • max_attc_size (int) – The maximum value for the attC size
  • min_attc_size (int) – The minimum value for the attC size
  • cmsearch_bin (str) – The path to cmsearch
  • out_dir (str) – The path to directory where to write results
  • cpu_nb (int) – The number of cpu used by cmsearch
Returns:

A DataFrame with the same structure as the one returned by read_infernal(), in which positions are converted to positions on the replicon and attC hits are filtered by evalue, min_attc_size and max_attc_size. The function also writes a file of intermediate results, <replicon_id>_subseq_attc_table_end.res, which stores the local_max results before filtering by max_attc_size and min_attc_size.

Return type:

pandas.DataFrame object
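
Two ideas from the description above can be sketched in a few lines: mapping strand_search onto cmsearch strand options, and shifting hit positions from window coordinates back to replicon coordinates. Both helpers are hypothetical and are not taken from integron_finder. The flags --toponly and --bottomonly are real cmsearch options, but whether local_max uses them in this way is an assumption, as is the offset convention (window_beg treated as a plain offset added to window-relative positions, with no handling of circular replicons).

```python
# Assumed mapping from strand_search values to cmsearch options.
STRAND_OPTIONS = {
    "top": ["--toponly"],        # search the top (Watson) strand only
    "bottom": ["--bottomonly"],  # search the bottom (Crick) strand only
    "both": [],                  # cmsearch searches both strands by default
}


def strand_options(strand_search="both"):
    """Return the extra cmsearch arguments for a strand_search value."""
    try:
        return list(STRAND_OPTIONS[strand_search])
    except KeyError:
        raise ValueError(
            f"strand_search must be one of {sorted(STRAND_OPTIONS)}, "
            f"got {strand_search!r}"
        ) from None


def to_replicon_coords(pos_beg, pos_end, window_beg):
    """Shift window-relative hit positions onto the replicon.

    Assumption: window_beg is the offset of the window start on the replicon
    and the replicon is treated as linear.
    """
    return pos_beg + window_beg, pos_end + window_beg
```

For example, under these assumptions a hit found at positions 10 to 70 of a window starting at offset 1000 is reported at 1010 to 1070 on the replicon.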

integron_finder.infernal.read_infernal(infile, replicon_id, len_model_attc, evalue=1, size_max_attc=200, size_min_attc=40)

Parse the cmsearch --tblout output and return a pandas DataFrame.

Parameters:
  • infile (str) – the path to the output of cmsearch in tabulated format (--tblout)
  • replicon_id (str) – the id of the replicon where the integrons were found.
  • len_model_attc (int) – the length of the attc model
  • evalue (float) – evalue threshold to filter out hits above it
  • size_max_attc (int) – The maximum value for the attC size
  • size_min_attc (int) – The minimum value for the attC size
Returns:

table with columns:

"Accession_number", "cm_attC", "cm_debut", "cm_fin", "pos_beg", "pos_end", "sens", "evalue"
Each row is a hit that matches the attC covariance model.

Return type:

pandas.DataFrame object
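
To make the table layout concrete, here is a simplified, standard-library-only sketch of parsing --tblout lines into rows with the columns listed above. It is not the read_infernal implementation: parse_tblout is a hypothetical helper, it returns a list of dicts rather than a DataFrame, and it omits any adjustment based on len_model_attc. The mapping of cmsearch columns to the documented names (query name to cm_attC, model from/to to cm_debut/cm_fin, sequence from/to to pos_beg/pos_end, strand to sens) and the ordering of coordinates on the minus strand are assumptions. The sample lines are invented for illustration.

```python
def parse_tblout(lines, replicon_id, evalue=1.0, size_max_attc=200,
                 size_min_attc=40):
    """Parse cmsearch --tblout lines and keep hits passing the filters."""
    rows = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split()
        seq_from, seq_to = int(fields[7]), int(fields[8])
        # Assumption: report coordinates in ascending order on both strands.
        pos_beg, pos_end = min(seq_from, seq_to), max(seq_from, seq_to)
        hit_evalue = float(fields[15])
        size = pos_end - pos_beg + 1
        if hit_evalue > evalue or not size_min_attc <= size <= size_max_attc:
            continue
        rows.append({
            "Accession_number": replicon_id,
            "cm_attC": fields[2],        # query (model) name
            "cm_debut": int(fields[5]),  # start position on the model
            "cm_fin": int(fields[6]),    # end position on the model
            "pos_beg": pos_beg,
            "pos_end": pos_end,
            "sens": fields[9],           # strand, "+" or "-"
            "evalue": hit_evalue,
        })
    return rows


# Invented sample in cmsearch tabular layout, for illustration only.
SAMPLE_TBLOUT = [
    "#target name accession query name accession mdl ...",
    "rep1 - attC_4 - cm 1 47 17825 17884 + no 1 0.53 0.0 14.8 1.0e-05 ! -",
    "rep1 - attC_4 - cm 1 47 40210 40269 + no 1 0.50 0.0 2.1 5.2 ? -",
    "rep1 - attC_4 - cm 1 47 2100 2031 - no 1 0.48 0.0 9.7 2.0e-03 ! -",
]
sample_rows = parse_tblout(SAMPLE_TBLOUT, "rep1")
```

A list of dicts in this shape can be passed directly to pandas.DataFrame to obtain a table with the documented columns.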