Tutorial

We assume here that the program is installed.

Basic use

Note

The different options will be shown separately, but they can be used alltogether unless otherwise stated.

You can see all available options with:

integron_finder -h

You can go to directory containing your sequence, or specify the path to that sequence and call:

integron_finder mysequence.fst

or:

integron_finder path/to/mysequence.fst

It will perform a search, and outputs the results in a directory called Results_Integron_Finder_mysequence. Within this directory, you can find:

  • mysequence.integrons

    A tabular file with the annotations of the different elements

  • mysequence.gbk

    A GenBank file with the sequence annotated with the same annotations from the previous file.

  • mysequence_X.pdf

    For each complete integron, a simple graphic of the region is depicted

  • other

    A folder containing outputs of the different step in the program. It includes notably the protein file in fasta (mysequence.prt).

Thorough local detection

This option allows a more sensitive search. It will be slower if integrons are found, but will be as fast if nothing is detected:

integron_finder mysequence.fst --local_max

Functional annotation

This option allows to annotate cassettes given HMM profiles. As Resfams database is distributed, to annotate antibiotic resistance genes, just use:

integron_finder mysequence.fst --func_annot

IntegronFinder will look in the directory Integron_Finder-x.x/data/Functional_annotation and use all .hmm files available to annotate. By default, there is only Resfams.hmm, but one can add any other HMM file here. Alternatively, if one wants to use a database which is present elsewhere on the user’s computer without copying it into that directory, one can specify the following option:

integron_finder mysequence.fst --path_func_annot bank_hmm

where bank_hmm is a file containing one absolute path to a hmm file per line, and you can comment out a line:

~/Downloads/Integron_Finder-x.x/data/Functional_annotation/Resfams.hmm
~/Documents/Data/Pfam-A.hmm
# ~/Documents/Data/Pfam-B.hmm

Here, annotation will be made using Pfam-A et Resfams, but not Pfam-B. If a protein is hit by 2 different profiles, the one with the best e-value will be kept.

Parallelization

The time limiting part are HMMER and INFERNAL. So IntegronFinder does not have parallel implementation (yet?), but the user can set the number of CPU used by HMMER and INFERNAL:

integron_finder mysequence.fst --cpu 4

Default is 1.

Circularity

By default, IntegronFinder assumes your replicon to be circular. However, if they aren’t, or if it’s PCR fragments or contigs, you can specify that it’s a linear fragment:

integron_finder mylinearsequence.fst --linear

However, if --linear is not used and the replicon is smaller than 4 x dt (where dt is the distance threshold, so 4kb by default), the replicon is considered linear to avoid clustering problem

Advanced options

Clustering of elements

attC sites are clustered together if they are on the same strand and if they are less than 4 kb apart. To cluster an array of attC sites and an integron integrase, they also must be less than 4 kb apart. This value has been empirically estimated and is consistent with previous observations showing that biggest gene cassettes are about 2 kb long. This value of 4 kb can be modify though:

integron_finder mysequence.fst --distance_thresh 10000

or, equivalently:

integron_finder mysequence.fst -dt 10000

This sets the threshold for clustering to 10 kb.

Note

The option --outdir allows you to chose the location of the Results folder (Results_Integron_Finder_mysequence). If this folder already exists, IntegronFinder will not re-run analyses already done, except functional annotation. It allows you to re-run rapidly IntegronFinder with a different --distance_threshold value. Functional annotation needs to re-run each time because depending on the aggregation parameters, the proteins associated with an integron might change.

attC evalue

The default evalue is 1. Sometimes, degenerated attC sites can have a evalue above 1 and one may want to increase this value to have a better sensitivity, to the cost of a much higher false positive rate.

integron_finder mysequence.fst --evalue_attc 5

Palindromes

attC sites are more or less palindromic sequences, and sometimes, a single attC site can be detected on the 2 strands. By default, the one with the highest evalue is discarded, but you can choose to keep them with the following option:

integron_finder mysequence.fst --keep_palindromes