InSeqt main page

Welcome to InSeqt, the toolkit for assessing long-read sequencing data.

See here and here for sample reports.

Download

Get the source code from GitHub (https://github.com/grabherr/InSeqt), or type

git clone https://github.com/grabherr/InSeqt.git

in a terminal window.

Build

Change into the cloned directory (cd InSeqt) and type make.

Run

The main executable is run like so:

./InSeqt

Without arguments, this lists all available commands and options. Run ./InSeqt [command] to see the options of each individual command.

InSeqt basic

To get a quick overview of the basic properties of your data, run

BasicStats -i <file_or_list> -o <output_dir> > out

where <file_or_list> is either a single FASTQ file or a text file listing one FASTQ file per line. Then run

cd <output_dir>
./InSeqt report -i ../out

This generates an HTML report with basic statistics, such as the read length and GC content distributions.
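For a rough idea of what the length and GC numbers represent, here is a minimal Python sketch that computes them directly from FASTQ. It is not how BasicStats works internally. It assumes uncompressed FASTQ with exactly four lines per record, and the rule it uses to tell a FASTQ file from a file list (first line starting with @) is a hypothetical heuristic, as are all function names.

```python
# Illustrative sketch only, NOT BasicStats' implementation.
# Assumes uncompressed FASTQ with exactly four lines per record
# (header, sequence, '+', qualities).
import sys
from statistics import mean, median
from typing import Iterable, List, Tuple


def fastq_sequences(lines: Iterable[str]) -> Iterable[str]:
    """Yield the sequence line (2nd line) of each 4-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip()


def gc_fraction(seq: str) -> float:
    """Fraction of G/C bases in one read (0.0 for an empty read)."""
    if not seq:
        return 0.0
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s)


def read_stats(lines: Iterable[str]) -> Tuple[List[int], List[float]]:
    """Per-read lengths and GC fractions for one FASTQ stream."""
    lengths: List[int] = []
    gcs: List[float] = []
    for seq in fastq_sequences(lines):
        lengths.append(len(seq))
        gcs.append(gc_fraction(seq))
    return lengths, gcs


def gc_histogram(gcs: List[float], nbins: int = 20) -> List[int]:
    """Count reads per GC bin; with 20 bins each bin spans 5%."""
    counts = [0] * nbins
    for g in gcs:
        # GC = 1.0 would index past the end, so clamp it into the last bin.
        counts[min(int(g * nbins), nbins - 1)] += 1
    return counts


def input_files(path: str) -> List[str]:
    """Hypothetical heuristic: a file whose first line starts with '@' is
    FASTQ; anything else is a list with one FASTQ path per line."""
    with open(path) as fh:
        if fh.readline().startswith("@"):
            return [path]
        fh.seek(0)
        return [line.strip() for line in fh if line.strip()]


def main(path: str) -> None:
    lengths: List[int] = []
    gcs: List[float] = []
    for fq in input_files(path):
        with open(fq) as fh:
            lens, gc = read_stats(fh)
        lengths += lens
        gcs += gc
    if not lengths:
        print("no reads found")
        return
    print(f"reads: {len(lengths)}  bases: {sum(lengths)}")
    print(f"mean length: {mean(lengths):.1f}  median length: {median(lengths)}")
    print(f"mean per-read GC: {mean(gcs):.3f}")
    for b, n in enumerate(gc_histogram(gcs)):  # default 20 bins of 5%
        print(f"GC {b * 5:3d}-{b * 5 + 5:3d}%: {n}")


if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```

Run it as python3 fastq_stats.py <file_or_list> (the script name is arbitrary). Treat its output only as a sanity check; the HTML report from InSeqt is the reference.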

InSeqt lapcands

First, generate a list of read-read overlap candidates by running multiple instances of

FindOverlapCands -i <input> -o <outfile> -c <chunks> -n <chunkindex> -f <fraction>

where -c sets the number of chunks (equal to the number of instances you run), -n the zero-based index of the chunk handled by this instance, and -f the fraction of reads to use, given as a decimal fraction rather than a percentage (e.g. 0.01 for 1%). For example, to find overlap candidates using 5 processes on 1% of the data, run

FindOverlapCands -i <input> -o <outfile> -c 5 -n 0 -f 0.01 > overlaps0 &
FindOverlapCands -i <input> -o <outfile> -c 5 -n 1 -f 0.01 > overlaps1 &
FindOverlapCands -i <input> -o <outfile> -c 5 -n 2 -f 0.01 > overlaps2 &
FindOverlapCands -i <input> -o <outfile> -c 5 -n 3 -f 0.01 > overlaps3 &
FindOverlapCands -i <input> -o <outfile> -c 5 -n 4 -f 0.01 > overlaps4 &

Once all instances have finished, concatenate their output:

cat overlaps* > all_overlaps
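If you prefer a single command that starts all chunks, waits for them to finish, and then concatenates their output, the Python sketch below does the same as the shell example above. It is a convenience wrapper, not part of InSeqt: it assumes FindOverlapCands is on your PATH, passes only the -i, -o, -c, -n and -f options described above, and the helper names (build_commands, run_chunks) and the accepted range for the fraction are assumptions made for illustration.

```python
# Convenience wrapper sketch, not part of InSeqt. Assumes FindOverlapCands
# is on PATH and uses only the -i/-o/-c/-n/-f options described above.
import shutil
import subprocess
import sys
from typing import List


def build_commands(exe: str, inp: str, outfile: str, chunks: int,
                   fraction: float) -> List[List[str]]:
    """One argument list per chunk; chunk indices run from 0 to chunks - 1."""
    # Assumed valid range: -f is a decimal fraction such as 0.01 for 1%.
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1], e.g. 0.01 for 1%")
    return [[exe, "-i", inp, "-o", outfile, "-c", str(chunks),
             "-n", str(n), "-f", str(fraction)]
            for n in range(chunks)]


def run_chunks(exe: str, inp: str, outfile: str, chunks: int, fraction: float,
               prefix: str = "overlaps", combined: str = "all_overlaps") -> None:
    """Start all chunks in parallel, wait for them, then concatenate stdout."""
    procs = []
    handles = []
    for n, cmd in enumerate(build_commands(exe, inp, outfile, chunks, fraction)):
        fh = open(f"{prefix}{n}", "wb")  # same as "> overlapsN" in the shell
        handles.append(fh)
        procs.append(subprocess.Popen(cmd, stdout=fh))
    codes = [p.wait() for p in procs]  # like the shell builtin `wait`
    for fh in handles:
        fh.close()
    if any(codes):
        raise RuntimeError(f"some chunks failed, exit codes: {codes}")
    # Concatenate only the chunk files, in chunk order; a glob such as
    # overlaps* could also match unrelated files with the same prefix.
    with open(combined, "wb") as out:
        for n in range(chunks):
            with open(f"{prefix}{n}", "rb") as part:
                shutil.copyfileobj(part, out)


if __name__ == "__main__" and len(sys.argv) == 5:
    # usage: python3 run_chunks.py <input> <outfile> <chunks> <fraction>
    run_chunks("FindOverlapCands", sys.argv[1], sys.argv[2],
               int(sys.argv[3]), float(sys.argv[4]))
```

Running python3 run_chunks.py <input> <outfile> 5 0.01 mirrors the five-process example above. If you stay in the shell instead, adding wait after the last backgrounded command likewise blocks until all chunks are done before you concatenate.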

To get a distribution of overlaps, run:

LapStats -i all_overlaps -r overlaps0.allreadnames -o <output_dir>