Welcome to InSeqt
The toolkit for assessing long-read sequencing data.
Look here and here for sample reports.
Download
Get the source code from here, or type
git clone https://github.com/grabherr/InSeqt.git
in a terminal window.
Build
cd into your local copy of the repository (cd InSeqt) and type make.
Run
The main executable is run like so:
./InSeqt
This will list all available options.
Run ./InSeqt [command] to see the options of each individual command.
InSeqt basic
To get a quick overview of the basic properties of your data, run
BasicStats -i <file_or_list> -o <output_dir> > out
where <file_or_list> is either a single FASTQ file or a text file
listing one file per line. Follow this up with
cd <output_dir>
./InSeqt report -i ../out
This generates an HTML report with basic statistics, such as the read
length and GC content distributions.
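If your reads are spread over several files, a list in the one-entry-per-line format described above can be built with standard tools. The sketch below is hypothetical: the function name make_fastq_list is illustrative, and it assumes the reads use the .fastq or .fq extension and sit directly in a single directory.

```shell
# Hypothetical helper: write the FASTQ files found directly in <dir> to
# <list>, one path per line, for use as BasicStats -i <list>.
# Assumes reads use the .fastq or .fq extension; other files are skipped.
# Usage: make_fastq_list <dir> <list>
make_fastq_list() {
  dir=$1 list=$2
  # -maxdepth 1: do not descend into subdirectories.
  find "$dir" -maxdepth 1 -type f \( -name '*.fastq' -o -name '*.fq' \) \
    | sort > "$list"
}
```

For example, make_fastq_list /data/run1 reads.list followed by BasicStats -i reads.list -o <output_dir> > out. Passing an absolute directory yields absolute paths in the list, which is the safer choice if you are unsure how BasicStats resolves relative ones.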
InSeqt lapcands
To generate a list of candidate read-read overlaps, first run
multiple instances of
FindOverlapCands -i <input> -o <outfile> -c <chunks> -n <chunkindex> -f <fraction>
where -c specifies the number of chunks (equal to the number of
instances you run), -n the index of the chunk, counting from 0, and
-f the fraction of reads to use, given as a decimal rather than a
percentage (0.01 means 1%). For example, to find overlap candidates in
1% of the reads using 5 processes, run
FindOverlapCands -i <input> -o <outfile> -c 5 -n 0 -f 0.01 > overlaps0 &
FindOverlapCands -i <input> -o <outfile> -c 5 -n 1 -f 0.01 > overlaps1 &
FindOverlapCands -i <input> -o <outfile> -c 5 -n 2 -f 0.01 > overlaps2 &
FindOverlapCands -i <input> -o <outfile> -c 5 -n 3 -f 0.01 > overlaps3 &
FindOverlapCands -i <input> -o <outfile> -c 5 -n 4 -f 0.01 > overlaps4 &
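For more than a handful of chunks, the per-chunk commands can be generated in a loop. This is a minimal sketch, not part of InSeqt: the function name run_overlap_chunks is hypothetical, it passes the same <outfile> to every instance exactly as the commands above do, and the range check on the fraction is an added guard whose behavior InSeqt itself does not document.

```shell
# Hypothetical helper: launch one FindOverlapCands instance per chunk in the
# background, as in the commands above, then wait for all of them to finish.
# Usage: run_overlap_chunks <input> <outfile> <chunks> <fraction>
run_overlap_chunks() {
  input=$1 outfile=$2 chunks=$3 fraction=$4

  # Added guard (not InSeqt's): reject values such as 5 meant as 5%.
  # -f expects a fraction greater than 0 and at most 1.
  if ! awk -v f="$fraction" 'BEGIN { exit !(f + 0 > 0 && f + 0 <= 1) }'; then
    echo "fraction must be in (0, 1], got: $fraction" >&2
    return 1
  fi

  n=0
  while [ "$n" -lt "$chunks" ]; do
    # Chunk indices run from 0 to chunks-1; stdout goes to overlaps<index>.
    FindOverlapCands -i "$input" -o "$outfile" -c "$chunks" -n "$n" \
      -f "$fraction" > "overlaps$n" &
    n=$((n + 1))
  done

  # Block until every background instance has exited.
  wait
}
```

Calling run_overlap_chunks <input> <outfile> 5 0.01 is equivalent to the five commands above, except that it returns only once all instances are done. Note that wait also waits for any other background jobs started from the same shell. The guard cannot catch a value of 1 intended as 1%, since 1 is a valid fraction (all reads).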
Once all instances have finished, concatenate their output:
cat overlaps* > all_overlaps
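The glob overlaps* matches every file whose name starts with overlaps, which would include other files such as overlaps0.allreadnames (used in the LapStats step) if one is present in the same directory. It also sorts names lexically, so overlaps10 comes before overlaps2 once there are more than ten chunks. A sketch that instead reads the per-chunk files by index follows; the function name concat_overlaps is hypothetical.

```shell
# Hypothetical helper: concatenate overlaps0 .. overlaps<chunks-1> in chunk
# order, naming each per-chunk file explicitly instead of using a glob.
# Usage: concat_overlaps <chunks> <outfile>
concat_overlaps() {
  chunks=$1 out=$2
  : > "$out"   # start from an empty output file
  n=0
  while [ "$n" -lt "$chunks" ]; do
    # Fail if a chunk's output is missing rather than silently skipping it.
    cat "overlaps$n" >> "$out" || return 1
    n=$((n + 1))
  done
}
```

For the five-chunk example, run concat_overlaps 5 all_overlaps.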
To get a distribution of overlaps, run:
LapStats -i all_overlaps -r overlaps0.allreadnames -o <output_dir>