School of Computer Science and Communication

avdist: A tool for analyzing haplotype differences

Description

This is a simple tool for bootstrap analysis of haplotype differences. It computes the Hamming distance between pairs of sequences sampled from the input sequences and, after a given number of iterations, reports the average difference and the standard deviation of the results. Indels are discarded from the distance calculation.
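
The computation described above can be sketched in Python as follows. This is an illustration only, not the tool's actual Perl code: the names `hamming_no_indels` and `bootstrap_pairwise` are hypothetical, indels are assumed to be marked with '-', sampling is assumed to be without replacement, and each iteration is assumed to contribute the mean distance over all pairs in its sample (so the sample size must be at least 2).

```python
import random
import statistics
from itertools import combinations


def hamming_no_indels(a, b, gap="-"):
    """Count mismatching positions, skipping any position where either
    sequence has the gap character (assumed here to mark an indel)."""
    return sum(1 for x, y in zip(a, b) if x != gap and y != gap and x != y)


def bootstrap_pairwise(seqs, iterations, sample_size, seed=None):
    """Repeatedly sample sequences and record the mean pairwise distance.

    Returns (mean, standard deviation) over all iterations. Sampling
    without replacement and the population standard deviation are
    assumptions of this sketch.
    """
    rng = random.Random(seed)
    per_iteration = []
    for _ in range(iterations):
        sample = rng.sample(seqs, sample_size)
        dists = [hamming_no_indels(a, b) for a, b in combinations(sample, 2)]
        per_iteration.append(statistics.mean(dists))
    return statistics.mean(per_iteration), statistics.pstdev(per_iteration)
```

A call such as `bootstrap_pairwise(seqs, 1000, 10)` would correspond roughly to running with `<iter>` set to 1000, `<samplesize>` set to 10 and `<ancient>` set to 0.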

A previous version of this tool was used in Savolainen et al., Science, 2003.

Please cite my name and this web page if you use this tool!

Usage

avdist [<options>] <iter> <samplesize> <ancient> <seqfile>

Arguments

iter
The number of times the sampling is performed.
samplesize
The number of sequences in each sample. Must be less than the number of sequences in the file.
ancient
A number between 1 and the number of sequences in the file, indicating which sequence the distance is measured to. If 0, the tool just reports the average difference between any pair of sequences in the sample.
seqfile
Name of a file with sequences in simplified GDE or FASTA format: first a sequence identifier prefixed by '#' or '>', then a new line with the whole sequence on it. Note! The sequence may not wrap over several lines!
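
For readers preparing input, the file layout described under seqfile can be checked with a few lines of Python. This is a minimal sketch that assumes the format is exactly as described (an id line starting with '#' or '>', followed by one line holding the whole sequence). The function names `read_sequences` and `pick_ancient` are hypothetical and not part of avdist, and skipping blank lines is an assumption.

```python
def read_sequences(lines):
    """Parse id/sequence line pairs into a list of (id_line, sequence).

    `lines` is any iterable of strings, for example an open file.
    Blank lines are skipped (an assumption). A sequence that wraps over
    several lines is rejected, since the format requires a single line.
    """
    records = []
    current_id = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line[0] in "#>":
            if current_id is not None:
                raise ValueError(f"no sequence found for id {current_id!r}")
            current_id = line[1:].strip()
        else:
            if current_id is None:
                raise ValueError("sequence line without id (wrapped sequence?)")
            records.append((current_id, line))
            current_id = None
    if current_id is not None:
        raise ValueError(f"no sequence found for id {current_id!r}")
    return records


def pick_ancient(records, ancient):
    """Map the 1-based <ancient> argument to a record; 0 means no reference."""
    if ancient == 0:
        return None
    if not 1 <= ancient <= len(records):
        raise ValueError("ancient must be 0 or a valid 1-based sequence number")
    return records[ancient - 1]
```

Rejecting a second sequence line outright, instead of silently joining it to the previous one, mirrors the warning that sequences may not wrap.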

Options

-v
Verbose mode. Presents more information on STDERR.
-m
Don't use multiplicity! This also causes your <ancient> sequence to be sampled even if its multiplicity is zero.
-i
Verbose mode. Presents more information on STDERR.

The multiplicity of a sequence, that is, the number of times it is counted, is taken from the id line. For example, in the following sequence file,

   >name1 3
   ACGTACGT...
   >name2 50
   AAGTAAGT...
   >ancient 0
   AAGTACGT...

the first sequence is registered three times and the second is registered 50 times.

If you don't want your ancient sequence to be sampled, that is, if it is not actually among the observed sequences, be sure to set its multiplicity to zero! See the example above.
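
To illustrate how multiplicities determine what can be sampled, the following Python sketch expands id lines such as `name1 3` into a sampling pool. It is an illustration under stated assumptions, not the tool's code: the helper names `parse_multiplicity` and `build_pool` are hypothetical, the multiplicity is assumed to be the last whitespace-separated field of the id line, and a missing count is assumed to mean 1.

```python
def parse_multiplicity(id_line):
    """Split 'name 3' into ('name', 3).

    A missing count defaults to 1 (an assumption of this sketch).
    """
    parts = id_line.split()
    if len(parts) >= 2 and parts[-1].isdigit():
        return " ".join(parts[:-1]), int(parts[-1])
    return id_line.strip(), 1


def build_pool(records, use_multiplicity=True):
    """Return the list of sequences to sample from.

    With multiplicity, each sequence appears as many times as its count,
    so a count of 0 keeps it out of the pool. Without multiplicity (as
    the -m option is described), every sequence appears exactly once.
    """
    pool = []
    for id_line, seq in records:
        _, count = parse_multiplicity(id_line)
        pool.extend([seq] * (count if use_multiplicity else 1))
    return pool
```

With three records carrying the counts 3, 50 and 0, `build_pool` yields 53 entries and leaves out the sequence with count 0, while `use_multiplicity=False` yields three entries, one per sequence.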

Availability

avdist is distributed under the GNU General Public License, and is available in a compressed tar file. The software is a Perl script that usually runs without modification on a standard Linux or *nix system.

Published by: Lars Arvestad <arve@csc.kth.se>
Updated 2014-09-24