Sometimes Phylip's distance programs (e.g. protdist version 3.66)
produce output in which two adjacent distances run together with no space between them:
3.509929 3.766076296.642222 33.870491 6.012086 6.570648 6.716925 4.990623 3.861747 3.861747 3.964430 3.964430822.377955 3.868161 3.637750267.453401 30.466508 4.428072 4.854979 34.665454 6.859330 5.273613 6.466854 3.548963 3.586986 6.230058126.479800 31.998087
The same line with the fused distances separated:
3.509929 3.766076 296.642222 33.870491 6.012086 6.570648 6.716925 4.990623 3.861747 3.861747 3.964430 3.964430 822.377955 3.868161 3.637750 267.453401 30.466508 4.428072 4.854979 34.665454 6.859330 5.273613 6.466854 3.548963 3.586986 6.230058 126.479800 31.998087
I've written a tiny Unix/Linux (and Windows via Cygwin) script that uses the sed tool to fix
this problem. Phylip writes its distances so that there are always
six digits to the right of the decimal point. This script simply
looks for instances where there are six digits following a decimal point,
immediately followed by three digits. It then inserts a space between
the decimal-six-digit group of characters and the three-digit group of
characters. Note that this also works if the distance is >= 1000, and
thus has four or more digits preceding its decimal point, because only
the first three of those digits need to match.
The second line above was produced by this script from the first.
The sed command is
sed -e 's/\(\.[0-9]\{6\}\)\([0-9]\{3\}\)/\1 \2/g' < input > output
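To see the substitution at work on a small input, here is a minimal sketch that wraps the same sed expression in a shell function. The function name fix_fused is made up for this illustration and is not part of the author's scripts; the sample values are taken from the example line above.

```shell
# fix_fused is a hypothetical helper name used only for this illustration.
# It reads distances on stdin and writes the repaired text to stdout,
# using the same sed expression as the command above.
fix_fused() {
    sed -e 's/\(\.[0-9]\{6\}\)\([0-9]\{3\}\)/\1 \2/g'
}

# A fragment of the example line, with two distances fused together.
fused='3.509929 3.766076296.642222 33.870491'
fixed=$(printf '%s\n' "$fused" | fix_fused)
printf '%s\n' "$fixed"
```

Values that are already separated by a space are left alone, since a six-digit fraction followed by a space does not match the pattern.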
The script is named physed.sh. A companion script,
physed_check.sh, looks for lines that the physed.sh
script would fix. It checks the file until it has found 10 lines that
violate the number format, and then quits. If there are fewer than 10
lines of format violations, or there are no format violations, then
the script will continue to check the entire file. This is especially
useful for a quick check of a large outfile, to determine whether or
not the physed.sh script is needed.
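The contents of physed_check.sh are not reproduced here, so the following is only a rough sketch of how such a check could be written, not the actual script. It assumes grep and head are available; the function name check_fused is hypothetical. It prints, with line numbers, up to 10 lines that contain a six-digit fraction immediately followed by three more digits.

```shell
# check_fused is a hypothetical name; this is a sketch, not physed_check.sh.
# $1 is the distance file to examine.
check_fused() {
    # grep -n prefixes each matching line with its line number.
    # head stops reading after 10 lines, so on a large file the
    # pipeline ends soon after the tenth violation is found.
    grep -n '\.[0-9]\{6\}[0-9]\{3\}' "$1" | head -n 10
}
```

For example, running check_fused outfile prints nothing when the file is clean, in which case the fix is not needed.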
Cheers,
Doug Scofield
Indiana University Department of Biology