I see that there are a lot of positions where it says "no good hit" and "more than one hit". Is there any way I can minimize this?
This pipeline was designed for Ka/Ks calculation on orthologs in closely related species or subspecies, so we didn't provide specific options for blat or bl2seq. If you want blat or bl2seq to find lower-similarity hits, you can add options (in place of "more-options") at these lines:
- 438 system "$blast2seq more-options -p blastn -i tmpdna1.fa -j tmpdna2.fa -D 1 -o tmp.fas.txt";
- 534 system "$blast2seq more-options -p blastn -i tmpdna1.fa -j tmpdna2.fa -D 1 -o tmp.fas.txt";
- 578 system "$blast2seq more-options -p blastn -i tmpdna1.fa -j tmpdna2.fa -D 1 -o tmp.fas.txt";
- 621 system "$blast2seq more-options -p blastn -i tmpdna1.fa -j tmpdna2.fa -D 0 -o tmp.fas.gap.txt";
I haven't tested this myself, but you can try it.
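Before editing the script, it may help to try the extra options by hand on a pair of sequences. The following is only a sketch: the names BL2SEQ and try_bl2seq are made up for illustration and are not part of the pipeline, and it assumes the legacy NCBI bl2seq, where (as far as I know) -W sets the word size and -e the E-value threshold. Please check the help output of your installed version before relying on these flags.

```shell
# Sketch only: try extra bl2seq options by hand before editing the pipeline script.
# BL2SEQ and try_bl2seq are hypothetical names, not part of the pipeline.
BL2SEQ=${BL2SEQ:-bl2seq}

try_bl2seq() {
    query=$1
    subject=$2
    shift 2
    # The remaining arguments are the extra options (the "more-options" slot).
    # No -o is given, so the tabular (-D 1) output should go to standard output.
    "$BL2SEQ" "$@" -p blastn -i "$query" -j "$subject" -D 1
}
```

For example, try_bl2seq exon.fa genome_region.fa -W 7 -e 10 (both file names are placeholders) would run the same blastn comparison with a smaller word size. If the hits you were missing show up, the same options can go in place of "more-options" in the script.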
"more than one hit" means that when the exon-CDS sequences are aligned to the genome DNA with bl2seq, there is more than one hit in the genome DNA; the pipeline will use the best one for the calculation.
If you want to ignore these entries, you can use the shell command
grep -v " hit" problem_locs.log
to see only the remaining entries, which are marked as follows:
- ===reverse=== : some exon-CDS sequences have hits on both strands of the genome DNA; this kind of data was skipped.
- ===shift=== : a frame shift was detected.
- ===both-stop=== : both the CDS and the genome DNA run into a stop codon. This is probably caused by poor annotation, or the gene used is a non-coding gene; you should check these manually.
- ===hit-stop=== : the genome DNA runs into a stop codon; this is normal and easy to understand.
- ===query-stop=== : this kind of data should be very rare; if it happens, you should check it manually.
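To get a quick overview of how often each of these markers occurs, something like the following could be used. It is a minimal sketch that assumes the markers appear literally in problem_locs.log in the ===name=== form listed above; count_problem_markers is a made-up helper name.

```shell
# Sketch only: tally the ===...=== markers found in a log file.
# count_problem_markers is a hypothetical helper, not part of the pipeline.
count_problem_markers() {
    # grep -o prints only the matched marker, one per line;
    # sort + uniq -c counts them; awk reorders to "marker<TAB>count".
    grep -oE '===[a-z-]+===' "$1" | sort | uniq -c | awk '{print $2 "\t" $1}'
}
```

Calling count_problem_markers problem_locs.log prints one line per marker with its count. The "no good hit" and "more than one hit" lines are not counted, since they do not carry a ===...=== marker.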