We tried to answer this question by running the calculation on four genomes.
Species | Family | Order | blat result size (Mb), default options | blat result size with blat_option = "-minIdentity=30" | blat result size with blat_option = "-minIdentity=30 -t=dnax -q=dnax" | #bl2seq_option = "" | #bl2seq_option = "-q -2" / "-r 2", min_length=50, min_identity=0.6 | Divergence age |
D.mel (CDS) | Drosophilidae | Diptera | / | / | / | 4578 (mRNA) | / | / |
D.sim (v1.4) | Drosophilidae | Diptera | 23M | / | / | 3958 results (3957 are valuable*, using D.mel chr4) | 4147 results (4094 are valuable, using D.mel chr4) | ~3 MY |
D.pse (v2.29) | Drosophilidae | Diptera | 13M | 27M | / | 397 results (389 are valuable, using D.mel chr4) | 655 results (493 are valuable, using D.mel chr4) | ~26.5 MY |
Apis mellifera (v2.0.15, not rm) | Apidae | Hymenoptera | 7.5M | 12M | 82M | 6 results (only one is valuable) with default options (using D.mel chr4) | 239 results (33 are valuable) | >>60 MY |
Aedes aegypti (v1.15, not rm) | Culicidae | Diptera | 4.9M | 9.4M | 92M | 8 results with default options (using D.mel chr4) | 169 results (93 are valuable) | >>60 MY |
* "Valuable" means that Ka and Ks are numbers rather than "nan".
# These data come from the blat result obtained without any extra blat options.
From this table, we conclude that the pipeline can work for species that diverged less than about 30 MY ago, and that it is well suited to genomes that diverged less than 10 MY ago.
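The trend behind this conclusion is easier to see as a fraction of valuable results per species. The following is a minimal sketch that only redoes the arithmetic on the counts copied from the table; the labels `default` and `relaxed` are shorthand introduced here for the two bl2seq columns, and the function names are hypothetical, not part of the pipeline.

```python
# Counts copied from the table above as (total results, valuable results).
# "default" = bl2seq_option = ""; "relaxed" = the -q -2 / -r 2 column with
# min_length=50 and min_identity=0.6. These two labels are shorthand only.
counts = {
    "D.sim (~3 MY)": {"default": (3958, 3957), "relaxed": (4147, 4094)},
    "D.pse (~26.5 MY)": {"default": (397, 389), "relaxed": (655, 493)},
    "Apis mellifera (>>60 MY)": {"default": (6, 1), "relaxed": (239, 33)},
    # The table gives no valuable count for Aedes aegypti with default options.
    "Aedes aegypti (>>60 MY)": {"default": (8, None), "relaxed": (169, 93)},
}


def valuable_fraction(total, valuable):
    """Return valuable / total, or None when the count is not available."""
    if valuable is None or total == 0:
        return None
    return valuable / total


def summarize(table):
    """Flatten the nested table into (species, setting, total, fraction) rows."""
    rows = []
    for species, settings in table.items():
        for setting, (total, valuable) in settings.items():
            rows.append((species, setting, total, valuable_fraction(total, valuable)))
    return rows


for species, setting, total, fraction in summarize(counts):
    shown = "n/a" if fraction is None else f"{fraction:.1%}"
    print(f"{species:26s} {setting:8s} total={total:5d} valuable={shown}")
```

With default options, nearly all results are valuable for D.sim and D.pse, while only a handful of results remain for the two species that diverged more than 60 MY ago.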
If you want to use this pipeline on genomes that are only distantly related, you should set blat_option = "-minIdentity=30 -t=dnax -q=dnax", set bl2seq_option = "-q -2" or bl2seq_option = "-r 2", and reduce min_length and min_identity.
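As a rough illustration, the relaxed settings can be collected in one place and printed as option lines. This is only a sketch: the option names and values are the ones mentioned in this document, but the `name = "value"` layout of the configuration file, the helper `render_config`, and the choice of `-q -2` over `-r 2` are assumptions made for the example. The blat flags are written without a space before `=`, which is the form blat accepts.

```python
# Relaxed settings for distantly related genomes, as named in the text.
# The config layout produced below is an assumption, not the pipeline's
# documented format.
relaxed_settings = {
    "blat_option": "-minIdentity=30 -t=dnax -q=dnax",
    "bl2seq_option": "-q -2",  # alternative mentioned in the text: "-r 2"
    "min_length": 50,          # value used for the table above
    "min_identity": 0.6,       # value used for the table above
}


def render_config(settings):
    """Render settings as lines, quoting string values and leaving numbers bare."""
    lines = []
    for name, value in settings.items():
        if isinstance(value, str):
            lines.append(f'{name} = "{value}"')
        else:
            lines.append(f"{name} = {value}")
    return "\n".join(lines)


print(render_config(relaxed_settings))
```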