ZhangLab

This pipeline works well when the gene names or transcript names are regular IDs, but for some collection-type (glean) annotations you need to prepare the gtf/gff file carefully beforehand.

Personally, I prefer to use unique IDs, because gene names or transcript names are sometimes not unique.
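A quick way to see whether the names in a file are unique is to count the FASTA header IDs. The following is only a minimal sketch, not part of the pipeline: the function name duplicate_ids is hypothetical, and it assumes the ID is the first whitespace-delimited word after ">" on each header line.

```python
from collections import Counter


def duplicate_ids(fasta_lines):
    """Return the sorted IDs that appear in more than one FASTA header.

    Assumption: the ID is the first whitespace-delimited word after '>'.
    """
    ids = []
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:  # skip empty headers such as a bare '>'
                ids.append(fields[0])
    counts = Counter(ids)
    return sorted(name for name, n in counts.items() if n > 1)
```

The function accepts any iterable of lines, so an open file handle for the CDS file can be passed directly. An empty result means every ID is unique.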

Most errors are caused by problems with the file formats.

Please make sure the gene IDs in the CDS file are exactly the same as those in the final column of the gff file, for example:

#Query_cds

>PACid_27043735 #this ID must be exactly the same as the ID in the final column of the GFF file

ATGGCTATATCGAAGCTTTTGATTGTTTTTCTTGTCGCATCTCTCCTTGTGCTCCGCCTT

>PACid_27045395

ATGCAGGTGGTGAACTCAAATATTCAAGCTGCTAGCTATCCTCCTGGGAAGAATATCGAT

##gff-version 3
Chr01 phytozome9_0 mRNA 1660 2502 . - . PACid_27043735
Chr01 phytozome9_0 CDS 1660 2502 . - 0 PACid_27043735

Chr01 phytozome9_0 mRNA 2906 6646 . - . PACid_27045395
Chr01 phytozome9_0 CDS 6501 6644 . - 0 PACid_27045395
Chr01 phytozome9_0 five_prime_UTR 6645 6646 . - . PACid_27045395
Chr01 phytozome9_0 CDS 3506 3928 . - 0 PACid_27045395
Chr01 phytozome9_0 CDS 2906 3475 . - 0 PACid_27045395
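
Before running the pipeline, the IDs in the two files can be compared with a short script. This is a hedged sketch rather than a tool shipped with the pipeline: the function names fasta_ids, gff_ids and compare_ids are hypothetical, and it assumes the simplified layout shown above, i.e. nine whitespace-separated columns with a bare ID (no spaces, no key=value attributes) in the final column.

```python
def fasta_ids(lines):
    """Collect the first word after '>' from every FASTA header line."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids


def gff_ids(lines, feature="CDS"):
    """Collect the 9th column of every row whose 3rd column equals `feature`.

    Assumption: columns are whitespace-separated and the ID has no spaces.
    """
    ids = set()
    for line in lines:
        fields = line.split()
        # Skip comment/directive lines, blank lines and short rows.
        if line.startswith("#") or len(fields) < 9:
            continue
        if fields[2] == feature:
            ids.add(fields[8])
    return ids


def compare_ids(fasta_lines, gff_lines, feature="CDS"):
    """Return (IDs only in the CDS file, IDs only in the gff file), sorted."""
    cds = fasta_ids(fasta_lines)
    gff = gff_ids(gff_lines, feature)
    return sorted(cds - gff), sorted(gff - cds)
```

Both arguments can be open file handles or lists of lines. If both returned lists are empty, every ID in the CDS file has a matching CDS row in the gff file and vice versa. Any ID that shows up in either list (for instance one with a stray extra character) needs to be fixed first.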

FAQ 1 Do you use the gene name or the transcript name as an ID?