There are currently three H. glycines genomes, but how do their genes relate to one another? I clustered the predicted proteins from all three: the TN10 draft, the TN10 pseudomolecule, and the X12 genome. The resulting families include all alternatively spliced isoforms, so the gene counts may be higher than the number of genes actually present. The purpose of this clustering is to relate each gene in one genome to its counterparts in the other two. The clustering was done with OrthoFinder version 2.5.2 and DIAMOND version 22.214.171.124.
This is an Excel file of these orthogroups --> Orthogroups.xlsx
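For readers who want to look up a gene's counterparts programmatically, the following is a minimal sketch, assuming the table follows the layout of OrthoFinder's Orthogroups.tsv output (one row per orthogroup, one column per input proteome, genes within a cell separated by commas). The column names and gene IDs below are hypothetical placeholders, not the identifiers used in the actual file.

```python
import csv
import io

# Hypothetical excerpt in the layout of OrthoFinder's Orthogroups.tsv.
# Column names and gene IDs are invented for illustration only.
SAMPLE_TSV = (
    "Orthogroup\tTN10_draft\tTN10_pseudomolecule\tX12\n"
    "OG0000000\tdraft_g1.t1, draft_g1.t2\tpseudo_g7.t1\tx12_g3.t1\n"
    "OG0000001\tdraft_g2.t1\t\tx12_g9.t1\n"
)

def load_orthogroups(handle):
    """Return {orthogroup: {genome: [gene, ...]}} from a tab-separated table."""
    reader = csv.DictReader(handle, delimiter="\t")
    groups = {}
    for row in reader:
        og = row.pop("Orthogroup")
        groups[og] = {
            # An empty (or missing) cell means the genome has no member here.
            genome: [g.strip() for g in (cell or "").split(",") if g.strip()]
            for genome, cell in row.items()
        }
    return groups

def counterparts(groups, gene):
    """Return (orthogroup ID, per-genome members) for the group containing `gene`."""
    for og, members in groups.items():
        if any(gene in genes for genes in members.values()):
            return og, members
    return None, {}

groups = load_orthogroups(io.StringIO(SAMPLE_TSV))
og, members = counterparts(groups, "pseudo_g7.t1")
print(og, members["X12"])
```

To use this on a real table, replace `io.StringIO(SAMPLE_TSV)` with an open file handle. Because isoforms are included, a single gene may appear several times in one cell under different transcript IDs.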
Another way to infer these gene relationships is to view the alignments of genes from the other genomes in JBrowse. Both the X12 and the TN10 pseudomolecule genomes have these gene alignments.
Some of these genes could not be found in the other assemblies, potentially indicating that they arise from assembly artifacts or are population-specific genes. Here is a list of genes that were missing.
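One possible way to flag such genes is to look for orthogroups whose members all come from a single genome. The sketch below illustrates that idea on a small in-memory table and is not necessarily how the list above was produced. The genome labels and gene IDs are hypothetical. Note that OrthoFinder reports genes it could not place in any orthogroup in a separate file (Orthogroups_UnassignedGenes.tsv), which this sketch does not read.

```python
# Hypothetical orthogroup table: {orthogroup: {genome: [gene, ...]}}.
# Genome labels and gene IDs are invented for illustration only.
groups = {
    "OG0000000": {
        "TN10_draft": ["draft_g1.t1"],
        "TN10_pseudomolecule": ["pseudo_g7.t1"],
        "X12": ["x12_g3.t1"],
    },
    "OG0000001": {
        "TN10_draft": ["draft_g2.t1"],
        "TN10_pseudomolecule": [],
        "X12": ["x12_g9.t1"],
    },
    "OG0000002": {
        "TN10_draft": [],
        "TN10_pseudomolecule": [],
        "X12": ["x12_g4.t1", "x12_g4.t2"],
    },
}

def genes_without_counterpart(groups):
    """Return {genome: sorted genes} for orthogroups populated by one genome only."""
    missing = {}
    for members in groups.values():
        # Genomes that contribute at least one gene to this orthogroup.
        present = [genome for genome, genes in members.items() if genes]
        if len(present) == 1:
            missing.setdefault(present[0], []).extend(members[present[0]])
    return {genome: sorted(genes) for genome, genes in missing.items()}

result = genes_without_counterpart(groups)
print(result)
```

A stricter or looser definition of "missing" (for example, absent from at least one other assembly rather than from both) would only require changing the `len(present) == 1` test.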