
Data standards

Overview

Standardisation of data entities is of increasing value in ensuring that researchers and end-users are able to navigate the world of brassica big data. The adoption of FAIR (Findable, Accessible, Interoperable and Re-usable) data principles underlies the ability of humans, machines and cyborgs to make use of relevant data and make meaningful connections. Several initiatives of relevance to the brassica research community are underway. These include the development of Trait Dictionaries and Ontologies, and the standardisation of gene nomenclature (names).

Standardisation of gene model nomenclature for reference Brassica genomes


Previous Version 1: 2014-18, now deprecated

A distinct nomenclature standard for gene-model annotation was established and provisionally ratified by the MBGP Steering Group in 2014. In July 2018, it was decided to review this standard in light of pan-genome complexity and the need for a pragmatic system that would remain resilient over the coming decade (see version 2 above).

Following discussions within the MBGP Steering Group and the publication of reference genomes in 2014, the following standard was adopted for naming gene models assigned to pseudo-chromosome sequences.



Functional gene nomenclature

Syntax

<GENUS 1 LETTER> [<species 2 letters>]<GENOME 1 LETTER>|<X>.<NAME 3-6 LETTER CODE>.<locus assignment 1 letter>

where < > surrounds categories, [ ] indicates an optional item and | denotes "or". When referring to gene names, the string is italicised, whilst the corresponding protein name is not. For reference, see the Gene Class Symbol list for Arabidopsis (TAIR).
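
As an illustration only, the syntax can be read as a regular expression. The minimal Python sketch below rests on several assumptions that the standard itself does not state: the genus and genome letters are upper case, the optional species code is lower case, the name code is 3-6 upper-case letters or digits, the locus letter is lower case, and the string contains no spaces. The function name parse_gene_name and the example strings are hypothetical.

```python
import re
from typing import Optional

# Assumed reading of the syntax (letter case and character sets are assumptions):
#   genus   : 1 upper-case letter
#   species : optional, 2 lower-case letters
#   genome  : 1 upper-case letter, which also covers the placeholder "X"
#   name    : 3-6 upper-case letters or digits
#   locus   : 1 lower-case letter
GENE_NAME = re.compile(
    r"(?P<genus>[A-Z])"
    r"(?P<species>[a-z]{2})?"
    r"(?P<genome>[A-Z])"
    r"\.(?P<name>[A-Z0-9]{3,6})"
    r"\.(?P<locus>[a-z])"
)


def parse_gene_name(text: str) -> Optional[dict]:
    """Split a gene name into its components, or return None if it does not fit."""
    match = GENE_NAME.fullmatch(text)
    return match.groupdict() if match else None


if __name__ == "__main__":
    # Illustrative strings only, not taken from any annotation release.
    for candidate in ("BraA.FLC.a", "BX.FLC.a", "BraA.FL.a"):
        print(candidate, parse_gene_name(candidate))
```

Under these assumptions a string with the species code omitted still parses, with the species field left empty, while a name code shorter than three characters is rejected.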