Last modified on July 3, 2009, at 07:49

DAS specification

Notes on [1]

The DAS network of servers comprises a registry, several reference servers and several annotation servers. Tying these together are the concepts of reference objects and coordinate systems.

Note: The distinction between reference and annotation servers is conceptual rather than physical. That is, a single server instance can in fact play both roles by offering both sequences and annotations of those sequences.

Note: A server may support multiple coordinate systems, provided they do not contain reference objects with the same identifier.
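
The uniqueness condition in the note above can be checked mechanically. The following is a minimal sketch, not part of the DAS specification: the function name `find_identifier_clashes`, the coordinate system names and the identifiers are all hypothetical, and each coordinate system is assumed to be representable as a plain set of reference object identifiers.

```python
from collections import defaultdict


def find_identifier_clashes(coordinate_systems):
    """Return identifiers that occur in more than one coordinate system.

    `coordinate_systems` maps a (hypothetical) coordinate system name to the
    set of reference object identifiers it contains. The result maps each
    clashing identifier to the sorted names of the systems that contain it.
    """
    owners = defaultdict(set)
    for name, identifiers in coordinate_systems.items():
        for identifier in identifiers:
            owners[identifier].add(name)
    return {
        identifier: sorted(names)
        for identifier, names in owners.items()
        if len(names) > 1
    }


# Illustrative data only: two systems with disjoint identifiers can coexist.
systems = {
    "chromosomes_v1": {"I", "II", "III"},
    "proteins": {"P00001", "P00002"},
}
clashes = find_identifier_clashes(systems)
```

An empty result means the server could offer all of the listed coordinate systems together; any entry in the result names an identifier that would be ambiguous.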

Links

http://www.ebi.ac.uk/~aj/1.6_draft2/documents/spec.html

Rough notes

Notes on the Distributed Annotation System (DAS) specification (see [2])


  • Capabilities [3]
  • Features
  • Sources / DSN
  • IDs / URIs
  • Entry Points



DAS annotations refer to a common "reference sequence" with a set of "entry points". Entry points may correspond, for example, to entire chromosomes, a series of contigs, or a protein.



The entry points describe the top-level items on the reference sequence map. Each entry point may have substructure: a series of subsequences (components), each with its own start and end points. This structure is recursive. Each annotation is unambiguously located by giving its start and stop positions relative to a "reference sequence", which can be one of the entry points or any of the subsequences within an entry point.
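
The recursive structure described above can be modelled as a small tree. This is only a sketch under stated assumptions: the `Segment` class, the `walk` helper, the identifiers and the lengths are invented for illustration, and positions are assumed to be 1-based and inclusive.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Segment:
    """A reference object: an entry point or one of its components."""

    identifier: str
    length: int
    # Each component is (child, start, end): the child occupies positions
    # start..end (assumed 1-based, inclusive) of this segment.
    components: List[Tuple["Segment", int, int]] = field(default_factory=list)


def walk(segment: Segment, depth: int = 0) -> Iterator[Tuple[int, str]]:
    """Yield (depth, identifier) for a segment and all nested components."""
    yield depth, segment.identifier
    for child, _start, _end in segment.components:
        yield from walk(child, depth + 1)


# Hypothetical data: one entry point containing a contig containing a clone.
clone = Segment("CLONE_A", 30000)
contig = Segment("CONTIG_1", 50000, [(clone, 1, 30000)])
entry_point = Segment("CHR_I", 200000, [(contig, 1001, 51000)])

for depth, identifier in walk(entry_point):
    print("  " * depth + identifier)
```

An annotation can then name any node of this tree as its reference sequence, together with a start and stop position on that node.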

To give a concrete example, the C. elegans reference map consists of six chromosome-length entry points. Each chromosome is formed from several contigs called "superlinks", and each superlink contains one or more smaller contigs called "links". Links in turn are composed of one or more fully-sequenced clones. One could refer to an annotation by specifying its start and stop positions in clone, link, superlink, or chromosome coordinates. The distributed annotation system automatically converts between any of these coordinate systems. Because coordinates within clones are more stable to revisions than coordinates within links or chromosomes, it is recommended that annotation coordinates be stored relative to the smallest sequencing unit.
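
The conversion between levels of the hierarchy amounts to adding offsets while climbing from a component to its ancestors. The sketch below illustrates the idea and does not describe how any DAS server implements it. The `PLACEMENT` table, the `lift` function, all identifiers and all offsets are hypothetical; positions are assumed to be 1-based and inclusive, and every component is assumed to lie in the same orientation as its parent.

```python
# child identifier -> (parent identifier, 1-based start of child within parent)
# All names and numbers here are invented for illustration.
PLACEMENT = {
    "CLONE_A": ("LINK_1", 1),
    "CLONE_B": ("LINK_1", 30001),
    "LINK_1": ("SUPERLINK_1", 5001),
    "SUPERLINK_1": ("CHR_I", 100001),
}


def lift(identifier, start, stop, target):
    """Convert (start, stop) on `identifier` into coordinates on `target`.

    `target` must be `identifier` itself or one of its ancestors in
    PLACEMENT; otherwise a KeyError is raised.
    """
    while identifier != target:
        if identifier not in PLACEMENT:
            raise KeyError(f"{target!r} is not an ancestor of the starting segment")
        parent, offset = PLACEMENT[identifier]
        # Position 1 of the child sits at position `offset` of the parent.
        start += offset - 1
        stop += offset - 1
        identifier = parent
    return identifier, start, stop


# An annotation stored in clone coordinates, reported at two other levels.
in_link = lift("CLONE_B", 100, 250, "LINK_1")
in_chromosome = lift("CLONE_B", 100, 250, "CHR_I")
```

With this layout, a revision that moves LINK_1 within its superlink changes only one entry in the placement table; the annotation stored as positions 100 to 250 of CLONE_B needs no update, which is the stability argument made above.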

The hierarchy is extensible. If the C. elegans gene predictions were stable, it would make sense to store certain annotations, such as the positions of exons, relative to the transcriptional unit.