Difference between revisions of "NCBI"

From Organic Design wiki
(Adding info)
(FASTA example)
Line 7: Line 7:
 
*The unique [http://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html#AccessionB ACCESSION] number
 
*The unique [http://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html#AccessionB ACCESSION] number
 
*The ORIGIN field
 
*The ORIGIN field
*Any amino acid [http://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html#TranslationB /translation] in the [http://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html#FeaturesB FEATURE] table
+
*Any amino acid [http://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html#TranslationB /translation] field in the [http://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html#FeaturesB FEATURE] table
  
 +
A condensed file format called [http://www.ncbi.nlm.nih.gov/blast/fasta.shtml FASTA] is used to manipulate primary sequence information. FASTA files can hold ''nucleotide'' or ''amino acid'' records.
 +
''An example of an ''amino acid'' FASTA record''
 +
<pre>
 +
>gi|532319|pir|TVFV2E|TVFV2E envelope protein
 +
ELRLRYCAPAGFALLKCNDADYDGFKTNCSNVSVVHCTNLMNTTVTTGLLLNGSYSENRT
 +
QIWQKHRTSNDSALILLNKHYNLTVTCKRPGNKTVLPVTIMAGLVFHSQKYNLRLRQAWC
 +
HFPSNWKGAWKEVKEEIVNLPKERYRGTNDPKRIFFQRQWGDPETANLWFNCHGEFFYCK
 +
MDWFLNYLNNLTVDADHNECKNTSGTKSGNKRAPGPCVQRTYVACHIRSVIIWLETISKK
 +
TYAPPREGHLECTSTVTGMTVELNYIPKNRTNVTLSPQIESIWAAELDRYKLVEITPIGF
 +
APTEVRRYTGGHERQKRVPFVXXXXXXXXXXXXXXXXXXXXXXVQSQHLLAGILQQQKNL
 +
LAAVEAQQQMLKLTIWGVK
 +
</pre>
 
==See also==
 
==See also==
 
* [http://www.ncbi.nlm.nih.gov/Genbank/index.html NCBI GenBank Overview]
 
* [http://www.ncbi.nlm.nih.gov/Genbank/index.html NCBI GenBank Overview]

Revision as of 04:56, 5 May 2007

GenBank is a flat-file database format for primary nucleotide sequence information and auxiliary information. These records can display the ORIGIN information for different nucleotide molecule types, and they place no limit on the length of the sequence displayed. Entire chromosomes can be stored as a single GenBank record for an organism of interest, which can make the record very large on disk.

An example GenBank record

Regular expressions matching parts we care about
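
The parts of a GenBank record named on this page (the ACCESSION number, the ORIGIN field, and any /translation qualifier in the FEATURE table) could be matched roughly as follows. This is a minimal sketch, not the patterns the page's author used: the record text, the pattern names, and the `parse_record` helper are all hypothetical, and real records contain many more fields and edge cases than this handles.

```python
import re

# Hypothetical, heavily abbreviated GenBank-style record for illustration only.
# The /translation value is wrapped artificially to show multi-line handling.
SAMPLE_RECORD = """\
LOCUS       EXAMPLE1                  24 bp    DNA     linear   SYN 01-JAN-2000
ACCESSION   X00001
FEATURES             Location/Qualifiers
     CDS             1..24
                     /translation="MKTA
                     YIAK"
ORIGIN
        1 atgaaaaccg cgtatattgc gaaa
//
"""

# First token after the ACCESSION keyword at the start of a line.
ACCESSION_RE = re.compile(r"^ACCESSION\s+(\S+)", re.MULTILINE)
# Quoted value of every /translation qualifier (may span several lines).
TRANSLATION_RE = re.compile(r'/translation="([^"]+)"')
# Everything between the ORIGIN line and the terminating "//" line.
ORIGIN_RE = re.compile(r"^ORIGIN[^\n]*\n(.*?)^//", re.MULTILINE | re.DOTALL)


def parse_record(text):
    """Extract accession, nucleotide sequence and translations from one record."""
    accession = ACCESSION_RE.search(text)
    origin = ORIGIN_RE.search(text)
    # ORIGIN lines carry position numbers and spaces; strip both.
    sequence = re.sub(r"[\s\d]", "", origin.group(1)) if origin else ""
    # Remove the line wrapping inside each translation.
    translations = [re.sub(r"\s+", "", t) for t in TRANSLATION_RE.findall(text)]
    return {
        "accession": accession.group(1) if accession else None,
        "sequence": sequence,
        "translations": translations,
    }


record = parse_record(SAMPLE_RECORD)
```

Because an entire chromosome can sit in one record, a sketch like this that holds the whole record in memory would only suit small files; a line-by-line reader would be the safer choice for large ones.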

A condensed file format called FASTA is used to manipulate primary sequence information. FASTA files can hold nucleotide or amino acid records.

An example of an amino acid FASTA record:

>gi|532319|pir|TVFV2E|TVFV2E envelope protein
ELRLRYCAPAGFALLKCNDADYDGFKTNCSNVSVVHCTNLMNTTVTTGLLLNGSYSENRT
QIWQKHRTSNDSALILLNKHYNLTVTCKRPGNKTVLPVTIMAGLVFHSQKYNLRLRQAWC
HFPSNWKGAWKEVKEEIVNLPKERYRGTNDPKRIFFQRQWGDPETANLWFNCHGEFFYCK
MDWFLNYLNNLTVDADHNECKNTSGTKSGNKRAPGPCVQRTYVACHIRSVIIWLETISKK
TYAPPREGHLECTSTVTGMTVELNYIPKNRTNVTLSPQIESIWAAELDRYKLVEITPIGF
APTEVRRYTGGHERQKRVPFVXXXXXXXXXXXXXXXXXXXXXXVQSQHLLAGILQQQKNL
LAAVEAQQQMLKLTIWGVK
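
A record like the one above is a header line starting with ">" followed by wrapped sequence lines, so it could be read with a few lines of Python. This is a minimal sketch under that assumption; `parse_fasta` is a hypothetical helper, not part of any NCBI tool, and the sample text reuses only the header and the first two sequence lines of the record above to keep the block short.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence lines are joined without newlines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Header and first two sequence lines of the example record (truncated).
FASTA_TEXT = """\
>gi|532319|pir|TVFV2E|TVFV2E envelope protein
ELRLRYCAPAGFALLKCNDADYDGFKTNCSNVSVVHCTNLMNTTVTTGLLLNGSYSENRT
QIWQKHRTSNDSALILLNKHYNLTVTCKRPGNKTVLPVTIMAGLVFHSQKYNLRLRQAWC
"""

records = parse_fasta(FASTA_TEXT)
header, sequence = records[0]
# The header fields are separated by "|" in this record.
header_fields = header.split("|")
```

Splitting the header on "|" is specific to headers shaped like this one; other FASTA files may use a free-form description line instead.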

See also