This version of OCA adds the functionality of Boolean searches and wild card characters to 'extend' the search.

OCA allows the user to rapidly search through the contents of the entire PDB Archive for entries obeying certain constraints. A full text search can be made for any string appearing in the text of a PDB entry, excluding the coordinate records and PDB record names. Many specific records can be searched for regular expressions or numerical limits. OCA gives you the option of saving object sets resulting from queries. This saved set can be used as a starting point for further database operations or as a reference for your work. Every saved set includes the date of the search and the query from which it was generated.

The various fields shown on the OCA Advanced Search screen will be ANDed together before searching. All fields are insensitive to case.

It is a good idea to use the "Clear form" button before initiating any new searches. Once the constraints are set, click on the "Search" button to begin.

The full text searches used by OCA are based on the Glimpse indexing and query system. Once the search has been concluded, you may choose to view an entry using Rasmol, MAGE, or a VRML browser. Links to other resources such as SCOP and Entrez are presented when they exist. Any or all of the files may be displayed and downloaded with just the header information or complete with the coordinates, in PDB or mmCIF format.


OCA Overview

Simply enter your search string into the desired fields and click on the Search button. Hits found within the released entries are listed first, then any hits from the pending entries. If there are no hits, the browser may provide suggestions of similar words from which you can make a choice, or you can enter another word for a subsequent search.

Following a search, OCA presents you with the field searched, the number of hits, your query and our suggestions, along with an output message and the chance to choose a suggested term and begin a new search. You may choose to refine your search if it results in more than 100 hits. You may download the list of returned ID codes for further reference, or select one ID code and retrieve it.

Retrieval presents you with the Atlas page. From this you may display and download the header information of the PDB file or the complete coordinate entry, in PDB or mmCIF format. If the entry has an associated structure factor or NMR restraints file, a PDB-generated biomol file, or a MacroMolecule(s) file created by EBI, you may view the data from here.

You may choose to view a structure using Rasmol, MAGE, or a VRML browser installed on your computer.

Links to other Web resources such as SCOP, NDB, Entrez, PDBREPORT, MMDB, ESTHER, PDBSUM, and DALI are presented when they exist, and these links are updated nightly.

Finally, you may enter another PDB ID code or return to the OCA main page from the bottom of the Atlas page.

The search fields of OCA are:

 PDB ID code            Four-character accession code
 
 Keyword                Molecule name, class or family, or related term
 
 Author                 Family name of depositor or author of associated
                        publication
                                 
 Text query             Any word in the complete PDB text 

 FASTA Search           Fasta search of the sequence
 
 Experiment             Method of structure determination
 
 Resolution             A unique value or range of values, in Angstroms
                                  
 Space group            Both extended and standard Hermann-Mauguin symbols
                                  
 Organism               Trivial name, systematic name or expression
                        system.

 Date (lower)           Date entry was released or updated
  
 Date (upper)           Date entry was released or updated

 Associated group       Prosthetic group, metal ion, ligand or substrate, or 
                        its three letter PDB abbreviation


Boolean Searches and Wild Cards

This version of OCA includes NOT, OR and wild card searches.

The symbol '*' is used to denote a sequence of any number (including 0) of arbitrary characters. Just add a star '*' at the beginning or end of a word (or both) to 'extend' the search.

For example, enter

  "*ussman" in authors,
  "*tox*"   in keywords to retrieve entries with keywords like
            neurotoxic and toxin,
  "phos*"   in Assoc. group.

Examples for NOT search are: 

   Author:        not sussman 
   Keyword:       -antifreeze 
   Organism:      -snake 
   Assoc. group:  -hem 
   Text query:    milk not sugar 

   NOTE: 1. You may use the word 'not' (case is unimportant) or 
            the minus (-) sign attached to the word 
         2. sussman NOT (harel Silman) will expand to 
            sussman -harel -silman 
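
The expansion in note 2 can be sketched in Python as follows. This is only an illustration of the rewriting rule described above, not OCA's own code; the function name expand_not is hypothetical, and only 'not' followed by a single word or a parenthesised list of words is handled.

```python
import re

def expand_not(query: str) -> str:
    """Rewrite 'not word' and 'not (a b)' into the '-word' form, lowercased."""
    tokens = re.findall(r"\(|\)|[^\s()]+", query.lower())
    out = []
    i = 0
    while i < len(tokens):
        if tokens[i] == "not" and i + 1 < len(tokens):
            i += 1
            if tokens[i] == "(":
                # Negate every word up to the closing parenthesis.
                i += 1
                while i < len(tokens) and tokens[i] != ")":
                    out.append("-" + tokens[i])
                    i += 1
                i += 1  # skip the ')'
            else:
                out.append("-" + tokens[i])
                i += 1
        else:
            out.append(tokens[i])
            i += 1
    return " ".join(out)

print(expand_not("sussman NOT (harel Silman)"))  # sussman -harel -silman
print(expand_not("milk not sugar"))              # milk -sugar
```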

Examples for OR search are:

  Keyword:      *ferr* or *hemo* 
  Author:       silman or harel 
  Text:         zinc AND (torpedo OR snake ) 
  Organism:     snake or torpedo 
  Assoc. group: *dibromo* or atp 
  Space group:  p31 or p 2 


Searching by PDB ID Code

This is a fast and simple way of finding a particular PDB entry. A PDB ID code or accession code is an identification code consisting of four characters. The first is a digit in the range 0 - 9, the remaining three are alpha-numeric.

You may use "*" or "." in place of any character, such as '9.' or '9*' to retrieve a list of PDB ID codes starting with "9", or '1.ce' or '1*ce' to retrieve a list of ID codes starting with "1" and ending with "ce".
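
A rough Python sketch of the ID code format and the wild card behaviour described above. It assumes that both "*" and "." stand for any run of characters within a valid four-character code, which reproduces the examples given here; OCA's actual matching rules may differ. The function names and the ID codes used below are hypothetical.

```python
import re

# One digit followed by three alphanumeric characters.
PDB_ID = re.compile(r"[0-9][0-9A-Za-z]{3}")

def is_pdb_id(code: str) -> bool:
    return PDB_ID.fullmatch(code) is not None

def id_matches(pattern: str, code: str) -> bool:
    """Match a code against a pattern in which '*' or '.' is a wild card."""
    if not is_pdb_id(code):
        return False
    regex = "".join(".*" if ch in "*." else re.escape(ch) for ch in pattern)
    return re.fullmatch(regex, code, re.IGNORECASE) is not None

print(is_pdb_id("1ACE"))           # True
print(is_pdb_id("ACE1"))           # False: first character is not a digit
print(id_matches("9.", "9ABC"))    # True: starts with "9"
print(id_matches("1*ce", "1ace"))  # True: starts with "1", ends with "ce"
```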


Keyword Query

This searches only the HEADER, TITLE, KEYWDS and COMPND records, or fields, of the PDB entry. These fields contain the classification, title of the experiments, classification and related terms, and molecule names, respectively.

Searching on multiple terms in this field causes them to be 'anded' together, giving the same result as using 'and' between the terms. For instance, 'hemoglobin deoxy ferrous' and 'hemoglobin and deoxy and ferrous' have the same result.
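
The equivalence of the two forms can be illustrated with a short Python sketch that makes the implied 'and' explicit. It is not OCA's implementation; the function name make_and_explicit is hypothetical, and parentheses are not handled.

```python
OPERATORS = {"and", "or", "not"}

def make_and_explicit(query: str) -> str:
    """Insert 'and' between adjacent search terms that have no operator."""
    out = []
    for token in query.lower().split():
        if out and token not in OPERATORS and out[-1] not in OPERATORS:
            out.append("and")
        out.append(token)
    return " ".join(out)

print(make_and_explicit("hemoglobin deoxy ferrous"))
# hemoglobin and deoxy and ferrous
```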


Author Query

This field performs a search on the family names found in AUTHOR and JRNL AUTHOR fields. More than one name will be 'anded' for the search, and both fields are searched. The wild card '*' may be used. Stemming, or using just a portion of the word, does not work.

For instance, on December 1, 1997, 'suss' returned no hits, while 'suss*' and 'sussman' each returned 20 hits. 'sussman and mathews' returned 0 hits, and 'sussman or mathews' returned 49 hits.
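
The difference between a bare word fragment and a wild card pattern can be sketched in Python. This is an illustration under the rules stated in this help text ('*' matches any run of characters, including none; matching is case-insensitive and applies to whole words); it is not the Glimpse or OCA implementation, and the function names are hypothetical.

```python
import re

def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Turn a pattern with '*' wild cards into a case-insensitive regex."""
    parts = [re.escape(part) for part in pattern.split("*")]
    return re.compile(".*".join(parts), re.IGNORECASE)

def matches(pattern: str, word: str) -> bool:
    # fullmatch: without a '*', a fragment does not match a longer word.
    return wildcard_to_regex(pattern).fullmatch(word) is not None

print(matches("suss", "Sussman"))     # False: no stemming
print(matches("suss*", "Sussman"))    # True
print(matches("*ussman", "Sussman"))  # True
print(matches("*tox*", "neurotoxic")) # True
```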


Text Query

The full text search is based on the Glimpse indexing and query system.

This field searches any word in the complete PDB file, not including the coordinate section, and not including the PDB record names. See the PDB Contents Guide for a complete description of the PDB format.

To get an idea of the power and speed of the browser, enter in the Text query field one of the following examples and press the Search button.

     zinc and torpedo or snake
     (zinc and torpedo) or snake

In the first case, the query is interpreted as 'zinc and (torpedo or snake)'.
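
The grouping above means that 'or' binds more tightly than 'and' unless parentheses say otherwise. A minimal Python sketch of that precedence follows; it evaluates a query against the set of words in one entry. It is an illustration only, not the Glimpse query engine: the function name evaluate and the sample word sets are hypothetical, and 'not', wild cards and malformed queries are not handled.

```python
import re

def evaluate(query: str, words: set) -> bool:
    """Evaluate an and/or query against a set of lowercase words."""
    tokens = re.findall(r"\(|\)|[^\s()]+", query.lower())
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def atom():
        tok = take()
        if tok == "(":
            value = and_expr()
            take()  # consume the ')'
            return value
        return tok in words

    def or_group():
        # 'or' is grouped first, so it binds tighter than 'and'.
        value = atom()
        while peek() == "or":
            take()
            rhs = atom()
            value = value or rhs
        return value

    def and_expr():
        value = or_group()
        while peek() == "and":
            take()
            rhs = or_group()
            value = value and rhs
        return value

    return and_expr()

entry = {"snake", "venom"}  # contains neither 'zinc' nor 'torpedo'
print(evaluate("zinc and torpedo or snake", entry))    # False
print(evaluate("(zinc and torpedo) or snake", entry))  # True
```

The two queries give different answers for this entry because only the second one can be satisfied by 'snake' alone.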


FASTA Search

Searches a library containing all the protein sequences in the current PDB, using the FASTA package, for protein sequences similar to yours.

DATA FORMAT:

A sequence in any of the following "formats" can be entered (copy and paste) in the 'FASTA search' text input area. Spaces or character case are not important. A minimum of seven residue names must be entered.

1. Three-letter code sequence:
   ---------------------------
ASN CYS GLN GLN TYR VAL ASP GLU GLN PHE PRO GLY PHE
SER GLY SER GLU MET TRP ASN PRO ASN ARG GLU MET SER
GLU ASP CYS LEU TYR LEU ASN ILE TRP VAL PRO SER PRO
ARG PRO LYS SER THR THR VAL MET VAL TRP ILE TYR GLY

2. One-letter code sequence:
   ------------------------
NCQQYVDEQFPGFSGSEMWNPNREMS
EDCLYLNIWVPSPRPKSTTVMVWIYG
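
The two input formats can be reduced to one with a short Python sketch. It assumes the 20 standard residue names only, and it guesses the format by checking whether every whitespace-separated token is a known three-letter name; OCA's own input handling is not documented here and may differ. The function name normalize_sequence is hypothetical.

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}
MIN_RESIDUES = 7  # minimum stated in this help text

def normalize_sequence(text: str) -> str:
    """Return a one-letter sequence; spaces and case are ignored."""
    tokens = text.upper().split()
    if tokens and all(tok in THREE_TO_ONE for tok in tokens):
        sequence = "".join(THREE_TO_ONE[tok] for tok in tokens)
    else:
        sequence = "".join(tokens)
    if len(sequence) < MIN_RESIDUES:
        raise ValueError("at least seven residues are required")
    return sequence

print(normalize_sequence(
    "ASN CYS GLN GLN TYR VAL ASP GLU GLN PHE PRO GLY PHE"))
# NCQQYVDEQFPGF
```
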

CUTOFF VALUE:

The 'cutoff value' limits the number of scores and alignments shown based on the expected number of scores. A cutoff value of 2.0 (-E 2.0) will show all library sequences with scores with an expectation value <= 2.0.

For protein searches, library sequences with E() values < 0.01 for searches of a 10,000 entry protein database are almost always homologous. Frequently sequences with E()-values from 1 - 10 are related as well. Remember, however, that these E() values also reflect differences between the amino acid composition of the query sequence and that of the "average" library sequence. Thus, when searches are done with query sequences with "biased" amino-acid composition, unrelated sequences may have "significant" scores because of sequence bias.
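
The effect of the cutoff can be sketched in Python as a simple filter on expectation values. The hit names and E() values below are made up for illustration, and the function name apply_cutoff is hypothetical; the FASTA package applies the cutoff itself.

```python
def apply_cutoff(hits, cutoff=2.0):
    """Keep hits with E() <= cutoff, best (lowest) E() first."""
    kept = [hit for hit in hits if hit[1] <= cutoff]
    return sorted(kept, key=lambda hit: hit[1])

# (name, E() value) pairs; invented values
hits = [("hit_c", 7.5), ("hit_a", 0.005), ("hit_b", 2.0)]
print(apply_cutoff(hits, cutoff=2.0))
# [('hit_a', 0.005), ('hit_b', 2.0)]
```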

A file containing the sequence of every chain in the PDB in FASTA format can be found at ftp://ftp.rcsb.org/pub/pdb/derived_data/pdb_seqres.txt

FASTA is available from ftp://ftp.virginia.edu/pub/fasta/.

FASTA: Pearson, W.R. and Lipman, D.J. Improved tools for biological sequence comparison. Proc. Natl. Acad. Sci. U.S.A. 85:2444-2448(1988)


Experiment

Method of structure determination. Use the pop-up button to choose diffraction, nmr, theoretical model, or all techniques to constrain your search. The EXPDTA records of the PDB files are searched.


Resolution

You may enter a range, such as '2.17-2.20' for an inclusive range search, or a unique value, such as '3.0', in Angstroms. The REMARK 2 record is searched.
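
A minimal Python sketch of how such an entry could be read, assuming a single '-' separates the two limits and that a unique value is treated as a range of one point. The function names are hypothetical and this is not OCA's parser.

```python
def parse_resolution(text: str):
    """Return (low, high) in Angstroms for '2.17-2.20' or '3.0'."""
    parts = [float(part) for part in text.strip().split("-")]
    if len(parts) == 1:
        return parts[0], parts[0]
    if len(parts) == 2:
        low, high = sorted(parts)
        return low, high
    raise ValueError("expected a value or a low-high range")

def in_range(resolution: float, query: str) -> bool:
    low, high = parse_resolution(query)
    return low <= resolution <= high  # inclusive at both ends

print(parse_resolution("2.17-2.20"))  # (2.17, 2.2)
print(in_range(2.20, "2.17-2.20"))    # True
print(in_range(2.5, "3.0"))           # False
```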


Space group

The CRYST1 record is searched. Both extended and standard Hermann-Mauguin symbols are recognized. Entering either 'P 21' or 'P 1 21 1' currently returns 738 hits.


Organism

The scientific and common names and the expression systems as found on the SOURCE records in the PDB file are searched.


Date (lower), Date (upper)

These refer to the date an entry was released or updated, in DAY-MONTH-YEAR format, using either '/' or '-' as separators. The month can be entered as a 3-letter name, as in 9/Sep/1986, or as a number, as in 30-11-1990.
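
The accepted date forms can be sketched in Python as follows. The sketch assumes English three-letter month names and four-digit years, as in the examples above; the function name parse_oca_date is hypothetical.

```python
import re
from datetime import date

MONTHS = {name: number for number, name in enumerate(
    ["jan", "feb", "mar", "apr", "may", "jun",
     "jul", "aug", "sep", "oct", "nov", "dec"], start=1)}

def parse_oca_date(text: str) -> date:
    """Parse DAY-MONTH-YEAR with '/' or '-' separators."""
    day, month, year = re.split(r"[/-]", text.strip())
    if month.isalpha():
        month_number = MONTHS[month.lower()]
    else:
        month_number = int(month)
    return date(int(year), month_number, int(day))

print(parse_oca_date("9/Sep/1986"))  # 1986-09-09
print(parse_oca_date("30-11-1990"))  # 1990-11-30
```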

01-Dec-1997 is inserted as the default in the Date lower field.


Associated group

You may search for a prosthetic group, metal ion, ligand or substrate, by chemical name or its three-letter PDB residue name. The PDB file HET and HETNAM records are searched.
See the PDB Het Group Dictionary for complete descriptions of the het groups currently in use.


Send your comments, suggestions, and bug reports to Jaime Prilusky at Jaime.Prilusky@weizmann.ac.il.