Data Formats

The DIP database provides access to the interaction and sequence data in a variety of formats:

PSI-MI (MIF) - The Molecular Interaction XML Format, co-developed by us and other interaction data providers participating in the HUPO Proteomics Standards Initiative (HUPO-PSI), is a rich, XML-based format that can describe interactions between a wide range of molecular types, for example nucleic acids, chemical entities, and molecular complexes. It can capture extensive details about each supported molecular interaction, including the biological and experimental role of each molecule within that interaction and a detailed description of the interacting domains.

We provide access to the sets of original curation records created by DIP curators according to the IMEx curation rules. These, together with the records imported from other active members of the IMEx consortium, constitute the primary and most complete set of DIP experimental data. In addition, we generate a number of files that organize the experimental information about protein-protein interactions known to DIP into species-specific datasets of varying confidence.

MITAB2.5 - MITAB is a simpler, tab-delimited format developed within the HUPO Proteomics Standards Initiative (HUPO-PSI) for users who require only minimal information in an easy-to-access form.
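
As a rough illustration of how little code a tab-delimited file needs, the Python sketch below reads the two interactor columns from MITAB-style lines. It assumes the standard PSI-MI TAB 2.5 layout (15 tab-separated columns, the first two holding the identifiers of interactors A and B, with multiple identifiers separated by "|") and that header lines, if present, start with "#". The function name read_mitab_pairs and the sample row are invented for this example; check an actual DIP file before relying on these assumptions.

```python
from typing import Iterable, Iterator, List, Tuple


def read_mitab_pairs(lines: Iterable[str]) -> Iterator[Tuple[List[str], List[str]]]:
    """Yield (identifiers of A, identifiers of B) for each MITAB-style row.

    Assumption: columns 1 and 2 hold '|'-separated identifiers, as in the
    standard PSI-MI TAB 2.5 layout. Lines starting with '#' are skipped.
    """
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue  # blank line or header/comment line
        columns = line.split("\t")
        if len(columns) < 2:
            continue  # not a usable interaction row
        yield columns[0].split("|"), columns[1].split("|")


# Made-up row for illustration only: two identifier columns followed by
# 13 empty ("-") columns to reach the 15 columns of MITAB 2.5.
sample_row = "\t".join(
    ["uniprotkb:P60010|refseq:NP_116614", "uniprotkb:Q00000"] + ["-"] * 13
)

pairs = list(read_mitab_pairs(["#header line", sample_row]))
for ids_a, ids_b in pairs:
    print(ids_a, "interacts with", ids_b)
```

The remaining columns (detection method, publication, taxonomy, confidence and so on) can be read from the same columns list by position.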

FASTA - Sequences of the proteins participating in interactions reported by DIP are provided as FASTA files. The sequence of each protein, together with its DIP, RefSeq and UniProt accessions, is presented as a modified FASTA entry:

      >dip:DIP-310N|refseq:NP_116614|uniprot:P60010
      MDSEVAALVIDNGSGMCKAGFAGDDAPRAVFPSIVGRPRHQGIMVGMGQKDSYVGDEAQSKRGILTLRYPIEHGI    
      VTNWDDMEKIWHHTFYNELRVAPEEHPVLLTEAPMNPKSNREKMTQIMFETFNVPAFYVSIQAVLSLYSSGRTTG    
      IVLDSGDGVTHVVPIYAGFSLPHAILRIDLAGRDLTDYLMKILSERGYSFSTTAEREIVRDIKEKLCYVALDFEQ    
      EMQTAAQSSSIEKSYELPDGQVITIGNERFRAPEALFHPSVLGLESAGIDQTTYNSIMKCDVDVRKELYGNIVMS    
      GGTTMFPGIAERMQKEITALAPSSMKVKIIAPPERKYSVWIGGSILASLTTFQQMWISKQEYDESGPSIVHHKCF
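
The modified header shown above can be split into its accessions with a few lines of Python. This is a minimal sketch, not an official DIP tool: it assumes every header follows the ">db:accession|db:accession|..." pattern of the example entry, and the function name parse_dip_fasta is invented for illustration. The sample input reuses the header and the first sequence line of the entry above.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def parse_dip_fasta(lines: Iterable[str]) -> Iterator[Tuple[Dict[str, str], str]]:
    """Yield (accessions, sequence) for each record of a DIP-style FASTA stream.

    Assumption: header fields are '|'-separated 'db:accession' pairs.
    """
    accessions = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()  # also drops indentation and trailing spaces
        if not line:
            continue
        if line.startswith(">"):
            if accessions is not None:
                yield accessions, "".join(chunks)  # emit the previous record
            accessions = {}
            for field in line[1:].split("|"):
                db, sep, acc = field.partition(":")
                if sep:  # ignore fields that lack a 'db:' prefix
                    accessions[db] = acc
            chunks = []
        elif accessions is not None:
            chunks.append(line)  # sequence lines are concatenated
    if accessions is not None:
        yield accessions, "".join(chunks)


sample = """\
>dip:DIP-310N|refseq:NP_116614|uniprot:P60010
MDSEVAALVIDNGSGMCKAGFAGDDAPRAVFPSIVGRPRHQGIMVGMGQKDSYVGDEAQSKRGILTLRYPIEHGI
"""

records = list(parse_dip_fasta(sample.splitlines()))
for ids, sequence in records:
    print(ids["dip"], ids.get("uniprot"), len(sequence))
```

Because the header is not the conventional single-identifier FASTA header, generic FASTA readers will typically return the whole header string as the record name, so a split on "|" and ":" like the one above is still needed.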


XIN (legacy) - This is a legacy XML format that was used before the introduction of PSI-MI (MIF). All information that was available in the original XIN files is now provided in the current PSI-MI and MITAB files. A rudimentary description of the XIN format can be found here.