Dataset

Download the BIAS-PROFS GPCR dataset (often referred to as the "GDS" in publications) using the link below:
Download (1161 KB, zip file)

Please cite this dataset using the following reference in any derivative publications:
Matthew N. Davies, Andrew Secker, Alex A. Freitas, Miguel Mendao, Jon Timmis and Darren R. Flower (2007). On the hierarchical classification of G Protein-Coupled Receptors. Bioinformatics 23(23), pp. 3113-3118, December 2007. Oxford Journals.

Each class in the dataset is stored as a single text file. Within each file, every sequence has a header line followed by the sequence itself on the next line, and each sequence occupies exactly one line. This makes automated processing of the dataset slightly easier; however, when viewing the files in a text viewer, word wrap (line wrap) must be turned on to see the full sequences.
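As a rough illustration of the one-line-per-sequence layout, the Python sketch below reads one class file into (header, sequence) pairs. It assumes that, once blank lines are ignored, lines strictly alternate header, sequence. The function name `read_class_file` is hypothetical, and since the exact header format (for example, whether it starts with `>` as in FASTA) is not described here, the code returns headers verbatim without relying on it.

```python
from pathlib import Path


def read_class_file(path):
    """Parse one class file into a list of (header, sequence) pairs.

    Assumption: after blank lines are removed, lines alternate
    header, sequence, header, sequence, ... with each sequence on a
    single line. Headers are returned verbatim (no '>' handling).
    """
    lines = [ln.strip() for ln in Path(path).read_text().splitlines()]
    lines = [ln for ln in lines if ln]  # ignore blank lines
    if len(lines) % 2 != 0:
        raise ValueError(
            f"{path}: odd number of non-blank lines; expected header/sequence pairs"
        )
    # Even positions hold headers; the following odd position holds the sequence.
    return list(zip(lines[0::2], lines[1::2]))
```

Because no sequence is wrapped across lines, there is no need to reassemble multi-line records; pairing consecutive lines is enough.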

Notes

The structure of the dataset is listed below, with the number of example sequences at each level of the hierarchy. The dataset contains 8354 sequences in 108 classes, where a class is an entry at the lowest (third) level of the hierarchy. Please note that some publications remove classes with fewer than 10 examples; in that case, the dataset contains 87 classes and 8222 sequences.
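The reduced 87-class version can be produced with a simple size filter. The Python sketch below assumes the dataset has already been loaded into a dict mapping each class name to its list of sequences (how files map to class names is left as an assumption); `drop_small_classes` and `summarise` are hypothetical helper names. Applied to the full dataset, `summarise` should report (108, 8354) before filtering and (87, 8222) after, provided the loaded classes match the counts listed below.

```python
MIN_EXAMPLES = 10  # threshold used by the publications that drop small classes


def drop_small_classes(dataset, min_examples=MIN_EXAMPLES):
    """Return a new dict keeping only classes with >= min_examples sequences.

    `dataset` maps a class name to a list of its sequences (or of
    (header, sequence) pairs); only the length of each list is used.
    """
    return {name: seqs for name, seqs in dataset.items() if len(seqs) >= min_examples}


def summarise(dataset):
    """Return (number of classes, total number of sequences)."""
    return len(dataset), sum(len(seqs) for seqs in dataset.values())


# Toy input with placeholder sequences, not real dataset contents.
toy = {"big": ["SEQ"] * 12, "small": ["SEQ"] * 5, "edge": ["SEQ"] * 10}
print(summarise(toy))                      # (3, 27)
print(summarise(drop_small_classes(toy)))  # (2, 22)
```

A class with exactly 10 sequences is kept, since only classes with fewer than 10 examples are removed.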

ClassA 5526
    Adrenergic 95
        Adrenergic 95
    Amine 1489
        Adrenoreceptor 174
        Dopamine 228
        Histamine 53
        MuscAcetyl 159
        Muscarinicacetylcholine 186
        Octopamine 37
        Serotonin 503
        Traceamine 149
    Anaphylatoxin 30
        Anaphylatoxin 30
    Cannabinoid 151
        Cannabinoid 151
    Fmetleuphe 7
        Fmetleuphe 7
    GRHR 105
        AKH 8
        Cora 6
        GRHR 91
    Hormone 205
        FollicleStim 66
        Gonadotrophin 93
        Lutropin 9
        Thyrotropin 37
    Interleukin8 67
        Interleukin8 67
    Leuko 16
        BLT1 5
        BLT2 11
    Lyso 65
        LysoEdg2 14
        LysoEdg4 9
        LysoEdg7 7
        SphingoEdg1 7
        SphingoEdg3 5
        SphingoEdg5 8
        SphingoEdg6 6
        SphingoEdg8 9
    Melaton 90
        Melaton 90
    Nucleotide 266
        Adenosine 93
        Purinergic 173
    Olfactory 90
        Olfactory 90
    Peptide 2713
        Adrenocorticotropic 18
        Adrenomedullin 15
        Allatostatin 15
        Angiotensin 128
        Bombesin 50
        Bradykinin 77
        C5A 33
        Chemokine 575
        Cholecystokinin 33
        Conopressin 2
        DrostatinC 6
        Duffy 65
        Endothelin 81
        fMetLeuPhe 7
        Galanin 66
        Interleukin8 67
        Kiss1 12
        MelaninConc 32
        Melanocortin 314
        Melanocyte 90
        Mesotocin 5
        Neuromedin 60
        NeuromedinB-U 47
        Neuropeptide 118
        NeuropeptideFF 17
        Neurotensin 35
        Opoid 170
        Orexigenic 7
        Orexin 40
        Oxytocin 41
        Prokineticin 22
        Prolactin 19
        Proteinase 34
        Somatostatin 157
        SubstanceK 12
        SubstanceP 19
        Sulfakinin 4
        Tachykinin 69
        Thrombin 34
        UrotensinII 13
        Vasopressin 93
        Vasotocin 11
    Platelet 21
        Platelet 21
    Prostanoid 45
        Prostacyclin 12
        Prostaglandin 27
        Thromboxane 6
    Thyro 71
        ETHR 2
        Growth 34
        Thyro 35
ClassB 625
    BrainSpec 26
        BrainSpec 26
    Cadherin 65
        Cadherin 65
    Calcitonin 55
        Calcitonin 55
    Corticotropin 53
        Corticotropin 53
    Diuretic 7
        Diuretic 7
    EMR1 14
        EMR1 14
    Gastric 13
        Gastric 13
    Glucagon 24
        Glucagon 24
    GrowthHorm 50
        GrowthHorm 50
    Latrophilin 111
        Latrophilin 111
    Methuselah 25
        Methuselah 25
    PACAP 63
        PACAP 63
    Parathyroid 40
        Parathyroid 40
    Secretin 18
        Secretin 18
    Vasoactive 61
        Vasoactive 61
ClassC 2172
    BOSS 55
        BOSS 55
    CalcSense 588
        CalcLike 30
        ExtraCalc 27
        Pheromone 531
    GABA 86
        GABA 86
    GlutaMeta 178
        GlutaMeta 178
    PutPher 311
        PutPher 311
    Taste 954
        Taste 954
ClassD 13
    Pheromone 13
        AlphaFac 13
ClassE 18
    cAMP 18
        cAMP 18

Total number of sequences: 8354
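Because the classes form a hierarchy, each higher-level count above is the sum of the counts beneath it. The Python sketch below illustrates this roll-up using the ClassC counts copied from the listing. Labels are stored as full (level 1, level 2, level 3) paths, which keeps repeated names such as Pheromone (used under both ClassC and ClassD) distinct. The helper name `roll_up` is hypothetical and is not part of the dataset distribution.

```python
from collections import Counter

# Lowest-level (third level) counts for ClassC, copied from the listing above.
CLASS_C_LEAVES = {
    ("ClassC", "BOSS", "BOSS"): 55,
    ("ClassC", "CalcSense", "CalcLike"): 30,
    ("ClassC", "CalcSense", "ExtraCalc"): 27,
    ("ClassC", "CalcSense", "Pheromone"): 531,
    ("ClassC", "GABA", "GABA"): 86,
    ("ClassC", "GlutaMeta", "GlutaMeta"): 178,
    ("ClassC", "PutPher", "PutPher"): 311,
    ("ClassC", "Taste", "Taste"): 954,
}


def roll_up(leaf_counts):
    """Add each leaf count to every ancestor prefix of its label path."""
    totals = Counter()
    for path, n in leaf_counts.items():
        for depth in range(1, len(path) + 1):
            totals[path[:depth]] += n
    return totals


totals = roll_up(CLASS_C_LEAVES)
print(totals[("ClassC",)])              # 2172
print(totals[("ClassC", "CalcSense")])  # 588
```

Summing the level 1 totals of all five classes (5526 + 625 + 2172 + 13 + 18) gives the overall total of 8354.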