Pandora

Protein ANnotation Diagram ORiented Analysis

User Set Examples


1. User sets without additional properties (only protein IDs)

1.1 Preparing your file: If your file does not contain additional properties, just supply a list of protein IDs separated by spaces, tabs, newlines, commas or semicolons. Your file should look something like this:
Q8WUI4
104K_THEPA
108_LYCES
Q9CQV8
ACT2_ARATH
Q01525
10KD_VIGUN
P13813
1433_MESCR
11S3_HELAN
P42649
AACT_HUMAN
P29993
p23433
Q2M2R6

but the following file would also work:

Q8WUI4
104K_THEPA
    108_LYCES,
Q9CQV8  ACT2_ARATH;   Q01525;
10KD_VIGUN;         P13813
1433_MESCR,11S3_HELAN;
P42649;
AACT_HUMAN, 
P29993 p23433
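To check locally that a loosely formatted list contains the IDs you expect, the delimiter rule above can be sketched in a few lines of Python. This is only an illustration of splitting on the listed separators; the function name `parse_id_list` is hypothetical, and the code is not PANDORA's own parser.

```python
import re

# Separators named in the text: space, tab, newline, comma, semicolon.
_DELIMITERS = re.compile(r"[\s,;]+")


def parse_id_list(text: str) -> list[str]:
    """Split raw file contents into protein IDs, dropping empty tokens."""
    return [token for token in _DELIMITERS.split(text) if token]


# A fragment of the loosely formatted example shown above.
messy = """Q8WUI4
104K_THEPA
    108_LYCES,
Q9CQV8  ACT2_ARATH;   Q01525;
10KD_VIGUN;         P13813
"""

ids = parse_id_list(messy)
print(len(ids), "IDs:", ids)
```

Running the same function on the one-ID-per-line version of the file should give the same list, which is a quick way to confirm that both layouts describe the same protein set.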

1.2 Uploading your file to PANDORA: In PANDORA, choose the "User Set" option. Start by selecting the appropriate background database. Then enter the name of the file to upload, and from the "File Type" pulldown choose "Simple list of protein accession IDs". Specify whether your IDs are SwissProt/TrEMBL protein accession numbers or GenBank gene accession numbers (in the latter case, the appropriate protein product of each gene will be selected from the database). You can then either continue directly or add some quantitative properties from the database (such as pI or length). Finally, proceed by clicking the "PANDORA" button. You will be taken to a webpage that reports how many of the protein IDs were identified in the PANDORA database and which protein IDs were not detected. From there, simply proceed by clicking the "PANDORA" button.
If you chose to add quantitative properties, you will see color histograms that describe their distribution on the subsets.


2. User sets with additional properties

2.1 Preparing your file: You can prepare your file using Excel or any text editor. The file should be divided into columns separated by tabs or by commas (if you are using Excel, it will do this automatically when you save your file in CSV or Tab-delimited format).
The first row should contain the type of each column (the types are explained below). The second row should contain the name of each column. The data start on the third row.
Now let us discuss the various column types:
  • Protein ID column: The first column should ALWAYS be the column that contains the protein accession numbers. You should set the type of this column as "a" (without the quotes) in the first row.
  • Binary property columns: A binary property column is like an annotation. Each protein can either have it or not have it. You can use these columns in several interesting ways, for example to compare two protein sets (assign each set as a binary property). Proteins that do not have this property should be either marked as "-" or "0" or left empty. Any other value will be considered as if the protein does have the property. You should set the type of these columns as "b" (without the quotes) in the first row.
  • Quantitative properties: Quantitative properties can represent any quantitative data associated with your proteins. Typically this would be expression values from microarray or proteomic experiments, but there are numerous ways to use this feature (for example, you can also look at BLAST E-scores, protein pIs, quantitative functional assays, etc). For each protein you can enter a positive or negative number. Note that you are not required to supply quantitative data for every protein: there is a difference between entering a value of zero for a protein and entering no value at all (the former means the value is zero, the latter means there is no data). You should set the type of these columns as "q" (without the quotes) in the first row.
  • Multiple binary property columns: Multiple binary columns let you enter properties that are discrete but not binary (such as properties with a few categories). Another way to view this is as entering a new group of disjoint annotations. For example, if your proteins can be divided into four disjoint groups (A, B, C and D), you can use a multiple binary column named "Groups" and give each protein a value from A to D. The column will be translated into several binary columns, each representing a different category. You should set the type of these columns as "m" (without the quotes) in the first row.
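The rules for the "b", "q" and "m" column types can be summarized in a short Python sketch. It only illustrates how the cell values described above might be interpreted. The function names and the "Groups=A" naming of the derived binary columns are assumptions made for this example, not PANDORA's internal behavior.

```python
from typing import Optional


def has_binary_property(cell: str) -> bool:
    """Type 'b': '-', '0' or an empty cell means absent; anything else means present."""
    return cell.strip() not in ("", "-", "0")


def quantitative_value(cell: str) -> Optional[float]:
    """Type 'q': an empty cell means no data (None), which is not the same as 0.0."""
    cell = cell.strip()
    return float(cell) if cell else None


def expand_multiple_binary(name: str, cells: list[str]) -> dict[str, list[bool]]:
    """Type 'm': turn one categorical column into one binary column per category."""
    categories = sorted({c.strip() for c in cells if c.strip()})
    # The "name=category" labels are a hypothetical naming scheme for this sketch.
    return {
        f"{name}={category}": [c.strip() == category for c in cells]
        for category in categories
    }


# Five proteins; the third has no group assigned.
groups = expand_multiple_binary("Groups", ["A", "B", "", "D", "A"])
for column, flags in groups.items():
    print(column, flags)

print(quantitative_value("0"), quantitative_value(""))
```

Note how a protein with an empty "m" cell ends up in none of the derived binary columns, and how an empty "q" cell is kept distinct from a zero.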


Altogether, your file should look something like this:
a,m,q,b,b
Protein,Phenotype,Expression,Toxic,Cancer
HDAC7_HUMAN,lethal,-3,1,yes
AQP10_HUMAN,lethal,-2.2,1,yes
BAFL_HUMAN,disease,-2,,
CARKL_HUMAN,,-2.5,1,
CLN8_HUMAN,disease,-2,1,
CU005_HUMAN,,-1.6,,
DUS15_HUMAN,,,1,
DYH11_HUMAN,healthy,2,,
EBPL_HUMAN,disease,0.3,1,
EMAL2_HUMAN,healthy,0.5,,
EPIPL_HUMAN,healthy,1.5,,
FMNL_HUMAN,,0,,
GSH2_HUMAN,,0,,
IQG1_HUMAN,lethal,1,yes,
KHL4_HUMAN,,1,,
KR18_HUMAN,disease,-0.1,,
M10L1_HUMAN,healthy,2.1,,
MAL2_HUMAN,healthy,3.3,,
NDF6_HUMAN,healthy,0.12,,
NINJ1_HUMAN,lethal,-0.51,1,

In this sample file, we have 20 proteins and 4 user properties on them: 1 multiple binary (Phenotype: healthy/disease/lethal), 2 binary (Toxic and Cancer) and 1 quantitative (Expression). Note that not all proteins have values for all 4 properties.
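Before uploading, it can help to check that a file follows the layout described above: a type row, a name row, then data rows. The following Python sketch does this with the standard `csv` module. The function name `check_user_set` is hypothetical, and the requirement that every row has the same number of fields is a conservative assumption of this sketch; PANDORA may well tolerate rows with missing trailing fields.

```python
import csv
import io

VALID_TYPES = {"a", "b", "q", "m"}


def check_user_set(text: str, delimiter: str = ",") -> list[str]:
    """Return a list of problems found in the file text; an empty list means none."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if len(rows) < 3:
        return ["expected a type row, a name row and at least one data row"]

    types, names = rows[0], rows[1]
    problems = []
    if not types or types[0] != "a":
        problems.append("first column type must be 'a' (protein ID)")
    unknown = [t for t in types if t not in VALID_TYPES]
    if unknown:
        problems.append(f"unknown column types: {unknown}")
    if len(names) != len(types):
        problems.append("name row and type row differ in length")

    # Data rows start on line 3 of the file.
    for line_number, row in enumerate(rows[2:], start=3):
        if not row:
            continue  # ignore blank lines
        if len(row) != len(types):
            problems.append(
                f"line {line_number}: {len(row)} fields, expected {len(types)}"
            )
    return problems


# The last row is deliberately one field short to show the check firing.
sample = """a,m,q,b,b
Protein,Phenotype,Expression,Toxic,Cancer
HDAC7_HUMAN,lethal,-3,1,yes
BAFL_HUMAN,disease,-2,,
FMNL_HUMAN,,0,
"""

for problem in check_user_set(sample):
    print(problem)
```

For a tab-delimited file, pass `delimiter="\t"` instead of the default comma.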

2.2 Uploading your file to PANDORA: In PANDORA, choose the "User Set" option. Start by selecting the appropriate background database. Then enter the name of the file to upload, and from the "File Type" pulldown choose "Protein IDs with user properties" (you can select CSV format for comma-separated files or Tab-Delimited format). Specify whether your IDs are SwissProt/TrEMBL protein accession numbers or GenBank gene accession numbers (in the latter case, the appropriate protein product of each gene will be selected from the database). You can then either continue directly or add some quantitative properties from the database (such as pI or length). Finally, proceed by clicking the "PANDORA" button. You will be taken to a webpage that reports how many of the protein IDs were identified in the PANDORA database and which protein IDs were not detected. From there, simply proceed by clicking the "PANDORA" button.
If you have quantitative properties on your proteins, you will see color histograms that describe their distribution on the subsets.