Experimental Design Guide
This guide explains how to configure your experimental design table to get the most accurate results from your proteomics analysis. The experimental design defines how your raw files relate to each other—which files are replicates, which belong to the same sample, and how they should be grouped for statistical comparison.
Table of Contents
- Overview
- Field Definitions
- How Experimental Design Affects Your Results
- Common Experiment Scenarios
- Using the AI Chat Assistant
- Best Practices
- Troubleshooting
Overview
When you upload raw files for analysis, you can define an experimental design that tells the system how your files relate to each other. This information is critical for:
- Accurate FDR calculations — Properly counting unique peptide and protein identifications
- Differential analysis — Comparing protein abundances between treatment groups
- Quantification — Aggregating intensities across fractions and replicates
Each raw file in your experiment gets assigned four properties:
| Field | Purpose |
|---|---|
| `sample` | Identifies the biological source |
| `replicate` | Identifies replicates within a treatment |
| `fraction` | Identifies fractions from pre-fractionated samples |
| `treatment_group` | Labels for differential analysis comparisons |
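As a rough illustration, the four properties can be pictured as one record per raw file. The sketch below uses a hypothetical `DesignRow` dataclass; it is not the platform's actual data model, only a way to make the table concrete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DesignRow:
    """One row of the experimental design table (hypothetical structure)."""
    file: str
    sample: str
    replicate: int
    fraction: int
    treatment_group: str


# Two unfractionated files from different biological sources.
design = [
    DesignRow("Control_1.raw", "Control_1", 1, 1, "Control"),
    DesignRow("Disease_1.raw", "Disease_1", 1, 1, "Disease"),
]

for row in design:
    print(row.file, row.sample, row.replicate, row.fraction, row.treatment_group)
```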
Field Definitions
Sample
The `sample` field identifies the biological source of your data. Files with the same sample value are considered to come from the same biological unit.
Key behaviors:
- Files with the same `sample` AND `replicate` will have their peptide identifications combined across fractions
- Use unique sample identifiers when each file represents a different biological sample
- Use the same sample identifier when multiple files come from the same biological source (e.g., fractionation)
Examples:
- `Patient_001`, `Patient_002`, `Patient_003`: Each patient is a unique sample
- `Pool_A`, `Pool_A`, `Pool_A`: Same pooled sample run as multiple fractions
Replicate
The `replicate` field identifies technical or biological replicates. Replicates are repeated measurements that let you assess reproducibility and that increase statistical power.
Key behaviors:
- Files with the same `sample` but different `replicate` values are treated as independent measurements
- Replicate information is used for grouping during FDR calculation
- Can be used as a covariate in differential analysis to control for batch effects
Types of replicates:
- Technical replicates: Same sample measured multiple times (same biological source)
- Biological replicates: Different samples from the same condition (different biological sources)
Fraction
The `fraction` field identifies files that come from pre-fractionated samples. When a single sample is separated into multiple fractions before mass spectrometry analysis, each fraction is run as a separate file.
Key behaviors:
- Peptides identified in multiple fractions of the same sample/replicate count as ONE identification during FDR calculation
- Intensities can be aggregated across fractions for quantification
- Use sequential numbering (1, 2, 3, 4…) for fraction identifiers
When to use the fraction field:
- Offline fractionation (e.g., high-pH reversed-phase, SCX)
- Size-exclusion chromatography
- Any workflow where one sample produces multiple raw files
Treatment Group
The `treatment_group` field defines the experimental conditions for differential analysis. This is how you specify which samples should be compared against each other.
Key behaviors:
- Used by the differential analysis algorithms to define comparison groups
- Requires at least 2 groups with 2+ files each for statistical analysis
- Labels should be descriptive (e.g., “Control”, “Disease”, “Treated”, “Untreated”)
Examples:
- `Control` vs `Disease`
- `Baseline` vs `Week4` vs `Week8`
- `Vehicle` vs `Drug_LowDose` vs `Drug_HighDose`
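The group-size requirement above can be checked before submitting a design. This is a minimal sketch that reads the rule as "at least two groups that each contain two or more files"; the function name `groups_ready_for_stats` and its parameters are invented for illustration and are not part of the platform.

```python
from collections import Counter


def groups_ready_for_stats(treatment_groups, min_groups=2, min_files=2):
    """Return (ok, counts) for a list of per-file treatment_group labels."""
    counts = Counter(treatment_groups)
    # Only groups with enough files can take part in a statistical comparison.
    usable = [group for group, n in counts.items() if n >= min_files]
    return len(usable) >= min_groups, dict(counts)


labels = ["Control", "Control", "Control", "Disease", "Disease", "Disease"]
ok, counts = groups_ready_for_stats(labels)
print(ok, counts)
```

A design with labels `["Control", "Control", "Disease"]` would fail this check because `Disease` has only one file.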
How Experimental Design Affects Your Results
Peptide-Level FDR
During peptide-level false discovery rate (FDR) calculation, identifications are grouped by sample and replicate.
Effect: If the same peptide is identified in multiple fractions of the same sample/replicate, it counts as one identification, not multiple. This prevents inflating your peptide counts when using fractionation.
Example:
- Peptide `PEPTIDEK` found in Fraction 1 of Sample_A, Replicate 1
- Peptide `PEPTIDEK` found in Fraction 3 of Sample_A, Replicate 1
- Result: Counts as 1 unique peptide identification
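The counting rule in this example can be sketched in a few lines. The tuples below are made-up identifications, and the set-based deduplication only illustrates the grouping idea; it is not the platform's FDR implementation.

```python
# Each tuple is (sample, replicate, fraction, peptide): hypothetical identifications.
identifications = [
    ("Sample_A", 1, 1, "PEPTIDEK"),
    ("Sample_A", 1, 3, "PEPTIDEK"),  # same peptide, different fraction
    ("Sample_A", 2, 1, "PEPTIDEK"),  # same peptide, different replicate
]

# Leaving the fraction out of the key collapses repeats across fractions.
unique_ids = {
    (sample, replicate, peptide)
    for sample, replicate, _fraction, peptide in identifications
}

print(len(unique_ids))  # 2
```

The first two entries collapse into one identification because they share a sample and replicate; the third stays separate because it belongs to a different replicate.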
Protein-Level FDR
During protein-level FDR calculation, identifications are grouped by `treatment_group`, `sample`, and `replicate`.
Effect: This ensures proper tracking of which proteins are identified in which replicates, enabling accurate assessment of identification reproducibility across your experiment.
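To make the grouping concrete, the sketch below collects hypothetical protein identifications under a `(treatment_group, sample, replicate)` key and then counts in how many of those groups each protein appears. The accession strings and the counting step are illustrative assumptions, not the platform's internal logic.

```python
from collections import defaultdict

# (treatment_group, sample, replicate, protein): hypothetical identifications.
hits = [
    ("Control", "Control_1", 1, "P12345"),
    ("Control", "Control_2", 2, "P12345"),
    ("Control", "Control_2", 2, "Q67890"),
    ("Disease", "Disease_1", 1, "P12345"),
]

# Collect the set of proteins seen in each (treatment_group, sample, replicate).
proteins_by_group = defaultdict(set)
for treatment_group, sample, replicate, protein in hits:
    proteins_by_group[(treatment_group, sample, replicate)].add(protein)

# Count the number of groups in which each protein was identified.
seen_in = defaultdict(int)
for proteins in proteins_by_group.values():
    for protein in proteins:
        seen_in[protein] += 1

print(dict(seen_in))
```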
Common Experiment Scenarios
Scenario A: Simple Case-Control Study
Setup: 6 samples, 3 controls and 3 disease cases, no fractionation, no technical replicates.
| File | sample | replicate | fraction | treatment_group |
|---|---|---|---|---|
| Control_1.raw | Control_1 | 1 | 1 | Control |
| Control_2.raw | Control_2 | 2 | 1 | Control |
| Control_3.raw | Control_3 | 3 | 1 | Control |
| Disease_1.raw | Disease_1 | 1 | 1 | Disease |
| Disease_2.raw | Disease_2 | 2 | 1 | Disease |
| Disease_3.raw | Disease_3 | 3 | 1 | Disease |
Notes:
- Each file is a unique biological sample
- Sample names are unique across all files
- Replicate numbers are sequential within each treatment group
- Fraction is set to 1 for all files (no fractionation)
Scenario B: Fractionated Samples
Setup: 2 samples, each pre-fractionated into 4 fractions (8 total files).
| File | sample | replicate | fraction | treatment_group |
|---|---|---|---|---|
| Sample1_F1.raw | Sample1 | 1 | 1 | Treatment |
| Sample1_F2.raw | Sample1 | 1 | 2 | Treatment |
| Sample1_F3.raw | Sample1 | 1 | 3 | Treatment |
| Sample1_F4.raw | Sample1 | 1 | 4 | Treatment |
| Sample2_F1.raw | Sample2 | 2 | 1 | Treatment |
| Sample2_F2.raw | Sample2 | 2 | 2 | Treatment |
| Sample2_F3.raw | Sample2 | 2 | 3 | Treatment |
| Sample2_F4.raw | Sample2 | 2 | 4 | Treatment |
Notes:
- Same `sample` value for all fractions from the same biological sample
- Same `replicate` value for all fractions from the same sample
- Different `fraction` values (1-4) to identify each fraction
- Peptides identified across fractions will be properly combined
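When file names follow a regular pattern, a table like the one in this scenario can be generated rather than typed by hand. The sketch below assumes the `<sample>_F<fraction>.raw` naming used in the table; the helper `rows_from_filenames` is hypothetical, and the resulting rows would still need to be entered or imported into the design table by whatever means the platform offers.

```python
import re

# Assumed naming convention: <sample>_F<fraction>.raw
FILENAME = re.compile(r"^(?P<sample>.+)_F(?P<fraction>\d+)\.raw$")


def rows_from_filenames(filenames, treatment_group):
    """Build design rows; replicates are numbered by first appearance of each sample."""
    replicate_of = {}
    rows = []
    for name in filenames:
        match = FILENAME.match(name)
        if match is None:
            raise ValueError(f"unexpected file name: {name}")
        sample = match["sample"]
        # Reuse the replicate number for a known sample, otherwise take the next one.
        replicate = replicate_of.setdefault(sample, len(replicate_of) + 1)
        rows.append({
            "file": name,
            "sample": sample,
            "replicate": replicate,
            "fraction": int(match["fraction"]),
            "treatment_group": treatment_group,
        })
    return rows


files = [f"Sample{s}_F{f}.raw" for s in (1, 2) for f in (1, 2, 3, 4)]
rows = rows_from_filenames(files, "Treatment")
for row in rows:
    print(row)
```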
Scenario C: Technical Replicates
Setup: 2 biological samples, each run 3 times (6 total files).
| File | sample | replicate | fraction | treatment_group |
|---|---|---|---|---|
| Sample1_Run1.raw | Sample1 | 1 | 1 | Control |
| Sample1_Run2.raw | Sample1 | 2 | 1 | Control |
| Sample1_Run3.raw | Sample1 | 3 | 1 | Control |
| Sample2_Run1.raw | Sample2 | 1 | 1 | Disease |
| Sample2_Run2.raw | Sample2 | 2 | 1 | Disease |
| Sample2_Run3.raw | Sample2 | 3 | 1 | Disease |
Notes:
- Same `sample` for all runs of the same biological sample
- Different `replicate` values for each technical replicate run
- This design allows assessment of technical variability
Scenario D: Biological Replicates with Fractionation
Setup: 4 biological samples (2 per condition), each fractionated into 3 fractions (12 total files).
| File | sample | replicate | fraction | treatment_group |
|---|---|---|---|---|
| Ctrl_Bio1_F1.raw | Ctrl_Bio1 | 1 | 1 | Control |
| Ctrl_Bio1_F2.raw | Ctrl_Bio1 | 1 | 2 | Control |
| Ctrl_Bio1_F3.raw | Ctrl_Bio1 | 1 | 3 | Control |
| Ctrl_Bio2_F1.raw | Ctrl_Bio2 | 2 | 1 | Control |
| Ctrl_Bio2_F2.raw | Ctrl_Bio2 | 2 | 2 | Control |
| Ctrl_Bio2_F3.raw | Ctrl_Bio2 | 2 | 3 | Control |
| Treat_Bio1_F1.raw | Treat_Bio1 | 1 | 1 | Treatment |
| Treat_Bio1_F2.raw | Treat_Bio1 | 1 | 2 | Treatment |
| Treat_Bio1_F3.raw | Treat_Bio1 | 1 | 3 | Treatment |
| Treat_Bio2_F1.raw | Treat_Bio2 | 2 | 1 | Treatment |
| Treat_Bio2_F2.raw | Treat_Bio2 | 2 | 2 | Treatment |
| Treat_Bio2_F3.raw | Treat_Bio2 | 2 | 3 | Treatment |
Notes:
- Each biological replicate has a unique `sample` name
- Fractions from the same sample share the same `sample` and `replicate`
- `replicate` numbers are sequential within each treatment group
- This design combines fractionation with biological replication
Scenario E: Time-Course Experiment
Setup: 3 subjects measured at 3 time points (9 total files).
| File | sample | replicate | fraction | treatment_group |
|---|---|---|---|---|
| Subject1_T0.raw | Subject1_T0 | 1 | 1 | Baseline |
| Subject1_T1.raw | Subject1_T1 | 1 | 1 | Week4 |
| Subject1_T2.raw | Subject1_T2 | 1 | 1 | Week8 |
| Subject2_T0.raw | Subject2_T0 | 2 | 1 | Baseline |
| Subject2_T1.raw | Subject2_T1 | 2 | 1 | Week4 |
| Subject2_T2.raw | Subject2_T2 | 2 | 1 | Week8 |
| Subject3_T0.raw | Subject3_T0 | 3 | 1 | Baseline |
| Subject3_T1.raw | Subject3_T1 | 3 | 1 | Week4 |
| Subject3_T2.raw | Subject3_T2 | 3 | 1 | Week8 |
Notes:
- Each sample is unique (different time points are different samples)
- Subjects are tracked via replicate numbering
- Treatment groups represent time points for comparison
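Because the replicate number stands in for the subject in this layout, each subject's trajectory can be recovered by grouping rows on `replicate`. The following sketch rebuilds the table above as plain dictionaries (an assumed representation) and indexes the samples by subject and time point.

```python
from collections import defaultdict

time_points = ["Baseline", "Week4", "Week8"]

# Rebuild the Scenario E table: the replicate number identifies the subject.
design = [
    {
        "sample": f"Subject{subject}_T{index}",
        "replicate": subject,
        "treatment_group": label,
    }
    for subject in (1, 2, 3)
    for index, label in enumerate(time_points)
]

# Index samples as by_subject[replicate][treatment_group] -> sample name.
by_subject = defaultdict(dict)
for row in design:
    by_subject[row["replicate"]][row["treatment_group"]] = row["sample"]

for subject, samples in by_subject.items():
    print(subject, [samples[label] for label in time_points])
```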
Using the AI Chat Assistant
The AI Chat assistant can leverage your experimental design for sophisticated analyses beyond standard processing.
Basic Capabilities
- Filter data by treatment groups: “Show me only the Disease samples”
- Run differential analysis: “Compare Control vs Disease groups”
- Generate volcano plots: “Create a volcano plot comparing the treatment groups”
Advanced Analyses with Replicate and Fraction Data
1. Aggregating Intensities Across Fractions
When samples are pre-fractionated, the Chat can combine quantification data from multiple fractions belonging to the same sample/replicate.
Example prompts:
- “Aggregate protein intensities across all fractions for each sample”
- “Sum the intensities from fractions 1-4 for Sample_A”
- “Calculate total protein abundance per sample by combining fractions”
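For orientation, the kind of aggregation these prompts ask for might look like the sketch below. It sums hypothetical intensities per `(sample, replicate, protein)`; summation is only one possible choice, and the method the assistant actually applies may differ.

```python
from collections import defaultdict

# (sample, replicate, fraction, protein, intensity): hypothetical values.
measurements = [
    ("Sample_A", 1, 1, "P12345", 100.0),
    ("Sample_A", 1, 2, "P12345", 250.0),
    ("Sample_A", 1, 3, "Q67890", 40.0),
    ("Sample_B", 1, 1, "P12345", 300.0),
]

# Sum intensities over fractions by leaving the fraction out of the key.
totals = defaultdict(float)
for sample, replicate, _fraction, protein, intensity in measurements:
    totals[(sample, replicate, protein)] += intensity

for key, total in sorted(totals.items()):
    print(key, total)
```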
2. Using Replicate Information as Covariates
Replicate numbers can be included as covariates in differential analysis to control for technical batch effects.
Example prompts:
- “Run differential analysis controlling for replicate batch effects”
- “Include replicate as a covariate in the comparison”
- “Account for run-to-run variability in the analysis”
Why this matters: Including replicate as a covariate can improve statistical power by separating true biological differences from technical variation introduced by different MS runs.
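A small numeric sketch shows the effect. It uses a paired comparison, which is what a replicate blocking term reduces to in a balanced two-group design, and it assumes that the same replicate number marks the same run or batch in both groups. The intensities are invented, and this is not the model the assistant fits.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical log2 intensities of one protein; position i is replicate i + 1.
control = [20.0, 22.0, 24.0]
treated = [21.0, 23.1, 24.9]

# Ignoring replicate: run-to-run spread ends up in the error term.
effect = mean(treated) - mean(control)
se_unpaired = sqrt(
    stdev(control) ** 2 / len(control) + stdev(treated) ** 2 / len(treated)
)

# Accounting for replicate: take the difference within each replicate first.
diffs = [t - c for t, c in zip(treated, control)]
se_paired = stdev(diffs) / sqrt(len(diffs))

print(f"effect={effect:.2f}  se_unpaired={se_unpaired:.3f}  se_paired={se_paired:.3f}")
```

The estimated effect is the same either way, but the standard error is much smaller once the shared run-to-run shift is removed, which is where the gain in power comes from.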
3. Reproducibility Analysis Across Replicates
Assess the consistency of your measurements across technical or biological replicates.
Example prompts:
- “Calculate the CV across replicates for each protein”
- “Which proteins have the highest variability between replicates?”
- “Show me the overlap of identified proteins between replicates”
- “Plot the correlation of intensities between replicate 1 and replicate 2”
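The coefficient of variation (CV) requested in the first prompt is the standard deviation divided by the mean, usually reported as a percentage. The sketch below computes it with the sample standard deviation on made-up linear-scale intensities; the accession strings and the helper `cv_percent` are illustrative.

```python
from statistics import mean, stdev

# Hypothetical linear-scale intensities, one value per replicate.
intensities = {
    "P12345": [100.0, 110.0, 90.0],
    "Q67890": [50.0, 150.0, 100.0],
}


def cv_percent(values):
    """Coefficient of variation as a percentage (sample standard deviation / mean)."""
    return 100.0 * stdev(values) / mean(values)


cv = {protein: cv_percent(values) for protein, values in intensities.items()}
for protein, value in cv.items():
    print(f"{protein}: {value:.1f}%")
```

CV is meaningful on linear-scale intensities; on log-transformed values the standard deviation itself is the more usual measure of spread.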
4. Sample-Level vs File-Level Analysis
Choose the appropriate level of analysis based on your experimental design.
Example prompts:
- “Summarize protein identifications at the sample level” (combines fractions)
- “Compare identification counts between individual files”
- “Analyze technical vs biological variability using the replicate structure”
5. Custom Grouping for Complex Designs
For complex experimental designs, you can request custom aggregation strategies.
Example prompts:
- “Group data by subject and time point”
- “Calculate mean intensity per biological replicate”
- “Compare variability within subjects vs between subjects”
Best Practices
- Use descriptive, consistent naming: Choose sample names that are clear and follow a consistent pattern (e.g., `Patient_001`, `Patient_002`)
- Verify your design before analysis: Double-check that all files have the correct `sample`, `replicate`, `fraction`, and `treatment_group` assignments
- Use sequential numbering: For replicates and fractions, use sequential integers (1, 2, 3…) for clarity
- Balance your groups: For differential analysis, try to have similar numbers of samples in each treatment group
- Document your design: Keep a record of what each sample represents for future reference
Don’ts
- Don’t mix fractionated and non-fractionated samples: If some samples are fractionated, keep the experimental design consistent
- Don’t use the same sample name for different biological sources: Each unique biological sample should have a unique sample identifier
- Don’t forget to set fraction values: Even if you don’t use fractionation, set fraction to “1” for all files
Common Mistakes to Avoid
| Mistake | Problem | Solution |
|---|---|---|
| Same sample name for different patients | FDR calculation will incorrectly combine identifications | Use unique sample names per biological source |
| Missing fraction values | May cause processing errors | Set fraction to “1” for non-fractionated samples |
| Inconsistent treatment_group labels | Differential analysis won’t work correctly | Use exact, consistent labels (“Control” not “control” or “CONTROL”) |
| Only 1 sample per treatment group | Cannot calculate statistics | Ensure at least 2 samples per group for differential analysis |
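Two of these mistakes are easy to catch mechanically before upload. The sketch below is a hypothetical pre-check (the function `find_design_problems` and the row dictionaries are assumptions, not a platform feature) that flags treatment labels differing only by letter case and rows with no fraction value.

```python
from collections import defaultdict


def find_design_problems(rows):
    """Return human-readable warnings for a list of design-row dicts."""
    problems = []

    # Labels that differ only by letter case, e.g. "Control" vs "control".
    spellings = defaultdict(set)
    for row in rows:
        spellings[row["treatment_group"].lower()].add(row["treatment_group"])
    for variants in spellings.values():
        if len(variants) > 1:
            problems.append(f"inconsistent labels: {sorted(variants)}")

    # Rows where the fraction was left empty.
    for row in rows:
        if row.get("fraction") in (None, ""):
            problems.append(f"missing fraction: {row['file']}")

    return problems


rows = [
    {"file": "a.raw", "sample": "S1", "replicate": 1, "fraction": 1, "treatment_group": "Control"},
    {"file": "b.raw", "sample": "S2", "replicate": 2, "fraction": None, "treatment_group": "control"},
]
problems = find_design_problems(rows)
for problem in problems:
    print(problem)
```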
Summary
Proper experimental design configuration is essential for accurate proteomics analysis. Remember:
- sample — Unique identifier for each biological source
- replicate — Identifies repeated measurements
- fraction — Links files from pre-fractionated samples
- treatment_group — Defines groups for differential comparison
When in doubt, refer to the common scenarios above to find a setup similar to your experiment, and adapt the naming scheme to your specific samples.
For additional help, the AI Chat assistant can answer questions about your specific experimental design and suggest the best configuration for your analysis goals.