Experimental Design Guide

This guide explains how to configure your experimental design table to get the most accurate results from your proteomics analysis. The experimental design defines how your raw files relate to each other—which files are replicates, which belong to the same sample, and how they should be grouped for statistical comparison.


You define the experimental design when you upload raw files for analysis. The system relies on it for:

  • Accurate FDR calculations — Properly counting unique peptide and protein identifications
  • Differential analysis — Comparing protein abundances between treatment groups
  • Quantification — Aggregating intensities across fractions and replicates

Each raw file in your experiment gets assigned four properties:

| Field | Purpose |
|---|---|
| sample | Identifies the biological source |
| replicate | Identifies replicates within a treatment |
| fraction | Identifies fractions from pre-fractionated samples |
| treatment_group | Labels for differential analysis comparisons |
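
As a minimal sketch, one row of the design table could be modelled in Python as shown here. The class name `DesignRow` and the choice of a dataclass are illustrative assumptions, not the platform's actual schema; only the four field names come from this guide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DesignRow:
    """One raw file and its four design properties (hypothetical model)."""
    file: str
    sample: str
    replicate: int
    fraction: int
    treatment_group: str


# Example row: an unfractionated control sample.
row = DesignRow(
    file="Control_1.raw",
    sample="Control_1",
    replicate=1,
    fraction=1,
    treatment_group="Control",
)
print(row.sample, row.treatment_group)
```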

The sample field identifies the biological source of your data. Files with the same sample value are considered to come from the same biological unit.

Key behaviors:

  • Files with the same sample AND replicate will have their peptide identifications combined across fractions
  • Use unique sample identifiers when each file represents a different biological sample
  • Use the same sample identifier when multiple files come from the same biological source (e.g., fractionation)

Examples:

  • Patient_001, Patient_002, Patient_003 — Each patient is a unique sample
  • Pool_A, Pool_A, Pool_A — Same pooled sample run as multiple fractions

The replicate field identifies technical or biological replicates. Replicates are repeated measurements that help assess reproducibility and statistical power.

Key behaviors:

  • Files with the same sample but different replicate values are treated as independent measurements
  • Replicate information is used for grouping during FDR calculation
  • Can be used as a covariate in differential analysis to control for batch effects

Types of replicates:

  • Technical replicates: Same sample measured multiple times (same biological source)
  • Biological replicates: Different samples from the same condition (different biological sources)

The fraction field identifies files that come from pre-fractionated samples. When a single sample is separated into multiple fractions before mass spectrometry analysis, each fraction is run as a separate file.

Key behaviors:

  • Peptides identified in multiple fractions of the same sample/replicate count as ONE identification during FDR calculation
  • Intensities can be aggregated across fractions for quantification
  • Use sequential numbering (1, 2, 3, 4…) for fraction identifiers

When to use fractionation:

  • Offline fractionation (e.g., high-pH reversed-phase, SCX)
  • Size-exclusion chromatography
  • Any workflow where one sample produces multiple raw files

The treatment_group field defines the experimental conditions for differential analysis. This is how you specify which samples should be compared against each other.

Key behaviors:

  • Used by the differential analysis algorithms to define comparison groups
  • Requires at least 2 groups with 2+ files each for statistical analysis
  • Labels should be descriptive (e.g., “Control”, “Disease”, “Treated”, “Untreated”)

Examples:

  • Control vs Disease
  • Baseline vs Week4 vs Week8
  • Vehicle vs Drug_LowDose vs Drug_HighDose
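
The minimum-size rule above can be checked before submitting a design. This is a sketch only: the function name `can_run_differential` is hypothetical, and it counts files per label exactly as the rule is worded here, which may differ from how the platform itself validates a design.

```python
from collections import Counter


def can_run_differential(treatment_groups, min_groups=2, min_files=2):
    """Return True if at least `min_groups` labels have `min_files` or more files."""
    counts = Counter(treatment_groups)
    eligible = [label for label, n in counts.items() if n >= min_files]
    return len(eligible) >= min_groups


# One treatment_group label per raw file.
balanced = ["Control", "Control", "Disease", "Disease"]
too_small = ["Control", "Control", "Disease"]

print(can_run_differential(balanced))
print(can_run_differential(too_small))
```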

How Experimental Design Affects Your Results

During peptide-level false discovery rate (FDR) calculation, identifications are grouped by sample and replicate.

Effect: If the same peptide is identified in multiple fractions of the same sample/replicate, it counts as one identification, not multiple. This prevents inflating your peptide counts when using fractionation.

Example:

  • Peptide PEPTIDEK found in Fraction 1 of Sample_A, Replicate 1
  • Peptide PEPTIDEK found in Fraction 3 of Sample_A, Replicate 1
  • Result: Counts as 1 unique peptide identification
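
The counting rule in this example can be sketched as a set per (sample, replicate) pair. The data and the function name `unique_peptides_per_group` are made up for illustration; this shows the grouping idea only, not the platform's FDR implementation.

```python
from collections import defaultdict

# (sample, replicate, fraction, peptide) tuples; values are illustrative.
identifications = [
    ("Sample_A", 1, 1, "PEPTIDEK"),
    ("Sample_A", 1, 3, "PEPTIDEK"),
    ("Sample_A", 2, 1, "PEPTIDEK"),
]


def unique_peptides_per_group(ids):
    """Count distinct peptides per (sample, replicate), ignoring fraction."""
    groups = defaultdict(set)
    for sample, replicate, _fraction, peptide in ids:
        groups[(sample, replicate)].add(peptide)
    return {key: len(peptides) for key, peptides in groups.items()}


counts = unique_peptides_per_group(identifications)
print(counts)
```

The two hits in fractions 1 and 3 of replicate 1 collapse into a single identification, while replicate 2 is counted separately.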

During protein-level FDR calculation, identifications are grouped by treatment_group, sample, and replicate.

Effect: This ensures proper tracking of which proteins are identified in which replicates, enabling accurate assessment of identification reproducibility across your experiment.
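
A rough sketch of what this grouping makes possible: counting, for each protein, in how many replicates of a treatment group it was identified. The protein names, the data, and the function name `replicates_with_protein` are hypothetical.

```python
from collections import defaultdict

# (treatment_group, sample, replicate, protein); all values are illustrative.
protein_hits = [
    ("Control", "Control_1", 1, "PROT_X"),
    ("Control", "Control_2", 2, "PROT_X"),
    ("Control", "Control_3", 3, "PROT_X"),
    ("Control", "Control_1", 1, "PROT_Y"),
    ("Disease", "Disease_1", 1, "PROT_X"),
]


def replicates_with_protein(hits):
    """Map (treatment_group, protein) to the number of distinct
    (sample, replicate) pairs in which the protein was identified."""
    seen = defaultdict(set)
    for group, sample, replicate, protein in hits:
        seen[(group, protein)].add((sample, replicate))
    return {key: len(runs) for key, runs in seen.items()}


reproducibility = replicates_with_protein(protein_hits)
print(reproducibility)
```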


Setup: 6 samples, 3 controls and 3 disease cases, no fractionation, no technical replicates.

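| File | sample | replicate | fraction | treatment_group |
|---|---|---|---|---|
| Control_1.raw | Control_1 | 1 | 1 | Control |
| Control_2.raw | Control_2 | 2 | 1 | Control |
| Control_3.raw | Control_3 | 3 | 1 | Control |
| Disease_1.raw | Disease_1 | 1 | 1 | Disease |
| Disease_2.raw | Disease_2 | 2 | 1 | Disease |
| Disease_3.raw | Disease_3 | 3 | 1 | Disease |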

Notes:

  • Each file is a unique biological sample
  • Sample names are unique across all files
  • Replicate numbers are sequential within each treatment group
  • Fraction is set to 1 for all files (no fractionation)

Setup: 2 samples, each pre-fractionated into 4 fractions (8 total files).

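| File | sample | replicate | fraction | treatment_group |
|---|---|---|---|---|
| Sample1_F1.raw | Sample1 | 1 | 1 | Treatment |
| Sample1_F2.raw | Sample1 | 1 | 2 | Treatment |
| Sample1_F3.raw | Sample1 | 1 | 3 | Treatment |
| Sample1_F4.raw | Sample1 | 1 | 4 | Treatment |
| Sample2_F1.raw | Sample2 | 2 | 1 | Treatment |
| Sample2_F2.raw | Sample2 | 2 | 2 | Treatment |
| Sample2_F3.raw | Sample2 | 2 | 3 | Treatment |
| Sample2_F4.raw | Sample2 | 2 | 4 | Treatment |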

Notes:

  • Same sample value for all fractions from the same biological sample
  • Same replicate value for all fractions from the same sample
  • Different fraction values (1-4) to identify each fraction
  • Peptides identified across fractions will be properly combined
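
If your files follow a naming convention like the one in the table above, the design rows can be derived from the filenames. The sketch below assumes a hypothetical `<sample>_F<fraction>.raw` pattern and an illustrative helper name (`rows_from_filenames`). It assigns one replicate number per sample in order of first appearance, which matches the table above.

```python
import re

# Assumed naming convention: <sample>_F<fraction>.raw
FILENAME_PATTERN = re.compile(r"(?P<sample>.+)_F(?P<fraction>\d+)\.raw")


def rows_from_filenames(filenames, treatment_group):
    """Build (file, sample, replicate, fraction, treatment_group) tuples."""
    replicate_of = {}  # sample name -> replicate number
    rows = []
    for name in filenames:
        match = FILENAME_PATTERN.fullmatch(name)
        if match is None:
            raise ValueError(f"unexpected filename: {name}")
        sample = match["sample"]
        # All fractions of one sample share the same replicate number.
        replicate = replicate_of.setdefault(sample, len(replicate_of) + 1)
        rows.append(
            (name, sample, replicate, int(match["fraction"]), treatment_group)
        )
    return rows


files = [f"Sample{s}_F{f}.raw" for s in (1, 2) for f in (1, 2, 3, 4)]
design = rows_from_filenames(files, "Treatment")
for row in design:
    print(row)
```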

Setup: 2 biological samples, each run 3 times (6 total files).

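| File | sample | replicate | fraction | treatment_group |
|---|---|---|---|---|
| Sample1_Run1.raw | Sample1 | 1 | 1 | Control |
| Sample1_Run2.raw | Sample1 | 2 | 1 | Control |
| Sample1_Run3.raw | Sample1 | 3 | 1 | Control |
| Sample2_Run1.raw | Sample2 | 1 | 1 | Disease |
| Sample2_Run2.raw | Sample2 | 2 | 1 | Disease |
| Sample2_Run3.raw | Sample2 | 3 | 1 | Disease |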

Notes:

  • Same sample for all runs of the same biological sample
  • Different replicate values for each technical replicate run
  • This design allows assessment of technical variability

Scenario D: Biological Replicates with Fractionation

Setup: 4 biological samples (2 per condition), each fractionated into 3 fractions (12 total files).

| File | sample | replicate | fraction | treatment_group |
|---|---|---|---|---|
| Ctrl_Bio1_F1.raw | Ctrl_Bio1 | 1 | 1 | Control |
| Ctrl_Bio1_F2.raw | Ctrl_Bio1 | 1 | 2 | Control |
| Ctrl_Bio1_F3.raw | Ctrl_Bio1 | 1 | 3 | Control |
| Ctrl_Bio2_F1.raw | Ctrl_Bio2 | 2 | 1 | Control |
| Ctrl_Bio2_F2.raw | Ctrl_Bio2 | 2 | 2 | Control |
| Ctrl_Bio2_F3.raw | Ctrl_Bio2 | 2 | 3 | Control |
| Treat_Bio1_F1.raw | Treat_Bio1 | 1 | 1 | Treatment |
| Treat_Bio1_F2.raw | Treat_Bio1 | 1 | 2 | Treatment |
| Treat_Bio1_F3.raw | Treat_Bio1 | 1 | 3 | Treatment |
| Treat_Bio2_F1.raw | Treat_Bio2 | 2 | 1 | Treatment |
| Treat_Bio2_F2.raw | Treat_Bio2 | 2 | 2 | Treatment |
| Treat_Bio2_F3.raw | Treat_Bio2 | 2 | 3 | Treatment |

Notes:

  • Each biological replicate has a unique sample name
  • Fractions from the same sample share the same sample and replicate
  • Replicate numbers are sequential within each treatment group
  • This design combines fractionation with biological replication

Setup: 3 subjects measured at 3 time points (9 total files).

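| File | sample | replicate | fraction | treatment_group |
|---|---|---|---|---|
| Subject1_T0.raw | Subject1_T0 | 1 | 1 | Baseline |
| Subject1_T1.raw | Subject1_T1 | 1 | 1 | Week4 |
| Subject1_T2.raw | Subject1_T2 | 1 | 1 | Week8 |
| Subject2_T0.raw | Subject2_T0 | 2 | 1 | Baseline |
| Subject2_T1.raw | Subject2_T1 | 2 | 1 | Week4 |
| Subject2_T2.raw | Subject2_T2 | 2 | 1 | Week8 |
| Subject3_T0.raw | Subject3_T0 | 3 | 1 | Baseline |
| Subject3_T1.raw | Subject3_T1 | 3 | 1 | Week4 |
| Subject3_T2.raw | Subject3_T2 | 3 | 1 | Week8 |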

Notes:

  • Each sample is unique (different time points are different samples)
  • Subjects are tracked via replicate numbering
  • Treatment groups represent time points for comparison

The AI Chat assistant can leverage your experimental design for sophisticated analyses beyond standard processing.

  • Filter data by treatment groups — “Show me only the Disease samples”
  • Run differential analysis — “Compare Control vs Disease groups”
  • Generate volcano plots — “Create a volcano plot comparing the treatment groups”

Advanced Analyses with Replicate and Fraction Data

1. Aggregating Intensities Across Fractions

When samples are pre-fractionated, the Chat can combine quantification data from multiple fractions belonging to the same sample/replicate.

Example prompts:

  • “Aggregate protein intensities across all fractions for each sample”
  • “Sum the intensities from fractions 1-4 for Sample_A”
  • “Calculate total protein abundance per sample by combining fractions”
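
For readers who want to see what such an aggregation amounts to, here is a minimal sketch that sums intensities per (sample, replicate, protein). Summation is only one possible strategy, and the data, protein names, and function name `sum_across_fractions` are illustrative assumptions rather than a description of what the Chat runs internally.

```python
from collections import defaultdict

# (sample, replicate, fraction, protein, intensity); values are made up.
measurements = [
    ("Sample_A", 1, 1, "PROT_X", 100.0),
    ("Sample_A", 1, 2, "PROT_X", 250.0),
    ("Sample_A", 1, 3, "PROT_X", 50.0),
    ("Sample_A", 1, 1, "PROT_Y", 30.0),
    ("Sample_B", 1, 1, "PROT_X", 80.0),
]


def sum_across_fractions(rows):
    """Sum intensities over fractions for each (sample, replicate, protein)."""
    totals = defaultdict(float)
    for sample, replicate, _fraction, protein, intensity in rows:
        totals[(sample, replicate, protein)] += intensity
    return dict(totals)


totals = sum_across_fractions(measurements)
print(totals[("Sample_A", 1, "PROT_X")])
```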

2. Using Replicate Information as Covariates

Replicate numbers can be included as covariates in differential analysis to control for technical batch effects.

Example prompts:

  • “Run differential analysis controlling for replicate batch effects”
  • “Include replicate as a covariate in the comparison”
  • “Account for run-to-run variability in the analysis”

Why this matters: Including replicate as a covariate can improve statistical power by separating true biological differences from technical variation introduced by different MS runs.
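
To illustrate the idea, here is a minimal sketch under a strong assumption: replicate numbers mark shared batches, so replicate 1 of Control and replicate 1 of Disease were run together. In that balanced two-group case, adjusting for replicate amounts to comparing the groups within each replicate. The intensities are made up, and this is not necessarily the model the platform fits.

```python
from statistics import mean, stdev

# Hypothetical log2 intensities for one protein, keyed by replicate number.
# Each replicate (batch) shifts both groups by the same amount.
control = {1: 10.0, 2: 12.0, 3: 14.0}
disease = {1: 11.0, 2: 13.0, 3: 15.0}

# Ignoring replicate: the batch shift shows up as within-group spread.
unadjusted_effect = mean(disease.values()) - mean(control.values())
within_group_sd = stdev(control.values())

# Adjusting for replicate: compare the groups within each replicate.
paired_diffs = [disease[r] - control[r] for r in sorted(control)]
adjusted_effect = mean(paired_diffs)
paired_sd = stdev(paired_diffs)

print("effect:", unadjusted_effect, "spread ignoring replicate:", within_group_sd)
print("effect:", adjusted_effect, "spread after adjusting:", paired_sd)
```

The estimated difference is the same in both cases, but the spread it is judged against shrinks once the replicate shift is removed, which is where the gain in statistical power comes from.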

3. Reproducibility Analysis Across Replicates

Assess the consistency of your measurements across technical or biological replicates.

Example prompts:

  • “Calculate the CV across replicates for each protein”
  • “Which proteins have the highest variability between replicates?”
  • “Show me the overlap of identified proteins between replicates”
  • “Plot the correlation of intensities between replicate 1 and replicate 2”
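
The coefficient of variation (CV) mentioned in the first prompt is the standard deviation divided by the mean, usually reported as a percentage. A minimal sketch, assuming linear-scale (not log-transformed) intensities and using made-up values and protein names:

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return the CV in percent: sample standard deviation / mean * 100."""
    if len(values) < 2:
        raise ValueError("need at least two replicate values")
    average = mean(values)
    if average == 0:
        raise ValueError("mean intensity is zero; CV is undefined")
    return stdev(values) / average * 100.0


# Protein -> intensities across three replicates (illustrative values).
intensities = {
    "PROT_X": [100.0, 110.0, 90.0],
    "PROT_Y": [50.0, 50.0, 50.0],
}
cv_by_protein = {
    protein: coefficient_of_variation(values)
    for protein, values in intensities.items()
}
print(cv_by_protein)
```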

Choose the appropriate level of analysis based on your experimental design.

Example prompts:

  • “Summarize protein identifications at the sample level” (combines fractions)
  • “Compare identification counts between individual files”
  • “Analyze technical vs biological variability using the replicate structure”

For complex experimental designs, you can request custom aggregation strategies.

Example prompts:

  • “Group data by subject and time point”
  • “Calculate mean intensity per biological replicate”
  • “Compare variability within subjects vs between subjects”

  1. Use descriptive, consistent naming — Choose sample names that are clear and follow a consistent pattern (e.g., Patient_001, Patient_002)

  2. Verify your design before analysis — Double-check that all files have the correct sample, replicate, fraction, and treatment_group assignments

  3. Use sequential numbering — For replicates and fractions, use sequential integers (1, 2, 3…) for clarity

  4. Balance your groups — For differential analysis, try to have similar numbers of samples in each treatment group

  5. Document your design — Keep a record of what each sample represents for future reference

  1. Don’t mix fractionated and non-fractionated samples: if some samples in an experiment are fractionated, fractionate the others the same way so that all samples are processed consistently

  2. Don’t use the same sample name for different biological sources — Each unique biological sample should have a unique sample identifier

  3. Don’t forget to set fraction values — Even if you don’t use fractionation, set fraction to “1” for all files

| Mistake | Problem | Solution |
|---|---|---|
| Same sample name for different patients | FDR calculation will incorrectly combine identifications | Use unique sample names per biological source |
| Missing fraction values | May cause processing errors | Set fraction to “1” for non-fractionated samples |
| Inconsistent treatment_group labels | Differential analysis won’t work correctly | Use exact, consistent labels (“Control” not “control” or “CONTROL”) |
| Only 1 sample per treatment group | Cannot calculate statistics | Ensure at least 2 samples per group for differential analysis |
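
Two of these mistakes, missing fraction values and inconsistent label casing, are easy to catch with a small script before uploading. The sketch below is a hypothetical pre-flight check: the function name `check_design` and the dict-based row format are assumptions for illustration and are not part of the platform.

```python
from collections import defaultdict


def check_design(rows):
    """Return a list of human-readable problems found in the design rows.

    Each row is a dict with the keys: file, sample, replicate,
    fraction, treatment_group.
    """
    problems = []
    labels_by_lowercase = defaultdict(set)
    for row in rows:
        if row.get("fraction") in (None, ""):
            problems.append(
                f"{row['file']}: missing fraction (use 1 if not fractionated)"
            )
        label = row["treatment_group"]
        labels_by_lowercase[label.lower()].add(label)
    # Labels that differ only in letter case are almost certainly a typo.
    for variants in labels_by_lowercase.values():
        if len(variants) > 1:
            problems.append(
                f"inconsistent treatment_group labels: {sorted(variants)}"
            )
    return problems


rows = [
    {"file": "a.raw", "sample": "S1", "replicate": 1, "fraction": 1,
     "treatment_group": "Control"},
    {"file": "b.raw", "sample": "S2", "replicate": 2, "fraction": None,
     "treatment_group": "control"},
    {"file": "c.raw", "sample": "S3", "replicate": 1, "fraction": 1,
     "treatment_group": "Disease"},
]
problems = check_design(rows)
for problem in problems:
    print(problem)
```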

Proper experimental design configuration is essential for accurate proteomics analysis. Remember:

  • sample — Unique identifier for each biological source
  • replicate — Identifies repeated measurements
  • fraction — Links files from pre-fractionated samples
  • treatment_group — Defines groups for differential comparison

When in doubt, refer to the common scenarios above to find a setup similar to your experiment, and adapt the naming scheme to your specific samples.

For additional help, the AI Chat assistant can answer questions about your specific experimental design and suggest the best configuration for your analysis goals.