Experimental Design Guide

This guide explains how to configure your experimental design table to get the most accurate results from your proteomics analysis. The experimental design defines how your raw files relate to each other—which files are replicates, which belong to the same sample, and how they should be grouped for statistical comparison.

Overview
Field Definitions
How Experimental Design Affects Your Results
Common Experiment Scenarios
Using the AI Chat Assistant
Best Practices
Troubleshooting

Overview

When you upload raw files for analysis, you can define an experimental design that tells the system how your files relate to each other. This information is critical for:

Accurate FDR calculations — Properly counting unique peptide and protein identifications
Differential analysis — Comparing protein abundances between treatment groups
Quantification — Aggregating intensities across fractions and replicates

Each raw file in your experiment gets assigned four properties:

Field	Purpose
`sample`	Identifies the biological source
`replicate`	Identifies replicates within a treatment
`fraction`	Identifies fractions from pre-fractionated samples
`treatment_group`	Labels for differential analysis comparisons
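
As a rough illustration of how the four fields fit together, the design table can be thought of as one record per raw file. The sketch below is in Python; the `DesignRow` class and its `file` attribute are hypothetical names chosen for this example and are not part of the platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DesignRow:
    """One row of the experimental design table (hypothetical structure)."""
    file: str             # raw file name
    sample: str           # biological source
    replicate: int        # replicate within a treatment
    fraction: int         # fraction number, 1 if not fractionated
    treatment_group: str  # label used for differential comparisons


# Two fractions of one sample plus one unfractionated control.
design = [
    DesignRow("Sample1_F1.raw", "Sample1", 1, 1, "Treatment"),
    DesignRow("Sample1_F2.raw", "Sample1", 1, 2, "Treatment"),
    DesignRow("Control_1.raw", "Control_1", 1, 1, "Control"),
]

# Files sharing sample and replicate belong to the same measurement unit.
units = {(row.sample, row.replicate) for row in design}
```

Here the two Sample1 fractions collapse into a single unit, so three files yield two sample/replicate units.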

Field Definitions

Sample

The sample field identifies the biological source of your data. Files with the same sample value are considered to come from the same biological unit.

Key behaviors:

Files with the same sample AND replicate will have their peptide identifications combined across fractions
Use unique sample identifiers when each file represents a different biological sample
Use the same sample identifier when multiple files come from the same biological source (e.g., fractionation)

Examples:

Patient_001, Patient_002, Patient_003 — Each patient is a unique sample
Pool_A, Pool_A, Pool_A — Same pooled sample run as multiple fractions

Replicate

The replicate field identifies technical or biological replicates. Replicates are repeated measurements that let you assess reproducibility and give differential analysis more statistical power.

Key behaviors:

Files with the same sample but different replicate values are treated as independent measurements
Replicate information is used for grouping during FDR calculation
Can be used as a covariate in differential analysis to control for batch effects

Types of replicates:

Technical replicates: Same sample measured multiple times (same biological source)
Biological replicates: Different samples from the same condition (different biological sources)

Fraction

The fraction field identifies files that come from pre-fractionated samples. When a single sample is separated into multiple fractions before mass spectrometry analysis, each fraction is run as a separate file.

Key behaviors:

Peptides identified in multiple fractions of the same sample/replicate count as ONE identification during FDR calculation
Intensities can be aggregated across fractions for quantification
Use sequential numbering (1, 2, 3, 4…) for fraction identifiers

When to set fraction values:

Offline fractionation (e.g., high-pH reversed-phase, SCX)
Size-exclusion chromatography
Any workflow where one sample produces multiple raw files

Treatment Group

The treatment_group field defines the experimental conditions for differential analysis. This is how you specify which samples should be compared against each other.

Key behaviors:

Used by the differential analysis algorithms to define comparison groups
Requires at least 2 groups with 2+ files each for statistical analysis
Labels should be descriptive (e.g., “Control”, “Disease”, “Treated”, “Untreated”)

Examples:

Control vs Disease
Baseline vs Week4 vs Week8
Vehicle vs Drug_LowDose vs Drug_HighDose
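
The group-size requirement above (at least 2 groups with 2+ files each) can be checked before you submit a design. This is a minimal sketch; `check_groups` is a hypothetical helper written for this guide, not a function provided by the platform.

```python
from collections import Counter


def check_groups(treatment_groups, min_groups=2, min_files=2):
    """Return a list of problems found in the treatment_group column."""
    counts = Counter(treatment_groups)
    problems = []
    if len(counts) < min_groups:
        problems.append(f"need at least {min_groups} groups, found {len(counts)}")
    for label, n in sorted(counts.items()):
        if n < min_files:
            problems.append(f"group '{label}' has {n} file(s), need {min_files}+")
    return problems


ok = check_groups(["Control", "Control", "Disease", "Disease"])
bad = check_groups(["Control", "Control", "Disease"])
```

An empty list means the column meets the minimum; each string in a non-empty list names one group that is too small.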

How Experimental Design Affects Your Results

Peptide-Level FDR

During peptide-level false discovery rate (FDR) calculation, identifications are grouped by sample and replicate.

Effect: If the same peptide is identified in multiple fractions of the same sample/replicate, it counts as one identification, not multiple. This prevents inflating your peptide counts when using fractionation.

Example:

Peptide PEPTIDEK found in Fraction 1 of Sample_A, Replicate 1
Peptide PEPTIDEK found in Fraction 3 of Sample_A, Replicate 1
Result: Counts as 1 unique peptide identification
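
The counting rule in this example can be sketched in a few lines of Python. The code only illustrates the grouping by sample and replicate described above; it is not the platform's FDR implementation, and the identification tuples and the second peptide sequence are made up for the example.

```python
from collections import defaultdict

# (peptide, sample, replicate, fraction) for each identification
identifications = [
    ("PEPTIDEK", "Sample_A", 1, 1),
    ("PEPTIDEK", "Sample_A", 1, 3),
    ("PEPTIDEK", "Sample_A", 2, 1),
    ("SAMPLER", "Sample_A", 1, 2),
]


def unique_peptides_per_unit(ids):
    """Collapse fractions: map (sample, replicate) to its set of peptides."""
    peptides = defaultdict(set)
    for peptide, sample, replicate, _fraction in ids:
        peptides[(sample, replicate)].add(peptide)
    return dict(peptides)


per_unit = unique_peptides_per_unit(identifications)
counts = {unit: len(peps) for unit, peps in per_unit.items()}
```

PEPTIDEK appears in two fractions of Sample_A, Replicate 1, but contributes only once to that unit's count. Its appearance in Replicate 2 is counted separately because the replicate differs.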

Protein-Level FDR

During protein-level FDR calculation, identifications are grouped by treatment_group, sample, and replicate.

Effect: This ensures proper tracking of which proteins are identified in which replicates, enabling accurate assessment of identification reproducibility across your experiment.
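
One way to picture this grouping is to count, for each treatment group, how many sample/replicate units contain a given protein. The sketch below assumes identifications are available as simple tuples and uses placeholder protein names; it illustrates the bookkeeping only and makes no claim about how the platform computes protein-level FDR.

```python
from collections import defaultdict

# (protein, treatment_group, sample, replicate); placeholder protein names
protein_ids = [
    ("ProtA", "Control", "Control_1", 1),
    ("ProtA", "Control", "Control_2", 2),
    ("ProtB", "Control", "Control_1", 1),
    ("ProtA", "Disease", "Disease_1", 1),
]


def replicate_support(ids):
    """Count, per treatment group, how many sample/replicate units contain each protein."""
    units = defaultdict(set)
    for protein, group, sample, replicate in ids:
        units[(group, protein)].add((sample, replicate))
    return {key: len(members) for key, members in units.items()}


support = replicate_support(protein_ids)
```

A protein seen in every unit of a group is reproducibly identified there; one seen in a single unit is less well supported.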

Common Experiment Scenarios

Scenario A: Simple Case-Control Study

Setup: 6 samples (3 controls and 3 disease cases), no fractionation, no technical replicates.

File	sample	replicate	fraction	treatment_group
Control_1.raw	Control_1	1	1	Control
Control_2.raw	Control_2	2	1	Control
Control_3.raw	Control_3	3	1	Control
Disease_1.raw	Disease_1	1	1	Disease
Disease_2.raw	Disease_2	2	1	Disease
Disease_3.raw	Disease_3	3	1	Disease

Notes:

Each file is a unique biological sample
Sample names are unique across all files
Replicate numbers are sequential within each treatment group
Fraction is set to 1 for all files (no fractionation)

Scenario B: Fractionated Samples

Setup: 2 samples, each pre-fractionated into 4 fractions (8 total files).

File	sample	replicate	fraction	treatment_group
Sample1_F1.raw	Sample1	1	1	Treatment
Sample1_F2.raw	Sample1	1	2	Treatment
Sample1_F3.raw	Sample1	1	3	Treatment
Sample1_F4.raw	Sample1	1	4	Treatment
Sample2_F1.raw	Sample2	2	1	Treatment
Sample2_F2.raw	Sample2	2	2	Treatment
Sample2_F3.raw	Sample2	2	3	Treatment
Sample2_F4.raw	Sample2	2	4	Treatment

Notes:

Same sample value for all fractions from the same biological sample
Same replicate value for all fractions from the same sample
Different fraction values (1-4) to identify each fraction
Peptides identified across fractions will be properly combined
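
If your file names follow a regular pattern, the rows of a table like this one can be generated rather than typed by hand. The sketch below assumes the naming convention used in this scenario (`<sample>_F<fraction>.raw`) and that all files belong to one treatment group; the pattern and the `rows_from_filenames` helper are illustrative assumptions, so adapt them to your own file names.

```python
import re

FRACTION_PATTERN = re.compile(r"^(?P<sample>.+)_F(?P<fraction>\d+)\.raw$")


def rows_from_filenames(filenames, treatment_group):
    """Build design rows for files that all belong to one treatment group."""
    replicate_of = {}  # sample -> replicate number, in order of first appearance
    rows = []
    for name in filenames:
        match = FRACTION_PATTERN.match(name)
        if match is None:
            raise ValueError(f"unexpected file name: {name}")
        sample = match["sample"]
        # All fractions of a sample share the replicate assigned at first sight.
        replicate = replicate_of.setdefault(sample, len(replicate_of) + 1)
        rows.append((name, sample, replicate, int(match["fraction"]), treatment_group))
    return rows


files = [f"Sample{s}_F{f}.raw" for s in (1, 2) for f in (1, 2, 3, 4)]
rows = rows_from_filenames(files, "Treatment")
```

Each tuple holds the file name, sample, replicate, fraction, and treatment_group, in the same column order as the table above.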

Scenario C: Technical Replicates

Setup: 2 biological samples, each run 3 times (6 total files).

File	sample	replicate	fraction	treatment_group
Sample1_Run1.raw	Sample1	1	1	Control
Sample1_Run2.raw	Sample1	2	1	Control
Sample1_Run3.raw	Sample1	3	1	Control
Sample2_Run1.raw	Sample2	1	1	Disease
Sample2_Run2.raw	Sample2	2	1	Disease
Sample2_Run3.raw	Sample2	3	1	Disease

Notes:

Same sample for all runs of the same biological sample
Different replicate values for each technical replicate run
This design allows assessment of technical variability

Scenario D: Biological Replicates with Fractionation

Setup: 4 biological samples (2 per condition), each fractionated into 3 fractions (12 total files).

File	sample	replicate	fraction	treatment_group
Ctrl_Bio1_F1.raw	Ctrl_Bio1	1	1	Control
Ctrl_Bio1_F2.raw	Ctrl_Bio1	1	2	Control
Ctrl_Bio1_F3.raw	Ctrl_Bio1	1	3	Control
Ctrl_Bio2_F1.raw	Ctrl_Bio2	2	1	Control
Ctrl_Bio2_F2.raw	Ctrl_Bio2	2	2	Control
Ctrl_Bio2_F3.raw	Ctrl_Bio2	2	3	Control
Treat_Bio1_F1.raw	Treat_Bio1	1	1	Treatment
Treat_Bio1_F2.raw	Treat_Bio1	1	2	Treatment
Treat_Bio1_F3.raw	Treat_Bio1	1	3	Treatment
Treat_Bio2_F1.raw	Treat_Bio2	2	1	Treatment
Treat_Bio2_F2.raw	Treat_Bio2	2	2	Treatment
Treat_Bio2_F3.raw	Treat_Bio2	2	3	Treatment

Notes:

Each biological replicate has a unique sample name
Fractions from the same sample share the same sample and replicate
Replicate numbers are sequential within each treatment group
This design combines fractionation with biological replication

Scenario E: Time-Course Experiment

Setup: 3 subjects measured at 3 time points (9 total files).

File	sample	replicate	fraction	treatment_group
Subject1_T0.raw	Subject1_T0	1	1	Baseline
Subject1_T1.raw	Subject1_T1	1	1	Week4
Subject1_T2.raw	Subject1_T2	1	1	Week8
Subject2_T0.raw	Subject2_T0	2	1	Baseline
Subject2_T1.raw	Subject2_T1	2	1	Week4
Subject2_T2.raw	Subject2_T2	2	1	Week8
Subject3_T0.raw	Subject3_T0	3	1	Baseline
Subject3_T1.raw	Subject3_T1	3	1	Week4
Subject3_T2.raw	Subject3_T2	3	1	Week8

Notes:

Each file has its own sample value, because every time point from a subject is treated as a separate sample
Subjects are tracked via replicate numbering
Treatment groups represent time points for comparison

Using the AI Chat Assistant

The AI Chat assistant can use your experimental design to run analyses that go beyond standard processing.

Basic Capabilities

Filter data by treatment groups — “Show me only the Disease samples”
Run differential analysis — “Compare Control vs Disease groups”
Generate volcano plots — “Create a volcano plot comparing the treatment groups”

Advanced Analyses with Replicate and Fraction Data

1. Aggregating Intensities Across Fractions

When samples are pre-fractionated, the Chat can combine quantification data from multiple fractions belonging to the same sample/replicate.

Example prompts:

“Aggregate protein intensities across all fractions for each sample”
“Sum the intensities from fractions 1-4 for Sample_A”
“Calculate total protein abundance per sample by combining fractions”
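
To make the aggregation concrete, here is a minimal sketch of summing intensities over fractions for each sample, replicate, and protein. Summation is only one possible strategy, the measurement tuples and values are invented for illustration, and the Chat may compute this differently.

```python
from collections import defaultdict

# (sample, replicate, fraction, protein, intensity); illustrative values only
measurements = [
    ("Sample_A", 1, 1, "ProtA", 100.0),
    ("Sample_A", 1, 2, "ProtA", 50.0),
    ("Sample_A", 1, 3, "ProtB", 20.0),
    ("Sample_B", 1, 1, "ProtA", 80.0),
]


def sum_across_fractions(rows):
    """Total intensity per (sample, replicate, protein), ignoring the fraction."""
    totals = defaultdict(float)
    for sample, replicate, _fraction, protein, intensity in rows:
        totals[(sample, replicate, protein)] += intensity
    return dict(totals)


totals = sum_across_fractions(measurements)
```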

2. Using Replicate Information as Covariates

Replicate numbers can be included as covariates in differential analysis to control for technical batch effects.

Example prompts:

“Run differential analysis controlling for replicate batch effects”
“Include replicate as a covariate in the comparison”
“Account for run-to-run variability in the analysis”

Why this matters: Including replicate as a covariate can improve statistical power by separating true biological differences from technical variation introduced by different MS runs.
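
The idea can be illustrated with a simplified stand-in for a covariate model: subtracting each replicate's mean before comparing groups. This sketch assumes a balanced design in which every replicate number appears in every treatment group (as in a time course where replicate tracks the subject), uses made-up log2 intensities for a single protein, and is not the statistical model the platform itself applies.

```python
from collections import defaultdict
from statistics import mean, variance

# (replicate, treatment_group, log2 intensity) for one protein; made-up values
values = [
    (1, "Baseline", 20.0), (1, "Week8", 21.0),
    (2, "Baseline", 22.0), (2, "Week8", 23.0),
    (3, "Baseline", 24.0), (3, "Week8", 25.0),
]


def by_group(rows):
    """Collect the values of each treatment group into a list."""
    groups = defaultdict(list)
    for _replicate, group, value in rows:
        groups[group].append(value)
    return groups


def remove_replicate_effect(rows):
    """Subtract each replicate's mean so only within-replicate differences remain."""
    per_replicate = defaultdict(list)
    for replicate, _group, value in rows:
        per_replicate[replicate].append(value)
    offsets = {rep: mean(vals) for rep, vals in per_replicate.items()}
    return [(rep, group, value - offsets[rep]) for rep, group, value in rows]


raw = by_group(values)
adjusted = by_group(remove_replicate_effect(values))

raw_spread = variance(raw["Baseline"])            # includes replicate-to-replicate shifts
adjusted_spread = variance(adjusted["Baseline"])  # shifts removed
effect = mean(adjusted["Week8"]) - mean(adjusted["Baseline"])
```

In this toy data the difference between groups is the same before and after adjustment, but the spread within each group shrinks once the replicate shifts are removed, which is what makes the difference easier to detect.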

3. Reproducibility Analysis Across Replicates

Assess the consistency of your measurements across technical or biological replicates.

Example prompts:

“Calculate the CV across replicates for each protein”
“Which proteins have the highest variability between replicates?”
“Show me the overlap of identified proteins between replicates”
“Plot the correlation of intensities between replicate 1 and replicate 2”
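
For reference, the coefficient of variation (CV) requested in the first prompt is the standard deviation divided by the mean. The sketch below computes it per protein with Python's standard library, using the sample standard deviation and invented linear-scale intensities; the Chat may use a different estimator or scale.

```python
from statistics import mean, stdev

# protein -> intensity in each replicate (linear scale); made-up values
intensities = {
    "ProtA": [100.0, 110.0, 90.0],
    "ProtB": [50.0, 150.0, 100.0],
}


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean, as a percentage."""
    return 100.0 * stdev(values) / mean(values)


cv = {protein: coefficient_of_variation(vals) for protein, vals in intensities.items()}
most_variable = max(cv, key=cv.get)
```

CV is normally computed on linear intensities rather than log-transformed ones, since taking logs changes both the mean and the spread.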

4. Sample-Level vs File-Level Analysis

Choose the appropriate level of analysis based on your experimental design.

Example prompts:

“Summarize protein identifications at the sample level” (combines fractions)
“Compare identification counts between individual files”
“Analyze technical vs biological variability using the replicate structure”

5. Custom Grouping for Complex Designs

For complex experimental designs, you can request custom aggregation strategies.

Example prompts:

“Group data by subject and time point”
“Calculate mean intensity per biological replicate”
“Compare variability within subjects vs between subjects”

Best Practices

Do’s

Use descriptive, consistent naming — Choose sample names that are clear and follow a consistent pattern (e.g., Patient_001, Patient_002)
Verify your design before analysis — Double-check that all files have the correct sample, replicate, fraction, and treatment_group assignments
Use sequential numbering — For replicates and fractions, use sequential integers (1, 2, 3…) for clarity
Balance your groups — For differential analysis, try to have similar numbers of samples in each treatment group
Document your design — Keep a record of what each sample represents for future reference

Don’ts

Don’t mix fractionated and non-fractionated samples — If some samples are fractionated, keep the experimental design consistent
Don’t use the same sample name for different biological sources — Each unique biological sample should have a unique sample identifier
Don’t forget to set fraction values — Even if you don’t use fractionation, set fraction to “1” for all files

Common Mistakes to Avoid

Mistake	Problem	Solution
Same sample name for different patients	FDR calculation will incorrectly combine identifications	Use unique sample names per biological source
Missing fraction values	May cause processing errors	Set fraction to “1” for non-fractionated samples
Inconsistent treatment_group labels	Differential analysis won’t work correctly	Use exact, consistent labels (“Control” not “control” or “CONTROL”)
Only 1 sample per treatment group	Cannot calculate statistics	Ensure at least 2 samples per group for differential analysis
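
Two of the mistakes in this table, missing fraction values and inconsistently capitalized treatment_group labels, are easy to catch automatically. The sketch below assumes the design is held as a list of dictionaries keyed by the field names plus a `file` key; `find_design_mistakes` is a hypothetical helper, not a platform function. Reusing one sample name for different patients cannot be detected this way and still needs a manual check.

```python
from collections import defaultdict


def find_design_mistakes(rows):
    """rows: list of dicts keyed by the four field names plus 'file'."""
    problems = []
    spellings = defaultdict(set)  # lower-cased label -> spellings seen
    for row in rows:
        if row.get("fraction") in (None, ""):
            problems.append(f"{row['file']}: fraction is missing, use 1 if not fractionated")
        label = row["treatment_group"]
        spellings[label.lower()].add(label)
    for variants in spellings.values():
        if len(variants) > 1:
            problems.append("inconsistent treatment_group labels: " + ", ".join(sorted(variants)))
    return problems


rows = [
    {"file": "a.raw", "sample": "P1", "replicate": 1, "fraction": 1, "treatment_group": "Control"},
    {"file": "b.raw", "sample": "P2", "replicate": 2, "fraction": None, "treatment_group": "control"},
]
problems = find_design_mistakes(rows)
```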

Summary

Proper experimental design configuration is essential for accurate proteomics analysis. Remember:

sample — Unique identifier for each biological source
replicate — Identifies repeated measurements
fraction — Links files from pre-fractionated samples
treatment_group — Defines groups for differential comparison

When in doubt, refer to the common scenarios above to find a setup similar to your experiment, and adapt the naming scheme to your specific samples.

For additional help, the AI Chat assistant can answer questions about your specific experimental design and suggest the best configuration for your analysis goals.
