Equilibrium sedimentation data analysis software.
Dmitry B. Veprintsev *, Nicholas W. Foster* and Alan R. Fersht, FRS

*to whom correspondence should be addressed
*dbv@mrc-lmb.cam.ac.uk
*nwf@mrc-lmb.cam.ac.uk


 



Data Analysis.

Data are fitted to the appropriate model using the Marquardt algorithm (1). Multiple datasets are combined head-to-tail and indexed by an additional x-value, and the fitting function applies the global and local variables that correspond to each dataset (global analysis (2)). The errors reported (1σ) are not the real errors but a statistical estimate derived from the goodness of fit.
These values should be treated as an underestimate of the actual errors.
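
The head-to-tail stacking described above can be sketched in Python as follows. This is a minimal illustration, not Ultraspin's actual code: the function names, the tuple layout and the toy straight-line model are assumptions chosen for brevity. A least-squares minimiser such as the Marquardt algorithm would then be applied to the returned residuals.

```python
def stack_datasets(datasets):
    """Join datasets head-to-tail, tagging every point with its dataset index."""
    stacked = []
    for index, points in enumerate(datasets):
        for x, y in points:
            stacked.append((index, x, y))
    return stacked


def global_residuals(stacked, global_params, local_params, model):
    """Residuals over all datasets for one set of global and local parameters.

    model(x, global_params, local) is a hypothetical callable; local_params
    holds one entry per dataset and is looked up through the dataset index.
    """
    return [y - model(x, global_params, local_params[index])
            for index, x, y in stacked]


def toy_model(x, global_params, local):
    """Toy model (assumption): shared slope, per-dataset offset."""
    return global_params["slope"] * x + local["offset"]


datasets = [
    [(0.0, 1.0), (1.0, 3.0)],   # dataset 0, generated with offset 1
    [(0.0, 5.0), (2.0, 9.0)],   # dataset 1, generated with offset 5
]
stacked = stack_datasets(datasets)
residuals = global_residuals(
    stacked, {"slope": 2.0}, [{"offset": 1.0}, {"offset": 5.0}], toy_model)
```

The slope plays the role of a global variable such as M, and the offsets play the role of local variables such as the baseline of each dataset.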

Fitting Models.

The general model for fitting data is as follows:

$$A(r) = A_{baseline} + A_{b,1}\exp(M_1\beta_1) + A_{b,2}\exp(M_2\beta_2) + \dots + A_{b,i}\exp(M_i\beta_i)$$

$$\beta_i = \frac{\omega^2 (r^2 - r_b^2)(1 - V_{p,i}\,\rho)}{2RT}$$

$A_{b,i} = \varepsilon(\lambda)\, l\, C_i$ for absorbance data, or
$Fr = k\, l\, C_i$ (fringe displacement) for interference data, where
$k = 3.33 \times \mathrm{MolecularWeight} / 1.2$.
For interference optics, 3.33 fringes/(mg/ml) is the accepted value for proteins at 675 nm and a 12 mm (1.2 cm) path length.

$M$ is the molecular mass of the monomer, $A_b$ and $r_b$ are the absorbance and radius at the bottom, $A_{baseline}$ is the absorbance of the buffer, $\omega$ is the angular rotational speed, $\varepsilon(\lambda)$ is the extinction coefficient at wavelength $\lambda$, $l$ is the path length, $R$ is the gas constant, $T$ is the absolute temperature, $V_p$ is the partial specific volume of the protein and $\rho$ is the solvent density. $M$ is a global variable, whereas $A_{baseline}$ and $A_b$ are local to each dataset. This allows datasets obtained in different sectors or at different speeds, wavelengths or concentrations to be mixed, as long as they share the same $M$ and the same buffer conditions.
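
As an illustration, the general model can be evaluated with a few lines of Python. This is a sketch under stated assumptions rather than the program's own implementation: CGS units are assumed (radii in cm, partial specific volume in ml/g, density in g/ml, mass in g/mol, R in erg/(mol K)), and all function names and numerical values are hypothetical.

```python
import math

R_GAS = 8.314e7  # gas constant in erg/(mol K); CGS units are an assumption


def rpm_to_omega(rpm):
    """Convert rotor speed in rpm to angular speed in rad/s."""
    return 2.0 * math.pi * rpm / 60.0


def beta(r, r_b, omega, vbar, rho, temperature):
    """beta = omega^2 (r^2 - r_b^2)(1 - vbar*rho) / (2RT), radii in cm."""
    return (omega ** 2 * (r ** 2 - r_b ** 2) * (1.0 - vbar * rho)
            / (2.0 * R_GAS * temperature))


def signal(r, baseline, species, r_b, omega, rho, temperature):
    """A(r) = baseline + sum over species of A_b,i * exp(M_i * beta_i).

    species is a list of (a_bottom, mass, vbar) tuples, one per species.
    """
    total = baseline
    for a_bottom, mass, vbar in species:
        total += a_bottom * math.exp(
            mass * beta(r, r_b, omega, vbar, rho, temperature))
    return total


def fringe_coefficient(molecular_weight):
    """k = 3.33 * MolecularWeight / 1.2, as given for interference optics."""
    return 3.33 * molecular_weight / 1.2


# Illustrative values only (assumptions): 9017 Da protein, vbar 0.73 ml/g,
# solvent density 1.0 g/ml, 20000 rpm, 293.15 K, bottom at 7.0 cm.
omega = rpm_to_omega(20000)
species = [(0.5, 9017.0, 0.73)]
at_bottom = signal(7.0, 0.01, species, 7.0, omega, 1.0, 293.15)
further_in = signal(6.8, 0.01, species, 7.0, omega, 1.0, 293.15)
```

At the reference radius every exponential equals 1, so the signal there is simply the baseline plus the sum of the bottom absorbances; at smaller radii the signal falls off towards the baseline.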

The single-species model can be used to estimate $V_p$, since $M$ is often known from the amino acid sequence.
For a given fit, $(1 - V_p\rho)M = \mathrm{const}$. Correspondingly,
$$V_{p,fitted} = \frac{1}{\rho}\left(1 - \frac{M_{fitted}}{M_{ini}}\,(1 - V_{p,ini}\,\rho)\right)$$
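
The conversion from a fitted mass to a fitted partial specific volume can be written as a short Python helper. This is a sketch only: the function name is hypothetical, and the partial specific volume and solvent density used in the example are assumed values, not measurements.

```python
def vbar_from_fitted_mass(m_fitted, m_ini, vbar_ini, rho):
    """Vbar implied by a known mass m_ini, given a fit made with vbar_ini.

    Uses (1 - vbar*rho) * M = const for a single-species fit.
    """
    return (1.0 - (m_fitted / m_ini) * (1.0 - vbar_ini * rho)) / rho


# Illustrative values (assumptions): a fit made with vbar 0.73 ml/g in a
# solvent of density 1.0 g/ml returned 8924 Da; the sequence mass is 9017 Da.
vbar_fitted = vbar_from_fitted_mass(8924.0, 9017.0, 0.73, 1.0)
```

If the fitted mass equals the known mass, the function returns the initial partial specific volume unchanged.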

$C_i$ is the concentration of the ith species at the bottom. The concentrations of the different species ($C_i$) are linked by the association equilibrium (if any).
The hetero-association model is analysed as follows:

nA + B <-K-> AnB

$$K = \frac{[A_nB]}{[A]^n[B]}$$
$$[A] + n[A_nB] = A_0$$
$$[B] + [A_nB] = B_0$$

$$\log_{10}[A_nB] = \log_{10}K + n\log_{10}[A] + \log_{10}[B]$$

The concentrations of the individual species are then entered into the general model.

$[A]$ and $[B]$ are the concentrations of the monomers at the bottom, and $[A_nB]$ is the concentration of the complex. $K$ is the association constant. The logarithms of the association constant and of the concentrations are used as fitting parameters for numerical reasons: this puts them on a similar scale. The initial guesses for $[A]$ and $[B]$ are calculated by numerically solving the full system of equations above.
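
One way to obtain such initial guesses is sketched below in Python. The source does not say which numerical method the program uses; bisection is chosen here only because the total amount of A increases monotonically with the free concentration [A], and the function name and example values are assumptions.

```python
def solve_hetero_association(lg_k, n, a0, b0, iterations=200):
    """Solve K = [AnB]/([A]^n [B]) with the two mass balances by bisection.

    Returns the free concentrations [A], [B] and the complex [AnB].
    """
    k = 10.0 ** lg_k

    def total_a(a):
        # [B] follows from the B balance: b0 = [B] * (1 + K [A]^n)
        b = b0 / (1.0 + k * a ** n)
        return a + n * k * a ** n * b

    lo, hi = 0.0, a0   # free [A] lies between 0 and the total a0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if total_a(mid) > a0:
            hi = mid
        else:
            lo = mid
    a = 0.5 * (lo + hi)
    b = b0 / (1.0 + k * a ** n)
    return a, b, k * a ** n * b


# Illustrative values (assumptions): lgK = 12, n = 2, totals in mol/l.
a, b, anb = solve_hetero_association(12.0, 2, 2e-6, 1e-6)
```

The logarithms of the returned concentrations can then serve as starting values for the fit.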

Multi Data Fitting.

Ultraspin can fit datasets either individually or globally, with all data fitted simultaneously. Global fitting can also combine data collected at different speeds, concentrations or wavelengths. The following table shows the results obtained for the SAM domain of p73 using a single-species model (4). The fits of the individual datasets are compared with the global fit of all of them simultaneously.
 
Curve #      Mass (Da)    Error (+/-)
2            9045         135
3            8528         133
4            9193         106
5            9015         156
6            8924         112
7            8867         121
Average      8884
Multi Fit    8924         49

The expected molecular mass of the SAM domain of p73 is 9017 Da.



Ultraspin Fitting models.


Each model listed below is described more fully later in this document.
Models

OneSpecie_Mw_fixedshift
Spec1_Mw_varshift
Onespecie_V_fixedshift
Spec1_V_varshift
Spec1_Mw_varshift_Mult
Spec1_V_varshift_Mult
Dimerisation_Mw_K_varshift_Mult 
Dimerisation_K_varshift_Mult
Dimerisation_K_varshift_Mult_Specrum
Monomer_Dimer_Tetramer_K1_K2_varshift_Mult_Spectrum
Protein_DNA_2A_plus_B_AB_K1_varshift_Mult_Spectrum
Protein_DNA_NA_plus_B_AB_K1_varshift_Mult_Spectrum
Nmerisation_K_varshift_Mult
Noninteracting_2_Mult
Noninteracting_3_Mult
Noninteracting_2_Mult_fixedshift
Noninteracting_3_Mult_fixedshift
nA_plus_B_K1_AnB_K2_AnB_m_fixedshift_Mult_Spectrum
nA_plus_B_K1lock_AnB_K2_AnB_m_fixedshift_Mult_Spectrum
A_and_AnB_K2_AnB_m_fixedshift_Mult_Spectrum

Terminology

Onespecie - Single species
Spec1 - Single species
Dimerisation - 2A -> A2 (please note that Nmerisation provides more robust fitting)
Monomer_Dimer_Tetramer - 4A -> 2A2 -> A4
Protein_DNA - nA + B -> AnB
Nmerisation - nA -> An
Noninteracting - A1, A2, A3
K - Association constant
Mw - Floating molecular weight
V - Floating partial specific volume
Varshift - Floating baseline
fixedshift - Fixed baseline
Mult - Multiple-dataset fitting
Spectrum - Multiwavelength fitting
A_plus_B - nA + B -> AnB
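
The naming scheme can be read mechanically, as the following Python sketch shows. It is an illustration only and not part of Ultraspin: the token table and function name are hypothetical, only the suffixes listed above are decoded, and names are matched exactly.

```python
# Hypothetical lookup table built from the terminology list above.
TOKENS = {
    "mw": "floating molecular weight",
    "v": "floating partial specific volume",
    "varshift": "floating baseline",
    "fixedshift": "fixed baseline",
    "mult": "multiple-dataset fitting",
    "spectrum": "multiwavelength fitting",
}


def describe_model(name):
    """List the options encoded in a model name, in order of appearance."""
    return [TOKENS[token] for token in name.lower().split("_")
            if token in TOKENS]


options = describe_model("Spec1_Mw_varshift_Mult")
```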
 

If fitting a single dataset, all parameters can be set in the fit window. When fitting multiple datasets, local parameters are set individually in the Absorbance Spectrum and Select Multiple windows.

Global Parameters.
 
Terminology
 
 
Region to fit - Data points between which the data are fitted. On the FitForm this applies only to the single-dataset models. For multi-dataset models it is specified on the SelectMultiple form.
GetFullLength - Shows the number of data points in the selected dataset.
Go - Starts the fitting procedure.
Simulate - Simulates the fit using the entered (initial) parameters and the selected model and datasets.
M1, M2 etc. - Expected molecular mass (Daltons). This can be the mass of the monomers participating in complex formation, or the mass of an individual component in the non-interacting models. For example, M1 could be the protein and M2 the DNA in protein-DNA complex formation.
Vbar1, Vbar2 etc. - Partial specific volumes of the monomers corresponding to M1, M2 etc.
Sol_d - Solvent density. This is a global parameter.
Rbot - Radius at which the reference point is chosen (where Abottom is defined).
Abs0 - Baseline absorbance. On the FitForm this applies only to the single-dataset models. For multi-dataset models it is specified on the SelectMultiple form.
Abot, Abot2 - Absorbance at the bottom of the cell (at Rbottom). On the FitForm this applies only to the single-dataset models. For multi-dataset models it is specified on the SelectMultiple form. For some models (see the individual models) it is used to calculate the initial guess for the monomer concentration at the bottom. In the protein-DNA case, Abot is the total absorbance of the protein at the bottom at the reference wavelength, and Abot2 is the total absorbance of the DNA. Loading absorbances are often acceptable guesses.
lgK1, lgK2 - log10 of the association constant (Kass = 1/Kd; for example, log10(10^6) = 6 corresponds to Kd = 10^-6). K2 is for the second stage of association (for example, formation of a dimer of dimers).
ExtCo, ExtCo2 - Extinction coefficients of the monomers at the wavelength chosen as the reference (i.e., the wavelength whose entry in the Data/Abs.Spectrum table is 1).
Nass - Oligomerization state: the number of monomers in the complex formed (e.g., 2 for dimerisation or 1 for a 1:1 hetero-complex).
Mass - Second oligomerization number: the number of units combined in the second stage of association (e.g., 2 for dimerisation in the second stage).
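
The relation between lgK and the dissociation constant given in the lgK1, lgK2 entry can be checked with two small Python helpers. The function names are hypothetical, and the sketch assumes a 1:1 association so that Kd is simply the reciprocal of Kass.

```python
import math


def lgk_to_kd(lg_k):
    """Dissociation constant for a given log10 of the association constant."""
    return 10.0 ** (-lg_k)


def kd_to_lgk(kd):
    """log10 of the association constant for a given dissociation constant."""
    return -math.log10(kd)


kd = lgk_to_kd(6.0)        # lgK = 6 corresponds to Kd = 10^-6
lg_k = kd_to_lgk(1e-6)     # and the reverse conversion gives lgK = 6
```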


OneSpecie_Mw_fixedshift,
Spec1_Mw_varshift,
Onespecie_V_fixedshift,
Spec1_V_varshift,
Spec1_Mw_varshift_Mult,
Spec1_V_varshift_Mult

These models perform a single-species fit of the data, i.e. only monomers are present.
The first four models (without _Mult in the name) fit only a single dataset at a time. The last two (Spec1_Mw_varshift_Mult and Spec1_V_varshift_Mult) can fit multiple datasets simultaneously.
The models Onespecie_V_fixedshift, Spec1_V_varshift and Spec1_V_varshift_Mult use Vbar (partial specific volume) as a fitting parameter. In practice these models are not needed: for the monomeric models $(1 - \rho V)M = \mathrm{const}$, so the fitted Vbar can easily be calculated from the fitted M and the initial M, and it is reported in the textbox Vbar(Mfixed).

The only global parameter is M, so it is possible to mix datasets with different concentrations, speeds or wavelengths.

global fit parameters: M;
local fit parameters: baseline, absorbance at the bottom (Abot); for single-dataset models these are specified on the fit form, for multiple-dataset models on the multiple dataset selection form.

global non-variable parameters to be specified: solvent density, Vbar(A)
local non-variable parameters to be specified: Rbottom



Dimerisation_Mw_K_varshift_Mult
This model fits the data to the dimerisation model

2A <-K1-> A2

The global parameters are M(A) and lgK1. The local parameters are the baseline and Abottom. The concentration of the protein at the bottom of the cell is calculated from Abottom.
This model can fit multiple datasets as long as they share a common M and K, so different cells, speeds and concentrations are acceptable.
The extinction coefficient should be specified for the monomeric species.

Dimerisation_K_varshift_Mult
The same model, but M is not varied. The global parameter is lgK1. The local parameters are the baseline and Abottom.
This model can fit multiple datasets as long as they share a common K, so different cells, speeds and concentrations are acceptable.
The extinction coefficient should be specified for the monomeric species.

Dimerisation_K_varshift_Mult_Specrum
The same model, but the global parameters are lgK1 and C1, the concentration of the monomer at the bottom of the cell. Initially, C1 is calculated from the Abottom value, which should be specified in the Abot edit box on the main FitForm, using the specified extinction coefficient and the initial value of lgK1.
The local parameter is the baseline.
This function fits multiwavelength data for the same cell, measured at the same speed, concentration etc.
It will not work for datasets from different cells, speeds etc.
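
A possible way of deriving the monomer concentration at the bottom from Abottom for the dimerisation model is sketched below in Python. The source does not give the exact calculation, so this rests on stated assumptions: the dimer is taken to absorb twice as strongly as the monomer, the path length defaults to 1.2 cm, and the function name is hypothetical.

```python
import math


def monomer_concentration_at_bottom(a_bottom, ext_coeff, lg_k1, path_cm=1.2):
    """Free monomer concentration C1 from the total absorbance at the bottom.

    Solves C1 + 2*K1*C1^2 = Ct, where Ct = a_bottom / (ext_coeff * path_cm)
    is the total concentration expressed in monomer units.
    """
    total = a_bottom / (ext_coeff * path_cm)
    k1 = 10.0 ** lg_k1
    # Positive root of the quadratic, written in a form that avoids
    # cancellation when K1 * Ct is small.
    return 2.0 * total / (1.0 + math.sqrt(1.0 + 8.0 * k1 * total))


# Illustrative values (assumptions): Abot = 0.6, ExtCo = 10000, lgK1 = 5.
c1 = monomer_concentration_at_bottom(0.6, 10000.0, 5.0)
```

When the association is very weak, C1 approaches the total concentration, as expected for a purely monomeric sample.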
 



Monomer_Dimer_Tetramer_K1_K2_varshift_Mult_Spectrum

4A <-K1-> 2A2 <-K2-> A4

The global parameters are lgK1, lgK2 and lgC1 (log10 of the concentration of the monomer at the bottom of the cell). Initially, C1 is calculated from the Abottom value, which should be specified in the Abot edit box on the main FitForm, using the specified extinction coefficient and the initial values of lgK1 and lgK2. The local parameter is the baseline.
This function fits multiwavelength data for the same cell, measured at the same speed, concentration etc.
It will not work for datasets from different cells, speeds etc.
 
 
 
 



Protein_DNA_2A_plus_B_AB_K1_varshift_Mult_Spectrum
The model below is more universal and works with lgK1, lgC1 and lgC2, which makes it more robust. Use the model below and consider this one deprecated.

Protein_DNA_NA_plus_B_AB_K1_varshift_Mult_Spectrum

Hetero-oligomerisation.
nA + B <-K-> AnB

global fit parameters: lg10(K); lg10(Ca_bottom); lg10(Cb_bottom);
local fit parameters: baseline
global non-variable parameters to be specified: solvent density, Vbar(A); Vbar(B); n; Ma, Mb; AbsSpectrum_A, AbsSpectrum_B, Ext(A), Ext(B).
local non-variable parameters to be specified: Rbottom

lgC1 and lgC2 (log10 of the concentrations of the A and B monomers at the bottom) are calculated from Abot and Abot2 (the total absorbances of A and B at the bottom), ExtCo and ExtCo2, and the initial value of lgK1.

This function fits multiwavelength data for the same cell, measured at the same speed, concentration etc.
It will not work for datasets from different cells, speeds etc.
 



Nmerisation_K_varshift_Mult

Homooligomerization
The only global parameter is lgK, so it is possible to mix data with
different concentrations, cells, speeds, wavelengths etc.

nA <-K-> An
Non-variable parameters
n - number of monomers in the complex (2 for a dimer, etc.)
ext coeff (A)
M(A)
Vbar(A)
solvent density

Fit parameters
global:
lgK

local:
baseline
Abottom
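
For illustration, the absorbance profile implied by the homo-oligomerisation model can be sketched in Python as below. This is not the program's own code. It assumes CGS units, assumes that the n-mer absorbs n times as strongly as the monomer, and takes the monomer concentration at the bottom as an input; all names and values are hypothetical.

```python
import math

R_GAS = 8.314e7  # gas constant in erg/(mol K); CGS units are an assumption


def nmer_absorbance(r, lg_k, n, c1_bottom, baseline, ext_coeff, mass, vbar,
                    rho, r_b, omega, temperature, path_cm=1.2):
    """Absorbance at radius r for nA <-> An, with mass action at every radius."""
    beta = (omega ** 2 * (r ** 2 - r_b ** 2) * (1.0 - vbar * rho)
            / (2.0 * R_GAS * temperature))
    monomer = c1_bottom * math.exp(mass * beta)    # C1(r)
    oligomer = 10.0 ** lg_k * monomer ** n         # Cn(r) = K * C1(r)^n
    # Assumption: the n-mer absorbs n times as strongly as the monomer.
    return baseline + ext_coeff * path_cm * (monomer + n * oligomer)


# Illustrative values (assumptions): dimerisation with lgK = 5, 20000 rpm.
omega = 2.0 * math.pi * 20000 / 60.0
at_bottom = nmer_absorbance(7.0, 5.0, 2, 1e-5, 0.0, 10000.0, 9017.0, 0.73,
                            1.0, 7.0, omega, 293.15)
further_in = nmer_absorbance(6.8, 5.0, 2, 1e-5, 0.0, 10000.0, 9017.0, 0.73,
                             1.0, 7.0, omega, 293.15)
```

Because the oligomer concentration is computed from the local monomer concentration, the equilibrium is satisfied at every radius and only lgK needs to be shared between datasets.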



Noninteracting_2_Mult,
Noninteracting_3_Mult,
Noninteracting_2_Mult_fixedshift,
Noninteracting_3_Mult_fixedshift

Model-free: Non-interacting species (2 and 3 components).
This model can be used with interacting systems as well. The global parameters are M1, M2 and M3 (M3 for the 3-component fit only). The local parameters are the baseline and the initial absorbances at the bottom, A1b, A2b and A3b. Vbar1, Vbar2 and Vbar3 should also be specified.

global fit parameters: M1, M2, M3;

local fit parameters: baseline, A1_b, A2_b, A3_b;
global non-variable parameters to be specified: solvent density, Vbar(M1); Vbar(M2); Vbar(M3);
local non-variable parameters to be specified: Rbottom



nA_plus_B_K1_AnB_K2_AnB_m_fixedshift_Mult_Spectrum,
nA_plus_B_K1lock_AnB_K2_AnB_m_fixedshift_Mult_Spectrum,
A_and_AnB_K2_AnB_m_fixedshift_Mult_Spectrum

These models describe the situation in which hetero-oligomerisation occurs in the first step and the resulting complex then oligomerises with itself.
nA_plus_B_K1_AnB_K2_AnB_m_fixedshift_Mult_Spectrum
is the most general model.
nA + B <-K1-> AnB <-K2-> (AnB)m
The difference is that the global parameters are lgK1, lgK2=lgA, lgC1 and lgC2. This means that if multiple datasets are used, they should all come from the same cell (multiwavelength data only).

A and AnB <-K2-> (AnB)m
lgC1 = lg[A]
lgC2 = lg[AnB]
lg[(AnB)m] = lgK2 + m*lgC2

global fit:
lgK2
lgC1
lgC2
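
The logarithmic relations for the A and AnB <-K2-> (AnB)m case translate directly into code. The Python sketch below uses a hypothetical function name and illustrative values, and simply converts the three global fit parameters into species concentrations.

```python
def a_and_anb_concentrations(lg_k2, lg_c1, lg_c2, m):
    """Concentrations of A, AnB and (AnB)m from the logarithmic parameters."""
    lg_oligomer = lg_k2 + m * lg_c2    # lg[(AnB)m] = lgK2 + m*lgC2
    return 10.0 ** lg_c1, 10.0 ** lg_c2, 10.0 ** lg_oligomer


# Illustrative values (assumptions): lgK2 = 5, lgC1 = lgC2 = -6, m = 2.
conc_a, conc_anb, conc_oligomer = a_and_anb_concentrations(5.0, -6.0, -6.0, 2)
```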
 
 
 



References.
(1) Marquardt, D. W. (1963). An algorithm for least-squares estimation of nonlinear parameters. Journal of the Society for Industrial and Applied Mathematics 11(2), 431-441.

(2) Johnson, M. L. and Frasier, S. G. (1985). Nonlinear least-squares analysis. Methods in Enzymology 117, 301-342.

(3) Poget, S.F., Legge, D.B., Proctor, M.R., Butler, P.J.G., Bycroft, M., Williams, R.L. (1999) J. Mol. Biol. 290, 867-879.

(4) Wang, W.K., Bycroft, M., Foster, N.W., Buckle, A.M., Fersht, A.R., Chen, Y.W. (2000) Crystal structure of the C-terminal sterile α motif (SAM) domain of human p73α does not show homotypic interaction (in preparation).


Nick Foster: Last updated November 2000
 

© MRC (UK) Centre for Protein Engineering 2000