Fitness

From QuB

Fitness is a free program for weighted nonlinear least-squares curve fitting (regression). It was formerly called QFC. In addition to basic curve fitting, Fitness makes it easy to fit systems of ordinary differential equations (ODEs), fit the same curve to data from multiple files, and keep track of fitting results. Fitness works with Windows 2000/XP and recent Linux distributions such as Ubuntu.

A dataset is fit with a function F(x) = \sum_{j=1}^{N_c} f_{\theta_j}(x). The f_{\theta_j}(x) are components with the same equation but different parameter values \theta_j. A nonlinear optimizer attempts to minimize \sum_{i=1}^{N} ( weight_i * (data_i - F(x_i)) )^2 (the "weighted residual") by finding successively better parameter values, up to "max iterations" times. F(x) is overlaid on the dataset for visual confirmation.
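
As a minimal sketch of the quantity being minimized (not Fitness's internal code; the function names and the choice of exponential components are illustrative assumptions), the weighted residual for a sum of components could be computed like this:

```python
import math

def component(x, amp, tau):
    """One exponential component, Amp * exp(-x / Tau)."""
    return amp * math.exp(-x / tau)

def curve(x, params):
    """F(x): sum of components sharing one equation but not parameter values."""
    return sum(component(x, amp, tau) for amp, tau in params)

def weighted_residual(xs, data, weights, params):
    """Sum over data points of (weight_i * (data_i - F(x_i))) ** 2."""
    return sum((w * (d - curve(x, params))) ** 2
               for x, d, w in zip(xs, data, weights))

xs = [0.0, 1.0, 2.0]
params = [(2.0, 1.0), (1.0, 5.0)]        # two components, each (Amp, Tau)
data = [curve(x, params) for x in xs]    # noise-free data from the same curve
print(weighted_residual(xs, data, [1.0] * 3, params))  # 0.0 for a perfect fit
```

A fitter's job is then to adjust `params` so that this number gets as small as possible.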

Multi-file fits take two forms: separate, which tunes curve parameters for each file, and together (global), which finds one set of parameters that best describes all files.

Thanks to everyone behind Python, GTK, Cairo, and SciPy for making this possible.

Screenshot

Image:qfc-screenshot.png

Basic Curve Fitting

Open a data file

From the File menu, choose "Open data..."

Data should be in tab- or comma-separated text files, with series in columns. One column must contain the independent variable (x or time). The first row can have headers, used as series names.
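
To illustrate the expected layout, here is a small sketch that parses such a file with Python's standard csv module. It is not how Fitness itself loads data; the series names "Time", "Y1" and "Y2" are made up for the example, and a header row is assumed to be present.

```python
import csv
import io

def read_series(text, delimiter="\t"):
    """Parse delimited text into {series name: [values]}; row 1 holds the names."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    headers, body = rows[0], rows[1:]
    columns = {name: [] for name in headers}
    for row in body:
        for name, cell in zip(headers, row):
            columns[name].append(float(cell))
    return columns

# A tab-separated sample: one X column (Time) and two Y series.
sample = "Time\tY1\tY2\n0.0\t1.0\t5.0\n0.5\t0.8\t4.1\n1.0\t0.6\t3.3\n"
series = read_series(sample)
print(series["Time"])   # [0.0, 0.5, 1.0]
```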

Overlay a curve on your data

Your data is shown twice -- in full above, and the fitting selection below.

Pick a curve from the menu in the lower part of the screen. It is drawn in red over your data. If the curve is out of range, it looks like a flat line across the top or bottom.

Vary parameters and see the curve change

Curve parameters are listed below the equation. Move a slider, or type a new value in the box below the slider, and see the effect on the curve.

Fit the function to your data

Press the "Fit" button in the lower-right corner. Some curves are sensitive to the initial parameter values. If it doesn't fit nicely, try tuning the parameters by hand, then "Fit" again.


Data Preparation

There are two data displays. The top one shows the full unprocessed data. The bottom one shows the sub-range of processed data you've chosen for fitting.

Choose data series for the X and Y axes

The X series is the independent variable. Y is the dependent variable for fitting. Choose from any series (columns) in your data, using the "X" and "Y" menus between data displays.

Select a sub-region for fitting

Click and drag across either data display to select some data. To move one endpoint of the selection, point at it (where the background changes from white to blue), click and drag. You can also type the endpoints as percentages or X-values, in the panel between displays. To select all data, choose "Zoom out" from the "View" or right-click menus. You can also undo and redo zoom changes.

Exclude junk data

To exclude junk data, e.g. noise spikes, from the display and fitter, select it and click "Exclude: Sel". This writes an expression such as "10.0 <= X <= 20.0". You can edit this expression directly using any series names; for example, "Y1 <= 0" discards zero and negative data in preparation for logarithmic fitting.
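
The effect of an exclusion expression can be sketched as follows. This is only an illustration of the idea, assuming each data row is available as a mapping from series name to value; the helper name apply_exclusion is hypothetical and Fitness's own evaluation may differ.

```python
def apply_exclusion(rows, expression):
    """Drop every row for which the exclusion expression is true.

    rows: list of dicts mapping series name -> value.
    expression: a Python comparison using series names.
    """
    code = compile(expression, "<exclude>", "eval")
    # Series values are supplied as local names; builtins are disabled.
    return [row for row in rows if not eval(code, {"__builtins__": {}}, row)]

rows = [{"X": x, "Y1": y} for x, y in [(5.0, 1.2), (12.0, 90.0), (25.0, -0.3)]]
print(apply_exclusion(rows, "10.0 <= X <= 20.0"))  # drops the spike at X=12
print(apply_exclusion(rows, "Y1 <= 0"))            # drops the non-positive point
```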

Use logarithmic scaling

To work with one or both axes in logarithmic space (base e), check the boxes "x log" or "y log." Any numbers <= 0 are mapped to k-1, where k=log(min_positive_value). Logarithmic data is used for display and fitting, but selection and exclusion bounds are always given in non-logarithmic coordinates.
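
The mapping of non-positive numbers can be written out as a short sketch. The function name log_scale is an assumption made for illustration, and the sketch requires at least one positive value.

```python
import math

def log_scale(values):
    """Natural-log transform; values <= 0 map to k - 1,
    where k = log(smallest positive value)."""
    k = math.log(min(v for v in values if v > 0))
    return [math.log(v) if v > 0 else k - 1 for v in values]

scaled = log_scale([100.0, 1.0, 0.0, -3.0])
# The smallest positive value is 1.0, so k = 0 and both
# non-positive entries land at -1.0, just below all real data.
print(scaled)
```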

Filter and resample

Fitting a large dataset can be slow. These two options can dramatically reduce the number of fitted datapoints (shown as "N" between data displays). To show these options, click the expander (a triangle or "+") between displays, under the word "sel."

Optionally apply a low-pass filter. The data is automatically decimated at twice the filter frequency. Filtering is only available for evenly-sampled data.

After the filter, you can optionally apply QuB's adaptive resampling: we find each stretch of data with standard deviation less than a threshold, and replace it with one representative point.

Assign weights for fitting

Next to filter and resample, the "Weight" box says how to weight each data point. By default each point weighs the same ("1.0") but you can enter any expression using the series names, e.g. "1/Y1", "sqrt(Y1)". Note that larger weight == more important, unlike some fitting packages.

Classically, each data point should be weighted by the inverse of its variance, so points with smaller error count more toward the fit. If you don't have a data series containing the points' variance, you might estimate it as equal to their Y value, and choose weights of "1/Y1" (or whatever your Y series is named). Or you can leave it as "1.0" and give each point equal weight.

Read x,y coordinates from the display

Between data displays, on the right, the field "N" shows the number of data points to be fit. Below it are (x, y) mouse coordinates.

See the data as line, points, or histogram

These three choices from the "View" menu change the way data is displayed and used.

Line
the default
Dot
each point's radius is proportional to log(weight)
Hist
interprets each data point as a histogram bin

Switch between absolute and relative X coordinates

By default Fitness uses relative X coordinates, so the fitting selection starts at X=0. That is, X_rel = X_abs - sel.start. This is sensible for time series, but foolish in other cases such as histograms. From the "View" menu choose "Absolute X" to toggle between absolute and relative coordinates.


Curves

Use a built-in curve

Linear
Slope * x + Icept
Exponential
Amp * e^{-x / Tau}
Exponential (log x bin)
area[x, r * x] under Amp * e^{-x / Tau}
useful for log-binned duration histograms, with bin bounds of [r^i * bound_0]
LogBase must be the same base used for binning, usually 10 or e (yes, you can type "e" as the value). MIL result histograms use log 10 base binning.
Declining Exponential
a + (b-a)*(1 - e^{\frac{-(x-x0)}{Tau}}) + Slope*(x-x0)
Double Declining Exponential
a + (b-c)*(1 - e^{\frac{-(x-x0)}{Tau_1}}) + c*(1-e^{\frac{-(x-x0)}{Tau_2}}) + Slope*(x-x0)
Gaussian
Amp * e^{\frac{-(x - Mean)^2}{2 * Std^2}}
"bell curve"
Lorentzian
\frac{Amp}{\pi} * \frac{Gamma / 2}{(x - X0)^2 + (Gamma / 2)^2}

The built-in curves come in two varieties. The basic (e.g. "Exponential") has a constant baseline offset ("Base") which can be fit. The other (e.g. "Exponential-Linear") has a linear baseline offset, with "Slope" and "Icept."
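
For reference, a few of the built-in equations can be written as plain Python functions. This is a sketch transcribed from the formulas listed above, not the program's source. The "Base" argument follows the constant-baseline variety, and treating the bin ratio r of the log-binned curve as LogBase is an assumption.

```python
import math

def exponential(x, Amp, Tau, Base=0.0):
    return Amp * math.exp(-x / Tau) + Base

def gaussian(x, Amp, Mean, Std, Base=0.0):
    return Amp * math.exp(-(x - Mean) ** 2 / (2 * Std ** 2)) + Base

def lorentzian(x, Amp, X0, Gamma, Base=0.0):
    half = Gamma / 2
    return (Amp / math.pi) * half / ((x - X0) ** 2 + half ** 2) + Base

def exponential_log_bin(x, Amp, Tau, LogBase):
    """Area under Amp * exp(-t / Tau) for t from x to LogBase * x,
    obtained by integrating the exponential in closed form."""
    return Amp * Tau * (math.exp(-x / Tau) - math.exp(-LogBase * x / Tau))

print(exponential(0.0, Amp=2.0, Tau=1.0))          # 2.0
print(gaussian(1.0, Amp=3.0, Mean=1.0, Std=0.5))   # 3.0, the peak height
```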

Use a custom curve function

To enter a custom curve function, edit the text in "Eqn" or choose "Custom" from the Curves menu. The dialect is Python with "from math import *", which is comparable to C with math.h. You can use the usual standard math functions such as exp() and sqrt(). It will auto-detect parameter names, which must be legal Python identifiers:

  • starts with a letter
  • consists of letters, numbers, and underscores ('_')
  • is case-sensitive
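
One way such auto-detection could work is sketched below with Python's ast module. This is an illustration under assumptions, not Fitness's actual parser: any name that is neither a math function or constant nor a known variable or series is reported as a parameter. The helper name detect_parameters is hypothetical.

```python
import ast
import math

def detect_parameters(equation, known=("x",)):
    """Return the sorted names in an equation that are not math
    functions/constants and not known variables or series."""
    tree = ast.parse(equation, mode="eval")
    names = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}
    return sorted(n for n in names if n not in known and not hasattr(math, n))

print(detect_parameters("Amp * exp(-x / Tau) + Base"))   # ['Amp', 'Base', 'Tau']
# A data series named D is treated as data only if it is known.
print(detect_parameters("D * exp(-x / Tau)", known=("x", "D")))   # ['Tau']
print(detect_parameters("D * exp(-x / Tau)"))                      # ['D', 'Tau']
```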

Use a system of differential equations (ODE) as the fitting function

In the "Eqn" field, enter one or more first-order differential equations, separated by a semicolon (;). For example,

myvar' = myparam * myvar
a' = p*a; b' = q*b

are respectively an exponential and a complex exponential. If there is more than one variable being integrated, the first is used as the fitting function (in the second example, "a" is the fitting function).

For your convenience, a linear baseline with "Slope" and "Intercept" parameters is added.

Some problems may require "VODE" (available in the curve menu, same syntax). It is slower, but potentially more accurate.

Differential equations require numpy and scipy, available separately from www.scipy.org.
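
To show what fitting to an ODE involves, the sketch below integrates myvar' = myparam * myvar numerically and compares the result with the analytic exponential. It uses a hand-written fixed-step Runge-Kutta integrator rather than the SciPy integrators Fitness relies on, and the initial value of 2.0 is an arbitrary assumption for the example.

```python
import math

def rk4(deriv, y0, x_end, steps=100):
    """Integrate y' = deriv(x, y) from x = 0 to x_end with classical RK4."""
    h = x_end / steps
    x, y = 0.0, y0
    for _ in range(steps):
        k1 = deriv(x, y)
        k2 = deriv(x + h / 2, y + h * k1 / 2)
        k3 = deriv(x + h / 2, y + h * k2 / 2)
        k4 = deriv(x + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        x += h
    return y

myparam = -0.5
numeric = rk4(lambda x, myvar: myparam * myvar, y0=2.0, x_end=3.0)
exact = 2.0 * math.exp(myparam * 3.0)    # analytic solution of the same ODE
print(abs(numeric - exact) < 1e-8)       # True
```

During a fit, an integration like this is repeated for every trial value of myparam, which is why ODE curves are slower than closed-form ones.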

Use other data series in your function

Custom curves can use data series other than X and Y. For example, a third data series "D" can drive the function, as in:

f(x) = "D * exp(-x / Tau)"

Type the series name exactly as it's shown in the X and Y menus. If it's not recognized it'll show up as a curve parameter.

Write a custom curve class

(experienced C or Python programmers only)

Custom curve functions might not be sufficient if the Python-interpreted custom curve is grindingly slow on your large dataset. Please contact us for a plugin development kit. We're happy to help.

Use multiple components

You can fit data to a sum of curves. The curves (called components) are identical except their parameter values. Essentially, it's easier than typing out "Amp1 * exp(-x / Tau1) + Amp2 * exp(-x / Tau2) + ..." Set the number of components to the right of the curve menu.

Select which parameters to fit

Each parameter has a check-box to the left of its name. Only checked parameters will be modified by the fitter. You might un-check a parameter if:

  • you know its value; for example, when fitting a histogram with gaussians, the baseline offset should stay at 0.
  • you know its approximate value, but the fitter makes it absurdly large or small. Sometimes you can un-check troublesome parameters, fit the other ones, then re-check them and get a better fit.

Put limits on a parameter

Each parameter can have a lower and/or upper limit. To edit them, click the Limits tab. If there are any limits, the tab turns green.

See error estimates for each parameter

The row of text-boxes labeled "+/-" shows error estimates for each active parameter. Errors are estimated using the square root of the diagonal of the approximate curvature matrix.

See the sum-square-difference between curve and data

The box labeled "SS Residual" shows the un-weighted residual, or sum-square-difference:

\sum_i (data_i - F(x_i))^2

See R-Squared

R-squared (R^2), a.k.a. the Coefficient of Determination, is the proportion of variance in the data that is described by the fit curve. At 1.0 the curve is a perfect fit. At 0.0 it's no better than a flat line. Negative values indicate it fits worse than a flat line.
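
The three cases can be checked with the textbook definition, R^2 = 1 - SS_residual / SS_total. This sketch uses the unweighted form; whether Fitness applies the fitting weights here is not stated, so treat that as an assumption.

```python
def r_squared(data, fitted):
    """1 - (sum of squared residuals) / (sum of squared deviations from the mean)."""
    mean = sum(data) / len(data)
    ss_res = sum((d - f) ** 2 for d, f in zip(data, fitted))
    ss_tot = sum((d - mean) ** 2 for d in data)
    return 1.0 - ss_res / ss_tot

data = [1.0, 2.0, 3.0, 4.0]
print(r_squared(data, data))                  # 1.0: perfect fit
print(r_squared(data, [2.5] * 4))             # 0.0: flat line at the mean
print(r_squared(data, [4.0, 3.0, 2.0, 1.0]))  # -3.0: worse than a flat line
```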

See the Runs probability

The field "Runs p-value" applies the Wald-Wolfowitz runs test to the residual (data - curve). Briefly, this is the probability that a random signal would cross the origin as many times as observed. You may or may not want to interpret this with a significance threshold such as 0.05.
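
A minimal sketch of the runs test follows, using the standard normal approximation for the number of runs. Fitness may compute the probability differently (for instance with the exact distribution), so this is only meant to show what the statistic measures. The function name runs_p_value is hypothetical.

```python
import math

def runs_p_value(residuals):
    """Two-sided Wald-Wolfowitz runs test, normal approximation."""
    signs = [r > 0 for r in residuals if r != 0]
    n_pos = sum(signs)
    n_neg = len(signs) - n_pos
    n = n_pos + n_neg
    if n_pos == 0 or n_neg == 0 or n < 3:
        raise ValueError("need at least 3 residuals, on both sides of zero")
    # A run is a maximal stretch of residuals with the same sign.
    runs = 1 + sum(a != b for a, b in zip(signs, signs[1:]))
    mean = 2.0 * n_pos * n_neg / n + 1.0
    var = 2.0 * n_pos * n_neg * (2.0 * n_pos * n_neg - n) / (n * n * (n - 1.0))
    z = (runs - mean) / math.sqrt(var)
    return math.erfc(abs(z) / math.sqrt(2.0))   # two-sided tail probability

alternating = [(-1.0) ** i for i in range(20)]   # crosses zero at every step
blocky = [1.0] * 10 + [-1.0] * 10                # crosses zero only once
print(runs_p_value(alternating), runs_p_value(blocky))
```

Both patterns give a small p-value: too many sign changes and too few are equally unlike a random residual.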

See the cross-correlation between parameters

Correlation between active curve parameters is shown at lower right. The color scale is alongside; brighter colors have higher absolute value. Point at a color to show its numerical value.

High cross-correlation can indicate your equation has too many parameters.

Technically, correlation is defined only at local maxima. When necessary we calculate "Pseudo" correlation along these lines.

Correlation plots require numpy and scipy, available separately from www.scipy.org.
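
As a rough illustration of the quantity being plotted, the sketch below normalizes a parameter covariance estimate into a correlation matrix. It assumes such a covariance matrix is already available; how Fitness derives it, and its "Pseudo" correlation fallback, are not reproduced here.

```python
import math

def correlation_from_covariance(cov):
    """corr[i][j] = cov[i][j] / sqrt(cov[i][i] * cov[j][j])."""
    n = len(cov)
    return [[cov[i][j] / math.sqrt(cov[i][i] * cov[j][j]) for j in range(n)]
            for i in range(n)]

# Hypothetical covariance of two parameters.
cov = [[4.0, -3.0],
       [-3.0, 9.0]]
corr = correlation_from_covariance(cov)
print(corr[0][1])   # -0.5
```

Values near +1 or -1 off the diagonal mean that a change in one parameter can be almost fully compensated by a change in the other.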

Fitting

Use a built-in fitter

Levenberg-Marquardt

There is one built-in fitting algorithm: Levenberg-Marquardt. The implementation is based on netlib::minpack::lmdif.f, translated to ISO C by Joachim Wuttke [1].

The fitting parameters are:

max iterations
the fitter will try at most this many new curve parameter settings
epsilon
used in determining a suitable step length for the forward-difference approximation. This approximation assumes that the relative errors in the functions are of the order of epsilon. If epsilon is less than the machine precision, it is assumed that the relative errors in the functions are of the order of the machine precision.
step bound
used in determining the initial step bound. This bound is set to the product of stepbound and the Euclidean norm of diag*x if nonzero, or else to stepbound itself. In most cases stepbound should lie in the interval (0.1, 100). 100 is a generally recommended value.
ftol
termination occurs when both the actual and predicted relative reductions in the sum of squares are at most ftol. Therefore, ftol measures the relative error desired in the sum of squares.
xtol
termination occurs when the relative error between two consecutive iterates is at most xtol. Therefore, xtol measures the relative error desired in the approximate solution.
gtol
termination occurs when the cosine of the angle between fvec and any column of the Jacobian is at most gtol in absolute value. Therefore, gtol measures the orthogonality desired between the function vector and the columns of the Jacobian.
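
To give a feel for how "max iterations", "epsilon" and "ftol" enter the algorithm, here is a deliberately simplified Levenberg-Marquardt loop in pure Python. It is not a port of MINPACK's lmdif: the step bound, xtol, gtol and scaling logic are omitted, the damping schedule is a common textbook choice, and all function names are assumptions.

```python
import math
import sys

def solve(a, b):
    """Solve the small linear system a * x = b by Gaussian elimination."""
    n = len(b)
    m = [a[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def lm_fit(residuals, p, max_iterations=200, epsilon=0.0, ftol=1e-12):
    """Minimize sum(residuals(p)**2); returns (parameters, sum of squares)."""
    eps = max(epsilon, sys.float_info.epsilon)   # floor at machine precision
    n, lam = len(p), 1e-3
    r = residuals(p)
    ss = sum(v * v for v in r)
    for _ in range(max_iterations):              # one trial setting per pass
        if ss == 0.0:
            break
        jac = []                                 # forward differences, row per parameter
        for j in range(n):
            h = math.sqrt(eps) * max(abs(p[j]), 1.0)
            q = list(p)
            q[j] += h
            jac.append([(rq - r0) / h for rq, r0 in zip(residuals(q), r)])
        jtj = [[sum(u * v for u, v in zip(jac[i], jac[k])) for k in range(n)]
               for i in range(n)]
        jtr = [sum(u * v for u, v in zip(jac[i], r)) for i in range(n)]
        for i in range(n):
            jtj[i][i] *= 1.0 + lam               # Marquardt damping
        step = solve(jtj, [-g for g in jtr])
        trial = [a + s for a, s in zip(p, step)]
        r_trial = residuals(trial)
        ss_trial = sum(v * v for v in r_trial)
        if ss_trial < ss:                        # accept and relax damping
            done = ss - ss_trial <= ftol * ss
            p, r, ss, lam = trial, r_trial, ss_trial, lam / 10.0
            if done:
                break
        else:                                    # reject and damp harder
            lam *= 10.0
    return p, ss

xs = [0.5 * i for i in range(10)]
data = [3.0 * math.exp(-x / 2.0) for x in xs]    # noise-free, Amp=3, Tau=2

def residuals(p):
    amp, tau = p
    return [d - amp * math.exp(-x / tau) for x, d in zip(xs, data)]

fit, ss = lm_fit(residuals, [1.0, 1.0])
print([round(v, 6) for v in fit])
```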

Simplex

Simplex is more reliable far from the correct parameters, but converges slowly when it's close.

Simplex appears as an option if you have numpy and scipy, available separately from www.scipy.org.

Simplex-LM

Simplex-LM invokes first Simplex then Levenberg-Marquardt for optimal flexibility and speed. This combo appears if you have numpy and scipy, available separately from www.scipy.org.

Write a custom fitter class

(experienced C or Python programmers only)

If our fitter converges poorly for your data and curve, you can implement your own. Please contact us for a plugin development kit. We're happy to help.

Fit the curve to the data

Press the "Fit" button to run the fitter. If it doesn't converge nicely, try:

  • manually adjusting the initial curve parameters
  • un-checking one or more troublesome curve parameters

Customize the fitting strategy

Clicking Fit runs a script called the strategy. To edit the strategy, click the tab Strategy, under Curve. The strategy can be any Python script, preferably one that improves the parameters of Fitness.curve.

Fitness.fit(variables=['Slope', 'Intercept'], Slope=Initial, Intercept=Initial)

In your script, you can call Fitness.fit() more than once; click Add a step to repeat the last line.

variables
which curve parameters are checked for fitting (not held constant)
param=
Initial: the parameter value when Fit or Fit All was clicked
Last: for Fit All -- the parameter value as it has changed since then
or any number or expression e.g.
Theta=pi/4
Intercept=.5*low('Y1') + .5*high('Y1')

These might also be useful:

Fitness.residual, Fitness.r2, Fitness.curve.getParamVal(i), Fitness.fitter.iterations,
Fitness.fitter = qub.fits.CreateFitter(name)

Working with files and folders

You can open data from the File menu, or use the integrated folder browser. To show the browser, click "Show files and folders" at top left.

List all data in a folder

If you don't see "Folder" at the top, click "Show files and folders." Type in the folder name and press enter, or click "..." to pick the folder using a dialog.

Check the box "Sub-folders" to look in all folders inside the chosen folder.

If you've just saved a data file and it's not in the list, click "Refresh."

Open a file from the list for fitting

Click a file in the list to show it.

Edit the chosen file's Notes

Type whatever you like under "Notes." Notes are saved in the .qfc file alongside the data.

Edit the chosen file's Labels

Labels are keywords (separated by spaces) that are used to narrow down the list of files. Labels are saved in the .qfc file alongside the data.

List only files with certain Labels

In the upper-right, check the "containing..." box and type one or more labels. You can list files with all of the labels, or at least one label.

List only files saved in a date range

At the top, check the "from...to..." box and adjust the range of dates.

List only files matching a pattern

You can narrow down the list of files with a mask. The default mask is "*", which shows all text (.txt) files. Here are some other examples:

"concdata*"
all text files beginning with "concdata"
"concdata*.*"
all files beginning with "concdata", of any type
"*.*"
all files of any type
"*Na*"
all text files with "Na" anywhere in the name
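
The mask behaviour in these examples can be imitated with Python's fnmatch module. The rule that a mask without a "." only matches .txt files is inferred from the examples above and is an assumption about how Fitness interprets masks; the file names are made up.

```python
import fnmatch

def matches_mask(filename, mask):
    """Shell-style match; a mask without '.' is taken to mean text files."""
    if "." not in mask:
        mask += ".txt"   # assumption drawn from the examples in this section
    return fnmatch.fnmatchcase(filename, mask)

files = ["concdata1.txt", "concdata2.csv", "NaCl_run.txt", "notes.doc"]
for mask in ["*", "concdata*", "concdata*.*", "*.*", "*Na*"]:
    print(mask, [f for f in files if matches_mask(f, mask)])
```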

Fit the same curve to multiple files

Click "Fit All" at lower-right. In the "Fit All" dialog, highlight the files you wish to fit.

Fit files
separately -- fit the files one at a time, starting from the same curve parameters
together -- find the parameters which best describe all the files at once
Process data
what data prep settings to use for all the files
as each file was last used
all files like the current one
Label
if this isn't blank, Fitness will Keep the results under this name.
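
The difference between the two "Fit files" modes can be seen with a straight-line fit, which has a closed-form solution. This is a sketch of the concept only: the file names and data are invented, and pooling all points with equal weight is an assumption about what a "together" fit minimizes.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line; returns (Slope, Icept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

files = {
    "a.txt": ([0.0, 1.0, 2.0], [1.0, 3.0, 5.0]),   # y = 2x + 1
    "b.txt": ([0.0, 1.0, 2.0], [3.0, 5.0, 7.0]),   # y = 2x + 3
}

# Separately: one parameter set per file.
separate = {name: fit_line(xs, ys) for name, (xs, ys) in files.items()}

# Together: one parameter set describing the points of all files at once.
all_x = [x for xs, _ in files.values() for x in xs]
all_y = [y for _, ys in files.values() for y in ys]
together = fit_line(all_x, all_y)

print(separate)   # {'a.txt': (2.0, 1.0), 'b.txt': (2.0, 3.0)}
print(together)   # (2.0, 2.0)
```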

Fit All with Strategy

For every file, the default fitting strategy resets each parameter to its Initial value: the value it had when Fit All was pressed. If you'd rather keep the last file's optimized parameter value, edit the Strategy and replace Initial with Last for the relevant variable(s).

For difficult functions, a multi-line Strategy can help. In the first line, set all params to reasonable guesses, and have only one or two variables. In subsequent lines, set all params to Last and change the list of variables to carefully refine each one. In the last line, make all params variable.

Fitness.fit(variables=['Amp'], Amp=1.0, Theta=23.5)
Fitness.fit(variables=['Theta'], Amp=Last, Theta=Last)
Fitness.fit(variables=['Amp', 'Theta']) # unspecified parameters default to Last

Kept fits

Keep a fit

To save all curve parameters and data processing settings, click "Keep It" at lower-right, then enter a name. The fit will be listed under "Fits" at lower-left, and in the "Fitness Fit Table" window. Kept fits are saved in the .qfc file alongside the data.

Look at a kept fit

All the fits you have kept for this data file are listed under "Fits" at lower-left. Click one to see it.

Delete a kept fit

Click the fit in the "Fits" list, then press the Delete key.

Look at all kept fits for all listed files

Look at the "Fitness Fit Table" window. It appears automatically if any listed file has a kept fit.

  • to see an entry, double-click it
  • to sort, click a header
  • to copy part of the table to the clipboard, select it then press Ctrl-C or Right-click -> Copy.


Outputs

Save the fit curve

To save the X, Y, and Fit series as a data file, choose "Save fit curve as..." from the File menu.

Copy the fit curve

From the Edit menu choose "Copy data points." Same output as "Save fit curve", but copied to the clipboard.

Copy the curve parameters

To copy curve parameters and error limits to the clipboard, choose "Copy fit params" from the Edit menu.

Copy a picture

From the Edit menu choose "Copy image..."

Print a picture

From the File menu choose "Print..."

Visualizations

The Visualizations menu contains graphs that reflect the current data and fit curve. These graphs can themselves be curve-fit, saved and printed using the same Fitness interface.

See a plot of residuals

From the Visualizations menu, choose Residual to see a plot of f(x) - Y.

See a histogram of selected datapoints

From the Visualizations menu, choose Histogram to see an all-points histogram of selected Y.

See a histogram of residuals

From the Visualizations menu, choose Residual Histogram to see an all-points histogram of f(x) - Y.

See the spectrum (FFT) of selected datapoints

From the Visualizations menu, choose Spectrum to see the FFT of selected Y. The number of bins (frequencies) should be a power of 2. If the data selection has fewer points than bins, we pad on the right with zeros. If it has more points, we average STFT frames computed with a Hamming window and 2x overlap.
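
The padding and averaging rules might look like the following sketch. Several details are assumptions rather than documented behaviour: the frame length is passed in directly, "2x overlap" is read as consecutive frames sharing half their samples, the output is power (squared magnitude), and a slow direct DFT stands in for a real FFT.

```python
import cmath
import math

def power_spectrum(frame):
    """Squared DFT magnitude for frequencies 0 .. len(frame) // 2."""
    n = len(frame)
    return [abs(sum(v * cmath.exp(-2j * math.pi * k * t / n)
                    for t, v in enumerate(frame))) ** 2
            for k in range(n // 2 + 1)]

def spectrum(ys, frame_len):
    if len(ys) <= frame_len:
        # Too few points: pad on the right with zeros.
        return power_spectrum(list(ys) + [0.0] * (frame_len - len(ys)))
    window = [0.54 - 0.46 * math.cos(2 * math.pi * t / (frame_len - 1))
              for t in range(frame_len)]          # Hamming window
    hop = frame_len // 2                          # frames overlap by half
    starts = range(0, len(ys) - frame_len + 1, hop)
    powers = [power_spectrum([w * ys[s + t] for t, w in enumerate(window)])
              for s in starts]
    # Average the per-frame spectra, frequency by frequency.
    return [sum(column) / len(powers) for column in zip(*powers)]

# A sine with exactly 4 cycles per 16-sample frame.
ys = [math.sin(2 * math.pi * 4 * t / 16) for t in range(64)]
averaged = spectrum(ys, 16)
print(averaged.index(max(averaged)))   # 4
```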

Presets

Save and restore data preparation settings

The "Presets" menu between the two data displays can store all the settings in that panel (selection, series, weights, filter, resample). To store a preset, choose "Add to menu". To load a preset, choose it from the menu.

Save and restore a curve with parameter values

The "Presets" menu to the right of "Curve" can store all the curve-related settings (curve, equation, parameter values and checks). To store a preset, choose "Add to menu". To load a preset, choose it from the menu.

Edit and delete presets

To clean up a presets menu, choose "Manage..." to bring up a dialog. On the left is the list of presets; right-click one to rename or delete. Left-click one to edit its details.

Python

Fitness is written largely in Python, and it uses Python to parse custom equations. Use the "Python" menu to define additional constants, functions, and even plug-ins.

You can access Fitness's object hierarchy through the global names "window" and "Fitness". See site-packages/qub for details.

Environment

The environment script is run at startup. Put statements that refer to "window" or "Fitness", or that should wait until everything is ready, into the function after_initialize().
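
An environment script along these lines might look like the sketch below. The helper hill() and the constant AVOGADRO are purely illustrative. The only Fitness attribute used is Fitness.r2, which is mentioned under "Customize the fitting strategy"; the assumption is that it can be read once startup has finished.

```python
# Definitions that do not touch "window" or "Fitness" can sit at top level.
AVOGADRO = 6.02214076e23   # example constant, 1/mol

def hill(x, K, n):
    """Hill equation, an example helper function."""
    return x ** n / (K ** n + x ** n)

def after_initialize():
    # Deferred until everything is ready. "Fitness" is the global name
    # supplied by the program, so this only works when run inside Fitness.
    print("R-squared at startup:", Fitness.r2)
```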

Scripts

Use the Python Scripts window to load and execute Python programs.
