In: Enterprise Biology Software,
Version 7.0 © 2007 Robert P. Bolender
Enterprise Biology Software: VIII. Research (2007)
Robert P. Bolender
Biological
data differ remarkably from those of physics and chemistry in that their
properties are intimately tied to their physical locations. A hierarchical arrangement - consisting of
parts contained within parts - defines biology as a complex set of interacting
complexities, scaling in size from molecules to organisms. None of this uniqueness becomes apparent
until we start putting the parts back together.
If, for example, we don’t return the parts to their original in vivo settings
before attempting an interpretation, we run the risk of allowing our data to
become meaningless or at best severely limited.
In effect, a failure to recognize this special property of biology leads
inevitably to a semiquantitative science.
Our task this year therefore becomes one of understanding how biology
can become broken experimentally and then showing how we can fix it. The reward for our efforts takes us one step
closer to a quantitative biology.
The purpose
of the report this year is to explore ways of extending what we have learned
from the stereology literature to disciplines often reduced to relying on
semiquantitative approaches, such as biochemistry and molecular biology. We begin by defining the boundary conditions
for experimentation in biology, explore the unstable foundations of semiquantitative
data, and finally suggest workable alternatives. For example, a new category of hybrid
hierarchy equations can integrate biological data across disciplines and
provide methodological gold standards.
The main product to emerge from this effort is a rule book, one that
carefully addresses the uniqueness of biology.
The software package for 2007/8 (EBS, Version 7.0) includes new data
harvested largely from years 2004-5, expands the biology blueprint, introduces
cluster analysis, and offers guidelines for a mathematical biology.
Introduction
Transforming
biology into a quantitative science begins by changing the way we manage our
research data. An essential first step
consists of moving published data from the pages of journal articles into the
tables of relational databases where they can be standardized and used to look
for mathematical patterns. Such patterns
can be readily found in the connections that occur between parts and captured
as data pairs and repertoire equations.
In turn, these data pairs and equations can become the raw material for
building a mathematical biology.
Fundamental to the success of this approach is an ability to
manage complexity in biology by mathematically unfolding and refolding
structural relationships - throughout the biological hierarchy of size. By leveraging the mathematical order inherent
in biological systems, we now have a workable strategy for reverse and forward
engineering structures of all sizes – one based largely on our ability to tap
into the mathematical core of biology.
From this we learn that mathematics and technology can become powerful
discovery tools when we use them in harmony with the intrinsic order of
biology.
The mathematical
biology described herein depends wholly on published data to generate the
empirical equations that we can use to explore an information space of uncommon
complexity. To qualify, these data must
be gathered with unbiased sampling methods within the framework of a rule-based
approach. The brief introduction that
follows considers three basic components of a mathematical biology that will
serve to illustrate how this quantitative approach to biology works. The Rule Book: Guidelines to a Mathematical
Biology continues this process, but in greater detail (Bolender,
2007).
In biology,
sampling is everything. Unbiased
sampling requires that all parts of the structure being sampled have an
equal chance of being sampled. Any other
sampling scheme automatically becomes suspect.
Samples collected for biochemical analyses that do not come from a total
cell or tissue homogenate, or that come from isolated cell or tissue fractions, will
also fail this test unless the data are collected and interpreted within the
framework of analytical fractionation (de Duve, 1974).
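The equal-chance requirement can be illustrated with a minimal sketch. It assumes the structure has already been cut into an exhaustive list of parts (the block labels below are hypothetical) and uses simple random sampling without replacement, which is only one of several schemes that satisfy the requirement and is not claimed to be the specific procedure used in the stereology literature.

```python
import random


def draw_unbiased_sample(parts, n, seed=None):
    """Pick n parts so that every part has the same chance of selection.

    `parts` must list the whole structure; leaving parts out of the list
    before sampling would reintroduce bias.
    """
    rng = random.Random(seed)  # seed is optional, for reproducibility only
    return rng.sample(parts, n)


# Hypothetical example: a tissue cut into 20 numbered blocks.
blocks = [f"block_{i:02d}" for i in range(20)]
print(draw_unbiased_sample(blocks, 5, seed=1))
```

Choosing blocks by convenience (for example, only those from the surface of the organ) would skip this step and fail the test described above.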
Research
experiments can become consistent with the organizing principles of biology
when they are designed as hierarchy equations.
The process is surprisingly straightforward. The equations define the problem in terms of
variables, which, in turn, are collected as data in the laboratory by applying
unbiased sampling methods. Finding a
solution to an experiment consists of entering data into an equation and
evaluating it. The challenge for the
reader will be to learn how to write balanced hierarchy equations. However, the reward for such an effort can be
substantial. Such a skill will save
countless hours in designing experiments, in reading research papers, and in
reviewing manuscripts and research proposals.
Data
appearing in the biology literature can be quantitative, semiquantitative, or
descriptive. To qualify as quantitative
data in a mathematical biology, they must be clearly identified, satisfy the
unbiased sampling requirement, and detect differences and changes
unambiguously. Being quantitative also
depends on how, where, and when the data are being used. For example, quantitative data in one setting
can quickly become semiquantitative in another.
Let’s
begin. Most biological data either represent directly, or are derived from,
four basic quantities: volume (V), surface (S),
length (L), and number (N). Recall that
weight is the product of a volume (V) and a density (ρ): W = V ∙ ρ.
In biology,
however, these four basic quantities can become linked mathematically in
curious ways because of the hierarchical organization of the parts. When parts are contained within parts, with
cells serving as the basic unit of construction, dependencies occur that become
mathematically inseparable. This means
that the parts cannot be separated from one another when they belong to a
quantitative unit. Let’s look at an
example.
Biological
parts arranged in a structural hierarchy become a function of three variables:
volume (V), mean volume (meanV), and number (N). This gives three relationships.
V = meanV ∙ N
N = V / meanV
meanV = V / N
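As a small sketch of how the three relationships work together, the function below takes any two of the three variables and solves for the third. The function name, argument names, and example numbers are illustrative assumptions, not part of the original report.

```python
def solve_hierarchy(V=None, mean_v=None, N=None):
    """Given exactly two of V, mean_v, N, return all three as (V, mean_v, N).

    Uses V = meanV * N, N = V / meanV, and meanV = V / N.
    """
    given = sum(x is not None for x in (V, mean_v, N))
    if given != 2:
        raise ValueError("provide exactly two of V, mean_v, N")
    if V is None:
        V = mean_v * N
    elif N is None:
        N = V / mean_v
    else:
        mean_v = V / N
    return V, mean_v, N


# Hypothetical values in arbitrary but consistent units.
print(solve_hierarchy(mean_v=0.5, N=4))
print(solve_hierarchy(V=2.0, mean_v=0.5))
print(solve_hierarchy(V=2.0, N=4))
```

All three calls describe the same compartment, which is the point: measuring any two of the variables fixes the third.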
If we want
to know at any given time what biology and its parts are up to, we need to know
what’s happening to these three variables.
Notice that in practice we have to measure (or estimate) only two of the
variables, because the equation allows us to solve for the third. What happens when we decide to separate these
three variables in our experimental design?
We get exactly what we don’t want, namely an incomplete or
semiquantitative result. In effect, by
breaking them up, we break the rules.
Reductionism tells us we can, but we really cannot. Why? We
can take the data out of the hierarchy, but not the hierarchy out of the data. This turns out to be a fundamental property
of biology, which behaves as a complex hierarchical system. It is a principle of experimental biology
that is perhaps universally unknown and yet enormously important.
Now, let’s
work through a few examples. We begin
with the general equation for volume, wherein a compartmental volume is the
product of the mean volume of a part and the number of parts:
V = meanV ∙ N.
By adding
subscripts for a cell (cell), we can focus on the behavior of a specific part,
namely a cell:
Vcell = meanV(cell) ∙ N(cell),
where for convenience we can assign centimeter units:
cm³ = cm³ ∙ cm⁰ (recall that cm⁰ = 1).
This equation
tells us that to get the total volume of a compartment of cells, we need to
know both the mean cell volume and the cell number. Alternatively, we can get the same
information by knowing the concentration (Vcell / Vstructure)
of the cells in the containing structure (Vstructure):
Vcell = Vstructure ∙ (Vcell / Vstructure),
where cm³ = cm³ ∙ (cm³ / cm³).
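The two routes to the total cell volume can be sketched as follows. The function names and the numerical values are hypothetical, chosen only so that both routes describe the same compartment; they are not data from the report.

```python
import math


def cell_volume_from_counts(mean_cell_volume_cm3, cell_number):
    """Total cell volume from mean cell volume and cell number."""
    return mean_cell_volume_cm3 * cell_number


def cell_volume_from_density(structure_volume_cm3, volume_density):
    """Total cell volume from the structure volume and Vcell / Vstructure."""
    return structure_volume_cm3 * volume_density


# Hypothetical compartment: 2e8 cells of mean volume 5e-9 cm^3 (5000 um^3)
# occupying 80% of a 1.25 cm^3 structure.
v_counts = cell_volume_from_counts(5e-9, 2e8)
v_density = cell_volume_from_density(1.25, 0.8)
print(v_counts, v_density, math.isclose(v_counts, v_density))
```

Both functions return a volume in cm³, but only the first one keeps the mean cell volume and the cell number visible as separate quantities.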
Now, let’s
compare the information content of these two equations.
(1) Vcell = meanV(cell) ∙ N(cell)
(2) Vcell = Vstructure ∙ (Vcell / Vstructure)
Equation (1)
contains information about the volumes and numbers of the parts (i.e., cells),
whereas equation (2) contains information only about the volumes of the
parts. To interpret a change in the
cells (Vcell) unambiguously, equation (1) will work but not equation
(2). Why? Because Vcell = meanV(cell)
∙ N(cell). A change in
the volume of cells can be influenced by a change in the mean cell volume, a
change in the number of cells, or a change in some combination of the two. Moreover, a change in equation (2) can be
influenced by a change in the volume of the parent structure (Vstructure)
plus all the changes that can occur in equation (1).
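The ambiguity described here can be made concrete with a short sketch. All four cases use hypothetical values in arbitrary but consistent units, and the case names are illustrative labels rather than terms from the report.

```python
def total_volume(mean_v, n):
    """Equation (1): compartment volume = mean part volume * number of parts."""
    return mean_v * n


def volume_density(v_cell, v_structure):
    """The concentration term of equation (2): Vcell / Vstructure."""
    return v_cell / v_structure


def summarize(case):
    """Return (Vcell, Vcell / Vstructure) for one experimental case."""
    v_cell = total_volume(case["mean_v"], case["n"])
    return v_cell, volume_density(v_cell, case["v_structure"])


# Hypothetical cases.
control = {"mean_v": 2.0, "n": 100.0, "v_structure": 500.0}
bigger_cells = {"mean_v": 4.0, "n": 100.0, "v_structure": 500.0}
more_cells = {"mean_v": 2.0, "n": 200.0, "v_structure": 500.0}
grown_structure = {"mean_v": 2.0, "n": 200.0, "v_structure": 1000.0}

for name, case in [("control", control), ("bigger_cells", bigger_cells),
                   ("more_cells", more_cells),
                   ("grown_structure", grown_structure)]:
    print(name, summarize(case))
```

In this sketch `bigger_cells` and `more_cells` give identical totals and densities even though the underlying biology differs, and `grown_structure` shows the cell volume doubling while the density stays at the control value. Only the separate values of mean volume and number distinguish the cases.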
By combining
equations (1) and (2), we get:
meanV(cell) ∙ N(cell) = Vstructure ∙ (Vcell / Vstructure).
We can then
solve for the concentration of cells (Vcell / Vstructure):
(Vcell / Vstructure) = (meanV(cell) ∙ N(cell)) / Vstructure,
where (cm³ ∙ cm⁰) / cm³ = cm⁰.
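The unit bookkeeping in this last equation can be checked mechanically. The sketch below is an illustrative assumption rather than anything from the report: it carries the exponent of cm alongside each value, adding exponents on multiplication and subtracting them on division. The `Quantity` class and the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quantity:
    value: float
    cm_power: int  # exponent of cm: 3 for a volume, 0 for a count or ratio

    def __mul__(self, other):
        # Multiplying quantities adds the unit exponents.
        return Quantity(self.value * other.value,
                        self.cm_power + other.cm_power)

    def __truediv__(self, other):
        # Dividing quantities subtracts the unit exponents.
        return Quantity(self.value / other.value,
                        self.cm_power - other.cm_power)


def cell_volume_density(mean_v_cell, n_cell, v_structure):
    """(Vcell / Vstructure) = (meanV(cell) * N(cell)) / Vstructure."""
    return (mean_v_cell * n_cell) / v_structure


# Hypothetical values: volumes carry cm^3, the cell number carries cm^0.
density = cell_volume_density(Quantity(2.0, 3), Quantity(100.0, 0),
                              Quantity(500.0, 3))
print(density)
```

The result carries `cm_power=0`, matching the cm⁰ on the right-hand side of the unit equation: the concentration is a dimensionless ratio.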
Methods and Results