In: Enterprise Biology Software,
Version 7.0 © 2007 Robert P. Bolender
Enterprise Biology Software: VIII. Research (2007)
Robert P. Bolender
Biological
data differ remarkably from those of physics and chemistry in that their
properties are intimately tied to their physical locations. A hierarchical arrangement - consisting of
parts contained within parts - defines biology as a complex set of interacting
complexities, scaling in size from molecules to organisms. None of this uniqueness becomes apparent
until we start putting the parts back together.
If, for example, we don’t return the parts to their original in vivo settings
before attempting an interpretation, we run the risk of allowing our data to
become meaningless or at best severely limited.
In effect, a failure to recognize this special property of biology leads
inevitably to a semiquantitative science.
Our task this year therefore becomes one of understanding how biology
can become broken experimentally and then showing how we can fix it. The reward for our efforts takes us one step
closer to a quantitative biology.
The purpose
of the report this year is to explore ways of extending what we have learned
from the stereology literature to disciplines often reduced to relying on
semiquantitative approaches, such as biochemistry and molecular biology. We begin by defining the boundary conditions
for experimentation in biology, explore the unstable foundations of semiquantitative
data, and finally suggest workable alternatives. For example, a new category of hybrid
hierarchy equations can integrate biological data across disciplines and
provide methodological gold standards.
The main product to emerge from this effort is a rule book, one that
carefully addresses the uniqueness of biology.
The software package for 2007/8 (EBS, Version 7.0) includes new data
harvested largely from years 2004-5, expands the biology blueprint, introduces
cluster analysis, and offers guidelines for a mathematical biology.
Introduction
Transforming
biology into a quantitative science begins by changing the way we manage our
research data. An essential first step
consists of moving published data from the pages of journal articles into the
tables of relational databases where they can be standardized and used to look
for mathematical patterns. Such patterns
can be readily found in the connections that occur between parts and captured
as data pairs and repertoire equations.
In turn, these data pairs and equations can become the raw material for
building a mathematical biology.
Fundamental to the success of this approach is an ability to
manage complexity in biology by mathematically unfolding and refolding
structural relationships - throughout the biological hierarchy of size. By leveraging the mathematical order inherent
in biological systems, we now have a workable strategy for reverse and forward
engineering structures of all sizes – one based largely on our ability to tap
into the mathematical core of biology.
From this we learn that mathematics and technology can become powerful
discovery tools when we use them in harmony with the intrinsic order of
biology.
The mathematical
biology described herein depends wholly on published data to generate the
empirical equations that we can use to explore an information space of uncommon
complexity. To qualify, these data must
be gathered with unbiased sampling methods within the framework of a rule-based
approach. The brief introduction that
follows considers three basic components of a mathematical biology that will
serve to illustrate how this quantitative approach to biology works. The Rule Book: Guidelines to a Mathematical
Biology continues this process, but in greater detail (Bolender,
2007).
In biology,
sampling is everything. Unbiased
sampling requires that all parts of the structure being sampled have an
equal chance of being sampled. Any other
sampling scheme automatically becomes suspect.
Samples collected for biochemical analyses that do not come from a total
cell or tissue homogenate, or that come from isolated cell or tissue fractions, will
also fail this test, unless the data are collected and interpreted within the
framework of analytical fractionation (de Duve, 1974).
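One common way to give every part an equal chance of selection is systematic uniform random sampling, a standard design-based scheme that this report does not name and that is offered here only as an illustration. The sketch below assumes a structure already cut into an ordered series of slabs; the function name, the number of slabs, and the sampling period are all hypothetical.

```python
import random


def systematic_uniform_sample(n_items, period, rng=random):
    """Return the indices picked by systematic uniform random sampling.

    A start position is drawn uniformly from 0..period-1, then every
    `period`-th item is taken. Each item therefore has the same
    selection probability, 1/period.
    """
    if period < 1:
        raise ValueError("period must be at least 1")
    start = rng.randrange(period)
    return list(range(start, n_items, period))


# Hypothetical example: 20 tissue slabs, every 5th slab is sampled.
slabs = systematic_uniform_sample(n_items=20, period=5)
print(slabs)
```

Because the start is random and the step is fixed, no slab is favored over any other, which is the property the unbiased sampling requirement asks for.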
Research
experiments can become consistent with the organizing principles of biology
when they are designed as hierarchy equations.
The process is surprisingly straightforward. The equations define the problem in terms of
variables, which, in turn, are collected as data in the laboratory by applying
unbiased sampling methods. Finding a
solution to an experiment consists of entering data into an equation and
evaluating it. The challenge for the
reader will be to learn how to write balanced hierarchy equations. However, the reward for such an effort can be
substantial. Such a skill will save
countless hours in designing experiments, in reading research papers, and in
reviewing manuscripts and research proposals.
Data
appearing in the biology literature can be quantitative, semiquantitative, or
descriptive. To qualify as quantitative
data in a mathematical biology, they must be clearly identified, satisfy the
unbiased sampling requirement, and detect differences and changes
unambiguously. Being quantitative also
depends on how, where, and when the data are being used. For example, quantitative data in one setting
can quickly become semiquantitative in another.
Let’s
begin. Most biological data represent
directly or are derived from four basic quantities: volume (V), surface (S),
length (L), and number (N). Recall that
weight is the product of a volume (V) and a density (ρ): W = V x ρ.
In biology,
however, these four basic quantities can become linked mathematically in
curious ways because of the hierarchical organization of the parts. When parts are contained within parts, with
cells serving as the basic unit of construction, dependencies occur that become
mathematically inseparable. This means
that the parts cannot be separated from one another when they belong to a
quantitative unit. Let’s look at an
example.
Biological
parts arranged in a structural hierarchy become a function of three variables:
volume (V), mean volume (meanV), and number (N). This gives three relationships.
V = meanV ∙ N
N = V / meanV
meanV = V / N
If we want
to know at any given time what biology and its parts are up to, we need to know
what’s happening to these three variables.
Notice that in practice we have to measure (or estimate) only two of the
variables, because the equation allows us to solve for the third. What happens when we decide to separate these
three variables in our experimental design?
We get exactly what we don’t want, namely an incomplete or
semiquantitative result. In effect, by
breaking them up, we break the rules.
Reductionism tells us we can, but we really cannot. Why? We
can take the data out of the hierarchy, but not the hierarchy out of the data. This turns out to be a fundamental property
of biology as a complex hierarchical system. It is a principle of experimental biology
that is perhaps largely unrecognized and yet enormously important.
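The point that only two of the three variables need to be measured can be sketched in a few lines. The function below is a minimal illustration, not part of the EBS software; its name and the example values are hypothetical.

```python
def solve_hierarchy(V=None, mean_v=None, N=None):
    """Solve V = meanV * N for whichever one of the three is missing."""
    supplied = sum(x is not None for x in (V, mean_v, N))
    if supplied != 2:
        raise ValueError("supply exactly two of V, mean_v, N")
    if V is None:
        V = mean_v * N
    elif N is None:
        N = V / mean_v
    else:
        mean_v = V / N
    return V, mean_v, N


# Hypothetical values: total cell volume of 2.0 cm^3 and 4.0e8 cells.
V, mean_v, N = solve_hierarchy(V=2.0, N=4.0e8)
print(mean_v)  # mean cell volume in cm^3
```

Measuring only one of the three variables leaves the function with nothing to solve, which mirrors the incomplete result described above.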
Now, let’s
work through a few examples. We begin
with the general equation for volume, wherein a compartmental volume is the
product of the mean volume of a part and the number of parts:
V = meanV ∙ N.
By adding
subscripts for a cell (cell), we can focus on the behavior of a specific part,
namely a cell:
Vcell = meanV(cell) ∙ N(cell), where for convenience we can assign centimeter units:
cm^3 = cm^3 ∙ cm^0; recall that cm^0 = 1.
This equation
tells us that to get the total volume of a compartment of cells, we need to
know both the mean cell volume and the cell number. Alternatively, we can get the same
information by knowing the concentration (Vcell / Vstructure)
of the cells in the containing structure (Vstructure):
Vcell = Vstructure ∙ (Vcell / Vstructure), where cm^3 = cm^3 ∙ (cm^3 / cm^3).
Now, let's
compare the information content of these two equations.
(1) Vcell = meanV(cell) ∙ N(cell)
(2) Vcell = Vstructure ∙ (Vcell / Vstructure)
Equation (1)
contains information about the volumes and numbers of the parts (i.e., cells),
whereas equation (2) contains information only about the volumes of the
parts. To interpret a change in the
cells (Vcell) unambiguously, equation (1) will work but not equation
(2). Why? Because Vcell = meanV(cell)
∙ N(cell). A change in
the volume of cells can be influenced by a change in the mean cell volume, a
change in the number of cells, or a change in some combination of the two. Moreover, a change in equation (2) can be
influenced by a change in the volume of the parent structure (Vstructure)
plus all the changes that can occur in equation (1).
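A small numerical sketch may help show why equation (1) is needed to interpret a change. All values below are hypothetical and chosen only for illustration: one outcome doubles the mean cell volume (hypertrophy), the other doubles the cell number (hyperplasia).

```python
def total_cell_volume(mean_v_cell, n_cell):
    """Equation (1): Vcell = meanV(cell) * N(cell)."""
    return mean_v_cell * n_cell


# Hypothetical control and two hypothetical experimental outcomes (cm^3).
control = total_cell_volume(mean_v_cell=5.0e-9, n_cell=4.0e8)
hypertrophy = total_cell_volume(mean_v_cell=1.0e-8, n_cell=4.0e8)  # larger cells
hyperplasia = total_cell_volume(mean_v_cell=5.0e-9, n_cell=8.0e8)  # more cells

# Both outcomes give the same total cell volume; only the two
# factors of equation (1) can tell them apart.
print(control, hypertrophy, hyperplasia)
```

A measurement of Vcell alone reports the same doubling in both cases, so the biological interpretation stays ambiguous until mean cell volume and cell number are both known.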
By combining
equations (1) and (2), we get:
meanV(cell) ∙ N(cell) = Vstructure ∙ (Vcell / Vstructure).
We can then
solve for the concentration of cells (Vcell / Vstructure):
(Vcell / Vstructure) = (meanV(cell) ∙ N(cell)) / Vstructure, where (cm^3 ∙ cm^0) / cm^3 = cm^0.
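The concentration obtained this way depends on the volume of the parent structure as well as on the cells. The sketch below uses hypothetical values to show a concentration falling by half even though neither the mean cell volume nor the cell number has changed; the function name is an assumption made for illustration.

```python
def volume_density(mean_v_cell, n_cell, v_structure):
    """(Vcell / Vstructure) = (meanV(cell) * N(cell)) / Vstructure."""
    if v_structure <= 0:
        raise ValueError("v_structure must be positive")
    return (mean_v_cell * n_cell) / v_structure


# Hypothetical values in cm^3; the cell number is dimensionless.
before = volume_density(mean_v_cell=5.0e-9, n_cell=4.0e8, v_structure=10.0)
after = volume_density(mean_v_cell=5.0e-9, n_cell=4.0e8, v_structure=20.0)

# The cells are unchanged, yet the concentration is halved because
# the containing structure doubled in volume.
print(before, after)
```

Read in isolation, the lower concentration could be mistaken for a loss of cells, which is the kind of ambiguity the combined equation is meant to expose.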
Methods and Results
Discussion
What are
some of the experimental conditions that lead to semiquantitative data?
One of the
basic understandings to come from this project is that we are creating
non-natural phenotypes (e.g., in vitro conditions, transgenic animals)
that display very different properties – locally and globally. This means that such artificial phenotypes
appear inconsistent with the larger body of research data coming from natural
sources. Mixing data from in vivo,
in vitro, and transgenic studies can be expected to add considerable
noise, thereby limiting our ability to find patterns. Indeed, it may be prudent to treat in vivo,
in vitro, and transgenic experiments as distinct phenotypic categories
and to characterize them separately as mathematical phenotypes.
The richness
of biology appears so great that it cannot be captured by a single discipline,
no matter how powerful or successful it might be. Instead, the development of a mathematical
biology requires a community effort – one being defined by all those
disciplines capable of contributing variables to the equation(s) of an
experiment.
Solving a
problem in research biology is not unlike building a winning team. Define the positions and then recruit the top
players. Biochemistry and molecular
biology can do an excellent job at counting the total number of molecules in a
structure, but they are ill equipped to interpret these data in a complex in
vivo setting and have little experience with design-based methods. On the other hand, stereology struggles when
trying to count molecules but excels in dealing with complexity and in applying
design-based methods. Put these methods
together and we get our winning team. In
short, the hybrid hierarchy equation offers a general solution to several
problems. It can serve as the equation
of an experiment, become a development platform, or assume a role as a gold
standard.
The hybrid
hierarchy equation derives its power from a curious property. In contrast to a regular hierarchy equation
that is evaluated to provide an answer, the hybrid equation doesn’t have to be
evaluated because all the key variables are collected experimentally – within
the framework of a design-based model.
The singular advantage of this approach is that it allows us to avoid
some of the limitations and assumptions that often become inextricably bundled
with individual methods.
Let’s look
at an example. The hybrid hierarchy
equation for counting molecules with biochemistry and stereology in a
biological setting is as follows.
Nmolecules,structure = Vstructure x [(meanVcell x Ncell) / Vstructure] x (Nmolecules / Vcell)
This
equation can be used to detect an in vivo change in the number of
molecules and to explain the change within a hierarchical setting. We can also rearrange the equation and
solve for the molecular concentration, shown as a numerical density (Nmolecules/Vcell).
Nmolecules/Vcell = Nmolecules,structure / {Vstructure x [(meanVcell x Ncell) / Vstructure]}
Recall that
concentrations are the principal type of data being collected by most methods
in experimental biology. Notice,
however, that this concentration was not measured directly but instead derived
from two independent and reliable sources: biochemistry and stereology. This unique numerical density
(concentration) can therefore serve as a gold standard for all other numerical
density estimates collected in the usual ways.
In other words, it identifies a new platform for developing, checking,
and tuning any biochemical or section-based method for counting molecules that
collects data in some form of an optical density. As such, it becomes a basic tool in building
the foundations of a mathematical biology.
See Chapter 6 in the Rule Book (Bolender, 2007) for further
details.
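The hybrid hierarchy equation and its rearranged form can be sketched as two small functions. This is a minimal illustration under stated assumptions, not the implementation used in the Rule Book: the function names are hypothetical, the total molecule count is assumed to come from biochemistry, the remaining variables from stereology, and every number is invented for the example.

```python
def molecules_in_structure(v_structure, mean_v_cell, n_cell, n_mol_per_v_cell):
    """Nmolecules,structure =
    Vstructure * [(meanVcell * Ncell) / Vstructure] * (Nmolecules / Vcell)."""
    cell_volume_density = (mean_v_cell * n_cell) / v_structure
    return v_structure * cell_volume_density * n_mol_per_v_cell


def molecular_numerical_density(n_mol_structure, v_structure, mean_v_cell, n_cell):
    """Nmolecules / Vcell =
    Nmolecules,structure / {Vstructure * [(meanVcell * Ncell) / Vstructure]}."""
    cell_volume_density = (mean_v_cell * n_cell) / v_structure
    return n_mol_structure / (v_structure * cell_volume_density)


# Hypothetical inputs: molecule count from biochemistry, the rest from stereology.
n_total = 6.0e15  # molecules in the whole structure
density = molecular_numerical_density(
    n_total, v_structure=10.0, mean_v_cell=5.0e-9, n_cell=4.0e8
)
print(density)  # molecules per cm^3 of cell volume

# Feeding the derived density back in recovers the total molecule count.
print(molecules_in_structure(10.0, 5.0e-9, 4.0e8, density))
```

The derived density is never measured directly here; it falls out of the independently collected variables, which is what allows it to act as a reference value for other estimates.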
Let’s change
the subject. A story widely circulated
about chaos theory is the one about a butterfly that could change the weather patterns
throughout the world by simply flapping its wings – known as the “butterfly
effect.” Of course, such an event seems
quite unlikely, but it nonetheless raises an intriguing question. What might we do for this butterfly to
improve its chances of success in changing our global weather patterns? Think a moment. If we draw an analogy to chaos theory, then
all we might have to do is move the butterfly to the edge of chaos where such
events routinely occur as emergent properties.
How can we do this? I don’t
know. However, the question provokes
another one to which we might hazard an answer.
Consider
this. Is reverse engineering biology an
emergent property of the biology literature?
If the answer is yes, then how might we move our research data to the
edge of chaos to activate this emergent property? Now we have a question to which we might have
an answer. One way seems to consist of
moving our published data into a Universal Biology Database and using it
to generate the engineering equations. Look
back at the progress reports to see what has already emerged. This database allowed us to summarize a large
collection of control and experimental data with only two equations (Bolender, 2004), to use published data to
forward and reverse engineer structures (Bolender, 2005), and to display the
mathematical core of biology as a stoichiometry blueprint (Bolender,
2006). Does this mean that a sizable
portion of the stereology literature is now sitting at the edge of chaos? I don’t know.
But if it is, then we might be exactly at the right place where many new
and quite unexpected things are about to happen.
You may recall
that last year biochemistry and molecular biology did not appear to be
viable candidates for reverse engineering biology because the assay methods of
these disciplines forfeit most of the structural information (Table 1). Recall that the task of reverse engineering
biology is largely a structural exercise.
Out of a possible 30 dots, they got only six.
Table 1. Report 2006: Minimum requirements for reverse engineering biology using published data; a preliminary assessment. Can we fill in the missing dots? (Adapted from 2006 report.)

| Requirements for Reverse Engineering | Stereology | Biochemistry | Molecular Biology |
|---|---|---|---|
| In Vivo Data | | | |
| Concentration Data | ● | ● | ● |
| Average Cell Data | ● | | |
| Absolute Data | ● | ● | |
| Cell Counts | ● | | |
| Molecule Counts | ● | ● | ● |
| Minimize Bias | ● | | |
| Minimize Animal Variability | ● | | |
| Detect Change Unambiguously | ● | ● | |
| Design Experiments as Equations | ● | | |
| Enforce Unbiased Sampling | ● | | |
| Apply Biological Rules | ● | | |
| Convert 2D Data back to 3D | ● | | |
| Standardize Data | ● | | |
| Generate Biological Blueprints | ● | | |

As you can
see in the scorecard this year (Table 2), biochemistry and molecular biology look
a good deal more promising in that the blue dots identify the likely outcomes
when these disciplines join forces with stereology within the framework of the
hybrid hierarchy equations. This may be
telling us that the long-term goals of systems biology have moved closer.
Table 2. Report 2007: Minimum requirements for reverse engineering biology using published data; a preliminary assessment. Can we fill in the missing dots? (Adapted from 2006 report and updated.)

| Requirements for Reverse Engineering | Stereology | Stereology + Biochemistry | Stereology + Molecular Biology |
|---|---|---|---|
| In Vivo Data | | | |
| Concentration Data | ● | ● | ● |
| Average Cell Data | ● | ● | ● |
| Absolute Data | ● | ● | ● |
| Cell Counts | ● | ● | ● |
| Molecule Counts | ● | ● | ● |
| Minimize Bias | ● | ● | ● |
| Minimize Animal Variability | ● | ● | ● |
| Detect Change Unambiguously | ● | ● | ● |
| Design Experiments as Equations | ● | ● | ● |
| Enforce Unbiased Sampling | ● | ● | ● |
| Apply Biological Rules | ● | ● | ● |
| Convert 2D Data back to 3D | ● | ● | ● |
| Standardize Data | ● | ● | ● |
| Generate Biological Blueprints | ● | ● | ● |

Where does biology
go to discover things? It goes to the
edge of chaos. Why? Because that’s where emergent properties
occur. These properties are the
discoveries. Can we copy this discovery
strategy of biology? Perhaps we can
because stereology seems to be unusually clever at opening tightly closed doors
- mathematically. How? One strategy might be to move more of the
published data of research biology – from many different disciplines - to the
edge of chaos as well. How do we do
that? Use more mathematics and
technology. How will we know when we
have made it to the edge? We will begin
to discover curious things for the first time and know exactly what to do next. In other words, we will be wired into the
secrets of biology - mathematically.
Why is it so
important for us to have a mathematical biology? Recall that nature uses two collections of
parts – the molecules of the periodic table and the genes of living systems –
to build truly remarkable things, perhaps by anticipating or recognizing the
value of emergent properties. Indeed,
many of the mysteries of biology may lie hidden in the timing and in the mix of
the parts that together emerge into something entirely new. For us, nature will always be our best and
wisest teacher, but to become a student we must succumb to the rigors of a
mathematical science. Will it be hard to
do? Yes, of course. But that also makes it both interesting and
worth the effort.
References
Board on Life Sciences, National Academy of Sciences. 2008 The Role of Theory in Advancing 21st Century Biology: Catalyzing Transformative Research, Washington D.C.: National Academies Press.
Bolender, R. P. 2001a Enterprise Biology Software I. Research (2001) In: Enterprise Biology Software, Version 1.0 © 2001 Robert P. Bolender
Bolender, R. P. 2002 Enterprise Biology Software III. Research (2002) In: Enterprise Biology Software, Version 2.0 © 2002 Robert P. Bolender
Bolender, R. P. 2003 Enterprise Biology Software IV. Research (2003) In: Enterprise Biology Software, Version 3.0 © 2003 Robert P. Bolender
Bolender, R. P. 2004 Enterprise Biology Software V. Research (2004) In: Enterprise Biology Software, Version 4.0 © 2004 Robert P. Bolender
Bolender, R. P. 2005 Enterprise Biology Software VI. Research (2005) In: Enterprise Biology Software, Version 5.0 © 2005 Robert P. Bolender
Bolender, R. P. 2006 Enterprise Biology Software VII. Research (2006) In: Enterprise Biology Software, Version 6.0 © 2006 Robert P. Bolender
Bolender, R. P. 2007 Rule Book: Guidelines to a Mathematical Biology (2007) In: Enterprise Biology Software, Version 7.0 © 2007 Robert P. Bolender
De Duve, C. 1974 Nobel Lecture: Exploring cells with a centrifuge. From Nobel Lectures, Physiology or Medicine 1971-1980, Editor Jan Lindsten, World Publishing Co., Singapore, 1992.
Eisen, M. B., Spellman, P. T., Brown, P. O., and D. Botstein. 1998 Cluster analysis and display of genome-wide expression patterns. PNAS Genetics 95: 14863-14868.