In: Enterprise Biology Software,
Version 6.0 © 2006 Robert P. Bolender
Enterprise Biology Software: VI. Research (2006)
Robert P. Bolender
Summary
The Enterprise
Biology Software Project explores challenging questions in the life
sciences by looking for answers in the biology literature. Since many of these questions can be
translated into mathematical puzzles, they can be solved – for the most part -
by generating empirical equations. In
turn, these equations can produce new and more difficult puzzles often
extending to the mathematical core of biology.
For example, the progress report this year describes how we can solve
one of these harder puzzles by first separating two interacting complexities
and then unfolding each complexity in its turn. In effect, the question: “How do we reverse engineer biology?”
becomes a double puzzle: “How do we separate two complexities (methods and
biology) and how do we unfold and refold them?” A solution to the methods puzzle becomes a Universal Biology
Database, which in turn can be used to solve the biology puzzle by
characterizing biological phenotypes with equations. Such phenotypes can be represented either as a single equation or
as a stack thereof. When summarized
across biology, these equations offer a glimpse of the core by revealing the
mathematical organization of the parts as a biological blueprint. There is, however, one small surprise. In contrast to the widely held view that
biology is largely a nonlinear system, most of these phenotypes turn out to be
linear. This finding is most welcome in
that it greatly simplifies the task of reverse engineering biology. By including a query by example interface,
the 2006 software package offers ready access to this new data-driven biology
and may become one of our technology tickets to the future. Imagine – if you will - a time when each
research paper becomes a puzzle with a unique mathematical solution, where
biology operates according to a rulebook, and we can read from that book. Interested?
I invite you to run the software package.
Introduction
Biology plays by
the rules. By discovering these rules
and also playing by them, we improve our chances for success – in whatever we
attempt. This defines a central
strategy of the Enterprise Biology Software Project. Indeed, this process of discovery based on
solving mathematical puzzles is wonderfully simple and can be applied by
everyone. All we have to do is follow
the clues to find the rules that solve the puzzles. Each solution, of course, generates a new collection of clues and
the process continues from one puzzle to the next.
This rule-based
approach has yielded two key pieces of information. We now know that repertoire equations can unfold complexity into
a well-defined order, wherein these power equations have coefficients of
determination approaching one ($r^2 \approx 1$). Moreover, these equations often display an exponent $a$ (in $Y = bX^{a}$)
that also tends toward one. This means
that the repertoire equations effectively become linear ($Y = bX$).
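As a minimal sketch of how such a repertoire equation might be obtained, the following fits a power curve by ordinary least squares on log-log axes and reports the coefficient, the exponent, and the coefficient of determination. The function name `fit_power_law` and the data points are illustrative assumptions, not values or code from the software package.

```python
import math

def fit_power_law(xs, ys):
    """Least-squares fit of Y = b * X**a on log-log axes; returns (b, a, r2)."""
    lx = [math.log10(x) for x in xs]
    ly = [math.log10(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    syy = sum((v - my) ** 2 for v in ly)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    a = sxy / sxx                    # exponent = slope of the log-log line
    b = 10 ** (my - a * mx)          # coefficient recovered from the intercept
    r2 = (sxy * sxy) / (sxx * syy)   # coefficient of determination
    return b, a, r2

# Illustrative values only: Y is roughly half of X.
xs = [10.0, 20.0, 40.0, 80.0]
ys = [5.1, 9.9, 20.2, 39.8]
b, a, r2 = fit_power_law(xs, ys)
print(f"Y = {b:.4f} * X^{a:.4f}, r2 = {r2:.4f}")
```

When the fitted exponent is close to one and r2 is close to one, the equation can be read as a simple proportion of the two parts.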
Since the
repertoire equations define order in biology as the proportion of two parts X
and Y, we can assemble enough information to begin the process of reverse
engineering biology. Moreover, we can
design a Universal Biology Database with virtually unlimited scaling
properties by simply arranging the ratios of the parts (Y/X) in numerical order
and assigning them to decimal steps.
When fitted to regression curves, these data pairs become decimal
repertoire equations. The database
becomes universal because it can accept data from all disciplines capable of
forming ratios of structures, which of course includes molecules. By storing data from many different
disciplines in the same database table, the data become integrated -
mathematically – across disciplines, animals, settings, et cetera.
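The idea of arranging ratios in numerical order and assigning them to decimal steps can be sketched as follows. The list of steps and the assignment rule (the largest step that does not exceed the ratio Y/X) are assumptions made for illustration; the database may use different steps or a different rule.

```python
import bisect
from collections import defaultdict

# Hypothetical decimal steps; the steps used by the database may differ.
DECIMAL_STEPS = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9,
                 1.0, 1.5, 2.0, 2.5, 3.0, 4.0, 5.0, 10.0]

def decimal_step(x, y, steps=DECIMAL_STEPS):
    """Return the largest step not exceeding the ratio Y/X (assumed rule).

    In this sketch, ratios above the last step fall into the last step.
    """
    ratio = y / x
    i = bisect.bisect_right(steps, ratio) - 1
    if i < 0:
        raise ValueError(f"ratio {ratio} is below the smallest step")
    return steps[i]

def group_by_step(pairs):
    """Sort (x, y) pairs by their ratio and collect them under their step."""
    groups = defaultdict(list)
    for x, y in sorted(pairs, key=lambda p: p[1] / p[0]):
        groups[decimal_step(x, y)].append((x, y))
    return dict(groups)

# Illustrative data pairs, not taken from the database.
pairs = [(100.0, 125.0), (10.0, 3.46), (2.0, 24.3), (50.0, 60.0)]
for step, members in group_by_step(pairs).items():
    print(step, members)
```

Each group of pairs can then be fitted to a regression curve to give the equation for that step.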
The software is
full of surprises. When running the
program from tab 8 of the Universal Biology Database, for example, we
can easily compare the proportions of the same two structures in control and
experimental settings. I was surprised
by how often controls and experimentals share the same decimal repertoire
equation – even when the absolute amounts differ. This tells us that biology prefers to make more or less of the
same thing, rather than making something different. Changes in the proportions of structures, however, can be readily
found in development, disease, and aging.
Perhaps the most reassuring finding of all, however, is the fact that
many different labs routinely produce the same results (i.e., the experimental
data share the same decimal repertoire equation(s)). This shows that both the methods and the researchers applying
them are indeed capable of generating reproducible results.
The new software
package comes with a new learning curve, but one that has a very gentle
slope. Specifically, by adding a “query
by example” interface we can write and send complex instructions to the
database. This provides full access to
the information with great ease and flexibility. Simply working through the examples should be all that is
required to become comfortable with the new format.
Although we’ll
consider these topics later in this report, a brief introduction here may be
helpful. Why do we want to reverse
engineer biology? The answer is
painfully simple. It encourages biology
to become a data-driven science. Recall
that chemistry is a product of physics and that biology is a product of both
physics and chemistry. Chemistry, for
example, is very effective as a data-driven science because it plays by a set
of stoichiometry rules. We know these
rules and can use them to explain how small parts (atoms) are connected to form
larger molecules and compounds and to write balanced equations for chemical
reactions. In other words, we have
access to the mathematical core of chemistry.
Do analogous rules and balanced equations exist for biology? Yes, of course they do. The only difference is that the rules of
biology are much better hidden than those of chemistry and must be extracted
from the research literature with the help of mathematics and technology. What evidence supports this claim? Biological stereology can provide the
balanced equations and a Universal Biology Database the stoichiometry of
the parts. In effect, we can produce a
biological blueprint that shows how the parts of biology larger than molecules
are connected by rule. As such, this
table provides the first of many new access routes to the mathematical core of
biology – exactly what we would expect from a data-driven science.
Methods and Results
The main purposes
of the Progress Report this year are (1) to introduce a new interface for the
literature database and (2) to explore an engineering model (forward and
reverse) for biological research. As
always, the best introduction to the software package comes from running the
programs.
The central feature
of the first universal database was a single table that could accommodate both
control and experimental data – even when coming from many different
disciplines (Bolender, 2005).
Effectively, it defined a simple - yet powerful - way of integrating
diverse data within and across research fields. Unfortunately, most of the details – defined as data entry fields
in the original databases - were no longer available. The new database remedies this shortcoming by connecting all
three databases into a single operational unit.

By adding a “query by example” interface to the Universal Biology Database 2, even the beginner can quickly learn to write intricate SQL (structured query language) scripts and submit them to the database.

This year the
software is being distributed on a mini CD, one that fits conveniently in a
standard envelope. The CD includes a
relational database, a runtime database engine, and a small collection of programs
and documents.
The main menu of
the program includes the progress report and an index of programs. Click on an item in the list to view
it.

1. Progress Report: A pdf file (this document).
2. Universal Biology Database: The database folder includes eleven tabs - six of which can be used to run programs. A central feature of the new software consists of a query by example interface. It allows us to define a database query by simply selecting items from lists or by entering a word (or part thereof), number, or a collection of words or numbers into one or more data entry fields. As each selection is made, the SQL script – often shown at the bottom of the screen - is modified accordingly. When completed, the set of instructions (query) is sent to the database by clicking on the Retrieve button. The results can be viewed, printed, stored as a file, or sent to an Excel worksheet.


Tab 2 - Introduction: Click on <Read> for a brief introduction to the
database.
Tab 3 - Background:
Click on <Read> for an explanation of why stereological data can create a
foundation for the Universal Biology Database.
Tab 4 – UBD data table: Select tab 4 and click on run. This screen displays the main table of the Universal Biology
Database 1.0 updated to 2.0.
Green identifies control data and blue experimental. In the lower left hand corner of the screen
– just to the left of the horizontal scrolling arrow – note the thick black
line. Using the mouse cursor, drag it
to the right to produce a split screen.
The Excel worksheet used to calculate the decimal repertoire equations
can be found at C:\Program Files\EBSTicket 2006\Files\2006_regression_equations_01.xls.

Tab 5 – UBD control data: Select tab 5 and click on run. The screen displays the new query by example interface. Notice that the same data fields appear
under two different headings – data catalogue (left panel) and query by example
(right panel). Use the catalogue to
discover what the database contains and the query panel to assemble a set of
instructions for the database. Numerous
examples of query criteria appear in the drop down lists, illustrating the
remarkable flexibility of this approach.
Simply work through some of the examples to see how the interface works.

A few introductory
comments, however, may be helpful.
Items selected from either panel – catalogue or query – can be used in a
search. Items selected from the
catalogue will retrieve all those items identical to the one chosen. For example, selecting <bird> from the
catalogue panel and clicking on the Retrieve button produces 58
responses. In contrast, typing <like
%bird%> into the organism field of the query panel produces 110
responses. In the first case, only
those rows containing just the word <bird> were retrieved, whereas in the
second case all those entries containing the word <bird> alone or in a
word string were retrieved.
Furthermore, note that entries coming from the catalogue panel are case
sensitive (upper and lower case), whereas those from the query panel are not.
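The difference between a catalogue selection (exact match) and a query panel pattern can be illustrated with a small stand-in database. This sketch uses Python's built-in sqlite3 module rather than the database engine shipped on the CD; the table name `ubd`, the column name `organism`, and the rows are hypothetical. In SQLite, `=` compares text case-sensitively by default, while `LIKE` ignores case for ASCII letters, which mirrors the behaviour described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ubd (organism TEXT)")  # hypothetical table
conn.executemany(
    "INSERT INTO ubd VALUES (?)",
    [("bird",), ("Bird",), ("bird (chicken)",), ("songbird",), ("rat",)],
)

# Catalogue-style selection: only rows identical to the chosen item.
exact = conn.execute(
    "SELECT COUNT(*) FROM ubd WHERE organism = 'bird'"
).fetchone()[0]

# Query-panel pattern: rows containing the word alone or in a word string.
pattern = conn.execute(
    "SELECT COUNT(*) FROM ubd WHERE organism LIKE '%bird%'"
).fetchone()[0]

print(exact, pattern)  # prints: 1 4
```

The pattern query returns more rows than the exact match for the same reason the query panel returns more responses than the catalogue.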
Let’s try a simple query together. The objective of our search will be to retrieve all the data coming from the CA1 region of the control hippocampus. As shown below, we have written our request in the X Name field <like %ca1%>, using the query panel. Notice how the SQL script in the box at the bottom of the screen reflects our choice.

Clicking on the Retrieve
button yields the result below, which includes a collection of 920 screens.

To view these data
as a scrolling screen instead, press the view data button and the
following screen appears.

When we scroll to
the end of this table, two graphs appear.
The first shows a log-log plot of the X, Y data and the second a
histogram of the decimal repertoire equations.
The log-log plot displays the published data of CA1 as a mathematical
puzzle. Finding a solution consists of
using regression analysis to fit the points to power curves that carry
coefficients of determination ($r^2$) close to one. This process generates a family of power
curves – called repertoire equations - that define how CA1 is related to itself
and to other structures in the brain.
Alternatively, the process can be simplified. Since each data ratio in the Universal Biology Database is
attached to a decimal repertoire equation, a collection of ratios automatically
becomes a stack of equations. Notice
the distinct steps in the histogram of the second graph. They represent the repertoire equations as decimal steps and
illustrate the total range of connections available to the CA1 region of the
control hippocampus. Recall that each
connection also defines the proportion of the two parts, X and Y.

To illustrate the
process of finding equations, we can send this data table to an Excel worksheet
and then do some curve fitting. This is
accomplished by clicking first on the to excel button and then on the from
excel button. To keep a copy of
this worksheet, change the name from working.xls to something else. Otherwise, the file will be written over the
next time you click on the to excel button. (To set the Excel path, click
on INSTALL READER in the main menu.)

Tab 6 – UBD experimental data: Select tab 6 and click on run. Here the interface is the same as the one
just described for the control data.
Therefore, a brief example will suffice. Let’s look for all the data on schizophrenia in males over the
age of 60. The screen below shows how
we access these data, using our query by example interface.

The new puzzle
shown below can be solved for the decimal repertoire equations, which in turn
can be used to compare schizophrenia to males of other ages, to females, or to
other diseases of the central nervous system.
Alternatively, we can search for potentially related structures simply
by viewing the contents of an individual decimal step in the data table.

Tab 7 – UBD control + experimental data: Select tab 7 and click on run. Here, the Universal Biology Database
table includes all the control and experimental data. Let’s try another query.
What is the effect of aging on mitochondria? For the X structure we can type in <like %aging%> and for
the Y structure <like %mito%>.
Our result is a collection of control and experimental points that can
be evaluated either by inspection or by fitting the points to log-log
regression lines (power curves).
Tab 8 – UBD connection repertoire - Blueprint 1.0 - Control Data: The figure below illustrates the biological blueprint as a three-dimensional plot. It represents a collection of equations defining the proportions of one structure to another. In effect, it gives us an empirical view of the mathematical core of biology.

Select tab
8 and click on run. The connection
repertoire table uses the decimal repertoire equations – expressed as
proportions of whole numbers - to show how biological parts are connected by
rule. In effect, it provides a
structural blueprint for biological parts larger than molecules in terms of a
well-defined stoichiometry. This
mathematical overview of biology shows how structural connections define
phenotypes and can provide insights into how, when, and where these phenotypes
change.
The connection repertoire table shown below identifies the structures in an X,Y pair and shows how they are connected quantitatively to each other and to related structures. To be included in the table, the same data pair must appear at least three times in the database. This represents a rigorous test of both the methods and the investigators in that the three data pairs typically come from three different papers.
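The inclusion rule (a data pair must appear at least three times) amounts to a count-and-filter step. The sketch below assumes that a "data pair" is identified by the names of X and Y together with the decimal step; that key, the function name, and the sample rows are illustrative assumptions.

```python
from collections import Counter

def connection_repertoire(rows, min_count=3):
    """Keep only data pairs that occur at least min_count times.

    rows: iterable of (x_name, y_name, decimal_step) tuples (assumed key).
    """
    counts = Counter(rows)
    return {pair: n for pair, n in counts.items() if n >= min_count}

# Placeholder rows, each standing for one published data pair.
rows = [
    ("structure A", "structure B", 0.1),
    ("structure A", "structure B", 0.1),
    ("structure A", "structure B", 0.1),
    ("structure A", "structure B", 0.5),
    ("structure C", "structure B", 1.0),
    ("structure C", "structure B", 1.0),
]
print(connection_repertoire(rows))
```

Only the first pair survives the filter, because it alone occurs three times.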

Several
distinct patterns quickly emerge from this table. A given pair of structures (X,Y) can display several distinct
phenotypes, characterized as a multiple of whole numbers (X:Y). For example, the proportion of mitochondria
to peroxisomes can be 10:1, 20:1, and 33:1.
Notice also that different pairs of structures can share similar
proportions. This overlap can be used
to link the equation associated with each data pair into a local or global
network of equations. Such networks
provide a substrate for connecting other data types and become a platform for
diagnosis and prediction. Table 1 shows
the data set that would be the starting point for assembling such a network for
peroxisomes.
Table 1 Equations representing the proportions of organelles.
| Data Pair Proportion | Decimal Repertoire Equation |
| --- | --- |
| Mitochondrion:Peroxisome | |
| · 33:1 | $Y = 0.03493X^{0.9999}$ |
| · 20:1 | $Y = 0.05459X^{0.9999}$ |
| · 10:1 | $Y = 0.12318X^{0.9999}$ |
| [data pair name missing] | |
| · 25:1 | $Y = 0.04442X^{0.9999}$ |
| · 25:2 | $Y = 0.08504X^{0.9998}$ |
| · 14:1 | $Y = 0.07455X^{0.9999}$ |
| · 17:1 | $Y = 0.06483X^{1.000}$ |
| Nucleus:Peroxisome | |
| · 5:1 | $Y = 0.22397X^{0.9999}$ |
| · 10:1 | $Y = 0.12318X^{0.9999}$ |
| · 7:1 | $Y = 0.17464X^{0.9998}$ |
| · 14:1 | $Y = 0.07455X^{0.9999}$ |
| · 5:2 | $Y = 0.4468X^{0.9999}$ |
| Lysosome:Peroxisome | |
| · 3:1 | $Y = 0.34610X^{0.9998}$ |
| · 5:3 | $Y = 0.64920X^{0.9998}$ |
| · 3:2 | $Y = 0.74784X^{0.9999}$ |
| · 1:1 | $Y = 1.19840X^{0.9996}$ |
| · 1:2 | $Y = 2.22114X^{0.9999}$ |
| · 1:3 | $Y = 3.44783X^{0.9998}$ |
| Golgi:Peroxisome | |
| · 5:1 | $Y = 0.22397X^{0.9999}$ |
| · 5:2 | $Y = 0.44680X^{0.9999}$ |
| · 3:2 | $Y = 0.74784X^{0.9999}$ |
| · 5:4 | $Y = 0.84873X^{0.9999}$ |
| · 1:1 | $Y = 1.1984X^{0.9996}$ |
| · 1:3 | $Y = 3.44783X^{0.9998}$ |
| Lipid Droplet:Peroxisome | |
| · 10:1 | $Y = 0.12318X^{0.9999}$ |
| · 5:2 | $Y = 0.44680X^{0.9999}$ |
| · 5:4 | $Y = 0.84873X^{0.9999}$ |
| · 2:3 | $Y = 1.72417X^{0.9999}$ |
| · 1:10 | $Y = 12.1598X^{0.9997}$ |
| · 1:2 | $Y = 2.22114X^{0.9999}$ |
| · 1:3 | $Y = 3.44783X^{0.9998}$ |
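As a rough illustration of how overlapping proportions could seed a network, the sketch below links every two data pairs in Table 1 that share a proportion. The proportions are transcribed from the four groups of Table 1 whose rows are unambiguous (each structure is paired with the peroxisome); the linking rule itself is an assumption made for illustration, not the method used by the software.

```python
from collections import defaultdict
from itertools import combinations

# Proportions transcribed from Table 1 (partner structure: peroxisome).
TABLE_1 = {
    "Nucleus": ["5:1", "10:1", "7:1", "14:1", "5:2"],
    "Lysosome": ["3:1", "5:3", "3:2", "1:1", "1:2", "1:3"],
    "Golgi": ["5:1", "5:2", "3:2", "5:4", "1:1", "1:3"],
    "Lipid Droplet": ["10:1", "5:2", "5:4", "2:3", "1:10", "1:2", "1:3"],
}

def shared_proportions(table):
    """Map each pair of structures to the set of proportions they share."""
    by_proportion = defaultdict(set)
    for structure, proportions in table.items():
        for p in proportions:
            by_proportion[p].add(structure)
    links = defaultdict(set)
    for p, structures in by_proportion.items():
        # Every two structures with the same proportion get a link.
        for a, b in combinations(sorted(structures), 2):
            links[(a, b)].add(p)
    return dict(links)

for (a, b), props in sorted(shared_proportions(TABLE_1).items()):
    print(f"{a} <-> {b}: {sorted(props)}")
```

In this subset, the Golgi and lipid droplet entries share three proportions (5:2, 5:4, and 1:3), whereas the nucleus and lysosome entries share none.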
Why
is access to the connection repertoire blueprint important? Recall that biology often uses a remarkably
similar genome to produce a great variety of different animal species. Given our current understanding, it appears
that we are a product of at least two interacting forces: our genes and the way
they and their products produce and assemble our parts.
The
decimal repertoire equations suggest that biology has evolved a common parts
inventory that it draws from when assembling people, mice, frogs, or fish. The connection repertoire table allows us to
explore phenotypes as a function of their basic building blocks, namely the
decimal repertoire equations. By
defining phenotypes mathematically, we can study their life history in a given
species and detect departures from what is expected to be normal. The table also moves us closer to the
genome. When, for example, the
proportions of the parts match the proportions of their constituent molecules,
we can predict one from the other. Of
course, we might discover that some decimal repertoire equations can be
explained simply by determining the number of duplicate genes being read at a
given time or that exist in a given species.
Think of it this way. If genes
individually cannot determine a species, then perhaps the number of copies of a
given gene can.
If we summarize the connection repertoire table with a histogram, then the full range of phenotypic expression in biology can be seen. Notice that practically all the connections can be captured with only about 50 equations, with far fewer doing most of the work. The graph below shows that the connections between the parts tend to define five major peaks, each showing a clear preference for a specific proportion.

When
we focus on the connections of a single structure, such as the mitochondrion, a
slightly different pattern appears.
Although this organelle uses decimal repertoire equations from fewer
peaks, the positions of the peaks remain more or less the same as they appear
in the total data set.

Finally,
we can use the connection repertoire to make a few preliminary observations as
to the biological preferences. Of the
total entries (4,296), roughly 40% occur in six decimal repertoire equations
(Table 2).
Table 2 Decimal Repertoire - Total Data Set – Most Popular Equations and Proportions
| Decimal Repertoire Equation | Sum | % | Proportion (X:Y) |
| --- | --- | --- | --- |
| | 106 | 6.5 | 50 to 1 |
| 0.1 | 237 | 14.6 | 10 to 1 |
| 0.3 | 296 | 18.3 | 3 to 1 |
| 1.0 | 469 | 29 | 1 to 1 |
| 1.5 | 311 | 19 | 2 to 3 |
| 10 | 200 | 12 | 1 to 10 |
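The arithmetic behind Table 2 can be checked directly from the sums. The short calculation below suggests that the percentage column is expressed relative to the six listed equations taken together, while the "roughly 40%" figure relates those six equations to the 4,296 total entries. The variable names are illustrative.

```python
# Sums transcribed from Table 2, keyed by proportion (X:Y).
sums = {
    "50 to 1": 106,
    "10 to 1": 237,
    "3 to 1": 296,
    "1 to 1": 469,
    "2 to 3": 311,
    "1 to 10": 200,
}
total_entries = 4296                # total entries quoted in the text

total_six = sum(sums.values())      # 1619
share = total_six / total_entries   # about 0.377, i.e. roughly 40%

# Percentage of each equation within the six listed equations.
percentages = {k: 100 * v / total_six for k, v in sums.items()}

print(total_six, round(100 * share, 1))
for proportion, pct in percentages.items():
    print(f"{proportion}: {pct:.1f}%")
```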
When
we consider just counts of neurons (Table 3), we find that almost 70% of the
connections occur in six decimal repertoire equations that define only five
proportions: 3 to 1, 2 to 1, 3 to 2, 1 to 1, and 2 to 3. Notice that the proportions are largely
ratios of small whole numbers – curiously reminiscent of biochemical
stoichiometry and the law of multiple proportions.
Table 3 Decimal Repertoire – Numbers of Neurons – Most Popular Equations and Proportions
| Decimal Repertoire Equation | Sum | % | Proportion (X:Y) |
| --- | --- | --- | --- |
| | 39 | 17 | 3 to 1 |
| 0.5 | 18 | 8 | 2 to 1 |
| 0.7 | 25 | 11 | 3 to 2 |
| 0.9 | 24 | 11 | ~ 1 to 1 |
| 1.0 | 91 | 40 | 1 to 1 |
| 1.5 | 30 | 13 | 2 to 3 |
Since both
the central and peripheral nervous systems rely on tandem connections between
neurons, disrupting cell proportions at any level may generate a variety of predictable
consequences – upstream and down.
Unintended consequences also exist.
Last year, for example, the connection matrix for the lateral geniculate
nucleus uncovered the disturbing fact that altering the genome of mice – at
locations considered unrelated to the nervous system – can actually change the
proportions of cells in the brain (Seecharan et al., 2003; Bolender,
2005).
Tab 9 – UBD change: Select tab 9 and click on run. The change data come from the design codes described previously
(Bolender, 2003-2005). In this screen,
X identifies control or experimental data and Y experimental. A ratio >1 indicates an increase (red),
<1 a decrease (blue), and =1 no change (green). Here the proportions are largely ratios of small whole numbers –
once again reminiscent of the law of multiple proportions. Finally, bear in mind that these decimal
repertoire equations belong exclusively to change data.
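The colour coding of the change screen follows a simple rule that can be written out in a few lines. The sketch assumes the ratio is Y divided by X; the function name and the optional `tolerance` parameter are hypothetical additions for illustration.

```python
def classify_change(ratio, tolerance=0.0):
    """Classify a change ratio (assumed to be Y / X) and return (label, colour).

    tolerance is a hypothetical band around 1 that is treated as no change.
    """
    if ratio > 1 + tolerance:
        return ("increase", "red")
    if ratio < 1 - tolerance:
        return ("decrease", "blue")
    return ("no change", "green")

for r in (1.5, 0.5, 1.0):
    print(r, classify_change(r))
```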
Let’s use this screen to see what exposures can change the hippocampus. Type <like %hippo%> into the X Structure field. Click on the Retrieve button and then on the show data button. The screen below identifies the direction of the change and the conditions responsible. Such a sort provides insights into the repertoire of change available to the hippocampus. Notice, for example, how different conditions can produce both similar and different responses – in similar and different species. For further information, see Puzzle 2: The Hippocampus (Bolender, 2005).

Scroll to the
bottom of the screen. Notice in the
distribution histogram that most parts of the hippocampus change only slightly
or not at all – a general pattern that persists throughout the nervous system
(Bolender, 2004-2005).

Tab 10 – Discussion: Click on <Read>.
Tab 11 – Resources, etc: Click on <Read>.
3. Citation: The citation screen includes one of several support screens displaying data from the Stereology Literature Database. It can be used to find references using a variety of approaches. For example, to view the effects of follicle stimulating hormone (FSH), the following query finds all the FSH papers stored in the database.

Click
on the view data button to browse the result set one paper at a
time.

4. Citation List: This table includes a comprehensive list of
publications for biological stereology.
To run a search, type a word or citation number into a data entry field
and press Enter. To read an
abstract or the full text of a paper online, click on the abstract
button and follow the instructions that appear.

5. Method: Use this screen to find all those papers
using one or more methods.

6. Find papers from
data: Papers can be found
from numerical data and their attributes.
Notice that individual screens are provided for control and experimental
data.

7. Catalogue of
original data: The data catalogue (Stereology Literature
Database) included with the earlier releases of BIOLOGYtabs (Bolender,
2001-2005) now comes with a query by example interface. This means that all the items in a database
table – including both text and numerical data - can be searched at all levels
of the hierarchy. Both control (green)
and experimental (blue) tables are included.
Select a screen by clicking on the name of the hierarchy level – listed
at the left of the screen.

8. Enter new universal
data: Pairs of data sharing similar references –
wherein the reference cancels - can be entered into the Universal Biology
Database. For example, data
recently acquired in your lab can be entered, assigned to a decimal repertoire
equation, analyzed, and compared to data previously published. Recall that to enter data, we begin by
clicking on the add button, enter the data, and finally store the data
in the database by clicking on the update button.

9. Dictionary: A
collection of definitions offers help with the terminology. Enter a word – or part thereof – and press
the Enter key.

Discussion
By building and
testing a Universal Biology Database, we begin the process of
understanding how technology can produce a single source of information, one
common to all biological disciplines and at the same time capable of minimizing
methodological bias and animal variability (Bolender, 2004-2005). Seamless integration of data across
disciplines will become a critical asset as we begin to tackle the many
problems of complexity associated with reverse engineering biology. Challenges in the basic and clinical
sciences – long considered intractable - will become increasingly manageable as
we allow the fundamental design principles of biology to play a larger role in
our experimental strategies. The
progress report encourages this approach by suggesting ways of applying reverse
engineering techniques to uncover these principles.
Technology can
expand our options. As we begin to
describe diseases as stacks of equations, for example, we will have a new and
effective way of monitoring their progression.
Moreover, equations derived empirically from the biology literature can
serve as pointers to the molecules and genes involved in the onset of pathology
and contribute a rigorous approach to evaluating both treatments and the
recovery process. Consider the impact
of such a resource. While following
changes in the individual molecules of a few cells may be both interesting and
worthwhile, only a broader knowledge of the changes in the relationships of
many parts – large and small - can offer a meaningful approach to something as
complicated as biology.
As biology becomes
a data-driven science, reverse engineering becomes a standard research
model. An experiment can be run at any
hierarchical level, but the results will be automatically connected to
previously published data – at all levels above and below. To explain the underlying causes of a
result, we can drill down and look at the behavior of the smaller parts. Alternatively, to explain the broader
consequences, we can look at the behavior of the larger parts in the higher
levels. In effect, our experimental results
will seed networks of equations that can take us to wherever we wish to go.
Recall that reverse
engineering is the process of analyzing a completed structure to identify its
parts and their interrelationships. In
other words, it is the practice of figuring out how a product is made by taking
it apart. In general, however, we as
biologists tend to focus our attention primarily on taking things apart, rather
than on putting them back together.
The Enterprise
Biology Software Project is attempting to do both. By taking biology apart mathematically with
equations (reverse engineering), we can use these same equations to put
everything back together (forward engineering). For example, assembling the equations for the hippocampus last
year demonstrated the feasibility of predicting the structure of the
hippocampus – species by species (Bolender, 2005). In fact, anyone can now do it.
Run the hippocampus software, enter a single seed value into a network
of equations, and you can forward engineer a hippocampus.
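The idea of entering a single seed value into a network of equations can be sketched as a traversal over linear equations of the form Y = bX. The structure names and coefficients below are placeholders, not values from the hippocampus software, and the traversal is only one plausible way such a network could be evaluated.

```python
from collections import defaultdict, deque

# Each entry (x, y, b) stands for a linear repertoire equation Y = b * X.
# Names and coefficients are illustrative placeholders.
EQUATIONS = [
    ("part A", "part B", 0.5),
    ("part B", "part C", 2.0),
    ("part A", "part D", 1.2),
]

def forward_engineer(equations, seed_name, seed_value):
    """Propagate one seed value through a connected network of equations."""
    neighbours = defaultdict(list)
    for x, y, b in equations:
        neighbours[x].append((y, b))        # Y = b * X
        neighbours[y].append((x, 1.0 / b))  # X = Y / b
    values = {seed_name: seed_value}
    queue = deque([seed_name])
    while queue:
        current = queue.popleft()
        for other, factor in neighbours[current]:
            if other not in values:         # first path reaching a part wins
                values[other] = values[current] * factor
                queue.append(other)
    return values

print(forward_engineer(EQUATIONS, "part B", 100.0))
```

Starting from a seed of 100 for part B, the sketch yields values for parts A, C, and D without any further input.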
Reverse
engineering biology makes sense as a research model because it solves many of
the problems created by complexity. By
taking things apart and then reassembling them, we can observe the effects of
unfolding and refolding. The advantage
of folding mathematically is that we can see when, where, and how complexity
forms. Moreover, this strategy works
consistently at every level of complexity.
Recall that all the control and experimental data could be summarized by
just two exponential equations, which were assembled from power equations
(Bolender, 2003-2004). In turn, these
power equations became linear when expressed as decimal repertoire equations
(Bolender, 2005). In this case,
unfolding a global complexity consisted of going from exponential to power to
linear – from complex to simple.
A
practical example may be helpful. Let’s
see how we can turn a seemingly hard question into an easy one. For example, “How do we reverse engineer a
disease?” Go to tab 7 of the Universal
Biology Database, select <like
%schizo%>, click on the Retrieve button, and a stack of 58 equations appears
(in 855 rows). The equations describe
the schizophrenia phenotype(s) - mathematically. The stack contains information about the relationships of the
parts, their proportions, and where and when they can change. By comparing the disease phenotype to its
control, we can begin to figure out what went wrong – all across the biological
hierarchy of size. More importantly,
such detailed information may eventually allow us to diagnose the onset of a
disease at a time when intervention is the most effective.
Thus
far, the reverse engineering effort has progressed only as far as organelles
and a few molecules. We still have to
populate the molecular level of the database before we can begin to interact
with the genes. Fortunately, molecular
biologists are already actively engaged in reverse engineering projects and the
Universal Biology Database may at some point become helpful to them (see
the report of the NYAS DREAM Project at http://www.nyas.org/dream). There is, however, a small problem. Molecular biologists tend to work with cells
in vitro and therefore these phenotypes may or may not resemble their
counterparts in vivo. In fact,
we may discover that the results of biological stereology and molecular biology
routinely experience different realities.
This creates an interesting opportunity. Since the data of biological stereology can serve as a gold
standard in both in vivo and in vitro settings, it may offer a
natural bridge between molecular biology and many other research
disciplines.
When looking for
patterns in biological data with regression analysis, a common finding is a
power function ($Y = bX^{a}$).
Indeed this observation supports the widely held view that biological
systems are largely nonlinear. It would
therefore seem helpful to understand how a biological system becomes nonlinear,
because in doing so the system becomes far more complex. What, for example, does biology gain from
being nonlinear?
Let’s begin by looking at a nonlinear connection between two biological parts. The figure below identifies the relationship between Golgi (X) and mitochondria (Y). As expected, the relationship is nonlinear, as shown by a power curve ($Y = 2.9965X^{0.8305}$) with $R^2 = 0.8214$. To be linear, the exponent $a$ would have to be one, not 0.8305. Note that this regression curve was generated with the Excel spreadsheet only for the purpose of illustration. When the exponent $a$ does not approach 1.0, a statistical method other than the one used in Excel should be considered.

Table 4 Decimal repertoire equations for the relationship of Golgi to mitochondria.
| Structure X | Structure Y | DR Equation Number | X:Y | Frequency | DR Equation | R2 |
| --- | --- | --- | --- | --- | --- | --- |
| Golgi | Mitochondrion | 0.25 | 4:1 | 2 | $Y = 0.27299X^{1.000}$ | 0.9999 |
| Golgi | Mitochondrion | 0.3 | 3:1 | 4 | $Y = 0.34618X^{1.000}$ | 0.9999 |
| Golgi | Mitochondrion | 0.5 | 2:1 | 3 | $Y = 0.54542X^{1.000}$ | 0.9999 |
| Golgi | Mitochondrion | 0.6 | 5:3 | 2 | $Y = 0.64920X^{0.999}$ | 0.9999 |
| Golgi | Mitochondrion | 0.8 | 5:4 | 7 | $Y = 0.84873X^{1.000}$ | 0.9999 |
| Golgi | Mitochondrion | 0.9 | 9:8 | 2 | $Y = 0.94680X^{1.000}$ | 0.9999 |
| Golgi | Mitochondrion | 1.0 | 1:1 | 8 | $Y = 1.19840X^{0.999}$ | 0.9999 |
| Golgi | Mitochondrion | 1.5 | 2:3 | 7 | $Y = 1.72417X^{0.999}$ | 0.9999 |
| Golgi | Mitochondrion | 2.0 | 1:2 | 9 | $Y = 2.22114X^{1.000}$ | 0.9999 |
| Golgi | Mitochondrion | 2.5 | 2:5 | 14 | $Y = 2.72862X^{1.000}$ | 0.9999 |
| Golgi | Mitochondrion | 3.0 | 1:3 | 4 | $Y = 3.44783X^{1.000}$ | 0.9999 |
| Golgi | Mitochondrion | 4.0 | 1:4 | 6 | $Y = 4.43500X^{1.000}$ | 0.9999 |
| Golgi | Mitochondrion | 5.0 | 1:5 | 4 | $Y = 5.43461X^{1.000}$ | 0.9999 |
| Golgi | Mitochondrion | 6.0 | 1:6 | 4 | $Y = 6.48703X^{1.000}$ | 0.9999 |
| Golgi | Mitochondrion | 7.0 | 1:7 | 1 | $Y = 7.45264X^{1.000}$ | 0.9999 |
| Golgi | Mitochondrion | 8.0 | 1:8 | 2 | $Y = 8.47694X^{1.000}$ | 0.9999 |
| Golgi | Mitochondrion | 9.0 | 1:9 | 3 | $Y = 9.46970X^{0.999}$ | 0.9999 |
| Golgi | Mitochondrion | 10.0 | 1:10 | 6 | $Y = 12.1598X^{0.999}$ | 0.9999 |
| Golgi | Mitochondrion | 15.0 | 1:15 | 6 | $Y = 17.1766X^{0.999}$ | 0.9999 |
| Golgi | Mitochondrion | 20 | 1:20 | 14 | $Y = 24.2499X^{0.999}$ | 0.9999 |
| Golgi | Mitochondrion | 30 | 1:30 | 7 | $Y = 34.5613X^{1.000}$ | 0.9999 |
| Golgi | Mitochondrion | 40 | 1:40 | 3 | $Y = 44.4335X^{1.000}$ | 0.9999 |
| Golgi | Mitochondrion | 50 | 1:50 | 1 | $Y = 54.6769X^{0.999}$ | 0.9999 |
| Golgi | Mitochondrion | 60 | 1:60 | 2 | $Y = 63.7480X^{0.999}$ | 0.9999 |
| Golgi | Mitochondrion | 80 | 1:80 | 1 | $Y = 84.0673X^{0.999}$ | 0.9999 |
| Golgi | Mitochondrion | 100 | 1:100 | 2 | $Y = 121.991X^{0.999}$ | 0.9999 |
| Golgi | Mitochondrion | 200 | 1:200 | 1 | $Y = 244.096X^{1.003}$ | 0.9999 |
Table 5 The developing kidney – alternating linear (exponent a close to one) and nonlinear (exponent a not close to one) equations (after Bertram et al., 2000).

| Structure X | Structure Y | Equation | Power Equation | $R^2$ |
|---|---|---|---|---|
| Dev day 17 | Dev day 18 | Nonlinear | $Y = 1.74620X^{1.0966}$ | 0.9996 |
| Dev day 18 | Dev day 19 | Nonlinear | $Y = 1.79790X^{0.9619}$ | 0.9999 |
| Dev day 19 | Dev day 20 | Linear | $Y = 1.782600X^{1.0031}$ | 0.9996 |
| Dev day 20 | Dev day 21 | Linear | $Y = 2.174000X^{1.003}$ | 0.9999 |

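The labels in Table 5 depend on how close the exponent a must be to one before an equation counts as linear. The sketch below makes that decision explicit. The tolerance of 0.01 and the function name `classify_exponent` are assumptions chosen for illustration; the report does not state a cutoff, and this value was picked only because it reproduces the four labels in the table.

```python
def classify_exponent(a, tol=0.01):
    """Label Y = b * X**a as linear when a lies within tol of 1.

    tol is a hypothetical cutoff, not a value taken from the report.
    """
    return "Linear" if abs(a - 1.0) <= tol else "Nonlinear"


# Exponents transcribed from Table 5 (developing kidney)
table5_exponents = {
    "Dev day 17 -> 18": 1.0966,
    "Dev day 18 -> 19": 0.9619,
    "Dev day 19 -> 20": 1.0031,
    "Dev day 20 -> 21": 1.003,
}

for pair, a in table5_exponents.items():
    print(f"{pair}: a = {a} -> {classify_exponent(a)}")
```

With the same assumed tolerance, every exponent listed in Table 4 (0.999 to 1.003) would also be labeled linear.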
Reverse engineering
biology requires large amounts of standardized data that can be integrated
across the entire biological hierarchy of size - seamlessly. By assembling a table of minimum
requirements (Table 6), we can put the magnitude of the undertaking into
perspective. The table suggests that
biological stereology can meet all the minimum requirements needed to reverse
engineer biology from organisms to cell organelles – as the contents of the
database allow (Bolender, 2001-2006).
In contrast, biochemistry and molecular biology appear to meet only some
of these requirements. This should not
come as a surprise in that reverse engineering biology is largely a structural
exercise.
We have, however,
at least two options. Either we can add
the experimental data of biochemistry and molecular biology to the Universal
Biology Database and let the hierarchical order of biological stereology
integrate these data or we can design new methods that mathematically combine
the data of several disciplines (e.g., Counting Molecules; Bolender,
2005). Both approaches should
work.
Given my reading of the literature, the task of incorporating molecular data into the Universal Biology Database appears largely a function of the experimental methods. In general, molecular methods tend to be disruptive in that they diminish or eliminate structural information. Homogenization, fractionation, isolation, purification, digestion, PCR analysis, immunoassay, western blot, northern blot, immunoblot, microarray analysis, and enzymology all forfeit structural order in exchange for access. In addition, these methods can have notable limitations - not the least of which includes multiple sources of bias (Fluck, et al., 2005). Indeed, these limitations may explain why molecular biologists routinely experience difficulty in dealing with biological complexity – both locally and globally (e.g., see the O’Reilly Network: Interview with Dr. Leroy Hood; search on <leroy hood complexity>).
Table 6 Minimum requirements for reverse engineering biology using published data; a preliminary assessment. Can we fill in the missing dots?

| Requirements for Reverse Engineering | Stereology | Biochemistry | Molecular Biology |
|---|---|---|---|
| In Vivo Data | | | |
| Concentration Data | ● | ● | ● |
| Average Cell Data | ● | | |
| Absolute Data | ● | ● | |
| Cell Counts | ● | | |
| Molecule Counts | ● | ● | ● |
| Minimize Bias | ● | | |
| Minimize Animal Variability | ● | | |
| Detect Change Unambiguously | ● | ● | |
| Design Experiments as Equations | ● | | |
| Enforce Unbiased Sampling | ● | | |
| Apply Biological Rules | ● | | |
| Convert 2D Data back to 3D | ● | | |
| Standardize Data | ● | | |
| Generate Biological Blueprints | ● | | |
| In Vitro Data | | | |
| Concentration Data | ● | ● | ● |
| Average Cell Data | ● | ● | ● |
| Absolute Data | ● | ● | ● |
| Cell Counts | ● | ● | ● |
| Molecule Counts | ● | ● | ● |
| Minimize Bias | ● | | |
| Minimize Animal & Cell Variability | ● | | |
| Detect Change Unambiguously | ● | ● | ● |
| Design Experiments as Equations | ● | | |
| Enforce Unbiased Sampling | ● | | |
| Apply Biological Rules | ● | | |
| Convert 2D Data back to 3D | ● | | |
| Standardize Data | ● | | |
| Generate Biological Blueprints | ● | | |

Consider three
striking issues. Point one. Biochemistry and molecular biology both
generate large amounts of in vitro data aimed at exploring the behavior
of molecules located in in vitro phenotypes. Here the problem becomes one of knowing when information gained
in an in vitro setting faithfully reflects that of the in vivo
setting. Point two. They both depend heavily on purification
techniques. Unless strict analytical
approaches are followed (de Duve, 1974), demonstrating unbiased sampling
becomes problematic. Point three. A given molecule may generate a troublesome
complexity by appearing at several different intracellular locations, in
several different cell types, at different times, and in variable amounts. In such cases, the structural location(s) of
the molecule become(s) the critical factor.
How can we solve this engineering problem for biochemistry and molecular biology? For the in vivo data of molecular biology, the solution may be quite simple. Use the Cavalieri method to get an unbiased estimate of the volume of the structure. Next, apply an unbiased sampling method to estimate the concentration (optical density) of an immunocytochemical label and then relate this concentration to the volume of the structure to get an absolute value (see Counting Molecules; Progress Report 2005). Such an approach will be far more successful in detecting a molecular change reliably than the risky practice of just comparing raw optical densities. For the remaining data types, the best current solution would seem to consist of forming data ratios – provided the reference variables cancel.
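The in vivo recipe above can be sketched in a few lines. This is an illustration under stated assumptions, not the procedure of the Counting Molecules report: the point counts, grid constant, section spacing, and optical densities are all hypothetical, and optical density is treated as a concentration in arbitrary units per unit volume. The Cavalieri estimate is written in its usual point-counting form, V = T × a(p) × ΣP, which presumes systematic sections taken with a random start.

```python
def cavalieri_volume(point_counts, area_per_point, section_spacing):
    """Cavalieri estimate of volume: V = T * a(p) * sum(P_i).

    point_counts    -- grid points hitting the structure on each section
    area_per_point  -- area represented by one grid point, a(p)
    section_spacing -- distance T between consecutive sections
    """
    return section_spacing * area_per_point * sum(point_counts)


def absolute_amount(concentration, volume):
    """Absolute amount of label = concentration x structure volume."""
    return concentration * volume


# Hypothetical sampling constants (assumptions for illustration)
SPACING_MM = 0.5            # section spacing T
AREA_PER_POINT_MM2 = 0.25   # grid constant a(p)

# Hypothetical point counts on five systematic sections per animal
control_volume = cavalieri_volume([12, 18, 20, 16, 14], AREA_PER_POINT_MM2, SPACING_MM)
treated_volume = cavalieri_volume([18, 27, 30, 24, 21], AREA_PER_POINT_MM2, SPACING_MM)

# Hypothetical mean optical densities: identical in both groups
control_od = 0.40
treated_od = 0.40

control_total = absolute_amount(control_od, control_volume)
treated_total = absolute_amount(treated_od, treated_volume)

print(f"volumes: {control_volume:.1f} vs {treated_volume:.1f} mm^3")
print(f"raw optical density ratio: {treated_od / control_od:.2f}")
print(f"absolute label ratio:      {treated_total / control_total:.2f}")
```

In this made-up case the raw optical densities are identical, yet the absolute amount of label differs because the structure has grown, which is the kind of change a comparison of raw optical densities alone would miss.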
By obeying the laws
of nature, biology demonstrates the presence of a mathematical core. The challenge for us is to gain access to
that core. Why? It will allow us to solve problems currently
beyond our reach. How do we gain
access? Reverse engineering seems to be
the most direct approach. Here the
challenge is to create a large universal database from the biology literature
that we can use to extract both local and global information.
The Universal
Biology Database described in this report offers a new research
technology. It translates published
research data into equations that can become the building blocks of our understanding. These equations allow us to unravel
complexity and to pursue a well-defined strategy of reverse and forward
engineering. The connection repertoire
table, which summarizes the decimal repertoire equations, offers local and
global views of how biology is constructed by rule. It reveals – in considerable detail - the complexity of biology’s
structural blueprint and may provide the clues needed to capture the intrinsic
order of molecules and genes.
In biology, everything everywhere is connected by rule. In experimental biology, everything everywhere can be connected by rule. Knowing this becomes our ticket to the future.
References
Bertram, J. F., Young, R. J., Spencer, K., and I. Gordon. 2000 Quantitative analysis of the developing rat kidney: absolute and relative volumes and growth curves. Anat Rec 258: 128-35.
Bolender, R. P. 2001a Enterprise Biology Software I. Research (2001) In: Enterprise Biology Software, Version 1.0 © 2001 Robert P. Bolender
Bolender, R. P. 2002 Enterprise Biology Software III. Research (2002) In: Enterprise Biology Software, Version 2.0 © 2002 Robert P. Bolender
Bolender, R. P. 2003 Enterprise Biology Software IV. Research (2003) In: Enterprise Biology Software, Version 3.0 © 2003 Robert P. Bolender
Bolender, R. P. 2004 Enterprise Biology Software V. Research (2004) In: Enterprise Biology Software, Version 4.0 © 2004 Robert P. Bolender
Bolender, R. P. 2005 Enterprise Biology Software VI. Research (2005) In: Enterprise Biology Software, Version 5.0 © 2005 Robert P. Bolender
De Duve, C. 1974 Nobel Lecture: Exploring cells with a centrifuge. From Nobel Lectures, Physiology or Medicine 1971-1980, Editor Jan Lindsten, World Publishing Co., Singapore, 1992.
Fluck, M., Dapp, C., Schmutz, S., Wit, E., and H. Hoppeler. 2005 Transcriptional profiling of tissue plasticity: role of shifts in gene expression and technical limitations. J Appl Physiol 99: 397-413.
Seecharan, D. J., Kulkarni, A. L., Lu, L., Rosen, G. D., and R. W. Williams. 2003 Genetic control of interconnected neuronal populations in the mouse primary visual system. J Neurosci 23: 11178-88.