In: Enterprise Biology Software,
Version 4.0 © 2004 Robert P. Bolender
Enterprise Biology Software: V. Research (2004)
Robert P. Bolender
Summary
The Enterprise
Biology Software Project explores new approaches to discovery in the
life sciences by applying mathematics and technology to the biology
literature. This process includes (1)
standardizing the literature by moving research data from the pages of journals
into the tables of a relational database, (2) generating derived data libraries
from the database, (3) searching these new libraries for mathematical patterns,
and (4) distributing the databases, libraries, software, and observations
freely to contributing authors - on a CD.
The current release includes an updated stereology literature database,
new libraries (repertoire, analogy, drill-down, and ladder), a progress
report, and several new findings. In
short, the libraries continue to uncover widespread patterns of mathematical
order in biology. These patterns appear
not as properties of individual structures, but rather as connections between
structures. Connections, for example,
define repertoires of equations that form networks. In the report, we consider how these equations might help us
decipher the genetic regulatory networks.
Introduction
Background
The results of the Enterprise Biology Software Project
suggest that the mathematical core of biology is well hidden by several
factors, including the bias of our experimental methods, the fluctuation of
biological systems between equilibrium and nonequilibrium states, and the
confounding nature of change. Since the
success of the project depends on accessing this mathematical core, all these
issues had to be dealt with locally and globally - in both control and
experimental settings. This was
accomplished by storing published research data in a relational database and
then generating libraries of standardized data there from (Bolender,
2001-2004).
As the database
grew, it became apparent that these new libraries (data pairs; design codes)
could generate additional libraries consisting of equations. The equation libraries created a new
opportunity for exploring biology as a mathematical puzzle, one that could be
solved one step at a time by following the clues accompanying each new
library. In effect, we now know how to
translate the biology literature into collections of standardized
equations. The challenge for the reader
and for the writer as well is to find the clues and then use them to solve
additional pieces of the biology puzzle.
How do we find the
clues? Last year, you may recall that
the ladder equation library showed us that biological data expressed first as
data pairs and then as power equations could be summarized by a single
exponential equation. This suggested that
all the parts of biology might be ordered by a single rule. The major clue to come from that exercise
was that order could be found as power equations with r2 ≥ 0.999 by sorting the data
pairs as ratios (y/x) three or more rows at a time. In other words, we learned how to convert data pairs into
equations - quickly. The next clue came
from the observation that the rung equations grouped the data pairs at distinct
levels, resembling quantum steps. A
similar clue came from the design code library when it showed us that change
behaved quite unexpectedly - like a constant. Finally, the example of unfolding complexity by shifting our
perspective from a zero to a one-dimensional platform provided the catalyzing
clue.
Armed with so many
clues, the next piece of the puzzle was in easy reach. Let us begin with the fourth clue, the one
that recommended a shift in our perspective.
The promise here was that changing the platform for viewing data can
change what we see. Consider this. What would happen if instead of looking at
biology thorough our eyes we looked at the same biology through the eyes of
one of its parts? Although such a
question seems a bit odd at first, it turns out to be a quite good one because
it takes us to an interesting result.
If we assume for convenience - that any given part sees all those
parts to which it is connected mathematically, then we can share the
perspective of that part by simply writing the appropriate equations. The result becomes interesting to the larger
community if the parts found to be connected mathematically turn out to have
been produced by a similar genetic regulatory network. If true, then we have found a relatively
simple way of reverse engineering these otherwise elusive networks. In short, the perspective clue - supported
by the first three produced a new library (repertoire), one that may be
offering us our first glimpse of what organelles and cells may be
seeing. Along with our glimpse,
however, comes a view of biology that if confirmed suggests a level of
complexity for genetic regulation that may help to explain why genomes can do
so much more than just code for proteins.
The other new
libraries are more or less straightforward.
One uses equations to suggest a strategy for connecting larger
structures to molecules and genes (drill-down), whereas the other (analogy)
employs equations to hunt for similarities in the biology literature -
mathematically. Finally, a ladder
equation library was added for experimental data.
Progress
Currently,
the major activities of the Enterprise Biology Software Project
consist of entering data and generating new libraries. Research data submitted by authors were
added to the database and sent back as part of a technology package, as shown
below.
Start
![]()
Research Data (reprints contributed by authors)
![]()
![]()
![]()
Standardized Data (stored in a relational database)
![]()
Data Pair and Design Code Libraries (connected data sets; minimized bias)
![]()
Equation Libraries (mathematical patterns - local and global)
![]()
Enterprise Biology Software Package
![]()
Sent to Contributing Authors
First,
data were moved from research papers into a relational database and
standardized. For each paper, all the
data at a given hierarchical level of size (1 to 16) were then used to form
pairs of data, which juxtaposed control vs. control (data pairs) or control vs.
experimental (design codes). These two
basic data libraries were stored as database tables and used to generate the
equations that populate the derived data libraries. Writing equations consisted of filtering and sorting the data,
sending the results to an Excel worksheet, and plotting the x and y values as
power equations (y = bxa).
Finding these equations was simplified by sorting the ratio y/x from low
to high, and looking for an r2 = 0.999 or better with three or more
rows of data. The resulting graphs were
stored in Excel files and included with the software upgrade. The libraries old and new are listed in
Table 1.
Libraries: Libraries serve as
discovery platforms (Table 1). They
include one or more user interface screens, data, help files, and worked
examples (e.g., Excel worksheets; case studies). In effect, each new library helps to solve another piece of the
biology puzzle.
Table 1. Enterprise Biology Software Libraries.
|
Library |
Data |
Entries |
Applications |
|
Standardized Stereology Literature |
|
|
|
|
·
Citation
search |
original |
12,853 |
Find
references |
|
·
Citation
by paper contl |
original |
1,024 |
Print
paper contl data |
|
·
Citation
by paper contl + exptl |
original |
6,438 |
Print
paper contl + exptl data |
|
·
Methods
search SQL script |
original |
1,951 |
Find
papers by methods |
|
·
Control
Data |
original |
15,521 |
|
|
·
Experimental
Data |
original |
9,677 |
|
|
·
Contl
data by data point |
original |
12,164 |
Find data
by data point; level |
|
·
Contl+Exptl
data by data point |
original |
7,284 |
Find data
by data point; level |
|
·
Percentage
change data |
derived |
7,018 |
Find data
by change; level |
|
·
Phenotype
data |
original |
7,018 |
Find data
across 14 levels |
|
Connection Map |
|
|
|
|
·
Type
1 (2str/2+points/1level/1paper) |
derived |
182 |
Find
connections/minimize bias |
|
·
Type
2 (2+str/2+points/1level/1paper) |
derived |
81 |
Find
connections/minimize bias |
|
·
Type
3(2+str/2+points/1+levels/1paper) |
derived |
323 |
Find
connections/minimize bias |
|
·
Type
4 (data pairs) |
derived |
22,445 |
Find
connections/minimize bias |
|
Data Replicator |
|
|
|
|
·
One from
one (data from 1 paper) |
derived |
702 |
Predict
data |
|
·
Many
from one (data from 1+ papers) |
derived |
27 |
Predict
data |
|
Biological Algorithm |
|
|
|
|
·
Connections
upstream and down |
derived |
458 |
Predict
organs and organisms |
|
Data Pair |
|
|
|
|
Global
(data from 1+ papers) |
derived |
112 |
Find
connections/minimize bias |
|
Design Code |
|
|
|
|
·
Local
(data from 1 paper) |
derived |
2398 |
Identify
and predict change |
|
·
Global
(data from1+ papers) |
derived |
58 |
Identify and
predict change |
|
Ladder Equation (Data Pairs) |
|
|
|
|
·
Total
data pairs |
derived |
25 |
Generalize
structure in biology |
|
·
Organ |
derived |
19 |
Generalize
structure by organ |
|
·
Cell |
derived |
19 |
Generalize
structure by cell |
|
·
Organelle |
derived |
22 |
Generalize
structure by organelle |
|
|
|
|
|
|
New for 2004 |
|
|
|
|
Repertoire (Data Pairs) |
|
|
|
|
·
Organelles,
inclusions, and cells |
derived |
771 |
Find
connections; Make predictions |
|
Ladder Equation (Design Codes) |
|
|
|
|
·
Total
design codes |
derived |
25 |
Generalize
structure in biology |
|
Analogy (Design Codes) |
|
|
|
|
·
Selected
design codes |
derived |
140 |
Look for
similar changes |
|
Drill-Down (Design Codes) |
|
|
|
|
·
Selected
design codes |
derived |
183 |
Simplify
complexity |
Figure 1 indicates that the stereology literature database currently includes 60,000 data entries, of which more than half represent derived data. This community resource offers abundant opportunities for finding connections between and among the many parts that define biology.

Figure 1. Research data stored in the stereology literature database.
Results: The principle findings
of the project are listed below.
2001 to 2003
· Biological data can be transferred from
research papers to a relational database and standardized.
· The production database demonstrates the
feasibility of creating an electronic literature for the life sciences.
· A connection model for research biology
yields widespread mathematical patterns, whereas the traditional change model
does not.
· When stored in a database, published
research data serve as a key resource for producing derived data.
· Biological data are subject to an uncertainty
principle and therefore carry an unknown bias.
· Libraries can be designed that minimize bias
(data pairs, design codes).
· Structures in biology are connected by rule
(connection model).
· Algorithms can be written that generate
organs and organisms from a single seed value.
· Complex research data can be unfolded by
viewing data from a higher dimension.
· Relationships of structure to function can
be expressed mathematically.
· Change in biology can be generalized and predicted.
· More than twenty thousand connections
between structures in biology can be summarized by a single exponential
equation.
New for 2004
· The organization of biological parts can be
defined explicitly as repertoires of equations.
· The repertoire library views connections
from different structural perspectives by forming networks of equations.
· Experimental data can be summarized by a
single exponential equation, as shown earlier for control data.
· Mathematical analogies can encourage
serendipity.
· Drilling down into a data set can reveal the
presence of nested equations.
· Optimization may be a first principle of
biology.
Repertoire Library
(Data Pairs)
The repertoire
library offers views of the same published data from many different
perspectives by transforming tabular data into networks of equations. These networks, which show how structures
are connected mathematically, are displayed as collections of power
equations.
Organelles: If we imagine that each of the many parts
of biology sees its world from a unique perspective, then what might we learn
by sharing these different perspectives?
For example, what parts of a cell would be of the greatest interest to a
mitochondrion? One way of answering
such a question is to list all those parts to which a mitochondrion has a
mathematical connection. In other
words, we can define a mitochondrial perspective as family of power equations
(y = bxa) wherein mitochondria (expressed as a volume or surface)
would be the x variable:
y(organelle i) =
bx(mitochondria)a .
Generating these
equations is a simple task, using the literature database. Start with the data pairs table (included
with the current EBS upgrade), type <mito> into the x name field, press
Enter, click on the sort y radio button, and save the results as an Excel
file. In Excel, notice that all the
terms in the x name column begin with mito
, whereas the organelles in the y
name column carry different names sorted alphabetically. Start with Golgi and sort the x/y column
numerically (low to high). Next,
highlight the first three data pairs of Golgi and select a scatter graph. Change the x and y axes to logs and fit the
points with a power regression line. If
the r2 is greater than 0.999, add extra points (row by row) to the
graph until the r2 comes close to 0.999. If not, move the calculation box down one row at a time until the
r2 becomes greater than or equal to 0.999. The power equation that appears on the graph describes a
mathematical connection between mitochondria and Golgi. Repeat the process for all the organelles
and inclusions connected to mitochondria and the resulting set of equations
becomes the mitochondrial perspective.
(Note that for each graph, as shown in Figure 2, there is the
corresponding Excel worksheet.)

1)
BIOLOGYtabs 4.4.
2) Excel worksheet.
Figure 2. The repertoire library for mitochondria consists of a set of power equations. Calculations can be viewed by opening an Excel worksheet.
In turn, these
equations shown above as lines can be programmed as computed fields on a
work screen wherein a given value for a mitochondrion will generate the
expected values for all the connected structures.

Figure 3. The repertoire library can also display the equations illustrated as regression lines in Figure 2 as an interactive network of equations. Apply a change to mitochondria and see how the connected organelles respond.
The first thing we
can learn from this perspective is that mitochondria share connections with
several cytoplasmic organelles. Notice,
however, that mitochondria display multiple connections with the same organelle
each one being identified by a unique power equation. Similar patterns of order appear when
perspectives are generated for the remaining organelles and inclusions. In other words, the proportion of organelles
one to another typically appears as a collection of fixed
relationships. In effect, the library
includes a repertoire (catalogue) of mathematical connections for organelles
that can occur across biology. It tells
us that organelles are connected by rule and these rules can be captured by
equations.
The repertoire
library for organelles and inclusions contains 513 power equations typically
with r2s equal to or better than 0.999. If organelles maintain these stoichiometric-like proportions when
an equilibrium state is reached after a change, then we can expect that a
change in mitochondria will be accompanied by predictable changes in its
connected neighbors. However, each of
its neighbors be it an organelle or inclusion has its own set of connections. This means that an intricate web of
connections could therefore translate the change we typically observe in a
single organelle into a widespread effect.
In such a setting, change would enjoy an unexpected wealth of
complexity. Recall that complexity may
be natures way of producing new emergent properties (Bolender, 2001).
Notice that Figure
3 presents us with a challenging problem along with a new set of clues. If we imagine that the connections among the
downstream products of a genetic regulatory network can serve as quantitative
markers of that network, then how would we write a set of equations describing
the network for a specific cell in a specific setting? The solution to such a problem consists of
selecting equations from each of the organelle columns that can describe appropriately
a specific cell type in a specific setting.
In turn, this connected set of equations would be expected to predict
the relative proportions of organelles in the specific cell. Our reward for solving this problem might be
a compound mirror of equations with which to view earlier genetic activity
not unlike the way an astronomer looks at stars. The question, of course, is how do we select the appropriate
equation(s) from each of the organelle columns?
The answer may be
as simple as moving from one dimension to another. Consider the following experiment. Estimate the densities of the eight organelles in the specific
cell type and calculate the ratio y/x for each data pair to get the proportions
of the organelles (y(organelle) = bx(mitochondion)a). Finally, compare the new proportions to
those in Figure 3 and select the closest match. In effect, we can use Figure 3 as a lookup table. More importantly, perhaps, we now know how
to turn a zero dimensional data point (y/x) without connections into a one
dimensional power equation with connections.
In other words, by using the repertoire equations as a lookup table, we
can transform an information poor data point into an information rich line with
surprisingly little effort. Indeed,
such dimensional shifts may offer a host of new products and clues.
Cells: The repertoire library for cells identifies mathematical
connections between cells and as just described for organelles the power
equations display the step-like pattern.
However, the cell-to-cell relationships are complex. The proportion of cells can be identified
within and across species as (1) one cell to one cell or (2) one cell to many
cells. For example, the proportion of
pulmonary endothelial cells to type II cells in the mouse, goat and rat is the
same (one to one), but this same proportion is also shared between endothelial
cells and fat-storing, interstitial, Kupffer, macrophage, mesenchymal, and
glial cells (one to many). The
proportion is expressed as y = 0.2693x0.9973, where x = endothelial
cell and y = cell i. Such a result,
which persists for many combinations of cells, suggests that proportions of
cells are ordered by rule and that these rules are being conserved across species. Although the genetic mechanism responsible
for controlling cell proportions is unknown, at least we now know that such
proportions exist and that they can be quantified.
Global View: The
repertoire library was also used to look for general patterns of organization. To generate global views for organelles and
cells, the power equations were fitted to exponentials as described earlier for
ladder equations (Bolender, 2003). When
plotted as a group (without regression lines) they offer a striking view of
organelle connections (Figure 4). Each
blue point represents the y intercept of a power equation and each stack of
points a different organelle view. The
figure suggests that the cellular mechanism responsible for defining the
organellar composition of cells operates by discrete steps (quanta) and that
the underlying principles may be related, as suggested by the similar ranges
and slopes of the exponential stacks.
For those readers interested in exploring the organellar organization of
cells, two questions quickly fall into focus.
When and why do organelles locate at specific locations in the
stack?

Figure 4. When the y intercepts of power equations are fitted to exponential equations, a global pattern of order can be seen. Connections between organelles appear to be ordered by rule.
Perspective: The repertoire library offers a global view
of structural order by connecting local equations. It shows that different animals can produce remarkably similar
parts, but that the proportion of these parts one to another can be either
similar or different. The fact that
different animals can share similar connections and similar proportions of
parts, suggests that they might also be sharing a similar genetic blueprint or
simply following a design strategy that leads to a similar phenotype. This means that each set of equations
attached to a specific part reflects the general rules guiding the
establishment of such relationships. In
other words, the equations would seem to offer a broad overview of a
fundamental organizing principle of biology.
If this proves to be the case, then it would not seem unreasonable to assume that we can assemble sets of equations for specific animals. Such an accomplishment would be accompanied by a substantial improvement in our ability to predict a large number of parts from one or a few seed values. Recall that the current form of the repertoire library makes no attempt to connect data across hierarchical levels organelles and cells are treated separately. However, the only factor limiting such connections seems to be the amount of data available. In the future, large farms of equations spanning many hierarchical levels are likely to become commonplace.
Ladder Equation Library
(Design Codes)
Last year, ladder
equations were reported for the data pair library (control vs. control). By increasing the number of entries in the
design code library (control vs. experimental) to 2,400, a similar albeit
provisional estimate was also made for the experimental data.
If we start with
the 2,400 design codes in the literature database, form ratios (structure
y/structure x), sort the ratios (ascending), and collect sets of ratios that
give power curves with an r2 = 0.9999, we can generate a set of 23
equations describing the design codes.
Since the slopes (a) of these power curves also tended to be close to
one, the y intercept (b) of each equation served to identify a unit of order. In turn, when the y intercepts were plotted
as if they were rungs on a ladder a single exponential equation of the
form y = exa the ladder equation appeared. This means that we can summarize the
experimental data set of more than 2,400 entries with a single exponential
equation having a r2 = 0.9991:
y = 0.1194e0.1354x
,
where y is the y
intercept of the power equation and x the rung number.
Analogy Library (Design Codes)
In biology,
interpreting the results of an experiment often includes reasoning by
analogy. To wit, resemblances imply
similarities. The analogy library takes
this convention one step further by employing power equations as mathematical
analogies. For example, if we detect a
specific amount of change in a structure (e.g., mitochondria) and want to know
where a similar change has occurred elsewhere, the library can provide such
information.
Drill-Down Library (Design Codes)
Typically,
a given design code is part of a larger code and, at the same time, consists of
many smaller codes (Bolender, 2003).
Recall that the design codes extend across the hierarchy of size as a
set of nested equations. As such, they
willingly serve as mathematical pathways to and from the genome.
The
drill-down library illustrates that complex design code equations can be
simplified by expressing them as two or more simpler equations (illustrated as
before and after graphs). This
coherency of equations is a fortunate relationship because it allows us to
define an experimental process mathematically as the passage through a
connected set of equations. In the
drill-down library, the direction of information flow is from the organism to
the gene across the hierarchical levels defined by the relational database
model (Bolender, 2001b). Theoretically,
a drill-down library can be used to find the genetic origin(s) of a biological
part.
Enterprise Biology
Software (2004)
The Enterprise
Biology Software package for 2004 updates the stereology literature database
through 2003, adds the repertoire, analogy, drill down, and ladder
equation (design code) libraries, upgrades applications, and includes a
progress report. Details of the upgrade
can be found in the installation instructions (BIOLOGYtabs 2004).
Stereology
Literature Database
Database Update: This year, data taken from submitted
reprints were added to the literature database and design codes were harvested
from an additional 165 papers.
Libraries
Previous libraries
were updated to reflect the recently entered data and new libraries were
generated from the data pair and design code files.
Searching Libraries: The data pair library includes two columns
of control data, whereas the design code library includes one column each of
control and experimental data. The
major purpose of these libraries is to generate equation libraries. Recall that:
Data Pairs (control
vs. control)
· Detect a connection between two structures,
two functions, or a structure and a function.
· Detect connections among several structures,
functions, and structures and functions.
· Compare control data coming from one or
several papers.
· Generate equations for predicting structure
and function.
· Identify patterns.
· Define repertoires for organelles and cells
with equations.
Design Codes
(control vs. experimental)
· Detect change quantitatively and qualitatively
as connected sets.
· Identify patterns of change.
· Generate equations for predicting changes in
structure and function.
· Use equations to search for analogies.
· Offer a drill-down approach for identifying nested
equations.
To simplify their
use, both data pair and design code libraries share a similar interfaces and
methods for generating equations (Figure 5).

Figure 5. Data of the stereology database are selected from tables and sent to Excel worksheets where they can be fitted to regression equations.
Data Pair Library
Types: The data
pair library (BIOLOGYtabs 2004; 4.2-4.6) includes sets of equations
calculated as regression curves derived from the data pair table (Figure
5). An equation defines a quantitative
connection between two or more parts.
This library includes collections of data and equations.
Properties and Rules: See
Progress Report 2003 (Bolender, 2003).
Applications: Tab 4 of BIOLOGYtabs 2004 displays the data
pairs as a table of X and Y values (4.1), a collection of related structures
(4.1), a collection of ladder equations (4.2), a repertoire of connections for
organelles and cells (4.3), and networks of equations link (4.4, 4.5)
· 4.1: Data Pair Library Find Equations:
Click on the picture button to display the data pairs table (Table
6). Use the table to select structures
of interest and then export them to an Excel file for analysis.

1) Click on the picture
button. 2) Select data
pairs. 3) Analyze data in Excel.
Figure 6. Finding equations.
· 4.2: Data Pair Library: The
collection illustrates with equations pairs of structures connected by rule
(Table 7). Notice that mathematical
order can be maintained within and across both structures and species at one
or more levels. In effect, the data
pair library helps to explain the organizational complexity of biology in terms
of equations. Excel worksheets, which
show how the equations were generated, are readily available for viewing.
![]()

1) View equations. 2) View calculations.
Figure 7. Viewing data pair equations.
· 4.3: Ladder Equation Library: The
library shows that biological data can be ordered locally (power equations) and
globally (exponential equations) and that the order occurs in distinct steps
(quanta). The steps can be seen as
parallel sets of equations and as steps in histograms (Figure 8). The library includes ladder equations for
the total collection of data pairs and for subsets thereof (organ, cell, and
organelle).

1) Parallel power equations. 2) Steps
Figure 8. Rung equations appear as sets of parallel
curves and as steps in histograms.
4.3: Repertoire Library
Equations: The library illustrates the types of
connections being used by organelles and cells. In effect, it can show with equations the what, where, or
when an animal genome makes structures.
It also illustrates with before and after graphs the process of
extracting order (as equations) from otherwise noisy data (Figure 9).

Figure 9. Repertoire equations.
Repertoire equations
are displayed in both local and global settings. Local equations offer a detailed picture of connections between
specific data pairs, whereas the global equations summarize many local
equations with a single exponential equation.
4.5, 4.6: Repertoire Library Equations (expressed as a network): If we consider each local repertoire equation to be a piece of a biological puzzle, then connecting these pieces mathematically gives us a networked view. Expressing the equations of 4.3 as columns of computed fields provides a repertoire of networked equations. In the nuclear repertoire (Figure 10; right), notice the white data entry field at the left. When a value for the nucleus is entered, all the numbers in the organelle columns at the right change by rule (as defined by the repertoire equations).

1) Select the
repertoire folder. 2) View the
nuclear repertoire.
Figure 10. Repertoire equations as networks.
Notice that for
each nuclear value, there are 8 values for cytoplasm, 8 for dense bodies, 6 for
endoplasmic reticulum, 11 for Golgi, 10 for lipid droplets, 18 for
mitochondria, and 6 for peroxisomes. In
other words, change the volume of the nucleus and all the cytoplasmic
structures with a mathematical connection to the nucleus change their volumes
by rule. This tells us that our ability
to design a prediction model for a specific cell type in a particular setting
depends importantly on knowing how to pick the appropriate equation(s) from
each organelle column.
Design Code
Library
Types: The design
code library (BIOLOGYtabs 2004; 5.2-5.6) compares control to
experimental data with power equations.
The library includes six collections.
Properties and Rules: See
Progress Report 2003 (Bolender, 2003).
Applications: Tab 5 of BIOLOGYtabs 2004 displays the
design codes as a table of x and y values (5.1), a collection of simple code
equations (5.2), a collection of complex codes (5.3), ladder equations (5.4),
analogy equations (5.5) and drill-down equations (5.6).
· 5.1: Design Code Library Find Equations:
Click on the picture button to display the design code table (Figure
11). Use it to select structures of
interest and then export them to an Excel file for analysis. Data previously used to generate an equation
can be viewed by opening its Excel worksheet.

1)
Click on picture button. 2) Select structures. 3) Find patterns
in Excel.
Figure 11. Design code library Find equations. Select data > send to worksheet > make
calculations.
· 5.2: Design Code Library (Simple): The
library of equations is grouped according to control, experimental,
development, aging, disease, and relationships of structure to function (Figure
12). The design code equations were
calculated using data from a single research paper.

1) Select view 2) View worksheet.
Figure 12. Design code library Simple.
· 5.3: Design Code Library (Complex
Equations): A complex design code combines the data of
several simple design code equations (Figure 13). It characterizes change by structure and by event. Recall that structural data (V, S, L, N) can
be used to identify both qualitative and quantitative changes,
whereas density (Vv, Sv, Lv, Nv) and mean (mV, mS, mL) data can detect qualitative
changes.

1) Global curve. 2) Local curve.
Figure 13. Design code library Complex.
Types: The ladder
equation library (BIOLOGYtabs 5.4) includes a collection of ladder
(exponential) and rung (power) equations that together summarize data for total
and selected sets. The summary takes
the form of a single exponential equation: y = exa.
Properties: The
ladder equation summarizes all the experimental connections in the library
database (Figure 14) expressed as design codes (Figure 12) with the single
expression:
y = 0.1194e0.1354x,
where y equals the y intercept of the power (rung) equations and x the number of the rung (e.g., 1 to 24).

1) Ladder equation 2) Rung equation
Figure 14. Ladder equation library for design codes.
For a discussion of
ladder equations, see Bolender (2003).
5.5: Design Code Library (Analogy Equations): The library includes a collection of equations that plot experimental data for the same structures coming from different experimental settings and animals (Figure 15).

1)
Select equation. 2) Identify
structures in worksheet.
Figure 15. Analogy equations find structures that
change similarly.
Types: The analogy
library (BIOLOGYtabs 5.5) was derived from the design code library and
includes one collection.
Properties:
Analogy equations display several properties.
Application: The
library continues to reinforce the view that the composition of cells and the
changes therein are tightly controlled by the genome and that this control
often extends across the species boundary.
5.6: Design Code Library (Drill-Down Equations): The library includes a collection of equations that identify patterns in complex data and contribute to a strategy for tracking changes back to the genome (Figure 16).

1)
Select equation. 2) Identify structures in worksheet.
Figure 16. The drill-down library finds order in
research data.
Types: The drill-down
library (BIOLOGYtabs 5.6) was derived from the design code library and
includes one collection.
Properties: The
drill-down library illustrates the properties of nested equations.
Application: The
library uses a before and after format to illustrate how a complex data set can
be simplified by resolving it into two or more power equations, each with a r2
≥ 0.999. If these equations
reflect the underlying order of a cellular response, then they suggest that the
changes we typically report may in fact be a composite of several nested
changes. To wit, order in biology can
be defined quantitatively as equations embedded in equations.
Turning Papers
into Equations
The Enterprise
Biology Software Project turns reprints into equations and equations into discovery
platforms. Such an outcome owes its
success to the stereology literature, which is an ideal place to hunt for order
in biology. Although such hunting
requires assistance from mathematics and technology, these research tools can
be readily captured in software and distributed freely to contributing
authors.
The equations tell
us that mathematical order in biology can be found in the connections between
structures and that the connections define networks. This universal connectivity creates robust pathways for both
discovery and prediction. Moreover, the
connections support the dimensional shifts that provide access to otherwise
inaccessible information.
Equations ease the
discovery process enormously because they can simplify complexity. Consider, for example, the repertoire
equations in BIOLOGYtabs (4.4-4.6).
They demonstrate that the relationship of one structure to another is
complex, but that the complexity can be explained by a set of equations. Once a pattern of connections is found,
hunting for the next level of complexity consists merely of generating yet
another equation library. One result
leads naturally to the next. What could
be simpler?
Biological
Stereology and the Systems Biology Revolution
The Systems
Biology Revolution promises to translate the results of the genome
sequencing projects into an understanding of how genes produce and maintain the
all the many structures and functions that are biology. This includes the daunting task of
unraveling the complexities of the genetic regulatory networks. Since the reach of these networks extends
across the biological hierarchy of size, they can be studied by starting at
either end or somewhere in between with the option of proceeding upstream or
down. Success in mapping and explaining
these networks will no doubt require massive amounts of highly reliable data
supported by robust connections.
In figuring out
these regulatory networks, we can imagine two likely directions for research
upstream and down. One seems easy the
other hard. When moving upstream from
the gene products to the genes, all the organizational information is already
in place and just waiting to be discovered.
Moreover, moving upstream is equivalent to going from a complex setting
to a simpler one, a task easily handled by equations (e.g., the repertoire and
drill-down libraries). Moving
downstream, however, is more challenging because each item added to the network
must be fitted painstakingly into the scheme qualitatively and quantitatively
one step at a time with the appropriate authentication. In short, it is often easier to take
something apart than to put something together especially in the absence of a
blueprint.
Biological
stereology can contribute importantly to systems biology. It generates critical data sets routinely,
moves data throughout the structural hierarchy with ease, identifies specific
structures at specific locations in intact animals with light and electron microscopy,
and regularly leverages its relational database of published research.
Is there a ready
link between stereology and molecular biology?
Yes, indeed. Since the data of
molecular biology can be expressed as ratios, we have a direct match with the data
pairs and design codes described herein.
How might this be helpful? If,
for example, we combine the ratio data of the stereology database with those of
microarray analyses in an Excel worksheet, then specific genes or gene
products could be run with the stereological data to hunt for regression
curves with r2 ³ 0.999. Finding such curves might point to
associations between downstream structures and their upstream genes or gene
products. Such an approach may prove
especially helpful because the products, locations, and functions of so many
genes remain a mystery. Since the
results of many microarray experiments are already being published in
catalogues on the Internet, large amounts of gene expression data are freely
available to stereologists. An
introduction to these new microarray methods may be of interest to the reader (www.ncbi.nlm.nih.gov/About/primer/microarrays.html
.
As the systems biology
revolution takes root, we will follow with great interest the strategies
developed for dealing with the requirements of a critical data set. On close inspection, it appears that many
papers in molecular biology rely exclusively on density data for detecting
differences and changes. Such a narrow
approach may hinder even the most determined efforts at unraveling
complexity. Piggybacking on the
strengths of stereology may therefore offer a welcome and profitable
solution. Such a strategy makes sense. Stereology offers a mathematical
infrastructure capable of moving data across the biological hierarchy of
size. Combine microarray data with
those of stereology and the mathematical connections extend all the way from
genes to organisms and back.
Unfolding
Complexity with the Equation Libraries
Since the products
of genetic regulatory networks include the downstream expression of molecules,
organelles, and cells, working our way upstream from these structures should
take us back to the origins of these networks.
In effect, quantifying the end products of the genetic regulatory
networks at different hierarchical levels of expression becomes an exercise in
unfolding complexity. This unfolding
process might also tell us something about the rules that are in play.
Within the framework of a connection model, complexity in biology can be unfolded using two simple rules. Consider the structures, A and B (Figure 17). They are connected by rule the proportion rule. However, the proportion rule can vary according the amount of each structure A and B, according to the absolute amount rule.

Figure 17. Rules for detecting differences and changes.
The ratio of the
two structures (A/B) defines the connection of a data pair or a design
code. More or less of structures A and
or B will change the proportion rule, except when both A and B change equally. In this case, A/B (green) = A/B (blue) = A/B
(red), as shown in Figure 18. Recall
that data related to a structure, to an average structure, and to a unit of
reference volume (a density) can be used to calculate the ratio of two
structures. However, when only density
data are available for this calculation, only part of the story will be
known. The contribution of the absolute
amount rule will remain a mystery (see Table 2).

Figure 18. Rules for detecting differences and changes. In this example, the proportions are the same, even when the absolute amounts are different.
This means that
density data alone cannot detect the absolute amounts of the structures. Since genetic regulatory networks control
both the proportions of structures and their absolute amounts, the complexity
of a connection between two structures can be reduced to two rule-based
events. The proportion rule can
be detected with just density data, but the absolute amount rule
requires the information of a critical data set. Both are equally important for explaining a connection. Before leaving this figure, there is an
additional point worth mentioning.
Recall that the majority of the differences and changes detected with
the design code equations show experimental curves parallel to the reference
curves (BIOLOGYtabs 5.0). This means
that the experimental structures fitted to the regression line reacted
uniformly as a group, as illustrated by the red and blue structures in Figure
18. In such a case, the proportion
rule would indicate no difference or change, whereas the absolute amount
rule would tell a very different story.
In other words, density data alone cannot detect an increase or decrease
when the curves are parallel.
If we start with two structures A and B and introduce a difference (more or less), then we can see that many outcomes exist.
A D

B E

C F

Figure 19. Detecting differences and changes is a
function of two rules.
In turn, we can
summarize Figures 18 and 19 with Table 2 by expressing the connections as
magnitudes (absolute amounts) and directions (proportions =1, >1,
<1). Note that B = blue (less), G =
green (same), and R = red (more). The
table demonstrates that the nine absolute events produce three relative
outcomes.
Table2. Detecting differences and changes using a rule-based system (see Figures 18, 19).

Taken together, the
figures (18, 19) and the corresponding data table (Table 2) summarize the
complexity of a connection between two structures - A and B. This same pattern applies to each pair of
structures at every level of the biological hierarchy of size. Observe, however, that the complex
relationship of structure A to structure B can only hint at the real-world
complexity wherein many more than two structures are connected (Figure 3).
The point of the example is to show that a difference or change in a connection can be produced by more than one event. For example, no change in structure A and a decrease in structure B produce a relative increase (Figure 19 A). Moreover, notice in Table 2 that a relative increase can be explained by three very different genetic events. This means that densities by themselves cannot detect a gene based event reliably because such an event involves the expression of two rules not one. Examples of these rules actually being expressed can be found throughout the derived data libraries.
Progress as a
Progression of Models
One way of
assessing the progress of the Enterprise Biology Software Project is to
review how the stereology literature database can be analyzed by applying
different data models. Notice how each
model attempts to deal with complexity and how the solutions help to define the
subsequent model.
Access Model: Information stored at many different
locations (journals) was transferred to a single location a relational
database. In turn, the database and
supporting software were distributed freely to the user community.
Change Model: Research data can be can be expressed as a percentage
change by dividing an experimental value by its control and then multiplying by
100%.
Connection
Model: The connection model
relies on the quantitative relationship of two structures, which includes data
pairs (for control data) and design codes (for experimental data).
Network Model: The network model connects the equations of
the connection model into local and global networks.
A First Principles
Approach
A first principle
can be defined as a law upon which others are founded or from which others are
derived. It is a general truth,
comprehending many subordinate truths, and not deductible from others.
How do we hunt for first principles in biology? Consider the strategy of the ladder equations wherein all the connections between structures were generalized with a single equation one for control data and another for experimental. However, finding two equations instead of one suggested that the equations must be subordinate to a larger truth. Notice what happens when we plot the two equations together (Figure 20). The curves intersect and bear a striking resemblance to the solution of a linear programming problem. Is this telling us that the structure of living things is built on a mathematical platform designed to produce the best results? If the answer is yes, then we may have a candidate for a first principle (Walthrop, 1992).

Figure 20. Is structural organization a best solution?
Figure 20 presents
us with a new challenge. Starting with
these two equations, can we reverse the summarizing process and tease out the
details of structural order - across animals and experimental settings? In other words, can we write networks of
repertoire equations capable of predicting the structure of specific animals in
specific settings?
Such an exercise may one day become a practical necessity. We now know from genome sequences that organisms can have remarkably similar genotypes, but amazingly different phenotypes. If we begin with such a similar genetic blueprint, why do we end up so different? If, for example, we share 97% of our genes with the chimpanzee, does this mean that only 3% of our genome is uniquely human? If we dont have species-specific genes, what do we have? If, instead, we discover that speciation depends importantly on the types of connections that form between structures, should we be looking for structural compounds with unique properties? If such compounds exist, can we expect to find stoichiometric-like rules operating at and across all levels of organization?
Concluding
Comments
We now have a
better understanding of why published data in their original form fail to yield
widespread patterns of mathematical order in biology. Biological data carry an unknown bias and the same structural
part can assume many different values in different settings. A principal observation of the Enterprise
Biology Software Project is that the mathematical organization of
biology occurs in the connections between the structural parts and that these
connections can be captured as libraries of equations. Indeed, the equations reveal a pervasive
mathematical order both within and across species. By embracing structures of all sizes from all animals, the
relational database model based on the principles of stereology - can provide
the local and global views of biology fundamental to the task of unraveling
complexity. Although the magnitude of
biological complexity remains undetermined, we now know how to unfold
complexity and to explore it with equations.
In short, we are learning how to explore biology as a mathematical
puzzle.
References
Bolender, R. P.
2001a Enterprise Biology Software I. Research (2001) In: Enterprise Biology
Software , Version 1.0 γ 2001 Robert P.
Bolender
Bolender, R. P.
2001b Enterprise Biology Software II. Education (2001) In: Enterprise Biology
Software , Version 1.0 γ 2001 Robert P.
Bolender
Bolender, R. P.
2002 Enterprise Biology Software III. Research (2002) In: Enterprise Biology
Software , Version 2.0 γ 2002 Robert P.
Bolender
Bolender, R. P.
2003 Enterprise Biology Software IV. Research (2003) In: Enterprise Biology
Software , Version 3.0 γ 2002 Robert P.
Bolender
Walthrop, M. M. Complexity. 1992 Simon & Schuster, New York.