This article describes DynaCRUX, an extension of the basic CalAST Report Generator software that adds many new features. DynaCRUX stands for Dynamically Create Reports Using XML. The original CalAST report generator design is described here.
While report generators are common in desktop systems running Windows and MacOS, similar software for Linux is less available and generally less complete. The purpose of the new design is to provide a platform and browser-independent reporting facility for use in database-driven web applications. While not as full-featured as commercially available software, DynaCRUX provides a good deal of functionality that web application designers might like to provide to their customers and management team. For many such applications, DynaCRUX is sufficient, and it can be easily extended to add new capabilities.
DynaCRUX is a work in progress, but is rapidly becoming operational. This document is a "white paper" that given a somewhat technical overview of the software. Alternatively, you may learn more about it by reading the DynaCRUX User Guide, which takes you through an example of building a report and all its support files.
The obvious next question is "What does this software do?" Given a database-driven web application, it is useful for the users of such an application to easily pull reports from the database that extract information in a reasonably understandable form. These can be management reports, activity reports, financial reports, or virtually anything that a user of the database might want to know. For example, a company might allow their customers to generate their own reports, freeing up in-house personnel for other tasks.
In general, the following capabilities are desired.
The original report generator software addressed many of these needs, but required a great deal of synchronization. It also had some performance issues due to the nature of the implementation, which sacrificed speed for flexibility. It had no run-time control capability.
The synchronization issue is important, and is best understood in terms of how the software works. Two programs are required. On the client, a GUI (Graphical User Interface) is used to specify which fields are of interest, and which records to retrieve. On the server, a separate program interprets the request and does all the data retrieval. But the fields available always have to be the same in both sides of the program. Adding a new field to the database (and, implicitly, to reports) requires modification of the client side to specify a new item that could be used in a report definition, and the server code has to be taught how to retrieve that data.
While this will always be an issue (one cannot ask for non-existent table fields), DynaCRUX uses a template to drive both sides of the program, and thus to keep both programs "in sync." The template can be retrieved by the client-side GUI code to build a suitable user interface, and the server can use it to figure out how the data is to be retrieved. The use of templates is one of the major improvements in DynaCRUX, as it greatly reduces the amount of code that needs to be updated as the data changes.
To understand how DynaCRUX works, it is helpful to consider how a report is run. We will assume that a report is already defined and stored in the database. Defining and editing of reports is a separate issue, but almost any XML editor could be used. (A template-driven user interface is more useful, of course.) In general, the steps are as follows. Remember that the client side runs in a browser.
As can be seen from the above explanation of how DynaCRUX works, there are really three separate parts possible in a report definition. The controls section is used to generate controls in the GUI so the user can provide run-time parameter values. The retrieval section specifies what fields are to be used for this report. This includes not only what field values are in each line of the report, but also grouping and sorting considerations. The presentation section specifies how data is to be formatted. The contents of this section are closely tied to HTML DIVs and to CSS classes, or use XSL for formatting.
These three section correspond to three phases of operation - parameter specification, data retrieval, and report presentation.
As a general rule, controls will result in the generation of GUI elements that are stored inside a form presented to the user during the first phase. The form is then posted (possibly using Ajax techniques) to the server to pass the user-supplied values along with the retrieval specification. Suppose you wanted to retrieve a list of wines (my favorite example) from only one country, but you wanted to specify which country at run time. You could include a radio button group control in your report specification, with buttons for France, Italy, USA, Australia, etc. This would then be displayed to the user at run time, and you would make sure one was selected (or specify a default). The selected value would then be passed to the server during the data retrieval phase.
In the data retrieval phase, the "middle section" of the report definition is used by the server to build SQL. The SQL query is then executed to retrieve results from the database. The results are then formatted as XML and returned to the client for presentation.
In the last phase, you may use a standard algorithm of your own choosing to display the results as a table to the user. Or, you may specify where the results go as part of the report definition. In such a case, client-side code can direct the results to an appropriate place on the screen and in a specified style.
One of the more difficult ideas in DynaCRUX and its predecessor seems to be the difference between grouping and sorting. In fact, most of the difference between grouping and sorting is in the presentation phase. However, it will always be the case that grouping takes precedence over sorting, and becomes the "primary sort" with sorting (or ordering) coming later. We repeat here the wine list example to clarify things.

Here we have a "restaurant-style" wine list. The varietals (Merlot, Pinot Noir, etc.) are used as groups. Not only are they sorted first, but note that the name of each varietal appears only once as a heading. In contrast, the vintage year is used in sorting but is not used for grouping. The name of the vintner is also a sort term. So if you look at the Pinot Noir part of the table, you will see that the years are in order, and three of the wines are from 2004. The vintage year is repeated - every line has the vintage year displayed. Within each vintage year, the vintner names appear in alphabetic order, so this was also part of a sort. And, as with vintage year, they appear in every line of the table. The repetition of the information is a clue that this was an "order-by" term, rather than a "group-by" term.
Again, this is primarily a presentation issue, although it does imply that groups are the first sort terms. In technical terms, the generated SQL would include something like ORDER BY Varietal,Vintage,Vintner.
Ideally, a GUI is used to define the report, as it can be easily constrained to only those fields that are meaningful to retrieve. A field is not restricted to something that is actually in the database, as you sometimes will want to retrieve something that does not have a specific table and column name, or you might wish to retrieve a named group of fields. The template definition understands how to address this problem, and we will discuss this further in a later section.
Here, we just wish to present a possible GUI and explain how it could be used to specify a report. Consider the following form.

This is an "empty" report definition. It specifies the "middle" portion of the report request, ignoring controls and presentation. It does illustrate how report "objects" can be used to build a report and save it.
The left hand column lists all the objects that are defined by the template. Each item in the template has a "GUI name" that is used to represent the object to the user. This may be different from the name used in the database, and is likely to be different from the name used in data transmission. These last be must be XML-legal tags, and so may not contain spaces, start with numbers, etc.
If a named filter was associated with this report, it would be displayed in the second column.
The third column contains buttons that move objects from one column to another. Buttons whose labels include ">>" move things to the right in the display, while buttons whose labels contain "<<" move things to the left. To move an object from the exclude list to the Group By list, simply click on it, then click on the "Group >>" button to move it. This particular GUI allows multiple selection, so you can move more than one object at a time. Note, however, that only two levels of grouping are permitted. (It gets quite confusing if this is not the case.) There are no such constraints on Sort By or Include.
If you make a mistake, you can select any item from a right hand column and move it back to the Exclude list.
Note that this GUI allows you to save a report by name, but does not have any mechanism for specifying controls, presentation, or filters. In fact, this is really a GUI from the older report generator, although it would be useful (albeit limited) in the DynaCRUX environment. The important idea here is just to understand how such a user interface could look, and to understand that all the objects are derived from the template.
It should also be noted that additional controls are often added to the GUI. These would be design-time controls because their values are determined at the time the report is designed. Every run would use the same values that are built in to the report definition. In contrast, run-time controls are specified as part of the report, but the values are not determined until the user runs the report.
A good example of run-time controls would be specification of a date range for an activity report run once a month on some previous month's data. A design-time control might be something like a specification on a wine list that this report shows only wines currently in stock. Both types become part of a WHERE clause in the generated query; the only difference is when the value of the control is collected. A GUI could easily specify both types of control. It would have to make a distinction between them.
It is now time to look at the XML definition of a report. Since XML always requires one root node, we will always have a <report> node as the outer value. Just as HTML has <head> and <body> sections, the report node has children, in this case representing the three phases: <controls>, <retrieval>, and <presentation>.
The controls section contains one or more <control> specifications, which includes the type and class of control to be displayed. These are specified as attributes of the control node (see below). There are two classes - define and run. A define control is part of the report itself, and does not show on the display at run time. It can be edited as part of the report, and its value is set at that time. A run control is part of the display at run time, since its value must be acquired at that point.
Depending on the type of the control, there will be additional nodes that define how the control is to built and what values are allowed. These should always include a <name> node, but the structure will look different for different controls. Consider the following definition:
<controls>
<control
type="radio" class="run">
<label>Choose a Language: </label>
<name>Language</name>
<button>French</button>
<button>Italian</button>
<button
default="true">Other</button>
</control>
</controls>
We define a single control, representing a radio button group. In this case the GUI should display a label ("Choose a Language:") followed by three buttons, labeled French, Italian, and Other. The other button will be selected when the group is displayed. Since this is expected to be part of a named <form> node in the web page, the form itself can be passed to the server, which will receive the information that Language (the database column name) was set to Other (unless the user changed it). The server side code will need to be written in such a way that it can interpret this information properly in building a WHERE clause.
The <retrieval> section of a report contains information about ordering and inclusion. It also may contain information about design-time controls and filters. Here is the definition of a report that uses a named filter, which must be understood by the server-side code.
<retrieval>
<filter>
<name>openinvoices</name>
</filter>
<group></group>
<order>
<item>Order Date</item>
</order>
<include>
<item>Customer</item>
<item>Purchase Order
No</item>
<item>Caller Name</item>
<item>Caller Phone</item>
<item>Caller
Email</item>
<item>Invoice Number</item>
<item>Invoice
Total</item>
</include>
</retrieval>
This is sufficiently complex that a bit of explanation is required. First, the system has knowledge of a named filter called "openinvoice" that selects only invoices that have not yet been paid. The server-side code transforms this into a suitable part of the WHERE clause at run time (e.g., WHERE Paid IS NULL). Note that we only allow named filters, because allowing the user to specify raw SQL would be a huge security risk.
For sorting purposes, we note that there is no grouping in this report, but we do sort by order date. This is specified as an <item> node under the <order> subsection.
The <include> list contains a number of <item> nodes specify data objects to be retrieved from the database for each member of the result set.
Please note that this returns a result set in XML, where each row contains all the information specified by the <group>, <order>, and <include> terms. It is also possible (and sometimes quite useful) to return one or more values that correspond to the report itself, such as the number of items found. To achieve this, we use named <query> nodes. Again, we must prevent SQL injection attacks, so we specifically disallow the inclusion of SQL in the report specification itself. However, we can use a named object that the server side code understands, just as with filters. So we could add a <query> node with value resultSetSize, if that was a name known to the server-side code. This value would then be returned with the result set so we could display it as part of the report presentation.
How this specification can be implemented through SQL generation on the server side presents some choices. In the older software, it was done in two phases. The first phase generated an ordered list of objects, each of which was a unique identifier (the primary key) of a specific table. A separate XML string was built from the report definition. It specified a base table and key, plus the <group> and <order> specification. The base table and key name were used to build a SELECT statement, the <group> and <order> sections were used to build an ORDER BY clause. Any design-time controls were used for filtering as part of a WHERE clause. (Run-time controls and filters were not available.)
In the second phase, software "walked the list" of objects returned in the first phase and retrieved all the data values specified by <group>,<order>, and <include> groups for each object. This worked well, but was understandably slow. So in DynaCRUX, it was decided to move this work from the client to the server, and eliminate much of the traffic.
We generally recommend simply passing the entire <retrieval> section to the server and let it implement any algorithm that returns a correct answer as a result set, preferably one that is efficient. It is clearly possible to build a somewhat complex SQL query that will retrieve all this data in one pass. That is usually the way that commercial report generators work. We supply example code that implements all the concepts described here so you can see one possible solution.
Much of the power and maintainability of DynaCRUX comes from the fact that it avoids hard-coded information wherever possible, and uses templates to build both client and server-side code. Of course, the template primarily resides on the server. However, using Ajax techniques, it is possible to retrieve the template at run time and use it to build a GUI that only presents choices that are meaningful. Since the server-side code includes the template as well, it is highly likely that synchronization can be maintained between client and server without a great deal of programming time. In most cases, the system can then be extended by simply making additions to the template.
You should look at the template as exporting a working definition of the database for reporting purposes. You don't need to describe everything in the database, but you do have to provide enough information to allow DynaCRUX to generate a proper query for data retrieval purposes.
What does the template describe?
Clearly, every database is different, but DynaCRUX attempts to provide a language that can describe fairly complex relationships within a database. The Database Template is an XML document that provides such a description. Note that there is only one template document per database, but an arbitrary number of XML report documents are possible. This is why these are often stored in the database itself.
Some of the challenges that we faced in designing the template is that a given data object may or may not have an actual representation in the database, and that the naming rules for GUI, data transmission, and database column names may all be different. Thus, the template system must allow for all these possibilities and more.
We will start with the retrieval objects themselves.
Each object that can be specified in the retrieval section has a number of attributes specified in the template. But is is easier to understand them if we first explain the types of these objects that are allowed.
Here, then, are the attributes. Any attribute not required is simply omitted. It is not necessary to specify an empty node.
So for example, we might have:
<object>
<displayname>Order Date</displayname>
<queryname>OrderDate</displayname>
<xmlname>order_date</xmlname>
<type>element</type>
<table>Orders</table>
<format>date</format>
</object>
If the user chose to make this part of a report, then this would result in generated SQL containing something like:
SELECT ...Orders.OrderDate... FROM ...Orders...WHERE ...
Another interesting possibility is that you might wish your reports to contain all the columns in a table as a group, or some set that are always treated as a group. For example, in our wine database, we might have a table of purchases which specify the date, source, price, and number of bottles purchased. (This would also refer to a wine on the master wine list.) If you wanted to restrict the user to always retrieving this information as a group, then this can be specified in XML as follows.
<object>
<displayname>Purchase Info</displayname>
<xmlname>purchaseinfo</xmlname>
<type>group</type>
<table>purchases</table>
</object>
<object>
<displayname>Date</displayname>
<queryname>PurchaseDate</queryname>
<xmlname>purchasedate</xmlname>
<parent>Purchase Info</parent>
</object>
<object>
<displayname>Source</displayname>
<queryname>Source</queryname>
<xmlname>source</xmlname>
<parent>Purchase Info</parent>
</object>
<object>
<displayname>Price</displayname>
<queryname>BottlePrice</queryname>
<xmlname>price</xmlname>
<parent>Purchase Info</parent>
</object>
<object>
<displayname>Quantity</displayname>
<queryname>Quantity</queryname>
<xmlname>quantity</xmlname>
<parent>Purchase Info</parent>
</object>
Requesting a Purchase Info object will retrieve all four objects that have Purchase Info as a parent. Note that you cannot retrieve the Quantity object directly in the above example, since it contains no table information. If a complete specification is provided, then you could also retrieve it separately from its group. But "child" objects in a group don't need a table specification, since they must all come from the table specified in the parent.
Of course, most objects you will wish to use in a report are simple, but it is often the case that more than one table will be involved. In such a case, the server code must also understand the relationship between the tables. The server-side code must note which tables are used in the retrieval request, and build a FROM clause with JOINs dynamically in response. This is necessary to allow you to retrieve information from tables other than the primary one, but only where fields match.
Table relationships are managed in the tables section of the template. Going back to our wine list example, we wish to have a base table that relates to other tables in the database. But the wine list table is the "heart of the system" in some ways. Not all databases are designed this way, but it is useful to specify such a base table. Here is the definition:
<table>
<name>winelist</name>
<key>WineListID</key>
<references foreignkey="VarietalID">varietals</references>
<references foreignkey="VintnerID">vintners</references>
<references foreignkey="LocationID">locations</references>
<references foreignkey="RegionID">regions</references>
</table>
The template definition of the table specifies its database name, the name of the primary key if present, and notes that four columns are foreign keys that reference other tables. The tables that they reference are the values of the references node. Attributes are used to specify the column name in the winelist table that contains the foreign key. Note that we do not need to specify the names of the primary keys of the other tables in this example, since by default it assumes the primary key is used. Also, we don't specify the names of all the columns in the winelist table. Any columns we wish to use in reports are contained in objects.
There are many more types of relationships possible in a database. In the above table definition, the base table references other tables. But other tables (e.g., purchases) can have foreign keys that refer back to the base table in a many-to-one relationship, and it is not uncommon to have relationships that reference something other than the primary key of a table. All these possibilities can be specified in the template document.
There are many ways that database contents may be presented to provide useful information to a web application "operator" in any role.
In truth, the list of possible reports that people may wish to run is practically endless, and changes with some frequency Thus, flexibility in specifying reports (both one-time and recurring) is key. But it is also necessary to produce reports in different formats. For example, a list of all the inventory items that are currently out of stock is probably best presented as a table. But a customer invoice is much more complex, having sections for billing address, work or product line items, totals, terms, invoice numbers, etc. Besides, you will probably want your company logo as part of an invoice.
Many database front ends (e.g., Microsoft Access) provide this capability, but databases built without such a "wrapper" will not provide much in terms of report generation. Even where provided, it may not meet a company's needs. Thus DynaCRUX is designed to give the user control of how a report looks.
Since DynaCRUX is itself a web application, it renders its reports to a web browser. Once there, the browser's built-in print capability is used to make paper copies of a report. Or, if there is a PDF printer driver installed, the report can be converted to that format and saved.
But sometimes the user just wants to save the report on a local hard drive for further manipulation. Most commonly, this would be done by a spreadsheet program, and DynaCRUX has the ability to export data in either CSV (Comma Separated Values) form or XML form for importation into a spreadsheet program.