Example 1: The first example shows a 'schema level' script written in the Groovy language. The script will calculate the Pearson linear correlation coefficient for two variables. The data for the variables are obtained as entity.field(s) contained in a given data set defined in SQL. We think this script might be highly useful when deciding upon the distinct variables to employ in the construction and analysis of QSAR expressions/models, derived from your data model. In order to help show the script as a useful template, for other such approaches, we can break the script into it's three conceptual parts:
A Groovy class called 'QueryBuilder' which will return a 'sample' result set from a RDBMS table.
A Groovy function called 'Pearson' that implements the correlation coefficient over the sample set
A Groovy SWING window which provides the user interface to the record set and Pearson computation
QueryBuilder class This class obtains an internal injected connection and executes the stated SQL statement in order to obtain the sample results set. This class can be instantiated with a SQL WHERE clause filter. It returns the fields from a JChem table separately for binding to the SWING controls and for use in the calculations. Currently, the SQL must be edited in the class, by the user who runs the script. The entity in use is a JChem structures entity, with logP and TPSA field calculations already added.
Pearson function This function accepts two lists of data (X and Y) and returns the Pearson Correlation coefficient. It is easy to replace this function with your own linear correlation coefficient implementation.
Swing interface The SWING interface shows the results of the SQL statement and provides two navigation buttons 'Next' and 'Previous'. There are two drop down boxes that can be used to select the 'X' and 'Y' variable choices for the Pearson calculation. Finally, there is a button to compute the coefficient. This calls the function and sets the adjacent text box value with the return result 'r'. You are encouraged to try different SWING formats! Currently the drop down box is hard coded with known fields for the entity defined.
Example 2: The second example is a more evolved version. The code(function) for calculating the Pearson coefficient now includes a call to an external grape library. This should be a more efficient way and might handle much larger X and Y data sets better than completing the same calculations in the Groovy language. Coming soon in version 5.9!