Intersecting Sets

This script runs against the Wombat data set and should be installed on the data tree of the compound view. For each of three species (rat, guinea pig, and human) it queries for the compounds that have activity recorded in that species, as listed in the BIO.SPECIES field of the WOMBAT.ACT.LIST table. It then calculates the number of compounds that belong to all three sets. This is a simple script that can serve as the basis for more complex list manipulation and calculation.

// Put this under Wombat (compound view) datatree in sample data project
 
import com.im.df.api.util.*
import com.im.commons.progress.*
import com.im.df.api.dml.*
import com.im.df.api.support.*
 
def rootEty = dataTree.rootVertex.entity
def detailVertex = dataTree.rootVertex.edges.find { it.destination.entity.name == 'WOMBAT.ACT.LIST' }
def detailEty = detailVertex.destination.entity
def fldBioSpecies = detailEty.fields.items.find { it.name == 'BIO.SPECIES' }
 
println 'root entity:' + rootEty
println 'detail entity:' + detailEty
println 'bio species field:' + fldBioSpecies
 
def values = [ 'rat', 'guinea pig', 'human' ]
 
List result
def edp = DIFUtilities.findEntityDataProvider(rootEty)
def env = EnvUtils.createDefaultEnvironmentRO('searching', false)
try {
    values.each { value ->
        // Find the IDs of all root (compound) rows whose BIO.SPECIES equals this value
        DFTermExpression expr = DFTermsFactory.createFieldOperatorValueExpr(Operators.EQUALS, fldBioSpecies, null, value)
        List ids = edp.queryForIds(dataTree, expr, SortDirective.EMPTY, env)
        if (result == null) {
            // First species: start from a copy of its ID list
            result = new ArrayList(ids)
        } else {
            // Later species: keep only the IDs that are also in this list
            result.retainAll(ids)
        }
        println 'Structures for ' + value + ' => ids.size = ' + ids.size()
    }
    println 'Result (intersection of all lists): size = ' + result.size()
} finally {
env?.feedback.finish()
}
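
The intersection step in the script above can be tried outside Instant JChem. The following is a minimal sketch in plain Groovy, assuming hard-coded ID lists in place of the results of edp.queryForIds; the idsBySpecies map and the intersectAll helper are hypothetical names introduced only for this illustration and are not part of the Instant JChem API.

```groovy
// Hypothetical ID lists standing in for the results of edp.queryForIds(...)
def idsBySpecies = [
    'rat'       : [1, 2, 3, 4, 5],
    'guinea pig': [2, 3, 5, 8],
    'human'     : [3, 5, 8, 13]
]

// Intersect any number of ID lists; returns an empty list when given none.
def intersectAll(Collection<? extends Collection> lists) {
    List result = null
    lists.each { ids ->
        if (result == null) {
            result = new ArrayList(ids)   // copy, so the input list is not modified
        } else {
            result.retainAll(ids)         // keep only IDs also present in this list
        }
    }
    return result ?: []
}

def common = intersectAll(idsBySpecies.values())
println 'Common ids: ' + common   // prints "Common ids: [3, 5]"
```

Copying the first list before calling retainAll keeps the original query results intact, so they can still be reported or reused. For very large ID lists, passing a HashSet built from each list to retainAll may be faster, because membership checks on a plain list are linear.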