Data integration in drug discovery software

Author

Chris Leeding, PhD

Understand how StarDrop connects to your data

One of the challenges facing drug discovery scientists is maintaining consistent data across the different software tools they use in their daily work. Data can come from many different sources and in many different formats. This can easily mean time spent translating between formats at the expense of the interesting work, and an ongoing struggle to ensure that the data they are working with is the best and most up-to-date data available at the time.

At Optibrium we recognise that our customers will often want to make use of other tools as well as those that we develop, and one of our key principles has always been that we make it as easy as possible to switch between tools without disrupting the workflow.

At a very basic level, this means that StarDrop can load data from many standard file formats (SDF, MOL2, SMI, CSV, etc.) as well as our proprietary data formats. It can also save data in those formats so that it can easily be passed to other applications for further analysis or reporting. However, we recognise that this is only part of the solution, so we have also focused on ways to integrate our software seamlessly with other applications, allowing users to move data between them without needing intermediate files at all. To discover how data integration works in StarDrop, read on.

How does StarDrop access my data?

The key to integrating StarDrop with the rest of your technology is a Python interface. This can be used to implement additional functionality within the application. It may sound a little intimidating to users without a technical background, but one of our principal aims in developing this interface has been to shield the user from the details of what is happening under the hood.

Flexibility of Python scripting environment

The interface enables users or administrators to add extra functionality via StarDrop’s Custom Scripts menu.

To assist with deploying and maintaining this functionality, StarDrop can be configured to load scripts from a shared location automatically. This enables an administrator to distribute scripts to all users, and to maintain and update those scripts, without requiring individual users to make any changes themselves. From the user’s perspective, they will simply see a set of new menu items in the StarDrop application and these can be launched in the normal way.
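StarDrop's own script-loading mechanism is configured within the application and is not described in this article. As a rough illustration of the general idea only, the sketch below scans a shared folder for Python scripts and derives a menu label for each one. The folder layout, the `discover_scripts` function and the labelling rule (sub-folders become menu levels, files whose names start with `_` are skipped) are all hypothetical.

```python
from pathlib import Path


def discover_scripts(shared_dir):
    """Return {menu label: script path} for every .py file under shared_dir.

    Hypothetical labelling rule: underscores become spaces, each path part is
    capitalised, and sub-folders become menu sub-levels, e.g. "Queries/Project data".
    """
    root = Path(shared_dir)
    menu = {}
    for path in sorted(root.rglob("*.py")):
        if path.name.startswith("_"):
            continue  # skip helper modules such as __init__.py or _helpers.py
        rel = path.relative_to(root).with_suffix("")
        label = "/".join(part.replace("_", " ").capitalize() for part in rel.parts)
        menu[label] = path
    return menu
```

Because the scan is repeated whenever it runs, an administrator could add or update a script in the shared folder and every user would pick up the change without doing anything themselves.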

Over time we have developed a collection of scripts ourselves. We have also helped many customers develop their own functionality to customise StarDrop for their particular needs. Python is popular for its ease of use and flexibility, and could potentially support a huge range of functionality. In practice, we find that our customers are mostly interested in three broad areas:

Retrieving data from a remote source (e.g. in-house databases) and loading it into StarDrop.
Extracting information from a data set in StarDrop and passing that data to a remote application or service that can generate additional columns in the StarDrop data set.
Sending the contents of a StarDrop data set to a different application, e.g. as an SDF file for the user to work on.

In this article, we’re going to focus on the first of these scenarios.

Query Interface

One of the most popular additions to StarDrop is our Query Interface tool, which can be added to any StarDrop installation through a simple setup process.

The guiding principle behind the Query Interface is that StarDrop users can access data from many different sources via a simple, intuitive user interface without needing to know any details about where the data is stored, how it is configured or how it is accessed.

To run a query, the user selects the properties of interest from a searchable list and specifies filters to apply to the data. For instance, in the screenshot below we have opted to retrieve the chemical structure, corporate ID and some assay data for all compounds with logP in a particular range:

[Screenshot: retrieving chemical structure, corporate ID and assay data for compounds with logP in a particular range]

The user can specify filters on numerical, text and date fields. These can be combined to further restrict the search:

[Screenshot: combining numerical, text and date filters]
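Behind the scenes, a query of this kind has to be translated into whatever the data source understands. Below is a minimal sketch, assuming a simple relational source (an in-memory SQLite table with hypothetical columns) and a hypothetical `build_query` helper. It is not the Query Interface's actual implementation. The selected properties become the column list, and each filter becomes a parameterised condition. All conditions are joined with AND, so every additional filter narrows the result.

```python
import sqlite3

# Hypothetical comparison operators a user might pick for numeric or date fields.
COMPARISONS = {"=", "<", "<=", ">", ">="}


def build_query(table, columns, filters, known_columns):
    """Build a parameterised SELECT from the chosen columns and AND-ed filters.

    filters: list of (column, operator, value) tuples, where operator is a
    comparison, "between" (value is a (low, high) pair) or "contains" (text).
    """
    # Column names cannot be bound as parameters, so check them against a whitelist.
    for col in list(columns) + [f[0] for f in filters]:
        if col not in known_columns:
            raise ValueError(f"unknown column: {col}")

    clauses, params = [], []
    for col, op, value in filters:
        if op == "between":
            clauses.append(f"({col} BETWEEN ? AND ?)")
            params.extend(value)
        elif op == "contains":
            clauses.append(f"({col} LIKE ?)")
            params.append(f"%{value}%")
        elif op in COMPARISONS:
            clauses.append(f"({col} {op} ?)")
            params.append(value)
        else:
            raise ValueError(f"unsupported operator: {op}")

    sql = f"SELECT {', '.join(columns)} FROM {table}"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


# Usage with toy data; dates are stored as ISO strings so they compare correctly.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE compounds (corp_id TEXT, smiles TEXT, logp REAL, assay_date TEXT)")
conn.executemany(
    "INSERT INTO compounds VALUES (?, ?, ?, ?)",
    [
        ("OPT-001", "CCO", -0.3, "2023-01-10"),
        ("OPT-002", "Oc1ccccc1", 1.5, "2023-03-02"),
        ("OPT-003", "CCCCCCCC", 5.2, "2023-05-20"),
    ],
)
known = {"corp_id", "smiles", "logp", "assay_date"}
sql, params = build_query(
    "compounds",
    ["corp_id", "smiles", "logp"],
    [("logp", "between", (1, 3)), ("assay_date", ">=", "2023-02-01")],
    known,
)
rows = conn.execute(sql, params).fetchall()  # only OPT-002 passes both filters
```

Using bound parameters for values, and a whitelist for column names, keeps user-entered filter text from being interpreted as SQL.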

Where the data source supports it, the Query Interface can also filter by chemical structure, as shown below:

[Screenshot: chemical structure filtering]

Customisation

The standard Query Interface can be configured to work with simple database schemas or flat files. But of course, the interesting data isn’t always available in these formats. Therefore, we have worked with many users to develop customised Query Interfaces that work on more complex data structures behind the scenes but deliver exactly the same user experience. This can involve amalgamating data from multiple sources (e.g. a data repository for experimental results and a registration system for chemical structure information), accessing alternative sources of data (e.g. web services) or performing some pre-processing of data (e.g. aggregation of multiple experimental results).
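As a simplified illustration of amalgamating two sources, the sketch below joins assay results from a results repository onto structures from a registration system, using the corporate ID as the link. The data shapes, the `amalgamate` function and the policy of skipping results whose ID is not registered are assumptions for illustration. A real customised Query Interface might do this against databases or web services rather than in-memory Python objects.

```python
def amalgamate(registrations, results):
    """Join assay results onto registered structures by corporate ID.

    registrations: {corp_id: smiles}, e.g. from a registration system
    results: [(corp_id, assay_name, value), ...], e.g. from a results repository
    Returns one row per registered compound, with a list of values per assay.
    """
    table = {cid: {"corp_id": cid, "smiles": smi} for cid, smi in registrations.items()}
    for cid, assay, value in results:
        row = table.get(cid)
        if row is None:
            continue  # result for an unregistered ID: skipped here (an assumed policy)
        row.setdefault(assay, []).append(value)
    return list(table.values())


registrations = {"OPT-001": "CCO", "OPT-002": "Oc1ccccc1"}
results = [
    ("OPT-001", "pIC50", 6.1),
    ("OPT-001", "pIC50", 6.3),
    ("OPT-002", "solubility", 120.0),
    ("OPT-999", "pIC50", 5.0),
]
rows = amalgamate(registrations, results)
```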

As well as handling complex data sources, customisation also enables some more personalised functionality to be added to help you get the most out of your data. Examples include:

View data at different aggregation levels (e.g. batch vs compound)
One-click access to commonly used queries, such as:
- Show all the data for my project
- Show all compounds with data for this assay
- Show all the data for these compounds
See the individual experimental results behind an aggregated value in the data set (e.g. to investigate an apparently anomalous value)
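Viewing data at different aggregation levels, while keeping the underlying results available, can be sketched as follows. The sketch assumes hypothetical batch IDs of the form `<compound ID>-<batch number>` and uses the arithmetic mean as the aggregate. The appropriate aggregation for a given assay (mean, geometric mean, median, etc.) is a per-assay decision that this sketch does not attempt to make.

```python
from statistics import mean


def aggregate(results, level="compound"):
    """Aggregate (batch_id, value) results at "batch" or "compound" level.

    Assumes hypothetical batch IDs such as "OPT-001-02" (compound OPT-001, batch 02).
    The raw values are kept alongside each aggregate so that an unexpected
    value can be traced back to the individual measurements behind it.
    """
    if level not in ("batch", "compound"):
        raise ValueError(f"unknown level: {level}")
    groups = {}
    for batch_id, value in results:
        # At compound level, strip the trailing batch number to get the compound ID.
        key = batch_id if level == "batch" else batch_id.rsplit("-", 1)[0]
        groups.setdefault(key, []).append(value)
    return {
        key: {"mean": mean(values), "n": len(values), "values": values}
        for key, values in groups.items()
    }


results = [("OPT-001-01", 6.0), ("OPT-001-01", 6.5), ("OPT-001-02", 7.0), ("OPT-002-01", 5.0)]
by_batch = aggregate(results, "batch")
by_compound = aggregate(results, "compound")
```

Here `by_compound["OPT-001"]["values"]` holds the three individual measurements behind the compound-level mean, which is the information needed to investigate an apparently anomalous value.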

Other connection models

The Query Interface is a flexible tool for accessing a variety of data sources. However, we have also developed more specific tools to retrieve data from other applications. These include products from other software vendors, such as CDD Vault, and customers’ own in-house applications.

Some of these are already available for download from our website, but we are always happy to discuss extending the range of tools available if you have a particular requirement.

Refresh

In addition to having ready access to data, it is important to keep the data in StarDrop up to date. To address this, StarDrop supports refreshing a data set to include the latest data. This can be applied to any data set generated from the Query Interface, as well as in many more customised scenarios. From the user’s perspective, it takes a single click on a menu item.

StarDrop will repeat the operation that initially generated the data set. It will then merge any new data with the existing data. New and revised data is then highlighted so that the user can easily inspect the changes:

[Screenshot: new and revised data highlighted after a refresh]
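The merge-and-highlight step of a refresh can be illustrated with a small sketch. StarDrop's actual merge rules are not described here. The sketch assumes rows keyed by a corporate ID column and keeps rows that the re-run query no longer returns, which is a hypothetical policy. It records every (ID, column) cell that is new or has a different value, which is the information a user interface would need to highlight the changes.

```python
def refresh(existing, latest, key="corp_id"):
    """Merge re-queried rows into an existing data set and report changed cells.

    existing, latest: lists of row dicts that share a key column.
    Returns (merged_rows, changed_cells), where changed_cells is a set of
    (row key, column) pairs that are new or whose value has changed.
    Rows missing from `latest` are kept unchanged (an assumed policy).
    """
    merged = {row[key]: dict(row) for row in existing}
    changed_cells = set()
    for row in latest:
        k = row[key]
        current = merged.setdefault(k, {})  # a brand-new row starts empty
        for col, value in row.items():
            if col not in current or current[col] != value:
                changed_cells.add((k, col))
                current[col] = value
    return list(merged.values()), changed_cells


existing = [{"corp_id": "OPT-001", "pIC50": 6.1}, {"corp_id": "OPT-002", "pIC50": 5.0}]
latest = [
    {"corp_id": "OPT-001", "pIC50": 6.1, "logD": 2.0},  # new measurement added
    {"corp_id": "OPT-002", "pIC50": 5.4},               # value revised
    {"corp_id": "OPT-003", "pIC50": 7.2},               # new compound
]
rows, changed = refresh(existing, latest)
```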

Interested in trying StarDrop?

Get in touch with the team to discuss your specific projects and needs, and book a software demo.


Chris Leeding, PhD

Chris earned a BSc and PhD in Chemistry from King’s College London, specialising in organic reaction mechanisms. After working at the Royal Society of Chemistry, he joined Inpharmatica in 2006, contributing to early StarDrop features like the Auto-Modeller. At Optibrium, he has driven key developments for StarDrop and, since 2018, has led custom projects integrating Optibrium software with clients’ in-house tools.
