How should I prepare and store my data for cheminformatics applications?
Structuring your cheminformatics data First, the easiest format to work with is a simple table of data, where each row…
The term “information silo” is derived from agricultural usage, where individual silos store different types of grain. However full each silo is, their contents remain physically separated. In the case of information, the separation is more often due to technological circumstances than physical ones. Studies of organisational behaviour show that social and informational separation often becomes more prevalent as the numbers of employees, specialisations and organisational units rise. Given the number of contributors and technologies involved in a typical drug discovery project, it’s easy for data to become isolated and inaccessible to other teams or systems. As a result, we get information silos.
Collecting lots of data is only useful if we’re able to analyse and interpret it correctly and quickly. Often, the platform we use for modelling compound properties or running assays isn’t the same as the platform we use to visualise the resulting data. If we can’t easily move data between systems, we can’t work efficiently, and silos form very quickly. That slows down discovery. Every time this happens, our molecules take longer to reach the market and costs increase. Plus, it wastes scientists’ valuable time. Who wants to spend hours downloading, transforming and uploading data files when they could be experimenting or analysing?
In addition, silos inevitably result in duplicated effort. Two teams in an organisation might be looking at similar data, or repeating investigations unnecessarily, without being aware of each other’s work.
Silos also create data security and quality headaches. It’s much easier to effectively manage data cleaning and retention when there’s a single source to deal with rather than many disconnected systems.
The barriers to breaking down silos are numerous. At a departmental level, we might be struggling with:
Beyond this, further barriers arise. Drug discovery is often competitive in nature. We care about protecting our IP and keeping valuable information to ourselves, so there’s also an industry culture element to the issue.
We’ll always need different systems to accomplish our different tasks; there’ll never be one platform that does it all. So, it’s important to ensure that our software systems are seamlessly integrated.
If you’re interested in hearing about Optibrium’s integration methods in detail, our Director of Implementation, Chris Leeding, has written an excellent series of blogs covering data integration in drug discovery software and AI platforms.
We can also use dedicated collaborative design platforms, which allow multiple users to work on the same data sets and share information easily. Tools like Schrödinger’s LiveDesign, CDD Vault and Optibrium’s Idea Tracker are all purpose-built for collaboration.
In our webinar, Optibrium’s Dr Tamsin Mansley and Janice Darlington of Collaborative Drug Discovery (CDD) discussed the importance of avoiding information silos and enabling collaboration in drug discovery, with live demonstrations of collaborative, integrated software and an audience Q&A. You can watch the event on-demand on YouTube.