Cerella™ Release Notes
Version 1.1.12 release notes (September 2023)
New Features
Data upload and model-building
- Endpoint data for duplicate compounds are now merged using a default or user-defined merge rule (previously, data was taken from a single compound)
- The minimum and maximum number of iteration layers may be specified for the impute and virtual models in hyperparameter optimisation
Data source configuration
- A merge rule for duplicate compounds can now be specified for each endpoint
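As an illustration only, a merge rule can be thought of as a function that reduces all measured values for the same compound and endpoint to a single value. The sketch below assumes a hypothetical set of rule names (mean, median, min, max, first) and a hypothetical `merge_duplicates` helper; the rule names and the default actually used by Cerella are not stated in these notes.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical merge rules; Cerella's actual rule names and default are not documented here.
MERGE_RULES = {
    "mean": mean,
    "median": median,
    "min": min,
    "max": max,
    "first": lambda values: values[0],
}


def merge_duplicates(rows, rules, default_rule="mean"):
    """Merge (compound_id, endpoint, value) rows into one value per compound and endpoint.

    `rules` maps an endpoint name to a rule name; endpoints without an entry
    fall back to `default_rule`.
    """
    grouped = defaultdict(list)
    for compound_id, endpoint, value in rows:
        grouped[(compound_id, endpoint)].append(value)

    merged = {}
    for (compound_id, endpoint), values in grouped.items():
        rule = MERGE_RULES[rules.get(endpoint, default_rule)]
        merged[(compound_id, endpoint)] = rule(values)
    return merged


rows = [
    ("C1", "logD", 1.0),
    ("C1", "logD", 2.0),
    ("C1", "pIC50", 6.0),
    ("C1", "pIC50", 7.5),
    ("C2", "logD", 3.0),
]
# Per-endpoint rule for pIC50; logD uses the default rule.
merged = merge_duplicates(rows, {"pIC50": "max"})
```

Here the duplicate logD values for C1 are averaged by the default rule, while the pIC50 duplicates keep the largest value because a per-endpoint rule was supplied.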
Changes
Data upload and model-building
- Updated to Alchemite v0.71.1 and binary version 20230728
- Synonyms are now extracted from CDD Vault when creating a data source
- The CDD Vault integration has been updated to use the get readout rows API endpoint
- The data upload process is now run in a Kubernetes job
- Predicted values from previous model-building runs are removed from the data matrix if the compound is not included in the uploaded data source(s)
- The Upload data, training and prediction option now uses the current user-defined test set (if this has been changed since hyperparameter optimisation was last run)
Web UI
- Compounds and endpoints are now reported for each data source in the data upload report
REST API
- Queries with the unsupported NOT EXISTS operator now result in a clear error message
- The performance of query interface data file upload has been improved
- The query interface data source status endpoint now returns the data source name and type
Bug Fixes
Data upload and model-building
- Fixed a bug that resulted in the data upload report generation failing on an invalid structure
- Fixed a bug that resulted in the structure tracker failing to retrieve the correct compounds if client IDs contained hyphens
Web UI
- Fixed a bug that resulted in the data upload report view not being updated if data upload was re-run
REST API
- Fixed a problem with inconsistent factor errors of x0 and x1 for a Cerella-transformed endpoint
Version 1.1.11 release notes (July 2023)
New Features
Data upload and model-building
- Added a new data upload report, summarising the data sources, compounds and endpoints uploaded for model building.
Web UI
- Added a new System Logs panel, allowing logs for data upload and model-building processes to be viewed and downloaded for a specified date and time range.
- Added the ability for the full matrix of measured and predicted values from the impute, virtual and selected models to be downloaded for the training and test set.
Changes
Data upload and model-building
- The data upload pipeline is now more resilient and retries REST API requests that fail.
- The data set split for hyperparameter optimisation is now run as a Kubernetes job rather than on the ingest-upload server to increase available memory.
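These notes do not describe the retry policy the data upload pipeline uses. A common approach is to retry transient failures with exponential backoff; the sketch below assumes a hypothetical `send_request` callable, a fixed attempt limit and a doubling delay, and is not Cerella's implementation.

```python
import time


def call_with_retries(send_request, max_attempts=3, base_delay=0.5,
                      retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Call send_request(), retrying transient errors with exponential backoff.

    `send_request` is a hypothetical zero-argument callable that performs one
    REST API request. The delay doubles after each failure (0.5 s, 1 s, 2 s, ...).
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return send_request()
        except retry_on:
            if attempt == max_attempts:
                raise  # give up and re-raise the last error
            sleep(base_delay * 2 ** (attempt - 1))
```

Passing `sleep` as a parameter keeps the function testable without real delays; only the listed exception types are retried, so genuine client errors still fail immediately.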
Web UI
- Removed old log download links from the System Status panel.
Bug Fixes
Data upload and model-building
- Fixed a bug stopping the data source status API endpoint from including custom data sources.
Web UI
- Fixed a bug that stopped data source checkboxes from appearing on the Add User panel.
- Fixed a bug that resulted in values being read from the wrong file for virtual and selected individual endpoint results.
REST API
- Fixed a bug that stopped Cerella prediction requests from returning results for valid compounds if there was a single invalid structure within the batch.
- Fixed a bug that resulted in missing predictions when measured input values were 0.
Version 1.1.10 release notes (May 2023)
New Features
Data upload and model building:
- Added support for input-only endpoints.
- The hyperparameter validation combination method can now be set to mean rather than median (median remains the default)
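The notes do not say exactly which validation scores are combined. Assuming one score per validation fold, the choice of method matters most when a single fold is an outlier. The hypothetical `combine_validation_scores` helper and the fold scores below are illustrative only.

```python
from statistics import mean, median


def combine_validation_scores(scores, method="median"):
    """Combine per-fold validation scores; median is the default, mean is optional."""
    combiners = {"median": median, "mean": mean}
    if method not in combiners:
        raise ValueError(f"unknown combination method: {method!r}")
    return combiners[method](scores)


# Illustrative scores: three similar folds and one poor fold.
fold_scores = [0.71, 0.69, 0.73, 0.12]
median_score = combine_validation_scores(fold_scores)               # robust to the poor fold
mean_score = combine_validation_scores(fold_scores, method="mean")  # pulled down by the poor fold
```

The median ignores the single poor fold, whereas the mean reflects it; which behaviour is preferable depends on whether such folds should count against a hyperparameter set.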
Data source configuration:
- Added support for input-only endpoints
Changes
Web UI:
- Users can no longer delete their own UserAdmin permissions or change their own Active status
- The data source file name is now displayed for flat-file data sources
Bug Fixes
Web UI:
- The endpoint rollback axis label is now included in any saved image
- The axis label wording on the endpoint rollback plot has been improved
Version 1.1.9 release notes (April 2023)
New Features
Data upload and model building:
- Updated to Alchemite v0.61.1
Cerella API:
- Updated to Alchemite v0.61.1
Web UI:
- Added a mailto link for Cerella support
Changes
Data upload and model building:
- Refactored the virtual model validation process to use Alchemite’s analyse_validate job with the virtualExperimentValidation option
- Tidied information reported in data upload and model building logging
- Modified rules to apply log10 transformations to certain bounded percentages by default
Data source configuration:
- Added molecule name and cdd_registry_number to CDD Vault data source properties
Cerella API:
- Tidied information reported in server logging
- Added optimistic concurrency control for custom descriptors endpoints
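Optimistic concurrency control lets clients update a shared resource without locking it: each update states the version the client last read, and the update is rejected if the stored version has changed in the meantime. These notes do not describe how the custom descriptors endpoints expose this (HTTP `ETag` and `If-Match` headers are one common mechanism); the in-memory `DescriptorStore` below is a hypothetical sketch of the general technique only.

```python
class VersionConflict(Exception):
    """Raised when an update is based on a stale version."""


class DescriptorStore:
    """Hypothetical in-memory store illustrating optimistic concurrency control."""

    def __init__(self):
        self._items = {}  # name -> (version, definition)

    def get(self, name):
        """Return (version, definition) for an existing item."""
        return self._items[name]

    def put(self, name, definition, expected_version):
        """Write `definition` only if the stored version equals `expected_version`.

        A new item has version 0, so creating it requires expected_version=0.
        Returns the new version number.
        """
        current_version = self._items.get(name, (0, None))[0]
        if expected_version != current_version:
            raise VersionConflict(
                f"{name}: expected version {expected_version}, found {current_version}"
            )
        new_version = current_version + 1
        self._items[name] = (new_version, definition)
        return new_version
```

If two clients read the same version and both try to write, the first write succeeds and bumps the version; the second raises `VersionConflict`, so the client must re-read and retry rather than silently overwriting the other change.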
Web UI:
- Updated copyright statements
Bug Fixes
Data upload and model building:
- Fixed a data upload and prediction failure encountered when a new data set had additional columns not included in the original model training
- Fixed an inconsistency in median R² values from internal validation (hyperparameter optimisation). Alchemite now combines the predictions for each fold and then computes R², instead of computing the R² for each fold individually and then averaging
- Tidied logging rules to ensure that identifiers of invalid structures are not written to the data upload log
- Tidied logging rules to ensure that endpoint group and display group information is not written to the data upload log
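The difference between the two R² calculations described in the fix above can be seen in a small sketch. It uses the standard definition R² = 1 - SS_res / SS_tot with made-up data for two folds; it illustrates why pooling and per-fold averaging disagree and is not Alchemite's code.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def per_fold_average_r2(folds):
    """Previous approach: compute R² for each fold, then average."""
    return sum(r_squared(t, p) for t, p in folds) / len(folds)


def pooled_r2(folds):
    """Current approach: combine the predictions from all folds, then compute R² once."""
    all_true = [t for fold_true, _ in folds for t in fold_true]
    all_pred = [p for _, fold_pred in folds for p in fold_pred]
    return r_squared(all_true, all_pred)


# Illustrative data: two folds of (measured values, predicted values).
folds = [
    ([1.0, 2.0, 3.0], [1.5, 2.0, 2.5]),
    ([4.0, 5.0, 6.0], [4.5, 5.0, 5.5]),
]
```

Each fold here has R² = 0.75, but the pooled R² is about 0.94, because SS_tot over the pooled data includes the spread between folds. Computing R² per fold therefore depends on how the data happen to be split, which pooling avoids.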
Web UI:
- Removed unnecessary whitespace displayed in the importance matrix plot
Version 1.1.8 release notes (March 2023)
New Features
Data upload and model building:
- Added support for specifying priority endpoints in hyperparameter optimisation
- Enabled administrators to specify and store data upload and model building configuration
- Enabled imputation and virtual model hyperparameter optimisation processes to run concurrently
Changes
Data upload and model building:
- Modified the wording for the data upload and model building steps in the administration interface
- Improved exception traceback logging in the data upload and model building pipeline
- Provided access to separate imputation and virtual hyperparameter optimisation logs
- Added timings for model training, validation and prediction in the administration interface
- Updated the presentation of data upload and model building logs and timings in the administration interface
- Excluded endpoints without any validation results from validation reports
- Added help text for model building and prediction options in the administration interface
Data source configuration:
- Added help text for the ‘Hidden’ checkbox in the data source configuration editor
- Added details of transformations to the units report for data source endpoints
Cerella API:
- The predicted value rejection threshold is now loaded from the configuration server rather than from the environment
Bug Fixes
Data upload and model building:
- Fixed the behaviour of endpoints with an empty measurement group; each endpoint is now placed in a separate endpoint group
- If the data upload step fails, the pipeline now terminates, rather than running any subsequent model building steps
- Fixed model validation failures that occurred due to the virtual test values file persisting between model building runs
- Ensured that Cerella no longer treats an empty string as a category value in data upload
- A meaningful error message is now returned, instead of ‘internal server error’, when model statistics are requested while the Cerella engine is unavailable
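The endpoint grouping fix can be sketched as follows: endpoints that share a non-empty measurement group are placed in the same endpoint group, while each endpoint with an empty measurement group gets a group of its own. The `assign_endpoint_groups` helper and its tuple-based group keys are hypothetical; the notes do not say how Cerella names the resulting groups.

```python
from collections import defaultdict


def assign_endpoint_groups(measurement_groups):
    """Map endpoints to endpoint groups.

    `measurement_groups` maps endpoint name -> measurement group (may be "" or None).
    Endpoints sharing a non-empty measurement group are grouped together; an endpoint
    with an empty group is keyed by its own name, so it forms a separate group.
    Tuple keys keep the two kinds of group from colliding.
    """
    groups = defaultdict(list)
    for endpoint, group in measurement_groups.items():
        key = ("group", group) if group else ("endpoint", endpoint)
        groups[key].append(endpoint)
    return dict(groups)


groups = assign_endpoint_groups(
    {"logD": "phys-chem", "solubility": "phys-chem", "pIC50_A": "", "pIC50_B": None}
)
```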
Data source configuration:
- Fixed a problem with transformation editing for numeric data source endpoints
- Fixed an issue that meant the endpoint group could not be changed from ‘Mixed’ when multiple endpoints were selected
Cerella API:
- Fixed a failure (reported as ‘internal server error’) to return predicted Cerella values if the request did not include measured values