Cerella™ Release Notes

Version 1.1.15 release notes (January 2024)

New Features

Web UI

  • The name of any uploaded test set identifier file for a data source is displayed
  • All visible users and data sources can be selected or deselected with a single click

Changes

Data upload and model building

  • The storage server has been replaced by an AWS EFS volume mount
  • Updated to Alchemite v0.82.0

Web UI

  • Added a vertical scroll bar to the Model Comparison table

Bug Fixes

Data upload and model building

  • Fixed an issue with missing permissions for a model comparison log query
  • Fixed an issue with permissions to allow obsolete endpoints to be deleted during a scheduled data upload
  • Updated to Alchemite v0.82.0
    • Fixed a model server timeout on startup
    • Fixed model server blocking requests
    • Fixed an issue causing the hyperparameter optimisation to miss cycles

Web UI

  • Fixed a failure to show transformations configured as None correctly when a data source was created from a JSON data source configuration file

Version 1.1.14 release notes (December 2023)

New Features

Data upload and model-building

  • Added the ability to upload a CSV file of compounds and endpoint data in order to make predictions using the impute, virtual and selected models for comparison. The endpoints in the CSV file should match those in the current Cerella models

Data source configuration

  • Added the option to ignore qualifiers on values for an individual endpoint in data upload and model building

Web UI

  • Added a password reveal feature
  • Added the ability to see the status of custom data sources

Changes

Data upload and model-building

  • Changed the default p-value for rejecting low-confidence predictions to 0.3

Web UI

  • Added an inactivity timeout of 15 minutes

Bug Fixes

Web UI

  • Fixed an issue where the Update button in the System Management pane was not visible when browser scaling was applied
  • Fixed a copy to clipboard bug on Firefox

Version 1.1.13 release notes (October 2023)

New Features

Data upload and model-building

  • Endpoints with low variance are listed in the data upload report
  • Highly correlated endpoints are listed in the data upload report
  • Multiple scheduled data upload and model-building jobs can be defined with different model-building options
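
The low-variance and high-correlation checks listed above can be illustrated with a minimal sketch. The function name, the thresholds and the use of Pearson correlation are assumptions made for illustration; these notes do not state how Cerella computes either check.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, pvariance


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length value lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    norm = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return cov / norm


def flag_endpoints(columns, min_variance=1e-6, max_abs_corr=0.95):
    """Return (low-variance endpoints, highly correlated endpoint pairs).

    columns maps an endpoint name to its complete list of values.
    Both thresholds are hypothetical defaults, not Cerella settings.
    """
    low_variance = [name for name, values in columns.items()
                    if pvariance(values) < min_variance]
    # Low-variance endpoints are skipped here to avoid dividing by zero.
    candidates = [name for name in columns if name not in low_variance]
    correlated = [(a, b) for a, b in combinations(candidates, 2)
                  if abs(pearson(columns[a], columns[b])) > max_abs_corr]
    return low_variance, correlated
```

The sketch assumes every endpoint has a value for every compound; sparse data would need the pairwise overlap to be handled first.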

Web UI

  • Rollback data for all endpoints can be downloaded in a single file

Changes

Data upload and model-building

  • If a chemical structure represents more than one discrete component (e.g., a mixture), only the largest is used in the descriptor calculator

Web UI

  • In the data source configuration, the Use factor checkbox has been replaced with an Error type selector
  • The performance of endpoint rollback display and download for individual endpoints has been improved

Bug Fixes

Data upload and model-building

  • Fixed a bug that resulted in an out-of-bounds error in endpoint rollback calculation
  • Fixed a bug that resulted in a JSON parse failure when retrieving query results in the data upload pipeline

Web UI

  • Removed unnecessary error messages that were shown when the user did not have permission to access certain controls

REST API

  • Fixed a bug that resulted in model services returning error status in a clean Cerella installation
  • Fixed a bug that could result in Cerella values not being retrieved in a query
  • Fixed a bug that stopped suggested measurements being returned if there was a duplicate endpoint in the request

Version 1.1.12 release notes (September 2023)

New Features

Data upload and model-building

  • Endpoint data for duplicate compounds are now merged using a default or user-defined merge rule (previously, data was taken from one compound)
  • The minimum and maximum number of iteration layers may be specified for the impute and virtual models in hyperparameter optimisation
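
As a rough illustration of merging endpoint data for duplicate compounds, the sketch below applies a per-endpoint rule with a default fallback. The rule names (mean, median, min, max), the record layout and the function name are hypothetical; the merge rules Cerella actually offers are not listed in these notes.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical merge rules, chosen only for illustration.
MERGE_RULES = {"mean": mean, "median": median, "min": min, "max": max}


def merge_duplicates(records, rules, default_rule="mean"):
    """Merge values recorded for the same compound and endpoint.

    records: iterable of (compound_id, endpoint, value) tuples.
    rules:   maps an endpoint name to a rule name; other endpoints
             fall back to default_rule.
    """
    grouped = defaultdict(list)
    for compound_id, endpoint, value in records:
        if value is not None:  # missing measurements do not take part
            grouped[(compound_id, endpoint)].append(value)

    merged = {}
    for (compound_id, endpoint), values in grouped.items():
        rule = MERGE_RULES[rules.get(endpoint, default_rule)]
        merged[(compound_id, endpoint)] = rule(values)
    return merged
```

With this layout, a duplicate compound contributes all of its measurements to the merged value rather than only those from one record.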

Data source configuration

  • A merge rule for duplicate compounds can now be specified for each endpoint

Changes

Data upload and model-building

  • Updated to Alchemite v0.71.1 and binary version 20230728
  • Synonyms are now extracted from CDD Vault when creating a data source
  • The CDD Vault integration has been updated to use the get readout rows API endpoint
  • The data upload process is now run in a Kubernetes job
  • Predicted values from previous model-building runs are removed from the data matrix if the compound is not included in the uploaded data source(s)
  • The Upload data, training and prediction option now uses the current user-defined test set (if this has been changed since hyperparameter optimisation was last run)

Web UI

  • Compounds and endpoints are now reported for each data source in the data upload report

REST API

  • Queries with the unsupported NOT EXISTS operator result in a clear error message
  • The performance of query interface data file upload has been improved
  • The query interface data source status endpoint now returns the data source name and type

Bug Fixes

Data upload and model-building

  • Fixed a bug that resulted in the data upload report generation failing on an invalid structure
  • Fixed a bug that resulted in the structure tracker failing to retrieve the correct compounds if client IDs contained hyphens

Web UI

  • Fixed a bug that resulted in the data upload report view failing to be updated if data upload was re-run

REST API

  • Fixed a problem with inconsistent factor errors of x0 and x1 for a Cerella-transformed endpoint

Version 1.1.11 release notes (July 2023)

New Features

Data upload and model-building

  • Added a new data upload report, summarising the data sources, compounds and endpoints uploaded for model building

Web UI

  • Added a new System Logs panel, allowing logs for data upload and model-building processes to be viewed and downloaded for a specified date and time range
  • Added the ability for the full matrix of measured and predicted values from the impute, virtual and selected models to be downloaded for the training and test set

Changes

Data upload and model-building

  • The data upload pipeline is now more resilient and retries REST API requests that fail
  • The data set split for hyperparameter optimisation is now run as a Kubernetes job rather than on the ingest-upload server to increase available memory
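
The retry behaviour mentioned above could look something like the following sketch. The attempt count, the exponential backoff and the choice of ConnectionError as the retryable failure are all assumptions; the notes do not describe the policy the pipeline uses.

```python
import time


def call_with_retries(request, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call request() and retry on ConnectionError with exponential backoff.

    request is any zero-argument callable that performs the REST call.
    The sleep argument is injectable so the delays can be checked in tests.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return request()
        except ConnectionError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last failure
            # Wait 1x, 2x, 4x ... base_delay before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
```

Only failures that are likely to be transient should be retried this way; a request rejected as invalid would fail identically on every attempt.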

Web UI

  • Removed old log download links from the System Status panel

Bug Fixes

Data upload and model-building

  • Fixed a bug stopping the data source status API endpoint from including custom data sources

Web UI

  • Fixed a bug that stopped data source checkboxes from appearing on the Add User panel
  • Fixed a bug that resulted in values being read from the wrong file for virtual and selected individual endpoint results

REST API

  • Fixed a bug that stopped Cerella prediction requests from returning results for valid compounds if there was a single invalid structure within the batch
  • Fixed a bug that resulted in missing predictions when measured input values were 0

Version 1.1.9 release notes (April 2023)

New Features

Data upload and model building:

  • Updated to Alchemite v0.61.1

Cerella API:

  • Updated to Alchemite v0.61.1

Web UI:

  • Added a mailto link for Cerella support

Changes

Data upload and model building:

  • Refactored the virtual model validation process to use Alchemite’s analyse_validate job with the virtualExperimentValidation option
  • Tidied information reported in data upload and model building logging
  • Modified rules to apply log10 transformations to certain bounded percentages by default

Data source configuration:

  • Added molecule name and cdd_registry_number to CDD Vault data source properties

Cerella API:

  • Tidied information reported in server logging
  • Added optimistic concurrency control for custom descriptors endpoints
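
Optimistic concurrency control generally means that a write succeeds only if the resource has not changed since the caller last read it. The in-memory sketch below shows the idea with a version counter. The class, method and exception names are hypothetical and do not describe the Cerella API, whose request format is not given in these notes.

```python
class VersionConflict(Exception):
    """Raised when a write is based on a stale version of the resource."""


class DescriptorStore:
    """Toy store that keeps a version number alongside each descriptor."""

    def __init__(self):
        self._items = {}  # name -> (version, value)

    def get(self, name):
        """Return (version, value) so the caller can echo the version back."""
        return self._items[name]

    def put(self, name, value, expected_version):
        """Write value only if expected_version matches the stored version.

        A descriptor that does not exist yet has version 0.
        """
        current_version = self._items.get(name, (0, None))[0]
        if expected_version != current_version:
            raise VersionConflict(
                f"{name}: expected version {expected_version}, "
                f"found {current_version}"
            )
        self._items[name] = (current_version + 1, value)
        return current_version + 1
```

If two clients read version 1 and both try to write, the first write succeeds and the second receives a conflict, so it must re-read before retrying instead of silently overwriting the first change.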

Web UI:

  • Updated copyright statements

Bug Fixes

Data upload and model building:

  • Fixed a data upload and prediction failure encountered when a new data set had additional columns not included in the original model training
  • Fixed an inconsistency in median R² values from internal validation (hyperparameter optimisation). Alchemite now combines the predictions for each fold and then computes R², instead of computing the R² for each fold individually and then averaging
  • Tidied logging rules to ensure that identifiers of invalid structures are not written to the data upload log
  • Tidied logging rules to ensure that endpoint group and display group information is not written to the data upload log
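
The difference between the two R² calculations described above can be seen in a small worked sketch. The function names and the example data are illustrative only and are not taken from Alchemite.

```python
from statistics import mean


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def per_fold_r_squared(folds):
    """Previous approach: one R² per fold, to be averaged afterwards."""
    return [r_squared(observed, predicted) for observed, predicted in folds]


def pooled_r_squared(folds):
    """Current approach: combine all fold predictions, then compute one R²."""
    observed = [o for fold_observed, _ in folds for o in fold_observed]
    predicted = [p for _, fold_predicted in folds for p in fold_predicted]
    return r_squared(observed, predicted)


# Two folds of (observed, predicted) values covering different value ranges.
folds = [
    ([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]),
    ([10.0, 11.0, 12.0], [11.0, 11.0, 11.0]),
]
print(mean(per_fold_r_squared(folds)))  # average of the per-fold values
print(pooled_r_squared(folds))          # single value over pooled predictions
```

In this example the second fold spans a narrow range, so its own R² is 0 even though its predictions are close in absolute terms. Averaging the per-fold values gives 0.5, while the pooled calculation gives roughly 0.98 because the total variance is measured across all folds.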

Web UI:

  • Removed unnecessary whitespace displayed in the importance matrix plot

Version 1.1.10 release notes (May 2023)

New Features

Data upload and model building:

  • Added support for input-only endpoints
  • Enabled the hyperparameter validation combination method to be set to mean rather than median, which remains the default

Data source configuration:

  • Added support for input-only endpoints

Changes

Web UI:

  • Users can no longer delete their own UserAdmin permissions or change their own Active status
  • The data source file name is now displayed for flat-file data sources

Bug Fixes

Web UI:

  • The endpoint rollback axis label is now included in any saved image
  • The axis label wording on the endpoint rollback plot has been improved

Version 1.1.8 release notes (March 2023)

New Features

Data upload and model building:

  • Added support for specifying priority endpoints in hyperparameter optimisation
  • Enabled administrators to specify and store data upload and model building configuration
  • Enabled imputation and virtual model hyperparameter optimisation processes to run concurrently

Changes

Data upload and model building:

  • Modified the wording for the data upload and model building steps in the administration interface
  • Improved exception traceback logging in the data upload and model building pipeline
  • Provided access to separate imputation and virtual hyperparameter optimisation logs
  • Added timings for model training, validation and prediction in the administration interface
  • Updated the presentation of data upload and model building logs and timings in the administration interface
  • Excluded endpoints without any validation results from validation reports
  • Added help text for model building and prediction options in the administration interface

Data source configuration:

  • Added help text for the ‘Hidden’ checkbox in the data source configuration editor
  • Added details of transformations to the units report for data source endpoints

Cerella API:

  • Changed the location from which the predicted value rejection threshold is loaded to be the configuration server rather than the environment

Bug Fixes

Data upload and model building:

  • Fixed the behaviour of endpoints with an empty measurement group; each endpoint is now placed in a separate endpoint group
  • If the data upload step fails, the pipeline now terminates, rather than running any subsequent model building steps
  • Fixed model validation failures that occurred due to the virtual test values file persisting between model building runs
  • Ensured that Cerella no longer treats an empty string as a category value in data upload
  • A meaningful error message is returned instead of ‘internal server error’ when returning model statistics if the Cerella engine is unavailable

Data source configuration:

  • Fixed a problem with transformation editing for numeric data source endpoints
  • Fixed an issue that meant the endpoint group could not be changed from ‘Mixed’ when multiple endpoints were selected

Cerella API:

  • Fixed a failure to return predicted Cerella values (with ‘internal server error’) if the request did not include measured values