Model

class champollion.Champollion(epsilon=1.0, gamma=0.01, lambda_prior=20.0, use_keops=False, device='auto', random_state=None, max_iter=300000, learning_rate=0.001, sinkhorn_tol=0.001, log_every=40, monitor_gradient_norm=None, gradient_norm_tol=0.001, wandb_log=False, verbose=False, prior_weight=None)

Bases: object

Integrates unpaired single-cell multimodal data using a small set of paired cells called the bridge.

Champollion learns a bilinear cross-modality cost matrix A from the paired bridge cells, then uses the learned cost to transport unpaired cells between two modalities. Inputs are expected to be AnnData-like objects, usually stored in a MuData object for the bridge and passed as a modality-keyed dictionary for transport.
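
To make "bilinear" concrete, the sketch below scores every pair of cells as x1ᵀ A x2, where A has one row per modality-1 feature and one column per modality-2 feature. It is an illustration only: the function name bilinear_scores and the toy data are hypothetical, and whether the library uses this quantity directly, its negative, or a rescaled version as the transport cost is an assumption that this page does not settle.

```python
def bilinear_scores(X1, A, X2):
    """Return S where S[i][j] = x1_i^T A x2_j for every pair of cells.

    Hypothetical helper for illustration; not part of the champollion API.
    """
    n_feat_1, n_feat_2 = len(A), len(A[0])
    # Project each modality-1 cell into modality-2 feature space (X1 @ A).
    projected = [
        [sum(x[p] * A[p][q] for p in range(n_feat_1)) for q in range(n_feat_2)]
        for x in X1
    ]
    # Dot every projected cell with every modality-2 cell ((X1 @ A) @ X2.T).
    return [
        [sum(row[q] * y[q] for q in range(n_feat_2)) for y in X2]
        for row in projected
    ]


# Toy data: 2 cells with 2 features, 3 cells with 3 features.
X1 = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
X2 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
scores = bilinear_scores(X1, A, X2)
```

Because A is indexed by features rather than by cells, the same matrix can be applied to cells that were never part of the bridge.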

Parameters:

epsilon – Entropic regularization strength used in the optimal transport problem. Keep the default of 1.0: changing it only rescales the problem.
gamma – Lasso regularization weight applied to the learned matrix A.
lambda_prior – Weight of the optional prior cost term. This is the lambda parameter from the paper.
use_keops – If True, use KeOps LazyTensors for symbolic cost and plan operations to reduce dense memory use.
device – Torch device. Use "auto" to select CUDA when available and CPU otherwise.
random_state – Optional random seed passed to torch before fitting.
max_iter – Number of Adam optimization iterations for fitting A.
learning_rate – Adam learning rate.
sinkhorn_tol – Tolerance used when checking Sinkhorn marginal convergence.
log_every – Number of fit iterations between metric logging and convergence checks.
monitor_gradient_norm – If truthy, enable gradient-norm stopping during fitting.
gradient_norm_tol – Gradient norm threshold used when gradient-norm stopping is enabled.
wandb_log – If True, log metrics to Weights & Biases. Requires installing the optional wandb extra.
verbose – If True, print fit and transport progress.
prior_weight – Deprecated alias for lambda_prior.
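
The interaction between max_iter, log_every, monitor_gradient_norm, and gradient_norm_tol can be sketched with a toy optimizer. This is a minimal illustration under stated assumptions: it uses plain gradient descent on a one-dimensional quadratic instead of Adam on A, and the function minimize is hypothetical. The only behavior taken from the parameter descriptions is that convergence is inspected every log_every iterations and that the gradient-norm test is optional.

```python
def minimize(grad_fn, x0, learning_rate=0.1, max_iter=1000, log_every=40,
             monitor_gradient_norm=True, gradient_norm_tol=1e-3):
    """Toy gradient descent; a stand-in for the real fitting loop."""
    x = x0
    for it in range(1, max_iter + 1):
        x -= learning_rate * grad_fn(x)
        # Convergence is only inspected every `log_every` iterations.
        if monitor_gradient_norm and it % log_every == 0:
            if abs(grad_fn(x)) < gradient_norm_tol:
                return x, it
    return x, max_iter


# Minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
x_opt, n_iter = minimize(lambda x: 2.0 * (x - 3.0), x0=0.0)

# With monitoring disabled, the loop always runs for max_iter iterations.
_, n_iter_full = minimize(lambda x: 2.0 * (x - 3.0), x0=0.0,
                          max_iter=200, monitor_gradient_norm=False)
```

A consequence of this design is that early stopping can only trigger at multiples of log_every, so a smaller log_every stops closer to the first iteration that meets the tolerance, at the price of more frequent checks.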

SAVE_FORMAT_VERSION = 1

fit(mdata, modality_1, modality_2, x_1_rep='X', x_2_rep='X', y_prior_1_rep=None, y_prior_2_rep=None, feature_names=None)

Fit the cross-modality cost on the paired bridge cells.

The bridge’s two modalities must contain the same observations. If the observation names are the same but ordered differently, the second modality is reordered to match the first.
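
The reordering rule can be illustrated without any single-cell library. The sketch below is an assumption about the logic, not the library's code: reorder_to_match is a hypothetical helper, and plain lists stand in for observation names and data rows.

```python
def reorder_to_match(names_1, names_2, rows_2):
    """Reorder rows_2 so that its observations follow the order of names_1."""
    if len(names_1) != len(names_2) or set(names_1) != set(names_2):
        raise ValueError("Both modalities must contain the same observations.")
    # Map each observation name to its row position in the second modality.
    position = {name: i for i, name in enumerate(names_2)}
    return [rows_2[position[name]] for name in names_1]


names_rna = ["cell_a", "cell_b", "cell_c"]
names_atac = ["cell_c", "cell_a", "cell_b"]
atac_rows = [[0.3], [0.1], [0.2]]

# The second modality is brought into the first modality's order.
aligned = reorder_to_match(names_rna, names_atac, atac_rows)
```

After this step, row i of each modality describes the same cell, which is what pairing in the bridge requires.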

Representations can be specified as "X", "obsm/key", "layers/key", or by shorthand key when unambiguous. If a representation comes from .X or .layers, feature names are taken from adata.var_names. If it comes from .obsm, feature names are generated unless supplied with feature_names.
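
One way to read the representation syntax is as a small lookup rule. The sketch below is hypothetical: resolve_rep is not a library function, a plain dict stands in for an AnnData object, and the exact error behavior for ambiguous shorthand keys is an assumption.

```python
def resolve_rep(adata, rep):
    """Return (slot, key) for a representation string.

    `adata` is a plain dict standing in for an AnnData object.
    """
    if rep == "X":
        return ("X", None)
    if "/" in rep:
        slot, key = rep.split("/", 1)
        if slot not in ("obsm", "layers") or key not in adata[slot]:
            raise KeyError(f"Unknown representation {rep!r}")
        return (slot, key)
    # Shorthand: accept the bare key only if exactly one slot contains it.
    hits = [slot for slot in ("obsm", "layers") if rep in adata[slot]]
    if len(hits) != 1:
        raise KeyError(f"{rep!r} is missing or ambiguous; use an explicit path")
    return (hits[0], rep)


toy = {
    "X": [[1.0]],
    "obsm": {"X_pca": [[0.1]], "shared": [[0.0]]},
    "layers": {"counts": [[3]], "shared": [[0]]},
}

explicit = resolve_rep(toy, "obsm/X_pca")   # explicit path
shorthand = resolve_rep(toy, "counts")      # unambiguous shorthand
```

In this toy object the key "shared" exists in both slots, so it would have to be written as "obsm/shared" or "layers/shared".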

Parameters:

mdata – MuData-like object with a .mod mapping containing both modalities for the paired bridge cells.
modality_1 – Name of the first modality to align.
modality_2 – Name of the second modality to align.
x_1_rep – Main representation of the first modality, used to learn the bilinear cost matrix.
x_2_rep – Main representation of the second modality, used to learn the bilinear cost matrix.
y_prior_1_rep – Optional prior representation for the first modality. Provide both prior representations or neither.
y_prior_2_rep – Optional prior representation for the second modality. Provide both prior representations or neither.
feature_names – Optional mapping from modality name to feature names for the main representations, mainly useful for .obsm representations.

Returns:

The fitted model.

Return type:

Champollion

transport(adatas, x_reps=None, y_prior_reps=None, store_cost=True, store_plan=False, max_iter_sink=1000, log_every=10, feature_names=None)

Compute transport between unpaired modality-specific AnnData objects.

This step uses the learned matrix A and the representation schema recorded during fit. The input dictionary must contain exactly the two modality names used during fit.

Parameters:

adatas – Dictionary mapping the modality names used during fit to AnnData objects.
x_reps – Optional mapping from modality name to main representation. If omitted, the representation names specified during fit are reused.
y_prior_reps – Optional mapping from modality name to prior representation. If the model was fitted with priors and this is omitted, the prior representation names specified during fit are reused.
store_cost – Whether to keep the cost object in the returned result. With KeOps, this may be symbolic rather than dense.
store_plan – Whether to compute and store the transport plan immediately. If False, the plan is computed lazily on first access.
max_iter_sink – Maximum number of Sinkhorn iterations used to solve the transport problem.
log_every – Number of transport Sinkhorn iterations between convergence checks.
feature_names – Optional feature-name overrides for transport representations.
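
The roles of max_iter_sink, log_every, and the constructor's sinkhorn_tol can be sketched with a textbook Sinkhorn loop. This is a minimal illustration, not the library's solver: it assumes uniform marginals, works with dense Python lists instead of torch or KeOps tensors, and iterates on scaling vectors rather than in the log domain. The function name sinkhorn is hypothetical.

```python
import math


def sinkhorn(cost, epsilon=1.0, max_iter_sink=1000, log_every=10,
             sinkhorn_tol=1e-3):
    """Entropic optimal transport between uniform marginals (toy version)."""
    n, m = len(cost), len(cost[0])
    a, b = [1.0 / n] * n, [1.0 / m] * m  # uniform marginals (assumption)
    K = [[math.exp(-c / epsilon) for c in row] for row in cost]
    u, v = [1.0] * n, [1.0] * m
    for it in range(1, max_iter_sink + 1):
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
        if it % log_every == 0:
            # Column marginals are exact after the v-update, so check rows.
            row_err = sum(
                abs(u[i] * sum(K[i][j] * v[j] for j in range(m)) - a[i])
                for i in range(n)
            )
            if row_err < sinkhorn_tol:
                break
    plan = [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
    return plan, it


plan, n_iter = sinkhorn([[0.0, 2.0], [2.0, 0.0]])
```

As with fitting, the marginal check only runs every log_every iterations, so the loop stops at a multiple of log_every or at max_iter_sink.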

Returns:

Object containing potentials, diagnostics, lazy cost/plan access, and downstream transfer/projection utilities.

Return type:

TransportResult
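
The lazy behavior described for store_plan=False follows a common pattern that can be sketched with functools.cached_property from the standard library. The class LazyResult below is a hypothetical stand-in and says nothing about how TransportResult is actually implemented.

```python
from functools import cached_property


class LazyResult:
    """Hypothetical stand-in showing compute-on-first-access semantics."""

    def __init__(self, compute_plan):
        self._compute_plan = compute_plan

    @cached_property
    def plan(self):
        # Runs once, on first access; the value is cached afterwards.
        return self._compute_plan()


calls = []


def build_plan():
    calls.append("computed")
    return [[0.5, 0.0], [0.0, 0.5]]


result = LazyResult(build_plan)
calls_before_access = len(calls)  # nothing has been computed yet
first = result.plan               # triggers the computation
second = result.plan              # served from the cache
```

Deferring the plan this way avoids allocating a dense cells-by-cells matrix when only the potentials or diagnostics are needed.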

training_transport()

Return transport quantities for the paired bridge cells.

Returns:

Result object for the paired cells used in fit.

Return type:

TransportResult

save(path)

Save the fitted model state needed for future transport.

The saved file contains hyperparameters, A, modality names, representation names, feature names, and schema metadata. It does not store bridge data, dense costs, dense plans, optimizers, or cached fit internals.

Parameters:

path – Destination path passed to torch.save.

classmethod load(path, device='auto', use_keops=None)

Load a fitted Champollion model saved with save().

Parameters:

path – Path to a saved Champollion model.
device – Device on which to load the learned matrix A.
use_keops – Optional override for the saved KeOps setting.

Returns:

Fitted model ready for transport.

Return type:

Champollion

A_dataframe()

Return the learned matrix A as a labeled DataFrame.

Returns:

Matrix with rows named by the first modality’s features and columns named by the second modality’s features.

Return type:

pandas.DataFrame

save_A(path, format='auto')

Save the learned matrix A with feature labels.

Parameters:

path – Output path.
format – One of "auto", "csv", "tsv", "parquet", "pkl", or "pickle". "auto" infers the format from the path suffix.
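
Suffix-based inference for format="auto" might look like the sketch below. The helper infer_format and its suffix table are hypothetical: the mapping simply mirrors the accepted format names, and details such as case handling or compressed suffixes are assumptions.

```python
from pathlib import Path

# Assumed mapping from file suffix to format name.
_SUFFIX_TO_FORMAT = {
    ".csv": "csv",
    ".tsv": "tsv",
    ".parquet": "parquet",
    ".pkl": "pkl",
    ".pickle": "pickle",
}


def infer_format(path, format="auto"):
    """Return the explicit format, or infer it from the path suffix."""
    if format != "auto":
        return format
    suffix = Path(path).suffix.lower()
    if suffix not in _SUFFIX_TO_FORMAT:
        raise ValueError(f"Cannot infer format from suffix {suffix!r}")
    return _SUFFIX_TO_FORMAT[suffix]


inferred = infer_format("results/A_matrix.parquet")
forced = infer_format("results/A_matrix.txt", format="tsv")
```

Passing an explicit format is the safer choice when the file name carries no recognizable suffix.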

top_interactions(feature, modality, k=10, direction='both')

Return top weighted interactions for one feature in A.

Parameters:

feature – Feature name in the queried modality.
modality – Modality containing feature.
k – Maximum number of interactions to return.
direction – "positive", "negative", or "both". "both" ranks by absolute weight.

Returns:

Interaction table with source/target modalities, feature names, signed weights, and absolute weights.

Return type:

pandas.DataFrame
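
The three direction modes can be illustrated on a single row of A represented as a dict. This is a sketch under assumptions: top_weights is a hypothetical helper, the feature names and weights are made up, and only the rule that "both" ranks by absolute weight comes from the description above. Filtering by sign and then sorting by signed weight for "positive" and "negative" is a guess at the natural behavior.

```python
def top_weights(weights, k=10, direction="both"):
    """Rank (feature, weight) pairs from one row or column of A."""
    items = list(weights.items())
    if direction == "positive":
        items = [(name, w) for name, w in items if w > 0]
        items.sort(key=lambda item: item[1], reverse=True)
    elif direction == "negative":
        items = [(name, w) for name, w in items if w < 0]
        items.sort(key=lambda item: item[1])
    elif direction == "both":
        # Rank by magnitude regardless of sign.
        items.sort(key=lambda item: abs(item[1]), reverse=True)
    else:
        raise ValueError("direction must be 'positive', 'negative', or 'both'")
    return items[:k]


# Made-up weights linking one modality-1 feature to modality-2 features.
row = {"peak_1": 0.8, "peak_2": -1.2, "peak_3": 0.1, "peak_4": -0.05}

strongest = top_weights(row, k=2, direction="both")
positive = top_weights(row, k=2, direction="positive")
negative = top_weights(row, k=2, direction="negative")
```

Since k is a maximum, fewer rows come back when fewer weights match the requested direction.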