Model
- class champollion.Champollion(epsilon=1.0, gamma=0.01, lambda_prior=20.0, use_keops=False, device='auto', random_state=None, max_iter=300000, learning_rate=0.001, sinkhorn_tol=0.001, log_every=40, monitor_gradient_norm=None, gradient_norm_tol=0.001, wandb_log=False, verbose=False, prior_weight=None)
Bases: object
Integrates unpaired single-cell multimodal data using a small set of paired cells called the bridge.
Champollion learns a bilinear cross-modality cost matrix A from the paired bridge cells, then uses the learned cost to transport unpaired cells between the two modalities. Inputs are expected to be AnnData-like objects, usually stored in a MuData object for the bridge and passed as a modality-keyed dictionary for transport.
- Parameters:
epsilon – Entropic regularization strength used in the optimal transport problem. The default value of 1 should not be changed, as it amounts only to a rescaling of the problem.
gamma – Lasso regularization weight applied to the learned matrix A.
lambda_prior – Weight of the optional prior cost term. This is the lambda parameter from the paper.
use_keops – If True, use KeOps LazyTensors for symbolic cost and plan operations to reduce dense memory use.
device – Torch device. Use "auto" to select CUDA when available and CPU otherwise.
random_state – Optional random seed passed to torch before fitting.
max_iter – Number of Adam optimization iterations for fitting A.
learning_rate – Adam learning rate.
sinkhorn_tol – Tolerance used when checking Sinkhorn marginal convergence.
log_every – Number of fit iterations between metric logging and convergence checks.
monitor_gradient_norm – If truthy, enable gradient-norm stopping during fitting.
gradient_norm_tol – Gradient norm threshold used when gradient-norm stopping is enabled.
wandb_log – If True, log metrics to Weights & Biases. Requires installing the optional wandb extra.
verbose – If True, print fit and transport progress.
prior_weight – Deprecated alias for lambda_prior.
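To make the roles of epsilon and sinkhorn_tol concrete, here is a minimal pure-Python sketch of entropic optimal transport with a bilinear cost. It is not Champollion's implementation: the cost form c(x, y) = -x^T A y, the uniform marginals, and the helper names bilinear_cost and sinkhorn are assumptions made for illustration.

```python
import math

def bilinear_cost(X, Y, A):
    """Cost matrix C[i][j] = -x_i^T A y_j (the sign convention is an assumption)."""
    C = []
    for x in X:
        # xA holds the row vector x^T A.
        xA = [sum(x[p] * A[p][q] for p in range(len(x))) for q in range(len(A[0]))]
        C.append([-sum(xA[q] * y[q] for q in range(len(y))) for y in Y])
    return C

def sinkhorn(C, epsilon=1.0, tol=1e-3, max_iter=1000):
    """Entropic OT between uniform marginals; returns the plan and iterations used."""
    n, m = len(C), len(C[0])
    a, b = [1.0 / n] * n, [1.0 / m] * m
    K = [[math.exp(-C[i][j] / epsilon) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for it in range(1, max_iter + 1):
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
        # Column marginals are exact after the v update, so check the rows.
        row_err = sum(
            abs(u[i] * sum(K[i][j] * v[j] for j in range(m)) - a[i]) for i in range(n)
        )
        if row_err < tol:
            break
    plan = [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
    return plan, it

# Toy data: two cells per modality, two features each (hypothetical values).
X = [[1.0, 0.0], [0.0, 1.0]]
Y = [[1.0, 0.0], [0.0, 1.0]]
A = [[2.0, 0.0], [0.0, 2.0]]
C = bilinear_cost(X, Y, A)
plan, n_iter = sinkhorn(C, epsilon=1.0, tol=1e-3)
```

In this sketch the kernel depends on the cost only through C / epsilon, and the assumed cost is linear in A, so changing epsilon has the same effect on the kernel as rescaling A. That is consistent with the note that epsilon only rescales the problem.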
- SAVE_FORMAT_VERSION = 1
- fit(mdata, modality_1, modality_2, x_1_rep='X', x_2_rep='X', y_prior_1_rep=None, y_prior_2_rep=None, feature_names=None)
Fit the cross-modality cost on the paired bridge cells.
The bridge’s two modalities must contain the same observations. If the observation names are the same but ordered differently, the second modality is reordered to match the first.
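The reordering rule can be illustrated with a small pure-Python sketch. The helper name reorder_to_match, the observation names, and the toy rows are hypothetical; the library may implement the alignment differently.

```python
def reorder_to_match(names_1, names_2):
    """Return indices that reorder modality 2 so its names align with modality 1."""
    if len(set(names_1)) != len(names_1) or len(set(names_2)) != len(names_2):
        raise ValueError("observation names must be unique")
    if set(names_1) != set(names_2):
        raise ValueError("both modalities must contain the same observations")
    position = {name: i for i, name in enumerate(names_2)}
    return [position[name] for name in names_1]

# Same cells, stored in a different order in the second modality.
obs_1 = ["cell_a", "cell_b", "cell_c"]
obs_2 = ["cell_c", "cell_a", "cell_b"]
order = reorder_to_match(obs_1, obs_2)

rows_2 = [[3.0], [1.0], [2.0]]             # rows stored in obs_2 order
rows_2_aligned = [rows_2[i] for i in order]  # rows now follow obs_1 order
```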
Representations can be specified as "X", "obsm/key", "layers/key", or by shorthand key when unambiguous. If a representation comes from .X or .layers, feature names are taken from adata.var_names. If it comes from .obsm, feature names are generated unless supplied with feature_names.
- Parameters:
mdata – MuData-like object with a .mod mapping containing both modalities for the paired bridge cells.
modality_1 – Name of the first modality to align.
modality_2 – Name of the second modality to align.
x_1_rep – Main representation of the first modality, used to learn the bilinear cost matrix.
x_2_rep – Main representation of the second modality, used to learn the bilinear cost matrix.
y_prior_1_rep – Optional prior representation for the first modality. Provide both prior representations or neither.
y_prior_2_rep – Optional prior representation for the second modality. Provide both prior representations or neither.
feature_names – Optional mapping from modality name to feature names for the main representations, mainly useful for .obsm representations.
- Returns:
The fitted model.
- Return type:
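The representation strings accepted by fit can be read as a slot plus an optional key. The sketch below shows one way such strings might be resolved; resolve_rep is a hypothetical helper, and the error types and the exact ambiguity rule are assumptions rather than the library's behavior.

```python
def resolve_rep(spec, obsm_keys, layer_keys):
    """Map a representation string to a (slot, key) pair.

    Hypothetical helper covering the documented forms "X", "obsm/key",
    "layers/key", and a shorthand key that must be unambiguous.
    """
    if spec == "X":
        return ("X", None)
    if "/" in spec:
        slot, key = spec.split("/", 1)
        available = {"obsm": obsm_keys, "layers": layer_keys}
        if slot not in available:
            raise ValueError(f"unknown slot {slot!r}")
        if key not in available[slot]:
            raise KeyError(f"{key!r} not found in .{slot}")
        return (slot, key)
    # Shorthand: accept the key only if exactly one slot contains it.
    in_obsm, in_layers = spec in obsm_keys, spec in layer_keys
    if in_obsm and in_layers:
        raise ValueError(f"{spec!r} is ambiguous; use 'obsm/{spec}' or 'layers/{spec}'")
    if in_obsm:
        return ("obsm", spec)
    if in_layers:
        return ("layers", spec)
    raise KeyError(f"{spec!r} not found in .obsm or .layers")

# Hypothetical keys available on one modality.
obsm_keys = ["X_pca", "shared"]
layer_keys = ["counts", "shared"]

explicit = resolve_rep("obsm/X_pca", obsm_keys, layer_keys)
shorthand = resolve_rep("counts", obsm_keys, layer_keys)
```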
- transport(adatas, x_reps=None, y_prior_reps=None, store_cost=True, store_plan=False, max_iter_sink=1000, log_every=10, feature_names=None)
Compute transport between unpaired modality-specific AnnData objects.
This step uses the learned matrix A and the representation schema recorded during fit. The input dictionary must contain exactly the two modality names used during fit.
- Parameters:
adatas – Dictionary mapping the modality names used during fit to AnnData objects.
x_reps – Optional mapping from modality name to main representation. If omitted, the representation names specified during fit are reused.
y_prior_reps – Optional mapping from modality name to prior representation. If the model was fitted with priors and this is omitted, the prior representation names specified during fit are reused.
store_cost – Whether to keep the cost object in the returned result. With KeOps, this may be symbolic rather than dense.
store_plan – Whether to compute and store the transport plan immediately. If False, the plan is computed lazily on first access.
max_iter_sink – Maximum number of Sinkhorn iterations used to solve the transport problem.
log_every – Number of transport Sinkhorn iterations between convergence checks.
feature_names – Optional feature-name overrides for transport representations.
- Returns:
Object containing potentials, diagnostics, lazy cost/plan access, and downstream transfer/projection utilities.
- Return type:
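The lazy plan access described for store_plan=False can be pictured with a cached property. This is a sketch only: LazyTransportResult is a hypothetical class, and the formula used to rebuild the plan from potentials follows one common entropic OT convention, which may not match the library's.

```python
import math
from functools import cached_property

class LazyTransportResult:
    """Hypothetical result holder: keeps potentials, builds the plan on demand."""

    def __init__(self, f, g, cost, epsilon=1.0):
        self.f, self.g, self.cost, self.epsilon = f, g, cost, epsilon
        self.plan_computations = 0  # counts how often the plan is built

    @cached_property
    def plan(self):
        # P[i][j] = exp((f_i + g_j - C_ij) / epsilon), computed once, then cached.
        self.plan_computations += 1
        return [
            [math.exp((fi + gj - cij) / self.epsilon) for gj, cij in zip(self.g, row)]
            for fi, row in zip(self.f, self.cost)
        ]

# Toy potentials and cost, chosen only to exercise the caching behavior.
result = LazyTransportResult(f=[0.0, 0.0], g=[0.0, 0.0], cost=[[0.0, 1.0], [1.0, 0.0]])
first = result.plan    # triggers the computation
second = result.plan   # served from the cache
```

Deferring the plan this way avoids materializing a dense cells-by-cells matrix unless a caller actually asks for it.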
- training_transport()
Return transport quantities for the paired bridge cells.
- Returns:
Result object for the paired cells used in fit.
- Return type:
- save(path)
Save the fitted model state needed for future transport.
The saved file contains hyperparameters, A, modality names, representation names, feature names, and schema metadata. It does not store bridge data, dense costs, dense plans, optimizers, or cached fit internals.
- Parameters:
path – Destination path passed to torch.save.
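A versioned checkpoint that omits the training data might look like the following sketch. It uses pickle and an in-memory buffer as a stand-in for torch.save, and the payload keys, the helper names dump_state and load_state, and the version check are assumptions for illustration, not the library's on-disk format.

```python
import io
import pickle

SAVE_FORMAT_VERSION = 1  # mirrors the class attribute named in this reference

def dump_state(fileobj, hyperparameters, A, modality_names, feature_names):
    """Write only what a later transport needs; no bridge data, plans, or optimizer."""
    payload = {
        "format_version": SAVE_FORMAT_VERSION,
        "hyperparameters": hyperparameters,
        "A": A,
        "modality_names": modality_names,
        "feature_names": feature_names,
    }
    pickle.dump(payload, fileobj)

def load_state(fileobj):
    """Read a payload back and reject versions this sketch does not understand."""
    payload = pickle.load(fileobj)
    if payload.get("format_version") != SAVE_FORMAT_VERSION:
        raise ValueError("unsupported save format version")
    return payload

# Round trip through an in-memory buffer with hypothetical values.
buffer = io.BytesIO()
dump_state(
    buffer,
    hyperparameters={"epsilon": 1.0, "gamma": 0.01},
    A=[[0.5, 0.0], [0.0, -0.3]],
    modality_names=("rna", "atac"),
    feature_names={"rna": ["g1", "g2"], "atac": ["p1", "p2"]},
)
buffer.seek(0)
restored = load_state(buffer)
```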
- classmethod load(path, device='auto', use_keops=None)
Load a fitted Champollion model saved with save().
- Parameters:
path – Path to a saved Champollion model.
device – Device on which to load the learned matrix A.
use_keops – Optional override for the saved KeOps setting.
- Returns:
Fitted model ready for transport.
- Return type:
- A_dataframe()
Return the learned matrix A as a labeled DataFrame.
- Returns:
Matrix with rows named by the first modality’s features and columns named by the second modality’s features.
- Return type:
pandas.DataFrame
- save_A(path, format='auto')
Save the learned matrix A with feature labels.
- Parameters:
path – Output path.
format – One of "auto", "csv", "tsv", "parquet", "pkl", or "pickle". "auto" infers the format from the path suffix.
- top_interactions(feature, modality, k=10, direction='both')
Return the top weighted interactions for one feature in A.
- Parameters:
feature – Feature name in the queried modality.
modality – Modality containing feature.
k – Maximum number of interactions to return.
direction – "positive", "negative", or "both". "both" ranks by absolute weight.
- Returns:
Interaction table with source/target modalities, feature names, signed weights, and absolute weights.
- Return type:
pandas.DataFrame
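The ranking behind top_interactions can be sketched on a single row or column of A held as a plain dictionary. Only the rule that "both" ranks by absolute weight comes from this reference; the helper name top_k_interactions, the strict sign filters, and the sort order used for "positive" and "negative" are assumptions.

```python
def top_k_interactions(weights, k=10, direction="both"):
    """Rank (partner_feature, weight) pairs taken from one row or column of A."""
    if direction == "positive":
        items = [(name, w) for name, w in weights.items() if w > 0]
        items.sort(key=lambda pair: pair[1], reverse=True)
    elif direction == "negative":
        items = [(name, w) for name, w in weights.items() if w < 0]
        items.sort(key=lambda pair: pair[1])
    elif direction == "both":
        items = sorted(weights.items(), key=lambda pair: abs(pair[1]), reverse=True)
    else:
        raise ValueError("direction must be 'positive', 'negative', or 'both'")
    return items[:k]

# Hypothetical weights linking one feature to features of the other modality.
row = {"peak_1": 0.8, "peak_2": -1.5, "peak_3": 0.1, "peak_4": -0.2}

strongest = top_k_interactions(row, k=2, direction="both")
activating = top_k_interactions(row, k=2, direction="positive")
repressing = top_k_interactions(row, k=1, direction="negative")
```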