.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "_auto_examples/shortclips/06_plot_banded_ridge_model.py"

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download__auto_examples_shortclips_06_plot_banded_ridge_model.py>` to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr__auto_examples_shortclips_06_plot_banded_ridge_model.py:

=====================================================================
Fit a banded ridge model with both wordnet and motion energy features
=====================================================================

In this example, we model the fMRI responses with banded ridge regression, using two different feature spaces: motion energy and wordnet categories.

*Banded ridge regression:* Since the relative scaling of both feature spaces is unknown, we use two regularization hyperparameters (one per feature space) in a model called banded ridge regression [1]_. Just like with ridge regression, we optimize the hyperparameters over cross-validation. An efficient implementation of this model is available in the `himalaya <https://github.com/gallantlab/himalaya>`_ package.

*Running time:* This example is more computationally intensive than the previous examples. With a GPU backend, model fitting takes around 6 minutes. With a CPU backend, it can take around 10 times longer.

Path of the data directory
--------------------------

.. code-block:: Python

    from voxelwise_tutorials.io import get_data_home

    directory = get_data_home(dataset="shortclips")
    print(directory)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    /home/jlg/mvdoc/voxelwise_tutorials_data/shortclips

.. code-block:: Python

    # modify to use another subject
    subject = "S01"

Load the data
-------------

As in the previous examples, we first load the fMRI responses, which are our regression targets.

.. code-block:: Python

    import os

    import numpy as np
    from voxelwise_tutorials.io import load_hdf5_array

    file_name = os.path.join(directory, "responses", f"{subject}_responses.hdf")
    Y_train = load_hdf5_array(file_name, key="Y_train")
    Y_test = load_hdf5_array(file_name, key="Y_test")

    print("(n_samples_train, n_voxels) =", Y_train.shape)
    print("(n_repeats, n_samples_test, n_voxels) =", Y_test.shape)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    (n_samples_train, n_voxels) = (3600, 84038)
    (n_repeats, n_samples_test, n_voxels) = (10, 270, 84038)

We also compute the explainable variance, so that we can exclude voxels with low explainable variance from the fit and speed up model fitting.

.. code-block:: Python

    from voxelwise_tutorials.utils import explainable_variance

    ev = explainable_variance(Y_test)
    print("(n_voxels,) =", ev.shape)

    mask = ev > 0.1
    print("(n_voxels_mask,) =", ev[mask].shape)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    (n_voxels,) = (84038,)
    (n_voxels_mask,) = (6849,)

We average the test repeats, to remove the non-repeatable part of the fMRI responses.

.. code-block:: Python

    Y_test = Y_test.mean(0)

    print("(n_samples_test, n_voxels) =", Y_test.shape)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    (n_samples_test, n_voxels) = (270, 84038)
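The ``explainable_variance`` helper hides the actual computation. As a rough illustration, here is a minimal sketch of an explainable-variance estimate computed from the repeated (un-averaged) test responses. The function name is ours, and unlike the tutorial's implementation it applies no noise-bias correction, so it slightly overestimates the explainable fraction.

.. code-block:: Python

    from scipy.stats import zscore

    def explainable_variance_sketch(y):
        """Rough estimate, for y of shape (n_repeats, n_samples, n_voxels)."""
        y = zscore(y, axis=1)      # normalize each repeat to unit variance over time
        mean_response = y.mean(0)  # repeatable part: average over repeats
        # with unit total variance per repeat, the variance of the mean
        # estimates the explainable fraction (biased upward by noise / n_repeats)
        return mean_response.var(0)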
We fill potential NaN (not-a-number) values with zeros.

.. code-block:: Python

    Y_train = np.nan_to_num(Y_train)
    Y_test = np.nan_to_num(Y_test)

And we make sure the targets are centered.

.. code-block:: Python

    Y_train -= Y_train.mean(0)
    Y_test -= Y_test.mean(0)

Then we load both feature spaces, which will be used in the linear regression model.

.. code-block:: Python

    feature_names = ["wordnet", "motion_energy"]

    Xs_train = []
    Xs_test = []
    n_features_list = []
    for feature_space in feature_names:
        file_name = os.path.join(directory, "features", f"{feature_space}.hdf")
        Xi_train = load_hdf5_array(file_name, key="X_train")
        Xi_test = load_hdf5_array(file_name, key="X_test")

        Xs_train.append(Xi_train.astype(dtype="float32"))
        Xs_test.append(Xi_test.astype(dtype="float32"))
        n_features_list.append(Xi_train.shape[1])

    # concatenate the feature spaces
    X_train = np.concatenate(Xs_train, 1)
    X_test = np.concatenate(Xs_test, 1)

    print("(n_samples_train, n_features_total) =", X_train.shape)
    print("(n_samples_test, n_features_total) =", X_test.shape)
    print("[n_features_wordnet, n_features_motion_energy] =", n_features_list)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    (n_samples_train, n_features_total) = (3600, 8260)
    (n_samples_test, n_features_total) = (270, 8260)
    [n_features_wordnet, n_features_motion_energy] = [1705, 6555]

Define the cross-validation scheme
----------------------------------

We define again a leave-one-run-out cross-validation split scheme.

.. code-block:: Python

    from sklearn.model_selection import check_cv
    from voxelwise_tutorials.utils import generate_leave_one_run_out

    # index of the first sample of each run
    run_onsets = load_hdf5_array(file_name, key="run_onsets")
    print(run_onsets)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    [   0  300  600  900 1200 1500 1800 2100 2400 2700 3000 3300]

We define a cross-validation splitter, compatible with the ``scikit-learn`` API.

.. code-block:: Python

    n_samples_train = X_train.shape[0]
    cv = generate_leave_one_run_out(n_samples_train, run_onsets)
    cv = check_cv(cv)  # copy the cross-validation splitter into a reusable list

Define the model
----------------

The model pipeline contains steps similar to the pipeline from previous examples. We remove the mean of each feature with a ``StandardScaler``, and add delays with a ``Delayer``.

.. code-block:: Python

    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from voxelwise_tutorials.delayer import Delayer
    from himalaya.backend import set_backend

    # use a GPU backend, falling back with a warning if CUDA is not available
    backend = set_backend("torch_cuda", on_error="warn")
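The ``Delayer`` compensates for the slow hemodynamic response by stacking time-shifted copies of the features. The function below is a simplified sketch of this assumed behavior (zero-padding the start of each shifted copy); the actual ``Delayer`` is a full scikit-learn transformer with ``fit`` and ``transform`` methods.

.. code-block:: Python

    def delay_features_sketch(X, delays=(1, 2, 3, 4)):
        """Stack delayed copies of X, of shape (n_samples, n_features)."""
        delayed = []
        for delay in delays:
            X_delayed = np.zeros_like(X)
            X_delayed[delay:] = X[:-delay]  # shift features forward in time
            delayed.append(X_delayed)
        # shape (n_samples, n_features * n_delays)
        return np.concatenate(delayed, axis=1)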
To fit the banded ridge model, we use ``himalaya``'s ``MultipleKernelRidgeCV`` model, with a separate linear kernel per feature space. Similarly to ``KernelRidgeCV``, the model optimizes the hyperparameters over cross-validation. However, while ``KernelRidgeCV`` has to optimize only one hyperparameter (``alpha``), ``MultipleKernelRidgeCV`` has to optimize ``m`` hyperparameters, where ``m`` is the number of feature spaces (here ``m = 2``). To do so, the model implements two different solvers, one using hyperparameter random search, and one using hyperparameter gradient descent. For a large number of targets, we recommend the random-search solver.

The class takes a number of common parameters during initialization, such as ``kernels`` or ``solver``. Since the solver parameters vary depending on the solver used, they are passed as a ``solver_params`` dictionary.

.. code-block:: Python

    from himalaya.kernel_ridge import MultipleKernelRidgeCV

    # Here we will use the "random_search" solver.
    solver = "random_search"

    # We can check its specific parameters in the function docstring:
    solver_function = MultipleKernelRidgeCV.ALL_SOLVERS[solver]
    print("Docstring of the function %s:" % solver_function.__name__)
    print(solver_function.__doc__)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Docstring of the function solve_multiple_kernel_ridge_random_search:
    Solve multiple kernel ridge regression using random search.

        Parameters
        ----------
        Ks : array of shape (n_kernels, n_samples, n_samples)
            Input kernels.
        Y : array of shape (n_samples, n_targets)
            Target data.
        n_iter : int, or array of shape (n_iter, n_kernels)
            Number of kernel weights combination to search.
            If an array is given, the solver uses it as the list of kernel weights
            to try, instead of sampling from a Dirichlet distribution. Examples:
              - `n_iter=np.eye(n_kernels)` implement a winner-take-all strategy
                over kernels.
              - `n_iter=np.ones((1, n_kernels))/n_kernels` solves a (standard)
                kernel ridge regression.
        concentration : float, or list of float
            Concentration parameters of the Dirichlet distribution.
            If a list, iteratively cycle through the list.
            Not used if n_iter is an array.
        alphas : float or array of shape (n_alphas, )
            Range of ridge regularization parameter.
        score_func : callable
            Function used to compute the score of predictions versus Y.
        cv : int or scikit-learn splitter
            Cross-validation splitter. If an int, KFold is used.
        fit_intercept : boolean
            Whether to fit an intercept. If False, Ks should be centered
            (see KernelCenterer), and Y must be zero-mean over samples.
            Only available if return_weights == 'dual'.
        return_weights : None, 'primal', or 'dual'
            Whether to refit on the entire dataset and return the weights.
        Xs : array of shape (n_kernels, n_samples, n_features) or None
            Necessary if return_weights == 'primal'.
        local_alpha : bool
            If True, alphas are selected per target, else shared over all targets.
        jitter_alphas : bool
            If True, alphas range is slightly jittered for each gamma.
        random_state : int, or None
            Random generator seed. Use an int for deterministic search.
        n_targets_batch : int or None
            Size of the batch for over targets during cross-validation.
            Used for memory reasons. If None, uses all n_targets at once.
        n_targets_batch_refit : int or None
            Size of the batch for over targets during refit.
            Used for memory reasons. If None, uses all n_targets at once.
        n_alphas_batch : int or None
            Size of the batch for over alphas. Used for memory reasons.
            If None, uses all n_alphas at once.
        progress_bar : bool
            If True, display a progress bar over gammas.
        Ks_in_cpu : bool
            If True, keep Ks in CPU memory to limit GPU memory (slower).
            This feature is not available through the scikit-learn API.
        conservative : bool
            If True, when selecting the hyperparameter alpha, take the largest one
            that is less than one standard deviation away from the best.
            If False, take the best.
        Y_in_cpu : bool
            If True, keep the target values ``Y`` in CPU memory (slower).
        diagonalize_method : str in {"eigh", "svd"}
            Method used to diagonalize the kernel.
        return_alphas : bool
            If True, return the best alpha value for each target.

        Returns
        -------
        deltas : array of shape (n_kernels, n_targets)
            Best log kernel weights for each target.
        refit_weights : array or None
            Refit regression weights on the entire dataset, using selected best
            hyperparameters. Refit weights are always stored on CPU memory.
            If return_weights == 'primal', shape is (n_features, n_targets),
            if return_weights == 'dual', shape is (n_samples, n_targets),
            else, None.
        cv_scores : array of shape (n_iter, n_targets)
            Cross-validation scores per iteration, averaged over splits, for the
            best alpha. Cross-validation scores will always be on CPU memory.
        best_alphas : array of shape (n_targets, )
            Best alpha value per target. Only returned if return_alphas is True.
        intercept : array of shape (n_targets,)
            Intercept. Only returned when fit_intercept is True.
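As the docstring mentions, the candidate kernel weights are sampled from a Dirichlet distribution. To build intuition, here is a minimal sketch of what such sampling looks like (illustration only, not ``himalaya``'s internal code):

.. code-block:: Python

    rng = np.random.RandomState(0)
    n_kernels = 2  # wordnet and motion energy

    # each row is a candidate vector of positive kernel weights summing to one
    candidate_gammas = rng.dirichlet(np.ones(n_kernels), size=5)
    print(candidate_gammas.shape)   # (5, 2)
    print(candidate_gammas.sum(1))  # [1. 1. 1. 1. 1.]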
The hyperparameter random-search solver separates the hyperparameters into a shared regularization ``alpha`` and a vector of positive kernel weights which sum to one. This separation of hyperparameters makes it possible to efficiently explore a large grid of ``alpha`` values for each sampled vector of kernel weights.

We use *20* random-search iterations to keep this example reasonably fast. To get better results, especially for a larger number of feature spaces, one might need more iterations. (Note that there is currently no stopping criterion in the random-search method.)

.. code-block:: Python

    n_iter = 20

    alphas = np.logspace(1, 20, 20)

Batch parameters are used to reduce the necessary GPU memory. A larger value will be a bit faster, but the solver might crash if it runs out of memory. Optimal values depend on the size of your dataset.

.. code-block:: Python

    n_targets_batch = 200
    n_alphas_batch = 5
    n_targets_batch_refit = 200

We put all these parameters in a dictionary ``solver_params``, and define the main estimator ``MultipleKernelRidgeCV``.

.. code-block:: Python

    solver_params = dict(n_iter=n_iter, alphas=alphas,
                         n_targets_batch=n_targets_batch,
                         n_alphas_batch=n_alphas_batch,
                         n_targets_batch_refit=n_targets_batch_refit)

    mkr_model = MultipleKernelRidgeCV(kernels="precomputed", solver=solver,
                                      solver_params=solver_params, cv=cv)

We need a bit more work than in previous examples before defining the full pipeline, since the banded ridge model requires `multiple` precomputed kernels, one for each feature space. To compute them, we use the ``ColumnKernelizer``, which can create multiple kernels from different columns of your features array. ``ColumnKernelizer`` works similarly to ``scikit-learn``'s ``ColumnTransformer``, but instead of returning a concatenation of transformed features, it returns a stack of kernels, as required by ``MultipleKernelRidgeCV(kernels="precomputed")``.
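Each of these precomputed kernels is simply a linear kernel computed from one subset of columns. As a reminder, here is a minimal sketch of what a linear kernel is (the toy array is ours, for illustration only):

.. code-block:: Python

    # the linear kernel is the matrix of dot products between samples
    X_toy = np.random.randn(20, 5).astype("float32")  # (n_samples, n_features)
    K_toy = X_toy @ X_toy.T                           # (n_samples, n_samples)
    print(K_toy.shape)  # (20, 20)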
First, we create a different ``Kernelizer`` for each feature space. Here we use a linear kernel for all feature spaces, but ``ColumnKernelizer`` accepts any ``Kernelizer``, or ``scikit-learn`` ``Pipeline`` ending with a ``Kernelizer``.

.. code-block:: Python

    from himalaya.kernel_ridge import Kernelizer
    from sklearn import set_config
    set_config(display='diagram')  # requires scikit-learn 0.23

    preprocess_pipeline = make_pipeline(
        StandardScaler(with_mean=True, with_std=False),
        Delayer(delays=[1, 2, 3, 4]),
        Kernelizer(kernel="linear"),
    )
    preprocess_pipeline

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

Pipeline(steps=[('standardscaler', StandardScaler(with_std=False)),
                    ('delayer', Delayer(delays=[1, 2, 3, 4])),
                    ('kernelizer', Kernelizer())])
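As a quick sanity check (a sketch that recomputes the kernel only for illustration), applying this preprocessing pipeline to a single feature space should return a square kernel of shape ``(n_samples, n_samples)``:

.. code-block:: Python

    K_wordnet = preprocess_pipeline.fit_transform(Xs_train[0])
    print(K_wordnet.shape)  # expected (3600, 3600), given the shapes printed above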
The column kernelizer applies a different pipeline to each selection of features, here defined with ``slices``.

.. code-block:: Python

    from himalaya.kernel_ridge import ColumnKernelizer

    # Find the start and end of each feature space in the concatenated ``X_train``.
    start_and_end = np.concatenate([[0], np.cumsum(n_features_list)])
    slices = [
        slice(start, end)
        for start, end in zip(start_and_end[:-1], start_and_end[1:])
    ]
    slices

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    [slice(0, 1705, None), slice(1705, 8260, None)]

.. code-block:: Python

    kernelizers_tuples = [(name, preprocess_pipeline, slice_)
                          for name, slice_ in zip(feature_names, slices)]
    column_kernelizer = ColumnKernelizer(kernelizers_tuples)
    column_kernelizer

    # (Note that ``ColumnKernelizer`` has a parameter ``n_jobs`` to parallelize
    # each ``Kernelizer``, yet such parallelism does not work with GPU arrays.)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

ColumnKernelizer(transformers=[('wordnet',
                                    Pipeline(steps=[('standardscaler',
                                                     StandardScaler(with_std=False)),
                                                    ('delayer',
                                                     Delayer(delays=[1, 2, 3, 4])),
                                                    ('kernelizer', Kernelizer())]),
                                    slice(0, 1705, None)),
                                   ('motion_energy',
                                    Pipeline(steps=[('standardscaler',
                                                     StandardScaler(with_std=False)),
                                                    ('delayer',
                                                     Delayer(delays=[1, 2, 3, 4])),
                                                    ('kernelizer', Kernelizer())]),
                                    slice(1705, 8260, None))])
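To see what the column kernelizer produces, here is a quick sketch (again recomputing the kernels only for illustration): the result is a stack of kernels, one per feature space.

.. code-block:: Python

    Ks_train = column_kernelizer.fit_transform(X_train)
    # expected (n_kernels, n_samples, n_samples) = (2, 3600, 3600)
    print(Ks_train.shape)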
Then we can define the model pipeline.

.. code-block:: Python

    pipeline = make_pipeline(
        column_kernelizer,
        mkr_model,
    )
    pipeline

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

Pipeline(steps=[('columnkernelizer',
                     ColumnKernelizer(transformers=[('wordnet',
                                                     Pipeline(steps=[('standardscaler',
                                                                      StandardScaler(with_std=False)),
                                                                     ('delayer',
                                                                      Delayer(delays=[1,
                                                                                      2,
                                                                                      3,
                                                                                      4])),
                                                                     ('kernelizer',
                                                                      Kernelizer())]),
                                                     slice(0, 1705, None)),
                                                    ('motion_energy',
                                                     Pipeline(steps=[('standardscaler',
                                                                      StandardScaler(with_std=False)),
                                                                     ('delayer',
                                                                      Delayer(delays=[1,
                                                                                      2,
                                                                                      3,
                                                                                      4]...
                     MultipleKernelRidgeCV(cv=_CVIterableWrapper(cv=[(array([   0,    1, ..., 3598, 3599]), array([300, 301, ..., 598, 599])), (array([   0,    1, ..., 3598, 3599]), array([600, 601, ..., 898, 899])), (array([   0,    1, ..., 3598, 3599]), array([ 900,  901, ..., 1198, 1199])), (array([   0,    1, ..., 3598, 3599]), array([3000, 3001, ..., 3298, 3...1, ..., 2398, 2399])), (array([   0,    1, ..., 3598, 3599]), array([2400, 2401, ..., 2698, 2699]))]),
                                           kernels='precomputed',
                                           solver_params={'alphas': array([1.e+01, 1.e+02, 1.e+03, 1.e+04, 1.e+05, 1.e+06, 1.e+07, 1.e+08,
           1.e+09, 1.e+10, 1.e+11, 1.e+12, 1.e+13, 1.e+14, 1.e+15, 1.e+16,
           1.e+17, 1.e+18, 1.e+19, 1.e+20]),
                                                          'n_alphas_batch': 5,
                                                          'n_iter': 20,
                                                          'n_targets_batch': 200,
                                                          'n_targets_batch_refit': 200}))])
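The resulting object behaves like any ``scikit-learn`` pipeline. As a quick check (illustration only), we can list its two named steps, whose names ``make_pipeline`` derives from the class names:

.. code-block:: Python

    print([name for name, _ in pipeline.steps])
    # expected: ['columnkernelizer', 'multiplekernelridgecv']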
Fit the model
-------------

We fit on the train set, and score on the test set. To speed up the fit and to limit the memory peak in Colab, we only fit on voxels with an explainable variance above 0.1.

With a GPU backend, fitting this model takes around 6 minutes. With a CPU backend, it can take around 10 times longer.

.. code-block:: Python

    pipeline.fit(X_train, Y_train[:, mask])

    scores_mask = pipeline.score(X_test, Y_test[:, mask])
    scores_mask = backend.to_numpy(scores_mask)
    print("(n_voxels_mask,) =", scores_mask.shape)

    # Then we extend the scores to all voxels, giving a score of zero to unfitted
    # voxels.
    n_voxels = Y_train.shape[1]
    scores = np.zeros(n_voxels)
    scores[mask] = scores_mask
    print("(n_voxels,) =", scores.shape)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    [........................................] 100% | 90.25 sec | 20 random sampling with cv |
    (n_voxels_mask,) = (6849,)
    (n_voxels,) = (84038,)
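After the fit, we can inspect which feature space the model favored in each voxel. This sketch assumes the fitted ``MultipleKernelRidgeCV`` exposes the selected log kernel weights as a ``deltas_`` attribute, as described in the ``himalaya`` documentation; exponentiating them gives the kernel weight of each feature space per voxel.

.. code-block:: Python

    # ``deltas_`` is assumed to have shape (n_kernels, n_voxels_mask)
    deltas = backend.to_numpy(pipeline[-1].deltas_)
    gammas = np.exp(deltas)  # positive kernel weights
    print("(n_kernels, n_voxels_mask) =", gammas.shape)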
Compare with a ridge model
--------------------------

We can compare with a baseline model, which does not use one hyperparameter per feature space, but instead shares the same hyperparameter for all feature spaces.

.. code-block:: Python

    from himalaya.kernel_ridge import KernelRidgeCV

    pipeline_baseline = make_pipeline(
        StandardScaler(with_mean=True, with_std=False),
        Delayer(delays=[1, 2, 3, 4]),
        KernelRidgeCV(
            alphas=alphas, cv=cv,
            solver_params=dict(n_targets_batch=n_targets_batch,
                               n_alphas_batch=n_alphas_batch,
                               n_targets_batch_refit=n_targets_batch_refit)),
    )
    pipeline_baseline

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Pipeline(steps=[('standardscaler', StandardScaler(with_std=False)),
                    ('delayer', Delayer(delays=[1, 2, 3, 4])),
                    ('kernelridgecv',
                     KernelRidgeCV(alphas=array([1.e+01, 1.e+02, 1.e+03, 1.e+04, 1.e+05, 1.e+06, 1.e+07, 1.e+08,
           1.e+09, 1.e+10, 1.e+11, 1.e+12, 1.e+13, 1.e+14, 1.e+15, 1.e+16,
           1.e+17, 1.e+18, 1.e+19, 1.e+20]),
                                   cv=_CVIterableWrapper(cv=[(array([   0,    1, ..., 3598, 3599]), array...)), (array([   0,    1, ..., 3598, 3599]), array([600, 601, ..., 898, 899])), (array([   0,    1, ..., 3598, 3599]), array([ 900,  901, ..., 1198, 1199])), (array([   0,    1, ..., 3598, 3599]), array([3000, 3001, ..., 3298, 3...1, ..., 2398, 2399])), (array([   0,    1, ..., 3598, 3599]), array([2400, 2401, ..., 2698, 2699]))]),
                                   solver_params={'n_alphas_batch': 5,
                                                  'n_targets_batch': 200,
                                                  'n_targets_batch_refit': 200}))])
.. code-block:: Python

    pipeline_baseline.fit(X_train, Y_train[:, mask])
    scores_baseline_mask = pipeline_baseline.score(X_test, Y_test[:, mask])
    scores_baseline_mask = backend.to_numpy(scores_baseline_mask)

    # extend to unfitted voxels
    n_voxels = Y_train.shape[1]
    scores_baseline = np.zeros(n_voxels)
    scores_baseline[mask] = scores_baseline_mask

Here we plot the comparison of model prediction accuracies with a 2D histogram. All 84k voxels are represented in this histogram, where the diagonal corresponds to identical prediction accuracy for both models. A distribution deviating from the diagonal means that one model has better predictive performance than the other.

.. code-block:: Python

    import matplotlib.pyplot as plt
    from voxelwise_tutorials.viz import plot_hist2d

    ax = plot_hist2d(scores_baseline, scores)
    ax.set(title='Generalization R2 scores', xlabel='KernelRidgeCV',
           ylabel='MultipleKernelRidgeCV')
    plt.show()

.. image-sg:: /_auto_examples/shortclips/images/sphx_glr_06_plot_banded_ridge_model_001.png
   :alt: Generalization R2 scores
   :srcset: /_auto_examples/shortclips/images/sphx_glr_06_plot_banded_ridge_model_001.png
   :class: sphx-glr-single-img

We see that the banded ridge model (``MultipleKernelRidgeCV``) outperforms the ridge model (``KernelRidgeCV``). Indeed, banded ridge regression is able to find the optimal scaling of each feature space, independently for each voxel. Banded ridge regression thus performs a soft selection between the available feature spaces, based on the cross-validation performance.

Plot the banded ridge split
---------------------------

On top of better prediction accuracy, banded ridge regression also gives a way to disentangle the contribution of the two feature spaces. To do so, we take the kernel weights and the ridge (dual) weights corresponding to each feature space, and use them to compute the prediction from each feature space separately.

.. math::

    \hat{y} = \sum_{i=1}^{m} \hat{y}_i = \sum_{i=1}^{m} \gamma_i K_i \hat{w}

Then, we use these split predictions to compute split :math:`\tilde{R}^2_i` scores. These scores are corrected so that their sum is equal to the :math:`R^2` score of the full prediction :math:`\hat{y}`.

.. code-block:: Python

    from himalaya.scoring import r2_score_split

    Y_test_pred_split = pipeline.predict(X_test, split=True)
    split_scores_mask = r2_score_split(Y_test[:, mask], Y_test_pred_split)

    print("(n_kernels, n_samples_test, n_voxels_mask) =", Y_test_pred_split.shape)
    print("(n_kernels, n_voxels_mask) =", split_scores_mask.shape)

    # extend to unfitted voxels
    n_kernels = split_scores_mask.shape[0]
    n_voxels = Y_train.shape[1]
    split_scores = np.zeros((n_kernels, n_voxels))
    split_scores[:, mask] = backend.to_numpy(split_scores_mask)
    print("(n_kernels, n_voxels) =", split_scores.shape)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    (n_kernels, n_samples_test, n_voxels_mask) = torch.Size([2, 270, 6849])
    (n_kernels, n_voxels_mask) = torch.Size([2, 6849])
    (n_kernels, n_voxels) = (2, 84038)
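Since the split predictions sum to the full prediction (see the formula above), we can sanity-check them against ``pipeline.predict``. This is a quick sketch; any small residual is numerical precision.

.. code-block:: Python

    Y_test_pred = pipeline.predict(X_test)  # full prediction, without split
    difference = (backend.to_numpy(Y_test_pred)
                  - backend.to_numpy(Y_test_pred_split).sum(0))
    print(np.abs(difference).max())  # expected to be close to zero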
We can then plot the split scores on a flatmap with a 2D colormap.

.. code-block:: Python

    from voxelwise_tutorials.viz import plot_2d_flatmap_from_mapper

    mapper_file = os.path.join(directory, "mappers", f"{subject}_mappers.hdf")
    ax = plot_2d_flatmap_from_mapper(split_scores[0], split_scores[1],
                                     mapper_file, vmin=0, vmax=0.25, vmin2=0,
                                     vmax2=0.5, label_1=feature_names[0],
                                     label_2=feature_names[1])
    plt.show()

.. image-sg:: /_auto_examples/shortclips/images/sphx_glr_06_plot_banded_ridge_model_002.png
   :alt: 06 plot banded ridge model
   :srcset: /_auto_examples/shortclips/images/sphx_glr_06_plot_banded_ridge_model_002.png
   :class: sphx-glr-single-img

The blue regions are better predicted by the motion-energy features, the orange regions are better predicted by the wordnet features, and the white regions are well predicted by both feature spaces. Compared to the last figure of the previous example, we see that most white regions have been replaced by either blue or orange regions. Banded ridge regression disentangled the two feature spaces in voxels where both feature spaces had good prediction accuracy (see previous example). For example, motion-energy features predict brain activity in early visual cortex, while wordnet features predict brain activity in semantic visual areas. For more discussion of these results, we refer the reader to the original publication [1]_.

References
----------

.. [1] Nunez-Elizalde, A. O., Huth, A. G., & Gallant, J. L. (2019).
   Voxelwise encoding models with non-spherical multivariate normal priors.
   NeuroImage, 197, 482-492.

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (1 minute 58.143 seconds)

.. _sphx_glr_download__auto_examples_shortclips_06_plot_banded_ridge_model.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: 06_plot_banded_ridge_model.ipynb <06_plot_banded_ridge_model.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: 06_plot_banded_ridge_model.py <06_plot_banded_ridge_model.py>`

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_