.. DO NOT EDIT. .. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY. .. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE: .. "_auto_examples/shortclips/03_plot_wordnet_model.py" .. LINE NUMBERS ARE GIVEN BELOW. .. only:: html .. note:: :class: sphx-glr-download-link-note :ref:`Go to the end <sphx_glr_download__auto_examples_shortclips_03_plot_wordnet_model.py>` to download the full example code .. rst-class:: sphx-glr-example-title .. _sphx_glr__auto_examples_shortclips_03_plot_wordnet_model.py: ======================================= Fit a ridge model with wordnet features ======================================= In this example, we model the fMRI responses with semantic "wordnet" features, manually annotated on each frame of the movie stimulus. The model is a regularized linear regression model, known as ridge regression. Since this model is used to predict brain activity from the stimulus, it is called a (voxelwise) encoding model. This example reproduces part of the analysis described in Huth et al. (2012) [1]_. See this publication for more details about the experiment and the wordnet features, along with additional results and discussion. *Wordnet features:* The features used in this example are semantic labels manually annotated on each frame of the movie stimulus. The semantic labels include nouns (such as "woman", "car", or "building") and verbs (such as "talking", "touching", or "walking"), for a total of 1705 distinct category labels. To interpret our model, labels can be organized in a graph of semantic relationships based on the `Wordnet <https://wordnet.princeton.edu>`_ dataset. *Summary:* We first concatenate the features with multiple temporal delays to account for the slow hemodynamic response. We then use linear regression to fit a predictive model of brain activity. The linear regression is regularized to improve robustness to correlated features and to improve generalization performance. The optimal regularization hyperparameter is selected over a grid search with cross-validation. Finally, the model generalization performance is evaluated on a held-out test set, comparing the model predictions to the corresponding ground-truth fMRI responses. .. GENERATED FROM PYTHON SOURCE LINES 33-35 Path of the data directory -------------------------- .. GENERATED FROM PYTHON SOURCE LINES 35-39 .. code-block:: Python from voxelwise_tutorials.io import get_data_home directory = get_data_home(dataset="shortclips") print(directory) .. rst-class:: sphx-glr-script-out .. code-block:: none /home/jlg/mvdoc/voxelwise_tutorials_data/shortclips .. GENERATED FROM PYTHON SOURCE LINES 40-44 .. code-block:: Python # modify to use another subject subject = "S01" .. GENERATED FROM PYTHON SOURCE LINES 45-55 Load the data ------------- We first load the fMRI responses. These responses have been preprocessed as described in [1]_. The data is separated into a training set ``Y_train`` and a testing set ``Y_test``. The training set is used for fitting models and for selecting the best models and hyperparameters. The test set is later used to estimate the generalization performance of the selected model. The test set contains multiple repetitions of the same experiment, to estimate an upper bound of the model prediction accuracy (cf. previous example). .. GENERATED FROM PYTHON SOURCE LINES 55-66 ..
code-block:: Python import os import numpy as np from voxelwise_tutorials.io import load_hdf5_array file_name = os.path.join(directory, "responses", f"{subject}_responses.hdf") Y_train = load_hdf5_array(file_name, key="Y_train") Y_test = load_hdf5_array(file_name, key="Y_test") print("(n_samples_train, n_voxels) =", Y_train.shape) print("(n_repeats, n_samples_test, n_voxels) =", Y_test.shape) .. rst-class:: sphx-glr-script-out .. code-block:: none (n_samples_train, n_voxels) = (3600, 84038) (n_repeats, n_samples_test, n_voxels) = (10, 270, 84038) .. GENERATED FROM PYTHON SOURCE LINES 67-72 If we repeat an experiment multiple times, part of the fMRI responses might change. However, the features do not change across repeats, so the voxelwise encoding model will predict the same signal for each repeat. To estimate an upper bound of the model prediction accuracy, we keep only the repeatable part of the signal by averaging the test repeats. .. GENERATED FROM PYTHON SOURCE LINES 72-76 .. code-block:: Python Y_test = Y_test.mean(0) print("(n_samples_test, n_voxels) =", Y_test.shape) .. rst-class:: sphx-glr-script-out .. code-block:: none (n_samples_test, n_voxels) = (270, 84038) .. GENERATED FROM PYTHON SOURCE LINES 77-78 We fill potential NaN (not-a-number) values with zeros. .. GENERATED FROM PYTHON SOURCE LINES 78-81 .. code-block:: Python Y_train = np.nan_to_num(Y_train) Y_test = np.nan_to_num(Y_test) .. GENERATED FROM PYTHON SOURCE LINES 82-86 Then, we load the semantic "wordnet" features, extracted from the stimulus at each time point. The features corresponding to the training set are denoted ``X_train``, and the features corresponding to the test set are denoted ``X_test``. .. GENERATED FROM PYTHON SOURCE LINES 86-95 .. code-block:: Python feature_space = "wordnet" file_name = os.path.join(directory, "features", f"{feature_space}.hdf") X_train = load_hdf5_array(file_name, key="X_train") X_test = load_hdf5_array(file_name, key="X_test") print("(n_samples_train, n_features) =", X_train.shape) print("(n_samples_test, n_features) =", X_test.shape) .. rst-class:: sphx-glr-script-out .. code-block:: none (n_samples_train, n_features) = (3600, 1705) (n_samples_test, n_features) = (270, 1705) .. GENERATED FROM PYTHON SOURCE LINES 96-107 Define the cross-validation scheme ---------------------------------- To select the best hyperparameter through cross-validation, we must define a cross-validation splitting scheme. Because fMRI time series are temporally autocorrelated, the splitting scheme should preserve this temporal structure: consecutive time samples are correlated, so we should not put one time sample in the training set and the immediately following one in the validation set. Thus, we define here a leave-one-run-out cross-validation split that keeps each recording run intact. .. GENERATED FROM PYTHON SOURCE LINES 107-115 .. code-block:: Python from sklearn.model_selection import check_cv from voxelwise_tutorials.utils import generate_leave_one_run_out # index of the first sample of each run run_onsets = load_hdf5_array(file_name, key="run_onsets") print(run_onsets) .. rst-class:: sphx-glr-script-out .. code-block:: none [ 0 300 600 900 1200 1500 1800 2100 2400 2700 3000 3300]
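To make the scheme concrete, here is a minimal sketch of a single leave-one-run-out fold, built from the run onsets printed above (holding out the second run is an arbitrary choice for illustration):

.. code-block:: Python

    # one leave-one-run-out fold: with 12 runs of 300 samples each, the
    # validation set is one complete run, and the training set is the rest
    val_indices = np.arange(run_onsets[1], run_onsets[2])  # samples 300 to 599
    train_indices = np.setdiff1d(np.arange(3600), val_indices)

The ``generate_leave_one_run_out`` helper used below builds one such split per run.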
.. GENERATED FROM PYTHON SOURCE LINES 116-117 We define a cross-validation splitter, compatible with the ``scikit-learn`` API. .. GENERATED FROM PYTHON SOURCE LINES 117-121 .. code-block:: Python n_samples_train = X_train.shape[0] cv = generate_leave_one_run_out(n_samples_train, run_onsets) cv = check_cv(cv) # copy the cross-validation splitter into a reusable list .. GENERATED FROM PYTHON SOURCE LINES 122-139 Define the model ---------------- Now, let's define the model pipeline. We first center the features, since we will not use an intercept. The mean value of an fMRI recording is non-informative, so each run is detrended and demeaned independently, and we do not need to predict an intercept value in the linear model. However, we prefer to avoid normalizing by the standard deviation of each feature. If the features are extracted in a consistent way from the stimulus, their relative scale is meaningful. Normalizing them independently of each other would remove this information. Moreover, the wordnet features are one-hot-encoded, which means that each feature is either present (1) or absent (0) in each sample. Normalizing one-hot-encoded features is not recommended, since it would disproportionately amplify the infrequent features. .. GENERATED FROM PYTHON SOURCE LINES 139-143 .. code-block:: Python from sklearn.preprocessing import StandardScaler scaler = StandardScaler(with_mean=True, with_std=False) .. GENERATED FROM PYTHON SOURCE LINES 144-152 Then we concatenate the features with multiple delays to account for the hemodynamic response. Due to neurovascular coupling, the recorded BOLD signal is delayed in time with respect to the stimulus onset. With different delayed versions of the features, the linear regression model will assign a different weight to each delayed feature, to maximize the prediction accuracy. With one sample every 2 seconds, we typically use the 4 delays [1, 2, 3, 4] (i.e. 2 to 8 seconds) to cover the hemodynamic response peak. In the next example, we further describe this hemodynamic response estimation. .. GENERATED FROM PYTHON SOURCE LINES 152-155 .. code-block:: Python from voxelwise_tutorials.delayer import Delayer delayer = Delayer(delays=[1, 2, 3, 4]) .. GENERATED FROM PYTHON SOURCE LINES 156-183 Finally, we use a ridge regression model. Ridge regression is linear regression with L2 regularization. The L2 regularization improves robustness to correlated features and improves generalization performance. It is controlled by a hyperparameter ``alpha`` that needs to be tuned for each dataset. This regularization hyperparameter is usually selected over a grid search with cross-validation, keeping the hyperparameter that maximizes predictive performance on the validation set. See the previous example for more details about ridge regression and hyperparameter selection. For computational reasons, when the number of features is larger than the number of samples, it is more efficient to solve ridge regression using the (equivalent) dual formulation [2]_. This dual formulation is equivalent to kernel ridge regression with a linear kernel. Here, we have 3600 training samples and 1705 * 4 = 6820 features (we multiply by 4 since we use 4 time delays), therefore it is more efficient to use kernel ridge regression.
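To make this equivalence concrete, the ridge solution can be written in two equivalent forms (a standard identity, cf. [2]_), where :math:`X` has :math:`n` rows (samples) and :math:`p` columns (features):

.. math::

    w = (X^\top X + \alpha I_p)^{-1} X^\top y = X^\top (X X^\top + \alpha I_n)^{-1} y

The primal form inverts a :math:`p \times p` matrix, while the dual form only inverts the :math:`n \times n` linear kernel matrix :math:`K = X X^\top`. With :math:`n = 3600 < p = 6820`, the dual form is the cheaper one.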
With one target, we could use the pipeline directly in ``scikit-learn``'s ``GridSearchCV``, to select the optimal regularization hyperparameter (``alpha``) over cross-validation. However, ``GridSearchCV`` can only optimize a single score across all voxels (targets). Thus, in the multiple-target case, ``GridSearchCV`` can only optimize (for example) the mean score over targets. Here, we want to find a different optimal hyperparameter per target/voxel, so we use the package `himalaya <https://github.com/gallantlab/himalaya>`_, which implements a ``scikit-learn`` compatible estimator ``KernelRidgeCV`` that selects the hyperparameter independently for each target. .. GENERATED FROM PYTHON SOURCE LINES 183-185 .. code-block:: Python from himalaya.kernel_ridge import KernelRidgeCV .. GENERATED FROM PYTHON SOURCE LINES 186-195 ``himalaya`` implements different computational backends, including two backends that use a GPU for faster computations. The two available GPU backends are "torch_cuda" and "cupy". (Each backend is only available if you installed the corresponding package with CUDA enabled. Check the ``pytorch``/``cupy`` documentation for installation instructions.) Here we use the "torch_cuda" backend, but if the import fails we continue with the default "numpy" backend. The "numpy" backend is expected to be slower since it only uses the CPU. .. GENERATED FROM PYTHON SOURCE LINES 195-199 .. code-block:: Python from himalaya.backend import set_backend backend = set_backend("torch_cuda", on_error="warn") print(backend) .. rst-class:: sphx-glr-script-out .. code-block:: none .. GENERATED FROM PYTHON SOURCE LINES 200-203 To speed up model fitting on the GPU, we use single-precision floating-point numbers. (This step probably does not significantly change performance on non-GPU backends.) .. GENERATED FROM PYTHON SOURCE LINES 203-206 .. code-block:: Python X_train = X_train.astype("float32") X_test = X_test.astype("float32") .. GENERATED FROM PYTHON SOURCE LINES 207-210 Since the scale of the regularization hyperparameter ``alpha`` is unknown, we use a large logarithmic range, and we will check after the fit that the best hyperparameters are not all at one edge of the range. .. GENERATED FROM PYTHON SOURCE LINES 210-212 .. code-block:: Python alphas = np.logspace(1, 20, 20) .. GENERATED FROM PYTHON SOURCE LINES 213-214 We also specify some batch sizes to limit GPU memory usage. .. GENERATED FROM PYTHON SOURCE LINES 214-219 .. code-block:: Python kernel_ridge_cv = KernelRidgeCV( alphas=alphas, cv=cv, solver_params=dict(n_targets_batch=500, n_alphas_batch=5, n_targets_batch_refit=100)) .. GENERATED FROM PYTHON SOURCE LINES 220-227 Finally, we use a ``scikit-learn`` ``Pipeline`` to link the different steps together. A ``Pipeline`` can be used as a regular estimator, calling ``pipeline.fit``, ``pipeline.predict``, etc. Using a ``Pipeline`` can be useful to clarify the different steps, avoid cross-validation mistakes, or automatically cache intermediate results. See the ``scikit-learn`` `documentation <https://scikit-learn.org/stable/modules/compose.html>`_ for more information. .. GENERATED FROM PYTHON SOURCE LINES 227-234 .. code-block:: Python from sklearn.pipeline import make_pipeline pipeline = make_pipeline( scaler, delayer, kernel_ridge_cv, ) .. GENERATED FROM PYTHON SOURCE LINES 235-236 We can display the ``scikit-learn`` pipeline with an HTML diagram. .. GENERATED FROM PYTHON SOURCE LINES 236-240 .. code-block:: Python from sklearn import set_config set_config(display='diagram') # requires scikit-learn 0.23 pipeline .. raw:: html
Pipeline(steps=[('standardscaler', StandardScaler(with_std=False)),
                    ('delayer', Delayer(delays=[1, 2, 3, 4])),
                    ('kernelridgecv',
                     KernelRidgeCV(alphas=array([1.e+01, 1.e+02, 1.e+03, 1.e+04, 1.e+05, 1.e+06, 1.e+07, 1.e+08,
           1.e+09, 1.e+10, 1.e+11, 1.e+12, 1.e+13, 1.e+14, 1.e+15, 1.e+16,
           1.e+17, 1.e+18, 1.e+19, 1.e+20]),
                                   cv=_CVIterableWrapper(cv=[(array([   0,    1, ..., 3598, 3599]), array...9])), (array([   0,    1, ..., 3598, 3599]), array([1200, 1201, ..., 1498, 1499])), (array([   0,    1, ..., 3598, 3599]), array([ 900,  901, ..., 1198, 1199])), (array([   0,    1, ..., 3598, 3599]), array([2400, 2401, ...,...1, ..., 3298, 3299])), (array([   0,    1, ..., 3598, 3599]), array([2100, 2101, ..., 2398, 2399]))]),
                                   solver_params={'n_alphas_batch': 5,
                                                  'n_targets_batch': 500,
                                                  'n_targets_batch_refit': 100}))])


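Before launching the full fit, one can quickly verify that all the pipeline steps are compatible, for example by fitting on a small subset of voxels (a minimal sketch; the subset size is arbitrary):

.. code-block:: Python

    # optional sanity check: fit the pipeline on the first 100 voxels only,
    # which is fast even on CPU, before running the full fit below
    _ = pipeline.fit(X_train, Y_train[:, :100])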
.. GENERATED FROM PYTHON SOURCE LINES 241-245 Fit the model ------------- We fit the model on the training set... .. GENERATED FROM PYTHON SOURCE LINES 245-248 .. code-block:: Python _ = pipeline.fit(X_train, Y_train) .. GENERATED FROM PYTHON SOURCE LINES 249-256 ...and score it on the test set. Here the scores are the :math:`R^2` scores, with values in :math:`]-\infty, 1]`. A value of :math:`1` means the predictions are perfect. Note that since ``himalaya`` implements multiple-target models, the ``score`` method differs from the ``scikit-learn`` API and returns one score per target/voxel. .. GENERATED FROM PYTHON SOURCE LINES 256-259 .. code-block:: Python scores = pipeline.score(X_test, Y_test) print("(n_voxels,) =", scores.shape) .. rst-class:: sphx-glr-script-out .. code-block:: none (n_voxels,) = torch.Size([84038]) .. GENERATED FROM PYTHON SOURCE LINES 260-264 If we fit the model on a GPU, the scores are returned on the GPU, using an array object specific to the backend we used (such as a ``torch.Tensor``). Thus, we need to move them into ``numpy`` arrays on the CPU, to be able to use them, for example, in a ``matplotlib`` figure. .. GENERATED FROM PYTHON SOURCE LINES 264-266 .. code-block:: Python scores = backend.to_numpy(scores) .. GENERATED FROM PYTHON SOURCE LINES 267-274 Plot the model prediction accuracy ---------------------------------- To visualize the model prediction accuracy, we can plot it for each voxel on a flattened surface of the brain. To do so, we use a mapper that is specific to each subject's brain. (Check the previous example to see how to use the mapper to the FreeSurfer average surface.) .. GENERATED FROM PYTHON SOURCE LINES 274-281 .. code-block:: Python import matplotlib.pyplot as plt from voxelwise_tutorials.viz import plot_flatmap_from_mapper mapper_file = os.path.join(directory, "mappers", f"{subject}_mappers.hdf") ax = plot_flatmap_from_mapper(scores, mapper_file, vmin=0, vmax=0.4) plt.show() .. image-sg:: /_auto_examples/shortclips/images/sphx_glr_03_plot_wordnet_model_001.png :alt: 03 plot wordnet model :srcset: /_auto_examples/shortclips/images/sphx_glr_03_plot_wordnet_model_001.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 282-293 We can see that the "wordnet" features successfully predict part of the measured brain activity, with :math:`R^2` scores as high as 0.4. Note that these scores are generalization scores, since they are computed on a test set that was not used during model fitting. Since we fitted a model independently in each voxel, we can inspect the generalization performance at the best available spatial resolution: individual voxels. The best-predicted voxels are located in visual semantic areas such as EBA or FFA. This is expected, since the wordnet features encode semantic information about the visual stimulus. For more discussion of these results, we refer the reader to the original publication [1]_. .. GENERATED FROM PYTHON SOURCE LINES 295-306 Plot the selected hyperparameters --------------------------------- Since the scale of alphas is unknown, we plot the optimal alphas selected by the solver over cross-validation. This plot is helpful to refine the alpha grid if the range is too small or too large. Note that some voxels might be at the maximum regularization value in the grid search. These are voxels where the model has no predictive power, so the optimal regularization is large, shrinking the predictions toward zero. We do not need to extend the alpha range for these voxels.
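Besides the diagnostic plot below, a quick numeric check can count how many voxels selected an alpha at each edge of the grid (a small sketch reusing the ``best_alphas_`` attribute of the fitted ``KernelRidgeCV`` step):

.. code-block:: Python

    # count voxels whose selected alpha lies at an edge of the search grid;
    # many voxels at the *lower* edge would suggest extending the grid
    edge_alphas = backend.to_numpy(pipeline[-1].best_alphas_)
    print("lower edge:", np.sum(edge_alphas == alphas[0]))
    print("upper edge:", np.sum(edge_alphas == alphas[-1]))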
.. GENERATED FROM PYTHON SOURCE LINES 306-312 .. code-block:: Python from himalaya.viz import plot_alphas_diagnostic best_alphas = backend.to_numpy(pipeline[-1].best_alphas_) plot_alphas_diagnostic(best_alphas=best_alphas, alphas=alphas) plt.show() .. image-sg:: /_auto_examples/shortclips/images/sphx_glr_03_plot_wordnet_model_002.png :alt: 03 plot wordnet model :srcset: /_auto_examples/shortclips/images/sphx_glr_03_plot_wordnet_model_002.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 313-325 Visualize the regression coefficients ------------------------------------- Here, we go back to the main model, fitted on all voxels. Since our model is linear, we can use the (primal) regression coefficients to interpret the model. The basic intuition is that the model will use larger coefficients on features that have more predictive power. Since we know the meaning of each feature, we can interpret the large regression coefficients. In the case of wordnet features, we can even build a graph that represents the features linked by a semantic relationship. .. GENERATED FROM PYTHON SOURCE LINES 327-329 We first get the (primal) ridge regression coefficients from the fitted model. .. GENERATED FROM PYTHON SOURCE LINES 329-333 .. code-block:: Python primal_coef = pipeline[-1].get_primal_coef() primal_coef = backend.to_numpy(primal_coef) print("(n_delays * n_features, n_voxels) =", primal_coef.shape) .. rst-class:: sphx-glr-script-out .. code-block:: none (n_delays * n_features, n_voxels) = (6820, 84038) .. GENERATED FROM PYTHON SOURCE LINES 334-344 Because the ridge model allows a different regularization per voxel, the regression coefficients may have very different scales. In turn, these different scales can introduce a bias in the interpretation, focusing attention disproportionately on the voxels fitted with the lowest alpha. To address this issue, we rescale the regression coefficients of each voxel to have a norm equal to the square root of the :math:`R^2` score. We found empirically that this rescaling best matches results obtained with a regularization shared across voxels. This rescaling also removes the need to select only the best-performing voxels, because voxels with low prediction accuracy are rescaled to have a low norm. .. GENERATED FROM PYTHON SOURCE LINES 344-347 .. code-block:: Python primal_coef /= np.linalg.norm(primal_coef, axis=0)[None] primal_coef *= np.sqrt(np.maximum(0, scores))[None]
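We can verify this rescaling with a quick check (a small sketch; the tolerance accounts for float32 precision):

.. code-block:: Python

    # after rescaling, each voxel's coefficient norm should equal the
    # square root of its R^2 score, clipped at zero
    norms = np.linalg.norm(primal_coef, axis=0)
    print(np.allclose(norms, np.sqrt(np.maximum(0, scores)), atol=1e-5))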
.. GENERATED FROM PYTHON SOURCE LINES 348-349 Then, we aggregate the coefficients across the different delays. .. GENERATED FROM PYTHON SOURCE LINES 349-361 .. code-block:: Python # split the ridge coefficients per delay delayer = pipeline.named_steps['delayer'] primal_coef_per_delay = delayer.reshape_by_delays(primal_coef, axis=0) print("(n_delays, n_features, n_voxels) =", primal_coef_per_delay.shape) del primal_coef # average over delays average_coef = np.mean(primal_coef_per_delay, axis=0) print("(n_features, n_voxels) =", average_coef.shape) del primal_coef_per_delay .. rst-class:: sphx-glr-script-out .. code-block:: none (n_delays, n_features, n_voxels) = (4, 1705, 84038) (n_features, n_voxels) = (1705, 84038) .. GENERATED FROM PYTHON SOURCE LINES 362-365 Even after averaging over delays, the coefficient matrix is still too large to interpret. Therefore, we use principal component analysis (PCA) to reduce the dimensionality of the matrix. .. GENERATED FROM PYTHON SOURCE LINES 365-372 .. code-block:: Python from sklearn.decomposition import PCA pca = PCA(n_components=4) pca.fit(average_coef.T) components = pca.components_ print("(n_components, n_features) =", components.shape) .. rst-class:: sphx-glr-script-out .. code-block:: none (n_components, n_features) = (4, 1705) .. GENERATED FROM PYTHON SOURCE LINES 373-376 We can check the ratio of variance explained by each principal component. We see that the first four components already explain a large part of the coefficient variance. .. GENERATED FROM PYTHON SOURCE LINES 376-378 .. code-block:: Python print("PCA explained variance =", pca.explained_variance_ratio_) .. rst-class:: sphx-glr-script-out .. code-block:: none PCA explained variance = [0.35199594 0.08103119 0.05653881 0.03766727] .. GENERATED FROM PYTHON SOURCE LINES 379-385 Similarly to [1]_, we correct the coefficients of features linked by a semantic relationship. When building the wordnet features, if a frame was labeled with `wolf`, the authors automatically added the semantically linked categories `canine`, `carnivore`, `placental mammal`, `mammal`, `vertebrate`, `chordate`, `organism`, and `whole`. The authors thus argue that the same correction needs to be applied to the coefficients. .. GENERATED FROM PYTHON SOURCE LINES 385-393 .. code-block:: Python from voxelwise_tutorials.wordnet import load_wordnet from voxelwise_tutorials.wordnet import correct_coefficients _, wordnet_categories = load_wordnet(directory=directory) components = correct_coefficients(components.T, wordnet_categories).T components -= components.mean(axis=1)[:, None] components /= components.std(axis=1)[:, None] .. GENERATED FROM PYTHON SOURCE LINES 394-399 Finally, we plot the first principal component on the wordnet graph. In this graph, edges indicate "is a" relationships (e.g. an `athlete` "is a" `person`). Each marker represents a single noun (circle) or verb (square). The area of each marker indicates the magnitude of the principal component, and the color indicates its sign (red is positive, blue is negative). .. GENERATED FROM PYTHON SOURCE LINES 399-411 .. code-block:: Python from voxelwise_tutorials.wordnet import plot_wordnet_graph from voxelwise_tutorials.wordnet import apply_cmap first_component = components[0] node_sizes = np.abs(first_component) node_colors = apply_cmap(first_component, vmin=-2, vmax=2, cmap='coolwarm', n_colors=2) plot_wordnet_graph(node_colors=node_colors, node_sizes=node_sizes) plt.show() .. image-sg:: /_auto_examples/shortclips/images/sphx_glr_03_plot_wordnet_model_003.png :alt: 03 plot wordnet model :srcset: /_auto_examples/shortclips/images/sphx_glr_03_plot_wordnet_model_003.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 412-423 According to the authors of [1]_, "this principal component distinguishes between categories with high stimulus energy (e.g. moving objects like `person` and `vehicle`) and those with low stimulus energy (e.g. stationary objects like `sky` and `city`)". In this example, because we use only a single subject and we perform a different voxel selection, our result is slightly different from the one in the original publication. We also use a different regularization parameter in each voxel, while in [1]_ all voxels had the same regularization parameter. However, we do not aim to exactly reproduce the results of the original publication; rather, we describe the general approach. ..
GENERATED FROM PYTHON SOURCE LINES 425-427 To project the principal component onto the cortical surface, we first need to use the fitted PCA to transform the primal weights of all voxels. .. GENERATED FROM PYTHON SOURCE LINES 427-441 .. code-block:: Python # transform with the fitted PCA average_coef_transformed = pca.transform(average_coef.T).T print("(n_components, n_voxels) =", average_coef_transformed.shape) del average_coef # We make sure vmin = -vmax, so that the colormap is centered on 0. vmax = np.percentile(np.abs(average_coef_transformed), 99.9) # plot the primal weights projected on the first principal component. ax = plot_flatmap_from_mapper(average_coef_transformed[0], mapper_file, vmin=-vmax, vmax=vmax, cmap='coolwarm') plt.show() .. image-sg:: /_auto_examples/shortclips/images/sphx_glr_03_plot_wordnet_model_004.png :alt: 03 plot wordnet model :srcset: /_auto_examples/shortclips/images/sphx_glr_03_plot_wordnet_model_004.png :class: sphx-glr-single-img .. rst-class:: sphx-glr-script-out .. code-block:: none (n_components, n_voxels) = (4, 84038) .. GENERATED FROM PYTHON SOURCE LINES 442-446 This flatmap shows in which brain regions the model has the largest projection on the first component. Again, this result differs from the one in [1]_, and should only be taken as an illustration of the general approach. .. GENERATED FROM PYTHON SOURCE LINES 448-450 Following [1]_, we also plot the next three principal components on the wordnet graph, mapping the three vectors to RGB colors. .. GENERATED FROM PYTHON SOURCE LINES 450-461 .. code-block:: Python from voxelwise_tutorials.wordnet import scale_to_rgb_cube next_three_components = components[1:4].T node_sizes = np.linalg.norm(next_three_components, axis=1) node_colors = scale_to_rgb_cube(next_three_components) print("(n_nodes, n_channels) =", node_colors.shape) plot_wordnet_graph(node_colors=node_colors, node_sizes=node_sizes) plt.show() .. image-sg:: /_auto_examples/shortclips/images/sphx_glr_03_plot_wordnet_model_005.png :alt: 03 plot wordnet model :srcset: /_auto_examples/shortclips/images/sphx_glr_03_plot_wordnet_model_005.png :class: sphx-glr-single-img .. rst-class:: sphx-glr-script-out .. code-block:: none (n_nodes, n_channels) = (1705, 3) .. GENERATED FROM PYTHON SOURCE LINES 462-465 According to the authors of [1]_, "this graph shows that categories thought to be semantically related (e.g. athletes and walking) are represented similarly in the brain". .. GENERATED FROM PYTHON SOURCE LINES 467-468 Finally, we project these principal components onto the cortical surface. .. GENERATED FROM PYTHON SOURCE LINES 468-480 .. code-block:: Python from voxelwise_tutorials.viz import plot_3d_flatmap_from_mapper voxel_colors = scale_to_rgb_cube(average_coef_transformed[1:4].T, clip=3).T print("(n_channels, n_voxels) =", voxel_colors.shape) ax = plot_3d_flatmap_from_mapper(voxel_colors[0], voxel_colors[1], voxel_colors[2], mapper_file=mapper_file, vmin=0, vmax=1, vmin2=0, vmax2=1, vmin3=0, vmax3=1) plt.show() .. image-sg:: /_auto_examples/shortclips/images/sphx_glr_03_plot_wordnet_model_006.png :alt: 03 plot wordnet model :srcset: /_auto_examples/shortclips/images/sphx_glr_03_plot_wordnet_model_006.png :class: sphx-glr-single-img .. rst-class:: sphx-glr-script-out .. code-block:: none (n_channels, n_voxels) = (3, 84038) .. GENERATED FROM PYTHON SOURCE LINES 481-483 Again, our results differ from the ones in [1]_, for the same reasons mentioned earlier. .. GENERATED FROM PYTHON SOURCE LINES 485-495 References ---------- ..
[1] Huth, A. G., Nishimoto, S., Vu, A. T., & Gallant, J. L. (2012). A continuous semantic space describes the representation of thousands of object and action categories across the human brain. Neuron, 76(6), 1210-1224. .. [2] Saunders, C., Gammerman, A., & Vovk, V. (1998). Ridge regression learning algorithm in dual variables. In Proceedings of the 15th International Conference on Machine Learning (ICML). .. rst-class:: sphx-glr-timing **Total running time of the script:** (1 minute 16.347 seconds) .. _sphx_glr_download__auto_examples_shortclips_03_plot_wordnet_model.py: .. only:: html .. container:: sphx-glr-footer sphx-glr-footer-example .. container:: sphx-glr-download sphx-glr-download-jupyter :download:`Download Jupyter notebook: 03_plot_wordnet_model.ipynb <03_plot_wordnet_model.ipynb>` .. container:: sphx-glr-download sphx-glr-download-python :download:`Download Python source code: 03_plot_wordnet_model.py <03_plot_wordnet_model.py>` .. only:: html .. rst-class:: sphx-glr-signature `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_