Speeding up Dataset.__getitems__

I have a model with a forward function that receives optional parameters, like this:

from typing import Optional

import torch
from torch import nn


class MyModel(nn.Module):
  ...

  def forward(self, interactions: torch.Tensor, user_features: Optional[torch.Tensor] = None):
    """
    Args (where N is the number of items):
      interactions (Tensor): Nx2
      user_features (Tensor): Nx(number of features)
    """
    ...

Because of this, my Dataset also returns a dict, built from a DataFrame:

class StackOverflowDataset(torch.utils.data.Dataset):
    def __init__(self, data, user_features=None):
        self._data = data
        self._user_features = user_features

    def __getitem__(self, idx):
        if self._user_features is None:
            return {'interactions': self._data[idx]}
        else:
            return {
                'interactions': self._data[idx],
                'user_features': self._user_features[self._data[idx]['user']]
            }

    def __len__(self):
        return len(self._data)
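For context, the reason the dict keys match the forward parameter names is that I unpack each batch dict straight into the model, roughly like this (toy stand-ins, not the real model or data):

```python
import torch
from torch import nn

class ToyModel(nn.Module):
    """Minimal model with the same optional-parameter shape as MyModel."""

    def forward(self, interactions, user_features=None):
        # return a flag so it's visible whether the optional features arrived
        return user_features is not None

model = ToyModel()
batch = {'interactions': torch.zeros(4, 2)}
print(model(**batch))           # features absent -> False

batch['user_features'] = torch.zeros(4, 3)
print(model(**batch))           # features present -> True
```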

Most of the training time is spent in __getitem__, which makes me wonder: is having optional arguments on MyModel.forward bad practice? I suspect most of the time goes into indexing numpy item by item, then converting the result to PyTorch tensors and moving it to the GPU. Is there any way I can "pre-process" all of this beforehand but still use a DataLoader that returns a dict?
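To make "pre-process beforehand" concrete, here is a sketch of what I have in mind: convert everything to tensors once in __init__, so __getitem__ only slices tensors. (This assumes data is an Nx2 integer array whose first column is the user id, instead of the DataFrame 'user' lookup above — just an illustration, not my real layout.)

```python
import torch
from torch.utils.data import Dataset

class PreprocessedDataset(Dataset):
    """Same dict interface as above, but the numpy -> tensor conversion
    happens once up front instead of per item."""

    def __init__(self, data, user_features=None):
        # one-time conversion; __getitem__ below only slices tensors
        self._data = torch.as_tensor(data)
        self._user_features = (
            None if user_features is None else torch.as_tensor(user_features)
        )

    def __getitem__(self, idx):
        row = self._data[idx]
        if self._user_features is None:
            return {'interactions': row}
        # assumes column 0 of the Nx2 interactions is the user id
        return {
            'interactions': row,
            'user_features': self._user_features[row[0]],
        }

    def __len__(self):
        return len(self._data)
```

This keeps the dict shape the DataLoader expects, so the model side would not need to change.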

It seems that DataPipes also go row by row. Would it be possible to directly return a dict containing the whole 'interactions' tensor, and have the DataLoader slice it up into batches?
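For that last part, I believe something close to it already works with a plain Dataset: pass a BatchSampler as the sampler and set batch_size=None to disable the DataLoader's own batching, so __getitem__ receives a whole list of indices and returns the batch dict in one slice. A sketch (my own names, not from the code above):

```python
import torch
from torch.utils.data import BatchSampler, DataLoader, Dataset, SequentialSampler

class BatchedInteractions(Dataset):
    def __init__(self, data):
        self._data = torch.as_tensor(data)

    def __getitem__(self, indices):
        # with the BatchSampler below, `indices` is a list of ints,
        # so each batch is produced by a single fancy-indexing slice
        return {'interactions': self._data[indices]}

    def __len__(self):
        return len(self._data)

dataset = BatchedInteractions(torch.arange(10).reshape(5, 2))
# batch_size=None disables automatic batching; the BatchSampler's index
# lists are handed straight to __getitem__
loader = DataLoader(
    dataset,
    sampler=BatchSampler(SequentialSampler(dataset), batch_size=2, drop_last=False),
    batch_size=None,
)
```

If I read the fetcher code correctly, newer PyTorch versions also look for a Dataset.__getitems__(indices) method when automatic batching is enabled, which seems to serve the same purpose (hence my title) — but I'd appreciate confirmation.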