To what extent does batch size affect the degree of overfitting in a dynamic dataset


May 7, 2025 - 23:56

I'm currently testing the effect that batch size has on overfitting by plotting the training and test loss curves for a multi-layer perceptron.

The data is being generated with the following code:

import torch  # func and ndims are defined elsewhere in my script

Ntrain = 100
Nbatch = 20
Ntest = 500

x_train = torch.rand(Ntrain, ndims)*10.0  # inputs from 0 to 10
y_train = func(x_train, ndims)            # generate noisy y training data
y_train = y_train.view(-1, 1)             # reshape y data
x_test = torch.rand(Ntest, ndims)*10.0    # inputs from 0 to 10
y_test = func(x_test, ndims)              # generate noisy y test data
y_test = y_test.view(-1, 1)               # reshape y data

The data is regenerated on the fly rather than being a fixed dataset. The optimiser is Adam and the loss function is MSE.
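
For context, the training loop is structured roughly like this. This is only a minimal sketch, not my exact code: the MLP layer sizes, learning rate, epoch count, and the placeholder func below are illustrative, since only the data-generation lines are shown above.

import torch
import torch.nn as nn

ndims = 1                                  # illustrative input dimensionality
Ntrain, Nbatch, Ntest = 100, 20, 500

def func(x, ndims):
    # placeholder for the real target function: some signal plus noise
    return torch.sin(x).sum(dim=1, keepdim=True) + 0.1*torch.randn(x.shape[0], 1)

def train_once(nbatch, epochs=200):
    model = nn.Sequential(nn.Linear(ndims, 64), nn.ReLU(), nn.Linear(64, 1))
    optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()

    x_test = torch.rand(Ntest, ndims)*10.0
    y_test = func(x_test, ndims).view(-1, 1)

    for epoch in range(epochs):
        # dynamic dataset: fresh training data is drawn every epoch
        x_train = torch.rand(Ntrain, ndims)*10.0
        y_train = func(x_train, ndims).view(-1, 1)

        # iterate over mini-batches of size nbatch
        for i in range(0, Ntrain, nbatch):
            xb, yb = x_train[i:i+nbatch], y_train[i:i+nbatch]
            optimiser.zero_grad()
            loss = loss_fn(model(xb), yb)
            loss.backward()
            optimiser.step()

    # final losses on the last epoch's training data and the held-out test data
    with torch.no_grad():
        train_loss = loss_fn(model(x_train), y_train).item()
        test_loss = loss_fn(model(x_test), y_test).item()
    return train_loss, test_loss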

From some very light research, it appears that using larger batches increases the risk of overfitting. However, the plotted loss curves for the batch sizes 15, 45, 60, 80, 200, and 2000 seem to indicate the opposite: overfitting appears to decrease as the batch size becomes larger.
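
To compare the runs, I am essentially looking at the gap between the test and training loss at the end of training for each batch size, along the lines of the following (using the train_once sketch above):

batch_sizes = [15, 45, 60, 80, 200, 2000]
for nb in batch_sizes:
    train_loss, test_loss = train_once(nb)
    # a larger test-train gap is what I am reading as more overfitting
    print(f"batch {nb}: train {train_loss:.4f}, test {test_loss:.4f}, gap {test_loss - train_loss:.4f}")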

I was wondering whether there is an issue with my model or my values, whether large batch sizes only increase the risk of overfitting in proportion to a fixed dataset size, or whether other factors affect this.