master-degree-notes/Foundation of data science/slides/quiz.txt

The aliasing effect on a downsampled image can be reduced using which of the following methods on the original image? *
possible answers:
reducing the high-frequency components of the image by sharpening the image
increasing the high-frequency components of the image by sharpening the image
increasing the high-frequency components of the image by blurring the image
reducing the high-frequency components of the image by blurring the image
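
A minimal NumPy sketch of the idea behind the last option: blurring is a low-pass filter, so applying it to the original image removes the high frequencies that would otherwise alias when downsampling. The box_blur helper and the toy 8x8 image are illustrative assumptions, not part of the question.

import numpy as np

def box_blur(img, k=3):
    # Simple k x k box blur (a low-pass filter) with edge padding.
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

img = np.random.rand(8, 8)             # stand-in for an image
downsampled = box_blur(img)[::2, ::2]  # blur first, then keep every 2nd pixel
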
Which of the following operations is performed on the pixels when sharpening an image in the spatial domain? *
possible answers:
Differentiation
Mean
Median
Integration
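
Sharpening in the spatial domain is based on differentiation: a discrete Laplacian (a second-derivative operator) picks out edges, which are then added back to the image. A minimal NumPy sketch; the wrap-around boundary handling via np.roll is a simplification.

import numpy as np

def sharpen(img):
    # Discrete Laplacian: sum of the 4 neighbours minus 4x the centre pixel.
    lap = (np.roll(img, 1, 0) + np.roll(img, -1, 0)
           + np.roll(img, 1, 1) + np.roll(img, -1, 1) - 4 * img)
    return img - lap  # subtracting the Laplacian boosts edges

img = np.random.rand(8, 8)
sharp = sharpen(img)
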
Consider a binary classifier that yields the following probability scores for 5 samples: 0.9, 0.6, 0.5, 0.2, 0.05. The corresponding correct labels of the samples are 1, 0, 1, 0, 1. What is the threshold that yields the best precision if the application requires a recall larger than or equal to 0.3? *
possible answers:
0.95
0.8
0.55
0.3
0.1
0.03
With reference to the above problem, what is the largest achievable precision if the application requires a recall larger than or equal to 0.3? *
possible answers:
1
0.95
0.8
0.55
0.3
0.1
0.03
With reference to the above problem, what is the value of precision and recall if the score threshold is set to 0.3? *
possible answers:
P=1.0, R=1.0
P=0.667, R=0.667
P=0.667, R=1.0
P=0.333, R=0.333
P=0.333, R=0.667
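
A worked check for the three questions above, in plain Python, assuming a score >= threshold counts as a positive prediction (the threshold list simply mirrors the answer options):

scores = [0.9, 0.6, 0.5, 0.2, 0.05]
labels = [1, 0, 1, 0, 1]

for t in [0.95, 0.8, 0.55, 0.3, 0.1, 0.03]:
    preds = [int(s >= t) for s in scores]
    tp = sum(p and y for p, y in zip(preds, labels))
    fp = sum(p and not y for p, y in zip(preds, labels))
    fn = sum((not p) and y for p, y in zip(preds, labels))
    prec = tp / (tp + fp) if tp + fp else float("nan")
    rec = tp / (tp + fn)
    print(f"t={t}: precision={prec:.3f}, recall={rec:.3f}")

Under this convention, t=0.8 keeps only the score 0.9 (a true positive), giving precision 1.0 at recall 0.333, while t=0.3 keeps the top three scores, giving precision 0.667 and recall 0.667.
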
The relationship between the number of beers consumed (x) and blood alcohol content (y) was studied in 16 male college students by using least squares regression. The following regression equation was obtained from this study: y = -0.0127 + 0.0180x. The above equation implies that: *
possible answers:
each beer consumed increases blood alcohol by 1.27%
on average it takes 1.8 beers to increase blood alcohol content by 1%
each beer consumed increases blood alcohol by an average amount of 1.8%
each beer consumed increases blood alcohol by exactly 0.018
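
A quick numeric check of the slope interpretation, with the fitted line restated as a helper function: the predicted increase per additional beer equals the slope, 0.018, i.e. 1.8 percentage points of blood alcohol content on average (not exactly, since the line only models the mean).

def y_hat(beers):
    return -0.0127 + 0.0180 * beers

print(round(y_hat(3) - y_hat(2), 4))  # 0.018: the average increase per beer
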
Let us consider multivariate linear regression, with n the number of features, m the number of data samples, Theta the vector of parameters. Which of the following is likely true? *
possible answers:
Optimizing by gradient descent yields n local maxima
The dimensionality of Theta is m+1, if one considers additionally the bias/offset term
Feature scaling aids the model optimization only if n>1
One may devise new features by considering the multiplication of pairs of the original features only if the product is non-zero
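
A minimal NumPy sketch, on made-up toy data, of two facts these options touch on: feature scaling standardizes features with very different ranges before gradient descent, and Theta has n+1 entries (not m+1) once the bias term is included.

import numpy as np

X = np.array([[2000.0, 3.0],        # m=3 samples, n=2 features
              [1500.0, 2.0],        # on very different scales
              [2500.0, 4.0]])
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)  # zero mean, unit variance

Xb = np.hstack([np.ones((X_scaled.shape[0], 1)), X_scaled])  # prepend bias column
theta = np.zeros(Xb.shape[1])  # shape (n+1,) = (3,), not (m+1,)
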
What do large values of the log-likelihood value indicate? *
possible answers:
That there are a greater number of explained vs. unexplained observations
That the model fits the data well
That as the predictor variable increases, the likelihood of the outcome occurring decreases
That the model is a poor fit of the data
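
A tiny plain-Python illustration (with made-up probabilities) of why large log-likelihood values indicate a good fit: predicted probabilities close to the observed outcomes give a larger, i.e. less negative, log-likelihood.

import math

y = [1, 1, 0, 1]
good_fit = [0.9, 0.8, 0.1, 0.95]   # probabilities close to the outcomes
poor_fit = [0.5, 0.5, 0.5, 0.5]    # uninformative model

def log_lik(probs, y):
    return sum(math.log(p if yi else 1 - p) for p, yi in zip(probs, y))

print(log_lik(good_fit, y))  # about -0.49 (larger: better fit)
print(log_lik(poor_fit, y))  # about -2.77 (smaller: worse fit)
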
With reference to neural network classifiers, which of the following is correct? *
possible answers:
(Fully-connected) Neural network classifiers are generative models
(Fully-connected) Neural network classifiers are linear classifiers, irrespective of the number of layers
Neural network classifiers leverage a softmax function to convert class scores to normalized probabilities
The procedure of gradient check leverages the analytic computing of gradients in large networks
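
For reference, a minimal NumPy softmax, the function the correct option refers to: it maps raw class scores to non-negative probabilities that sum to 1 (subtracting the max is the usual numerical-stability trick).

import numpy as np

def softmax(scores):
    z = scores - np.max(scores)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))  # non-negative, sums to 1
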
Consider the neural network that corresponds to this equation: loss = (ReLU((x*y)+z) - t)^2. Assume ^2 means to the power of 2, and consider that x=1, y=2, z=1, t=4. What is the loss value? *
possible answers:
1
-1
-2
2
With reference to the above problem, what is the derivative of the loss with respect to t? *
possible answers:
1
-1
-2
2
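
A plain-Python worked check for both questions above, spelling out the forward pass and the chain rule for the derivative with respect to t:

x, y, z, t = 1.0, 2.0, 1.0, 4.0

a = x * y            # 2
b = a + z            # 3
r = max(b, 0.0)      # ReLU(3) = 3
loss = (r - t) ** 2  # (3 - 4)^2 = 1

# Chain rule: dloss/dt = 2 * (r - t) * d(r - t)/dt = 2 * (3 - 4) * (-1) = 2
dloss_dt = -2.0 * (r - t)
print(loss, dloss_dt)  # 1.0 2.0
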
Consider Convolutional Neural Networks (ConvNets). Which of the following is correct? *
possible answers:
ConvNets leverage fully connected layers to reduce the amount of computation and parameters
In ConvNets, activation features remain local throughout the network, due to the limited kernel sizes
The number of activation maps after each layer depends on the input number of channels
Max pooling is applied separately at each channel activation map
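
A minimal NumPy sketch of the last option: 2x2 max pooling acts separately on each channel, so the channel count is unchanged and only the spatial dimensions shrink. The (C, H, W) layout and the even spatial size are assumptions of the sketch.

import numpy as np

def max_pool_2x2(x):
    # Pool each channel independently: (C, H, W) -> (C, H/2, W/2).
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))

x = np.random.rand(3, 4, 4)    # 3 channels
print(max_pool_2x2(x).shape)   # (3, 2, 2): same number of channels
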
Consider Self-Attention Mechanisms in Transformer models. Which of the following is correct? *
possible answers:
Self-attention allows the model to weigh the importance of different input tokens independently.
Self-attention is computationally efficient for long sequences compared to 1-D Convolutional neural networks.
Self-attention layers require sequential processing of input tokens.
Self-attention only considers the local context of each token.
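
A minimal NumPy sketch of scaled dot-product self-attention (single head, random weights, no masking): every token attends to every other token in one matrix product, so the context is global and the tokens are processed in parallel, at O(T^2) cost in the sequence length T.

import numpy as np

def self_attention(X, Wq, Wk, Wv):
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])       # all token pairs at once
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)            # row-wise softmax
    return w @ V

T, d = 5, 8
X = np.random.rand(T, d)
out = self_attention(X, *[np.random.rand(d, d) for _ in range(3)])  # (T, d)
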
Consider Principal Component Analysis (PCA). Which of the following is a correct statement about its limitations? *
possible answers:
PCA is highly effective in handling nonlinear relationships between features.
PCA is robust to outliers and noisy data.
PCA can be computationally expensive for high-dimensional datasets.
PCA can be used to identify the underlying causal relationships between features.
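
A minimal NumPy sketch of PCA via the SVD. The projection is purely linear, which is why nonlinear relationships between features are out of reach; the SVD itself is also what can get expensive for very high-dimensional data.

import numpy as np

def pca(X, k):
    Xc = X - X.mean(axis=0)                          # center the data
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                             # linear projection onto top-k PCs

X = np.random.rand(100, 5)
print(pca(X, 2).shape)  # (100, 2)
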
Which of the following is a common choice for the prior distribution in a Variational Autoencoder (VAE)? *
possible answers:
Normal distribution: Its simplicity and analytical tractability make it a popular choice.
Uniform distribution: It ensures that all latent space points are equally likely, preventing mode collapse.
Exponential distribution: It can capture the inherent skewness in certain types of data.
Beta distribution: It is suitable for modeling probabilities and proportions.
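
For context on the first option: the standard normal prior is popular partly because its KL term against the encoder's Gaussian has a closed form. A minimal NumPy sketch of that per-dimension formula, with made-up encoder outputs mu and log_var:

import numpy as np

def kl_to_standard_normal(mu, log_var):
    # KL( N(mu, sigma^2) || N(0, 1) ) = 0.5 * (mu^2 + sigma^2 - log(sigma^2) - 1)
    return 0.5 * np.sum(mu**2 + np.exp(log_var) - log_var - 1)

mu = np.array([0.3, -0.1])
log_var = np.array([-0.2, 0.1])
print(kl_to_standard_normal(mu, log_var))
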