The idea is to complement the weaknesses of a system with the strengths of another.
Examples:
- multiple biometric traits (e.g. signature + fingerprint, used in India, USA etc.)
    - the most obvious meaning of multi-biometrics
- multiple instances: same trait but acquired from different elements (e.g. 2 or more different fingers, both irises, both ears, multiple instances of hand geometry...)
- repeated instances: same trait, same element, but acquired multiple times
- multiple algorithms: same trait, same element, but using multiple classifiers
    - exploits their complementary strengths and weaknesses
- multiple sensors: e.g. fingerprint with both an optical and a capacitive sensor
Where does the fusion happen? It can happen:
- at sensor level
    - not always feasible
- at feature level: fusing feature vectors before matching
    - not always feasible: feature vectors should be comparable in nature and size
    - an example is when we have multiple samples of the same trait; in this case they will certainly be comparable
- at score level (or match level): fusing the scores (probability scores) or rankings
    - most feasible solution
    - each system works by itself
    - scores need to be comparable: normalization into a common range may be required
- at decision level: each subsystem takes its own decision and the separate decisions are then combined (see slide)
!Pasted image 20241212084256.png
Feature level fusion
!Pasted image 20241212084349.png
Better results are expected, since much more information is still available. Possible problems:
- incompatible feature set
- feature vector combination may cause "curse of dimensionality"
- a more complex matcher may be required
- combined vectors may include noisy or redundant data.
Feature level fusion: serial
Example: use SIFT (Scale-Invariant Feature Transform). Phases:
- feature extraction (SIFT feature set)
- feature normalization: required due to the possibly significant differences in the scale of the vector values
- concatenation: a single vector is built from the two feature vectors
Problems to address:
- feature selection / reduction
    - selecting a few features is more efficient than using the whole vector; possible techniques:
    - k-means clustering, keeping only the cluster centers
        - performed after concatenating the two normalized vectors
    - neighborhood elimination
        - points at a certain distance from each other are eliminated
        - performed before concatenation, on the single vectors
    - points belonging to specific regions
        - only points in specific regions of the trait (e.g. for a face: nose, mouth...) are kept
- matching
    - point pattern matching (see the sketch below)
        - a method to count the number of paired "points" between the probe vector and the gallery one
        - two points are paired if their distance is smaller than a threshold
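A minimal sketch of the point pattern matching step, assuming features are stored as point sets and using an illustrative distance threshold:
```python
import numpy as np

def point_pattern_matching(probe_points, gallery_points, threshold=0.5):
    """Count the paired points between a probe and a gallery point set.

    Two points are paired when their Euclidean distance is below `threshold`;
    each gallery point can be paired at most once (greedy pairing,
    threshold value is only an assumption).
    """
    gallery = [np.asarray(g, dtype=float) for g in gallery_points]
    used = [False] * len(gallery)
    paired = 0
    for p in probe_points:
        p = np.asarray(p, dtype=float)
        # find the closest still-unused gallery point
        best_j, best_d = -1, np.inf
        for j, g in enumerate(gallery):
            if not used[j]:
                d = np.linalg.norm(p - g)
                if d < best_d:
                    best_j, best_d = j, d
        if best_j >= 0 and best_d < threshold:
            used[best_j] = True
            paired += 1
    return paired

# toy example: 2 of the 3 probe points find a close gallery point
print(point_pattern_matching([(0, 0), (1, 1), (5, 5)],
                             [(0.1, 0.1), (0.9, 1.2), (9, 9)]))  # -> 2
```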
Feature level fusion: parallel
Parallel combination of the two vectors (see the sketch after this list):
- vector normalization
    - the shorter vector is extended to match the size of the other one
    - e.g. zero-padding
- pre-processing of vectors
    - step 1: transform the vectors into unit vectors (dividing them by their L2 norm)
    - step 2: weighted combination through the coefficient \theta, based on the lengths of X and Y; we can then use X as the real part and Y as the imaginary part of the final (complex) vector
- further feature processing:
    - using linear techniques like PCA, K-L expansion, LDA
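A minimal sketch of the parallel combination described above; deriving \theta from the original vector lengths is only an illustrative assumption:
```python
import numpy as np

def parallel_fusion(x, y, theta=None):
    """Fuse two feature vectors into one complex vector Z = X + i*theta*Y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if theta is None:
        theta = len(x) / (len(x) + len(y))   # illustrative weighting from the lengths
    # vector normalization: zero-pad the shorter vector to the common size
    n = max(len(x), len(y))
    x = np.pad(x, (0, n - len(x)))
    y = np.pad(y, (0, n - len(y)))
    # pre-processing step 1: unit vectors (divide by the L2 norm)
    x = x / np.linalg.norm(x)
    y = y / np.linalg.norm(y)
    # pre-processing step 2: X is the real part, theta*Y the imaginary part
    return x + 1j * theta * y

z = parallel_fusion([3.0, 1.0, 2.0, 4.0], [0.5, 0.2])
print(z.shape, z.dtype)   # (4,) complex128
```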
Feature level fusion: CCA
The idea is to find a pair of transformations that maximize the correlation between the two feature sets.
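As a sketch, scikit-learn's CCA can learn such a pair of transformations; the feature matrices below are random placeholders:
```python
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
X = rng.random((100, 64))   # e.g. features from modality 1 (placeholder)
Y = rng.random((100, 32))   # e.g. features from modality 2 (placeholder)

cca = CCA(n_components=16)
cca.fit(X, Y)
Xc, Yc = cca.transform(X, Y)   # projections with maximal correlation

fused = np.hstack([Xc, Yc])    # fused feature vector (concatenation)
print(fused.shape)             # (100, 32)
```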
Score level fusion
Transformation-based: scores from different matchers are first normalized into a common domain and then combined using fusion rules.
Classifier-based: the scores are treated as features and assembled into a feature vector; a further classifier is trained on it (e.g. SVM, decision tree, neural network...).
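A minimal sketch of the classifier-based approach, with an SVM trained on score vectors (the scores and labels below are placeholders):
```python
import numpy as np
from sklearn.svm import SVC

# one row per comparison: the scores returned by the individual matchers
train_scores = np.array([[0.91, 0.80, 0.75],
                         [0.20, 0.35, 0.10],
                         [0.85, 0.70, 0.90],
                         [0.30, 0.15, 0.25]])
train_labels = np.array([1, 0, 1, 0])          # 1 = genuine, 0 = impostor

fusion_clf = SVC(kernel="linear").fit(train_scores, train_labels)

probe_scores = np.array([[0.88, 0.65, 0.72]])  # scores for a new comparison
print(fusion_clf.predict(probe_scores))        # fused accept/reject decision
```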
Fusion rules
Abstract level: each classifier outputs a class label. Majority vote: each classifier votes for a class.
Rank level: each classifier outputs its class ranking. Borda count:
- each classifier produces a ranking of the classes according to the probability of the pattern belonging to them
- rankings are converted into scores and summed up
- the class with the highest final score is the one chosen by the multi-classifier
E.g. with 4 available places, the most probable class gets rank 4 and the least probable gets rank 1; the ranks from each classifier are summed. It can also be used in open-set identification, using a threshold to discard low scores (the score being the sum of ranks).
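A minimal sketch of the Borda count rule (identity names and rankings are illustrative):
```python
from collections import defaultdict

def borda_count(rankings, threshold=None):
    """Fuse rankings: with N classes the top one gets N points, the last gets 1.

    `rankings` holds one ranking per classifier, ordered from most to least
    probable class; `threshold` optionally rejects weak winners (open set).
    """
    totals = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, label in enumerate(ranking):
            totals[label] += n - position          # rank N ... rank 1
    winner = max(totals, key=totals.get)
    if threshold is not None and totals[winner] < threshold:
        return None                                # open set: reject
    return winner

# three classifiers ranking four enrolled identities
print(borda_count([["anna", "bob", "carl", "dana"],
                   ["bob", "anna", "dana", "carl"],
                   ["anna", "dana", "bob", "carl"]]))   # -> anna
```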
Measurement level: each classifier outputs its classification scores.
!Pasted image 20241212090608.png
Different methods are possible (e.g. sum, weighted sum, mean, product, weighted product, max, min, etc.)
- sum rule: the sum of the returned confidence vectors is computed and the pattern is classified according to the highest value
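A minimal sketch of some measurement-level rules on placeholder confidence vectors:
```python
import numpy as np

# one confidence vector per classifier, one entry per class (placeholders)
scores = np.array([[0.7, 0.2, 0.1],    # classifier 1
                   [0.5, 0.3, 0.2],    # classifier 2
                   [0.6, 0.1, 0.3]])   # classifier 3

fused_sum = scores.sum(axis=0)         # sum rule
fused_prod = scores.prod(axis=0)       # product rule
fused_max = scores.max(axis=0)         # max rule

print(int(np.argmax(fused_sum)))       # class with the highest fused confidence -> 0
```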
Scores from different matchers are typically not homogeneous:
- different range
- similarity vs distance
- different distributions
Normalization is required! But there are issues to consider when choosing a normalization method:
- robustness: the transformation should not be influenced by outliers
- effectiveness: the estimated parameters of the score distribution should approximate the real values as closely as possible
Reliability
A reliability measure can be computed for each single response of each subsystem before fusing them into a final response; confidence margins are a possible solution.
Poh and Bengio propose a solution based on FAR and FRR: M(\nabla) = |FAR(\nabla) - FRR(\nabla)|
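A sketch of that margin, assuming FAR and FRR are available as functions of the decision threshold (the error curves below are placeholders):
```python
import numpy as np

def reliability_margin(far, frr, threshold):
    """Reliability of a subsystem as |FAR - FRR| at the operating threshold."""
    return abs(far(threshold) - frr(threshold))

# placeholder error curves for one subsystem
far = lambda t: np.exp(-5.0 * t)        # false acceptance rate, decreasing with t
frr = lambda t: 1.0 - np.exp(-3.0 * t)  # false rejection rate, increasing with t

print(reliability_margin(far, frr, threshold=0.4))
```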
Decision level fusion
!Pasted image 20241212091320.png
A common way is majority voting, but serial combination (AND) or parallel combination (OR) can also be used. Be careful when using OR: if a single classifier accepts while the other rejects, the pattern is still accepted (less secure)!
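A minimal sketch of these decision-level rules (the per-subsystem decisions are placeholders):
```python
def fuse_decisions(decisions, rule="majority"):
    """Combine per-subsystem accept/reject decisions (True = accept)."""
    if rule == "and":          # serial combination: all must accept (more secure)
        return all(decisions)
    if rule == "or":           # parallel combination: one acceptance is enough (less secure)
        return any(decisions)
    if rule == "majority":     # majority voting
        return sum(decisions) > len(decisions) / 2
    raise ValueError(f"unknown rule: {rule}")

decisions = [True, False, True]               # e.g. fingerprint, face, iris subsystems
print(fuse_decisions(decisions, "and"))       # False
print(fuse_decisions(decisions, "or"))        # True
print(fuse_decisions(decisions, "majority"))  # True
```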
Template updating - Co-Update method
(I got distracted here, to be integrated with the slides)
Data normalization
When the minimum and maximum values are known, normalization is trivial. For this reason, we assume that an exact estimate of the maximum value is not available and use the average value in its place, in order to stress the normalization functions even more.
Normalization functions:
- min/max: s'_{k}=\frac{s_{k}-min}{max-min}
- z-score
- median/MAD
- sigmoid
- tanh
!Pasted image 20241212094046.png
Min-max: performs a "mapping" (shifting + compression/dilation) of the interval between the minimum and maximum values onto the interval between 0 and 1. Pro: range between 0 and 1. Con: the minimum and maximum of each subsystem's scores must be known. !Pasted image 20241212093902.png
Z-score: standardization by mean and variance, widely used. Con: it does not bring the score into a fixed range. !Pasted image 20241212093927.png
Median/MAD: the median is subtracted and the result is divided by the median of the absolute deviations (MAD). It works poorly if the score distribution is not Gaussian; it does not preserve the original distribution and does not guarantee a fixed range either. !Pasted image 20241212093943.png
Sigmoid: maps the scores into the open interval (0, 1). Con 1: it distorts heavily towards the extremes. Con 2: it depends on the parameters k and c, which in turn depend on the score distribution. !Pasted image 20241212094000.png
Tanh: guarantees the (0, 1) range. Con: it tends to concentrate the values excessively around the center (0.5). !Pasted image 20241212094016.png
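A sketch of the normalization functions above on a placeholder score array; the sigmoid parameters k and c and the 0.01 factor in the tanh estimator are assumed values:
```python
import numpy as np

def min_max(s):
    return (s - s.min()) / (s.max() - s.min())

def z_score(s):
    return (s - s.mean()) / s.std()

def median_mad(s):
    mad = np.median(np.abs(s - np.median(s)))
    return (s - np.median(s)) / mad

def sigmoid(s, k=0.1, c=50.0):
    # k and c should be tuned on the score distribution (assumed values here)
    return 1.0 / (1.0 + np.exp(-k * (s - c)))

def tanh_norm(s):
    # tanh estimator: squeezes the standardized scores into (0, 1) around 0.5
    return 0.5 * (np.tanh(0.01 * (s - s.mean()) / s.std()) + 1.0)

scores = np.array([12.0, 30.0, 45.0, 51.0, 80.0])   # placeholder raw scores
print(min_max(scores))    # mapped into [0, 1]
print(tanh_norm(scores))  # concentrated around 0.5
```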