The idea is to complement the weaknesses of one system with the strengths of another. Examples:
- **multiple biometric traits** (e.g. signature + fingerprint, used in India, USA etc.) - the most obvious meaning of multi-biometrics
- **multiple instances:** same trait, but acquired from different instances of it (e.g. 2 or more different fingers, both irises, both ears, multiple instances of hand geometry...)
- **repeated instances:** same trait, same element, but acquired multiple times
- **multiple algorithms:** same trait, same element, but using multiple classifiers - exploits their complementary strengths and weaknesses
- **multiple sensors:** e.g. fingerprint with both an optical and a capacitive sensor

Where does the fusion happen? It can happen:
- **at sensor level:**
    - not always feasible
- **at feature level:** fusing feature vectors before matching
    - not always feasible: feature vectors should be comparable in nature and size
    - an example is when we have multiple samples of the same trait; in this case they will certainly be comparable
- **score level fusion** (or match level fusion): consists in fusing the scores (probability scores) or rankings
    - most feasible solution
    - each system works by itself
    - scores need to be comparable: normalization to a common range may be required
- **decision level fusion:** separate decisions are combined (see slide)

![[Pasted image 20241212084256.png|500]]

#### Feature level fusion
![[Pasted image 20241212084349.png|600]]
Better results are expected, since much more information is still present.
Possible problems:
- incompatible feature sets
- feature vector combination may cause the "curse of dimensionality"
- a more complex matcher may be required
- combined vectors may include noisy or redundant data

##### Feature level fusion: serial
Example: use SIFT (scale-invariant feature transform).
Phases:
- feature extraction (SIFT feature set)
- feature normalization: required due to the possible significant differences in the scale of the vector values
- a single vector is created by concatenating the two feature vectors

Problems to address:
- **feature selection / reduction**
    - selecting a few features is more efficient than keeping the whole vector; possible techniques:
        - **k-means clustering**, keeping only the cluster centers
            - performed after linking the two normalized vectors
        - **neighborhood elimination**
            - points within a certain distance are eliminated
            - performed before linking, on the single vectors
        - **points belonging to specific regions**
            - only points in specific regions of the trait (e.g. for the face: nose, mouth...) are kept
- **matching**
    - **point pattern matching**
        - a method to find the number of paired "points" between the probe vector and the gallery one
        - two points are paired if their distance is smaller than a threshold (see the sketch below)
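To make the matching step concrete, the following is a minimal sketch of point pattern matching between two sets of SIFT-like descriptors; the function name, the threshold value and the normalization by the smaller set size are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def point_pattern_match_score(probe, gallery, threshold=0.6):
    """Count the probe descriptors that find a gallery descriptor closer than
    `threshold`; each gallery point can be paired with at most one probe point."""
    used = set()
    paired = 0
    for p in probe:
        dists = np.linalg.norm(gallery - p, axis=1)  # distances to all gallery points
        best = int(np.argmin(dists))
        if dists[best] < threshold and best not in used:
            used.add(best)
            paired += 1
    # turn the count of paired points into a similarity score in [0, 1]
    return paired / min(len(probe), len(gallery))

# Toy example with 128-dimensional descriptors (as in SIFT): part of the gallery
# is a slightly perturbed copy of the probe, the rest is unrelated noise.
rng = np.random.default_rng(0)
probe = rng.random((40, 128))
gallery = np.vstack([probe[:20] + 0.01 * rng.standard_normal((20, 128)),
                     rng.random((35, 128))])
print(point_pattern_match_score(probe, gallery))  # 0.5: 20 of the 40 probe points are paired
```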
##### Feature level fusion: parallel
Parallel combination of the two vectors:
- **vector normalization**
    - the shorter vector is extended to match the size of the other one, e.g. by zero-padding
- **pre-processing of vectors**
    - step 1: transform the vectors into unit vectors (dividing them by their L2 norm)
    - step 2: weighted combination through the coefficient $\theta$, based on the lengths of X and Y
    - we can then use X as the real part and Y as the imaginary part of the final vector
- **further feature processing:**
    - using linear techniques like PCA, K-L expansion, LDA

##### Feature level fusion: CCA
The idea is to find a pair of transformations that maximizes the correlation between the characteristics.

#### Score level fusion
![[Pasted image 20241212085003.png]]
Transformation based: scores from different matchers are first normalized to a common domain and then combined using fusion rules.
Classifier based: the scores are treated as features and included in a feature vector; a further classifier is trained on it (SVM, decision tree, neural network...).

##### Fusion rules
**Abstract:** each classifier outputs a class label.
- Majority vote: each classifier votes for a class.

**Rank:** each classifier outputs its class ranking.
- Borda count:
    - each classifier produces a ranking according to the probability of the pattern belonging to each class
    - the rankings are converted into scores and summed up
    - the class with the highest final score is the one chosen by the multi-classifier
    - e.g. with 4 candidate classes, the most probable class gets rank 4, the least probable rank 1; the ranks from each classifier are then summed (see the sketch at the end of this section)
    - can also be used in open-set identification, using a threshold to discard low scores (the score being the sum of ranks)

**Measurement:** each classifier outputs its classification score.
![[Pasted image 20241212090608.png|600]]
Different methods are possible (e.g. sum, weighted sum, mean, product, weighted product, max, min, etc.)
- sum: the sum of the returned confidence vectors is computed; the pattern is classified according to the highest value

Scores from different matchers are typically non-homogeneous:
- different ranges
- similarity vs distance
- different distributions

Normalization is required! But there are issues to consider when choosing a normalization method:
- robustness: the transformation should not be influenced by outliers
- effectiveness: the estimated parameters of the score distribution should approximate the real values as closely as possible

##### Reliability
A reliability measure is computed for each single response of each subsystem before fusing them into a final response; confidence margins are a possible solution.
Poh and Bengio propose a solution based on FAR and FRR: $M(\nabla) = |FAR(\nabla)-FRR(\nabla)|$
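As a concrete illustration of the rank-level Borda count rule described above, here is a minimal sketch; the three classifiers and their per-identity scores are made-up values for illustration, not data from the course material.

```python
def borda_count(score_lists):
    """score_lists: one dict {identity: score} per classifier (higher = better).
    Each ranking is converted into points (the best of N identities gets N points,
    the worst gets 1); points are summed and the identity with the highest total
    is the decision of the multi-classifier."""
    totals = {}
    for scores in score_lists:
        n = len(scores)
        ranking = sorted(scores, key=scores.get, reverse=True)  # best identity first
        for position, identity in enumerate(ranking):
            totals[identity] = totals.get(identity, 0) + (n - position)
    return max(totals, key=totals.get), totals

# Three classifiers, four candidate identities (made-up scores).
classifier_scores = [
    {"A": 0.91, "B": 0.85, "C": 0.30, "D": 0.10},  # ranking: A > B > C > D
    {"A": 0.80, "B": 0.88, "C": 0.75, "D": 0.20},  # ranking: B > A > C > D
    {"A": 0.62, "B": 0.65, "C": 0.05, "D": 0.30},  # ranking: B > A > D > C
]
winner, totals = borda_count(classifier_scores)
print(winner, totals)  # B wins: totals are A=10, B=11, C=5, D=4
```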
#### Decision level fusion
![[Pasted image 20241212091320.png|600]]
A common approach is majority voting, but a serial combination (AND) or a parallel combination (OR) can also be used.
Be careful when using OR: if a single classifier accepts while the other fails, the user is accepted (less secure)!

#### Template updating - Co-Update method
(missing notes, to be integrated from the slides)

#### Data normalization
When the minimum and maximum values are known, normalization is trivial. For this reason, we assumed that an exact estimate of the maximum value is **missing** and chose the average value in its place, in order to stress the normalization functions even more.

Normalization functions:
- min/max: $s'_{k}=\frac{s_{k}-\min}{\max-\min}$
- z-score: $s'_{k}=\frac{s_{k}-\mu}{\sigma}$
- median/MAD: $s'_{k}=\frac{s_{k}-\text{median}}{\text{MAD}}$, with $\text{MAD}=\text{median}(|s_{k}-\text{median}|)$
- sigmoid: $s'_{k}=\frac{1}{1+e^{-(s_{k}-c)/k}}$
- tanh: $s'_{k}=\frac{1}{2}\left(\tanh\left(0.01\,\frac{s_{k}-\mu}{\sigma}\right)+1\right)$

![[Pasted image 20241212094046.png|300]]

The min-max normalization performs a "mapping" (shifting + compression/dilation) of the interval between the minimum and maximum values onto the interval between 0 and 1.
Pro: range between 0 and 1.
Con: the minimum and maximum of the scores of each subsystem must be known.
![[Pasted image 20241212093902.png|200]]

Z-score: standardization by mean and variance, widely used.
Con: it does not bring the scores into a fixed range.
![[Pasted image 20241212093927.png|200]]

Median/MAD: the median is subtracted and the result is divided by the median of the absolute deviations (MAD). It works poorly if the score distribution is not Gaussian; it does not preserve the original distribution and does not guarantee a fixed range either.
![[Pasted image 20241212093943.png|200]]

Sigmoid: maps the scores into the open interval (0, 1).
Con 1: it distorts heavily towards the extremes.
Con 2: it depends on the parameters k and c, which in turn depend on the score distribution.
![[Pasted image 20241212094000.png|200]]

Tanh: guarantees the range (0, 1).
Con: it tends to concentrate the values excessively around the center (0.5).
![[Pasted image 20241212094016.png|200]]
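The normalization functions above can be sketched in a few lines of Python; the default parameter choices (estimating the sigmoid's c and k from the mean and standard deviation of the scores, and the 0.01 factor in the tanh estimator) are illustrative assumptions.

```python
import numpy as np

def min_max(s):
    return (s - s.min()) / (s.max() - s.min())     # fixed range [0, 1]

def z_score(s):
    return (s - s.mean()) / s.std()                # zero mean, unit variance, no fixed range

def median_mad(s):
    med = np.median(s)
    mad = np.median(np.abs(s - med))               # median absolute deviation
    return (s - med) / mad

def sigmoid(s, c=None, k=None):
    c = s.mean() if c is None else c               # centre of the curve
    k = s.std() if k is None else k                # slope parameter
    return 1.0 / (1.0 + np.exp(-(s - c) / k))      # open interval (0, 1)

def tanh_norm(s):
    # the small 0.01 factor is what squeezes the values towards 0.5
    return 0.5 * (np.tanh(0.01 * (s - s.mean()) / s.std()) + 1.0)

scores = np.array([12.0, 30.0, 45.0, 51.0, 300.0])  # 300 acts as an outlier
for f in (min_max, z_score, median_mad, sigmoid, tanh_norm):
    print(f"{f.__name__:10s}", np.round(f(scores), 3))
```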