
Problems of biometric systems:

  • wide intra-class variations

    • e.g. different facial expressions, different lighting, different viewpoints...
  • very small inter-class variations

    • two different people can be very similar (e.g. twins)
  • possible spoofing attacks, at different moments
    !Pasted image 20241002181936.png

  • [non universality](LEZIONE2_Indici_di_prestazione.pdf#page=6&selection=0,10,0,26&color=yellow|LEZIONE2_Indici_di_prestazione, p.6)

    • e.g. people with no voice, people with cataracts, people with poor fingerprints...

Most difficult traits to exploit:

  • retina fundus
  • behavioral traits (e.g. way of walking)
  • handwriting

[What to compare?](LEZIONE2_Indici_di_prestazione.pdf#page=8&selection=0,10,0,26&color=yellow|LEZIONE2_Indici_di_prestazione, p.8)

  • Sample
    • raw captured data
  • Hand-crafted features
    • manually engineered by the data scientist and extracted from samples
    • can also be substituted with embeddings: features automatically extracted by deep architectures
  • Template
    • collection of features extracted from the raw data; examples (see the histogram sketch below):
      • a histogram representing the frequencies of relevant values in the image (e.g. grey-level values)
      • a vector of values, each representing a relevant measure (e.g. Bertillon measures)
      • time series of acceleration values (one per axis)
      • a set of triplets, one for each relevant fingerprint point, representing the coordinates of the point and the direction of the tangent to the ridge at that point

[!PDF|red] LEZIONE2_Indici_di_prestazione, p.8

Hand-crafted features

These alone are not the template of the entire biometric system.
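
As a toy instance of the histogram template above, a minimal NumPy sketch (the random "image" and the bin count are illustrative assumptions, not course material):

```python
import numpy as np

def greylevel_histogram_template(image: np.ndarray, bins: int = 256) -> np.ndarray:
    """Build a normalized grey-level histogram template from an 8-bit image."""
    hist, _ = np.histogram(image.ravel(), bins=bins, range=(0, 256))
    return hist / hist.sum()  # normalize so the template is a probability distribution

# Hypothetical usage: a random array stands in for real captured sensor data
image = np.random.randint(0, 256, size=(64, 64), dtype=np.uint8)
template = greylevel_histogram_template(image)
```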

Comparing templates (a code sketch of these measures follows the list):

  • Euclidean distance
  • Cosine similarity
    • cosine of the angle between two vectors
    • depends only on the direction (angle) of the vectors, not on their magnitude
  • Pearson correlation (for histograms or sets of points)
    • statistical measure that evaluates the linear relationship between two variables. It tells you whether an increase or decrease in one variable tends to correspond with an increase or decrease in another, and how strong that relationship is (ChatGPT)
  • Bhattacharyya distance (histograms)
    • measure of the similarity (or dissimilarity) between two probability distributions
    • the Bhattacharyya distance can compare feature distributions between two different classes (e.g., color histograms of objects)
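
A minimal NumPy sketch of these four measures; the function names are my own, and the Bhattacharyya version assumes the inputs are histograms normalized to sum to 1:

```python
import numpy as np

def euclidean_distance(a, b):
    return np.linalg.norm(a - b)

def cosine_similarity(a, b):
    # Depends only on the angle between the vectors, not on their magnitude
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def pearson_correlation(a, b):
    # Strength of the linear relationship between the two value sets, in [-1, 1]
    return np.corrcoef(a, b)[0, 1]

def bhattacharyya_distance(p, q):
    # p and q are histograms normalized to sum to 1 (probability distributions)
    bc = np.sum(np.sqrt(p * q))  # Bhattacharyya coefficient
    return -np.log(bc)

a = np.array([1.0, 2.0, 3.0]); b = np.array([1.5, 1.8, 3.2])
print(euclidean_distance(a, b), cosine_similarity(a, b), pearson_correlation(a, b))
```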

[!PDF|yellow] LEZIONE2_Indici_di_prestazione, p.10

(Pearson) Correlation

Correlation measures how similar two signals are to each other. It is often used to compare fingerprints, by computing the correlation between two fingerprint images.

For time series we have to address an issue: temporal sequences may vary in speed or timing. E.g. in two repetitions of a walking sequence there might be differences in walking speed, but the spatial path of the limbs remains highly similar. Another example could be audio recordings: same voice, but different speed.

Dynamic Time Warping allows for "warping" of the time axis, meaning it can stretch or compress sections of the sequences to achieve the best possible alignment. This is useful when parts of one sequence are faster or slower than the corresponding parts in the other sequence.

!Pasted image 20241002135922.png

Each point is paired with the most convenient one; paired points do not necessarily correspond to the same instant in time.
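
A minimal dynamic-programming sketch of DTW, assuming 1-D sequences and absolute difference as the local cost (both are illustrative choices, not the course's exact formulation):

```python
import numpy as np

def dtw_distance(x, y):
    """Classic DTW: each point of x may align with several points of y and vice versa."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])        # local distance between paired points
            D[i, j] = cost + min(D[i - 1, j],      # stretch x
                                 D[i, j - 1],      # stretch y
                                 D[i - 1, j - 1])  # advance both
    return D[n, m]

# Same shape, different speed: DTW stays small where a rigid alignment would not
slow = np.sin(np.linspace(0, 2 * np.pi, 80))
fast = np.sin(np.linspace(0, 2 * np.pi, 50))
print(dtw_distance(slow, fast))
```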

Comparing the results of submitting a template to a Deep Learning model
  • if using deep learning, we use the architecture itself to extract the embeddings (for both gallery and probe templates): we can remove the classification layer to obtain the embeddings that the architecture would otherwise use for the final classification.
  • Embeddings can be compared as if they were vectors of hand-crafted features (see the sketch below).
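
A minimal PyTorch sketch of this idea; the ResNet-18 backbone, input sizes and random tensors are illustrative assumptions, not the course's architecture:

```python
import torch
import torch.nn as nn
from torchvision import models

# Any classification backbone works; ResNet-18 is just an illustrative choice
model = models.resnet18(weights=None)
model.fc = nn.Identity()  # drop the classification layer, keep the 512-d embedding
model.eval()

with torch.no_grad():
    gallery = torch.randn(1, 3, 224, 224)  # stand-ins for real preprocessed samples
    probe = torch.randn(1, 3, 224, 224)
    e_g, e_p = model(gallery), model(probe)

# Embeddings are then compared like any other feature vectors, e.g. cosine similarity
score = nn.functional.cosine_similarity(e_g, e_p).item()
print(score)
```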

Possible errors: verification

  • Genuine Match (GM, GA): the claimed identity is true and the subject is accepted
  • False Rejection (FR, FNM, type I error): the claimed identity is true but the subject is rejected
  • Genuine Reject (GR, GNM): an impostor is rejected
  • False Acceptance (FA, FM, type II error): an impostor is accepted :/

It's important to define a good threshold. If it is too high we will get a lot of false acceptances; if too low, a lot of false rejections (assuming the matching score is a distance; for similarity scores the effect is reversed)!

When computing rates (see the sketch after this list):

  • False Rejection Rate (FRR) is the number of FR divided by the number of GM+FR.
    • in fact, the genuine match rate and the FRR have the same denominator and sum up to 1.
  • False Acceptance Rate is the number of FA divided by total number of impostor attempts (FA + GR)
  • Equal Error Rate (EER) is the common value of FAR and FRR at the specific threshold where they coincide.
  • Detection Error Trade-off: a plot that shows the trade-off between the FAR and FRR at different threshold settings of a system
  • Receiver Operating Characteristic (ROC) curve: a plot that shows the True Positive Rate (TPR, also called sensitivity) against the False Positive Rate (FPR, i.e. 1 - specificity) at various threshold settings.
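
A minimal NumPy sketch of FRR, FAR and EER over a threshold sweep, assuming similarity scores (higher = more similar) and synthetic genuine/impostor score distributions as stand-ins for real matcher outputs:

```python
import numpy as np

# Synthetic scores standing in for real matcher outputs (higher = more similar)
rng = np.random.default_rng(0)
genuine = rng.normal(0.7, 0.1, 1000)   # scores of genuine match attempts
impostor = rng.normal(0.4, 0.1, 1000)  # scores of impostor attempts

thresholds = np.linspace(0, 1, 501)
FRR = np.array([(genuine < t).mean() for t in thresholds])    # FR / (GM + FR)
FAR = np.array([(impostor >= t).mean() for t in thresholds])  # FA / (FA + GR)

# EER: the point where the FAR and FRR curves cross
i = np.argmin(np.abs(FAR - FRR))
print(f"EER ~ {(FAR[i] + FRR[i]) / 2:.3f} at threshold {thresholds[i]:.2f}")
```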

Key Differences Between ROC and DET Curves:

  • ROC Curve: Focuses on the true positives and false positives, showing the ability to discriminate between classes (genuine vs impostor).
  • DET Curve: Focuses on the false rejection rate (FRR) and false acceptance rate (FAR), helping to analyze trade-offs between security and usability in verification systems.

Two synthetic metrics, for instance, could be the EER and the area under the ROC curve (AUC).
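
For the AUC, a sketch using scikit-learn (an assumption; any numerical integration of the ROC curve works), with the same kind of synthetic scores as above:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
genuine = rng.normal(0.7, 0.1, 1000)   # genuine-attempt scores, as in the sketch above
impostor = rng.normal(0.4, 0.1, 1000)  # impostor-attempt scores

labels = np.concatenate([np.ones_like(genuine), np.zeros_like(impostor)])
scores = np.concatenate([genuine, impostor])
print("AUC:", roc_auc_score(labels, scores))  # 1.0 = perfect separation, 0.5 = chance
```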

(We might store more templates for the same person to address intra-class variation. Of course the templates should be different, not computed e.g. from frames of the same video: some of them could be blurred, and close frames are almost exactly the same!)

[!PDF|yellow] LEZIONE2_Indici_di_prestazione, p.20

When a false acceptance occurs, we can have two possible scenarios:

  • pj does not belong to the gallery (the most trivial case)
  • pj belongs to an enrolled subject, but the probe claimed another identity, not the real one.

What if the EER of two systems is the same, but the curves are different?

We can use the ROC curve or the DET curve. For ROC, we can compute the area under the curve and use it as a metric: the higher, the better.

Possible errors: identification - open set

In an open-set identification task, the system determines whether the individual's biometric signature matches the signature of someone in the gallery. The individual does not make an identity claim.

  • More possible error situations, depending on the matcher and on the threshold
  • A problem may occur if the system returns more than one possible candidate below the threshold: who is the right one?

[!PDF|yellow] LEZIONE2_Indici_di_prestazione, p.27

Possible errors: identification open set

correct detect and identify rate = the rate at which a probe of an enrolled individual obtains a score within the threshold and the correct identity is returned.

false alarm rate = the rate at which non-enrolled users are wrongly identified as a user in the database.

We compute these rates by testing the system with many probes, belonging to the set PG if enrolled or to the set PN if not.

We define (see the sketch after this list):

  • rank(pj) = the position in the list where the first template for the correct identity is returned

  • DIR (at rank k) (Detection and Identification Rate at rank k): the probability of correct identification at rank k (the correct subject is returned at position k)

  • The ratio between the number of individuals correctly recognized at rank k and the number of probes belonging to individuals in PG

  • If identification does NOT happen at rank 1, we have a False Reject.

  • FRR, or more specifically FNIR (False Reject Rate or False Negative Identification Rate): the probability of a false reject, expressed as 1 - DIR (at rank 1)

  • FAR or more specifically FPIR (False Acceptance Rate or False Positive Identification Rate) or False Alarm Rate (Watch List): the probability of false acceptance/alarm

  • The ratio between the number of impostors recognized by error and the total number of impostors in PN
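
A minimal sketch of DIR, FNIR and FPIR, assuming each PG probe yields the rank and score of its correct identity and each PN probe yields its best score (this data layout is an illustrative assumption):

```python
import numpy as np

def open_set_metrics(pg_results, pn_top_scores, t, k=1):
    """
    pg_results: list of (rank_of_correct_identity, score_of_correct_identity)
                for probes of enrolled subjects (set PG); ranks are 1-based.
    pn_top_scores: best score returned for each non-enrolled probe (set PN).
    t: acceptance threshold on similarity scores; k: rank to evaluate DIR at.
    """
    pg = np.array(pg_results, dtype=float)
    dir_k = np.mean((pg[:, 0] <= k) & (pg[:, 1] >= t))        # DIR at rank k
    fnir = 1.0 - np.mean((pg[:, 0] <= 1) & (pg[:, 1] >= t))   # 1 - DIR at rank 1
    fpir = np.mean(np.asarray(pn_top_scores) >= t)            # false alarms on PN
    return dir_k, fnir, fpir

# Hypothetical results: (rank, score) per enrolled probe; top score per impostor probe
pg_results = [(1, 0.82), (1, 0.55), (2, 0.71), (1, 0.90)]
pn_top_scores = [0.40, 0.66, 0.31]
print(open_set_metrics(pg_results, pn_top_scores, t=0.6))
```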

Closed set

We don't have thresholds! The only possible error is that the correct identity does not appear at rank 1.
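
A minimal closed-set sketch, assuming we record for each probe the rank at which its correct identity is returned; the recognition rate at rank k is then the closed-set analogue of DIR:

```python
import numpy as np

def recognition_rate(ranks, k=1):
    """Fraction of probes whose correct identity appears within the first k positions."""
    return np.mean(np.asarray(ranks) <= k)

ranks = [1, 1, 3, 1, 2, 1]           # hypothetical ranks of the correct identity
print(recognition_rate(ranks, k=1))  # rank-1: the only error source in closed set
```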