Bias & monitoring

This page explains exactly how CVsprings produces its scores, the design choices that reduce bias risk, and the limitations we know about. It also shows a live view of scoring consistency computed from your organization’s own audit data.

1 · How scores are produced

CVsprings uses a deterministic, rule-based matching engine — not a machine-learning model. The same input always produces the same score, and the same rules are applied identically to every CV. The pipeline:

Text extraction. Plain text is extracted from the uploaded PDF or DOCX. Images — including photos — are never extracted or processed; scanned image-only CVs are rejected with an error rather than scored.
Four sub-scores, each computed by transparent rules:

Keywords · Skills · Experience · Education

Keywords — the 40 most frequent meaningful words and word pairs are extracted from your job description (common stop-words removed) and looked up in the CV text; the score is the percentage found.
Skills — a curated list of 168 named skills is checked against the job description; those the JD mentions are then looked up in the CV; the score is the percentage matched (a neutral 50 is used when the JD names none of the listed skills).
Experience — patterns such as “7 years of experience” and seniority words (director, principal, lead, senior, junior…) are detected in both texts and compared, producing a heuristic 0–100 value.
Education — the highest qualification keyword found (PhD > Master > Bachelor > diploma/certificate) is compared between CV and JD requirements.
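
The Keywords and Skills rules above can be sketched in a few lines of Python. This is a minimal illustration, not CVsprings' actual code: the stop-word list, the three-entry skills list, the way word pairs are formed (adjacent words after stop-word removal) and the function names are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical stand-ins: the real stop-word list and the curated list of
# 168 skills are not reproduced on this page.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "with", "on", "is", "are"}
SKILLS = ["python", "sql", "project management"]


def terms(text: str) -> list[str]:
    """Return meaningful words plus adjacent word pairs, lowercased."""
    words = [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS]
    pairs = [f"{a} {b}" for a, b in zip(words, words[1:])]
    return words + pairs


def keyword_score(jd: str, cv: str, top_n: int = 40) -> float:
    """Percentage of the JD's top_n most frequent terms that appear in the CV."""
    top = [term for term, _ in Counter(terms(jd)).most_common(top_n)]
    if not top:
        return 0.0  # assumption: an empty JD yields 0
    cv_terms = set(terms(cv))
    return 100.0 * sum(term in cv_terms for term in top) / len(top)


def mentions(skill: str, text: str) -> bool:
    """Case-insensitive whole-phrase match, so "sql" does not match "nosql"."""
    return re.search(rf"(?<!\w){re.escape(skill)}(?!\w)", text, re.I) is not None


def skills_score(jd: str, cv: str, skills: list[str] = SKILLS) -> float:
    """Percentage of JD-mentioned skills found in the CV; neutral 50 if none."""
    required = [s for s in skills if mentions(s, jd)]
    if not required:
        return 50.0  # documented neutral value when the JD names no listed skill
    return 100.0 * sum(mentions(s, cv) for s in required) / len(required)
```

With the JD "Python developer with SQL" and the CV "Experienced Python developer", the sketch finds three of the five JD terms (python, developer, "python developer") and returns a keyword score of 60.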

The overall score is the weighted average of the four sub-scores using the weights you set (default 40/30/20/10); the verdict bands (“Excellent/Good/Partial/Poor Match”) are fixed thresholds on that number. Each saved record stores the weights and the scoring-engine version used, so any score can be reconstructed later.
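
As a worked example of the weighted average, the sketch below combines four sub-scores using the default 40/30/20/10 weights. The function and key names are illustrative, and dividing by the sum of the weights is an assumption (this page does not say whether custom weights must total 100). The verdict-band thresholds are not stated here, so they are left out.

```python
# Default weights as documented: keywords 40, skills 30, experience 20, education 10.
DEFAULT_WEIGHTS = {"keywords": 40, "skills": 30, "experience": 20, "education": 10}


def overall_score(sub_scores: dict[str, float],
                  weights: dict[str, float] = DEFAULT_WEIGHTS) -> float:
    """Weighted average of the sub-scores (each on a 0-100 scale)."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    # Assumption: weights are normalized by their sum.
    return sum(sub_scores[name] * weight for name, weight in weights.items()) / total


example = {"keywords": 60, "skills": 50, "experience": 80, "education": 100}
print(overall_score(example))  # (60*40 + 50*30 + 80*20 + 100*10) / 100 = 65.0
```

Because the weights and engine version are stored with each record, rerunning a calculation like this on the saved sub-scores is what makes a score reconstructable later.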

What Anonymize removes

When the Anonymize toggle is on, the following are stripped from the CV text before scoring:

Removed: email addresses · phone numbers · URLs (incl. social profiles) · address-like lines (street number + street word) · standalone year numbers 1940–2015 (birth-year range) · standalone name lines (2–4 capitalized words on their own line). Photos are never processed regardless of this setting, because only text is extracted.

Anonymize is a text-level filter with known gaps — see the limitations section below for what it cannot remove.
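
A text-level filter of this kind could look roughly like the sketch below. The regular expressions are illustrative guesses, not the patterns CVsprings ships: the street-word list, the phone heuristic (nine or more digits) and the order in which filters run are all assumptions.

```python
import re

# Hypothetical patterns, one per documented category.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
URL = re.compile(r"(?:https?://|www\.)\S+")
# At least nine digits with optional single separators between them.
PHONE = re.compile(r"(?<!\w)\+?(?:\d[ ().-]?){8,}\d(?!\w)")
# A line starting with a street number and containing a street word.
ADDRESS_LINE = re.compile(
    r"^[ \t]*\d+[ \t]+.*\b(?:street|st|road|rd|avenue|ave|lane|drive)\b.*$",
    re.I | re.M,
)
# A line made of 2-4 capitalized words and nothing else.
NAME_LINE = re.compile(r"^[ \t]*(?:[A-Z][a-z]+[ \t]+){1,3}[A-Z][a-z]+[ \t]*$", re.M)
# Standalone years 1940-2015.
BIRTH_YEAR = re.compile(r"\b(?:19[4-9]\d|20(?:0\d|1[0-5]))\b")


def anonymize(text: str) -> str:
    """Strip each category in turn; everything else is left untouched."""
    for pattern in (EMAIL, URL, PHONE, ADDRESS_LINE, NAME_LINE, BIRTH_YEAR):
        text = pattern.sub("", text)
    return text


cv = (
    "Jane Doe\n"
    "12 Baker Street, London\n"
    "jane.doe@example.com | +44 20 7946 0958\n"
    "https://linkedin.example/in/janedoe\n"
    "BSc Computer Science, graduated 2012\n"
    "Senior engineer since 2018\n"
)
print(anonymize(cv))
```

In this sketch the graduation year 2012 disappears along with the contact details while 2018 survives, and a two-word heading such as "Work Experience" would be removed by the name-line pattern. Both are the kind of gap that a pattern-based filter is prone to.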

2 · Design mitigations

No demographic inputs. CVsprings does not ask for, collect, or use any demographic data (gender, ethnicity, age, disability status) anywhere in the product.
Identical treatment within a batch. Every CV in a batch is scored with the same extraction rules, the same job description, and the same weights — there is no per-candidate variation in how the engine behaves.
Anonymization option. Contact details and name lines can be stripped from the CV text before scoring (see above for exactly what is removed).
Human decision requirement. No score is ever converted into a decision automatically. Decisions (shortlist / hold / reject) exist only as explicit recruiter input, and each saved record attributes its review to the signed-in user.
Immutable records. Scores, weights and provenance fields cannot be edited after a record is saved; only the recruiter’s decision and note can change, and changes are kept in an append-only history.

3 · Known limitations

We list these candidly because deployers need them to use the tool responsibly:

Vocabulary sensitivity. The engine matches words, not meaning. A candidate who describes the right experience in different words than the job description scores lower than one who mirrors the JD’s vocabulary. Non-native phrasing or unconventional CV styles can be penalized for the same reason.
Scores vary with JD wording. Because keywords are extracted from the job description by frequency, rephrasing the JD changes the keyword set and therefore the scores. Compare candidates only when they were scored against the same JD with the same weights.
Anonymization cannot remove all proxies. University names, employer names, club memberships, languages, employment gaps, names mentioned mid-sentence, and other indirect signals remain in the text and can correlate with protected characteristics.
Anonymize can remove legitimate data. The birth-year filter (1940–2015) also removes ordinary dates in that range — e.g. an older graduation year — which can slightly change the experience signal for the same CV.
Heuristic experience/education detection. Years of experience and qualifications are detected by text patterns; unusual formats can be missed. Verify tenure and qualifications in interviews rather than relying on the sub-scores.
No outcome-level bias detection. CVsprings does not collect demographic data, so it cannot measure whether outcomes differ across demographic groups. We recommend that clients periodically review their decision outcomes against their own equality-monitoring data and processes.

A high or low score is a signal about text similarity to the job description — it is not a verdict on the person. Always review the underlying CV before acting on a score.

4 · Scoring consistency in your organization

This page documents actual product behaviour and is not legal advice. See the EU AI Act compliance page for how these features map to provider obligations.