We Let AI Screen 2,000 Resumes. Then We Checked Its Work.
December 10, 2025 · 5 min read

This is the article I almost didn't write.

A client (a mid-size tech company, 800 employees, growing fast) asked us to integrate AI into their hiring pipeline. They were getting 2,000+ applications per open role. Their two-person HR team was drowning.

The request seemed reasonable: use AI to screen resumes, identify top candidates, and surface them faster. Standard stuff in 2025, right?

It was anything but standard.

What We Built (Version 1)

A system that scored resumes based on keyword matching, experience relevance, education fit, and skill alignment with the job description. Pretty typical. We trained it on the profiles of their top performers.
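
To make that concrete, here's a minimal sketch of a scorer built on that principle. The keywords, weights, and function names are illustrative, not the client's actual model, which also weighed experience, education, and skills:

```python
def score_resume(resume_text: str, keyword_weights: dict[str, float]) -> float:
    """Naive keyword-match scoring: sum the weight of every job-description
    keyword found in the resume text."""
    text = resume_text.lower()
    return sum(w for kw, w in keyword_weights.items() if kw in text)

# Illustrative weights, derived (as Version 1 did) from top-performer profiles.
weights = {"python": 1.0, "kubernetes": 0.8, "microservices": 0.6}
print(score_resume("Senior engineer: Python, Kubernetes.", weights))  # 1.8
```

A scorer like this looks neutral. As we found out, neutrality of the mechanism says nothing about neutrality of the training signal.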

Results seemed great. Processing time dropped from 3 weeks to 2 days. The system surfaced 50 candidates per role instead of HR reading all 2,000.

Then we audited.

What We Found

The AI had developed a preference for candidates from three specific universities. Not because those graduates were better—but because 70% of the company's current top performers happened to come from those schools. The AI learned an existing bias and amplified it.

It also penalized career gaps. A woman who took two years off for childcare? Lower score. Someone who traveled for a year after university? Lower score. The AI saw "gap = risk" because the training data showed that historically, the company hired people with linear career paths.

It favored candidates who used specific jargon in their resumes. Not because the jargon correlated with skill, but because the company's existing employees used it, and the AI matched the pattern.

We had built a very efficient bias machine.

Version 2: What We Changed

We anonymized everything. Names, universities, photos (yes, some resumes had photos), age indicators—all stripped before AI scoring. This alone changed the candidate pool dramatically.
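
A minimal sketch of what that redaction pass can look like, assuming resumes arrive as structured records with a free-text body (the field names are hypothetical, not the client's schema):

```python
import re

# Fields removed before the scorer ever sees the resume (hypothetical schema).
REDACTED_FIELDS = {"name", "photo", "date_of_birth", "university"}

def anonymize(resume: dict) -> dict:
    clean = {k: v for k, v in resume.items() if k not in REDACTED_FIELDS}
    # Graduation years in free text are a proxy for age, so mask those too.
    clean["text"] = re.sub(r"\b(?:19|20)\d{2}\b", "[year]", clean.get("text", ""))
    return clean
```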

We removed gap penalization entirely. A resume gap tells you nothing about someone's ability. We had to hard-code this, because the AI kept finding proxies for the same bias.
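
"Hard-coding" meant a blocklist the pipeline enforces no matter what the model learns. Something like this sketch, with invented feature names; every retrained model gets its inputs filtered through the same gate:

```python
# Features the scorer is never allowed to see. Hard-coded because retrained
# models kept rediscovering the same signal under new names (names invented).
FORBIDDEN_FEATURES = {
    "employment_gap_months",
    "longest_gap",
    "jobs_per_year",  # short-stint proxy for the same "gap = risk" signal
}

def strip_forbidden(features: dict) -> dict:
    """Filter a candidate's feature vector before it reaches the model."""
    return {k: v for k, v in features.items() if k not in FORBIDDEN_FEATURES}
```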

We stopped training on "top performers." Instead, we defined skill and experience thresholds manually with the hiring managers. The AI matched candidates against those criteria, not against existing employees.
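
The practical difference: the criteria live in plain configuration that hiring managers can read and sign off on, instead of inside learned weights nobody can inspect. A sketch with made-up criteria:

```python
from dataclasses import dataclass

@dataclass
class Criterion:
    name: str
    minimum: float  # set by hiring managers, not learned from employee data

# Example criteria for one role (names and values are illustrative).
ROLE_CRITERIA = [
    Criterion("years_backend_experience", 2),
    Criterion("distributed_systems_skill", 0.6),
]

def meets_criteria(candidate: dict, criteria: list[Criterion]) -> bool:
    return all(candidate.get(c.name, 0) >= c.minimum for c in criteria)
```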

We added a randomizer: 10% of candidates who didn't meet the AI's threshold were surfaced anyway. This caught several strong candidates the system would have filtered out, including someone who is now one of their best engineers.
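
The randomizer itself is only a few lines. A sketch, assuming each candidate record carries a precomputed pass/fail flag:

```python
import random

SURFACE_RATE = 0.10  # share of below-threshold candidates surfaced anyway

def build_shortlist(candidates: list[dict]) -> list[dict]:
    passed = [c for c in candidates if c["meets_threshold"]]
    rejected = [c for c in candidates if not c["meets_threshold"]]
    wildcards = random.sample(rejected, k=round(len(rejected) * SURFACE_RATE))
    return passed + wildcards
```

It also doubles as a feedback loop: every wildcard who turns out to be a strong candidate is evidence the threshold is filtering out people it shouldn't.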

The Results After Adjustment

Diversity of candidate pool: improved significantly. The shortlist started reflecting the actual applicant pool instead of the existing team's demographics.

Quality of hires (measured at 6-month review): unchanged. The people hired through the adjusted system performed just as well as those from the biased version. That's the key finding—removing bias didn't lower quality.

HR time: still way down. From 3 weeks to 3 days for initial screening.

Candidate experience: faster responses, which candidates appreciated. One candidate told them: "You were the first company to get back to me in under a week."

The Ethical Line

Here's the part that keeps me up at night. We could have deployed Version 1 and the client would have been thrilled. Faster hiring, candidates who "fit the culture" (read: look like everyone already there), and impressive efficiency metrics.

Nobody would have complained. The bias would have been invisible—automated and buried in an algorithm.

This is the fundamental danger of AI in people operations. It's not that AI is biased—all systems reflect the data they're trained on. It's that AI makes bias scalable, efficient, and invisible. A human reviewer might notice they're rejecting every resume without a university degree. An algorithm won't notice. It'll just do it faster.

My Recommendation

Use AI in hiring for logistics: scheduling interviews, sending updates, organizing documents. It's brilliant at this.

For evaluation? Only with extreme caution, thorough auditing, and human oversight at every stage. And audit regularly—not once, but every quarter. Because bias creeps back in as new data enters the system.
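
One concrete check to run each quarter is the classic four-fifths (adverse impact) screen: compare each applicant group's shortlist rate to the best group's rate, and flag anything below 80% of it. A minimal sketch, keyed by candidate ID; the 80% figure is the conventional rule of thumb, and the grouping depends on what your applicant data supports:

```python
def selection_rates(groups: dict[str, set[str]], shortlisted: set[str]) -> dict[str, float]:
    """Shortlist rate per applicant group (candidate IDs per group)."""
    return {g: len(ids & shortlisted) / len(ids) for g, ids in groups.items()}

def four_fifths_flags(groups: dict[str, set[str]], shortlisted: set[str]) -> dict[str, bool]:
    """False for any group shortlisted at under 80% of the top group's rate."""
    rates = selection_rates(groups, shortlisted)
    best = max(rates.values())
    return {g: rate >= 0.8 * best for g, rate in rates.items()}
```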

The two-person HR team now processes 2,000 applications in 3 days instead of 3 weeks. They love the tool. But they also understand that the tool requires supervision. Like a powerful vehicle that needs a driver who stays alert—not one who falls asleep at the wheel.
