3 Steps to Reduce AI Hallucination to Zero

Ask AI to "compare this resume against the JD," and you'll get impressively polished results. The problem? Half of it is fabricated.

This was the first wall I hit building Candidate Analyzer. The AI would generate analysis results that added experience not on the resume, or invented JD requirements that didn't exist. On the surface, it looked professional and logical -- but it wasn't factual.

What happens when non-factual information creeps into talent analysis? The recruiter recommends candidates based on false evidence. That's not a product -- that's an incident.

• • •

Ask It to Do Everything at Once, and It Lies

The initial approach was simple. Give the AI a resume and JD simultaneously, and ask "analyze whether this candidate is suitable for this position" in a single request.

The result was impressive. Strengths, weaknesses, match rate, recommendation comments -- all neatly organized in a polished report. Even with my 12 years of recruiting experience, my first reaction was "oh, this looks pretty good."

Then I pulled out the original resume and cross-referenced. The problems started surfacing.

Actual Hallucination Cases We Encountered

Case 1: Resume stated "Python 3 years." AI inflated this to "Python and ML pipeline experience, 5 years" in the analysis. There was zero mention of ML experience on the resume.

Case 2: A "nice-to-have" in the JD was reclassified as a "required qualification" in the analysis. The candidate's failure to meet a preferred qualification was flagged as a "critical mismatch."

Case 3: No "team lead experience" on the resume, but AI interpreted "participated in team projects" as "has leadership experience." In the field, this distinction is critical.

Why does this happen? Because AI is trained to produce "plausible-sounding answers." When processing a resume and JD simultaneously, the AI tries to bridge gaps between them. In that process, it generates information that doesn't exist. From the AI's perspective, it's manufacturing evidence to support the conclusion that "this candidate matches the JD well."

AI was built to fill in blanks. The problem is, in talent analysis, blanks mean "missing information" -- not "information to be invented."

• • •

Why We Split It into 3 Stages

The solution was surprisingly simple. Don't ask it to do everything at once.

When you tell AI to "analyze," it tries to extract, structure, compare, and judge all at once. In this process, the boundaries between steps collapse, and input data mixes with output data. The distinction between "what was read from the resume" and "what the AI invented" disappears.

So we separated the pipeline into 3 stages. Each stage runs as its own call: Stage 1 and Stage 2 each read exactly one source document, and Stage 3 receives only their outputs.

1. Fact Extraction: Resume → Structured Data (ultra-conservative settings)
2. JD Structuring: JD → Requirements Matrix (ultra-conservative settings)
3. Matching Analysis: Data vs Matrix (analysis-optimized settings)

The core principle is straightforward. Radically constrain what AI can do at each stage. Stage 1 only "extracts." No judgment. Stage 2 only "structures." No interpretation. Judgment and interpretation are only permitted in Stage 3, and even then, it's forced to use only the outputs from Stages 1 and 2.
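The separation described above can be sketched as three functions whose signatures enforce the boundary. This is a minimal sketch, not Candidate Analyzer's actual code: the `llm` callable, the prompt wording for Stages 2 and 3, and both temperature values are assumptions made for illustration (the post does not state the real numbers).

```python
import json
from typing import Callable

# Hypothetical signature: (prompt, temperature) -> model response text.
LLM = Callable[[str, float], str]

# Placeholder values; the post does not state the actual numbers.
EXTRACTION_TEMPERATURE = 0.0
ANALYSIS_TEMPERATURE = 0.2


def extract_facts(llm: LLM, resume_text: str) -> dict:
    """Stage 1: resume text in, structured facts out. No judgment."""
    prompt = (
        "Extract only what is explicitly stated in the resume, as JSON. "
        "Do not add any information not explicitly stated in the resume.\n\n"
        + resume_text
    )
    return json.loads(llm(prompt, EXTRACTION_TEMPERATURE))


def structure_jd(llm: LLM, jd_text: str) -> dict:
    """Stage 2: JD text in, requirements matrix out. No interpretation."""
    prompt = (
        "Classify each requirement in the JD as required, preferred, "
        "or soft skill, as JSON. Mark anything the JD does not mention "
        "as not specified.\n\n" + jd_text
    )
    return json.loads(llm(prompt, EXTRACTION_TEMPERATURE))


def match(llm: LLM, facts: dict, matrix: dict) -> str:
    """Stage 3: receives only Stage 1 and Stage 2 outputs, never raw text."""
    prompt = (
        "Compare the candidate facts against the requirements matrix. "
        "Use only the data below.\n\n"
        + json.dumps({"facts": facts, "matrix": matrix}, ensure_ascii=False)
    )
    return llm(prompt, ANALYSIS_TEMPERATURE)


def analyze(llm: LLM, resume_text: str, jd_text: str) -> str:
    """Run the three stages; the raw documents stop at Stages 1 and 2."""
    facts = extract_facts(llm, resume_text)
    matrix = structure_jd(llm, jd_text)
    return match(llm, facts, matrix)
```

Because `match` has no parameter for the raw resume or JD, the original text cannot reach the Stage 3 prompt by accident; the constraint lives in the function signature rather than in a reminder to the model.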

• • •

Stage 1: Extract Facts Only

The first stage extracts only facts from the resume. By "facts," I mean information that is explicitly stated on the resume.

Stage 1 Rules

Does: Convert resume information (name, experience, tech stack, education, certifications) into structured JSON

Does Not: Interpret, infer, judge, or augment. Inferences like "this person probably has leadership skills" are forbidden

Temperature: Ultra-conservative setting (minimal creativity, maximum accuracy)

An ultra-conservative Temperature setting is essentially telling the AI "absolutely do not think creatively." If the resume says "Python 3 years," the output says exactly "Python 3 years." It won't inflate it to "Python and related library experience."

If the AI adds information that doesn't exist in the original resume at this stage, every subsequent analysis gets contaminated. That's why the prompt explicitly enforces: "Do not add any information not explicitly stated in the resume." The combination of conservative Temperature and this prompt is the core of Stage 1.
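One way to back the Stage 1 prompt with a mechanical check is to flag any extracted value that does not appear verbatim in the resume. The sketch below is an illustration, not something the post describes: `find_ungrounded` and its helpers are hypothetical names, and the approach assumes the extraction schema asks the model to copy phrases as written rather than normalize them.

```python
def iter_strings(value):
    """Yield every string leaf in a nested dict/list structure."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for item in value.values():
            yield from iter_strings(item)
    elif isinstance(value, list):
        for item in value:
            yield from iter_strings(item)


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so line breaks do not cause false alarms."""
    return " ".join(text.lower().split())


def find_ungrounded(facts: dict, resume_text: str) -> list[str]:
    """Return extracted strings that do not appear verbatim in the resume."""
    haystack = normalize(resume_text)
    return [s for s in iter_strings(facts) if normalize(s) not in haystack]


resume = "Jane Doe\nBackend Engineer\nPython 3 years\nParticipated in team projects"
good = {"name": "Jane Doe", "skills": ["Python 3 years"]}
inflated = {"name": "Jane Doe", "skills": ["Python and ML pipeline experience, 5 years"]}

print(find_ungrounded(good, resume))      # []
print(find_ungrounded(inflated, resume))  # ['Python and ML pipeline experience, 5 years']
```

A verbatim check is deliberately strict: it would also flag harmless reformatting such as a rewritten date, so in practice it fits best as a signal for review rather than an automatic rejection.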

• • •

Stage 2: Structuring the JD

The second stage structures the JD (Job Description). JDs are actually quite unstructured documents. Some companies write clean bullet points, others write long prose, and some don't distinguish between required and preferred qualifications.

Stage 2 transforms these varied JD formats into a consistent requirements matrix.

Raw JD (Unstructured)

"Python experience preferred. Someone who can design ML pipelines. 3+ years. Good communicator preferred. AWS experience required. K8s would be nice."

Stage 2 Output (Structured)

Required: AWS experience, 3+ years
Preferred: Python, ML pipeline, K8s
Soft Skills: Communication
Not Specified: Education, industry, title

The critical part is the "Not Specified" category. If the JD doesn't mention education requirements, it's classified as "Not Specified." A standard AI approach would infer "a position like this probably requires a bachelor's degree" and add it. Stage 2 doesn't infer. If it's not there, it's not there.
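The "Not Specified" idea can be made explicit in the data structure itself. The following is a sketch under assumptions: the `RequirementsMatrix` class, the `dimensions` key, and the list of tracked dimensions are hypothetical and only mirror the example output above.

```python
from dataclasses import dataclass, field

# Hypothetical dimensions the tool tracks for every JD.
TRACKED_DIMENSIONS = ("education", "industry", "title")


@dataclass
class RequirementsMatrix:
    required: list[str] = field(default_factory=list)
    preferred: list[str] = field(default_factory=list)
    soft_skills: list[str] = field(default_factory=list)
    not_specified: list[str] = field(default_factory=list)


def build_matrix(stage2_output: dict) -> RequirementsMatrix:
    """Build the matrix, recording unmentioned dimensions instead of guessing them."""
    dimensions = stage2_output.get("dimensions", {})
    return RequirementsMatrix(
        required=list(stage2_output.get("required", [])),
        preferred=list(stage2_output.get("preferred", [])),
        soft_skills=list(stage2_output.get("soft_skills", [])),
        # A dimension with no value is listed here; no default such as
        # "bachelor's degree" is ever filled in.
        not_specified=[d for d in TRACKED_DIMENSIONS if not dimensions.get(d)],
    )


stage2_output = {
    "required": ["AWS experience", "3+ years"],
    "preferred": ["Python", "ML pipeline", "K8s"],
    "soft_skills": ["Communication"],
    "dimensions": {"education": None, "industry": None, "title": None},
}
matrix = build_matrix(stage2_output)
print(matrix.not_specified)  # ['education', 'industry', 'title']
```

Giving absence its own field means a downstream step can tell "the JD asked for nothing here" apart from "the extractor missed it," which an empty or omitted value cannot express.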

This stage also uses ultra-conservative Temperature settings. Extract only what's written in the JD text -- don't interpret. "Preferred" stays "preferred," "required" stays "required." From a recruiter's perspective, this distinction is critical. Failing to meet a preferred qualification shouldn't count against a candidate, but most existing tools can't make this distinction.

• • •

Stage 3: Fact-Based Matching

In the third stage, AI finally "analyzes." But this analysis doesn't see the original resume or the original JD. It only receives Stage 1's output (structured candidate data) and Stage 2's output (requirements matrix) as inputs.

Why Stage 3 Doesn't See the Originals

If you show the original resume again, the AI might "rediscover" information that Stage 1 deliberately didn't extract and include it in the analysis. Inference-based information intentionally filtered out at Stage 1 comes back to life at Stage 3.

You must physically block the data flow to prevent hallucination.

This stage's Temperature is slightly higher than in the extraction stages, because matching analysis requires some degree of "interpretation." Judging whether "this candidate's 3 years of AWS experience satisfies the JD's AWS requirement" takes more than simple keyword matching.

Even so, the analysis-optimized setting stays conservative: high enough for the AI to understand context, low enough that it doesn't "creatively" interpret matches. This delicate balance is critical.
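To make the Stage 3 input boundary concrete, here is a deliberately simplified stand-in that works on structured data only. It swaps the model's contextual judgment for exact string matching, which the post says is not enough for real matching; the function names and the sample data are assumptions. What it does show is the rule from Stage 2 carried through: only a missing required item counts as a mismatch.

```python
def match_requirements(candidate_skills: set[str], matrix: dict) -> dict:
    """Compare structured candidate skills against a structured requirements matrix."""
    required = matrix.get("required", [])
    preferred = matrix.get("preferred", [])
    return {
        "required_met": [r for r in required if r in candidate_skills],
        "required_missing": [r for r in required if r not in candidate_skills],
        "preferred_met": [p for p in preferred if p in candidate_skills],
        # Reported for context only; unmet preferred items are not mismatches.
        "preferred_unmet": [p for p in preferred if p not in candidate_skills],
    }


def has_critical_mismatch(result: dict) -> bool:
    """Only a missing required item counts as a critical mismatch."""
    return bool(result["required_missing"])


candidate = {"AWS", "Python"}
matrix = {"required": ["AWS"], "preferred": ["Python", "ML pipeline", "K8s"]}
result = match_requirements(candidate, matrix)
print(result["preferred_unmet"])      # ['ML pipeline', 'K8s']
print(has_critical_mismatch(result))  # False
```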

Data Flow: Contamination Prevention Structure

Stage 1 (Ultra-Conservative): Original Resume → Structured Fact Data
No inference or interpretation. Extract only what's written.

Stage 2 (Ultra-Conservative): Original JD → Requirements Matrix
Required/Preferred/Soft classification. If absent, mark it "Not Specified."

Stage 3 (Analysis-Optimized): Fact Data + Matrix → Matching Report
Original access blocked. Uses only Stage 1 and 2 outputs.

• • •

Temperature: The Key

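Temperature is the AI's "creativity dial." Closer to 0, the model almost always picks its highest-probability next token, so the output is close to deterministic. Closer to 1 and above, it also samples lower-probability options and generates more diverse (sometimes nonsensical) responses.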
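The effect of the dial can be seen in the standard softmax-with-temperature calculation, where each raw score is divided by the temperature before being converted to a probability. This is a small illustration with made-up scores, not a view into any particular model.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert raw scores to probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; 0 is usually handled as greedy argmax")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in (0.1, 0.7, 1.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At a low temperature nearly all of the probability lands on the top-scoring token, while at 1.0 the alternatives keep a meaningful share. Note that this narrows variation in the output; it does not by itself make the top-scoring answer true, which is why the post pairs it with an explicit prompt constraint.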

Level	Characteristics	Best For	Usage in CA
Ultra-Conservative	Maximum accuracy	Data extraction, classification	Stage 1 (resume), Stage 2 (JD)
Analysis-Optimized	Slight flexibility	Comparative analysis, context judgment	Stage 3 (matching analysis)
Creative	Free expression	Outreach messages, copy	Email sequence AI messages
Very High	Highly creative (unstable)	Brainstorming	Not used

Many AI services default to a Temperature of around 0.7 to 1.0. That's fine for chatting, writing, and ideation, but a disaster for fact-based analysis.

In Candidate Analyzer (CA), two of the three pipeline stages run at ultra-conservative Temperature. This is the technical implementation of the principle "if it's not accurate, don't build it."

An ultra-conservative Temperature tells the AI to "report only what you see," not to "share your thoughts."

• • •

Verification Must Be Structural

"Trust but verify" is common advice for AI. It's sound advice, but having humans verify every result is impractical. For a recruiter running dozens of analyses per day, "manually cross-reference every AI output against the original" isn't a tool -- it's a homework assignment.

So we embedded verification into the structure itself.

3 Stages: Separated Pipeline
Ultra-Low: Extraction Temperature
Fact-Based Hallucination

I won't claim this structure is perfect. As AI technology advances, hallucination itself may decrease, and better methods may emerge. But right now, in a production environment handling real hiring data, we are maintaining a 0% hallucination rate, and that tells me this structure works.

It's not technically groundbreaking. "Split it up and constrain what each stage can do." That's the whole thing. But consistently applying this simple principle in a real product is harder than you'd think. The temptation of "it's faster and easier to just do it in one shot" is always there.

In the next post, I'll discuss the structural limitations of self-monitoring. Why we fired the CHRO from our 8-person AI executive team. And what "real checks and balances" actually means.
