JobFitAI: Building a Comprehensive Resume Analysis Project with DeepSeek + DeepInfra + Gradio - 极客小站

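In today's competitive job market, making your resume stand out is essential. JobFitAI helps job seekers and recruiters by analyzing resumes and providing actionable feedback. Traditional keyword-based screening can miss important nuances in a candidate's resume. To overcome this, an AI-driven system can analyze resumes, extract key skills, and match them effectively against job descriptions.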

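Learning Objectives

Install all required libraries and configure your environment with a DeepInfra API key.
Learn how to build an AI resume analyzer that handles both PDF and audio files.
Use DeepSeek-R1 through DeepInfra to extract relevant information from resumes.
Build an interactive web application with Gradio for seamless user interaction.
Apply practical enhancements and fix common issues to get more value out of your resume analyzer.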

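What is DeepSeek-R1

DeepSeek-R1 is an advanced open-source AI model designed for natural language processing (NLP) tasks, with a particular emphasis on reasoning. It is a transformer-based large language model (LLM) trained to understand and generate human-like text, and it can perform tasks such as text summarization, question answering, and language translation. Because its weights are openly available, developers can integrate it into their own applications, fine-tune it for specific needs, and run it on their own hardware instead of relying on proprietary systems. The full model is very large, however, which is why this project calls it through DeepInfra's hosted API. It is especially well suited to research, automation, and AI-driven applications.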

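Understanding Gradio

Gradio is an easy-to-use Python library that helps developers build interactive web interfaces for machine learning models and other applications. With just a few lines of code, you can create shareable apps with input components (such as text boxes, sliders, and image uploads) and output displays (such as text, images, or audio). It is widely used for AI model demos, rapid prototyping, and friendly interfaces for non-technical users. Gradio also makes simple deployment easy: developers can share their apps through a public link without needing advanced web development skills.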

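This guide introduces JobFitAI, an end-to-end solution that extracts text, generates a detailed analysis, and gives feedback on how well a resume matches a given job description. It is built on three technologies:

DeepSeek-R1: a powerful AI model that extracts key skills, experience, education, and achievements from resume text.
DeepInfra: a hosting platform that exposes an OpenAI-compatible API, so we can call AI models such as DeepSeek-R1 through the standard OpenAI client.
Gradio: an easy-to-use framework for quickly building interactive web interfaces for machine learning applications.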
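To make "OpenAI-compatible" concrete, the sketch below builds, using only the standard library, the HTTP request that an OpenAI-style client sends for a chat completion. The base URL and model name come from this project's analyzer module; the /chat/completions path and the Bearer authorization header follow the OpenAI Chat Completions convention, which DeepInfra's compatible endpoint is assumed to mirror. build_chat_request is a hypothetical helper, and the request is only built here, not sent.

```python
import json
import urllib.request

# Base URL as configured in this project's analyzer module.
BASE_URL = "https://api.deepinfra.com/v1/openai"


def build_chat_request(token: str, prompt: str,
                       model: str = "deepseek-ai/DeepSeek-R1") -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",  # assumed OpenAI-compatible path
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_chat_request("your_deepinfra_api_token_here", "Summarize this resume: ...")
    print(req.get_method(), req.full_url)
    # Sending it requires a real token and network access:
    # with urllib.request.urlopen(req) as resp:
    #     print(json.load(resp)["choices"][0]["message"]["content"])
```

The openai package used later in this project hides exactly this plumbing: pointing its base_url at DeepInfra is what makes the same client code work with DeepSeek-R1.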

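Project Architecture

JobFitAI uses a modular architecture in which each component handles one specific step of resume processing. Here is an overview: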

JobFitAI/ 
│── src/
│   ├── __pycache__/  (compiled Python files)
│   ├── analyzer.py
│   ├── audio_transcriber.py
│   ├── feedback_generator.py
│   ├── pdf_extractor.py
│   ├── resume_pipeline.py
│── .env  (environment variables)
│── .gitignore
│── app.py  (Gradio interface)
│── LICENSE
│── README.md
│── requirements.txt  (dependencies)
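If you are starting from an empty folder, a short pathlib script can create the skeleton shown above. This is a convenience sketch rather than part of the original project: scaffold is a hypothetical helper, it only creates empty placeholder files, and it skips __pycache__/ because Python generates that directory itself.

```python
import tempfile
from pathlib import Path

# Files from the project tree above; __pycache__/ is created by Python itself.
SRC_MODULES = [
    "analyzer.py",
    "audio_transcriber.py",
    "feedback_generator.py",
    "pdf_extractor.py",
    "resume_pipeline.py",
]
ROOT_FILES = [".env", ".gitignore", "app.py", "LICENSE", "README.md", "requirements.txt"]


def scaffold(root: str) -> list[Path]:
    """Create empty placeholder files for the JobFitAI layout; return the new paths."""
    base = Path(root)
    created = []
    for rel in [f"src/{name}" for name in SRC_MODULES] + ROOT_FILES:
        path = base / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        if not path.exists():  # never overwrite existing work
            path.touch()
            created.append(path)
    return created


if __name__ == "__main__":
    # Demo in a throwaway directory; pass "JobFitAI" to create the real project.
    with tempfile.TemporaryDirectory() as tmp:
        for p in scaffold(tmp):
            print(p.relative_to(tmp))
```

Because existing files are skipped, running the script again on a partly written project is safe.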

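Setting Up the Environment

Before diving into the code, you need to set up your development environment.

Create a Virtual Environment and Install Dependencies

First, create a virtual environment in the project folder to manage dependencies. Open a terminal and run the commands for your operating system: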

# macOS/Linux
python3 -m venv jobfitai
source jobfitai/bin/activate

# Windows (cmd)
python -m venv jobfitai
jobfitai\Scripts\activate
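Before installing anything, you can confirm that the environment is really active. This sketch relies on standard venv behavior: inside an activated environment, sys.prefix points at the environment directory while sys.base_prefix still points at the base interpreter. in_virtualenv is a hypothetical helper name.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when this interpreter runs inside a venv."""
    # venv points sys.prefix at the environment directory, while
    # sys.base_prefix keeps pointing at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtualenv():
        print(f"Virtual environment active: {sys.prefix}")
    else:
        print("No virtual environment active; run the activate command first.")
```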

Next, create a file named requirements.txt and add the following libraries:

requests
openai-whisper
PyPDF2
python-dotenv
openai
torch
torchvision
torchaudio
gradio

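Install the dependencies with the following command:

pip install -r requirements.txt

Whisper also relies on the ffmpeg command-line tool to decode audio, so install ffmpeg through your system package manager as well (for example, brew install ffmpeg on macOS or sudo apt install ffmpeg on Debian/Ubuntu).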
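After installation, a quick check with importlib.util.find_spec shows whether each dependency can be found without actually importing heavy packages such as torch. Several pip package names differ from the module names you import (openai-whisper provides whisper, python-dotenv provides dotenv); the mapping and the missing_packages helper below are illustrative assumptions, not part of the project.

```python
import importlib.util

# pip package name -> module name used in `import` statements.
# Several names differ (e.g. openai-whisper provides `whisper`).
REQUIRED = {
    "requests": "requests",
    "openai-whisper": "whisper",
    "PyPDF2": "PyPDF2",
    "python-dotenv": "dotenv",
    "openai": "openai",
    "torch": "torch",
    "torchvision": "torchvision",
    "torchaudio": "torchaudio",
    "gradio": "gradio",
}


def missing_packages(required: dict[str, str] = REQUIRED) -> list[str]:
    """Return the pip names whose top-level modules cannot be located."""
    return [pip_name for pip_name, module in required.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    print("All dependencies found." if not missing else f"Missing: {', '.join(missing)}")
```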

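Setting Environment Variables

The project needs an API token to interact with the DeepInfra API. Create a .env file in the project root and add your token:

DEEPINFRA_TOKEN="your_deepinfra_api_token_here"

Make sure to replace your_deepinfra_api_token_here with the actual token issued by DeepInfra, and keep .env listed in .gitignore so the token is never committed.

To learn how to obtain a DeepInfra API key, see here.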
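A missing or placeholder token otherwise only surfaces later as an authentication error from the API. A small guard run after load_dotenv() can fail early with a clearer message. The require_token helper and its placeholder check are assumptions for illustration.

```python
import os

PLACEHOLDER = "your_deepinfra_api_token_here"


def require_token(name: str = "DEEPINFRA_TOKEN") -> str:
    """Return the API token from the environment, or fail with a clear message."""
    token = os.getenv(name, "").strip()
    if not token or token == PLACEHOLDER:
        raise RuntimeError(
            f"{name} is missing or still set to the placeholder; "
            "add your real token to .env and call load_dotenv() first."
        )
    return token


if __name__ == "__main__":
    # In the project, dotenv.load_dotenv() would run before this check.
    try:
        require_token()
        print("DEEPINFRA_TOKEN found.")
    except RuntimeError as err:
        print(err)
```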

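Project Walkthrough

The project consists of several Python modules. In the following sections, we look at what each file does and how it fits into the project.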

src/audio_transcriber.py

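Resumes are not always text. When an audio resume arrives, the AudioTranscriber class takes over: it uses OpenAI's Whisper model to transcribe the audio file into text, which the analyzer then uses to extract the resume details.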

import whisper  # provided by the "openai-whisper" package


class AudioTranscriber:
    """Transcribe audio files using OpenAI Whisper."""

    def __init__(self, model_size: str = "base"):
        """
        Initializes the Whisper model for transcription.

        Args:
            model_size (str): The size of the Whisper model to load. Defaults to "base".
        """
        self.model_size = model_size
        # The model is loaded once here and reused for every transcription
        self.model = whisper.load_model(self.model_size)

    def transcribe(self, audio_path: str) -> str:
        """
        Transcribes the given audio file and returns the text.

        Args:
            audio_path (str): The path to the audio file to be transcribed.

        Returns:
            str: The transcribed text, or an empty string if transcription
            fails (the error is printed, not raised).
        """
        try:
            result = self.model.transcribe(audio_path)
            return result["text"]
        except Exception as e:
            print(f"Error transcribing audio: {e}")
            return ""
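Besides the plain "text" used by the analyzer, Whisper's transcribe() result also includes a "segments" list whose entries carry start and end times in seconds plus the segment text. Assuming that segment structure, the sketch below renders it as timestamped lines, which can help a reviewer jump to a specific part of an audio resume. format_segments is a hypothetical helper, and the sample dictionary is a hand-written stand-in for a real Whisper result.

```python
def format_segments(result: dict) -> str:
    """Render Whisper-style segments as '[mm:ss - mm:ss] text' lines."""

    def mmss(seconds: float) -> str:
        minutes, secs = divmod(int(seconds), 60)
        return f"{minutes:02d}:{secs:02d}"

    lines = [
        f"[{mmss(seg['start'])} - {mmss(seg['end'])}] {seg['text'].strip()}"
        for seg in result.get("segments", [])
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hand-written stand-in for a Whisper transcribe() result.
    sample = {
        "text": " Hi, I'm a data engineer. I work mostly with Python and SQL.",
        "segments": [
            {"start": 0.0, "end": 4.2, "text": " Hi, I'm a data engineer."},
            {"start": 65.5, "end": 70.0, "text": " I work mostly with Python and SQL."},
        ],
    }
    print(format_segments(sample))
```

With a real model you would pass the full result instead, for example format_segments(transcriber.model.transcribe("resume.mp3")), where resume.mp3 is a placeholder path.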

src/pdf_extractor.py

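Most resumes come as PDFs. The PDFExtractor class uses the PyPDF2 library to extract their text: it loops over every page of the document, extracts the text, and concatenates it into a single string for further analysis. PyPDF2 only reads the text layer embedded in the PDF, so scanned (image-only) resumes yield little or no text.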

import PyPDF2


class PDFExtractor:
    """Extract text from PDF files using PyPDF2."""

    def extract_text(self, pdf_path: str) -> str:
        """
        Extract text content from a given PDF file.

        Args:
            pdf_path (str): Path to the PDF file.

        Returns:
            str: Extracted text from the PDF, or an empty string if the file
            is missing or cannot be read (the error is printed, not raised).
        """
        text = ""
        try:
            with open(pdf_path, "rb") as file:
                reader = PyPDF2.PdfReader(file)
                for page in reader.pages:
                    # Image-only pages produce little or no text
                    page_text = page.extract_text()
                    if page_text:
                        text += page_text + "\n"
        except FileNotFoundError:
            print(f"Error: The file '{pdf_path}' was not found.")
        except Exception as e:
            print(f"An error occurred while extracting text: {e}")
        return text
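Text pulled out of PDFs often contains words hyphenated across line breaks, runs of spaces, and large gaps between sections. The following sketch normalizes that before the text is sent to the model. clean_pdf_text is a hypothetical helper, and its three regular-expression rules are one reasonable choice rather than part of the original project.

```python
import re


def clean_pdf_text(raw: str) -> str:
    """Normalize text extracted from a PDF before sending it to the LLM."""
    # Re-join words hyphenated across a line break: "develop-\nment" -> "development".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)
    # Collapse runs of spaces and tabs into a single space.
    text = re.sub(r"[ \t]+", " ", text)
    # Collapse three or more newlines into one blank line.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()
```

Re-joining hyphenated breaks also merges genuine compounds that happen to split at a line end (a "full-stack" broken after the hyphen becomes "fullstack"), which is usually harmless for an LLM. If the cleaned string comes back empty, the PDF is most likely a scanned image and needs OCR instead.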

src/resume_pipeline.py

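The ResumePipeline module orchestrates resume processing. It wraps the PDF extractor and the audio transcriber, routes each resume to the right processor based on the file type supplied by the caller, and returns the extracted text. This modular design makes it easy to add support for other resume formats later.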

from src.pdf_extractor import PDFExtractor
from src.audio_transcriber import AudioTranscriber


class ResumePipeline:
    """
    Process resume files (PDF or audio) and return extracted text.
    """

    def __init__(self):
        """Initialize the ResumePipeline with PDFExtractor and AudioTranscriber."""
        self.pdf_extractor = PDFExtractor()
        # Loads the Whisper model once, when the pipeline is created
        self.audio_transcriber = AudioTranscriber()

    def process_resume(self, file_path: str, file_type: str) -> str:
        """
        Process a resume file and extract text based on its type.

        Args:
            file_path (str): Path to the resume file.
            file_type (str): Type of the file ('pdf', 'audio', 'wav' or 'mp3').

        Returns:
            str: Extracted text from the resume, or an empty string if the file
            type is unsupported or processing fails (errors are printed).
        """
        try:
            file_type_lower = file_type.lower()
            if file_type_lower == "pdf":
                return self.pdf_extractor.extract_text(file_path)
            elif file_type_lower in ["audio", "wav", "mp3"]:
                return self.audio_transcriber.transcribe(file_path)
            else:
                raise ValueError("Unsupported file type. Use 'pdf' or 'audio'.")
        except FileNotFoundError:
            print(f"Error: The file '{file_path}' was not found.")
            return ""
        except ValueError as ve:
            print(f"Error: {ve}")
            return ""
        except Exception as e:
            print(f"An unexpected error occurred: {e}")
            return ""
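ResumePipeline expects the caller to pass the file type. One way to keep that decision in a single place, and to reject unsupported formats with an explicit error, is an extension lookup table like the sketch below. FILE_TYPES and detect_file_type are hypothetical, and the audio extensions listed are assumptions (Whisper decodes audio through ffmpeg, which handles these formats).

```python
from pathlib import Path

# Extension -> file_type string accepted by ResumePipeline.process_resume.
# The audio extensions are an assumption; Whisper decodes audio via ffmpeg.
FILE_TYPES = {
    ".pdf": "pdf",
    ".mp3": "audio",
    ".wav": "audio",
    ".m4a": "audio",
}


def detect_file_type(file_path: str) -> str:
    """Map a file's extension to 'pdf' or 'audio', or raise ValueError."""
    suffix = Path(file_path).suffix.lower()
    if suffix not in FILE_TYPES:
        supported = ", ".join(sorted(FILE_TYPES))
        raise ValueError(f"Unsupported file type {suffix!r}; supported: {supported}")
    return FILE_TYPES[suffix]


if __name__ == "__main__":
    for name in ["cv.PDF", "pitch.mp3", "resume.docx"]:
        try:
            print(name, "->", detect_file_type(name))
        except ValueError as err:
            print(name, "->", err)
```

Supporting a new format then only takes a new table entry plus a matching branch in process_resume.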

src/analyzer.py

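This module is the backbone of the resume analyzer. It initializes a connection to the DeepInfra API for the DeepSeek-R1 model. Its main method, analyze_text, takes resume text as input and returns an analysis summarizing the key details of the resume. The resume-specific behavior comes from the prompt: its instructions tell the general-purpose model which details to extract and which JSON structure to return.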

import os
from typing import Optional

from dotenv import load_dotenv
from openai import OpenAI

# Load environment variables from .env file
load_dotenv()


class DeepInfraAnalyzer:
    """
    Calls DeepSeek-R1 model on DeepInfra using an OpenAI-compatible interface.
    This class processes resume text and extracts structured information using AI.
    """

    def __init__(
        self,
        api_key: Optional[str] = None,
        model_name: str = "deepseek-ai/DeepSeek-R1",
    ):
        """
        Initializes the DeepInfraAnalyzer with API key and model name.

        :param api_key: API key for authentication; defaults to the DEEPINFRA_TOKEN environment variable
        :param model_name: The name of the model to use
        """
        # Read the token when the analyzer is created, not when the module is imported
        api_key = api_key or os.getenv("DEEPINFRA_TOKEN")
        if not api_key:
            raise RuntimeError("DEEPINFRA_TOKEN is not set; add it to your .env file.")
        try:
            self.openai_client = OpenAI(
                api_key=api_key,
                base_url="https://api.deepinfra.com/v1/openai",
            )
            self.model_name = model_name
        except Exception as e:
            raise RuntimeError(f"Failed to initialize OpenAI client: {e}") from e

    def analyze_text(self, text: str) -> str:
        """
        Processes the given resume text and extracts key information in JSON format.
        The response will contain structured details about key skills, experience, education, etc.

        :param text: The resume text to analyze
        :return: The model's reply, expected to be a JSON string with the structured resume analysis
        """
        prompt = (
            "You are an AI job resume matcher assistant. "
            "DO NOT show your chain of thought. "
            "Respond ONLY in English. "
            "Extract the key skills, experiences, education, achievements, etc. from the following resume text. "
            "Then produce the final output as a well-structured JSON with a top-level key called \"analysis\". "
            "Inside \"analysis\", you can have subkeys like \"key_skills\", \"experiences\", \"education\", etc. "
            "Return ONLY the final JSON, with no extra commentary.\n\n"
            f"Resume Text:\n{text}\n\n"
            "Required Format (example):\n"
            "```\n"
            "{\n"
            "  \"analysis\": {\n"
            "    \"key_skills\": [...],\n"
            "    \"experiences\": [...],\n"
            "    \"education\": [...],\n"
            "    \"achievements\": [...],\n"
            "    ...\n"
            "  }\n"
            "}\n"
            "```\n"
        )
        try:
            response = self.openai_client.chat.completions.create(
                model=self.model_name,
                messages=[{"role": "user", "content": prompt}],
            )
            return response.choices[0].message.content
        except Exception as e:
            raise RuntimeError(f"Error processing resume text: {e}") from e
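analyze_text returns the model's reply as a raw string. Reasoning models in the DeepSeek-R1 family may emit their reasoning inside <think>...</think> tags, and chat models often wrap JSON in a Markdown code fence despite instructions, so calling json.loads on the raw string can fail. The sketch below assumes those two output patterns and strips them before parsing; parse_model_json is a hypothetical helper, not part of the project.

```python
import json
import re

FENCE = "`" * 3  # a Markdown code fence marker


def parse_model_json(content: str) -> dict:
    """Extract and parse the JSON object from a raw chat-completion reply."""
    # Keep only what follows the last </think> tag, if the reply has one.
    text = content.split("</think>")[-1]
    # If the JSON sits inside a Markdown code fence, keep only the fence body.
    fenced = re.search(FENCE + r"(?:json)?\s*(.*?)" + FENCE, text, flags=re.DOTALL)
    if fenced:
        text = fenced.group(1)
    # Fall back to the outermost {...} span in whatever is left.
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("No JSON object found in the model output")
    return json.loads(text[start:end + 1])
```

Applied to the analyzer, this would look like parse_model_json(analyzer.analyze_text(resume_text)). Catching ValueError, which also covers json.JSONDecodeError, lets the app fall back to showing the raw string.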

src/feedback_generator.py

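Once the details have been extracted from the resume, the next step is to compare the resume with a specific job description. The FeedbackGenerator module takes the resume text and the job description and returns a match score, missing skills, and improvement suggestions. Job seekers can use this feedback to refine their resumes so they align better with the job description, improving their chances of getting past automated applicant tracking systems (ATS).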

from src.analyzer import DeepInfraAnalyzer


class FeedbackGenerator:
    """
    Generates feedback for resume improvement based on a job description
    using the DeepInfraAnalyzer.
    """

    def __init__(self, analyzer: DeepInfraAnalyzer):
        """
        Initializes the FeedbackGenerator with an instance of DeepInfraAnalyzer.

        Args:
            analyzer (DeepInfraAnalyzer): An instance of the DeepInfraAnalyzer class.
        """
        self.analyzer = analyzer

    def generate_feedback(self, resume_text: str, job_description: str) -> str:
        """
        Generates feedback on how well a resume aligns with a job description.

        Args:
            resume_text (str): The extracted text from the resume.
            job_description (str): The job posting or job description.

        Returns:
            str: A JSON string with a top-level "job_match" object containing:
                - "match_score" (int): A score from 0-100 indicating job match quality.
                - "job_alignment" (dict): Strong and weak matches.
                - "missing_skills" (list): Skills missing from the resume.
                - "recommendations" (list): Actionable suggestions for improvement.
            Returns "{}" if the request fails.
        """
        prompt = (
            "You are an AI job resume matcher assistant. "
            "DO NOT show your chain of thought. "
            "Respond ONLY in English. "
            "Compare the following resume text with the job description. "
            "Calculate a match score (0-100) for how well the resume matches. "
            "Identify keywords from the job description that are missing in the resume. "
            "Provide bullet-point recommendations to improve the resume for better alignment.\n\n"
            f"Resume Text:\n{resume_text}\n\n"
            f"Job Description:\n{job_description}\n\n"
            "Return JSON ONLY in this format:\n"
            "{\n"
            "  \"job_match\": {\n"
            "    \"match_score\": <integer 0-100>,\n"
            "    \"job_alignment\": {\n"
            "      \"strong_match\": [...],\n"
            "      \"weak_match\": [...]\n"
            "    },\n"
            "    \"missing_skills\": [...],\n"
            "    \"recommendations\": [\n"
            "      \"<recommendation 1>\",\n"
            "      \"<recommendation 2>\",\n"
            "      ...\n"
            "    ]\n"
            "  }\n"
            "}"
        )
        try:
            # Send this prompt directly. Passing it to analyzer.analyze_text() would
            # embed it inside the resume-extraction prompt and mix the two instructions.
            response = self.analyzer.openai_client.chat.completions.create(
                model=self.analyzer.model_name,
                messages=[{"role": "user", "content": prompt}],
            )
            return response.choices[0].message.content
        except Exception as e:
            print(f"Error in generating feedback: {e}")
            return "{}"  # Returning an empty JSON string in case of failure
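Because the model is only asked, not forced, to follow the JSON template, it is worth checking the reply before showing it to users. This sketch validates the fields that the feedback prompt requests; validate_feedback is a hypothetical helper, and treating an out-of-range score as an error (rather than clamping it) is a design choice.

```python
import json


def validate_feedback(raw: str) -> dict:
    """Parse the feedback reply and check the fields the prompt asks for."""
    data = json.loads(raw)
    job_match = data.get("job_match") if isinstance(data, dict) else None
    if not isinstance(job_match, dict):
        raise ValueError("reply has no 'job_match' object")
    score = job_match.get("match_score")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(score, bool) or not isinstance(score, (int, float)) or not 0 <= score <= 100:
        raise ValueError(f"match_score must be a number from 0 to 100, got {score!r}")
    for key in ("missing_skills", "recommendations"):
        if not isinstance(job_match.get(key), list):
            raise ValueError(f"'{key}' should be a list")
    if not isinstance(job_match.get("job_alignment", {}), dict):
        raise ValueError("'job_alignment' should be an object")
    return job_match
```

The "{}" that generate_feedback returns on failure is rejected as well, since it contains no job_match object.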

app.py

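app.py is the main entry point of JobFitAI. It ties together all of the modules above and uses Gradio to build an interactive web interface. Users upload a resume/CV file (PDF or audio) and enter a job description; the application then processes the resume, runs the analysis, generates feedback, and returns a structured JSON response containing the analysis and recommendations.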

from pathlib import Path

from dotenv import load_dotenv

load_dotenv()

import gradio as gr
from src.resume_pipeline import ResumePipeline
from src.analyzer import DeepInfraAnalyzer
from src.feedback_generator import FeedbackGenerator

# Pipeline for PDF/audio
resume_pipeline = ResumePipeline()
# Initialize the DeepInfra analyzer
analyzer = DeepInfraAnalyzer()
# Feedback generator
feedback_generator = FeedbackGenerator(analyzer)

# Audio formats routed to Whisper (an assumed list; Whisper decodes via ffmpeg)
AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a"}


def analyze_resume(resume_path, job_desc):
    """
    Gradio callback function to analyze a resume against a job description.

    Args:
        resume_path (str): Path to the uploaded resume file (PDF or audio).
        job_desc (str): The job description text for comparison.
    """
    try:
        if not resume_path or not job_desc:
            return {"error": "Please upload a resume and enter a job description."}
        # Determine file type from extension; reject anything that is not PDF or audio
        suffix = Path(resume_path).suffix.lower()
        if suffix == ".pdf":
            file_type = "pdf"
        elif suffix in AUDIO_EXTENSIONS:
            file_type = "audio"
        else:
            return {"error": f"Unsupported file type '{suffix}'. Upload a PDF or audio file."}
        # Extract text from the resume
        resume_text = resume_pipeline.process_resume(resume_path, file_type)
        if not resume_text.strip():
            return {"error": "No text could be extracted from the uploaded resume."}
        # Analyze extracted text
        analysis_result = analyzer.analyze_text(resume_text)
        # Generate feedback and recommendations
        feedback = feedback_generator.generate_feedback(resume_text, job_desc)
        # Return structured response
        return {
            "analysis": analysis_result,
            "recommendations": feedback,
        }
    except Exception as e:
        return {"error": f"An unexpected error occurred: {str(e)}"}


# Define Gradio interface
demo = gr.Interface(
    fn=analyze_resume,
    inputs=[
        gr.File(label="Resume (PDF/Audio)", type="filepath"),
        gr.Textbox(lines=5, label="Job Description"),
    ],
    outputs="json",
    title="JobFitAI: AI Resume Analyzer",
    description="""
    Upload your resume/cv (PDF or audio) and paste the job description to get a match score,
    missing keywords, and actionable recommendations.""",
)

if __name__ == "__main__":
    demo.launch(server_name="0.0.0.0", server_port=8000)
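In analyze_resume, the analysis call and the feedback call both depend only on the resume text and the job description, so they do not have to run one after the other. The sketch below runs two such callables concurrently with a thread pool, which suits network-bound API calls. run_in_parallel is a hypothetical helper, and whether concurrent requests are acceptable depends on your DeepInfra rate limits.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_in_parallel(
    analyze: Callable[[str], str],
    feedback: Callable[[str, str], str],
    resume_text: str,
    job_desc: str,
) -> dict:
    """Run the analysis and feedback calls concurrently and combine the results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        analysis_future = pool.submit(analyze, resume_text)
        feedback_future = pool.submit(feedback, resume_text, job_desc)
        # .result() waits for each call and re-raises any exception it hit.
        return {
            "analysis": analysis_future.result(),
            "recommendations": feedback_future.result(),
        }


if __name__ == "__main__":
    def fake_analyze(text: str) -> str:
        time.sleep(0.5)  # stands in for a slow API call
        return '{"analysis": {}}'

    def fake_feedback(text: str, jd: str) -> str:
        time.sleep(0.5)
        return '{"job_match": {}}'

    start = time.perf_counter()
    print(run_in_parallel(fake_analyze, fake_feedback, "resume", "job"))
    print(f"elapsed: {time.perf_counter() - start:.2f}s")
```

Inside app.py, the two sequential calls could be replaced with run_in_parallel(analyzer.analyze_text, feedback_generator.generate_feedback, resume_text, job_desc).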

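Running the Application with Gradio

Once the environment is set up and you have reviewed all the code components, you are ready to run the application.

Launch the application: in a terminal, navigate to the project directory and run:

python app.py

This command starts the Gradio interface locally. Because app.py launches the server on port 8000 on all network interfaces (server_name="0.0.0.0"), open http://localhost:8000 in your browser to use the interactive resume analyzer.
Testing JobFitAI:
- Upload a resume: select a PDF file or an audio file containing a recorded resume.
- Enter the job description: paste or type the job description.
- Review the output: the system displays a JSON response containing a detailed analysis of the resume, a match score, missing keywords, and feedback with improvement suggestions. The "analysis" and "recommendations" fields hold the model's replies as JSON-formatted strings.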

You can find all the code files in the GitHub repository: click here.

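Use Cases and Real-World Applications

The JobFitAI resume analyzer can be applied in a variety of real-world scenarios:

Improving Resume Quality

Self-assessment: job seekers can use the tool to evaluate their resumes before applying. By seeing the match score and the areas that need work, they can tailor their resumes to specific positions.
Feedback loop: the structured JSON feedback the tool generates can be integrated into career-counseling platforms to deliver personalized resume improvement suggestions.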

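Education and Training Applications

Career workshops: educational institutions and career-coaching platforms can include JobFitAI in their curricula as a hands-on demonstration of how AI can improve career readiness.
Coding and AI projects: aspiring data scientists and developers can learn how to combine several AI services (such as transcription, PDF extraction, and natural language processing) into one cohesive project.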

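Troubleshooting and Extensions

This section covers troubleshooting and extensions, starting with common issues and their solutions.

Common Issues and Solutions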

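API token issues: if the DeepInfra API token is missing or incorrect, the analyzer module fails. Always check that your .env file contains the correct token and that the token is still active.
Unsupported file types: the application currently supports only PDF and audio formats. If you try to upload another file type (such as DOCX), the system reports an error. Support for other formats may be added in future extensions.
Transcription latency: audio transcription can take a long time, especially for large files. If you plan to process many audio resumes, consider a more powerful machine (ideally with a GPU) or a cloud-based solution, or load a smaller Whisper model through AudioTranscriber's model_size parameter to trade some accuracy for speed.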
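If the same recordings get uploaded repeatedly, caching transcripts avoids paying the Whisper cost twice. The sketch below keys an in-memory cache on a SHA-256 hash of the file contents, so the same recording uploaded again under a different file name still hits the cache. CachedTranscriber is a hypothetical wrapper that accepts any transcribe(path) -> str function, such as AudioTranscriber().transcribe; empty results are not cached, because AudioTranscriber returns "" on failure.

```python
import hashlib
import os
import tempfile
from typing import Callable


class CachedTranscriber:
    """Wrap any transcribe(path) -> str function with a content-hash cache."""

    def __init__(self, transcribe: Callable[[str], str]):
        self._transcribe = transcribe
        self._cache: dict[str, str] = {}

    @staticmethod
    def _digest(path: str) -> str:
        """SHA-256 of the file contents, read in 1 MiB chunks."""
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    def transcribe(self, path: str) -> str:
        key = self._digest(path)
        if key in self._cache:
            return self._cache[key]
        text = self._transcribe(path)
        if text:  # don't cache failures (AudioTranscriber returns "" on error)
            self._cache[key] = text
        return text


if __name__ == "__main__":
    calls = []

    def slow_transcribe(path: str) -> str:
        calls.append(path)  # stand-in for AudioTranscriber().transcribe
        return "transcript"

    cached = CachedTranscriber(slow_transcribe)
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("first.wav", "second.wav"):  # same bytes, different names
            path = os.path.join(tmp, name)
            with open(path, "wb") as f:
                f.write(b"fake audio bytes")
            cached.transcribe(path)
    print(f"underlying transcriber called {len(calls)} time(s)")
```

The cache lives only as long as the process; persisting it to disk (for example, one text file per hash) would let it survive restarts.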