An intelligent HR management system that leverages Retrieval-Augmented Generation (RAG) and AI to streamline candidate management, CV analysis, and job matching processes.
HR Rag Assistant is a full-stack application that combines a Spring Boot backend with a React frontend to provide HR management capabilities. The system uses OpenAI's GPT models and vector embeddings to analyze candidate CVs, match them with job requirements, and provide AI-generated insights.
- CV Upload & Processing: Upload PDF, DOC, DOCX, and text files
- Intelligent Text Extraction: Automatic extraction of candidate information from CVs
- Vector Storage: PostgreSQL with pgvector for semantic search capabilities
- AI-Powered Q&A: Ask questions about uploaded documents using natural language
- Complete Candidate Profiles: Name, email, phone, skills, experience, education
- Skills Extraction: Automatic identification of technical and soft skills
- Experience Analysis: Years of experience and career history extraction
- Education Parsing: Educational background and qualifications
- Smart Job Analysis: Define job requirements with skills, experience, and education criteria
- Candidate Ranking: AI-powered scoring and ranking system (0-100% match)
- Multi-criteria Evaluation:
  - Skills matching (40% weight) - Required vs. Preferred skills
  - Experience level (30% weight) - Years and relevance
  - Education requirements (20% weight)
  - CV content relevance (10% weight)
- AI Recommendations: Generated insights for top candidates
- Real-time Metrics: Total candidates, weekly/monthly uploads, skill analytics
- Visual Insights: Top skills distribution, candidate statistics
- Activity Tracking: Recent uploads and system activities
- Comprehensive Logging: System monitoring with exportable logs
src/main/java/ie/com/rag/
├── Application.java                  # Main application entry point
├── Constants.java                    # Application constants and prompts
├── config/
│   └── WebConfig.java                # CORS and web configuration
├── controller/
│   ├── HRController.java             # HR operations (candidates, analysis, metrics)
│   ├── RagController.java            # RAG Q&A functionality
│   └── RagUploaderController.java    # CV upload and processing
├── dto/
│   ├── CandidateDTO.java             # Candidate data transfer objects
│   ├── JobAnalysisRequestDTO.java    # Job analysis request structure
│   ├── JobAnalysisResponseDTO.java   # Job analysis response structure
│   ├── QAHistoryDTO.java             # Q&A history records
│   ├── RankedCandidateDTO.java       # Ranked candidate results
│   └── UploadedDocumentDTO.java      # Document metadata
└── service/
    ├── CandidateService.java         # Candidate CRUD operations
    ├── DashboardService.java         # Dashboard metrics and analytics
    ├── JobAnalysisService.java       # Job matching and ranking logic
    ├── RagDocumentService.java       # Document processing for RAG
    └── RagUploaderService.java       # CV upload and text extraction
frontend/src/
├── App.js                    # Main application component
├── components/
│   ├── Dashboard.js          # Analytics dashboard with comprehensive logging
│   ├── DocumentUpload.js     # CV upload interface
│   ├── CandidateList.js      # Candidate management interface
│   └── JobAnalysisForm.js    # Job analysis and matching
└── App.css                   # Application styling
- candidates: Complete candidate profiles with skills array
- job_analyses: Job requirements and analysis results
- candidate_rankings: Scoring and ranking data for each analysis
- qa_history: Q&A interaction history
- uploaded_documents: Document metadata and tracking
- vector_store: Vector embeddings for semantic search
- Java 21+
- Node.js 16+
- Docker & Docker Compose
- OpenAI API Key
git clone https://github.com/RobertoDure/hr-rag-assistant
cd hr-rag-assistant
Create a .env file or set the environment variable:
export OPENAI_API_KEY=your_openai_api_key_here
docker-compose up -d
This starts PostgreSQL with the pgvector extension on port 5432.
# Build and run Spring Boot application
./mvnw clean spring-boot:run
# Or using Maven wrapper on Windows
mvnw.cmd clean spring-boot:run
Backend runs on http://localhost:8080
cd frontend
npm install
npm start
Frontend runs on http://localhost:3000
spring:
  datasource:
    url: jdbc:postgresql://localhost:5432/rag_hr_db
    username: postgres
    password: postgres
  ai:
    openai:
      api-key: ${OPENAI_API_KEY}
      chat:
        options:
          model: gpt-4
          temperature: 0.3
          max-tokens: 4000
      embedding:
        options:
          model: text-embedding-3-small
    vectorstore:
      table: vector_store
      dimension: 1536
      distance: cosine
- pgvector: PostgreSQL 16 with vector extension
- Database: rag_hr_db with automatic schema initialization
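The `distance: cosine` setting means that semantic search compares the direction of two embedding vectors rather than their length. The sketch below only illustrates that measure: `CosineSketch` and its method names are hypothetical and not part of the project, and the three-element vectors stand in for the 1536-dimensional embeddings. In the running system the comparison happens inside PostgreSQL, where pgvector's cosine distance is 1 minus the cosine similarity.

```java
/** Illustrative only: a hypothetical helper, not part of the HR Rag Assistant code base. */
final class CosineSketch {

    /** Cosine similarity of two equal-length vectors; 1.0 means same direction. */
    public static double cosineSimilarity(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same dimension");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // assumption: treat a zero vector as unrelated to everything
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Cosine distance as used for ranking: smaller means more similar. */
    public static double cosineDistance(double[] a, double[] b) {
        return 1.0 - cosineSimilarity(a, b);
    }

    public static void main(String[] args) {
        // Toy vectors standing in for a CV embedding and a job-description embedding.
        double[] cv = {1.0, 0.0, 1.0};
        double[] job = {1.0, 1.0, 0.0};
        System.out.println("similarity = " + cosineSimilarity(cv, job));
        System.out.println("distance   = " + cosineDistance(cv, job));
    }
}
```

Because only direction matters, a long CV and a short one that cover the same topics can end up equally close to a job description.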
- GET /api/hr/candidates - Get all candidates
- GET /api/hr/candidates/{id} - Get candidate by ID
- DELETE /api/hr/candidates/{id} - Delete candidate
- POST /api/hr/analyze - Analyze job requirements
- GET /api/hr/metrics - Dashboard metrics
- GET /api/hr/health - Health check
- POST /api/rag/upload - Upload CV with metadata
- GET /api/rag - Ask questions about documents
- GET /api/rag/qa-history - Get Q&A history
- GET /api/rag/uploaded-documents - Get document list
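As a rough illustration of calling these endpoints from Java, the sketch below builds requests with the JDK's `java.net.http` client. The class `HrApiRequests`, its method names, and the use of a string for the candidate ID are assumptions made for the example; the response format is not documented here, so the body is printed as plain text.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Illustrative only: a hypothetical client helper, not shipped with the project. */
final class HrApiRequests {

    // Backend address from the setup steps; change it for other environments.
    static final String BASE_URL = "http://localhost:8080";

    /** GET /api/hr/health */
    public static HttpRequest health(String baseUrl) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/hr/health")).GET().build();
    }

    /** GET /api/hr/candidates/{id}; the ID type is assumed to be representable as a string. */
    public static HttpRequest candidateById(String baseUrl, String id) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/hr/candidates/" + id)).GET().build();
    }

    /** DELETE /api/hr/candidates/{id} */
    public static HttpRequest deleteCandidate(String baseUrl, String id) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/hr/candidates/" + id)).DELETE().build();
    }

    public static void main(String[] args) throws InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        try {
            // Send the health check and print whatever the backend returns.
            HttpResponse<String> response =
                    client.send(health(BASE_URL), HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode() + " " + response.body());
        } catch (IOException e) {
            System.out.println("Backend not reachable: " + e.getMessage());
        }
    }
}
```

Checking the health endpoint first gives a quick signal that the backend is up before uploading CVs or running an analysis.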
- Real-time Statistics: Candidate counts, recent activity
- Visual Analytics: Skills distribution, experience levels
- System Monitoring: Comprehensive logging with export functionality
- Auto-refresh: Updates every 5 minutes
- File Selection: Drag-and-drop or file picker
- Metadata Entry: Candidate name, email, phone
- Processing: Text extraction and AI analysis
- Storage: Database persistence and vector indexing
- Job Definition: Title, description, requirements
- Candidate Matching: AI-powered scoring algorithm
- Results Display: Ranked candidates with match percentages
- AI Recommendations: Generated insights for top matches
Candidates are scored on four weighted factors:
Skills Matching (40% weight):
- Required Skills (70%): Must-have competencies
- Preferred Skills (30%): Nice-to-have abilities
- Case-insensitive matching with exact keyword detection
Experience Level (30% weight):
- Years Validation: Minimum/maximum experience requirements
- Penalty System: Graduated scoring for over/under-qualification
- Missing Data Handling: Neutral scoring for incomplete information
Education Requirements (20% weight):
- Level Matching: Degree type and field alignment
- Hierarchy Recognition: Higher degrees satisfy lower requirements
- Partial Credit: Scoring for related educational backgrounds
CV Content Relevance (10% weight):
- Keyword Analysis: Job description terminology in CV
- Semantic Matching: Context-aware content evaluation
- Frequency Weighting: Multiple mention scoring
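The factor weights (40/30/20/10) and the 70/30 split between required and preferred skills can be combined as in the following sketch. It is not the project's `JobAnalysisService`: the class `MatchScoreSketch`, the `Degree` enum, the penalty slopes, and the 0.5 values used for neutral scoring and partial credit are all assumptions chosen for illustration, and the content-relevance factor is passed in as a ready-made number.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

/** Illustrative only: hypothetical names and penalty values, not the project's implementation. */
final class MatchScoreSketch {

    /** Degree levels ordered so that a higher degree satisfies a lower requirement. */
    enum Degree { HIGH_SCHOOL, BACHELOR, MASTER, DOCTORATE }

    /** Fraction of wanted skills found in the candidate's lower-cased skill set. */
    private static double coverage(Set<String> have, List<String> wanted) {
        if (wanted.isEmpty()) {
            return 1.0; // assumption: an empty requirement list counts as satisfied
        }
        long hits = wanted.stream()
                .map(s -> s.toLowerCase(Locale.ROOT))
                .filter(have::contains)
                .count();
        return (double) hits / wanted.size();
    }

    /** Skills factor: 70% required coverage plus 30% preferred coverage, case-insensitive. */
    public static double skillsScore(List<String> candidate, List<String> required, List<String> preferred) {
        Set<String> have = candidate.stream()
                .map(s -> s.toLowerCase(Locale.ROOT))
                .collect(Collectors.toSet());
        return 0.70 * coverage(have, required) + 0.30 * coverage(have, preferred);
    }

    /** Experience factor: neutral when unknown, graduated penalty outside [min, max]. */
    public static double experienceScore(Integer years, int min, int max) {
        if (years == null) {
            return 0.5; // neutral score for missing data (assumed value)
        }
        if (years < min) {
            return Math.max(0.0, 1.0 - 0.2 * (min - years)); // assumed under-qualification slope
        }
        if (years > max) {
            return Math.max(0.5, 1.0 - 0.1 * (years - max)); // assumed over-qualification slope
        }
        return 1.0;
    }

    /** Education factor: full credit at or above the required level, partial credit below. */
    public static double educationScore(Degree candidate, Degree required) {
        if (candidate == null) {
            return 0.5; // neutral score for missing data (assumed value)
        }
        return candidate.ordinal() >= required.ordinal() ? 1.0 : 0.5;
    }

    /** Weighted total on a 0-100 scale; each input is expected in the range 0..1. */
    public static double overall(double skills, double experience, double education, double content) {
        return 100.0 * (0.40 * skills + 0.30 * experience + 0.20 * education + 0.10 * content);
    }

    public static void main(String[] args) {
        double skills = skillsScore(List.of("Java", "SQL", "Docker"), List.of("java", "spring"), List.of("docker"));
        double experience = experienceScore(4, 3, 8);
        double education = educationScore(Degree.MASTER, Degree.BACHELOR);
        double content = 0.6; // placeholder for the CV content relevance factor
        System.out.printf("Match: %.1f%%%n", overall(skills, experience, education, content));
    }
}
```

Keeping each factor in the range 0 to 1 and applying the weights only in the final step makes it easy to change a weight without touching the per-factor logic.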
- Spring Boot 3.3.2: Application framework
- Spring AI 1.0.0-M1: AI integration and RAG capabilities
- PostgreSQL: Primary database with JSON support
- pgvector: Vector similarity search
- Spring Data JDBC: Database access layer
- OpenAI Integration: GPT-4 and embedding models
- React 18.2.0: UI framework
- React Bootstrap: Component library
- Axios: HTTP client
- React Router: Navigation
- React Dropzone: File upload
- React Icons: Icon library
- Maven: Build management
- Docker Compose: Development environment
- Lombok: Boilerplate reduction
- SLF4J: Logging framework
- Comprehensive Tracking: All user actions and system events
- Log Levels: Info, Warning, Error, Debug
- Persistent Storage: LocalStorage with 50-entry limit
- Export Functionality: JSON format download
- Performance Metrics: API response times and duration tracking
- Auto-refresh: 5-minute intervals
- Error Handling: Graceful degradation and user feedback
- Health Endpoints: Service status monitoring
- CORS Configuration: Controlled cross-origin access
- Input Validation: Comprehensive request validation
- File Upload Security: Type and size restrictions (50MB limit)
- SQL Injection Prevention: Parameterized queries
- Text Sanitization: Null byte and control character removal
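The upload restrictions and text sanitization listed above could look roughly like the sketch below. The class `UploadGuardSketch`, its method names, the `.txt` extension for "text files", and the choice to keep tabs and line breaks are assumptions; the project's actual checks may differ. Removing null bytes matters in practice because PostgreSQL text columns do not accept them.

```java
import java.util.Locale;
import java.util.Set;

/** Illustrative only: hypothetical helper, not the project's RagUploaderService. */
final class UploadGuardSketch {

    // 50MB limit from the security list above.
    static final long MAX_UPLOAD_BYTES = 50L * 1024 * 1024;

    // File types named in the feature list; "txt" is an assumed extension for text files.
    static final Set<String> ALLOWED_EXTENSIONS = Set.of("pdf", "doc", "docx", "txt");

    /** Accepts a file only if its extension is allowed and its size is within the limit. */
    public static boolean isAllowedUpload(String filename, long sizeBytes) {
        if (filename == null || sizeBytes <= 0 || sizeBytes > MAX_UPLOAD_BYTES) {
            return false;
        }
        String lower = filename.toLowerCase(Locale.ROOT);
        int dot = lower.lastIndexOf('.');
        String extension = dot < 0 ? "" : lower.substring(dot + 1);
        return ALLOWED_EXTENSIONS.contains(extension);
    }

    /** Removes null bytes and other control characters, keeping tabs and line breaks. */
    public static String sanitize(String text) {
        if (text == null) {
            return "";
        }
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean keptWhitespace = c == '\n' || c == '\r' || c == '\t';
            if (keptWhitespace || !Character.isISOControl(c)) {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(isAllowedUpload("jane-doe.pdf", 120_000));
        System.out.println(sanitize("Jane\0 Doe\nJava developer"));
    }
}
```

Checking the extension is only a first filter; a file's real content type can differ from its name, so a production check would usually inspect the content as well.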
- Database Indexing: Optimized queries for frequent operations
- Concurrent Processing: Parallel API calls in frontend
- Vector Search: Efficient similarity matching with pgvector
- Batch Operations: Optimized bulk data processing
- Connection Pooling: Database connection management
- Set production OpenAI API key
- Configure database connection pooling
- Enable SSL/TLS for secure connections
- Set up application monitoring
- Configure log aggregation
- Implement backup strategies
OPENAI_API_KEY=your_production_api_key
SPRING_PROFILES_ACTIVE=production
DATABASE_URL=your_production_database_url
The HR Rag Assistant includes a complete CI/CD pipeline using Jenkins and Kubernetes for automated building, testing, and deployment.
The Jenkins pipeline automates the entire deployment process:
- Code Checkout → Pull latest code from Git repository
- Backend Build → Maven clean install
- Testing → Run unit tests
- Docker Build → Create backend and frontend images (parallel)
- Docker Push → Push images to Docker Hub registry
- Kubernetes Deploy → Deploy to Kubernetes cluster
- Verification → Verify deployment health and status
The Jenkinsfile defines a declarative pipeline configured through the following environment variables:
GIT_REPO         # Your GitHub repository URL
GIT_BRANCH       # Branch to build (default: main)
DOCKER_USERNAME  # Docker Hub username
BACKEND_IMAGE    # Backend Docker image name
FRONTEND_IMAGE   # Frontend Docker image name
IMAGE_TAG        # Build number for versioning
K8S_NAMESPACE    # Kubernetes namespace (hr-ragwiser)
The pipeline runs the following stages:
1. Checkout Stage
- Pulls code from Git repository
- Supports branch configuration
- Works with GitHub webhooks or polling
2. Build Backend Stage
- Runs mvn clean install -DskipTests
- Compiles Java code and creates JAR
- Cross-platform support (Unix/Windows)
3. Test Backend Stage
- Executes mvn test
- Runs all unit tests
- Pipeline fails if tests don't pass
4. Build Docker Images Stage (Parallel)
- Builds backend image from root Dockerfile
- Builds frontend image from frontend/Dockerfile
- Tags images with build number
- Parallel execution for efficiency
5. Push Docker Images Stage
- Authenticates to Docker Hub using credentials
- Pushes both backend and frontend images
- Images versioned with build number
6. Deploy to Kubernetes Stage
- Creates namespace if not exists
- Applies all Kubernetes configurations in order
- Waits for PostgreSQL readiness
- Updates or creates deployments
- Applies ingress rules
7. Verify Deployment Stage
- Checks rollout status for all deployments
- Displays pods, services, and deployment status
- Ensures all components are healthy
The k8s/ directory contains all Kubernetes manifests for deploying the application:
Namespace:
Purpose: Isolates application resources
Name: hr-ragwiser
Labels: app=hr-rag-assistant
Creates a dedicated namespace for the application, providing resource isolation and organization.
ConfigMap:
Purpose: Non-sensitive configuration data
Contains:
- Database connection details (host, port, database name)
- Spring datasource configuration
- Backend URL for frontend
- Application profiles
Key configurations:
- POSTGRES_HOST: postgres-service - Internal K8s service name
- SPRING_DATASOURCE_URL - JDBC connection string
- REACT_APP_BACKEND_URL - API endpoint for frontend
Secret:
Purpose: Store passwords and sensitive information
Type: Opaque (base64 encoded)
Contains:
- POSTGRES_PASSWORD
- SPRING_DATASOURCE_PASSWORD
echo -n 'your-actual-password' | base64
Persistent Volume Claim:
Purpose: Persistent storage for PostgreSQL database
Access Mode: ReadWriteOnce
Storage: 5Gi
Ensures database data persists across pod restarts and rescheduling.
PostgreSQL Deployment:
Purpose: PostgreSQL database with pgvector extension
Image: pgvector/pgvector:pg16
Replicas: 1
Resources:
  Requests: 256Mi memory, 250m CPU
  Limits: 512Mi memory, 500m CPU
Features:
- Uses pgvector-enabled PostgreSQL
- Mounts persistent volume to /var/lib/postgresql/data
- Includes liveness and readiness probes
- Environment variables from ConfigMap and Secret
PostgreSQL Service:
Purpose: Internal service for database access
Type: ClusterIP (internal only)
Port: 5432
Selector: app=postgres
Makes PostgreSQL accessible to backend pods within the cluster.
Backend Deployment:
Purpose: Spring Boot application deployment
Image: your_dockerhub_user/hr-ragwiser-backend:latest
Replicas: 2 (high availability)
Strategy: RollingUpdate (zero-downtime deployments)
Resources:
  Requests: 512Mi memory, 500m CPU
  Limits: 1Gi memory, 1000m CPU
Features:
- Init Container: Waits for PostgreSQL readiness before starting
- Health Checks: Liveness and readiness probes on /api/hr/health
- Environment Variables: Database credentials and Spring configuration
- Rolling Updates: maxSurge: 1, maxUnavailable: 0 for zero downtime
Backend Service:
Purpose: Expose backend API
Type: NodePort
Port: 8080
NodePort: 30080 (external access)
Selector: app=backend
Accessible at http://<node-ip>:30080 for external testing.
Frontend Deployment:
Purpose: React application deployment
Image: your_dockerhub_user/hr-ragwiser-frontend:latest
Replicas: 2 (high availability)
Strategy: RollingUpdate
Resources:
  Requests: 256Mi memory, 250m CPU
  Limits: 512Mi memory, 500m CPU
Features:
- Environment Variables: Backend API URL from ConfigMap
- Health Checks: HTTP probes on root path /
- Rolling Updates: Ensures continuous availability
Frontend Service:
Purpose: Expose frontend application
Type: NodePort
Port: 3000
NodePort: 30000 (external access)
Selector: app=frontend
Accessible at http://<node-ip>:30000 for user access.
Ingress:
Purpose: HTTP routing and load balancing
IngressClass: nginx
Host: hr-ragwiser.local
Routes:
- / → frontend-service:3000
- /api → backend-service:8080
Features:
- Single entry point for application
- Path-based routing
- SSL redirect disabled for local development
# Install Jenkins with Docker support
docker run -d \
  --name jenkins \
  -p 8080:8080 -p 50000:50000 \
  -v jenkins_home:/var/jenkins_home \
  -v /var/run/docker.sock:/var/run/docker.sock \
  jenkins/jenkins:lts
Required Jenkins Plugins:
- Docker Pipeline
- Kubernetes CLI
- Git Plugin
- Pipeline
- Credentials Binding
Add the following credentials in Jenkins:
Docker Hub Credentials:
- ID: docker
- Type: Username with password
- Username: Your Docker Hub username
- Password: Your Docker Hub password/token
Kubernetes Config (if using remote cluster):
- ID: kubeconfig
- Type: Secret file
- File: Your kubeconfig file
Options for running Kubernetes:
Local Development:
- Minikube: minikube start --cpus=4 --memory=8192
- Docker Desktop: Enable Kubernetes in settings
- Kind: kind create cluster --name hr-ragwiser
Cloud Providers:
- Google GKE: gcloud container clusters create hr-ragwiser
- AWS EKS: Use eksctl or AWS console
- Azure AKS: Use Azure CLI or portal
# Install NGINX Ingress Controller
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/main/deploy/static/provider/cloud/deploy.yaml
# For Minikube
minikube addons enable ingress
Update Jenkinsfile:
// Line 6-7: Update with your repository
GIT_REPO = 'https://github.com/YourUsername/hr-rag-assistant.git'
// Line 11-13: Update with your Docker Hub username
DOCKER_USERNAME = 'your-dockerhub-username'
BACKEND_IMAGE = "your-dockerhub-username/hr-ragwiser-backend"
FRONTEND_IMAGE = "your-dockerhub-username/hr-ragwiser-frontend"
Update Kubernetes Deployments:
# backend-deployment.yaml and frontend-deployment.yaml
# Replace image URLs with your Docker Hub username
image: your-dockerhub-username/hr-ragwiser-backend:latest
image: your-dockerhub-username/hr-ragwiser-frontend:latest
Update Secrets:
# Generate base64 encoded password
echo -n 'YourSecurePassword123!' | base64
# Update k8s/secret.yaml with the output
- Open Jenkins UI (http://localhost:8080)
- Create New Item → Pipeline
- Name: HR-RagWiser-Pipeline
- Pipeline → Definition: Pipeline script from SCM
- SCM: Git
- Repository URL: Your repository URL
- Branch: */main
- Script Path: Jenkinsfile
- Save
Option A: Polling (Simple)
triggers {
    pollSCM('H/5 * * * *') // Poll every 5 minutes
}
Option B: Webhook (Recommended)
- GitHub Settings → Webhooks → Add webhook
- Payload URL: http://your-jenkins-url/github-webhook/
- Content type: application/json
- Events: Push events
- Update Jenkinsfile:
triggers {
    githubPush()
}
For Local Kubernetes:
# Copy kubeconfig to Jenkins
docker cp ~/.kube/config jenkins:/var/jenkins_home/kube/config
docker exec jenkins chmod 644 /var/jenkins_home/kube/config
For Remote Kubernetes:
- Add kubeconfig as Jenkins credential
- Update Jenkinsfile with credential ID
Manual Trigger:
- Open pipeline in Jenkins
- Click "Build Now"
- Monitor console output
Automatic Trigger:
- Push code to repository
- Jenkins automatically detects and builds
┌───────────────────────────────────────────────────────────┐
│ 1. Checkout Code from Git                                 │
└────────────────┬──────────────────────────────────────────┘
                 ▼
┌───────────────────────────────────────────────────────────┐
│ 2. Build Backend (Maven)                                  │
│    mvn clean install -DskipTests                          │
└────────────────┬──────────────────────────────────────────┘
                 ▼
┌───────────────────────────────────────────────────────────┐
│ 3. Test Backend (Maven)                                   │
│    mvn test                                               │
└────────────────┬──────────────────────────────────────────┘
                 ▼
┌─────────────────────────────┬─────────────────────────────┐
│ 4a. Build Backend Image     │ 4b. Build Frontend Image    │
│     (Parallel)              │     (Parallel)              │
└──────────────┬──────────────┴──────────────┬──────────────┘
               └──────────────┬──────────────┘
                              ▼
┌───────────────────────────────────────────────────────────┐
│ 5. Push Images to Docker Hub                              │
│    - backend:BUILD_NUMBER                                 │
│    - frontend:BUILD_NUMBER                                │
└────────────────┬──────────────────────────────────────────┘
                 ▼
┌───────────────────────────────────────────────────────────┐
│ 6. Deploy to Kubernetes                                   │
│    ├── Create namespace                                   │
│    ├── Apply ConfigMap & Secret                           │
│    ├── Deploy PostgreSQL (PVC + Deployment + Service)     │
│    ├── Wait for PostgreSQL ready                          │
│    ├── Deploy Backend (Deployment + Service)              │
│    ├── Deploy Frontend (Deployment + Service)             │
│    └── Apply Ingress                                      │
└────────────────┬──────────────────────────────────────────┘
                 ▼
┌───────────────────────────────────────────────────────────┐
│ 7. Verify Deployment                                      │
│    ├── Check rollout status                               │
│    ├── Display pods status                                │
│    └── Display services status                            │
└───────────────────────────────────────────────────────────┘
# Check all resources in namespace
kubectl get all -n hr-ragwiser
# Check pod logs
kubectl logs -f deployment/backend -n hr-ragwiser
kubectl logs -f deployment/frontend -n hr-ragwiser
kubectl logs -f deployment/postgres -n hr-ragwiser
# Describe pod for detailed information
kubectl describe pod <pod-name> -n hr-ragwiser
# Check events
kubectl get events -n hr-ragwiser --sort-by='.lastTimestamp'
1. Image Pull Errors
# Check image name in deployment
kubectl get deployment backend -n hr-ragwiser -o yaml | grep image
# Verify Docker Hub credentials
docker login
# Update deployment with correct image
kubectl set image deployment/backend backend=your-username/hr-ragwiser-backend:TAG -n hr-ragwiser
2. Database Connection Issues
# Check PostgreSQL is running
kubectl get pods -n hr-ragwiser -l app=postgres
# Test database connection
kubectl exec -it deployment/postgres -n hr-ragwiser -- psql -U postgres -d rag_hr_db
# Check service DNS
kubectl exec -it deployment/backend -n hr-ragwiser -- nslookup postgres-service
3. Backend Not Starting
# Check init container status
kubectl describe pod <backend-pod> -n hr-ragwiser
# Check environment variables
kubectl exec -it deployment/backend -n hr-ragwiser -- env | grep SPRING
# View backend logs
kubectl logs -f deployment/backend -n hr-ragwiser
4. Frontend Can't Connect to Backend
# Check backend service
kubectl get svc backend-service -n hr-ragwiser
# Test backend health endpoint
kubectl run curl-test --image=curlimages/curl -it --rm -n hr-ragwiser -- curl http://backend-service:8080/api/hr/health
# Check ConfigMap
kubectl get configmap hr-ragwiser-config -n hr-ragwiser -o yaml
View Build Logs:
- Click on build number → Console Output
Common Jenkins Issues:
Docker Permission Denied:
# Add Jenkins user to docker group
docker exec -u root jenkins usermod -aG docker jenkins
docker restart jenkins
Kubectl Not Found:
# Install kubectl in Jenkins container
docker exec -u root jenkins curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
docker exec -u root jenkins install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
Maven Build Fails:
# Check Java version in Jenkins
docker exec jenkins java -version
# Ensure Maven is installed
docker exec jenkins mvn -version
- Secrets Management:
  - Never commit secrets to Git
  - Use Kubernetes Secrets for sensitive data
  - Consider using HashiCorp Vault or AWS Secrets Manager
  - Rotate credentials regularly
- Image Security:
  - Scan Docker images for vulnerabilities
  - Use specific image tags, avoid latest
  - Implement image signing
  - Use private registries for production
- RBAC (Role-Based Access Control):
  - Create service accounts for applications
  - Limit namespace permissions
  - Use network policies to restrict traffic
- Network Security:
  - Enable TLS for ingress
  - Use network policies to isolate pods
  - Restrict nodePort access in production
- Resource Limits:
  - Always define CPU and memory limits
  - Monitor resource usage and adjust
  - Use Horizontal Pod Autoscaler (HPA) for scaling
- High Availability:
  - Run multiple replicas (≥2)
  - Use pod anti-affinity for distribution
  - Implement readiness and liveness probes
- Backup Strategy:
  - Regular database backups
  - Use VolumeSnapshots for PVC backups
  - Store backups in remote location
- Monitoring:
  - Set up Prometheus and Grafana
  - Monitor application metrics
  - Set up alerts for critical issues
- Rolling Updates:
  - Use proper update strategies
  - Test in staging environment first
  - Implement blue-green or canary deployments
Horizontal Pod Autoscaling:
# Create HPA for backend
kubectl autoscale deployment backend \
  --cpu-percent=70 \
  --min=2 \
  --max=10 \
  -n hr-ragwiser
# Create HPA for frontend
kubectl autoscale deployment frontend \
  --cpu-percent=70 \
  --min=2 \
  --max=5 \
  -n hr-ragwiser
Manual Scaling:
# Scale backend
kubectl scale deployment backend --replicas=5 -n hr-ragwiser
# Scale frontend
kubectl scale deployment frontend --replicas=3 -n hr-ragwiser
Delete Deployment:
# Delete entire namespace (removes everything)
kubectl delete namespace hr-ragwiser
# Delete specific resources
kubectl delete -f k8s/ -n hr-ragwiser
Rollback Deployment:
# View rollout history
kubectl rollout history deployment/backend -n hr-ragwiser
# Rollback to previous version
kubectl rollout undo deployment/backend -n hr-ragwiser
# Rollback to specific revision
kubectl rollout undo deployment/backend --to-revision=2 -n hr-ragwiser
- Kubernetes Documentation: https://kubernetes.io/docs/
- Jenkins Pipeline Syntax: https://www.jenkins.io/doc/book/pipeline/syntax/
- Docker Best Practices: https://docs.docker.com/develop/dev-best-practices/
- Ingress NGINX: https://kubernetes.github.io/ingress-nginx/
- Fork the repository
- Create feature branch (git checkout -b feature/amazing-feature)
- Commit changes (git commit -m 'Add amazing feature')
- Push to branch (git push origin feature/amazing-feature)
- Open Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
For support and questions:
- Create an issue in the repository
- Check the application logs via the dashboard
- Review the comprehensive logging system for debugging
HR Rag Assistant - Transforming HR management with AI-powered intelligence.




