
NLP-Based Resume Screening and
Job Recommendation System

Developed by Byte Brains

Developers

Mahnoor Ishfaq, M. Umar Farooq, Muntaha Javed

Project Overview

Our NLP-based system automates resume screening and job matching, saving time and improving accuracy in the hiring process.

Resume In
Upload resume in various formats
Extract
Extract key information using NLP
Understand
Analyze content with AI models
Match
Compare with job requirements
Score
Generate matching score and recommendations
Resume
NLP
Structured Data

Key NLP Terms Used

Our system leverages these key Natural Language Processing concepts to understand and process resume content effectively.

NER

Named Entity Recognition - identifies entities such as names and locations in text

Fine-Tuning

Adapting pre-trained models to specific tasks with domain data

Transformers

Neural network architecture using attention mechanisms

Attention Mechanism

Weighs importance of different words in context

Tokenization

Splitting text into individual units called tokens (words, punctuation, etc.)

N-Grams

Sequences of n words (e.g., bigram, trigram).

NLTK

Natural Language Toolkit for text processing

Stemming

Reducing words to their root form

Lemmatization

Converting words to their dictionary form

BERT

Bidirectional Encoder Representations from Transformers

Cosine Similarity

Measures similarity between vectors

TF-IDF

Term Frequency-Inverse Document Frequency weighting
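Several of these terms can be seen in action with NLTK. A minimal sketch of stemming (lemmatization works similarly but requires the WordNet corpus, so it is only noted in comments):

```python
from nltk.stem import PorterStemmer

# Stemming chops words down to a crude root form; a lemmatizer would
# instead map to dictionary forms ("studies" -> "study", "better" -> "good").
stemmer = PorterStemmer()
for word in ["running", "studies", "better"]:
    print(word, "->", stemmer.stem(word))
# running -> run
# studies -> studi  (note: stems need not be real words)
```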

Core Concepts
Techniques
Models
Tools
Architectures

NER Model with spaCy 'en_core_web_trf'

Our Named Entity Recognition model identifies and extracts key information from resumes, including personal details and skills.

Resume Sample

John Smith

john.smith@example.com | (555) 123-4567

123 Main St, New York, NY 10001

Experienced Data Scientist with expertise in Python, Machine Learning, and NLP.

Proficient in TensorFlow, PyTorch, and scikit-learn.

Entity Types
NAME
EMAIL
PHONE
ADDRESS
Extracted Entities
John Smith
john.smith@example.com
(555) 123-4567
123 Main St, New York, NY 10001

Structured Entity Output

Entities are extracted, categorized, and stored for further processing

John Smith
john.smith@example.com
(555) 123-4567
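The extraction step above can be sketched with spaCy plus simple regular expressions. Note that the stock en_core_web_trf pipeline labels people and places (PERSON, GPE) but has no EMAIL or PHONE entity types, so we assume those come from regexes; the patterns below are illustrative, not our production rules.

```python
import re

# Illustrative patterns for contact details (assumptions, not spaCy labels)
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"\(?\d{3}\)?[ -]?\d{3}-\d{4}")

def extract_entities(text):
    """Return NAME/EMAIL/PHONE entities from resume text."""
    import spacy  # loaded lazily; requires `python -m spacy download en_core_web_trf`
    doc = spacy.load("en_core_web_trf")(text)
    return {
        "NAME": [ent.text for ent in doc.ents if ent.label_ == "PERSON"],
        "EMAIL": EMAIL_RE.findall(text),
        "PHONE": PHONE_RE.findall(text),
    }
```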

Job Role Classification – Fine-Tuned BERT Classifier

Our BERT-based classifier categorizes resumes into job roles with high accuracy, enabling better matching with job descriptions.

BERT Classifier Architecture

Input
Resume
Text
Input
Tokenization
Embedding
Transformer Layers
Classification Head
Output
Predicted Resume Class

Classification Examples

Resume 1

5 years experience in Python, TensorFlow, data analysis...

Data Scientist
Resume 2

React, JavaScript, CSS, UI/UX design experience...

Frontend Developer
Resume 3

Circuit design, PCB layout, embedded systems...

Electrical Engineer
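Inference with a fine-tuned BERT classifier can be sketched with the Hugging Face transformers library; the checkpoint path and label names below are assumptions about our saved model, not fixed values.

```python
def classify_resume(text, model_dir="./bert-resume-classifier"):
    """Predict a job-role label for resume text.

    Sketch only: `model_dir` is a hypothetical path to our fine-tuned
    checkpoint, whose id2label mapping holds roles like "Data Scientist".
    """
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForSequenceClassification.from_pretrained(model_dir)
    # Truncate to BERT's 512-token limit and run a single forward pass
    inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return model.config.id2label[int(logits.argmax(dim=-1))]
```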

BERT Classifier Performance Metrics

Evaluation on a test dataset of 500 resumes across 10 job categories

Precision
Ratio of correctly predicted positive observations to total predicted positives
89%
Recall
Ratio of correctly predicted positive observations to all actual positives
84%
Accuracy
Ratio of correctly predicted observations to total observations
96%
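These metrics can be reproduced with scikit-learn; the labels below are toy data for illustration, not our actual evaluation set.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Toy predictions over three role labels (illustrative only)
y_true = ["DS", "DS", "FE", "EE", "FE", "DS"]
y_pred = ["DS", "FE", "FE", "EE", "FE", "DS"]

# Macro averaging treats every job category equally
precision = precision_score(y_true, y_pred, average="macro")
recall = recall_score(y_true, y_pred, average="macro")
accuracy = accuracy_score(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} accuracy={accuracy:.2f}")
```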

Model Training view

Resume & JD Matching via Cosine Similarity

Our system uses cosine similarity to measure the match between resumes and job descriptions, providing accurate matching scores.

Resume

Experienced Data Scientist with 5 years in machine learning and Python.

Proficient in TensorFlow, PyTorch, and SQL.

Experience with NLP and computer vision projects.

Job Description

Seeking a Data Scientist with strong Python skills.

Experience with machine learning frameworks like TensorFlow.

NLP experience is a plus.

Vector Representation

similarity = cos(θ) = (A · B) / (‖A‖ ‖B‖), where A and B are the resume and job description vectors

Match Score

89%
Similarity Score
High Match

This resume is highly relevant to the job description with strong skill alignment.
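The matching step can be sketched in a few lines with scikit-learn, here using TF-IDF vectors over the two texts (our production features may differ):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

resume = ("Experienced Data Scientist with 5 years in machine learning "
          "and Python. Proficient in TensorFlow, PyTorch, and SQL.")
jd = ("Seeking a Data Scientist with strong Python skills. Experience "
      "with machine learning frameworks like TensorFlow.")

# Fit one shared TF-IDF vocabulary over both texts, then compare vectors
tfidf = TfidfVectorizer(stop_words="english")
vectors = tfidf.fit_transform([resume, jd])
score = cosine_similarity(vectors[0], vectors[1])[0, 0]
print(f"Match score: {score:.0%}")
```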

Skills Extraction – DistilBERT Model

DistilBERT Model

Tokenization
Input Embedding
Contextual Embeddings
Predicted class (Classification)

N-Grams Input

An n-gram is simply a contiguous sequence of N items from a given text.
Unigram (1-gram): one word at a time
Example: "I love NLP" → ["I", "love", "NLP"]
Bigram (2-gram): two-word pairs
Example: "I love NLP" → ["I love", "love NLP"]
Trigram (3-gram): three-word sequences
Example: "I love NLP" → ["I love NLP"]
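The n-gram examples above can be generated with a few lines of Python:

```python
def ngrams(text, n):
    """Return the contiguous n-word sequences of a whitespace-tokenized text."""
    tokens = text.split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

print(ngrams("I love NLP", 1))  # ['I', 'love', 'NLP']
print(ngrams("I love NLP", 2))  # ['I love', 'love NLP']
print(ngrams("I love NLP", 3))  # ['I love NLP']
```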

DistilBERT is a smaller, faster version of BERT that retains 97% of its language understanding capabilities while being 40% smaller and 60% faster.

Our DistilBERT-based model identifies and extracts technical and soft skills from resumes with high accuracy.

Resume Text

I have 5 years of experience working with data analysis tools including Python, R, and SQL.

Implemented machine learning models using TensorFlow and scikit-learn.

Deployed applications using Docker and Kubernetes on AWS.

Developed front-end interfaces with React and JavaScript.

Extracted Skills

Programming
Python
R
SQL
JavaScript
Data Science
Data Analysis
Machine Learning
TensorFlow
scikit-learn
DevOps
Docker
Kubernetes
AWS
Frontend
React
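Skills extraction via DistilBERT token classification can be sketched with the transformers pipeline API; the checkpoint path and SKILL label scheme below are assumptions about our fine-tuned model.

```python
def extract_skills(text, model_dir="./distilbert-skill-extractor"):
    """Extract skill mentions from resume text via token classification.

    Sketch only: `model_dir` is a hypothetical fine-tuned checkpoint
    whose BIO tags aggregate to an entity group named "SKILL".
    """
    from transformers import pipeline

    # "simple" aggregation merges B-/I- subword tags into whole spans
    ner = pipeline("token-classification", model=model_dir,
                   aggregation_strategy="simple")
    return sorted({ent["word"] for ent in ner(text)
                   if ent["entity_group"] == "SKILL"})
```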

Skills Extraction Model Performance

Evaluation on a test dataset of 1000 resumes with manually labeled skills

Precision
Percentage of extracted skills that are actual skills
89.72%
Recall
Percentage of actual skills that were correctly extracted
92.46%
F1 Score
Harmonic mean of precision and recall
91.06%
Accuracy
Overall token classification accuracy
90.95%

Model Training view

Pipeline Flow Summary

Our complete NLP pipeline processes resumes through multiple stages to provide accurate job matching and recommendations.

Resume
Input document
NER
Entity extraction
BERT Classifier
Role classification
Skills Extractor
Skills identification
JD Matching
Similarity scoring

Data Flow Visualization

Resume
Structured Data
Vector Representation
Match Score

Smart Hiring Powered by NLP

Our system streamlines the hiring process, saving time and improving match quality.

Any Questions?
