
NLP-Based Resume Screening and
Job Recommendation System

Developed by Byte Brains

Developers

Mahnoor Ishfaq, M. Umar Farooq, Muntaha Javed

Project Overview

Our NLP-based system automates resume screening and job matching, saving time and improving accuracy in the hiring process.

Resume In
Upload resume in various formats
Extract
Extract key information using NLP
Understand
Analyze content with AI models
Match
Compare with job requirements
Score
Generate matching score and recommendations
Resume
NLP
Structured Data

Key NLP Terms Used

Our system leverages these key Natural Language Processing concepts to understand and process resume content effectively.

NER

Named Entity Recognition - identifies entities such as names and locations in text

Fine-Tuning

Adapting pre-trained models to specific tasks with domain data

Transformers

Neural network architecture using attention mechanisms

Attention Mechanism

Weighs importance of different words in context

Tokenization

Splitting text into individual units called tokens (words, punctuation, etc.)

N-Grams

Sequences of n words (e.g., bigram, trigram).

NLTK

Natural Language Toolkit for text processing

Stemming

Reducing words to their root form

Lemmatization

Converting words to their dictionary form

BERT

Bidirectional Encoder Representations from Transformers

Cosine Similarity

Measures similarity between vectors

TF-IDF

Term Frequency-Inverse Document Frequency weighting
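Several of these terms can be seen in action with NLTK. A minimal sketch of stemming (lemmatization works similarly but requires the WordNet corpus, so it is only noted in comments):

```python
from nltk.stem import PorterStemmer

# Stemming chops words down to a crude root form; a lemmatizer would
# instead map to dictionary forms ("studies" -> "study", "better" -> "good").
stemmer = PorterStemmer()
for word in ["running", "studies", "better"]:
    print(word, "->", stemmer.stem(word))
# running -> run
# studies -> studi  (note: stems need not be real words)
```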

Core Concepts
Techniques
Models
Tools
Architectures

NER Model with spaCy 'en_core_web_trf'

Our Named Entity Recognition model identifies and extracts key information from resumes, including personal details and skills.

Resume Sample

John Smith

john.smith@example.com | (555) 123-4567

123 Main St, New York, NY 10001

Experienced Data Scientist with expertise in Python, Machine Learning, and NLP.

Proficient in TensorFlow, PyTorch, and scikit-learn.

Entity Types
NAME
EMAIL
PHONE
ADDRESS
Extracted Entities
John Smith
john.smith@example.com
(555) 123-4567
123 Main St, New York, NY 10001

Structured Entity Output

Entities are extracted, categorized, and stored for further processing

John Smith
john.smith@example.com
(555) 123-4567
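The extraction step above can be sketched with spaCy plus simple regular expressions. Note that the stock en_core_web_trf pipeline labels people and places (PERSON, GPE) but has no EMAIL or PHONE entity types, so we assume those come from regexes; the patterns below are illustrative, not our production rules.

```python
import re

# Illustrative patterns for contact details (assumptions, not spaCy labels)
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"\(?\d{3}\)?[ -]?\d{3}-\d{4}")

def extract_entities(text):
    """Return NAME/EMAIL/PHONE entities from resume text."""
    import spacy  # loaded lazily; requires `python -m spacy download en_core_web_trf`
    doc = spacy.load("en_core_web_trf")(text)
    return {
        "NAME": [ent.text for ent in doc.ents if ent.label_ == "PERSON"],
        "EMAIL": EMAIL_RE.findall(text),
        "PHONE": PHONE_RE.findall(text),
    }
```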

Job Role Classification – Fine-Tuned BERT Classifier

Our BERT-based classifier categorizes resumes into job roles with high accuracy, enabling better matching with job descriptions.

BERT Classifier Architecture

Input
Resume
Text
Input
Tokenization
Embedding
Transformer Layers
Classification Head
Output
Predicted Resume Class

Classification Examples

Resume 1

5 years experience in Python, TensorFlow, data analysis...

Data Scientist
Resume 2

React, JavaScript, CSS, UI/UX design experience...

Frontend Developer
Resume 3

Circuit design, PCB layout, embedded systems...

Electrical Engineer
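Inference with a fine-tuned BERT classifier can be sketched with the Hugging Face transformers library; the checkpoint path and label names below are assumptions about our saved model, not fixed values.

```python
def classify_resume(text, model_dir="./bert-resume-classifier"):
    """Predict a job-role label for resume text.

    Sketch only: `model_dir` is a hypothetical path to our fine-tuned
    checkpoint, whose id2label mapping holds roles like "Data Scientist".
    """
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForSequenceClassification.from_pretrained(model_dir)
    # Truncate to BERT's 512-token limit and run a single forward pass
    inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return model.config.id2label[int(logits.argmax(dim=-1))]
```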

BERT Classifier Performance Metrics

Evaluation on a test dataset of 500 resumes across 10 job categories

Precision
Ratio of correctly predicted positive observations to total predicted positives
89%
Recall
Ratio of correctly predicted positive observations to all actual positives
84%
Accuracy
Ratio of correctly predicted observations to total observations
96%
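These metrics can be reproduced with scikit-learn; the labels below are toy data for illustration, not our actual evaluation set.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Toy predictions over three role labels (illustrative only)
y_true = ["DS", "DS", "FE", "EE", "FE", "DS"]
y_pred = ["DS", "FE", "FE", "EE", "FE", "DS"]

# Macro averaging treats every job category equally
precision = precision_score(y_true, y_pred, average="macro")
recall = recall_score(y_true, y_pred, average="macro")
accuracy = accuracy_score(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} accuracy={accuracy:.2f}")
```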

Model Training view

Resume & JD Matching via Cosine Similarity

Our system uses cosine similarity to measure the match between resumes and job descriptions, providing accurate matching scores.

Resume

Experienced Data Scientist with 5 years in machine learning and Python.

Proficient in TensorFlow, PyTorch, and SQL.

Experience with NLP and computer vision projects.

Job Description

Seeking a Data Scientist with strong Python skills.

Experience with machine learning frameworks like TensorFlow.

NLP experience is a plus.

Vector Representation

similarity = cos(θ) = (A · B) / (‖A‖ ‖B‖), where A and B are the resume and job description vectors

Match Score

89%
Similarity Score
High Match

This resume is highly relevant to the job description with strong skill alignment.
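The matching step can be sketched in a few lines with scikit-learn, here using TF-IDF vectors over the two texts (our production features may differ):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

resume = ("Experienced Data Scientist with 5 years in machine learning "
          "and Python. Proficient in TensorFlow, PyTorch, and SQL.")
jd = ("Seeking a Data Scientist with strong Python skills. Experience "
      "with machine learning frameworks like TensorFlow.")

# Fit one shared TF-IDF vocabulary over both texts, then compare vectors
tfidf = TfidfVectorizer(stop_words="english")
vectors = tfidf.fit_transform([resume, jd])
score = cosine_similarity(vectors[0], vectors[1])[0, 0]
print(f"Match score: {score:.0%}")
```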

Skills Extraction – DistilBERT Model

DistilBERT Model

Tokenization
Input Embedding
Contextual Embeddings
Predicted class (Classification)

N-Grams Input

An n-gram is simply a contiguous sequence of N items from a given text.
Unigram (1-gram): one word at a time
Example: "I love NLP" → ["I", "love", "NLP"]
Bigram (2-gram): two-word pairs
Example: "I love NLP" → ["I love", "love NLP"]
Trigram (3-gram): three-word sequences
Example: "I love NLP" → ["I love NLP"]
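The n-gram examples above can be generated with a few lines of Python:

```python
def ngrams(text, n):
    """Return the contiguous n-word sequences of a whitespace-tokenized text."""
    tokens = text.split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

print(ngrams("I love NLP", 1))  # ['I', 'love', 'NLP']
print(ngrams("I love NLP", 2))  # ['I love', 'love NLP']
print(ngrams("I love NLP", 3))  # ['I love NLP']
```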

DistilBERT is a smaller, faster version of BERT that retains 97% of its language understanding capabilities while being 40% smaller and 60% faster.

Our DistilBERT-based model identifies and extracts technical and soft skills from resumes with high accuracy.

Resume Text

I have 5 years of experience working with data analysis tools including Python, R, and SQL.

Implemented machine learning models using TensorFlow and scikit-learn.

Deployed applications using Docker and Kubernetes on AWS.

Developed front-end interfaces with React and JavaScript.

Extracted Skills

Programming
Python
R
SQL
JavaScript
Data Science
Data Analysis
Machine Learning
TensorFlow
scikit-learn
DevOps
Docker
Kubernetes
AWS
Frontend
React
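Skills extraction via DistilBERT token classification can be sketched with the transformers pipeline API; the checkpoint path and SKILL label scheme below are assumptions about our fine-tuned model.

```python
def extract_skills(text, model_dir="./distilbert-skill-extractor"):
    """Extract skill mentions from resume text via token classification.

    Sketch only: `model_dir` is a hypothetical fine-tuned checkpoint
    whose BIO tags aggregate to an entity group named "SKILL".
    """
    from transformers import pipeline

    # "simple" aggregation merges B-/I- subword tags into whole spans
    ner = pipeline("token-classification", model=model_dir,
                   aggregation_strategy="simple")
    return sorted({ent["word"] for ent in ner(text)
                   if ent["entity_group"] == "SKILL"})
```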

Skills Extraction Model Performance

Evaluation on a test dataset of 1000 resumes with manually labeled skills

Precision
Percentage of extracted skills that are actual skills
89.72%
Recall
Percentage of actual skills that were correctly extracted
92.46%
F1 Score
Harmonic mean of precision and recall
91.06%
Accuracy
Overall token classification accuracy
90.95%

Model Training view

Pipeline Flow Summary

Our complete NLP pipeline processes resumes through multiple stages to provide accurate job matching and recommendations.

Resume
Input document
NER
Entity extraction
BERT Classifier
Role classification
Skills Extractor
Skills identification
JD Matching
Similarity scoring

Data Flow Visualization

Resume
Structured Data
Vector Representation
Match Score

Smart Hiring Powered by NLP

Our system streamlines the hiring process, saving time and improving match quality.

Any Questions?
