This repository contains the implementation of, and experiments with, deepfake video classification using deep learning techniques such as Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) networks. Deepfakes, synthetically altered media generated using AI, pose a growing threat to authenticity and digital integrity. This project evaluates multiple model architectures for distinguishing real from fake facial video data.
Datasets used:
- DFDC: https://www.kaggle.com/competitions/deepfake-detection-challenge
- Celeb-DF: https://www.kaggle.com/datasets/reubensuju/celeb-df-v2
- Ishita Akolkar
- Akshat Chhatriwala
- Dr Shankar Parmar
This model adopts a multi-stream architecture, where three parallel branches process distinct modalities of information:
- Spatial Stream
  - Inputs: RGB video frames
  - Model: Lightweight CNN (e.g., MobileNet, ResNet-18)
  - Role: Detect artifacts and texture-level anomalies within each frame
- Temporal Stream
  - Inputs: Consecutive frame pairs
  - Model: Optical flow extraction + CNN
  - Role: Detect motion-level artifacts introduced by frame-to-frame inconsistencies
- Landmark Stream
  - Inputs: Facial landmark sequences over time
  - Model: GCN / LSTM-based architectures
  - Role: Detect unnatural geometric deformations or keypoint instability
Each stream produces independent predictions or feature embeddings. A late fusion module combines them via ensemble logic or learned fusion (e.g., MLP or attention-based layer) to yield the final classification.
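The simplest form of the ensemble logic mentioned above is a weighted average of per-stream probabilities. The sketch below is illustrative only and is not taken from this repository's code: the function name `fuse_predictions`, the stream keys, the weights, and the 0.5 decision threshold are all assumptions. A learned fusion layer (MLP or attention) would replace the fixed weights with trained parameters.

```python
def fuse_predictions(stream_probs, weights=None):
    """Late fusion by weighted averaging (hypothetical helper).

    stream_probs: dict mapping a stream name to its predicted
                  probability that the video is fake, in [0, 1].
    weights:      optional dict of non-negative per-stream weights;
                  defaults to equal weighting.
    Returns (label, p_fake, confidence).
    """
    names = sorted(stream_probs)
    if weights is None:
        weights = {name: 1.0 for name in names}

    total = sum(weights[name] for name in names)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    # Weighted mean of the per-stream fake probabilities.
    p_fake = sum(weights[name] * stream_probs[name] for name in names) / total

    # Assumed decision rule: threshold the fused probability at 0.5.
    label = "fake" if p_fake >= 0.5 else "real"
    confidence = p_fake if label == "fake" else 1.0 - p_fake
    return label, p_fake, confidence


# Example with made-up stream outputs; the spatial stream is weighted double.
label, p_fake, confidence = fuse_predictions(
    {"spatial": 0.9, "temporal": 0.6, "landmark": 0.3},
    weights={"spatial": 2.0, "temporal": 1.0, "landmark": 1.0},
)
print(label, round(p_fake, 3), round(confidence, 3))
```

With these made-up inputs the fused probability is about 0.675, so the video is labeled fake even though the landmark stream alone would have voted real.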
- Face Extraction: Automatically detects and crops face regions from video frames for consistent input processing.
- Deep Learning-Based Classifier: Custom CNN and LSTM-based models trained to learn spatial and temporal features indicative of manipulations.
- Binary Classification: Classifies video input as real or fake with associated confidence scores.
- Evaluation Metrics: Includes Accuracy, Precision, Recall, and Confusion Matrix for comprehensive model assessment.
- Modular Architecture: Components for data preprocessing, training, evaluation, and visualization are modular and extensible.
- Multi-Dataset Support: Easily adaptable to various benchmark datasets with flexible data loaders and configuration files.
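As a rough illustration of how the listed evaluation metrics relate to one another, the sketch below thresholds per-video fake scores into binary labels and derives accuracy, precision, and recall from the confusion matrix. It is a minimal pure-Python example rather than this repository's evaluation code; the function names, the label convention (1 = fake, 0 = real), and the 0.5 threshold are assumptions.

```python
def confusion_matrix(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels, assuming 1 = fake, 0 = real."""
    tn = fp = fn = tp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 1 and pred == 0:
            fn += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def classification_metrics(y_true, scores, threshold=0.5):
    """Threshold fake-probability scores and compute the summary metrics."""
    y_pred = [1 if score >= threshold else 0 for score in scores]
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred)
    total = tn + fp + fn + tp
    return {
        "confusion_matrix": [[tn, fp], [fn, tp]],  # rows: actual, cols: predicted
        "accuracy": (tp + tn) / total if total else 0.0,
        # Precision: of the videos flagged fake, how many really are fake.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Recall: of the truly fake videos, how many were flagged.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Made-up labels and scores for eight videos.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.2, 0.6, 0.1, 0.7, 0.3]
print(classification_metrics(y_true, scores))
```

Reporting precision and recall alongside accuracy matters when the real and fake classes are imbalanced, since accuracy alone can look high for a model that mostly predicts the majority class.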
The models have been evaluated on publicly available deepfake datasets, including:
- Deepfake Detection Challenge (DFDC)
- Celeb-DF (v1 and v2)
Due to size constraints, datasets are not bundled with this repository.
The system includes the following model variations:
- CNN-Based Classifier: Learns spatial-level forgery artifacts on a frame-by-frame basis.
- CNN + LSTM Hybrid: Utilizes temporal sequences of facial frames to capture inconsistencies across time.
- Custom Feature Extractors: Tailored lightweight CNNs for faster training and better generalization.
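A CNN + LSTM hybrid consumes fixed-length sequences of face frames rather than single images, so each video is typically cut into clips before training. The sketch below shows one possible way to do that with a sliding window. The helper name `make_clips` and the `clip_len` and `stride` parameters are hypothetical and not taken from this repository; frames are represented by placeholder integers so the example runs without any image library.

```python
def make_clips(frames, clip_len, stride=1):
    """Split an ordered list of frames into fixed-length clips.

    frames:   ordered per-video items (e.g. cropped face images).
    clip_len: number of consecutive frames per clip.
    stride:   step between clip start positions; a stride smaller
              than clip_len produces overlapping clips.
    Trailing frames that do not fill a whole clip are dropped.
    """
    if clip_len <= 0 or stride <= 0:
        raise ValueError("clip_len and stride must be positive")
    last_start = len(frames) - clip_len
    return [
        list(frames[start:start + clip_len])
        for start in range(0, last_start + 1, stride)
    ]


# Ten placeholder "frames", clips of four frames, stepping three at a time.
clips = make_clips(list(range(10)), clip_len=4, stride=3)
for clip in clips:
    print(clip)
```

In this setup, a frame-by-frame CNN classifier corresponds to `clip_len=1`, while the hybrid model would pass each clip through the CNN frame by frame and feed the resulting feature sequence to the LSTM.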