MACHINE LEARNING-BASED STATIC MALWARE DETECTION
Detect malware in Windows executables using static PE file analysis with machine learning.
System Features
Single Prediction
Enter the PE header values for a single file to get a prediction.
Batch Prediction
Upload a CSV covering multiple files to get predictions and, when labels are included, evaluation metrics.
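The batch flow can be sketched as follows. This is a minimal, dependency-free illustration, not the system's actual code: the column names (`SizeOfCode`, `NumberOfSections`, `label`), the `evaluate_batch` helper, and the `score_fn` callback standing in for a trained model are all assumptions. It also assumes every cell is numeric, with no missing values.

```python
import csv
import io


def evaluate_batch(csv_text, score_fn, label_column="label", threshold=0.5):
    """Score every CSV row; add metrics when a label column is present.

    score_fn is a hypothetical stand-in for a trained model: it takes a
    dict of feature name -> float and returns a malware score in [0, 1].
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    has_labels = bool(rows) and label_column in rows[0]
    counts = {"tn": 0, "fp": 0, "fn": 0, "tp": 0}
    results = []

    for row in rows:
        # Everything except the label column is treated as a numeric feature.
        features = {k: float(v) for k, v in row.items() if k != label_column}
        score = score_fn(features)
        pred = 1 if score >= threshold else 0  # 1 = malware, 0 = goodware
        results.append((pred, score))

        if has_labels:
            truth = int(row[label_column])
            if truth == 1:
                counts["tp" if pred == 1 else "fn"] += 1
            else:
                counts["fp" if pred == 1 else "tn"] += 1

    metrics = None
    if has_labels:
        accuracy = (counts["tp"] + counts["tn"]) / len(rows)
        metrics = dict(counts, accuracy=accuracy)
    return results, metrics


if __name__ == "__main__":
    sample = "SizeOfCode,NumberOfSections,label\n4096,3,0\n100,9,1\n200,8,0\n"
    # Toy scorer used only for the demo; a real model would go here.
    toy = lambda f: 0.9 if f["NumberOfSections"] > 5 else 0.1
    print(evaluate_batch(sample, toy))
```

Returning `None` for the metrics when no label column is found keeps one code path for both labelled and unlabelled uploads.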
How It Works
- Feature Extraction: Extract 27 features from the PE header of Windows executables (no need to run the file!).
- Preprocessing: Handle missing values, scale features, and prepare data for ML models.
- Model Training: Train multiple models (Random Forest, XGBoost, LightGBM, CatBoost, and a PyTorch MLP) on the Brazilian malware dataset (~50,000 samples).
- Prediction: Predict whether a file is malware (1) or goodware (0), with a confidence score.
- Evaluation: For batch uploads that include labels, report AUC, accuracy, and a confusion matrix.
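To illustrate the feature-extraction step, here is a rough sketch that reads a handful of header fields straight from the raw bytes with Python's `struct` module, without executing the file. The field layout follows the standard PE/COFF format. Which 27 attributes the system actually uses is not listed here, so the fields chosen and the `extract_header_features` name are assumptions made for illustration.

```python
import struct


def extract_header_features(data: bytes) -> dict:
    """Parse a few COFF and optional-header fields from raw PE bytes."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")

    # e_lfanew (offset 0x3C in the DOS header) points at the PE signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")

    # COFF file header: 20 bytes immediately after the 4-byte signature.
    (machine, n_sections, timestamp, _symtab, _n_symbols,
     opt_header_size, characteristics) = struct.unpack_from(
        "<HHIIIHH", data, pe_offset + 4)

    # First fields of the optional header (same layout for PE32 and PE32+).
    (magic, major_linker, _minor_linker, size_of_code,
     size_of_init_data, _size_of_uninit_data, entry_point) = struct.unpack_from(
        "<HBBIIII", data, pe_offset + 24)

    return {
        "Machine": machine,
        "NumberOfSections": n_sections,
        "TimeDateStamp": timestamp,
        "SizeOfOptionalHeader": opt_header_size,
        "Characteristics": characteristics,
        "Magic": magic,
        "MajorLinkerVersion": major_linker,
        "SizeOfCode": size_of_code,
        "SizeOfInitializedData": size_of_init_data,
        "AddressOfEntryPoint": entry_point,
    }
```

A call such as `extract_header_features(open(path, "rb").read())` would yield one row of numeric values, which can then go through preprocessing and into a model.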
Key Benefits
- Static Analysis: No need to execute potentially malicious files!
- High Accuracy: AUC scores typically >0.95 on test data
- Scalable: Process single files or thousands in batch
- Transparent: See confidence scores and detailed metrics
Technical Details
This system uses machine learning models trained on PE file characteristics:
- Models: Random Forest, XGBoost, LightGBM, CatBoost, PyTorch MLP
- Features: 27 PE header attributes (sizes, addresses, flags, etc.)
- Evaluation: 10-fold cross-validation plus a hold-out test set
- Primary Metric: AUC (Area Under ROC Curve)
- Secondary Metric: Accuracy
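To make the evaluation terms concrete, the sketch below computes AUC from its pairwise definition (the probability that a randomly chosen malware sample scores higher than a randomly chosen goodware sample) and generates 10-fold train/test index splits. It is a pure-Python illustration with hypothetical helper names (`roc_auc`, `kfold_indices`). Libraries such as scikit-learn provide equivalents (`roc_auc_score`, `StratifiedKFold`), but which implementation this system relies on is not stated.

```python
import random


def roc_auc(labels, scores):
    """AUC via pairwise comparison; tied scores count as half a win.

    Quadratic in the number of samples, so suitable for illustration only.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def kfold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of k shuffled folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # fold sizes differ by at most 1
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


if __name__ == "__main__":
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
    for train, test in kfold_indices(25, k=10):
        print(len(train), len(test))
```

Unlike accuracy, AUC does not depend on a particular decision threshold.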