Executive Summary
This project develops a behavior-driven fraud detection framework using a combination of anomaly detection and supervised machine learning to identify fraudulent card transactions while minimizing unnecessary customer friction. Exploratory analysis reveals that fraud is driven primarily by behavioral deviations such as elevated transaction velocity, abnormal spending patterns, geographic inconsistencies, and interactions with new devices or merchants. Anomaly detection techniques, particularly Isolation Forest, effectively surface unusual transaction behavior and support early risk triage, while supervised models enable accurate classification of confirmed fraud cases. Among the evaluated classifiers, Random Forest models achieve the strongest precision–recall performance (validation AUCPR of roughly 0.99, versus 0.90 to 0.92 for single Decision Trees), and the tuned Random Forest selected as the final model generalizes well to unseen data. By integrating these models into a layered, risk-based decisioning workflow, the proposed solution balances fraud loss prevention, operational efficiency, and customer experience, providing a scalable and interpretable foundation for real-world fraud detection systems.
1. Requirements Gathering
1.1 Business Problem Definition
The organization under consideration is a card-issuing financial services company that processes a high volume of credit and debit card transactions across in-store, online, and cross-border channels in real time. The primary business objective is to authorize legitimate transactions with minimal friction while detecting and preventing fraudulent activity as early as possible.
Fraudulent behavior in card transactions often manifests as subtle behavioral deviations rather than extreme statistical outliers. Examples include sudden transaction bursts, unexpected geographic changes, use of new devices, or transactions with higher-risk merchants. These patterns are difficult to capture using static, rule-based systems alone. At the same time, overly conservative controls can lead to false declines, negatively impacting customer experience and increasing operational workload for fraud analysts.
The challenge, therefore, is to design a fraud detection solution that balances fraud loss prevention, customer experience, and regulatory compliance, while remaining scalable and interpretable in a real-time transaction environment.
1.2 Business and Analytical Objectives
The key objectives of this project are:
To design and implement a behavior-driven fraud detection framework that improves the identification of fraudulent transactions.
To distinguish between legitimate customer behavior and subtle fraudulent deviations, reducing unnecessary transaction declines.
To leverage anomaly detection techniques to identify unusual transaction patterns that may indicate emerging or previously unseen fraud behaviors.
To build and evaluate supervised machine learning models that predict confirmed fraud outcomes using historical labels.
To ensure that the resulting solution is interpretable, operationally feasible, and aligned with real-time decision-making requirements.
1.3 Expected Business Outcomes
The expected outcomes of this project include:
Earlier detection of potentially fraudulent transactions through anomaly-based triaging.
Reduced false positives compared to static rule-based approaches.
Improved fraud analyst efficiency by prioritizing high-risk transactions.
Support for risk-based transaction decisions such as:
Automatic approval for low-risk transactions
Step-up verification (e.g., OTP, CVV checks) for medium-risk transactions
Blocking or manual review for high-risk transactions
1.4 Success Criteria and Evaluation Metrics
Given the highly imbalanced nature of fraud detection problems, traditional accuracy is not an appropriate performance metric. Instead, model performance will be evaluated using metrics that better reflect business risk and operational impact, including:
Area Under the Precision-Recall Curve (AUCPR) to evaluate performance under class imbalance
Precision and Recall, particularly for the fraud class
F1-score, to balance precision and recall
Comparative performance across training, validation, and test datasets to assess generalization
In addition to quantitative metrics, model interpretability and alignment with known fraud patterns will be considered critical success factors.
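To make the contrast between accuracy and the precision-recall metrics concrete, the following minimal sketch computes each metric on a small toy example. The labels, scores, and the 0.5 cutoff are illustrative assumptions and are not taken from the project data.

```python
import numpy as np
from sklearn.metrics import (
    average_precision_score,
    f1_score,
    precision_score,
    recall_score,
)

# Toy labels and scores (illustrative values, not project data): 2 fraud cases in 10
y_true = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
y_score = np.array([0.05, 0.10, 0.20, 0.15, 0.30, 0.65, 0.10, 0.05, 0.90, 0.40])

# Threshold-free ranking quality (the AUCPR-style metric)
aucpr = average_precision_score(y_true, y_score)

# Thresholded metrics for the fraud class at a hypothetical 0.5 cutoff
y_pred = (y_score >= 0.5).astype(int)
precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred)

# Accuracy of the model versus a rule that never predicts fraud
accuracy = (y_pred == y_true).mean()
naive_accuracy = (np.zeros_like(y_true) == y_true).mean()

print(f"AUCPR={aucpr:.3f} precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
print(f"accuracy={accuracy:.2f} naive_accuracy={naive_accuracy:.2f}")
```

In this toy case the model and the never-predict-fraud rule have the same accuracy, while precision, recall, and AUCPR expose the difference between them.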
1.5 High-Level Solution Approach
To address the problem, the project will adopt a two-layer detection strategy:
Anomaly Detection Layer: Statistical methods (Z-score, IQR) and an Isolation Forest model will be used to identify transactions that deviate from typical behavioral patterns. This layer serves as an early warning and triage mechanism, particularly useful for detecting novel or emerging fraud behaviors.
Supervised Fraud Detection Layer: Decision Tree and Random Forest models will be trained using confirmed fraud labels to predict fraudulent transactions. These models will be tuned and evaluated to minimize false positives while maintaining strong fraud detection performance.
This layered approach supports both early anomaly identification and accurate fraud classification, enabling risk-based decisioning in a production fraud detection environment.
#Section 2
import os
print("Current working directory:", os.getcwd())
print("Files here:", os.listdir())
import pandas as pd
# The CSV sits in the working directory (see the listing above), so a relative path is enough
df = pd.read_csv("fraud_dataset.csv")
# Dataset shape
df.shape
# Preview first few rows
df.head()
# Preview random sample (useful for fraud datasets)
df.sample(5, random_state=42)
df.info()
df.describe()
df.isnull().sum()
df.duplicated().sum()
df['is_fraud'].value_counts()
df['is_fraud'].value_counts(normalize=True)
Current working directory: c:\Users\13015\Desktop\Credit Card Project
Files here: ['Credit Card Project.ipynb', 'fraud_dataset.csv']
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 60000 entries, 0 to 59999
Data columns (total 32 columns):
 #   Column                     Non-Null Count  Dtype
---  ------                     --------------  -----
 0   transaction_id             60000 non-null  int64
 1   account_id                 60000 non-null  int64
 2   card_id                    60000 non-null  object
 3   txn_hour                   60000 non-null  int64
 4   txn_day_of_week            60000 non-null  int64
 5   transaction_amount         60000 non-null  float64
 6   channel                    60000 non-null  object
 7   entry_mode                 60000 non-null  object
 8   txn_count_1min             60000 non-null  int64
 9   txn_count_5min             60000 non-null  int64
 10  avg_txn_amount_30d         60000 non-null  float64
 11  max_txn_amount_90d         60000 non-null  float64
 12  merchant_id                60000 non-null  object
 13  merchant_category_code     60000 non-null  int64
 14  merchant_fraud_rate_30d    60000 non-null  float64
 15  is_new_merchant            60000 non-null  int64
 16  merchant_country           60000 non-null  object
 17  cardholder_country         60000 non-null  object
 18  cross_border_flag          60000 non-null  int64
 19  distance_from_last_txn_km  60000 non-null  float64
 20  time_since_last_txn_sec    60000 non-null  int64
 21  device_id                  60000 non-null  object
 22  is_new_device              60000 non-null  int64
 23  device_country_mismatch    60000 non-null  int64
 24  cvv_result                 60000 non-null  object
 25  cvv_fail_count_24h         60000 non-null  int64
 26  card_age_days              60000 non-null  int64
 27  account_age_days           60000 non-null  int64
 28  historical_fraud_flag      60000 non-null  int64
 29  anomaly_label              60000 non-null  int64
 30  anomaly_score              60000 non-null  float64
 31  is_fraud                   60000 non-null  int64
dtypes: float64(6), int64(18), object(8)
memory usage: 14.6+ MB
is_fraud
0    0.955333
1    0.044667
Name: proportion, dtype: float64
Section 2: Data Overview
Dataset Description
The dataset consists of 60,000 card transactions and 32 columns (including identifiers and the is_fraud target) capturing transaction behavior, merchant characteristics, device attributes, historical risk indicators, and confirmed fraud outcomes. It is well-suited for both anomaly detection and supervised fraud classification, enabling the identification of subtle behavioral deviations as well as explicit fraud patterns.
Dataset Shape
Rows: 60,000
Columns: 32
The size of the dataset provides a strong foundation for exploratory analysis, robust model training, and reliable validation of fraud detection techniques.
Feature Data Types
The dataset includes a mix of numerical and categorical variables commonly observed in real-world payment systems:
Numerical Features
int64: 18 columns
float64: 6 columns
Categorical Features
object: 8 columns
This diversity reflects realistic transaction environments where continuous behavioral signals (e.g., transaction amount, velocity) coexist with categorical attributes such as merchant identifiers, device information, and geographic indicators.
Missing Value Assessment
A completeness check confirmed that no missing values are present in any of the 32 columns. As a result, no imputation or row removal was required, preserving the full dataset for downstream analysis and modeling.
Duplicate Record Check
Duplicate transaction checks were performed to ensure data integrity. No duplicate records were identified, confirming that each transaction represents a unique observation.
Fraud Class Distribution
The target variable is_fraud exhibits a significant class imbalance:
Legitimate transactions (0): ~95.53%
Fraudulent transactions (1): ~4.47%
This imbalance is consistent with real-world fraud detection scenarios and directly informs modeling decisions, including:
The use of anomaly detection techniques for early risk identification
The selection of precision-recall–based evaluation metrics
The application of class-weighted supervised learning models to mitigate bias toward the majority class
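As a rough illustration of what class weighting means at this level of imbalance, the sketch below computes scikit-learn's "balanced" weights for a label vector with the reported proportions. The class counts (2,680 fraud out of 60,000) are inferred here from the reported 4.4667% fraud share and are an assumption of this example; the supervised models are fitted on a stratified training split, which has approximately the same ratio.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Counts inferred from the reported proportions (assumption): 2,680 / 60,000 ≈ 0.044667
n_total = 60_000
n_fraud = 2_680
n_legit = n_total - n_fraud

# Synthetic label vector with the same class balance as the dataset
y_labels = np.concatenate([np.zeros(n_legit, dtype=int), np.ones(n_fraud, dtype=int)])

# "balanced" weights are n_samples / (n_classes * class_count)
weights = compute_class_weight(
    class_weight="balanced", classes=np.array([0, 1]), y=y_labels
)
class_weights = dict(zip([0, 1], weights))
print(class_weights)
print("fraud-to-legit weight ratio:", weights[1] / weights[0])
```

Under these assumed counts, each fraud example carries roughly 21 times the weight of a legitimate one during training.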
#Section 3
import seaborn as sns
import matplotlib.pyplot as plt
plt.figure(figsize=(8, 4))
sns.histplot(df['transaction_amount'], bins=50, kde=True)
plt.title('Distribution of Transaction Amount')
plt.xlabel('Transaction Amount')
plt.ylabel('Count')
plt.show()
fig, axes = plt.subplots(1, 2, figsize=(12, 4))
sns.histplot(df['txn_count_1min'], bins=30, ax=axes[0])
axes[0].set_title('Transactions in Last 1 Minute')
sns.histplot(df['txn_count_5min'], bins=30, ax=axes[1])
axes[1].set_title('Transactions in Last 5 Minutes')
plt.tight_layout()
plt.show()
plt.figure(figsize=(8, 4))
sns.countplot(x='txn_hour', data=df)
plt.title('Transactions by Hour of Day')
plt.xlabel('Hour')
plt.ylabel('Transaction Count')
plt.show()
plt.figure(figsize=(8, 4))
sns.boxplot(x='is_fraud', y='transaction_amount', data=df)
plt.title('Transaction Amount by Fraud Label')
plt.xlabel('Is Fraud')
plt.ylabel('Transaction Amount')
plt.show()
plt.figure(figsize=(8, 4))
sns.boxplot(x='is_fraud', y='txn_count_5min', data=df)
plt.title('5-Minute Transaction Count by Fraud Label')
plt.xlabel('Is Fraud')
plt.ylabel('Txn Count (5 min)')
plt.show()
plt.figure(figsize=(6, 4))
sns.countplot(x='cross_border_flag', hue='is_fraud', data=df)
plt.title('Cross-Border Flag vs Fraud')
plt.xlabel('Cross Border Flag')
plt.ylabel('Count')
plt.show()
fig, axes = plt.subplots(1, 2, figsize=(12, 4))
sns.countplot(x='is_new_device', hue='is_fraud', data=df, ax=axes[0])
axes[0].set_title('New Device vs Fraud')
sns.countplot(x='is_new_merchant', hue='is_fraud', data=df, ax=axes[1])
axes[1].set_title('New Merchant vs Fraud')
plt.tight_layout()
plt.show()
Section 3: Exploratory Data Analysis (EDA)
In this section, exploratory data analysis is conducted to examine the distributions, patterns, and relationships among key transaction, behavioral, and risk-related variables. The primary objective is to identify characteristics that distinguish fraudulent transactions from legitimate activity and to generate insights that inform both anomaly detection and supervised fraud modeling.
The analysis focuses on the following dimensions:
Transaction behavior: amount and velocity-based features
Temporal patterns: hour-of-day and day-of-week activity
Merchant and device risk indicators: new entities and historical risk signals
Geographic and cross-border behavior: location changes and international transactions
Key Observations
Transaction Amount Distribution:
Transaction amounts are highly right-skewed, with fraudulent transactions occurring more frequently in higher-value ranges, suggesting increased risk associated with unusually large purchases.
Transaction Velocity:
Fraudulent transactions demonstrate elevated transaction velocity, particularly within short time windows (e.g., 1-minute and 5-minute intervals), indicating rapid bursts of activity that deviate from normal customer behavior.
Cross-Border Risk:
Cross-border transactions exhibit a disproportionately higher fraud rate compared to domestic transactions, highlighting the importance of geographic risk features in fraud detection.
New Merchant and Device Signals:
Transactions involving new devices or previously unseen merchants are more frequently associated with fraudulent outcomes, reinforcing the relevance of novelty-based risk indicators.
Temporal Behavior:
Fraud occurs across all hours of the day, suggesting that time alone is not a strong discriminator. However, bursts of activity over short periods are more indicative of fraudulent behavior than isolated transactions at specific times.
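Statements about fraud rates within a group (for example cross-border versus domestic) can be checked numerically in addition to the count plots. The following sketch uses a made-up toy frame that mimics two of the dataset's columns; the values are illustrative assumptions only.

```python
import pandas as pd

# Toy frame mimicking two of the dataset's columns (values are made up)
toy = pd.DataFrame({
    "cross_border_flag": [0, 0, 0, 0, 0, 0, 1, 1, 1, 1],
    "is_fraud":          [0, 0, 0, 0, 0, 1, 0, 1, 1, 0],
})

# The mean of a 0/1 label within each group is that group's fraud rate
fraud_rate = toy.groupby("cross_border_flag")["is_fraud"].mean()
print(fraud_rate)
```

On the project data the same pattern would read `df.groupby("cross_border_flag")["is_fraud"].mean()`, and it applies equally to `is_new_device` or `is_new_merchant`.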
Implications for Modeling
The insights derived from EDA support the following modeling strategies:
Emphasizing velocity-based and behavioral features
Leveraging anomaly detection to capture subtle deviations from normal behavior
Incorporating risk-aware supervised models that account for class imbalance and heterogeneous fraud patterns
#Section 4
from sklearn.model_selection import train_test_split
# Separate features and target
X = df.drop(columns=['is_fraud'])
y = df['is_fraud']
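# Keep numeric features only for anomaly detection.
# Note: this selection still includes the identifier columns (transaction_id, account_id)
# and the dataset's precomputed anomaly_label and anomaly_score columns, which may
# leak label-related information into the models.
X_num = X.select_dtypes(include=['int64', 'float64'])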
# Train / temp split
X_train, X_temp, y_train, y_temp = train_test_split(
X_num, y, test_size=0.4, stratify=y, random_state=42
)
# Validation / test split
X_val, X_test, y_val, y_test = train_test_split(
X_temp, y_temp, test_size=0.5, stratify=y_temp, random_state=42
)
X_train.shape, X_val.shape, X_test.shape
from scipy.stats import zscore
import numpy as np
# Compute Z-scores
z_scores = np.abs(zscore(X_train))
# Flag anomalies (threshold = 3)
z_anomalies = (z_scores > 3).any(axis=1)
# Share of training rows flagged on at least one feature
z_anomaly_rate = z_anomalies.mean()
z_anomaly_rate
Q1 = X_train.quantile(0.25)
Q3 = X_train.quantile(0.75)
IQR = Q3 - Q1
iqr_anomalies = ((X_train < (Q1 - 1.5 * IQR)) |
(X_train > (Q3 + 1.5 * IQR))).any(axis=1)
iqr_anomaly_rate = iqr_anomalies.mean()
iqr_anomaly_rate
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_val_scaled = scaler.transform(X_val)
from sklearn.ensemble import IsolationForest
iso_forest = IsolationForest(
n_estimators=200,
contamination=0.05,
random_state=42,
n_jobs=-1
)
iso_forest.fit(X_train_scaled)
# Anomaly scores (lower = more anomalous)
val_scores = iso_forest.decision_function(X_val_scaled)
# Convert to anomaly labels
val_anomalies = iso_forest.predict(X_val_scaled)
val_anomalies = (val_anomalies == -1).astype(int)
from sklearn.metrics import average_precision_score
aucpr = average_precision_score(y_val, -val_scores)
aucpr
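# Attach scores to validation data.
# Note: X_val already carries an anomaly_score column from the source dataset;
# the assignment below overwrites it with the Isolation Forest score.
val_results = X_val.copy()
val_results['anomaly_score'] = val_scores
val_results['is_fraud'] = y_val.values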
# Top 10 most anomalous transactions
top_anomalies = val_results.sort_values('anomaly_score').head(10)
top_anomalies
| | transaction_id | account_id | txn_hour | txn_day_of_week | transaction_amount | txn_count_1min | txn_count_5min | avg_txn_amount_30d | max_txn_amount_90d | merchant_category_code | ... | time_since_last_txn_sec | is_new_device | device_country_mismatch | cvv_fail_count_24h | card_age_days | account_age_days | historical_fraud_flag | anomaly_label | anomaly_score | is_fraud |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 9534 | 9535 | 100081 | 10 | 0 | 1651.12000 | 4 | 6 | 216.40 | 1047.00 | 5311 | ... | 30 | 1 | 1 | 5 | 294 | 683 | 0 | 1 | -0.214881 | 1 |
| 30861 | 30862 | 100278 | 12 | 0 | 1303.71000 | 3 | 5 | 131.60 | 965.79 | 5732 | ... | 51 | 1 | 1 | 5 | 3306 | 3588 | 0 | 1 | -0.211623 | 1 |
| 7575 | 7576 | 100066 | 13 | 6 | 1691.31000 | 4 | 6 | 183.95 | 1328.91 | 5999 | ... | 88 | 1 | 1 | 3 | 1123 | 2193 | 0 | 1 | -0.209716 | 1 |
| 42332 | 42333 | 100381 | 14 | 0 | 2676.50000 | 4 | 4 | 276.19 | 1548.82 | 5311 | ... | 52 | 1 | 1 | 2 | 2149 | 2615 | 0 | 1 | -0.209154 | 1 |
| 25300 | 25301 | 100230 | 1 | 0 | 110.32000 | 2 | 3 | 12.21 | 64.59 | 5311 | ... | 112 | 1 | 1 | 5 | 1813 | 3438 | 0 | 1 | -0.207872 | 1 |
| 43415 | 43416 | 100391 | 21 | 6 | 2012.74000 | 3 | 4 | 281.60 | 1588.37 | 5999 | ... | 85 | 1 | 1 | 2 | 427 | 850 | 0 | 1 | -0.207725 | 1 |
| 39122 | 39123 | 100351 | 9 | 0 | 1185.97000 | 4 | 4 | 296.06 | 1672.98 | 5812 | ... | 71 | 1 | 1 | 4 | 939 | 3194 | 1 | 1 | -0.206762 | 1 |
| 9203 | 9204 | 100079 | 23 | 6 | 193.70000 | 3 | 5 | 40.77 | 194.01 | 5411 | ... | 105 | 1 | 1 | 4 | 2719 | 3077 | 0 | 1 | -0.205180 | 1 |
| 38645 | 38646 | 100346 | 8 | 0 | 1434.53000 | 3 | 5 | 282.62 | 1397.09 | 5311 | ... | 123 | 1 | 1 | 3 | 96 | 191 | 0 | 1 | -0.205047 | 1 |
| 59645 | 59646 | 100257 | 22 | 6 | 900.97817 | 4 | 4 | 124.98 | 947.01 | 5812 | ... | 179 | 1 | 1 | 4 | 356 | 991 | 0 | 1 | -0.202970 | 1 |
10 rows × 24 columns
Section 4: Anomaly Detection
In this section, unsupervised anomaly detection techniques are applied to identify transactions that exhibit unusual or suspicious behavior. Because fraudulent activity is relatively rare and may not always be labeled in real time, anomaly detection serves as an effective early-warning mechanism to surface potentially risky transactions before confirmation.
The analysis follows a progressive approach:
Establishing statistical baseline methods (Z-score and IQR)
Implementing a machine learning–based anomaly detection model using Isolation Forest
Evaluating detected anomalies against known fraud labels to assess practical effectiveness
Baseline Anomaly Detection Methods
Simple statistical approaches were first applied to establish reference benchmarks:
Z-score and Interquartile Range (IQR) methods identify extreme values based on univariate thresholds.
These methods flag a large number of anomalies, particularly in skewed or high-variance features.
Because they evaluate each feature independently, they fail to capture multivariate interactions inherent in complex transaction behavior.
While these techniques provide useful interpretability and fast computation, they are prone to high false-positive rates in high-dimensional, behavior-driven datasets and are therefore insufficient as standalone solutions.
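A back-of-the-envelope calculation shows why an "any feature exceeds the threshold" rule flags many rows as the number of features grows. The sketch below assumes independent, standard-normal features, which the real transaction features are not (skewed features would typically exceed the threshold more often), so the result is only an order-of-magnitude illustration. The count of 23 features corresponds to the 24 numeric columns minus is_fraud.

```python
from scipy.stats import norm

# Probability that one standard-normal feature has |z| > 3
p_single = 2 * norm.sf(3)

def row_flag_rate(p, d):
    """Probability that at least one of d independent features is flagged."""
    return 1 - (1 - p) ** d

n_features = 23  # numeric columns remaining after dropping is_fraud
rate = row_flag_rate(p_single, n_features)
print(f"{p_single:.4f} per feature -> {rate:.4f} per row")
```

Even under these idealized assumptions, a per-feature rate of about 0.3% compounds to roughly 6% of rows being flagged, which is already above the dataset's fraud rate.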
Isolation Forest Performance
To address the limitations of statistical baselines, an Isolation Forest model was implemented:
The model effectively captures multivariate relationships across transaction amount, velocity, geographic, device, and merchant features.
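Performance is evaluated on the validation set using the Area Under the Precision–Recall Curve (AUCPR), computed by ranking transactions on the negated decision-function score against the confirmed fraud labels. The Z-score and IQR baselines were summarized only by their flag rates, so the comparison with them is qualitative.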
Isolation Forest scales efficiently and is well-suited for real-time, high-volume payment environments where rapid decision-making is required.
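The sign convention of the Isolation Forest score matters when it is passed to a ranking metric. The sketch below uses synthetic two-dimensional data (an assumption of this example, unrelated to the transaction dataset) to show that `decision_function` returns lower values for more anomalous points, so the score is negated before computing average precision.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import average_precision_score

rng = np.random.default_rng(0)

# Synthetic data: a dense cluster of "normal" points plus a few distant ones
inliers = rng.normal(loc=0.0, scale=1.0, size=(500, 2))
outliers = rng.uniform(low=6.0, high=12.0, size=(10, 2))
X_demo = np.vstack([inliers, outliers])
y_demo = np.concatenate([np.zeros(500, dtype=int), np.ones(10, dtype=int)])

model = IsolationForest(n_estimators=100, random_state=0).fit(X_demo)

# decision_function: lower (more negative) means more anomalous
scores = model.decision_function(X_demo)

# Negate so that larger values mean higher risk before using a ranking metric
ap_negated = average_precision_score(y_demo, -scores)
ap_raw = average_precision_score(y_demo, scores)
print(f"AP with negated scores: {ap_negated:.3f}, with raw scores: {ap_raw:.3f}")
```

Passing the raw score instead of the negated one ranks the outliers last, which is why the two average-precision values differ so sharply.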
Analysis of Top Anomalies
Examination of the ten most anomalous validation transactions (those with the lowest Isolation Forest scores), all of which are confirmed fraud, reveals consistent behavioral patterns, including:
Elevated transaction velocity within short time windows
Cross-border or geographically inconsistent activity
Usage of new devices or unfamiliar merchants
Large deviations from historical spending behavior
These patterns closely align with known fraud typologies, reinforcing the validity of the anomaly detection approach.
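One simple way to quantify triage quality is precision at k: the share of confirmed fraud among the k most anomalous transactions. The helper below is a hypothetical illustration (the function name and toy values are not from the project) and assumes that lower scores are more anomalous, as with `IsolationForest.decision_function`.

```python
import numpy as np

def precision_at_k(y_true, anomaly_scores, k):
    """Share of confirmed fraud among the k most anomalous rows.

    Assumes lower scores are more anomalous.
    """
    y_true = np.asarray(y_true)
    order = np.argsort(anomaly_scores)  # ascending: most anomalous first
    return y_true[order[:k]].mean()

# Toy example (made-up scores and labels)
toy_scores = np.array([-0.21, -0.20, -0.05, 0.02, 0.10, 0.12, 0.15, 0.18])
toy_labels = np.array([1, 1, 0, 1, 0, 0, 0, 0])

print(precision_at_k(toy_labels, toy_scores, k=2))  # 1.0
print(precision_at_k(toy_labels, toy_scores, k=4))  # 0.75
```

Applied to the validation results, `precision_at_k(val_results['is_fraud'], val_results['anomaly_score'], k=10)` would summarize the top-10 table in a single number.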
#Section 5
# Features and target
X = df.drop(columns=['is_fraud'])
y = df['is_fraud']
# Use numeric features only
X_num = X.select_dtypes(include=['int64', 'float64'])
# Train / validation / test split (same structure as before)
from sklearn.model_selection import train_test_split
X_train, X_temp, y_train, y_temp = train_test_split(
X_num, y, test_size=0.4, stratify=y, random_state=42
)
X_val, X_test, y_val, y_test = train_test_split(
X_temp, y_temp, test_size=0.5, stratify=y_temp, random_state=42
)
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report, average_precision_score
dt_default = DecisionTreeClassifier(random_state=42)
dt_default.fit(X_train, y_train)
# Validation predictions
dt_val_preds = dt_default.predict(X_val)
dt_val_probs = dt_default.predict_proba(X_val)[:, 1]
# Metrics
print(classification_report(y_val, dt_val_preds))
dt_aucpr = average_precision_score(y_val, dt_val_probs)
dt_aucpr
from sklearn.ensemble import RandomForestClassifier
rf_default = RandomForestClassifier(
n_estimators=200,
random_state=42,
n_jobs=-1
)
rf_default.fit(X_train, y_train)
rf_val_preds = rf_default.predict(X_val)
rf_val_probs = rf_default.predict_proba(X_val)[:, 1]
print(classification_report(y_val, rf_val_preds))
rf_aucpr = average_precision_score(y_val, rf_val_probs)
rf_aucpr
dt_weighted = DecisionTreeClassifier(
class_weight='balanced',
random_state=42
)
dt_weighted.fit(X_train, y_train)
dtw_val_probs = dt_weighted.predict_proba(X_val)[:, 1]
dtw_aucpr = average_precision_score(y_val, dtw_val_probs)
dtw_aucpr
rf_weighted = RandomForestClassifier(
n_estimators=300,
class_weight='balanced',
random_state=42,
n_jobs=-1
)
rf_weighted.fit(X_train, y_train)
rfw_val_probs = rf_weighted.predict_proba(X_val)[:, 1]
rfw_aucpr = average_precision_score(y_val, rfw_val_probs)
rfw_aucpr
rf_tuned = RandomForestClassifier(
n_estimators=400,
max_depth=12,
min_samples_split=50,
class_weight='balanced',
random_state=42,
n_jobs=-1
)
rf_tuned.fit(X_train, y_train)
rft_val_probs = rf_tuned.predict_proba(X_val)[:, 1]
rft_aucpr = average_precision_score(y_val, rft_val_probs)
rft_aucpr
import pandas as pd
model_results = pd.DataFrame({
'Model': [
'Decision Tree (Default)',
'Random Forest (Default)',
'Decision Tree (Weighted)',
'Random Forest (Weighted)',
'Random Forest (Tuned)'
],
'Validation_AUCPR': [
dt_aucpr,
rf_aucpr,
dtw_aucpr,
rfw_aucpr,
rft_aucpr
]
})
model_results.sort_values(by='Validation_AUCPR', ascending=False)
precision recall f1-score support
0 1.00 1.00 1.00 11464
1 0.96 0.94 0.95 536
accuracy 1.00 12000
macro avg 0.98 0.97 0.97 12000
weighted avg 1.00 1.00 1.00 12000
precision recall f1-score support
0 1.00 1.00 1.00 11464
1 0.97 0.96 0.97 536
accuracy 1.00 12000
macro avg 0.98 0.98 0.98 12000
weighted avg 1.00 1.00 1.00 12000
| | Model | Validation_AUCPR |
|---|---|---|
| 1 | Random Forest (Default) | 0.989725 |
| 4 | Random Forest (Tuned) | 0.988792 |
| 3 | Random Forest (Weighted) | 0.987392 |
| 2 | Decision Tree (Weighted) | 0.921265 |
| 0 | Decision Tree (Default) | 0.903637 |
Section 5: Fraud Detection – Model Building
In this section, supervised machine learning models are developed to classify transactions as fraudulent or legitimate using labeled data. Fraud detection presents a challenging class-imbalanced classification problem, where fraudulent transactions represent a small fraction of total activity. As a result, model selection and evaluation prioritize metrics that emphasize fraud capture while controlling false positives.
The modeling workflow includes:
Training baseline Decision Tree and Random Forest classifiers using default parameters
Addressing class imbalance through class-weighted learning
Applying a manually specified hyperparameter configuration (400 trees, max_depth=12, min_samples_split=50) to regularize the Random Forest
Comparing results across training and validation datasets to select the most effective model
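The tuning step in this workflow could also be automated with a cross-validated search scored on average precision. The sketch below is an illustration under assumptions, not the procedure used in this project: the data are synthetic (`make_classification`), the grid values are hypothetical, and the names `X_demo`, `y_demo`, and `param_grid` are introduced only for this example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic imbalanced data standing in for the transaction features
X_demo, y_demo = make_classification(
    n_samples=2000, n_features=10, weights=[0.95, 0.05], random_state=42
)

# Hypothetical grid; the values are illustrative only
param_grid = {
    "max_depth": [6, 12],
    "min_samples_split": [10, 50],
}

search = GridSearchCV(
    estimator=RandomForestClassifier(
        n_estimators=50, class_weight="balanced", random_state=42
    ),
    param_grid=param_grid,
    scoring="average_precision",  # AUCPR-style metric used throughout the report
    cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=42),
)
search.fit(X_demo, y_demo)

print(search.best_params_, round(search.best_score_, 3))
```

Stratified folds keep the fraud share roughly constant across splits, which matters when the positive class is this rare.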
Evaluation Metrics
Traditional accuracy is misleading in fraud detection due to the dominance of legitimate transactions. Therefore, model performance is evaluated using metrics better suited to imbalanced datasets:
Recall – Measures the model’s ability to correctly identify fraudulent transactions, minimizing missed fraud
Precision – Quantifies how many flagged transactions are truly fraudulent, helping control false positives
F1-Score – Provides a balanced assessment of precision and recall
Area Under the Precision–Recall Curve (AUCPR) – Serves as the primary evaluation metric due to its robustness in highly imbalanced classification settings
Model Performance Insights
Key observations from model training and evaluation include:
Random Forest models consistently outperform Decision Trees on validation AUCPR (0.987 to 0.990 versus 0.904 to 0.921), benefiting from ensemble learning and reduced variance.
Class weighting improves the Decision Tree's validation AUCPR (0.921 versus 0.904) but slightly lowers it for the Random Forest (0.987 versus 0.990).
The tuned Random Forest (0.989) recovers most of that difference and trails the default Random Forest by about 0.001, a gap small enough that it may not be meaningful.
Given near-identical validation AUCPR, the tuned Random Forest, whose depth and split-size limits constrain model complexity, is carried forward as the final candidate.
#Section 6
from sklearn.metrics import average_precision_score
def evaluate_aucpr(model, X_tr, y_tr, X_v, y_v):
    """Return (train AUCPR, validation AUCPR) for a fitted classifier."""
    train_probs = model.predict_proba(X_tr)[:, 1]
    val_probs = model.predict_proba(X_v)[:, 1]
    return (
        average_precision_score(y_tr, train_probs),
        average_precision_score(y_v, val_probs)
    )

# Score each Random Forest variant once and reuse both values
rf_models = {
    'RF Default': rf_default,
    'RF Weighted': rf_weighted,
    'RF Tuned': rf_tuned
}
rf_scores = {
    name: evaluate_aucpr(model, X_train, y_train, X_val, y_val)
    for name, model in rf_models.items()
}
comparison = pd.DataFrame({
    'Model': list(rf_scores),
    'Train_AUCPR': [s[0] for s in rf_scores.values()],
    'Validation_AUCPR': [s[1] for s in rf_scores.values()]
})
comparison
# Final model evaluation on test set
test_probs = rf_tuned.predict_proba(X_test)[:, 1]
test_preds = rf_tuned.predict(X_test)
from sklearn.metrics import classification_report
print(classification_report(y_test, test_preds))
test_aucpr = average_precision_score(y_test, test_probs)
test_aucpr
import matplotlib.pyplot as plt
import seaborn as sns
feature_importance = pd.DataFrame({
'Feature': X_train.columns,
'Importance': rf_tuned.feature_importances_
}).sort_values(by='Importance', ascending=False)
feature_importance.head(10)
plt.figure(figsize=(10, 6))
sns.barplot(
data=feature_importance.head(10),
x='Importance',
y='Feature'
)
plt.title('Top 10 Feature Importances - Final Fraud Model')
plt.show()
precision recall f1-score support
0 1.00 1.00 1.00 11464
1 0.95 0.99 0.97 536
accuracy 1.00 12000
macro avg 0.98 0.99 0.98 12000
weighted avg 1.00 1.00 1.00 12000
Section 6: Model Performance Comparison and Final Model Selection
In this section, the performance of the Random Forest variants is compared across training and validation datasets to assess generalization, robustness, and potential overfitting. Model selection is guided by validation performance, interpretability, and operational considerations relevant to real-world fraud detection systems.
The selected model is then evaluated on a held-out test set to confirm its ability to generalize to unseen data. Finally, feature importance analysis is used to interpret model behavior and identify the key behavioral signals driving fraud predictions.
Final Model Selection
Among the evaluated models, the three Random Forest variants achieve nearly identical validation performance as measured by Area Under the Precision–Recall Curve (AUCPR), ranging from 0.987 to 0.990, with the default configuration marginally ahead of the tuned one (0.9897 versus 0.9888). The tuned Random Forest is preferred because its depth and split-size constraints limit model complexity, which is expected to narrow the gap between training and validation scores and support generalization, at a negligible cost in validation AUCPR.
The model’s ensemble structure provides robustness to noisy features and enables the capture of complex, non-linear interactions across transaction, behavioral, and contextual variables. Additionally, class weighting improves recall for rare fraud cases without disproportionately increasing false positives. Based on these factors, the tuned Random Forest is selected as the final fraud detection model.
Test Set Performance
Evaluation on the held-out test dataset confirms that the final model maintains strong precision–recall performance on previously unseen transactions: for the fraud class it reaches precision 0.95, recall 0.99, and F1 0.97 on 536 fraud cases among 12,000 test transactions. This suggests that the model is not materially overfit to the training data and is a reasonable candidate for deployment in a real-world fraud detection environment where generalization is critical.
Feature Importance Analysis
Analysis of feature importance reveals that the most influential predictors include:
Transaction velocity indicators
Deviations in transaction amount
Merchant-related risk features
Cross-border and geographic activity
These features are consistent with established fraud typologies, such as rapid transaction bursts, abnormal spending behavior, and geographic inconsistencies. The alignment between model-driven insights and domain knowledge enhances interpretability and supports trust in the model’s predictions.
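The `feature_importances_` attribute used above is impurity-based and derived from the training data. One possible cross-check, which was not performed in this project, is permutation importance measured on held-out data. The sketch below swaps in that technique on synthetic data; the data, the model settings, and the names `X_demo`, `X_ho`, and `ranking` are assumptions of this example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data (not the project dataset)
X_demo, y_demo = make_classification(
    n_samples=1500, n_features=6, n_informative=3, n_redundant=0,
    weights=[0.9, 0.1], random_state=0
)
X_tr, X_ho, y_tr, y_ho = train_test_split(
    X_demo, y_demo, stratify=y_demo, random_state=0
)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

# Drop in held-out average precision when each feature is shuffled
result = permutation_importance(
    model, X_ho, y_ho, scoring="average_precision", n_repeats=5, random_state=0
)

# Feature indices ordered from most to least important
ranking = result.importances_mean.argsort()[::-1]
print(ranking, result.importances_mean[ranking].round(3))
```

If the impurity-based and permutation-based rankings broadly agree on the project data, that would add some confidence to the interpretation given above.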
#Section 7
feature_importance.head(10)
import numpy as np
import pandas as pd
# Use predicted probabilities from the selected final model (rf_tuned)
test_probs = rf_tuned.predict_proba(X_test)[:, 1]
def risk_bucket(p, low=0.20, high=0.60):
    """Map a fraud probability to a recommended action tier."""
    if p < low:
        return "Approve"
    elif p < high:
        return "Step-up Verification"
    else:
        return "Block/Manual Review"
policy = pd.DataFrame({
"fraud_probability": test_probs,
"recommended_action": [risk_bucket(p) for p in test_probs],
"actual_is_fraud": y_test.values
})
policy["recommended_action"].value_counts()
policy.groupby("recommended_action")["actual_is_fraud"].mean().sort_values(ascending=False)
recommended_action
Block/Manual Review     0.954792
Step-up Verification    0.166667
Approve                 0.000437
Name: actual_is_fraud, dtype: float64
Section 7: Business Insights and Recommendations
This project developed a behavior-driven fraud detection framework that integrates unsupervised anomaly detection with supervised machine learning to improve fraud identification while minimizing unnecessary customer friction. The approach emphasizes early risk detection, scalable modeling, and operational interpretability, aligning technical performance with real-world fraud management objectives.
7.1 Key Findings
Fraud is rare but systematically detectable through behavioral signals
The dataset exhibits significant class imbalance, consistent with real-world payment systems. This reinforces the need for precision-recall–based evaluation metrics and specialized modeling techniques, such as class-weighted learning, to ensure rare fraud events are effectively captured.
Behavioral features provide strong discrimination between fraud and legitimate activity
Exploratory analysis and feature importance results highlight transaction velocity, transaction magnitude deviations, geographic inconsistencies, device novelty, and merchant risk as highly informative predictors. These features reflect behavioral anomalies rather than isolated attribute values.
Anomaly detection is most effective as a triage mechanism
While statistical baselines (Z-score and IQR) offer simple interpretability, they tend to generate excessive false positives in high-dimensional data because each feature is thresholded independently. Isolation Forest improves anomaly detection by modeling multivariate interactions and is particularly effective for prioritizing suspicious transactions, including novel or evolving fraud patterns.
Random Forest models deliver the strongest supervised fraud detection performance
Among the evaluated classifiers, Random Forest achieves superior validation AUCPR (0.987 to 0.990, versus 0.904 to 0.921 for Decision Trees) and a more favorable precision–recall balance. Class weighting and tuning change Random Forest validation AUCPR only marginally; on the test set, the tuned, class-weighted model reaches 0.99 recall at 0.95 precision for the fraud class, making it well-suited for operational deployment.
7.2 Recommended Fraud Detection Workflow (Hybrid Approach)
A practical, production-ready fraud detection strategy should adopt a layered decision pipeline:
Rule-Based Screening (Fast Filtering)
Apply basic validation and exclusion rules (e.g., blocked merchants, impossible locations, known compromised cards) to quickly eliminate obvious fraud cases with minimal computational cost.
Anomaly Detection Layer (Behavioral Risk Scoring)
Use Isolation Forest to flag transactions exhibiting unusual behavioral patterns, such as sudden transaction bursts, new device usage combined with cross-border activity, or abnormal spending behavior.
Purpose: Prioritize potentially risky transactions, including previously unseen fraud behaviors.
Supervised Fraud Scoring Layer (Final Classification)
Apply the selected supervised model (tuned Random Forest) to estimate fraud probability using labeled historical patterns.
Purpose: Enable consistent, data-driven risk scoring for downstream decisioning.
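The three layers above can be sketched as a single decision function. This is a hypothetical illustration rather than a production design: the function name `layered_decision` and its arguments are introduced here, the 0.20 and 0.60 cutoffs are reused from the `risk_bucket` example, and the choice to let an anomaly flag escalate an otherwise low-risk transaction to step-up verification is an assumption of this sketch.

```python
def layered_decision(rule_hit, is_anomalous, fraud_probability, low=0.20, high=0.60):
    """Combine rule screening, anomaly flag, and supervised score into one action."""
    # Layer 1: hard rules (e.g., blocked merchant) short-circuit everything else
    if rule_hit:
        return "Block/Manual Review"

    # Layer 3: supervised fraud probability mapped to risk tiers
    if fraud_probability >= high:
        return "Block/Manual Review"
    if fraud_probability >= low:
        return "Step-up Verification"

    # Layer 2 (assumed escalation): an anomaly flag lifts a low-score
    # transaction from automatic approval to step-up verification
    if is_anomalous:
        return "Step-up Verification"

    return "Approve"


# Example calls with made-up inputs
print(layered_decision(rule_hit=False, is_anomalous=False, fraud_probability=0.05))
print(layered_decision(rule_hit=False, is_anomalous=True, fraud_probability=0.05))
print(layered_decision(rule_hit=False, is_anomalous=False, fraud_probability=0.75))
```

Keeping the anomaly flag as an escalation signal, rather than a blocking one, reflects its role as a triage layer.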
7.3 Risk-Based Action Policy
To balance fraud loss prevention with customer experience, transaction decisions should follow a risk-tiered policy:
Low Risk: Automatically approve transactions
Medium Risk: Apply step-up verification (e.g., OTP, 3DS challenge, CVV re-validation)
High Risk: Decline transactions or route to manual review based on business policy
Decision thresholds should be calibrated using validation data to meet operational objectives such as acceptable false-decline rates, analyst review capacity, and targeted fraud capture levels.
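One way to calibrate a threshold against an operational target is to read it off the validation precision-recall curve. The sketch below picks the lowest threshold that meets a minimum precision, which also maximizes recall under that constraint. The helper name, the toy labels and probabilities, and the 0.75 precision target are assumptions made for illustration.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

def threshold_for_precision(y_true, y_prob, min_precision):
    """Lowest score threshold whose precision meets the target.

    Recall cannot increase as the threshold rises, so the lowest qualifying
    threshold is also the one with the highest recall. Returns None when no
    threshold reaches the target.
    """
    precision, recall, thresholds = precision_recall_curve(y_true, y_prob)
    # The final (precision=1, recall=0) point has no threshold, so drop it
    ok = precision[:-1] >= min_precision
    if not ok.any():
        return None
    i = int(np.argmax(ok))  # index of the first qualifying threshold
    return thresholds[i], precision[i], recall[i]

# Toy validation labels and probabilities (made-up values)
y_val_demo = np.array([0, 0, 0, 0, 1, 0, 1, 1])
p_val_demo = np.array([0.10, 0.20, 0.30, 0.35, 0.40, 0.60, 0.70, 0.90])

thr, prec, rec = threshold_for_precision(y_val_demo, p_val_demo, min_precision=0.75)
print(thr, prec, rec)
```

The same approach works in the other direction, for example choosing the highest threshold that still achieves a required recall.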
7.4 Monitoring and Governance Considerations
A real-world deployment requires continuous oversight to maintain effectiveness and trust:
Model Drift Monitoring: Fraud tactics evolve over time, necessitating periodic retraining and threshold recalibration.
Alert Quality Monitoring: Tracking false positives helps reduce analyst fatigue and prevents unnecessary customer disruption.
Explainability and Audit Readiness: Maintaining feature importance analyses and model documentation supports regulatory compliance and stakeholder transparency.
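Drift monitoring can start with a simple distribution comparison between a reference window and recent traffic. The sketch below uses the Population Stability Index (PSI), a commonly used drift statistic; it is a choice made for this illustration and is not prescribed by the project. The function name, the bin count, and the synthetic samples are assumptions.

```python
import numpy as np

def population_stability_index(expected, actual, n_bins=10, eps=1e-6):
    """PSI between a reference sample and a recent sample of one feature or score."""
    expected = np.asarray(expected, dtype=float)
    actual = np.asarray(actual, dtype=float)

    # Interior bin edges from quantiles of the reference sample
    inner_edges = np.quantile(expected, np.linspace(0, 1, n_bins + 1))[1:-1]

    # Assign each value to a bin; values outside the reference range fall in the end bins
    exp_idx = np.searchsorted(inner_edges, expected, side="right")
    act_idx = np.searchsorted(inner_edges, actual, side="right")

    exp_frac = np.bincount(exp_idx, minlength=n_bins) / len(expected)
    act_frac = np.bincount(act_idx, minlength=n_bins) / len(actual)

    # Avoid log(0) for empty bins
    exp_frac = np.clip(exp_frac, eps, None)
    act_frac = np.clip(act_frac, eps, None)

    return float(np.sum((act_frac - exp_frac) * np.log(act_frac / exp_frac)))

rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, size=5000)   # e.g., scores at training time
recent_shifted = rng.normal(1.0, 1.0, size=5000)  # same feature after a shift

psi_same = population_stability_index(reference, reference)
psi_shifted = population_stability_index(reference, recent_shifted)
print(f"PSI (no change): {psi_same:.4f}, PSI (shifted): {psi_shifted:.4f}")
```

Tracking such a statistic per feature and for the model's output score gives an early signal for the retraining and threshold recalibration mentioned above.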
7.5 Opportunities for Future Enhancement
Potential extensions to this work include:
Cost-sensitive optimization and probability calibration to explicitly balance fraud losses against customer friction.
Time-aware modeling approaches, such as rolling windows or temporal feature engineering, to better capture evolving fraud dynamics.
Advanced anomaly detection techniques, including autoencoders and ensemble anomaly detectors.
External data enrichment, incorporating merchant category trends, device fingerprinting signals, or regional fraud intelligence feeds.
Conclusion
This project demonstrates the value of a behavior-driven, risk-based fraud detection framework that integrates anomaly detection with supervised machine learning to support real-time transaction decisioning. Exploratory analysis revealed clear behavioral signals associated with fraud, including abnormal transaction velocity, elevated amounts, geographic inconsistencies, and interactions with new devices or merchants. Anomaly detection models, particularly Isolation Forest, effectively surfaced unusual behavioral patterns, while supervised classifiers provided accurate fraud identification and enabled actionable risk segmentation.
The final model outputs were translated into operational recommendations, showing that the majority of transactions can be safely approved, while a smaller, higher-risk subset is routed to step-up verification or manual review. On the test set, the observed fraud rate was about 0.04% in the approve tier, about 17% in the step-up tier, and about 95% in the block or manual-review tier. This tiered decision strategy reflects a practical balance between fraud loss prevention and customer experience, ensuring that friction is applied only where risk is materially elevated. Importantly, feature importance analysis confirmed that the model’s decisions align with domain-relevant fraud indicators, supporting interpretability and regulatory transparency.
Overall, the proposed solution offers a scalable and explainable approach to modern fraud detection, capable of adapting to evolving behavioral patterns while minimizing false declines. With continuous monitoring, threshold tuning, and periodic retraining, this framework can meaningfully reduce fraud losses, optimize analyst workload, and improve customer trust in digital payment systems.