Executive Summary

This project develops a behavior-driven fraud detection framework that combines anomaly detection with supervised machine learning to identify fraudulent card transactions while minimizing unnecessary customer friction. Exploratory analysis indicates that fraud is associated primarily with behavioral deviations such as elevated transaction velocity, abnormal spending patterns, geographic inconsistencies, and interactions with new devices or merchants. Anomaly detection techniques, particularly Isolation Forest, surface unusual transaction behavior and support early risk triage, while supervised models classify confirmed fraud cases. Among the evaluated classifiers, the Random Forest variants achieve the strongest precision–recall performance (validation AUCPR of about 0.99, versus 0.90 to 0.92 for Decision Trees); the tuned, class-weighted Random Forest selected as the final model reaches 0.95 precision and 0.99 recall on the fraud class of the held-out test set. By integrating these models into a layered, risk-based decisioning workflow, the proposed solution balances fraud loss prevention, operational efficiency, and customer experience, providing a scalable and interpretable foundation for real-world fraud detection systems.

1. Requirements Gathering

1.1 Business Problem Definition

The organization under consideration is a card-issuing financial services company that processes a high volume of credit and debit card transactions across in-store, online, and cross-border channels in real time. The primary business objective is to authorize legitimate transactions with minimal friction while detecting and preventing fraudulent activity as early as possible.

Fraudulent behavior in card transactions often manifests as subtle behavioral deviations rather than extreme statistical outliers. Examples include sudden transaction bursts, unexpected geographic changes, use of new devices, or transactions with higher-risk merchants. These patterns are difficult to capture using static, rule-based systems alone. At the same time, overly conservative controls can lead to false declines, negatively impacting customer experience and increasing operational workload for fraud analysts.

The challenge, therefore, is to design a fraud detection solution that balances fraud loss prevention, customer experience, and regulatory compliance, while remaining scalable and interpretable in a real-time transaction environment.

1.2 Business and Analytical Objectives

The key objectives of this project are:

To design and implement a behavior-driven fraud detection framework that improves the identification of fraudulent transactions.

To distinguish between legitimate customer behavior and subtle fraudulent deviations, reducing unnecessary transaction declines.

To leverage anomaly detection techniques to identify unusual transaction patterns that may indicate emerging or previously unseen fraud behaviors.

To build and evaluate supervised machine learning models that predict confirmed fraud outcomes using historical labels.

To ensure that the resulting solution is interpretable, operationally feasible, and aligned with real-time decision-making requirements.

1.3 Expected Business Outcomes

The expected outcomes of this project include:

Earlier detection of potentially fraudulent transactions through anomaly-based triaging.

Reduced false positives compared to static rule-based approaches.

Improved fraud analyst efficiency by prioritizing high-risk transactions.

Support for risk-based transaction decisions such as:

Automatic approval for low-risk transactions

Step-up verification (e.g., OTP, CVV checks) for medium-risk transactions

Blocking or manual review for high-risk transactions

1.4 Success Criteria and Evaluation Metrics

Given the highly imbalanced nature of fraud detection problems, traditional accuracy is not an appropriate performance metric. Instead, model performance will be evaluated using metrics that better reflect business risk and operational impact, including:

Area Under the Precision-Recall Curve (AUCPR) to evaluate performance under class imbalance

Precision and Recall, particularly for the fraud class

F1-score, to balance precision and recall

Comparative performance across training, validation, and test datasets to assess generalization

In addition to quantitative metrics, model interpretability and alignment with known fraud patterns will be considered critical success factors.
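To make the argument against accuracy concrete, the following minimal sketch scores a hypothetical "approve everything" policy. The helper name `fraud_metrics` is an assumption introduced for illustration; the class counts (11,464 legitimate, 536 fraudulent) are the supports reported in the Section 5 classification reports.

```python
def fraud_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 for the fraud (positive) class.

    Hypothetical helper for illustration; not part of the notebook's code.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Class supports taken from the 12,000-row validation split
LEGIT, FRAUD = 11464, 536

# A policy that approves every transaction never flags fraud:
# no true positives, no false positives, every fraud case is missed.
always_approve = fraud_metrics(tp=0, fp=0, fn=FRAUD, tn=LEGIT)
print(always_approve)
```

Such a policy is right on more than 95% of transactions while catching no fraud at all, which is why the fraud-class precision, recall, and AUCPR carry the evaluation.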

1.5 High-Level Solution Approach

To address the problem, the project will adopt a two-layer detection strategy:

Anomaly Detection Layer: Statistical methods (Z-score, IQR) and an Isolation Forest model will be used to identify transactions that deviate from typical behavioral patterns. This layer serves as an early warning and triage mechanism, particularly useful for detecting novel or emerging fraud behaviors.

Supervised Fraud Detection Layer: Decision Tree and Random Forest models will be trained using confirmed fraud labels to predict fraudulent transactions. These models will be tuned and evaluated to minimize false positives while maintaining strong fraud detection performance.

This layered approach supports both early anomaly identification and accurate fraud classification, enabling risk-based decisioning in a production fraud detection environment.

In [1]:
#Section 2

import os
print("Current working directory:", os.getcwd())
print("Files here:", os.listdir())

import pandas as pd

# Relative path: the dataset sits in the working directory printed above
df = pd.read_csv("fraud_dataset.csv")

# Dataset shape
df.shape

# Preview first few rows
df.head()

# Preview random sample (useful for fraud datasets)
df.sample(5, random_state=42)

df.info()

df.describe()

df.isnull().sum()

df.duplicated().sum()

df['is_fraud'].value_counts()

df['is_fraud'].value_counts(normalize=True)
Current working directory: c:\Users\13015\Desktop\Credit Card Project
Files here: ['Credit Card Project.ipynb', 'fraud_dataset.csv']
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 60000 entries, 0 to 59999
Data columns (total 32 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   transaction_id             60000 non-null  int64  
 1   account_id                 60000 non-null  int64  
 2   card_id                    60000 non-null  object 
 3   txn_hour                   60000 non-null  int64  
 4   txn_day_of_week            60000 non-null  int64  
 5   transaction_amount         60000 non-null  float64
 6   channel                    60000 non-null  object 
 7   entry_mode                 60000 non-null  object 
 8   txn_count_1min             60000 non-null  int64  
 9   txn_count_5min             60000 non-null  int64  
 10  avg_txn_amount_30d         60000 non-null  float64
 11  max_txn_amount_90d         60000 non-null  float64
 12  merchant_id                60000 non-null  object 
 13  merchant_category_code     60000 non-null  int64  
 14  merchant_fraud_rate_30d    60000 non-null  float64
 15  is_new_merchant            60000 non-null  int64  
 16  merchant_country           60000 non-null  object 
 17  cardholder_country         60000 non-null  object 
 18  cross_border_flag          60000 non-null  int64  
 19  distance_from_last_txn_km  60000 non-null  float64
 20  time_since_last_txn_sec    60000 non-null  int64  
 21  device_id                  60000 non-null  object 
 22  is_new_device              60000 non-null  int64  
 23  device_country_mismatch    60000 non-null  int64  
 24  cvv_result                 60000 non-null  object 
 25  cvv_fail_count_24h         60000 non-null  int64  
 26  card_age_days              60000 non-null  int64  
 27  account_age_days           60000 non-null  int64  
 28  historical_fraud_flag      60000 non-null  int64  
 29  anomaly_label              60000 non-null  int64  
 30  anomaly_score              60000 non-null  float64
 31  is_fraud                   60000 non-null  int64  
dtypes: float64(6), int64(18), object(8)
memory usage: 14.6+ MB
Out[1]:
is_fraud
0    0.955333
1    0.044667
Name: proportion, dtype: float64

Section 2: Data Overview

Dataset Description

The dataset consists of 60,000 card transactions and 32 columns (31 candidate features plus the is_fraud target) capturing transaction behavior, merchant characteristics, device attributes, historical risk indicators, and confirmed fraud outcomes. It is well-suited for both anomaly detection and supervised fraud classification, enabling the identification of subtle behavioral deviations as well as explicit fraud patterns.

Dataset Shape

Rows: 60,000

Columns: 32

The size of the dataset provides a strong foundation for exploratory analysis, robust model training, and reliable validation of fraud detection techniques.

Feature Data Types

The dataset includes a mix of numerical and categorical variables commonly observed in real-world payment systems:

Numerical Features

int64: 18 columns

float64: 6 columns

Categorical Features

object: 8 columns

This diversity reflects realistic transaction environments where continuous behavioral signals (e.g., transaction amount, velocity) coexist with categorical attributes such as merchant identifiers, device information, and geographic indicators.

Missing Value Assessment

A completeness check confirmed that no missing values are present in any of the 32 columns. As a result, no imputation or row removal was required, preserving the full dataset for downstream analysis and modeling.

Duplicate Record Check

Duplicate transaction checks were performed to ensure data integrity. No duplicate records were identified, confirming that each transaction represents a unique observation.

Fraud Class Distribution

The target variable is_fraud exhibits a significant class imbalance:

Legitimate transactions (0): ~95.53%

Fraudulent transactions (1): ~4.47%

This imbalance is consistent with real-world fraud detection scenarios and directly informs modeling decisions, including:

The use of anomaly detection techniques for early risk identification

The selection of precision-recall–based evaluation metrics

The application of class-weighted supervised learning models to mitigate bias toward the majority class
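As an illustration of what class weighting does with this imbalance, the sketch below reproduces the heuristic behind scikit-learn's `class_weight='balanced'` option, `n_samples / (n_classes * class_count)`. The function name is an assumption for illustration, and the class counts (57,320 and 2,680) are derived from the proportions above rather than printed by the notebook.

```python
def balanced_class_weights(class_counts):
    """Per-class weights following the 'balanced' heuristic:
    n_samples / (n_classes * count_of_that_class).

    Hypothetical helper; scikit-learn computes this internally.
    """
    n_samples = sum(class_counts.values())
    n_classes = len(class_counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in class_counts.items()
    }


# Counts implied by 95.53% / 4.47% of 60,000 rows (derived, not printed output)
counts = {0: 57320, 1: 2680}
weights = balanced_class_weights(counts)

for label, weight in weights.items():
    print(f"class {label}: weight {weight:.3f}")
```

Under these assumptions each fraud row counts roughly 21 times as much as a legitimate row during training. Because the splits are stratified, the ratio is essentially the same on the training subset the models are actually fitted on.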

In [2]:
#Section 3

import seaborn as sns
import matplotlib.pyplot as plt

plt.figure(figsize=(8, 4))
sns.histplot(df['transaction_amount'], bins=50, kde=True)
plt.title('Distribution of Transaction Amount')
plt.xlabel('Transaction Amount')
plt.ylabel('Count')
plt.show()

fig, axes = plt.subplots(1, 2, figsize=(12, 4))

sns.histplot(df['txn_count_1min'], bins=30, ax=axes[0])
axes[0].set_title('Transactions in Last 1 Minute')

sns.histplot(df['txn_count_5min'], bins=30, ax=axes[1])
axes[1].set_title('Transactions in Last 5 Minutes')

plt.tight_layout()
plt.show()

plt.figure(figsize=(8, 4))
sns.countplot(x='txn_hour', data=df)
plt.title('Transactions by Hour of Day')
plt.xlabel('Hour')
plt.ylabel('Transaction Count')
plt.show()

plt.figure(figsize=(8, 4))
sns.boxplot(x='is_fraud', y='transaction_amount', data=df)
plt.title('Transaction Amount by Fraud Label')
plt.xlabel('Is Fraud')
plt.ylabel('Transaction Amount')
plt.show()

plt.figure(figsize=(8, 4))
sns.boxplot(x='is_fraud', y='txn_count_5min', data=df)
plt.title('5-Minute Transaction Count by Fraud Label')
plt.xlabel('Is Fraud')
plt.ylabel('Txn Count (5 min)')
plt.show()

plt.figure(figsize=(6, 4))
sns.countplot(x='cross_border_flag', hue='is_fraud', data=df)
plt.title('Cross-Border Flag vs Fraud')
plt.xlabel('Cross Border Flag')
plt.ylabel('Count')
plt.show()

fig, axes = plt.subplots(1, 2, figsize=(12, 4))

sns.countplot(x='is_new_device', hue='is_fraud', data=df, ax=axes[0])
axes[0].set_title('New Device vs Fraud')

sns.countplot(x='is_new_merchant', hue='is_fraud', data=df, ax=axes[1])
axes[1].set_title('New Merchant vs Fraud')

plt.tight_layout()
plt.show()
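[Figure: Distribution of Transaction Amount (x: Transaction Amount, y: Count)]
[Figure: Transactions in Last 1 Minute | Transactions in Last 5 Minutes]
[Figure: Transactions by Hour of Day (x: Hour, y: Transaction Count)]
[Figure: Transaction Amount by Fraud Label (x: Is Fraud, y: Transaction Amount)]
[Figure: 5-Minute Transaction Count by Fraud Label (x: Is Fraud, y: Txn Count (5 min))]
[Figure: Cross-Border Flag vs Fraud (x: Cross Border Flag, y: Count)]
[Figure: New Device vs Fraud | New Merchant vs Fraud]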

Section 3: Exploratory Data Analysis (EDA)

In this section, exploratory data analysis is conducted to examine the distributions, patterns, and relationships among key transaction, behavioral, and risk-related variables. The primary objective is to identify characteristics that distinguish fraudulent transactions from legitimate activity and to generate insights that inform both anomaly detection and supervised fraud modeling.

The analysis focuses on the following dimensions:

  • Transaction behavior: amount and velocity-based features

  • Temporal patterns: hour-of-day and day-of-week activity

  • Merchant and device risk indicators: new entities and historical risk signals

  • Geographic and cross-border behavior: location changes and international transactions

Key Observations

  • Transaction Amount Distribution:

    Transaction amounts are highly right-skewed, with fraudulent transactions occurring more frequently in higher-value ranges, suggesting increased risk associated with unusually large purchases.

  • Transaction Velocity:

    Fraudulent transactions demonstrate elevated transaction velocity, particularly within short time windows (e.g., 1-minute and 5-minute intervals), indicating rapid bursts of activity that deviate from normal customer behavior.

  • Cross-Border Risk:

    Cross-border transactions exhibit a disproportionately higher fraud rate compared to domestic transactions, highlighting the importance of geographic risk features in fraud detection.

  • New Merchant and Device Signals:

    Transactions involving new devices or previously unseen merchants are more frequently associated with fraudulent outcomes, reinforcing the relevance of novelty-based risk indicators.

  • Temporal Behavior:

    Fraud occurs across all hours of the day, suggesting that time alone is not a strong discriminator. However, bursts of activity over short periods are more indicative of fraudulent behavior than isolated transactions at specific times.
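The count plots show raw counts per group, so the underlying fraud rates have to be read off by eye. A rate per group makes comparisons such as cross-border versus domestic explicit. The sketch below computes it in plain Python on invented toy rows (the data and the helper name `fraud_rate_by_group` are assumptions for illustration only); on the real DataFrame the equivalent would be a `groupby` on the flag followed by the mean of `is_fraud`.

```python
from collections import defaultdict


def fraud_rate_by_group(rows, group_key, label_key="is_fraud"):
    """Share of fraudulent rows within each value of group_key."""
    totals = defaultdict(int)
    frauds = defaultdict(int)
    for row in rows:
        totals[row[group_key]] += 1
        frauds[row[group_key]] += row[label_key]
    return {group: frauds[group] / totals[group] for group in totals}


# Toy data, invented for illustration: 20 domestic rows (2 fraud)
# and 5 cross-border rows (2 fraud).
toy_rows = (
    [{"cross_border_flag": 0, "is_fraud": 0}] * 18
    + [{"cross_border_flag": 0, "is_fraud": 1}] * 2
    + [{"cross_border_flag": 1, "is_fraud": 0}] * 3
    + [{"cross_border_flag": 1, "is_fraud": 1}] * 2
)

rates = fraud_rate_by_group(toy_rows, "cross_border_flag")
print(rates)
```

In the toy data both groups contain the same number of fraud cases, yet the cross-border rate is four times higher, which is the kind of difference a count plot can hide.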

Implications for Modeling

The insights derived from EDA support the following modeling strategies:

  • Emphasizing velocity-based and behavioral features

  • Leveraging anomaly detection to capture subtle deviations from normal behavior

  • Incorporating risk-aware supervised models that account for class imbalance and heterogeneous fraud patterns

In [3]:
#Section 4

from sklearn.model_selection import train_test_split

# Separate features and target
X = df.drop(columns=['is_fraud'])
y = df['is_fraud']

# Keep numeric features only for anomaly detection
X_num = X.select_dtypes(include=['int64', 'float64'])

# Train / temp split
X_train, X_temp, y_train, y_temp = train_test_split(
    X_num, y, test_size=0.4, stratify=y, random_state=42
)

# Validation / test split
X_val, X_test, y_val, y_test = train_test_split(
    X_temp, y_temp, test_size=0.5, stratify=y_temp, random_state=42
)

X_train.shape, X_val.shape, X_test.shape

from scipy.stats import zscore
import numpy as np

# Compute Z-scores
z_scores = np.abs(zscore(X_train))

# Flag anomalies (threshold = 3)
z_anomalies = (z_scores > 3).any(axis=1)

# Share of training rows with at least one feature where |z| > 3
z_anomaly_rate = z_anomalies.mean()
z_anomaly_rate

Q1 = X_train.quantile(0.25)
Q3 = X_train.quantile(0.75)
IQR = Q3 - Q1

iqr_anomalies = ((X_train < (Q1 - 1.5 * IQR)) | 
                 (X_train > (Q3 + 1.5 * IQR))).any(axis=1)

iqr_anomaly_rate = iqr_anomalies.mean()
iqr_anomaly_rate

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_val_scaled = scaler.transform(X_val)

from sklearn.ensemble import IsolationForest

iso_forest = IsolationForest(
    n_estimators=200,
    contamination=0.05,
    random_state=42,
    n_jobs=-1
)

iso_forest.fit(X_train_scaled)

# Anomaly scores (lower = more anomalous)
val_scores = iso_forest.decision_function(X_val_scaled)

# Convert to anomaly labels
val_anomalies = iso_forest.predict(X_val_scaled)
val_anomalies = (val_anomalies == -1).astype(int)

from sklearn.metrics import average_precision_score

aucpr = average_precision_score(y_val, -val_scores)
aucpr

# Attach scores to validation data
val_results = X_val.copy()
val_results['anomaly_score'] = val_scores
val_results['is_fraud'] = y_val.values

# Top 10 most anomalous transactions
top_anomalies = val_results.sort_values('anomaly_score').head(10)
top_anomalies
Out[3]:
transaction_id account_id txn_hour txn_day_of_week transaction_amount txn_count_1min txn_count_5min avg_txn_amount_30d max_txn_amount_90d merchant_category_code ... time_since_last_txn_sec is_new_device device_country_mismatch cvv_fail_count_24h card_age_days account_age_days historical_fraud_flag anomaly_label anomaly_score is_fraud
9534 9535 100081 10 0 1651.12000 4 6 216.40 1047.00 5311 ... 30 1 1 5 294 683 0 1 -0.214881 1
30861 30862 100278 12 0 1303.71000 3 5 131.60 965.79 5732 ... 51 1 1 5 3306 3588 0 1 -0.211623 1
7575 7576 100066 13 6 1691.31000 4 6 183.95 1328.91 5999 ... 88 1 1 3 1123 2193 0 1 -0.209716 1
42332 42333 100381 14 0 2676.50000 4 4 276.19 1548.82 5311 ... 52 1 1 2 2149 2615 0 1 -0.209154 1
25300 25301 100230 1 0 110.32000 2 3 12.21 64.59 5311 ... 112 1 1 5 1813 3438 0 1 -0.207872 1
43415 43416 100391 21 6 2012.74000 3 4 281.60 1588.37 5999 ... 85 1 1 2 427 850 0 1 -0.207725 1
39122 39123 100351 9 0 1185.97000 4 4 296.06 1672.98 5812 ... 71 1 1 4 939 3194 1 1 -0.206762 1
9203 9204 100079 23 6 193.70000 3 5 40.77 194.01 5411 ... 105 1 1 4 2719 3077 0 1 -0.205180 1
38645 38646 100346 8 0 1434.53000 3 5 282.62 1397.09 5311 ... 123 1 1 3 96 191 0 1 -0.205047 1
59645 59646 100257 22 6 900.97817 4 4 124.98 947.01 5812 ... 179 1 1 4 356 991 0 1 -0.202970 1

10 rows × 24 columns

Section 4: Anomaly Detection

In this section, unsupervised anomaly detection techniques are applied to identify transactions that exhibit unusual or suspicious behavior. Because fraudulent activity is relatively rare and may not always be labeled in real time, anomaly detection serves as an effective early-warning mechanism to surface potentially risky transactions before confirmation.

The analysis follows a progressive approach:

Establishing statistical baseline methods (Z-score and IQR)

Implementing a machine learning–based anomaly detection model using Isolation Forest

Evaluating detected anomalies against known fraud labels to assess practical effectiveness

Baseline Anomaly Detection Methods

Simple statistical approaches were first applied to establish reference benchmarks:

  • Z-score and Interquartile Range (IQR) methods identify extreme values based on univariate thresholds.

  • These methods flag a large number of anomalies, particularly in skewed or high-variance features.

  • Because they evaluate each feature independently, they fail to capture multivariate interactions inherent in complex transaction behavior.

While these techniques provide useful interpretability and fast computation, they are prone to high false-positive rates in high-dimensional, behavior-driven datasets and are therefore insufficient as standalone solutions.
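The two baseline rules can be written out for a single feature as follows. This is a minimal standard-library sketch on invented amounts, not the notebook's code: the function names are assumptions, the Z-score uses the population standard deviation (the default of `scipy.stats.zscore`), and the quartiles use linear interpolation, which is also the pandas `quantile` default.

```python
import statistics


def zscore_flags(values, threshold=3.0):
    """Flag values whose absolute Z-score exceeds the threshold."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population std (ddof=0)
    if std == 0:
        return [False] * len(values)
    return [abs(v - mean) / std > threshold for v in values]


def iqr_flags(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v < lower or v > upper for v in values]


# Invented amounts: 19 ordinary purchases and one large one
amounts = [12, 13, 14, 15, 16] * 3 + [12, 13, 14, 15] + [400]

z_flags = zscore_flags(amounts)
i_flags = iqr_flags(amounts)
print("Z-score flags:", sum(z_flags), "| IQR flags:", sum(i_flags))
```

Each rule looks at one column at a time. The notebook applies the same test to every numeric column and marks a row as anomalous if any column trips it (`.any(axis=1)`), which is why the flag rate grows with the number of features and why combinations of individually unremarkable values go unnoticed.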

Isolation Forest Performance

To address the limitations of statistical baselines, an Isolation Forest model was implemented:

  • The model effectively captures multivariate relationships across transaction amount, velocity, geographic, device, and merchant features.

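  • Performance is evaluated with Area Under the Precision–Recall Curve (AUCPR) on the validation set, using the negated decision-function score so that more anomalous transactions rank first. The statistical baselines are summarized only by the share of rows they flag, so they are not scored on the same metric here.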

  • Isolation Forest scales efficiently and is well-suited for real-time, high-volume payment environments where rapid decision-making is required.

Analysis of Top Anomalies

Examination of the ten most anomalous validation transactions (those with the lowest Isolation Forest decision scores), all of which are confirmed fraud, reveals consistent behavioral patterns, including:

  • Elevated transaction velocity within short time windows

  • Cross-border or geographically inconsistent activity

  • Usage of new devices or unfamiliar merchants

  • Large deviations from historical spending behavior

These patterns closely align with known fraud typologies, reinforcing the validity of the anomaly detection approach.
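One point worth checking before relying on these results: the numeric feature matrix built with `select_dtypes` contains the identifiers `transaction_id` and `account_id` as well as the dataset's own `anomaly_label` and `anomaly_score` columns, and all ten rows in the table above have `anomaly_label` equal to 1. The source does not say how those two columns were produced, so whether they are legitimate model inputs is an open question. The sketch below, under the assumption that they should be excluded, shows one way to build the feature list; the exclusion sets are a judgment call, not something the notebook does.

```python
# Numeric columns from df.info() in Section 2, without the is_fraud target
numeric_columns = [
    "transaction_id", "account_id", "txn_hour", "txn_day_of_week",
    "transaction_amount", "txn_count_1min", "txn_count_5min",
    "avg_txn_amount_30d", "max_txn_amount_90d", "merchant_category_code",
    "merchant_fraud_rate_30d", "is_new_merchant", "cross_border_flag",
    "distance_from_last_txn_km", "time_since_last_txn_sec", "is_new_device",
    "device_country_mismatch", "cvv_fail_count_24h", "card_age_days",
    "account_age_days", "historical_fraud_flag", "anomaly_label",
    "anomaly_score",
]

# Assumption: identifiers carry no behavioral signal
ID_COLUMNS = {"transaction_id", "account_id"}
# Assumption: precomputed anomaly outputs may encode the label
PRECOMPUTED_ANOMALY_COLUMNS = {"anomaly_label", "anomaly_score"}


def select_model_features(columns, excluded):
    """Keep column order, drop anything in the excluded set."""
    return [c for c in columns if c not in excluded]


features = select_model_features(
    numeric_columns, ID_COLUMNS | PRECOMPUTED_ANOMALY_COLUMNS
)
print(len(numeric_columns), "->", len(features), "features")
```

Refitting the Isolation Forest and the classifiers on the reduced list and comparing AUCPR would show how much of the reported performance depends on those columns.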

In [4]:
#Section 5

# Features and target
X = df.drop(columns=['is_fraud'])
y = df['is_fraud']

# Use numeric features only
X_num = X.select_dtypes(include=['int64', 'float64'])

# Train / validation / test split (same structure as before)
from sklearn.model_selection import train_test_split

X_train, X_temp, y_train, y_temp = train_test_split(
    X_num, y, test_size=0.4, stratify=y, random_state=42
)

X_val, X_test, y_val, y_test = train_test_split(
    X_temp, y_temp, test_size=0.5, stratify=y_temp, random_state=42
)

from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report, average_precision_score

dt_default = DecisionTreeClassifier(random_state=42)
dt_default.fit(X_train, y_train)

# Validation predictions
dt_val_preds = dt_default.predict(X_val)
dt_val_probs = dt_default.predict_proba(X_val)[:, 1]

# Metrics
print(classification_report(y_val, dt_val_preds))
dt_aucpr = average_precision_score(y_val, dt_val_probs)
dt_aucpr

from sklearn.ensemble import RandomForestClassifier

rf_default = RandomForestClassifier(
    n_estimators=200,
    random_state=42,
    n_jobs=-1
)

rf_default.fit(X_train, y_train)

rf_val_preds = rf_default.predict(X_val)
rf_val_probs = rf_default.predict_proba(X_val)[:, 1]

print(classification_report(y_val, rf_val_preds))
rf_aucpr = average_precision_score(y_val, rf_val_probs)
rf_aucpr

dt_weighted = DecisionTreeClassifier(
    class_weight='balanced',
    random_state=42
)

dt_weighted.fit(X_train, y_train)

dtw_val_probs = dt_weighted.predict_proba(X_val)[:, 1]
dtw_aucpr = average_precision_score(y_val, dtw_val_probs)
dtw_aucpr

rf_weighted = RandomForestClassifier(
    n_estimators=300,
    class_weight='balanced',
    random_state=42,
    n_jobs=-1
)

rf_weighted.fit(X_train, y_train)

rfw_val_probs = rf_weighted.predict_proba(X_val)[:, 1]
rfw_aucpr = average_precision_score(y_val, rfw_val_probs)
rfw_aucpr

rf_tuned = RandomForestClassifier(
    n_estimators=400,
    max_depth=12,
    min_samples_split=50,
    class_weight='balanced',
    random_state=42,
    n_jobs=-1
)

rf_tuned.fit(X_train, y_train)

rft_val_probs = rf_tuned.predict_proba(X_val)[:, 1]
rft_aucpr = average_precision_score(y_val, rft_val_probs)
rft_aucpr

import pandas as pd

model_results = pd.DataFrame({
    'Model': [
        'Decision Tree (Default)',
        'Random Forest (Default)',
        'Decision Tree (Weighted)',
        'Random Forest (Weighted)',
        'Random Forest (Tuned)'
    ],
    'Validation_AUCPR': [
        dt_aucpr,
        rf_aucpr,
        dtw_aucpr,
        rfw_aucpr,
        rft_aucpr
    ]
})

model_results.sort_values(by='Validation_AUCPR', ascending=False)
              precision    recall  f1-score   support

           0       1.00      1.00      1.00     11464
           1       0.96      0.94      0.95       536

    accuracy                           1.00     12000
   macro avg       0.98      0.97      0.97     12000
weighted avg       1.00      1.00      1.00     12000

              precision    recall  f1-score   support

           0       1.00      1.00      1.00     11464
           1       0.97      0.96      0.97       536

    accuracy                           1.00     12000
   macro avg       0.98      0.98      0.98     12000
weighted avg       1.00      1.00      1.00     12000

Out[4]:
Model Validation_AUCPR
1 Random Forest (Default) 0.989725
4 Random Forest (Tuned) 0.988792
3 Random Forest (Weighted) 0.987392
2 Decision Tree (Weighted) 0.921265
0 Decision Tree (Default) 0.903637

Section 5: Fraud Detection – Model Building

In this section, supervised machine learning models are developed to classify transactions as fraudulent or legitimate using labeled data. Fraud detection presents a challenging class-imbalanced classification problem, where fraudulent transactions represent a small fraction of total activity. As a result, model selection and evaluation prioritize metrics that emphasize fraud capture while controlling false positives.

The modeling workflow includes:

  • Training baseline Decision Tree and Random Forest classifiers using default parameters

  • Addressing class imbalance through class-weighted learning

  • Fitting a Random Forest with manually chosen regularizing hyperparameters (max_depth, min_samples_split) as the tuned variant

  • Comparing results across training and validation datasets to select the most effective model

Evaluation Metrics

Traditional accuracy is misleading in fraud detection due to the dominance of legitimate transactions. Therefore, model performance is evaluated using metrics better suited to imbalanced datasets:

  • Recall – Measures the model’s ability to correctly identify fraudulent transactions, minimizing missed fraud

  • Precision – Quantifies how many flagged transactions are truly fraudulent, helping control false positives

  • F1-Score – Provides a balanced assessment of precision and recall

  • Area Under the Precision–Recall Curve (AUCPR) – Serves as the primary evaluation metric due to its robustness in highly imbalanced classification settings
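For readers who want to see what the AUCPR figure measures, here is a minimal pure-Python sketch of average precision, the quantity `average_precision_score` reports: the mean of the precision values at each rank where a fraud case is found. The function is an illustrative re-implementation that assumes no tied scores; it is not the library's code. The second example mirrors the Isolation Forest convention from Section 4, where lower decision scores mean more anomalous and the scores are therefore negated before ranking.

```python
def average_precision(y_true, scores):
    """Average precision for binary labels, assuming no tied scores.

    Ranks by descending score and averages precision at each positive.
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("average precision needs at least one positive label")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    total = 0.0
    for rank, idx in enumerate(order, start=1):
        if y_true[idx] == 1:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / n_pos


# Classifier-style scores: higher means more likely fraud
labels = [1, 0, 1, 0]
probs = [0.9, 0.8, 0.7, 0.1]
ap_classifier = average_precision(labels, probs)

# Isolation-Forest-style scores: lower means more anomalous
decision = [-0.20, 0.10, -0.10, 0.15]
ap_negated = average_precision(labels, [-s for s in decision])
ap_raw = average_precision(labels, decision)

print(round(ap_classifier, 4), ap_negated, round(ap_raw, 4))
```

The gap between `ap_negated` and `ap_raw` shows why the sign convention matters: passing raw decision scores would rank the least anomalous transactions first.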

Model Performance Insights

Key observations from model training and evaluation include:

  • Random Forest models consistently outperform Decision Trees on validation AUCPR (0.987 to 0.990 versus 0.904 to 0.921), benefiting from ensemble learning and reduced variance.

  • Class weighting raises the Decision Tree's validation AUCPR from 0.904 to 0.921, but it does not improve the Random Forest (0.987 weighted versus 0.990 default).

  • The tuned configuration (400 trees, max_depth=12, min_samples_split=50, balanced class weights) scores 0.989, slightly above the weighted Random Forest and within 0.001 of the default Random Forest.

  • The three Random Forest variants are therefore nearly indistinguishable on validation AUCPR; the tuned model is carried forward as the final model in Section 6.

In [5]:
#Section 6

from sklearn.metrics import average_precision_score

def evaluate_aucpr(model, X_tr, y_tr, X_v, y_v):
    train_probs = model.predict_proba(X_tr)[:, 1]
    val_probs = model.predict_proba(X_v)[:, 1]
    
    return (
        average_precision_score(y_tr, train_probs),
        average_precision_score(y_v, val_probs)
    )

# Score each model once and unpack (train, validation) AUCPR into columns
rf_models = {
    'RF Default': rf_default,
    'RF Weighted': rf_weighted,
    'RF Tuned': rf_tuned
}

comparison = pd.DataFrame(
    [
        (name, *evaluate_aucpr(model, X_train, y_train, X_val, y_val))
        for name, model in rf_models.items()
    ],
    columns=['Model', 'Train_AUCPR', 'Validation_AUCPR']
)

comparison

# Final model evaluation on test set
test_probs = rf_tuned.predict_proba(X_test)[:, 1]
test_preds = rf_tuned.predict(X_test)

from sklearn.metrics import classification_report

print(classification_report(y_test, test_preds))

test_aucpr = average_precision_score(y_test, test_probs)
test_aucpr

import matplotlib.pyplot as plt
import seaborn as sns

feature_importance = pd.DataFrame({
    'Feature': X_train.columns,
    'Importance': rf_tuned.feature_importances_
}).sort_values(by='Importance', ascending=False)

feature_importance.head(10)

plt.figure(figsize=(10, 6))
sns.barplot(
    data=feature_importance.head(10),
    x='Importance',
    y='Feature'
)
plt.title('Top 10 Feature Importances - Final Fraud Model')
plt.show()
              precision    recall  f1-score   support

           0       1.00      1.00      1.00     11464
           1       0.95      0.99      0.97       536

    accuracy                           1.00     12000
   macro avg       0.98      0.99      0.98     12000
weighted avg       1.00      1.00      1.00     12000

[Figure: Top 10 Feature Importances - Final Fraud Model (x: Importance, y: Feature)]

Section 6: Model Performance Comparison and Final Model Selection

In this section, the performance of all trained fraud detection models is compared across training, validation, and test datasets to assess generalization, robustness, and potential overfitting. Model selection is guided by validation performance, interpretability, and operational considerations relevant to real-world fraud detection systems.

The selected model is then evaluated on a held-out test set to confirm its ability to generalize to unseen data. Finally, feature importance analysis is used to interpret model behavior and identify the key behavioral signals driving fraud predictions.

Final Model Selection

Among the evaluated models, the three Random Forest variants are nearly tied on validation AUCPR: the default model scores 0.9897, the tuned model 0.9888, and the class-weighted model 0.9874. The tuned Random Forest is therefore not the top scorer, but it trails the default by less than 0.001 while constraining tree growth (max_depth=12, min_samples_split=50), which is intended to limit overfitting. The comparison table built in this section's code reports training and validation AUCPR for each variant so that the gap can be checked directly.

The model’s ensemble structure provides robustness to noisy features and enables the capture of complex, non-linear interactions across transaction, behavioral, and contextual variables. The tuned, class-weighted model reaches 0.99 recall at 0.95 precision on the fraud class of the test set, which favors fraud capture at the cost of a small number of additional false positives. Based on these factors, the tuned Random Forest is selected as the final fraud detection model.

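Test Set Performance

Evaluation on the held-out test dataset (12,000 transactions, 536 of them fraudulent) shows that the final model maintains strong precision–recall performance on previously unseen transactions: 0.95 precision, 0.99 recall, and 0.97 F1 on the fraud class. These figures are in line with the validation results, which suggests that the model is not overfitting to the training data and generalizes to new transactions drawn from the same distribution, a prerequisite for deployment in a real-world fraud detection environment.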

Feature Importance Analysis

Analysis of feature importance reveals that the most influential predictors include:

  • Transaction velocity indicators

  • Deviations in transaction amount

  • Merchant-related risk features

  • Cross-border and geographic activity

These features are consistent with established fraud typologies, such as rapid transaction bursts, abnormal spending behavior, and geographic inconsistencies. The alignment between model-driven insights and domain knowledge enhances interpretability and supports trust in the model’s predictions.

In [6]:
#Section 7

feature_importance.head(10)

import numpy as np
import pandas as pd

# Use predicted probabilities from the selected final model (rf_tuned)
test_probs = rf_tuned.predict_proba(X_test)[:, 1]

def risk_bucket(p, low=0.20, high=0.60):
    if p < low:
        return "Approve"
    elif p < high:
        return "Step-up Verification"
    else:
        return "Block/Manual Review"

policy = pd.DataFrame({
    "fraud_probability": test_probs,
    "recommended_action": [risk_bucket(p) for p in test_probs],
    "actual_is_fraud": y_test.values
})

policy["recommended_action"].value_counts()

policy.groupby("recommended_action")["actual_is_fraud"].mean().sort_values(ascending=False)
Out[6]:
recommended_action
Block/Manual Review     0.954792
Step-up Verification    0.166667
Approve                 0.000437
Name: actual_is_fraud, dtype: float64

Section 7: Business Insights and Recommendations

This project developed a behavior-driven fraud detection framework that integrates unsupervised anomaly detection with supervised machine learning to improve fraud identification while minimizing unnecessary customer friction. The approach emphasizes early risk detection, scalable modeling, and operational interpretability, aligning technical performance with real-world fraud management objectives.

7.1 Key Findings

Fraud is rare but systematically detectable through behavioral signals

The dataset exhibits significant class imbalance, consistent with real-world payment systems. This reinforces the need for precision-recall–based evaluation metrics and specialized modeling techniques, such as class-weighted learning, to ensure rare fraud events are effectively captured.

Behavioral features provide strong discrimination between fraud and legitimate activity

Exploratory analysis and feature importance results highlight transaction velocity, transaction magnitude deviations, geographic inconsistencies, device novelty, and merchant risk as highly informative predictors. These features reflect behavioral anomalies rather than isolated attribute values.

Anomaly detection is most effective as a triage mechanism

While statistical baselines (Z-score and IQR) offer simple interpretability, they generate excessive false positives in high-dimensional data. Isolation Forest improves anomaly detection by modeling multivariate interactions and is particularly effective for prioritizing suspicious transactions, including novel or evolving fraud patterns.
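The contrast between per-feature baselines and a multivariate detector can be sketched as follows. This is an illustration on synthetic data, assuming scikit-learn; the two "odd" points, the correlation structure, and the `contamination` value are invented for the example. It shows one limitation of univariate checks (they cannot see points that are unremarkable on each axis but break the joint pattern) and does not reproduce the false-positive comparison from this project.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Two strongly correlated features (assumption: e.g. amount vs. typical amount)
normal = rng.multivariate_normal([0, 0], [[1.0, 0.95], [0.95, 1.0]], size=2000)

# Points that look ordinary on each axis but violate the joint relationship
odd = np.array([[1.8, -1.8], [-1.7, 1.9]])
X = np.vstack([normal, odd])

# Univariate Z-score rule: flag if any single feature is more than 3 SD from its mean
z = np.abs((X - X.mean(axis=0)) / X.std(axis=0))
zscore_flags = (z > 3).any(axis=1)

# Isolation Forest scores each point by how easily random splits isolate it
iso = IsolationForest(n_estimators=200, contamination=0.01, random_state=0).fit(X)
iso_scores = -iso.score_samples(X)  # negated so that higher = more anomalous

# Share of all points scoring below each odd point (closer to 1.0 = ranked more anomalous)
odd_percentiles = [(iso_scores < s).mean() for s in iso_scores[-2:]]
print("Z-score flags for odd points:", zscore_flags[-2:])
print("Isolation Forest score percentiles for odd points:", odd_percentiles)
```

Ranking by `iso_scores` rather than thresholding them fits the triage use described above, since analysts can work down the list as capacity allows.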

Random Forest models deliver the strongest supervised fraud detection performance

Among the evaluated classifiers, Random Forest achieves the highest AUC-PR (area under the precision–recall curve) and the most favorable precision–recall balance. Incorporating class weights and hyperparameter tuning further improves recall for rare fraud cases while maintaining manageable false-positive rates, making it well-suited for operational deployment.

7.2 Recommended Fraud Detection Workflow (Hybrid Approach)¶

A practical, production-ready fraud detection strategy should adopt a layered decision pipeline:

Rule-Based Screening (Fast Filtering)

Apply basic validation and exclusion rules (e.g., blocked merchants, impossible locations, known compromised cards) to quickly eliminate obvious fraud cases with minimal computational cost.

Anomaly Detection Layer (Behavioral Risk Scoring)

Use Isolation Forest to flag transactions exhibiting unusual behavioral patterns, such as sudden transaction bursts, new device usage combined with cross-border activity, or abnormal spending behavior.

Purpose: Prioritize potentially risky transactions, including previously unseen fraud behaviors.

Supervised Fraud Scoring Layer (Final Classification)

Apply the selected supervised model (tuned Random Forest) to estimate fraud probability using labeled historical patterns.

Purpose: Enable consistent, data-driven risk scoring for downstream decisioning.
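The three layers above could be wired together roughly as follows. This is a sketch only: `Transaction`, `BLOCKED_MERCHANTS`, `COMPROMISED_CARDS`, `rule_screen`, and `evaluate` are hypothetical names, and the two scoring callables stand in for the fitted Isolation Forest and the tuned Random Forest. How the anomaly score feeds into the final decision is not specified here, so the sketch simply returns both scores for downstream decisioning.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Transaction:
    card_id: str
    merchant_id: str
    amount: float


# Hypothetical rule lists; a real system would load these from a managed source
BLOCKED_MERCHANTS = {"m_blocked"}
COMPROMISED_CARDS = {"c_stolen"}


def rule_screen(txn: Transaction) -> Optional[str]:
    """Layer 1: return a reason string if a hard rule fires, else None."""
    if txn.merchant_id in BLOCKED_MERCHANTS:
        return "blocked_merchant"
    if txn.card_id in COMPROMISED_CARDS:
        return "compromised_card"
    return None


def evaluate(
    txn: Transaction,
    anomaly_score_fn: Callable[[Transaction], float],
    fraud_prob_fn: Callable[[Transaction], float],
) -> dict:
    """Run the layers in order; the models are skipped when a rule already fired."""
    reason = rule_screen(txn)
    if reason is not None:
        return {"stage": "rules", "reason": reason,
                "anomaly_score": None, "fraud_probability": None}
    return {
        "stage": "model",
        "reason": None,
        "anomaly_score": anomaly_score_fn(txn),   # Layer 2: behavioral risk score
        "fraud_probability": fraud_prob_fn(txn),  # Layer 3: supervised score
    }


# Usage with stand-in scorers (constants instead of fitted models)
result = evaluate(Transaction("c1", "m1", 42.0), lambda t: 0.3, lambda t: 0.7)
print(result)
```

Running the cheap rule check first means the model layers are only invoked for transactions that are not already decided.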

7.3 Risk-Based Action Policy¶

To balance fraud loss prevention with customer experience, transaction decisions should follow a risk-tiered policy:

  • Low Risk: Automatically approve transactions

  • Medium Risk: Apply step-up verification (e.g., OTP, 3DS challenge, CVV re-validation)

  • High Risk: Decline transactions or route to manual review based on business policy

Decision thresholds should be calibrated using validation data to meet operational objectives such as acceptable false-decline rates, analyst review capacity, and targeted fraud capture levels.
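One way such a calibration might look is sketched below, using NumPy only. The function name `calibrate_thresholds`, the two operational limits, and the synthetic validation scores are all assumptions made for illustration; they are not values or procedures taken from this project.

```python
import numpy as np


def calibrate_thresholds(val_probs, val_labels,
                         max_approve_fraud_rate=0.001,
                         max_review_share=0.02):
    """Pick (low, high) cut-offs from validation scores.

    high: score quantile that routes roughly max_review_share of traffic
          to the top tier (ties in scores can shift the exact share).
    low:  largest candidate cut-off at or below high for which the auto-approved
          set (p < low) keeps its observed fraud rate within the limit.
    """
    val_probs = np.asarray(val_probs, dtype=float)
    val_labels = np.asarray(val_labels)

    high = float(np.quantile(val_probs, 1.0 - max_review_share))

    low = 0.0
    for t in np.linspace(0.0, high, 101):
        approved = val_probs < t
        if approved.any() and val_labels[approved].mean() <= max_approve_fraud_rate:
            low = max(low, float(t))
    return low, high


# Synthetic validation scores (assumption: frauds tend to score higher)
rng = np.random.default_rng(0)
labels = (rng.random(5000) < 0.02).astype(int)
probs = np.clip(
    np.where(labels == 1,
             rng.normal(0.8, 0.15, 5000),
             rng.normal(0.05, 0.05, 5000)),
    0.0, 1.0,
)

low, high = calibrate_thresholds(probs, labels)
print(f"low={low:.3f}, high={high:.3f}")
```

The resulting pair could then be passed as the `low` and `high` arguments of `risk_bucket` in place of the fixed defaults, with the limits chosen to reflect review capacity and acceptable false-decline rates.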

7.4 Monitoring and Governance Considerations¶

A real-world deployment requires continuous oversight to maintain effectiveness and trust:

  • Model Drift Monitoring: Fraud tactics evolve over time, necessitating periodic retraining and threshold recalibration.

  • Alert Quality Monitoring: Tracking false positives helps reduce analyst fatigue and prevents unnecessary customer disruption.

  • Explainability and Audit Readiness: Maintaining feature importance analyses and model documentation supports regulatory compliance and stakeholder transparency.
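As one possible drift check, the score distribution in production can be compared against a reference window using the population stability index (PSI). The sketch below is an assumption about how monitoring might be implemented, not part of the project code; the function name, bin count, and synthetic samples are illustrative.

```python
import numpy as np


def population_stability_index(reference, current, n_bins=10, eps=1e-6):
    """PSI between two samples, using quantile bins taken from the reference."""
    reference = np.asarray(reference, dtype=float)
    current = np.asarray(current, dtype=float)

    # Interior bin edges from reference quantiles; outer bins are open-ended
    interior = np.quantile(reference, np.linspace(0, 1, n_bins + 1))[1:-1]

    ref_counts = np.bincount(
        np.searchsorted(interior, reference, side="right"), minlength=n_bins
    )
    cur_counts = np.bincount(
        np.searchsorted(interior, current, side="right"), minlength=n_bins
    )

    # Clip shares away from zero so the logarithm stays finite
    ref_share = np.clip(ref_counts / len(reference), eps, None)
    cur_share = np.clip(cur_counts / len(current), eps, None)

    return float(np.sum((cur_share - ref_share) * np.log(cur_share / ref_share)))


# Synthetic example: an unchanged distribution versus a shifted one
rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, 5000)
psi_same = population_stability_index(reference, rng.normal(0.0, 1.0, 5000))
psi_shifted = population_stability_index(reference, rng.normal(1.0, 1.0, 5000))
print(f"PSI (no shift)={psi_same:.4f}, PSI (shifted)={psi_shifted:.4f}")
```

The same function can be applied per input feature as well as to model scores; what PSI level should trigger retraining or threshold recalibration is a policy choice to be set during deployment.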

7.5 Opportunities for Future Enhancement¶

Potential extensions to this work include:

  • Cost-sensitive optimization and probability calibration to explicitly balance fraud losses against customer friction.

  • Time-aware modeling approaches, such as rolling windows or temporal feature engineering, to better capture evolving fraud dynamics.

  • Advanced anomaly detection techniques, including autoencoders and ensemble anomaly detectors.

  • External data enrichment, incorporating merchant category trends, device fingerprinting signals, or regional fraud intelligence feeds.
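To make the first extension concrete, a cost-sensitive threshold search might look like the sketch below. The cost figures, the function name `best_cost_threshold`, and the toy scores are hypothetical; real costs would come from fraud-loss and customer-friction estimates that this project does not quantify.

```python
import numpy as np


def best_cost_threshold(probs, labels,
                        cost_missed_fraud=100.0,
                        cost_false_alarm=1.0):
    """Return the (threshold, total_cost) that minimizes total cost on the given data."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels).astype(bool)

    candidates = np.linspace(0.0, 1.0, 101)
    costs = []
    for t in candidates:
        flagged = probs >= t
        missed = np.sum(labels & ~flagged)        # fraud that was not flagged
        false_alarms = np.sum(~labels & flagged)  # legitimate but flagged
        costs.append(missed * cost_missed_fraud + false_alarms * cost_false_alarm)

    best = int(np.argmin(costs))  # first minimum if several thresholds tie
    return float(candidates[best]), float(costs[best])


# Toy example with perfectly separated scores
threshold, cost = best_cost_threshold(
    probs=[0.05, 0.15, 0.85, 0.95], labels=[0, 0, 1, 1]
)
print(f"threshold={threshold:.2f}, cost={cost:.1f}")
```

Because the search relies on the scores behaving like probabilities, it pairs naturally with the probability calibration mentioned in the same bullet.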

Conclusion¶

This project demonstrates the value of a behavior-driven, risk-based fraud detection framework that integrates anomaly detection with supervised machine learning to support real-time transaction decisioning. Exploratory analysis revealed clear behavioral signals associated with fraud, including abnormal transaction velocity, elevated amounts, geographic inconsistencies, and interactions with new devices or merchants. Anomaly detection models, particularly Isolation Forest, effectively surfaced unusual behavioral patterns, while supervised classifiers provided accurate fraud identification and enabled actionable risk segmentation.

The final model outputs were translated into operational recommendations, showing that the majority of transactions can be safely approved, while a smaller, higher-risk subset is routed to step-up verification or manual review. On the test set, the observed fraud rate was about 0.04% in the Approve tier, 16.7% in the Step-up Verification tier, and 95.5% in the Block/Manual Review tier. This tiered decision strategy reflects a practical balance between fraud loss prevention and customer experience, ensuring that friction is applied only where risk is materially elevated. Importantly, feature importance analysis confirmed that the model’s decisions align with domain-relevant fraud indicators, supporting interpretability and regulatory transparency.

Overall, the proposed solution offers a scalable and explainable approach to modern fraud detection, capable of adapting to evolving behavioral patterns while minimizing false declines. With continuous monitoring, threshold tuning, and periodic retraining, this framework can meaningfully reduce fraud losses, optimize analyst workload, and improve customer trust in digital payment systems.