Path: blob/master/ML/Notebook/Credit card fraud detection.ipynb
Problem Statement:
The Credit Card Fraud Detection Problem involves modeling past credit card transactions, using the knowledge of which ones turned out to be fraudulent. This model is then used to identify whether a new transaction is fraudulent or not. Our aim here is to detect 100% of the fraudulent transactions while minimizing incorrect fraud classifications.
Data Description
The dataset contains transactions made by credit cards in September 2013 by European cardholders. It covers transactions that occurred over two days, with 492 frauds out of 284,807 transactions.
The dataset is highly unbalanced: the positive class (frauds) accounts for 0.172% of all transactions.
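To make the imbalance concrete, here is a minimal sketch that reproduces the arithmetic from the two counts quoted above. The variable names are illustrative only. It also computes the accuracy of a trivial baseline that labels every transaction as non-fraud, which shows why accuracy alone is a weak yardstick on this data.

```python
# Counts quoted in the data description above.
n_fraud = 492
n_total = 284_807

# Share of the positive (fraud) class among all transactions.
fraud_rate = n_fraud / n_total

# Accuracy of a trivial baseline that predicts "non-fraud" for every transaction.
# It is right on every non-fraud case and wrong on every fraud case.
baseline_accuracy = (n_total - n_fraud) / n_total

print(f"fraud share: {fraud_rate:.4%}")
print(f"all-non-fraud baseline accuracy: {baseline_accuracy:.4%}")
```

Such a baseline scores very high on accuracy while catching no fraud at all, so sensitivity and precision are the more informative measures here.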
The dataset consists of numerical values from the 28 ‘Principal Component Analysis (PCA)’ transformed features, namely V1 to V28. Furthermore, there is no metadata about the original features provided, so pre-analysis or feature study could not be done.
The ‘Time’ and ‘Amount’ features are not transformed. The feature 'Amount' is the transaction amount; it can be used for example-dependent cost-sensitive learning. The feature 'Class' is the response variable: it takes the value 1 in case of fraud and 0 otherwise.
There are no missing values in the dataset.
Important terms
True Positive: The fraud cases that the model predicted as ‘fraud.’
False Positive: The non-fraud cases that the model predicted as ‘fraud.’
True Negative: The non-fraud cases that the model predicted as ‘non-fraud.’
False Negative: The fraud cases that the model predicted as ‘non-fraud.’
Threshold Cutoff Probability: The probability cutoff at which the true positive rate and the true negative rate are jointly highest. This cutoff tends to be small, which is reasonable because the probability of fraud is low.
Accuracy: The measure of correct predictions made by the model – that is, the ratio of fraud transactions classified as fraud and non-fraud classified as non-fraud to the total transactions in the test data.
Sensitivity: Sensitivity, or True Positive Rate, or Recall, is the ratio of correctly identified fraud cases to total fraud cases.
Specificity: Specificity, or True Negative Rate, is the ratio of correctly identified non-fraud cases to total non-fraud cases.
Precision: Precision is the ratio of correctly predicted fraud cases to total predicted fraud cases.
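The terms above can be sketched in plain Python. This is a minimal illustration, not the notebook's own code: the function names, the toy labels, and the scores are all hypothetical. The threshold rule used here (pick the cutoff that maximizes sensitivity + specificity, which is equivalent to maximizing Youden's J statistic) is one possible reading of the "Threshold Cutoff Probability" definition, stated as an assumption.

```python
def confusion_counts(y_true, scores, threshold):
    """Count TP, FP, TN, FN when a score >= threshold is flagged as fraud."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, scores):
        predicted_fraud = score >= threshold
        if predicted_fraud and label == 1:
            tp += 1      # fraud predicted as fraud
        elif predicted_fraud and label == 0:
            fp += 1      # non-fraud predicted as fraud
        elif not predicted_fraud and label == 0:
            tn += 1      # non-fraud predicted as non-fraud
        else:
            fn += 1      # fraud predicted as non-fraud
    return tp, fp, tn, fn


def metrics(tp, fp, tn, fn):
    """Return the four ratios defined above; an empty denominator yields 0.0."""
    def ratio(num, den):
        return num / den if den else 0.0

    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": ratio(tp, tp + fn),   # recall / true positive rate
        "specificity": ratio(tn, tn + fp),   # true negative rate
        "precision": ratio(tp, tp + fp),
    }


def best_threshold(y_true, scores):
    """Assumed rule: choose the cutoff maximizing sensitivity + specificity."""
    best_sum, best_cutoff = None, None
    for cutoff in sorted(set(scores)):
        m = metrics(*confusion_counts(y_true, scores, cutoff))
        total = m["sensitivity"] + m["specificity"]
        if best_sum is None or total > best_sum:
            best_sum, best_cutoff = total, cutoff
    return best_cutoff


# Hypothetical toy data: 1 = fraud, 0 = non-fraud, with made-up model scores.
y_true = [0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.01, 0.02, 0.03, 0.05, 0.20, 0.40, 0.30, 0.90]

cutoff = best_threshold(y_true, scores)
counts_at_cutoff = confusion_counts(y_true, scores, cutoff)
print(cutoff, counts_at_cutoff, metrics(*counts_at_cutoff))
```

On real data the candidate cutoffs would come from the model's predicted probabilities on a validation set, and a different trade-off (for example, weighting missed frauds more heavily) may be more appropriate than the equal weighting assumed here.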
Insight
The mean transaction amount in this data set is around 88 dollars. The largest transaction had a monetary value of around 25,691 dollars.
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-11-4de0c76d0126> in <module>()
4 plt.ylabel('Count')
5 plt.xlabel('Class (0:Non-Fraudulent, 1:Fraudulent)')
----> 6 prints(counts.index)
NameError: name 'prints' is not defined
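The NameError arises because `prints` is not a Python built-in; the intended call is most likely `print`. The rest of the failing cell is not shown, so the following is only a dependency-free sketch of the same idea: `labels` is a hypothetical stand-in for the 'Class' column, and a `collections.Counter` stands in for the notebook's `counts` object (which exposes `.index` and is therefore presumably a pandas object). The plotting calls are omitted.

```python
from collections import Counter

# Hypothetical stand-in for the 'Class' column (0 = non-fraudulent, 1 = fraudulent).
labels = [0, 0, 0, 1, 0, 0, 1, 0]

# Tally how many transactions fall in each class.
counts = Counter(labels)

# Class labels present in the data, analogous to `counts.index` in the failing cell.
class_labels = sorted(counts)

# The built-in is `print`, not `prints`.
print(class_labels)
print(dict(counts))
```

In the original cell, replacing `prints(counts.index)` with `print(counts.index)` should be enough to clear the error, assuming `counts` is defined earlier in that cell.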