PETsChallenge

Code underlying: Privacy-Preserving Membership Queries for Federated Anomaly Detection

8 contributors

Description

Privacy-Preserving Feature Extraction for Detection of Anomalous Financial Transactions


This repository holds the code written by the PPMLHuskies for the 2nd Place solution in the PETs Prize Challenge, Track A.

Description

The task is to predict the probability that each transaction is anomalous, given a synthetic database of international transactions and several synthetic databases of banking account information. We provide two solutions. The first, our centralized approach in solution_centralized.py, uses the transactions database (PNS) and the banking database with no privacy protections. The second, which provides the robust privacy guarantees outlined in our report, follows a federated architecture and is found in solution_federated.py and model.py. In this approach, PNS data resides in one client, banking data is divided across other clients, and an aggregator handles all communication between clients. We have built in privacy protections so that clients and the aggregator learn minimal information about each other while communicating to detect anomalous transactions in PNS.

Training and inference are fundamentally the same in the centralized and federated architectures, apart from the privacy protections in the latter. Several new features are engineered from the given PNS data, and a model is trained on those features. Next, during inference, a check determines whether attributes of a PNS transaction match the banking data and whether the associated account in the banking data is flagged. If any attribute fails to match or the account is flagged, the transaction receives a validation value of 1, and 0 otherwise. Lastly, we take the maximum of the probability inferred by the PNS model and the result of the banking data validation as our final prediction of the probability that the transaction is anomalous.
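The combination step described above can be illustrated with a minimal sketch. This is not the code from solution_centralized.py; the function names, the dictionary keys ("account", "name", "flagged"), and the choice to treat an unknown account as a mismatch are all assumptions made for illustration.

```python
def bank_validation_flags(transactions, accounts):
    """Return 1.0 for each transaction whose bank check fails, else 0.0.

    transactions: list of dicts with hypothetical keys "account" and "name".
    accounts: dict mapping account id -> {"name": str, "flagged": bool}.
    """
    flags = []
    for tx in transactions:
        record = accounts.get(tx["account"])
        amiss = (
            record is None                    # assumption: unknown account counts as a mismatch
            or record["name"] != tx["name"]   # attribute does not match the banking data
            or record["flagged"]              # account is flagged in the banking data
        )
        flags.append(1.0 if amiss else 0.0)
    return flags


def final_scores(model_probs, flags):
    """Element-wise maximum of the model probability and the validation flag."""
    return [max(p, f) for p, f in zip(model_probs, flags)]


# Hypothetical toy data, not taken from the challenge datasets.
accounts = {
    "A1": {"name": "Alice", "flagged": False},
    "B2": {"name": "Bob", "flagged": True},
}
transactions = [
    {"account": "A1", "name": "Alice"},   # matches, not flagged
    {"account": "A1", "name": "Alicia"},  # name mismatch
    {"account": "B2", "name": "Bob"},     # matches, but flagged
]
model_probs = [0.12, 0.30, 0.85]

flags = bank_validation_flags(transactions, accounts)
scores = final_scores(model_probs, flags)
print(scores)  # [0.12, 1.0, 1.0]
```

Because the validation result is either 0 or 1, taking the maximum means a failed bank check always forces the final score to 1, while a passed check leaves the model's probability unchanged.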

The difference between the federated and centralized logic is that in the federated setup, where the banking data is split into one or more partitions across clients, the PNS client engages in a cryptographic protocol based on homomorphic encryption with the banking clients, routed through the aggregator, to perform feature extraction. To ensure privacy, so that PNS learns nothing from the banks beyond the set membership of a select few features, this protocol is carried out over r rounds, where r = 7 + n and n is the number of bank clients.
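As a small worked example of the round count, the helper below evaluates r = 7 + n. The function name is hypothetical and does not come from the repository; it only restates the formula given above.

```python
def protocol_rounds(num_bank_clients: int) -> int:
    """Number of protocol rounds r = 7 + n for n bank clients."""
    if num_bank_clients < 1:
        # The text describes one or more banking partitions, so n >= 1.
        raise ValueError("at least one bank client is required")
    return 7 + num_bank_clients


print(protocol_rounds(1))  # 8
print(protocol_rounds(3))  # 10
```

The round count grows linearly with the number of bank clients, so each additional bank adds one round to the protocol.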

Programming languages
  • Rust 83%
  • Markdown 7%
  • Python 5%
  • Other 4%
  • Other 1%
License
  • Apache-2.0
Packages
data.4tu.nl
