A Jungle Computing Approach to Large-Scale Online Forensic Analysis

Programming tools that simplify application development and deployment

Image: West Midlands Police (CC License)

Computing devices, including mobile phones, feature in many day-to-day crimes. Computer forensics has emerged as a discipline that helps law enforcement agencies address the increasing use of digital storage devices in criminal acts. Forensic examination of, for example, mobile phones and personal computers can reveal a wealth of evidence.

Increasingly, high-profile criminal cases benefit from digital evidence gathered through computer forensic examination. However, analyzing these large-volume data sets can be very time-consuming because of the variety of the data and the quantity of potential evidence in a digital environment. For this reason, the Netherlands Forensic Institute (NFI) designed the HANSKEN platform, an important aid in modern police investigation. It is capable of micro-level analysis of the digital traces contained in devices such as hard disks and mobile phones, and of generating macro-level forensic views.

The HANSKEN platform requires a wide variety of computing hardware, all at once. The concurrent use of such a variety of hardware has been applied in scientific and industrial applications, spawning a new computing paradigm: Jungle Computing. The hardware used in Jungle Computing can take many forms, ranging from a single centralized machine consisting of heterogeneous components (e.g., multicore CPUs, GPUs, and FPGAs) to large-scale distributed systems that combine multiple clusters, grids, and cloud systems (each potentially heterogeneous in itself).

The complexity of Jungle Computing Systems has generated a need for programming tools that simplify application development and deployment. Above all, such tools must hide the idiosyncrasies of the underlying hardware as much as possible. Moreover, they must allow programmers to efficiently integrate multiple compute kernels, each potentially implemented in a different language or model (e.g., C, MPI, Python, CUDA); to easily combine and integrate different types of data, potentially from different locations; and to easily deal with dynamic computing needs (software malleability and scalability) and ad hoc hardware availability (hardware malleability and fault tolerance). This project focuses on developing a high-quality set of technologies that meets all these requirements.

The project aims to apply Jungle Computing to the extremely demanding domain of forensic analysis described above, in particular by extending and adapting the HANSKEN platform. Important requirements underlying the HANSKEN platform include high performance, full coverage of all available (possibly distributed) traces, the ability to support a wide variety of trace analysis compute kernels, and direct access for various types of police investigators. Although the HANSKEN approach has proven successful (showing an 80-fold speed improvement over NFI’s current XIRAF system), the expected growth in data volumes, the need for multi-tenancy, and in particular the need for deep analysis of multimedia traces put further demands on the platform. This proposal aims to realize a Jungle Computing-enabled version of HANSKEN that meets all these requirements.

Driven by the demands of the forensic analysis domain, expected outcomes of the proposed project include:

Through the collaboration between VU University, NFI, and the Netherlands eScience Center, forensic digital analysis is expected to provide faster insights into forensic casework, as well as new links between cases that would otherwise not have been found (for example, cross-trace camera identification).

Participating organisations

Natural Sciences & Engineering

Impact

1. Author(s): Ana Oprescu
   Published in 2013

1. Author(s): Ben van Werkhoven, Willem Jan Palenstijn, Alessio Sclocco
   Published in 2020
2. Author(s): Stijn Heldens, Pieter Hijma, Ben van Werkhoven, Jason Maassen, Henri Bal, Rob van Nieuwpoort
   Published in 2020
3.

Output

1. Author(s): Jason Maassen, Niels Drost, Henri E. Bal, Frank J. Seinstra
   Published in Proceedings of the 2011 Workshop on Dynamic Distributed Data-Intensive Applications, Programming Abstractions, and Systems, ACM, 2011, pages 7-18
   DOI: 10.1145/1996010.1996013

1. Author(s): Jurriaan H. Spaaks, Jason Maassen
   Published in 2015

Team

Contact person

Jason Maassen

eScience coordinator

Netherlands eScience Center

0000-0002-8172-4865

Ben van Werkhoven

Senior eScience Research Engineer

Netherlands eScience Center

0000-0002-7508-3272

Henri Bal

Principal investigator

Vrije Universiteit Amsterdam

0000-0001-9827-4461

Related projects

Enhance Your Research Alliance (EYRA) Benchmark Platform

Supporting researchers to easily set-up benchmarks

Finished

SecConNet Smart

Smart, secure container networks for trusted big data sharing

Finished

A methodology and ecosystem for many-core programming

Boosting the performance of current and future programs

Finished

Generic eScience Technologies

Making breakthroughs in data-driven research

Finished

eStep

Developing an eScience technology platform

Finished

Related software

Dive

Interactively explore millions of 2D and 3D data points in your browser, without the need to install anything.

Kernel Tuner

Kernel Tuner greatly simplifies the development of highly-optimized and auto-tuned CUDA, OpenCL, and C code, supporting many advanced use-cases and optimization strategies that speed up the auto-tuning process.
