A Jungle Computing Approach to Large-Scale Online Forensic Analysis

Programming tools that simplify application development and deployment

Image: West Midlands Police (CC License)

Computing devices (including mobile phones) feature in many day-to-day crimes. Computer forensics has emerged as a discipline to assist law enforcement agencies in addressing the increasing use of digital storage devices in criminal acts. Forensic examination of, for example, mobile phones and personal computers can reveal a wealth of evidence.

Increasingly, high-profile criminal cases benefit from digital evidence gathered through computer forensic examination. However, analyzing these large volumes of evidence can be very time-consuming, owing to the variety of the data and the quantity of potential evidence in a digital environment. For this reason, the Netherlands Forensic Institute (NFI) designed the HANSKEN platform – an important aid in modern police investigation, capable of micro-level analysis of digital traces contained in digital devices such as hard disks and mobile phones, and of generating macro-level forensic views.

The HANSKEN platform requires a wide variety of computing hardware – all at once. The concurrent use of such a variety of hardware has been applied in scientific and industrial applications, spawning a new computing paradigm: Jungle Computing. The hardware used in Jungle Computing can take many forms – ranging from a single centralized machine consisting of heterogeneous hardware components (e.g., multicore CPUs, GPUs, and FPGAs) to large-scale distributed systems consisting of combinations of multiple clusters, grids, and cloud systems (each potentially being heterogeneous itself).

The complexity of Jungle Computing Systems has generated a need for programming tools that simplify application development and deployment. Above all, such tools must hide the idiosyncrasies of the underlying hardware as much as possible. Moreover, such tools must allow programmers to efficiently integrate multiple compute kernels, each potentially implemented using different languages or models (e.g., C, MPI, Python, CUDA), to easily combine and integrate different types of data (potentially from different locations), and to easily deal with dynamic computing needs (software malleability and scalability) and ad hoc hardware availability (hardware malleability and fault tolerance). This project focuses on the development of a high-quality set of technologies that meet all these requirements.

The project aims to apply Jungle Computing to the extremely demanding domain of forensic analysis described above, in particular by extending and adapting the HANSKEN platform. Important requirements underlying the HANSKEN platform include: high performance, full coverage of all available (possibly distributed) traces, the ability to support a wide variety of trace analysis compute kernels, and direct access for various types of police investigators. Although the HANSKEN approach has proven successful (showing an 80-fold speed improvement over NFI’s current XIRAF system), the expected growth in data volumes, the need for multi-tenancy, and the need for deep analysis of multimedia traces in particular place further demands on the HANSKEN platform. This proposal aims to realize a Jungle Computing enabled version of HANSKEN that meets all these requirements.

Driven by the demands of the forensic analysis domain, expected outcomes of the proposed project include:

Based on the collaboration between VU University, NFI and the Netherlands eScience Center, it is expected that forensic digital analysis will provide faster insights into forensic casework as well as new links between cases that would otherwise not have been found (for example, cross-trace camera identification).

Participating organisations

Nederlands Forensisch Instituut
Natural Sciences & Engineering
Netherlands eScience Center
Vrije Universiteit Amsterdam

Impact

Output

  • 1.
    Author(s): Jurriaan H. Spaaks, Jason Maassen
    Published in 2015

Team

Jason Maassen
eScience coordinator
Netherlands eScience Center
Ben van Werkhoven
Senior eScience Research Engineer
Netherlands eScience Center
Henri Bal
Principal investigator
Vrije Universiteit Amsterdam

Related projects

Enhance Your Research Alliance (EYRA) Benchmark Platform

Supporting researchers to easily set-up benchmarks

Finished

SecConNet Smart

Smart, secure container networks for trusted big data sharing

Finished

A methodology and ecosystem for many-core programming

Boosting the performance of current and future programs

Finished

Generic eScience Technologies

Making breakthroughs in data-driven research

Finished

eStep

Developing an eScience technology platform

Finished

Related software

Dive

Interactively explore millions of 2D and 3D data points in your browser, without the need to install anything.

Kernel Tuner

Kernel Tuner greatly simplifies the development of highly-optimized and auto-tuned CUDA, OpenCL, and C code, supporting many advanced use-cases and optimization strategies that speed up the auto-tuning process.
