Page 107 - Proceedings of the 2017 ITU Kaleidoscope
Challenges for a data-driven society
within the framework is central to deriving a useful spatial analysis. This means that when crime data from a spatial region is analysed alongside other spatial data (from another region/province), there is a tendency for over-fitting or under-fitting in the emerging model or pattern, leading to poor predictions on new data sets. Thus, the spatial characteristics of data within a specified proximity are crucial during analysis. This proximity-centred analysis can be achieved if local stations are empowered to analyse data from their own region or suburb effectively.

2.3. Paucity of Research in Crime Series Identification in Developing Nations

Over the past decade, there has been a significant research effort on crime mining, for example, in the area of hotspots and spatio-temporal related research [6],[7],[8], but there is a paucity of research in crime series identification, particularly in developing nations [9]. Moreover, while research on crime series identification seems to be gaining attention among researchers in developed countries such as the USA [10],[11], its exploration in developing nations is insignificant, despite its critical importance for improving public safety in smart city development.

Crime series analysis focuses on crimes thought to have been committed by the same individual or group of offenders; such crimes may not necessarily happen at hotspot locations [4]. Experience has shown that many crimes are due to repeat offenders [10],[11],[12]. However, our findings reveal that the crime intelligence units in most developing nations (e.g., South Africa) do not currently have an automated means of identifying these similar attributes or incidents. Hence this research focuses on the development of a crime series mining model, CriClust, augmented with a dual-threshold scheme, which applies established theoretical concepts from clustering (highly connected sub-graphs and similarity ranking) [13] to derive useful evidence for security agencies as a way to improve public safety outcomes in developing nations.

3. CRICLUST MODEL FORMULATION

3.1. Data Used: Rape Database

This work serves to assist in identifying CSP in rape data; however, it can be extended to other forms of crime. The motivation for considering rape crime is the fact that, despite the heightened sensitivity and understanding about sexual assault and violence, South African communities remain a place where rape, assault and murder of people (particularly women and children) are of great concern (see footnote 4).

Table 1 presents a description of some features and subjects considered in this research. The prefix on gender information (e.g., I-male, B-female) represents the different racial population categories in SA.

Table 1. Description of the different categories of important features considered

Features                         Categories
Victim and suspect information   Indian-male (I-male), Indian-female (I-female),
                                 Black-male (B-male), Black-female (B-female),
                                 White-male (W-male), White-female (W-female),
                                 Coloured-male (C-male), Coloured-female (C-female)
Method of victim capture         Lured, Kidnapped, Weapon, Deceit
Incident location/day/time       Time and location information
Substance abuse                  Suspected traces of substance (drug) abuse: (yes, no, unsure)
Suspect disguised (Masked)       (yes, no, unsure)

3.2. Problem Definition and Analysis

The proposition in this study is that most crime patterns exhibit at least a $k$ minimum principal set that characterises the MO of the offender(s). This minimum principal set induces a similarity graph of crime objects and has the capability to reveal specific and general crime trends. To identify crime series in a (rape) crime database, a hybrid model called CriClust, which combines similarity concepts, geometric projection, and graph connectivity (highly connected sub-graphs), was adopted. CriClust is augmented with a dual-threshold scheme. Firstly, a crime similarity function was derived, which is used to connect crime instances that share related attribute information, based on the dual-threshold scheme. The similar objects are then modelled as a graph structure, yielding a similarity graph based on an established graph-theoretic model, which is then partitioned into highly connected sub-graphs of related crimes [13].

Let $C$ be a set of crime items or objects, where each crime object, say $C_i \in C$, is defined by a set of attributes $A(C_i)$ with cardinality $F$. Our interest lies in crime objects that exhibit a coherent pattern on a subset of the attributes of $A$. This requires understanding the different characteristics of a data set and prioritising features that will promote the goal of the analysis. The measure used in this work identifies attribute similarity between crimes $C_i$ and $C_j$ based on two important thresholds $S$ and $P$, for sufficiently high (strict) coherence, where $S$ is the interest similarity support measure (significance threshold) and $P$ is the prevalence support threshold. The following definitions are therefore introduced:

Definition 1. (Instance Feature (IF)) Consider a crime $C_i \in C$ and a feature $f$. Let $P_f(C_i)$ be the value of the $f$-th feature in $C_i$. For example, if the crime $C_2$ occurs on a Monday, then $P_{\text{day}}(C_2) = \text{Monday}$.
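Definition 1 can be read as a simple lookup of a feature value on a crime record. The sketch below illustrates that reading and pairs it with an equality check in the style of the standard Kronecker delta (1 if two values match, 0 otherwise). The dictionary representation, the function names, and the sample records are assumptions made for illustration; they are not the paper's data format, and the sketch does not show how per-feature scores are combined with the thresholds $S$ and $P$.

```python
from typing import Dict, Hashable

# Assumption: a crime object is modelled as a mapping from feature name to value.
Crime = Dict[str, Hashable]


def instance_feature(crime: Crime, feature: str) -> Hashable:
    """Return P_f(C_i), the value of feature f in crime C_i."""
    return crime[feature]


def kronecker_delta(a: Hashable, b: Hashable) -> int:
    """Standard Kronecker delta: 1 if the two values are equal, else 0."""
    return 1 if a == b else 0


def feature_similarity(ci: Crime, cj: Crime, feature: str) -> int:
    """Binary similarity on one feature: delta(P_f(C_i), P_f(C_j))."""
    return kronecker_delta(instance_feature(ci, feature),
                           instance_feature(cj, feature))


# Hypothetical records; feature names and values are invented for illustration.
c1 = {"day": "Monday", "capture_method": "Lured", "masked": "no"}
c2 = {"day": "Monday", "capture_method": "Kidnapped", "masked": "no"}

print(instance_feature(c2, "day"))                   # Monday
print(feature_similarity(c1, c2, "day"))             # 1
print(feature_similarity(c1, c2, "capture_method"))  # 0
```

A binary per-feature score of this kind keeps categorical attributes such as those in Table 1 directly comparable without imposing an artificial ordering on them.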
4 http://rapecrisis.org.za/

We define a binary feature similarity function $S_f$ using the Kronecker delta function, where $S_f(c_i, c_j)$ takes on values in