# I. Introduction

Sources of crime data grew out of the work of the sociologist Émile Durkheim in the 1890s, when suicide rates across different populations were first treated as quantitative data. Sources of crime data changed massively during the 20th century. In the 1910s, recorded convictions, environment and social experiences were used as statistics to generate hypotheses for study or to test hypotheses about proneness to criminal behaviour. Since the 1950s, criminology has seen the rise of many attempts to measure crime in a quantitative context, mainly in British criminology, owing to the large number of social scientists who developed criminological theories (Dantzker and Hunter, 2000). With the development of data collection and analytical methods, many of the old sources and measures have been modified or have continued to be used in one form or another up to the present day.

While researchers in quantitative criminology have proposed many sources for measuring crime over the years, never before has a large-scale analysis and evaluation of these data sources been conducted to determine which are most useful for measuring crime. Such an evaluation is long overdue: if we are to know, for example, whether crime is increasing or decreasing, then we must use an accurate source of crime data in order to draw firm conclusions. The aim of this study is thus to analyse and evaluate the four most commonly available sources of crime data, in order to determine which source best captures the true extent of crime in a society. In addition, based on the results of these tests, a more comprehensive approach to measuring crime is proposed, which represents all categories of crime and covers the offences committed.

This paper is organized as follows: the next section discusses the various sources of crime data typically used in quantitative criminology.
Section three presents and describes the data, as well as the three analytical methods that will be used. Analytical test results and their interpretation are included in section four, while the conclusions drawn by this study are discussed in section five.

# a) Major Sources of Crime Data

A variety of data sources for measuring crime have evolved over the years. Each source has different strengths and limitations. The most frequently cited data sources are those collected from official/national crime statistics: official documentation by government and quasi-government agencies. What follows is a survey of these data sources; it is useful to define each one and to consider briefly its respective advantages and disadvantages.

# i. Police Crimes Records (PCR)

This source is also known as Crime Related Statistics (CRS) or Police Crime Statistics (PCS). Whatever name it is given, this source records all the crimes (felonies, misdemeanors, infractions) detected by the police or reported to them. More specifically, police records often include any member of society who has committed a crime, as well as crimes cleared by arrest. The main advantage of this data source is that it provides a government with a summarized account of the crime information obtained regionally and nationally by identifying trends and patterns in illegal behavior. A disadvantage of PCR is that unless a crime has been reported to the police and classified as a criminal act or an offence, it will not be recorded. For example, sexual assaults or sexual offences are not always (immediately) reported to the police, or go unrecorded (i.e. are reported to the police but not recorded as an offence), or, in some cases, are reported long after the incident has occurred. Also, there are times when victims are more willing to report an incident or a crime to the police and, conversely, times when they are less willing to do so.
Another disadvantage of PCR is that victimless crimes (e.g. prostitution, public order offences, etc.) and all minor crimes are also excluded from being recorded, not to mention that offending activities do not always result in an arrest. For example, incidents of assault between people who know each other are less likely to be reported to the police or recorded by the police (being considered a private matter) than incidents of assault between two strangers or incidents of assault involving a weapon, a sharp instrument or injury.

# ii. Victim Surveys (VS)

This source of data aims to record crimes that have not been recorded by the police or have not been reported to the authorities, and in this way to show the so-called 'dark figure' or 'grey figure' of crimes occurring in a society. This source is usually compiled through surveys and interviews with various members of the public. Victim surveys can be conducted at home, by visiting door to door, or over the phone. Asking people (individuals, households, members of a neighborhood, etc.) whether they have been victims of crime, and of which crimes, is a good way to measure crime and to let people speak about their attitudes toward the police and their concerns about crime. The primary advantage of this data source is that it can help in the analysis of reporting behaviour and can also identify the factors that affect reporting decisions. It is often suggested that this data source gives an indication of patterns of crime within society and, in particular, of crimes committed against different sociological and minority groups (e.g. in cases where a range of varied people is involved). An additional advantage is that this data gives an indication of crimes that might not otherwise be reported or considered a criminal act.
One of the main weaknesses of this data is that it records incidents and actions that the police might not consider criminal, which increases the tendency for some types of crime to be over-reported or exaggerated. Because they depend on an individual's honesty and personal understanding of how he or she has been affected by crime, the reliability of victim surveys is questionable: individuals may provide exaggerated responses or false information. Another disadvantage is that victim surveys account only for crimes committed against individuals, i.e. commercial or corporate crimes are not recorded.

# iii. Offender Surveys (OS)

Surveys of offenders are used just like victimization surveys, but are addressed to offenders. The surveys often ask what crime or how many crimes the offender has committed. The main advantage of this data source is that it detects some victimless crimes that have escaped police attention, such as illegal drug use, prostitution, public order and delinquency crimes, as well as rarely reported crimes such as shoplifting. However, offender surveys have potential for bias. It is often recognized that these surveys reflect the biases and personal career objectives of those involved in reporting crimes. For example, there is sometimes a tendency to under-report more serious crimes (e.g. sexual offences) or to remove the suspects (who are likely to have been detected and convicted) for some serious offences from the sampling frame.

# iv. Self-Report Studies (SRS)

Like surveys of victims and offenders, this data source asks particular groups or a sample of people whether they have themselves committed a crime in a particular period of time. This measure is especially helpful in revealing much about crimes that are victimless or less often observed, and also in identifying hidden offenders who are not caught or detected by the police.
In particular, this data source makes it possible to find out about the social characteristics of offenders such as age, gender, social class, and even location. Besides these advantages, this data source also has a number of disadvantages. It does not make good use of a representative sample of a society. Most self-report studies focus on minor crimes and on young people and students, asking them about their involvement in criminality and law breaking. There are no such studies on professional criminals or drug traffickers, for example. Another disadvantage is that this data depends on the honesty of those being surveyed. That is, respondents may lie or exaggerate about their criminal behaviour and, even if they do not deliberately seek to mislead, they may simply be mistaken about their criminal history.

# v. Court Records (CR)

This data source records all convictions for criminal offences. It provides accurate information about how many offenders are heard by a court and tried or imprisoned for reported crimes or offences, and what crimes they were convicted of. This data source also provides statistics on the type and volume of cases that are received and processed through the criminal court system of a country. However, some believe that one disadvantage of court records is that they underestimate the true extent of crime. That is, after the police identify and arrest a suspect, a relevant court may decide that there is insufficient evidence to mount a prosecution. Another disadvantage is that a jury may not be convinced by the prosecution's case. A further disadvantage is that in cases where a single incident involves multiple offences (e.g. burglary and rape), the offenders are tried and convicted of only one of the offences they have actually committed (i.e. the most serious crime), and in cases where multiple offences have been committed by the same person, the offenders are tried and convicted of only a few of the many offences they have actually committed.

# vi. Prison Records (PR)

Prison records or statistics provide accurate information about the total number of offenders, or how many offenders actually enter prison to serve their sentences, and the types of crimes they have committed. The major advantage of this data source is that it shows the relationship between prison numbers and levels and types of crime, and thereby reveals scope for community solutions to prevent or reduce crime. Another major advantage of prison statistics is that they provide important information relating to prisoners' general categorization, such as ethnicity, gender, religion, sexuality or disability, and prisoners' group types or categories, such as imprisoned juveniles, elderly prisoners, foreign prisoners and minority ethnic prisoners, with statistics for the main types of crimes they have committed. In addition to these advantages, prison statistics provide information on the criminal justice system, such as prisoner re-offending and ex-offenders, prison rehabilitation and education, budgets and costs, staffing, violence, mental health, drugs and alcohol. Prison statistics nevertheless suffer from specific disadvantages related to sentencing policies, which may be politically determined. If a government decides on a series of severe measures to restrict, for example, burglaries, theft or drug crimes, then this might translate into severe sentencing policies, which result in more people being imprisoned for those offences even if the actual rate of offending has not really changed.

# vii. Observation and Reports (OR)

Crimes are usually detected in two ways: observation and reports by other people.
Observation is used to measure crime when crimes such as traffic offences and victimless crimes are observed directly by the police. Reports by other people (e.g. households, individuals, neighbourhoods, etc.) are also used to measure crime when someone goes to the police and reports a crime that he or she either observed or was told about by someone else. If we rely on observation or reports by other people as the means of detecting crime or informing the police of it, we find that many crimes are not well measured. This source of data is far from being the most efficient way to provide information about the actual crime rate in a society. For example, there are many cases where shoplifting, theft, or drug use will neither be observed by the police nor reported by other people. Therefore, crimes like shoplifting, drug possession and sales, etc. will not be accurately measured.

In summary, the foregoing discussion shows that there is a wide range of available data sources used to measure different categories of crime and provide statistics on each type, which may be useful for different purposes. It also shows that no single source has a complete advantage over the others; rather, these data sources might be complementary and could be used alongside each other. Each data source has strengths and weaknesses, and each provides different information on the nature and extent of crime in a society. Thus a study that attempts to address particular questions or solve particular problems through the analysis of sources of crime statistics should use one, two, or as many data sources as are relevant to its particular research aim. Figures for crimes that are uncautioned, untried or unsentenced were excluded. These data sources are used by central and local government and the police service for planning and monitoring service delivery and for resource allocation.
They are also used to inform public debate about crime and the public policy response to it. These crimes are shown in Table 1.

# II. Data and Methods

# iii. Data representation

In the current application, in order to conduct a fair analysis and comparison of the most commonly used data sources of crime statistics, it is necessary that each data source be submitted to the same analytical methods and tested using the thirty-six types of crime listed in Table 1 above. To do so, the vector space model (VSM) was used to represent each data source mathematically; that is, each data source was represented as a vector profile carrying the same (types of crime) information. Once each data source had been represented as a vector profile, the resulting set of vectors was stored as a matrix in which the rows are the data sources and the columns the types of crime. That is, the current data is represented as a 12 x 36 data matrix $D$ in which $D_i$ (for $i = 1 \ldots m$) is the $i$'th crime measure, $D_j$ (for $j = 1 \ldots n$) is the $j$'th crime, and $D_{ij}$ is the value of crime $j$ for measure $i$.

# b) The Methods

The field of quantitative criminology is fundamentally a 20th century movement, with the appearance of and major advances in computing technology occurring during and immediately after World War II. What began with an emphasis on suicide rates across different populations gradually became focused on methodological and statistical tools, and as a result quantitative criminology has developed rapidly. In brief, the field of quantitative criminology now regularly employs univariate and bivariate statistical methods (e.g. Boba, 2012). Univariate statistical methods measure only a single variable, for example, frequency distributions or graphical representations of murder.
Univariate methods examine crimes in terms of a single variable, and the results derived from them are therefore described as a simple form of statistical analysis. Bivariate statistical methods measure relationships between two variables, for example, murder rate and burglary rate, or violent crime and total average income. Common bivariate methods are linear regression, measures of association, the t-test, and Pearson's correlation.

This study does not, however, use statistical methods, because the analysis of the relevant data is not statistical. The reasoning which led to the decision not to take a statistical approach is as follows. The position adopted here is that each data source of crime statistics consists of various types of crime that have values, that these sources cannot be described by one or even two descriptive crimes, and that simultaneous analysis of numerous crimes is required to evaluate or explain the different measures of crime accurately. Each measure of crime is a combination of more or less numerous crimes, but univariate analysis permits investigation of only one characteristic of a crime at a time and bivariate analysis permits only two; results for different characteristics are not always, or even usually, compatible, and the consequence is unclear overall results. This means that univariate and bivariate statistical methods are insufficient for present purposes, and that, if statistical methods are to be used, a multivariate methodology is required. The main class of multivariate statistical methods is multivariate regression, which investigates the relationship between more or less numerous independent variables and one or more dependent ones.
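The distinction drawn above between univariate and bivariate analysis can be made concrete with a small sketch. It assumes a stand-in for the 12 x 36 data matrix filled with random placeholder values (not figures from this study), and the helper names `column`, `mean` and `pearson` are hypothetical. It illustrates that a univariate summary looks at one crime, a bivariate measure looks at one pair, and covering all 36 crimes pairwise already requires 630 separate analyses.

```python
import itertools
import math
import random

# Hypothetical stand-in for the 12 x 36 data matrix D described in the text:
# rows are data sources (crime measures), columns are types of crime.
# The values are random placeholders, NOT figures from the study.
N_SOURCES, N_CRIMES = 12, 36
rng = random.Random(0)
D = [[rng.randint(0, 1000) for _ in range(N_CRIMES)] for _ in range(N_SOURCES)]


def column(matrix, j):
    """Return the values of crime j across all data sources."""
    return [row[j] for row in matrix]


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Univariate: one crime at a time (here, the mean of crime 0 over the sources).
univariate_summary = mean(column(D, 0))

# Bivariate: one pair of crimes at a time (here, crimes 0 and 1).
bivariate_r = pearson(column(D, 0), column(D, 1))

# Covering every pair of the 36 crimes needs one bivariate analysis per pair.
n_pairs = sum(1 for _ in itertools.combinations(range(N_CRIMES), 2))
print(n_pairs)  # 36 * 35 / 2 = 630
```

Each of the 630 pairwise results describes only two crimes, which is one way of seeing why a method that treats all 36 variables simultaneously is preferred in what follows.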
At an early stage of the research reported here, however, it became clear that selection of sets of independent and dependent variables was problematic: which variables should be independent, which dependent, and why should the sets, once selected, have an independent-dependent relationship? There may well be answers to these questions, but the decision was taken to abandon multivariate regression and to use an entirely different class of methods. In principle, after all, deciding on the best measures to give a clear picture of the extent of crime requires only that evidence be identified; that evidence does not have to be statistical in the sense of having been derived from regression analysis.

For this study, cluster analysis was used. Cluster analysis divides data into clusters based on information found in the data that describes the data items and their relationships. The data items within a cluster are similar or related to one another (since they share common characteristics) and different from or unrelated to the data items in other clusters (since they do not share common characteristics). There is a large number of cluster analysis methods and a large literature associated with each. An extensive range of these methods is discussed in, for example, Moisl (2015) and Everitt et al. (2001). The methods used here were Agglomerative Hierarchical Clustering (AHC), Principal Components Analysis (PCA), and U-matrix Self-Organizing Map (SOM). The rationale for using these methods is that, as is often recognized, a single class of methods cannot safely be relied on, and at least one additional method or class of methods must be used to corroborate the results from hierarchical analysis: (i) AHC is based on preservation of distance relations in data space, (ii) PCA is a non-hierarchical method based on preservation of data variance, and (iii) U-matrix SOM is a nonlinear method based on preservation of data topology.

# i. Agglomerative Hierarchical Cluster Analysis (AHCA)

Hierarchical clustering is characterized by a treelike structure called a cluster hierarchy or dendrogram. Most hierarchical methods fall into a category called agglomerative clustering. In this category, clusters are consecutively formed from vectors on the basis of the smallest of all the pairwise distances between the vectors. Let $X = \{x_1, x_2, x_3, \ldots, x_n\}$ be the set of vectors. We begin with each vector representing an individual cluster. We then sequentially merge these clusters according to their similarity. First, we search for the two most similar clusters, that is, those with the smallest distance between them, and merge them to form a new cluster in the dendrogram or hierarchy. In the next step, we merge another pair of clusters and link it to a higher level of the hierarchy, and so on until all the vectors are in one cluster. This allows a hierarchy of clusters to be constructed from left to right or from bottom to top. The proximity between two vector profiles is calculated as the Euclidean distance between the values taken on by the two vectors. Euclidean distance is the actual geometric distance between vectors in the space: the square root of the sum of the squared differences in the variables' values. For two variables this is expressed by the function:

$$d_{\text{Euclidean}}(x, y) = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2}$$

with each further variable contributing a further squared-difference term under the square root.

AHCA is not one specific method but a family of related methods, often minor variants of each other, and it can seem difficult to select an appropriate method for a particular study, since all of them operate in a similar way but differ in their calculation (i.e. in how distance between clusters is measured). Four AHCA methods based on squared Euclidean distance were selected for the analyses that follow: single linkage, complete linkage, average linkage, and the Ward method, the aim of which was to examine and differentiate the four data sources at an individual rather than group level with the aid of 21 types of crimes.

Given the data matrix D of 12 data sources, each described by 36 crimes, principal component analysis re-described the 12 data sources in terms of a smaller number of variables, such that most of the variability in the original variables was retained. This allowed us to plot the 12 data sources in two-dimensional space and to directly perceive the resulting clusters. The principal components analysis was carried out in a four-stage procedure. The first step was the construction of a symmetric proximity matrix for distances among vectors. The second was the construction of an orthogonal basis for the covariance matrix in such a way that each axis was the least-squares best fit to one of the n directions of maximum variation in D. The third was the selection of dimensions, in which we removed the axes that had relatively little variation and kept an m-dimensional basis for D, where m