Analyzing the Violence in Rio de Janeiro

Sebastiao Ferreira de Paula Neto
14 min read · Mar 4, 2021

Samba and violence: two words that come to mind when we talk about Rio de Janeiro.

Hello, my name is Sebastião and I am a data scientist. In this article, I present an analysis of violence in the state of Rio de Janeiro.

So let’s start by talking about the state itself:

Rio de Janeiro, Brazil

The state of Rio de Janeiro is home to one of South America’s megalopolises (an urbanized region consisting of several metropolises and conurbated metropolitan regions) and has the second-largest economy in Brazil. Its high density of industries, as well as its cultural and geographical characteristics, make it one of the most visited places in Brazil.

Figure 1: Christ the Redeemer, Rio de Janeiro’s main tourist attraction.

Its capital, also named Rio de Janeiro, is affectionately called the “Cidade Maravilhosa” because of its numerous tourist sites. The best known is Christ the Redeemer, which holds the title of one of the 7 wonders of the world. Although it is among the sites most sought after by visitors to the “Cidade Maravilhosa”, the favelas have been attracting growing interest from tourists, according to the newspaper O Globo.

There is a stereotype that the favelas are responsible for the city’s reputation for violence. Now that they are in demand as a tourist destination, does that attribution have any basis?

There is nothing better than analyzing history to find answers.

History of the Favelas

At the beginning of the 20th century, the city of Rio de Janeiro underwent urban reforms, such as the renovation of the central avenue, that caused an internal migration of the population to the outskirts of the city, where irregular constructions were built. This migration was the beginning, in Rio de Janeiro, of what we know today as favelas. These places were seen as regions of crime and disease, and were not even recognized by society.

During the military regime, practices were instituted to vacate these areas. According to the state, they were not successful, and they resulted in 139,000 inhabitants being removed from the favelas. This practice not only caused a huge population increase in other places but also made both the new and the former areas even more violent.

Without jobs and with a high density of people, drug trafficking centers emerged in these places as a source of profit and control, which made already violent places even more violent. Urbanization and security measures were adopted; however, these places remain violent.

According to Ipea (Institute for Applied Economic Research), whose “Atlas of Violence” project has gathered data since 1980, Rio de Janeiro appears as the most violent place in the country.

This statistic resulted from what we call the drug trafficking war. All these characteristics have particularities that can be interpreted as signs of a violent city.

Although the government has put measures in place to reverse these statistics, can you state with certainty that Rio de Janeiro is still one of the most violent places in the country?

You would at least be in doubt, and if you answered YES, you would be wrong.

Based on DATASUS (the information technology department of SUS), in conjunction with the department of national security that evaluates the number of homicides, Rio de Janeiro is sixth in the homicide ranking, behind Bahia, Minas Gerais, Pernambuco, São Paulo, and Pará. Moreover, since 2008 its numbers have only decreased.

If the city is considered violent, where is all this violence to be found? This and other questions are what we want to answer with this notebook.

Obtaining Data

The Institute of Public Security (ISP) of Rio de Janeiro records the incidents that occur in the state. These records are available as open data on its website.

The data are collected from Registros de Ocorrência (RO), the police incident reports. Before being made available, they go through quality control by the Corregedoria Interna of the Civil Police (COINPOL).

The ISP also offers a dashboard, an interactive presentation that relates crimes to regions of the state. This can be accessed through Link.

The site offers several data sets; for this analysis, we will use the data set “Security statistics: monthly historical series in the state since 01/1991”, which is provided in CSV format.
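As a minimal sketch of how such a CSV could be loaded with pandas: the tiny inline sample below stands in for the real file, and the `year` and `month` column names, the `;` separator, and the helper `load_isp` are assumptions for illustration, not details confirmed by the ISP file.

```python
import io
import pandas as pd

# Tiny stand-in for the ISP file; the numbers are made up.
# `year`, `month` and the ';' separator are assumptions for illustration.
SAMPLE_CSV = """year;month;hom_culposo;roubo_celular
1991;1;10;
1991;2;12;
1991;3;11;
"""

def load_isp(source, sep=";"):
    """Read the monthly series into a DataFrame (hypothetical helper)."""
    return pd.read_csv(source, sep=sep)

# With the real file, `source` would be the path of the downloaded CSV.
df = load_isp(io.StringIO(SAMPLE_CSV))
print(df.head())  # first entries; empty fields are read as NaN
```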

Data preprocessing

In this step, we will assess how the data is arranged and what should be done to prepare it for analysis. This includes checking whether there are missing values, how to treat them, and whether the data has outliers, among other checks covered below.

Loading the data, we look at the first 5 entries:

From them we can make some observations:

  • The data are occurrence counts for a given month and year.
  • The counts are categorized by crime, indicators, and relevant information.
  • There are many missing values (NaN), so we need to define how to work with them, understand how they are collected, and, if possible, what information they can provide.
  • Each entry corresponds to 1 month of collected data.

Evaluating these missing values is of great importance for the analyses. So we will check where missing values are found in the whole database:
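A check like this can be sketched with pandas as follows. The toy frame and its values are made up and only stand in for the real data set.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the ISP data; values are made up.
df = pd.DataFrame({
    "year": [1991, 1991, 1992, 1992],
    "hom_culposo": [10, 12, 11, 9],
    "roubo_celular": [np.nan, np.nan, np.nan, 4],
    "furto_celular": [np.nan, np.nan, 3, 5],
})

missing = df.isna().sum()                 # NaN count per column
missing = missing[missing > 0].sort_values(ascending=False)
share = (missing / len(df)).round(2)      # fraction of rows that are missing

print(pd.DataFrame({"missing": missing, "share": share}))
```

Only columns with at least one missing value are kept, sorted from most to least affected.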

Thirty “features” show missing values. Since these are count values, you might think:

“Just fill everything with zero, since they are all missing.”

Or even:

“Use some way to remove rows or columns”

DON’T DO IT!!!

It would be convenient to assume that these occurrences simply did not happen, but we have no way of knowing the true values. So we can’t assume that they are zero, or remove them: the values may simply not have been recorded, and some of these features represent a large part of the dataset.

In many cases, filling or removing values is an important part of data preparation, but in this context it is not appropriate because it can bias the analysis.

So we have to work with the data we have. This means working with each feature from the date it started being recorded in the database. Analyzing the missing values by year, we arrive at the following information:

  • The features below have, roughly, no recorded values until 1998: lesao_corp_morte, hom_por_interv_policia, hom_culposo, lesao_corp_culposa, roubo_celular, registro_ocorrencias, ameaca and sequestro.
  • The features below have, roughly, no recorded values until 2003: pol_civis_mortos_serv, pol_militares_mortos_serv, encontro_ossada, pessoas_desaparecidas, extorsao, sequestro_relampago, estelionato, furto_coletivo, furto_celular, roubo_conducao_saque and roubo_apos_saque.
  • The features below have, roughly, no recorded values until 2006: aaapai, cmp, cmba, apf, apreensao_drogas_sem_autor, posse_drogas and trafico_drogas.
  • The features below have, roughly, no recorded values until 2013: roubo_bicicleta and furto_bicicleta.

If we want to work with all variables, we should analyze the data from 2014 onward.
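One way to find the year from which each feature is available, and the year from which all of them are, could look like the sketch below. The toy frame, the `year` column name, and the helper `first_recorded_year` are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Toy frame; values and years are made up.
df = pd.DataFrame({
    "year": [2012, 2013, 2014, 2015],
    "hom_culposo": [10, 12, 11, 9],
    "roubo_celular": [np.nan, np.nan, 2, 3],
})

def first_recorded_year(frame, year_col="year"):
    """Map each feature to the first year with a non-missing value.

    Assumes every feature has at least one recorded value.
    """
    features = frame.columns.drop(year_col)
    return {c: int(frame.loc[frame[c].notna(), year_col].min()) for c in features}

start = first_recorded_year(df)
# Keep only the period in which every feature is already being recorded.
complete = df[df["year"] >= max(start.values())]
print(start)
```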

Exploring the data

Having checked the organization of the data, we will now evaluate how the values are distributed. For this we will use the describe function of the pandas library, which gives us summary statistics for each feature. Therefore, we have:
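For reference, a small sketch of what `describe` returns, using made-up values for two of the features:

```python
import pandas as pd

# Toy values; the last roubo_veiculo entry is deliberately extreme.
df = pd.DataFrame({
    "roubo_veiculo": [100, 110, 105, 400],
    "roubo_banco": [1, 0, 2, 1],
})

stats = df.describe()  # count, mean, std, min, quartiles and max per column
print(stats.loc[["mean", "50%", "max"]])
```

A mean well above the median (the `50%` row), as in the toy `roubo_veiculo` column, is a hint that some values sit far from the rest.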

When analyzing the summary statistics, some of the features that show values far from their averages are:

lesao_corp_culposa, roubo_em_coletivo, roubo_carga, roubo_veiculo, roubo_residencia, roubo_banco, cmba and extorsao.

To assess this further we will use the box plot, which uses these statistics to identify outliers in the set. Got confused?

Let’s have a brief explanation:

The whiskers of the plot mark the limits beyond which values are treated as outliers, and the edges of the box are the 25th (Q1) and 75th (Q3) percentiles. To calculate the outlier thresholds the following rule is used:

  • Compute the interquartile range (IQR), where IQR = Q3 − Q1;
  • The lower outliers are the values below Q1 − 1.5⋅IQR;
  • The upper outliers are the values above Q3 + 1.5⋅IQR.
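The rule above can be sketched in a few lines. The function name `iqr_outliers` and the sample values are made up for illustration, and pandas’ default linear interpolation is used for the quartiles.

```python
import pandas as pd

def iqr_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR] and the two limits."""
    s = pd.Series(values, dtype="float64")
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return s[(s < lower) | (s > upper)], (lower, upper)

# Made-up monthly counts with one extreme month.
outliers, (lower, upper) = iqr_outliers([10, 12, 11, 13, 12, 50])
print(lower, upper, outliers.tolist())
```

Here Q1 = 11.25 and Q3 = 12.75, so the limits are 9.0 and 15.0 and only the value 50 is flagged.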

Therefore, applying the box plot method we have:

Although outliers are present, we need to verify how much they really interfere with the distribution, so we checked how the data were produced. They were collected by counting each occurrence in the database. This gives us a very relevant piece of information: in this specific case the extreme values can be trusted as real counts, because the database has a quality control step in which each case is filtered.

Similarity Analysis

To identify associations between occurrences a heat map was generated with the correlation indices.

Analysis of the correlation

To establish any relationship between variables we first need to analyze the correlation coefficient. This is a good indicator of the association between two variables, i.e. whether they vary together.

For this, we use a heat map of the correlation matrix, which evaluates the relationship of every variable with every other. On the main diagonal the value is maximum because it is the feature’s relationship with itself. To interpret the values we follow these rules, where r is the correlation coefficient:

  1. Values of r lie in the range -1 ≤ r ≤ 1;
  2. r at either extreme means perfect correlation, with the sign giving only the direction;
  3. Rescaling a variable does not change the value of r;
  4. r does not capture non-linear relationships.
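These rules can be checked on a small synthetic example. The column names below are made up, and `DataFrame.corr` computes the Pearson coefficient by default.

```python
import pandas as pd

x = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
df = pd.DataFrame({
    "x": x,
    "linear_up": 2 * x + 1,       # perfect positive linear relation
    "linear_down": -3 * x,        # perfect negative linear relation
    "x_rescaled": 1000 * x,       # same variable on another scale
    "parabola": (x - 3) ** 2,     # strong but non-linear relation
})

corr = df.corr()  # Pearson correlation matrix
print(corr["x"].round(2))
```

The parabola depends entirely on x, yet its coefficient is 0, which illustrates rule 4. A matrix like `corr` is what a heat map function (for example seaborn’s `heatmap`) would take as input.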

The heatmap was generated from this correlation matrix, but with this number of variables it is a little difficult to read. To see it more clearly, access the link.

This data set covers 3 types of crimes:

  • Crimes against property
  • Crimes against life
  • Drug trafficking

It was also possible to verify an association between:

  • Crimes against life and robberies.

So let’s do an isolated analysis on each of these characteristics and then see how they are related.

Analysis of the crimes of Robbery

First of all, it is worth remembering the context of the state of Rio de Janeiro, which from the 1980s until 2008 was among the most violent states.

Does robbery follow this trend?

To evaluate this question we will plot a bar chart of each year relating the total number of robberies.

The data shows that robberies follow an upward trend until 2008. Since public measures to control crime were adopted in 2008, the decline in robberies that follows is noticeable.

In 2017 and 2018 the state had its highest number of occurrences, reaching in 2018 an annual average of 19,301 occurrences.

Knowing how robberies are distributed over the years, I wanted to see how the average number of robberies is distributed across the months. To do this, let’s plot a bar chart.
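The two aggregations behind these charts, totals per year and averages per calendar month, can be sketched as below. The `year` and `month` column names and all values are assumptions for illustration; only two of the robbery features are used.

```python
import pandas as pd

# Toy monthly records; values are made up.
df = pd.DataFrame({
    "year":  [2007, 2007, 2008, 2008],
    "month": [1, 2, 1, 2],
    "roubo_veiculo": [30, 32, 28, 25],
    "roubo_carga":   [10, 11, 9, 8],
})

# Sum every robbery feature into one monthly total (NaN is skipped by default).
robbery_cols = [c for c in df.columns if c.startswith("roubo_")]
df["robberies"] = df[robbery_cols].sum(axis=1)

yearly_total = df.groupby("year")["robberies"].sum()     # one bar per year
monthly_mean = df.groupby("month")["robberies"].mean()   # one bar per calendar month

print(yearly_total)
print(monthly_mean)
```

With matplotlib installed, `yearly_total.plot.bar()` would draw the corresponding bar chart.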

The peak falls in the period when most of Brazil, especially Rio de Janeiro, is celebrating Carnival. We can see that a social event of this magnitude greatly increases the occurrence of robberies.

As a consequence of the increase caused by Carnival, the upward tendency of robberies continues until May, and these months show the largest numbers of recorded occurrences.

The correlation analysis showed that commercial robbery has a considerable association with the apprehension of minors. To look at this, we will plot both over time and analyze the distribution of apprehensions of minors and commercial robberies.

We can say that commercial robberies have an indirect association with the apprehension of minors. This can be verified by looking at the trends in the graph.

With so many types of robberies, which is the most frequent in the state of Rio de Janeiro?

The most frequent are street robberies and robberies of passersby, a common crime in which individuals are approached while walking on public roads and have their belongings taken by force.

Analysis of crimes against life

When it comes to crimes against life, we are dealing with those that directly affect people. First, we will obtain information regarding the monthly distribution of these crimes. To do this we will make a bar chart of the average number of crimes in each month.

We can see that the highest occurrence is in March. When dealing with crimes against life, it is desirable to know how often the crimes recur, what their temporal trends are, and how they are associated, since most crimes involve more than one type of occurrence.

With this in mind, we will verify the associations between them, resorting once more to correlation indices.

Evaluating the heat map we can visualize a strong relationship between the following occurrences:

  • intentional homicide and violent lethality;
  • intentional bodily harm and threats;
  • intentional bodily harm, threats, and missing persons;
  • rape, attempted murder, missing persons, and violent lethality.

Intentional homicide: When there is intent to kill another individual.

In many of these cases, the violence implicit in this crime is very high.

Does the violent lethality data follow the intentional homicide data?

To verify this we will look at how both series evolve over the sample period and compare their distributions.
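A comparison of this kind could be sketched as follows: build a monthly date index, rescale both series so that different magnitudes share one axis, and compute their correlation. The column names `hom_doloso` and `letalidade_violenta`, the `year` and `month` columns, and all values are assumptions for illustration.

```python
import pandas as pd

# Toy monthly series; names and values are assumptions, not the real data.
df = pd.DataFrame({
    "year": [2000, 2000, 2000, 2000],
    "month": [1, 2, 3, 4],
    "hom_doloso": [100, 120, 110, 90],
    "letalidade_violenta": [130, 155, 140, 118],
})

# Build a monthly date index from the year and month columns.
df.index = pd.to_datetime(df[["year", "month"]].assign(day=1))

cols = ["hom_doloso", "letalidade_violenta"]
# Min-max scaling puts both series in [0, 1] so their shapes can be compared.
scaled = (df[cols] - df[cols].min()) / (df[cols].max() - df[cols].min())
r = df["hom_doloso"].corr(df["letalidade_violenta"])  # Pearson correlation

print(scaled)
print(round(r, 3))
```

Plotting `scaled` as lines would show whether the two series rise and fall together; `r` summarizes the same idea in one number.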

We can verify that, even with different magnitudes, the two series follow almost the same dynamics. Thus, we can confirm that intentional homicides tend to have a high association with violent lethality.

Threat: a fact, action, gesture, or word that intimidates or frightens.

Threats may or may not be substantiated, and may or may not be followed by related actions.

In this topic, we will evaluate whether threat occurrences result from, or follow the trends of, other occurrences.

When a threat is reported, there is usually a fear of injury or even death. This is why the report is filed, as a measure to prevent such actions. To evaluate this idea, let us look at how intentional bodily harm and threats relate over time:

We can see that the distribution of the two occurrences follows a similar pattern and the two can be associated.

However, the dynamics of threats do not fully follow the dynamics of bodily harm. Thus, threats must also be affected by other occurrences.

If we analyze the correlation indices, which are good indicators of association, we can verify that the occurrence of threats is related to manslaughter and missing persons. To evaluate this association we will again look at the distribution over time:

Based on the graph, we can verify that the occurrences of manslaughter and missing persons follow much of the trend of the distribution of threats.

Thus, we can conclude that the crimes of Bodily Harm, Manslaughter, and Missing Persons are associated with threats, as well as with each other.

Rape: a crime consisting of having sexual intercourse or performing libidinous acts without consent.

There are many hypotheses and even examples that link rape with other crimes. With this in mind, we will assess at this stage whether other occurrences are associated with rape.

Based on the correlation indices calculated above, we will first evaluate over time how the occurrences of attempted murder, missing persons, and violent lethality relate to rape.

We can say that the occurrences of Attempted Murder and Missing Persons are closely associated with the occurrences of rape. This can be seen both in their distribution over time and in the correlation indices.

Analysis of drug trafficking

Drug trafficking emerged in the 1980s as a way of generating income for those marginalized in the peripheral regions of Rio de Janeiro, as discussed above. We will analyze how this crime has been distributed over the years and how it has impacted people’s lives.

We will evaluate the total occurrences, based on the average over the entire sample period.

We can see that trafficking continued to grow until 2015, when it reached its highest average number, around 4,900 occurrences per year.

To analyze it in isolation, we will again resort to correlation indexes.

Through the correlation index, we can verify the association between:

  • drug seizure and (drug possession, drug trafficking, arrest in flagrante delicto, and arrest of a minor).
  • Other associations were found, but they are not covered here; as in many other cases, to view them go to Link.

Drug Seizure: Collection of narcotics in possession

This occurrence counts the times that drugs were seized in some way. Among the crimes related to drug trafficking, we can verify a high association between some occurrences.

To check it from another angle, the visual one, we will look at the distribution of this occurrence over time and identify trends:

Analyzing the distribution of the occurrences over time, we can verify that possession of drugs, drug trafficking, arrest in flagrante delicto, and arrest of a minor follow the same tendencies as drug seizures. Furthermore, we can say that they are closely related to each other, since the correlation index supports this.

With this in mind, let’s evaluate the average monthly number of drug seizures. Since it follows the trends of all the other occurrences, its average should reflect theirs.

We can see that the month with the highest number of occurrences related to drug trafficking is August.

Conclusion

Therefore, we can say that Carnival influences robberies in the state, where the month with the highest average number of occurrences is March. When we relate minors and robberies, the greatest association is with commercial robberies.

When we deal with threats, we see a high association with assault, manslaughter, and missing persons.

The occurrences of rape are associated with the crimes of intentional homicide, finding of bodies, violent death, missing persons, and attempted murder, which can also be associated with each other.

When it comes to drug trafficking, one can see a direct association between drug seizures and the occurrences of possession of drugs, drug trafficking, arrest in flagrante delicto, and arrest of a minor.

If you want to check out the full code, just go to my Github.
