Keeping Our Buildings Safe – Predicting Inspection Results

December 4, 2020 by Bonumic R&D Team


Introducing Machine Learning to the building inspection process could improve the quality of life of many tenants, reduce the burden on City Inspectors and even prevent tragic events.

Many cities around the world face a growing challenge: ensuring that their buildings are maintained to the required standards so that they remain safe for their inhabitants. The number of buildings has grown to the point where cities do not have the resources to inspect every building on the suggested schedule.

In San Francisco, where building codes and regulations are strict, more than 200,000 residential and commercial buildings require inspection. This is a mammoth undertaking for the Department of Building Inspection, the agency tasked with carrying out the inspections. As a result, the agency has to rank building inspections in order of priority, a practice that is not only time-consuming but also prone to error.

Our goal was to investigate how machine learning can be incorporated into this process to help predict which buildings will pass and which buildings will fail inspection. These predictions could significantly enhance the efficiency of building inspections, reducing the pressure on stretched resources.

Additionally, if the machine learning approach proved successful, others, such as the insurance industry, might find it useful for their own requirements. San Francisco, with its open data policy, made an excellent city for this investigation because data on past code violations is readily available.

Lack of Resources

The story is the same in different cities and different countries. Many cities are months behind on inspections. There are often endless calls regarding safety concerns that require follow up.

Tragic losses in several cities highlight the critical importance of timely inspections. Within California, the 2016 Oakland “Ghost Ship” fire claimed thirty-six lives. The warehouse, since described as a “fire trap”, had been illegally converted to living quarters and had at least nine violations of the fire code, among other violations. After the tragedy, Oakland city leaders admitted their “building inspections process was in dire need of an overhaul”.

In 2019, a pedestrian died in New York City when a piece of terra cotta fell from the façade of an office building. The death highlighted the immediate need to inspect more than 1,330 buildings that had been reported as unsafe. Twelve new inspectors were hired to manage façade inspections for more than 14,500 buildings with ageing façades and more than six stories.

Further afield, the 2017 fire at Grenfell Tower in the UK claimed seventy-two lives and injured an additional seventy-four residents. The fire was attributed to a horrendous failure by the regulators and the inspectors. One ignition source and the existence of at least seven dangerous conditions were blamed for the catastrophic event. The disaster prompted emergency inspections of over 4,000 high-rise apartment buildings in the United Kingdom.

Virtually everywhere, limited resources are causing a breakdown in managing inspections.

What does a Building Inspection Entail?

A typical building inspection considers a significant number of features of a property to check for violations of the building code and safety regulations. For each feature, the inspector judges whether there are any violations of the applicable code. Generally, if the inspector observes a violation, they will create a document outlining its nature.

Classification of building violations may vary by the location, but broad categories may look like the following:

Generally, if the inspector sees a violation, it will be classified, and the inspector will then provide further details about the specific violation. Violations can vary significantly in type and severity.

A few examples of violations may include:

How Can Machine Learning (ML) Help?

Machine learning for predictive modeling is not a new concept, nor is it restricted to real estate. There is documented research on predicting violations in the restaurant industry and in traffic safety.

We could use the existing data from years of building inspections to predict which buildings will fail inspection within a specific timeframe. This could lead to a dramatic reduction in the difficulty of prioritizing inspections. With machine learning, we would effectively be using artificial intelligence to develop a system to access all the available data and draw inferences from the patterns in the data. In essence, the system would be making informed guesses about the likelihood of violations.

Implementing these data-driven processes can empower city departments and help improve residents’ quality of life. This is the mission of Bonumic, an artificial intelligence (AI) and research project striving to advance the real estate industry by leveraging data and machine learning.

Building a Model to Represent Building Inspection Results

Our goal was to build a model that takes selected details from the violation notices as input and predicts what will happen over the next few months with respect to building inspections.

There are many aspects to building a successful model. The most important consideration was the algorithm, or the “set of instructions”, that would govern the model. There are other elements to consider:

Our scope was limited to actual violations noted in the City of San Francisco over 27 years between 1993 and 2020. There were more than 385,000 violation notices filed in that period and available through the open data policy of the city.

Challenges Encountered

Not surprisingly, we encountered some challenges regarding the data.

First of all, the data contained only documented violations. Inspections that did not result in a violation were not recorded, so our modelling approach had to be adjusted to account for this.

Moreover, many parcels in the dataset contain multiple condos, so there were a fair number of cases in which the same type of violation occurred several times on the same parcel on the exact same day. This required us to standardize and reshape the data before passing it to the modelling algorithm. In the standard format, each row contains all the necessary information about all the violations for a specific parcel on a specific date.

Another consideration was the irregular time intervals between violations. There is no regularity to the occurrence of building violations. Therefore, we needed to consider two main approaches:
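As a rough illustration of that reshaping step, the sketch below collapses raw notices into one row per parcel and date, with a count for each violation category. The field names, category labels and sample records are hypothetical and are not taken from the San Francisco dataset or from our actual pipeline.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical raw notices: (parcel_id, violation_date, category).
notices = [
    ("P-001", date(2019, 3, 4), "plumbing"),
    ("P-001", date(2019, 3, 4), "plumbing"),    # same type, same parcel, same day
    ("P-001", date(2019, 3, 4), "electrical"),
    ("P-001", date(2019, 9, 12), "fire"),
    ("P-002", date(2018, 1, 15), "electrical"),
]

def reshape(notices):
    """Collapse notices into one row per (parcel, date) with per-category counts."""
    grouped = defaultdict(Counter)
    for parcel_id, day, category in notices:
        grouped[(parcel_id, day)][category] += 1

    categories = sorted({category for _, _, category in notices})
    rows = []
    for (parcel_id, day), counts in sorted(grouped.items()):
        row = {"parcel_id": parcel_id, "date": day}
        for category in categories:
            row[category] = counts[category]  # a Counter yields 0 for unseen keys
        rows.append(row)
    return rows

rows = reshape(notices)
for row in rows:
    print(row)
```

Repeated violations of the same type on the same day become a count rather than duplicate rows, which is one simple way to keep that information while giving every parcel and date exactly one row.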

Do we try to create a model using data with regularized time intervals,
or,
do we create a model using data with irregular time intervals?

We decided to test both approaches and find out which gives better results.
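The difference between the two representations can be sketched as follows. This is only an illustration: the 30-day window width, the function names and the sample dates are assumptions, not the settings used in our models.

```python
from datetime import date

def irregular_gaps(dates):
    """Irregular view: days elapsed between consecutive violation dates."""
    ordered = sorted(dates)
    return [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]

def regular_counts(dates, start, end, step_days=30):
    """Regularized view: violations counted in fixed-width windows over [start, end)."""
    n_windows = -(-(end - start).days // step_days)  # ceiling division
    counts = [0] * n_windows
    for d in dates:
        if start <= d < end:
            counts[(d - start).days // step_days] += 1
    return counts

# Hypothetical violation dates for a single parcel.
dates = [date(2019, 1, 10), date(2019, 1, 25), date(2019, 4, 2)]

gaps = irregular_gaps(dates)
counts = regular_counts(dates, date(2019, 1, 1), date(2019, 5, 1), step_days=30)
print(gaps)    # [15, 67]
print(counts)  # [2, 0, 0, 1]
```

The regularized view produces many empty windows for parcels with few violations, while the irregular view keeps only the events that actually happened and encodes the timing as gaps.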

Our Approach

As with many other data science problems, there were numerous approaches we could have taken in search of the best results. In reality, the best possible result is rarely achieved: for most problems the solution space is effectively endless, and there is nowhere near enough time to test every possible approach.

We created two Machine Learning models that could provide useful information for City decision making:
First, we created a classification model, which predicts whether a violation will be repeated within the 120 days after it has occurred.
Secondly, we created a regression model, which estimates how many violations might be expected during the next inspection.
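To make the two prediction targets concrete, here is a minimal sketch of how they could be derived from one parcel's violation history. It rests on two simplifying assumptions that are ours for illustration only: "repeated" means any further violation on the same parcel, and the next recorded violation date stands in for the next inspection, since inspections without violations are not in the data. The function name and sample events are hypothetical.

```python
from datetime import date

HORIZON_DAYS = 120  # classification window mentioned in the text

def build_targets(events, horizon_days=HORIZON_DAYS):
    """events: list of (date, violation_count) for one parcel, one entry per date.

    Returns one (date, repeated_within_horizon, next_count) tuple per event.
    The most recent event has no known future, so its targets are None.
    """
    ordered = sorted(events)
    targets = []
    for (day, _), nxt in zip(ordered, ordered[1:] + [None]):
        if nxt is None:
            targets.append((day, None, None))
            continue
        next_day, next_count = nxt
        repeated = (next_day - day).days <= horizon_days  # classification label
        targets.append((day, repeated, next_count))       # next_count: regression target
    return targets

# Hypothetical history: (date, number of violations recorded on that date).
events = [(date(2019, 1, 10), 2), (date(2019, 3, 1), 1), (date(2020, 1, 5), 4)]
targets = build_targets(events)
for t in targets:
    print(t)
```

Rows whose targets are None would normally be dropped from training, because their outcome is not yet observed.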

The Essence of our Prediction Models

Our goal was to use Machine Learning to create a model that would learn a pattern of violations based on the time, frequency, quantity and violation types from previous occurrences of violations.
The critical variables of our models were the date of each violation and the category of violation. Most of our coding time went into “feature engineering”, a standard Machine Learning procedure that, when done well, often makes the difference between an average model and an excellent one. Once feature engineering was finished, we had over 200 predictive features for our models.
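The sketch below shows the kind of features that can be derived from just the date and category of past violations. The four features and their names are hypothetical examples chosen for illustration; they are not a list of the 200+ features used in our models.

```python
from datetime import date

def history_features(history, as_of):
    """history: list of (date, category) violations for one parcel.

    Only violations strictly before `as_of` are used, so the features never
    look into the future relative to the prediction date.
    """
    past = sorted((d, c) for d, c in history if d < as_of)
    if not past:
        return {"n_past": 0, "days_since_last": None, "n_last_365d": 0, "n_categories": 0}
    return {
        "n_past": len(past),                                  # quantity
        "days_since_last": (as_of - past[-1][0]).days,        # time
        "n_last_365d": sum(1 for d, _ in past if (as_of - d).days <= 365),  # frequency
        "n_categories": len({c for _, c in past}),            # violation types
    }

# Hypothetical parcel history.
history = [
    (date(2017, 5, 1), "plumbing"),
    (date(2019, 1, 10), "fire"),
    (date(2019, 3, 1), "plumbing"),
]
features = history_features(history, as_of=date(2019, 6, 1))
print(features)
```

Restricting each feature to data before the prediction date is the important design choice here: it keeps information from the period being predicted out of the model's inputs.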

Conclusions on Predictive Models Created

We created two types of model for each of the two modelling approaches. For both the regularized and the “irregular data” approach, one model predicts whether a violation will be repeated within the following 6 months and the other predicts how many violations might occur at the next inspection. The irregular-data approach proved more successful: the classification model had an accuracy of 0.81 and an AUC of 0.78 (an accuracy and AUC of 1 represent a perfect model), while the regression model had slightly lower performance.

As mentioned previously, we used the publicly available data on violations. However, that data contained no information about how many inspections were completed without a failure. We believe the model could be improved with a complete set of inspection records documenting both failed and passed inspections. Beyond that, there are many more ways in which the model could be significantly improved. Details on this, and on the overall process of creating and evaluating the model, can be found in the technical version on our GitHub page: https://github.com/flowandform/bonumic-ml/tree/master/CS2_modeling_city_violations.
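For readers unfamiliar with the two classification metrics, the following sketch computes them from scratch on a tiny made-up example. The labels and scores are invented for illustration and have nothing to do with our reported results; the 0.5 decision threshold is also an assumption.

```python
def accuracy(y_true, y_prob, threshold=0.5):
    """Share of cases where the thresholded prediction matches the true label."""
    preds = [1 if p >= threshold else 0 for p in y_prob]
    return sum(int(p == t) for p, t in zip(preds, y_true)) / len(y_true)

def roc_auc(y_true, y_prob):
    """Probability that a random positive case is scored above a random negative one.

    Ties count as half a win. Requires at least one positive and one negative label.
    """
    pos = [p for p, t in zip(y_prob, y_true) if t == 1]
    neg = [p for p, t in zip(y_prob, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up labels (1 = violation repeated) and predicted probabilities.
y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.4, 0.3, 0.6]
print(accuracy(y_true, y_prob), roc_auc(y_true, y_prob))  # 0.5 0.5

# A model that ranks every positive above every negative scores 1.0 on both.
perfect = [0.9, 0.1, 0.8, 0.2]
print(accuracy(y_true, perfect), roc_auc(y_true, perfect))  # 1.0 1.0
```

Accuracy depends on the chosen threshold, whereas AUC only depends on how well the scores rank repeat cases above non-repeat cases, which is why the two are usually reported together.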

Final Remarks: Predicting Building Violations in San Francisco

San Francisco's strict code enforcement and building regulations for commercial and residential buildings exist in the interest of public health and safety. However, the number of buildings scheduled for regular inspection, together with the complaints that have to be investigated, far outstrips the human capacity available.

Implementing more data-driven decision-making initiatives is now a necessity. Improving the quality of predictive intelligence can help optimize the allocation of city resources and improve the quality of life for city residents.

