This year’s judges for ENR’s photo contest were shadowed by a machine-learning system under development to flag safety hazards in construction photos and video.
A year ago, Smartvid.io Inc., the developer, began marketing a product designed to tag and index frames in videos uploaded to its system. The tags are based on image content and prompted by cues from audio narration during the shooting.
For example, an inspector describes a location and feature while shooting video, and machine-learning software turns that association into a tag that it applies to similar features. Hours of video can be searched quickly to show clips with those features.
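The cue-based workflow can be pictured as a small sketch: narrated keywords in a timed transcript become searchable tags on nearby video frames. The function names and keyword list here are illustrative assumptions, not Smartvid.io’s actual API.

```python
# Illustrative cue-based tagging: timed narration text is scanned for
# known feature keywords, which become tags on the matching frames.

KEYWORDS = {"scaffold", "rebar", "crane", "formwork"}

def tag_frames(transcript, fps=1):
    """transcript: list of (timestamp_sec, text) pairs from narration.
    Returns {frame_index: set_of_tags} at one indexed frame per second."""
    tags = {}
    for ts, text in transcript:
        frame = int(ts * fps)
        hits = {w for w in text.lower().split() if w in KEYWORDS}
        if hits:
            tags.setdefault(frame, set()).update(hits)
    return tags

transcript = [(3.2, "Inspecting the scaffold on the east face"),
              (9.8, "Rebar placement looks complete here")]
print(tag_frames(transcript))  # → {3: {'scaffold'}, 9: {'rebar'}}
```

Once frames are tagged this way, “show me every clip with rebar” reduces to a dictionary lookup rather than hours of scrubbing through footage.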
But the deeper ambition of Josh Kanner, the company’s founder and CEO, is to harness machine learning to create a system that can be taught to find patterns in the pixels of images, matching them to a library of objects and automatically tagging specific features without prompts.
Kanner’s team has been developing software, which it calls VINNIE (“Very Intelligent Neural Network for Insight and Evaluation”), by first “teaching” it to recognize when people are present in jobsite images and then, for a first application, to determine whether they are wearing safety colors and hardhats.
“To say ‘missing hardhat,’ first you have to determine ‘if people,’ then do a much harder logic problem to search for ‘show me all the places where you see a person who is missing a hardhat,’ ” says Kanner.
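The two-step logic Kanner describes amounts to a conditional classifier chain: run a person detector first, and only then run the harder “hardhat present?” test on each detection. In this sketch the detector and classifier are stand-in functions supplied by the caller; the names are assumptions for illustration.

```python
# Sketch of "first 'if people,' then the harder problem": only regions
# that contain a person are passed to the hardhat classifier.

def find_missing_hardhats(image, detect_people, has_hardhat):
    """Return bounding boxes of detected people who appear to lack a hardhat."""
    flagged = []
    for person_box in detect_people(image):      # step 1: "if people"
        if not has_hardhat(image, person_box):   # step 2: harder check
            flagged.append(person_box)
    return flagged

# Toy stand-ins to show the control flow:
people = [(10, 10, 50, 120), (200, 30, 250, 150)]
boxes = find_missing_hardhats(
    image=None,
    detect_people=lambda img: people,
    has_hardhat=lambda img, box: box == people[0],  # only the first wears one
)
print(boxes)  # → [(200, 30, 250, 150)]
```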
“VINNIE is a long way away from being a safety expert, but what we want it to do is flag an image that has potential risk and raise it to our attention, so humans can do a better job and get broader coverage. It’s the second pair of eyes that never sleeps and always looks to call things to your attention,” Kanner says.
Training and testing VINNIE requires exposing it to thousands of construction images. A number of industry firms, as well as ENR, are contributing construction images for training VINNIE, with the goal of helping to develop a potentially valuable new tool, but also with the understanding that a basic level of VINNIE’s hazard-spotting intelligence will become a free, public utility.
Confirms Kanner, “Indeed, sometime in 2017, we will have a ‘freemium’ version that will offer unlimited-duration, limited-features of the product. In the limited-features bucket can be safety classifiers, so that the functionality is available more broadly. It would, in essence, be like a public utility.”
For the training, ENR contributed thousands of unidentified images from its photo-contest database. Image collections also are being shared by Suffolk Construction, Mortenson Construction, Skanska USA Building, Rogers-O’Brien Construction and other firms that decline to be named.
“We are extremely excited about what Josh is working on,” says Todd Wynne, construction technology manager at Rogers-O’Brien. “It’s like having a smart assistant—that intern that never sleeps and can continually index all the images for you so that, when you come into the office in the morning, you know of specific, actionable items that need to be addressed.”
As Wynne points out, safety managers can’t be everywhere on a project, but with people taking more jobsite images and many more being captured by drones and jobcams, the time is right to leverage the assets to find issues and insights. “What we want to get to is to turn images into not just actionable tasks but automated actions. So, if a drone sees two people aren’t wearing hardhats, it can auto-create a task or issue a report and send it to the safety director,” says Wynne.
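The “images into automated actions” pipeline Wynne envisions can be sketched as a simple event handler: a detector finding becomes a task routed to the safety director. The dataclass and field names here are assumptions, not any real project-management API.

```python
# Hedged sketch of detection -> automated task creation. A real system
# would push these tasks into project-management software.

from dataclasses import dataclass

@dataclass
class SafetyTask:
    source_image: str
    finding: str
    assignee: str
    priority: str = "high"

def tasks_from_findings(findings, assignee="safety_director"):
    """findings: list of (image_id, finding) pairs from the detector."""
    return [SafetyTask(img, what, assignee) for img, what in findings]

findings = [("drone_0412.jpg", "2 workers missing hardhats")]
for task in tasks_from_findings(findings):
    print(task.assignee, "<-", task.finding)
# → safety_director <- 2 workers missing hardhats
```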
VINNIE’s technology works on the same principles that enable autonomous vehicles to detect people and vehicles in real time, allowing the vehicle to adjust its driving. Kanner says the software first “boxes” a space that appears to contain a defined object type, such as a human, and then runs an analysis on that area to refine the object and test for other qualities, such as the presence of hardhats or safety colors.
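The box-then-refine pattern can be illustrated with a second-stage attribute test: crop the region that appears to contain a person, then analyze that crop for qualities such as safety colors. The hi-vis check below is a naive pixel-ratio heuristic invented for illustration, not the detector’s actual method.

```python
# Box-then-refine sketch: crop the boxed region, then test the crop
# for an attribute (here, a crude hi-vis orange/yellow pixel ratio).

def crop(image, box):
    """image: 2-D list of (r, g, b) pixels; box: (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]

def hivis_ratio(region):
    """Fraction of pixels that look like hi-vis orange/yellow."""
    pixels = [p for row in region for p in row]
    hits = sum(1 for (r, g, b) in pixels if r > 200 and g > 100 and b < 100)
    return hits / len(pixels) if pixels else 0.0

def wears_safety_colors(image, person_box, threshold=0.15):
    return hivis_ratio(crop(image, person_box)) >= threshold

# 4x4 toy image: left half hi-vis orange, right half gray.
orange, gray = (255, 140, 0), (128, 128, 128)
image = [[orange, orange, gray, gray] for _ in range(4)]
print(wears_safety_colors(image, (0, 0, 2, 4)))  # left crop → True
print(wears_safety_colors(image, (2, 0, 4, 4)))  # right crop → False
```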
Those are only the first safety- and hazard-related qualities Smartvid.io is training VINNIE to recognize, but the potential conditions it could search for are virtually unlimited.
“I would love for it to tell me when it sees a ladder,” says Wynne, explaining that some sites only allow lifts. “I would like to get it to spot a ladder so I could get [the ladder] off the site. I would like it to tell me if people aren’t wearing personal protective devices or if they are not tied off.”
Wynne says infractions happen frequently but can be addressed only when seen. “If the software could see it just by looking at our images and alert us, we would love that,” he says. “This is one of those technologies where I can’t even imagine all the ways we can use it until we have it. I want it, and I want it yesterday!”
Taylor Cupp, a project solutions technologist at Mortenson, says, “What’s really interesting to us is the automation and analysis of massive amounts of images and data—something we are already capturing—for the sake of safety. I really like that this first big test was around safety, and I like that they are getting all these companies to contribute images for the training. It’s an interesting approach.”
However, after Mortenson tested the Smartvid.io image-indexing service, it told the vendor, “It’s great, but we can’t throw more tools at our project team,” Cupp adds. The capability needs to be integrated as an enhancement to a system in use and already collecting images. “We told them they need to integrate with [Autodesk’s] BIM 360 Field, Procore and Kahua, among about a dozen other project management systems,” he says.
Integration is happening, Kanner says. Images loaded into Procore can now be run through the image-indexing tool, which includes some safety tags. More safety capabilities will be enabled in early 2017, he adds, and integration with Autodesk’s BIM 360 Field is expected by the end of the second quarter as well.
“A couple of years ago, you would have said, ‘That’s way out there.’ But we’ve come a long way,” says Cupp. “We’re close.”
Using ENR’s 2016 Year in Construction Photo Contest submissions, VINNIE processed 1,080 images in less than 10 minutes; a Smartvid.io human team tasked with finding people and risks took 4.5 hours on the same set. VINNIE identified 446 images with people, 32 of which the humans had missed. It flagged 32 images with people missing hardhats and 106 with people not wearing safety colors. But VINNIE’s prediction of 30 images as likely contest winners, based on comparison with winners of years past, needs improvement: only three were correct.