Last month at the Computer Vision and Pattern Recognition conference, Google showed off an algorithm capable of removing watermarks from photos. Using neural networks, researchers in the company’s artificial intelligence lab could train an algorithm to identify recurring visual patterns in a watermark (the sans-serif type in Shutterstock’s logo, or the dense logomark of Adobe Stock, for instance) and automatically strip them from an image.
To anyone who produces or sells stock photography, this was troubling news—watermarks have since the early 1990s provided the first line of defense against the theft of unlicensed photos. Granted, anyone with serious Photoshop skills can eliminate a watermark in about an hour. But Google’s technology makes it possible for a computer to remove watermarks from hundreds of images in just minutes, essentially automating the wholesale theft of copyrighted images.
Google didn’t set out to undermine stock photo companies. It was simply doing yet more research into machine learning and how it might apply to images. In fact, the algorithm could one day help make your photos a little better. And the researchers gave the stock photo firms a heads-up, contacting them weeks before demonstrating the algorithm to explain the research and show how, as they outline in their paper, the firms might protect themselves. “They gave us sufficient time so we could mitigate the risk,” says Sultan Mahmood, director of engineering for content at Shutterstock, one of the web’s stock photo giants.
Mahmood and his team have spent the past month developing an algorithm capable of tricking Google’s technology by giving each of Shutterstock’s 150 million photos a unique watermark. To understand why that works, you must know what Google’s algorithm does.
The issue, as Google explains it, is that most stock photo companies apply the same watermark to their entire image library. Feed enough images—fewer than 1,000 in this case—into a neural network, and eventually the network discerns patterns within the watermark. It can identify, for example, gradients, opacity, and shadows, which means even the most complex geometries can be isolated.
Google’s algorithm can separate the foreground image (the watermark) from the background image (the photograph) and then begin removing the watermark. “If a similar watermark is embedded in many images, the watermark becomes the signal in the collection and the images become the noise, and simple image operations can be used to pull out a rough estimation of the watermark pattern,” the researchers write.
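To make the "signal versus noise" idea concrete, here is a minimal Python sketch. It is not Google's method: it swaps in a plain per-pixel median over raw intensities as one example of a "simple image operation," and it assumes tiny grayscale images stored as nested lists, random photos, and a watermark that covers less than half of the frame. The names `blend`, `per_pixel_median`, `estimate_mask`, and `make_demo_set` are hypothetical helpers invented for this illustration.

```python
import random
from statistics import median

def blend(photo, mark, alpha):
    """Alpha-composite a watermark over a photo (alpha is 0 where there is no mark)."""
    return [[(1 - a) * p + a * m for p, m, a in zip(pr, mr, ar)]
            for pr, mr, ar in zip(photo, mark, alpha)]

def per_pixel_median(images):
    """Median of each pixel position across the whole collection."""
    h, w = len(images[0]), len(images[0][0])
    return [[median(img[y][x] for img in images) for x in range(w)]
            for y in range(h)]

def estimate_mask(images, threshold=0.15):
    """Flag pixels whose median stands out from the typical (unmarked) pixel.

    Assumption: the mark covers under half the frame, so the median of the
    per-pixel medians is a fair estimate of the unmarked background level.
    """
    med = per_pixel_median(images)
    baseline = median(v for row in med for v in row)
    return [[abs(v - baseline) > threshold for v in row] for row in med]

def make_demo_set(n_images=500, size=8, seed=0):
    """Synthetic collection: random 'photos' that all share one bright square mark."""
    rng = random.Random(seed)
    truth = [[2 <= y <= 4 and 2 <= x <= 4 for x in range(size)]
             for y in range(size)]
    alpha = [[0.6 if t else 0.0 for t in row] for row in truth]
    mark = [[1.0] * size for _ in range(size)]
    images = []
    for _ in range(n_images):
        photo = [[rng.random() for _ in range(size)] for _ in range(size)]
        images.append(blend(photo, mark, alpha))
    return images, truth

images, truth = make_demo_set()
mask = estimate_mask(images)
```

Because the photos differ from one another while the mark stays put, the random photo content averages out across the collection and only the shared overlay remains visible in the median.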
After isolating the watermark, the algorithm can erase the overlay and fill in the blank pixels by extrapolating the surrounding image data.
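The fill-in step can be illustrated with a deliberately naive sketch. It assumes a grayscale image stored as nested lists and a boolean mask marking the erased pixels, and it simply averages already-known neighbors from the edge of the hole inward. The function name `inpaint` is hypothetical, and real systems use far more sophisticated reconstruction than this.

```python
def inpaint(image, mask, max_passes=100):
    """Fill masked pixels with the mean of their known 4-neighbors, edge inward."""
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    unknown = {(y, x) for y in range(h) for x in range(w) if mask[y][x]}
    for _ in range(max_passes):
        if not unknown:
            break
        filled = {}
        for y, x in unknown:
            # Only use neighbors that were known before this pass started.
            known = [out[ny][nx]
                     for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                     if 0 <= ny < h and 0 <= nx < w and (ny, nx) not in unknown]
            if known:
                filled[(y, x)] = sum(known) / len(known)
        if not filled:
            break  # no known pixels to extrapolate from
        for (y, x), value in filled.items():
            out[y][x] = value
        unknown.difference_update(filled)
    return out

# A flat gray photo with a 3x3 hole where the overlay used to be.
photo = [[0.3] * 5 for _ in range(5)]
hole = [[1 <= y <= 3 and 1 <= x <= 3 for x in range(5)] for y in range(5)]
damaged = [[9.0 if hole[y][x] else photo[y][x] for x in range(5)] for y in range(5)]
restored = inpaint(damaged, hole)
```

On a flat region like this the hole disappears entirely; on detailed regions a naive fill would leave a visible smear, which is why the quality of the watermark estimate matters so much.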
Knowing this, Shutterstock created an algorithm that warps the company’s watermark logo in a unique way each time it is applied to a photo. Before creating a composite of the watermark and the original image, the software slightly tweaks the shape of the letters in the logo, adding a little bulge here and a slightly sharper curve there. It also adds a contributor’s name to the watermark in order to increase the complexity of the text. “It’s really subtle if you see it with your naked eye, but if you look closely you can see they’re all different,” Mahmood says. These slight variations make it significantly harder for Google’s algorithm to correctly estimate where the watermark is in the photo. As a result, the algorithm is less likely to be able to remove the watermark without leaving a visual artifact behind.
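Shutterstock has not published its code, so the following is only a guess at the general shape of such a defense. It assumes the logo is described by a list of outline control points and that each photo has an ID that can seed a random number generator. The function `perturb_outline` and its `max_shift` parameter are hypothetical.

```python
import random

def perturb_outline(points, photo_id, max_shift=0.5):
    """Nudge each control point of a logo outline by a small per-photo offset.

    Seeding with the photo ID (an assumption of this sketch) makes every
    photo's variant different from the others yet reproducible on demand.
    """
    rng = random.Random(photo_id)
    return [(x + rng.uniform(-max_shift, max_shift),
             y + rng.uniform(-max_shift, max_shift))
            for x, y in points]

# A crude rectangular "letter" outline, warped differently for two photos.
base = [(0.0, 0.0), (10.0, 0.0), (10.0, 20.0), (0.0, 20.0)]
variant_a = perturb_outline(base, photo_id=1)
variant_b = perturb_outline(base, photo_id=2)
```

Since no two photos carry exactly the same outline, the overlay stops being a consistent signal across the collection, which is the property the removal technique depends on.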
Though companies like Shutterstock have incentive to protect themselves against technology that could put their content at risk, there’s no reason to panic. “It’s a very small threat,” says Zeke Koch, a senior director of product at Adobe Stock. For starters, Koch says, it’s rare for Adobe to come across widespread breaches in licensing. Plus, he adds, most of the images Adobe adorns with watermarks have resolutions too small to use widely, anyway. Still, Adobe, like Shutterstock, is increasing the opacity of its watermark and adding contributors’ names, which should make the watermarks impervious to Google’s algorithm for the time being.
To Koch, Google’s underlying technology is actually really promising. “There’s reason to be excited by technology like this in the future,” he says. This is the same computer-vision technology used in Google Photos to recognize and share photos with your significant other or to identify cancer cells in pathology slides. Taken in the context of Google’s other computer vision work, a tool like this could automatically remove window glare or other imperfections that crop up in photographs. “I would guess what they’re working on really is technology to separate images into layers,” he says. “The watermark is just an easy first step towards developing a more interesting algorithm.”