Google has unveiled a system that attempts to pinpoint where a photograph was taken by analysing the image, as the internet group continues to experiment with advanced “machine learning” technologies.
Though still in its early stages, the Californian company’s system is another example of how Silicon Valley groups are making giant strides in artificial intelligence, using the ability to crunch huge amounts of data and spot patterns to develop capabilities far beyond those of the human brain.
Google’s latest experiment attempts to solve a task that most humans find difficult: looking at a random picture and trying to work out where it was taken.
Humans are able to make rough guesses about where a shot was taken based on clues in the picture, such as the type of trees in the background and the architectural style of buildings. The task has proved beyond most computer systems.
This week, Tobias Weyand, a computer vision specialist at Google, unveiled a system called PlaNet that is able to work out where a photograph was taken by analysing the pixels it contains.
“We think PlaNet has an advantage over humans because it has seen many more places than any human can ever visit and has learnt subtle cues of different scenes that are even hard for a well-travelled human to distinguish,” Mr Weyand told MIT Technology Review, which first reported the news.
His team divided the world into a grid containing 26,000 squares — each one representing a specific geographical area.
For every square, the scientists created a database of images derived from the internet that could be identified by their “geolocation” — the digital signatures that show where many photographs are taken. This database was made up of 126m images.
Using this information, the team taught a neural network (a computer system modelled on how layers of neurons in the brain interact) to assign each image to a specific square on the grid.
Mr Weyand’s team then fed the system 2.3m geotagged images from Flickr, the online photo library, to see whether it could correctly determine their location.
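To make the grid idea concrete, here is a minimal Python sketch of how a photograph's coordinates might be turned into the number of a square, which is the label a neural network would then learn to predict. The article does not say how the squares were drawn, so the equal-angle 130 by 200 layout (26,000 squares) and the function name `cell_index` are assumptions made purely for illustration.

```python
ROWS = 130   # assumed number of latitude bands
COLS = 200   # assumed number of longitude bands; 130 * 200 = 26,000 squares


def cell_index(lat, lon, rows=ROWS, cols=COLS):
    """Return the number of the grid square containing (lat, lon) in degrees."""
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError("latitude or longitude out of range")
    # Scale each coordinate to [0, 1], then to a band number.
    # min() keeps the extreme values (90 and 180 degrees) inside the last band.
    row = min(int((lat + 90.0) / 180.0 * rows), rows - 1)
    col = min(int((lon + 180.0) / 360.0 * cols), cols - 1)
    return row * cols + col


if __name__ == "__main__":
    # A geotagged photo taken near the Eiffel Tower becomes one training label.
    print(cell_index(48.8584, 2.2945))
```

Framed this way, locating a photograph becomes a classification problem with one class per square, rather than a direct prediction of latitude and longitude.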
Though the system is far from perfect, its performance is far better than that of humans. According to the team’s findings, the “median human localisation error” (the median distance between where a person guessed a picture was taken and where it was actually taken) is 2,320.75km. PlaNet’s median localisation error is 1,131.7km.
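The median localisation error can be sketched in a few lines of Python. The article does not say how the team measured distance, so the haversine great-circle formula, the 6,371km mean Earth radius and the function names below are assumptions for illustration only.

```python
import math
from statistics import median

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an assumed constant


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def median_localisation_error_km(guesses, truths):
    """Median distance between guessed and true (lat, lon) pairs."""
    if not guesses or len(guesses) != len(truths):
        raise ValueError("need two non-empty lists of equal length")
    return median(
        haversine_km(g[0], g[1], t[0], t[1]) for g, t in zip(guesses, truths)
    )


if __name__ == "__main__":
    # Made-up guesses and true locations, for demonstration only.
    guesses = [(48.86, 2.35), (40.71, -74.01), (35.68, 139.69)]
    truths = [(51.51, -0.13), (41.88, -87.63), (35.68, 139.69)]
    print(round(median_localisation_error_km(guesses, truths), 1), "km")
```

Because the measure is a median, half of all guesses land closer than the quoted figure and half land further away, so a few wildly wrong answers do not distort it.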