This would be an interesting additional layer for Google Maps search, which I often find lacking. For example, I was recently travelling in Gran Canaria looking for places selling artisan coffee in the south (spoiler: only one, in a hotel, which took me almost half an hour to even find). Searching for terms like "pourover" and "v60" is usually my go-to signal, but unless the cafe mentions this in its description or it's mentioned in reviews, it's hard to find. I don't think they even index the text in the photos customers take (which often include the coffee menu behind the cashier).
The pudding.cool article has a link labeled "View the map of “F*ck”" but it leads to a search for "fuck" instead. If you search for "F*ck", you find gems such as "F CK Lighting & Funiture" https://www.alltext.nyc/panorama/YO3cuHiCI4kqa6XRnnf94g?o=54... (It's an OCR error.)
GitHub of the person who prepared the data. I am curious how much compute was needed for NY; I would love to do this for my metro, but I suspect it is way beyond my budget.
(The commenters below are right. It is the Maps API, not compute, that I should worry about. Using the free tier, it would have taken the author years to download all tiles. I wish I had their budget!)
The linked article mentions that they ingested 8 million panos - even if they're scraping the dynamic viewer, that's $30k in Street View API fees alone (the static image API would probably be at least double that due to the low per-call resolution).
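The fee math is easy to parameterize. A minimal sketch - the per-1,000 rate below is back-solved from the $30k figure above, not a quoted Google price, so check the current pricing page before trusting it:

```python
def api_cost(num_panos, usd_per_1000, free_tier_usd=0.0):
    """Rough API bill: request count times unit rate, minus any free credit."""
    return max(num_panos / 1000 * usd_per_1000 - free_tier_usd, 0.0)

# 8M panos at an assumed $3.75 per 1,000 dynamic-viewer loads ~= the $30k estimate.
print(f"${api_cost(8_000_000, 3.75):,.0f}")
```

Swap in whatever rate and monthly free credit apply to your account; the point is that at millions of panos, the free tier is a rounding error.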
OCR I'd expect to be comparatively cheap, if you weren't in a hurry - a consumer GPU running a PaddleOCR server can do about 4 MP per second. If you spent a few grand on hardware, that might work out to 3-6 months of processing, depending on the resolution per pano and the size of your model.
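The estimate above is easy to redo for your own metro. All inputs below are illustrative assumptions (pano resolution, GPU count, throughput), not figures from the project:

```python
# Back-of-envelope OCR wall-clock time for a Street View corpus.
PANOS = 8_000_000          # pano count mentioned in the article
MEGAPIXELS_PER_PANO = 16   # assumed resolution per stitched pano
MP_PER_SEC_PER_GPU = 4     # rough PaddleOCR throughput on a consumer GPU
GPUS = 2                   # "a few grand" of hardware

total_mp = PANOS * MEGAPIXELS_PER_PANO
seconds = total_mp / (MP_PER_SEC_PER_GPU * GPUS)
months = seconds / (60 * 60 * 24 * 30)
print(f"~{months:.1f} months of wall-clock OCR time")
```

With these numbers it lands around six months on two GPUs, consistent with the 3-6 month range depending on what you assume.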
Their write up (linked at top of page below main link, and in a comment) says:
> "media artist Yufeng Zhao fed millions of publicly-available panoramas from Google Street View into a computer program that transcribes text within the images (anyone can access these Street View images; you don’t even need a Google account!)."
Maybe they used multiple IPs / devices and didn't want to mention doing something technically naughty to get around Google's free limits, or maybe they somehow didn't hit a limit doing it as a single user? Either way, it doesn't sound like they had to pay if they only mention not needing an account.
(Or maybe they just thought people didn't need to know that they had to pay, and that readers would just want the free access to look up a few images, rather than a whole city's worth?)
The next step should be to create a Street-View-style website for navigating around New York City, where only the text is visible and everything else is left blank/white.
Reminds me of NY Cerebro, semantic search across New York City's hundreds of public street cameras: https://nycerebro.vercel.app/ (e.g. search for "scaffolding")
This is a super cool project. But it would be 10x cooler if they had generated CLIP or some other embeddings for the images, so you could search for text but also do semantic vector search like "people fighting", "cats and dogs", "red tesla", "clown", "child playing with dog", etc.
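Once per-pano embeddings exist (from CLIP or a similar model; the model call itself is omitted here), the search side is just nearest-neighbour lookup by cosine similarity. A toy sketch with made-up 3-d vectors standing in for real 512-d embeddings:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def search(query_vec, pano_vecs, top_k=3):
    """Rank panorama IDs by similarity to a text-query embedding."""
    scored = [(cosine(query_vec, v), pid) for pid, v in pano_vecs.items()]
    return [pid for _, pid in sorted(scored, reverse=True)[:top_k]]

# Hypothetical pano IDs and embeddings, for illustration only.
panos = {
    "pano_a": [0.9, 0.1, 0.0],
    "pano_b": [0.1, 0.9, 0.1],
    "pano_c": [0.0, 0.2, 0.9],
}
print(search([1.0, 0.0, 0.1], panos, top_k=2))
```

At city scale you'd swap the linear scan for an approximate nearest-neighbour index (FAISS or similar), but the interface is the same: embed the query text, rank the panos.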
I feel like street-view data is surprisingly underused for geospatial intelligence.
With current-gen multimodal LLMs, you could very easily query and plot things like "broken windows," "houses with front-yard fences," "double-parked cars," "faded lane markers," etc. that are difficult to generally derive from other sources.
For any reasonably-sized area, I'd guess the largest bottleneck is actually the Maps API cost vs the LLM inference. And ideally we'd have better GIS products for doing this sort of analysis smoothly.
Yes. I work at a company that is using Street View to identify high-rise apartments with dangerous cladding for the UK government. You could also use it for grouping nearby properties that were clearly built together and share features, which helps spread known information across buildings. You can even get the models to predict age and sometimes things like double-glazing.
I was trying for various graffiti slogans, turns out the anarchy "(A)" is basically the most difficult thing in the world to search for lol, other political ideologies much easier to find. It did amusingly lead me to search for just "anarchy" which led to 4 pages of bus ads for a show by the "Sons of Anarchy" guy.
EDIT: Lol, "communism" leads to 39 pages of Shen Yun billboards.
A game: find an English word with the fewest hits. (It must have at least one hit that is not an OCR error, but such errors do still count towards your score. Only spend a couple of minutes.) My best is "scintillating" : 3.
At first glance, there's plenty of grog to be had in NYC. But sailors will be disappointed. It all seems to be OCR errors of "Groceries" or the "Google" watermarks.
The word search for "fart" shows the tool's limits. None of the entries I saw actually said the word, but they were listed as doing so -- "fart nawor" (hearts around the world irl), the penny farting (the penny farthing irl), etc.
BNE is an anonymous graffiti artist known for stickers that read "BNE" or "BNE was here". The artist has left their mark in countries and regions throughout the world, including the United States, Canada, Asia, Romania, Australia, Europe, and South America. "His accent and knowledge of local artists suggest he is from New York."
Mamdani is just one dude's gynecology clinic. I wonder when the data was pulled?
edit: I found mentions of Gaza bombings and there's cars with like #gaza on it so my guess is sometime in the last 2 years.
I could of course look it up but this is a game now for me, like when I found a hella old atlas in a library and tried to figure out the date it was published just by looking at the maps.
Gosh! Maybe one of these days someone will take time off from this cultural wonderment to construct a simple, easy-to-use text-to-audio-file program - you know: install, paste in some text, convert, start up a player - so that the blind can listen to texts that aren't recorded as audiobooks. Without a CS degree.
I think the issue is that the compute needed for good voice models is far from free in hardware and electricity alone, so any good text-to-audio solution likely needs to cost some money. That said, wiring up Google Vertex AI text-to-speech (or the AWS equivalent) is probably something ChatGPT could walk most people through, even without a CS degree: a simple Python script you authenticate and run from a terminal, which would maybe cost a couple of bucks for personal usage.
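The plumbing really is small. A minimal sketch, assuming an offline engine like pyttsx3 is installed - the engine calls are shown only in comments, since the sentence chunker below is the only part that is plain stdlib Python:

```python
import re

def chunk_sentences(text, max_chars=400):
    """Split pasted text into TTS-sized chunks at sentence boundaries."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for s in sentences:
        if current and len(current) + len(s) + 1 > max_chars:
            chunks.append(current)
            current = s
        else:
            current = f"{current} {s}".strip()
    if current:
        chunks.append(current)
    return chunks

# With pyttsx3 installed, feeding the chunks to a player is three more lines:
#   engine = pyttsx3.init()
#   for c in chunk_sentences(pasted_text): engine.say(c)
#   engine.runAndWait()
print(chunk_sentences("First sentence. Second one! A third?", max_chars=25))
```

A cloud backend (Vertex AI, Amazon Polly) would replace the commented lines with one API call per chunk; the chunking is what keeps long pasted texts within per-request limits.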
A paid service of that simplicity probably doesn't exist because other tools integrate better with how the blind actually interact with computers - I doubt it's copy-and-pasting text - and those tools are likely more robust, albeit expensive.
https://github.com/yz3440
It's the Google Maps API costs that will sink your project if you can't get them waived as art:
https://mapsplatform.google.com/pricing/
Not sure how many panoramas there are in New York or your metro, but if it's over the free tier you're talking thousands of dollars.
I'm wondering more about the data - did they use Google's API, or work with Google to get access to it?
All Text in NYC - https://news.ycombinator.com/item?id=42367029 - Dec 2024 (4 comments)
All text in Brooklyn - https://news.ycombinator.com/item?id=41344245 - Aug 2024 (50 comments)
https://en.wikipedia.org/wiki/SAMO
But difficult to figure out if any of them are original.
I liked this one, but it is most likely newer. It is on top of the City-as-school building where Basquiat attended, so it is probably a tribute.
https://www.alltext.nyc/panorama/DZz7Gp1PtROe78ailUpvlA?o=11...
New York is consistently rated alongside Naples as having the best pizza in the world.
Instead it shows me thousands of "Rev"
https://www.alltext.nyc/search?q=Calisthenics
https://www.alltext.nyc/search?q=perplexed
IIRC he found a way to download Street View images without paying, and used the OCR built into macOS (which is really good).
https://www.alltext.nyc/search?q=Sex
Again, a complex problem and I love it...