Well, no one can guarantee that the same text they found at a link will be there the next time they follow it. It can simply have changed since the last time you saw it.
Now, building an index means that once you are done you can drop the source data and use that space for newer data, so you might not even have it any more. It's probably smart to keep the data anyway so the crawler can be less aggressive about re-downloading the whole internet, but that data isn't necessarily globally replicated and ready to serve; it might sit at a replication factor of barely 1.5, on hard disk drives that aren't publicly reachable because they are dedicated to running the indexing service, not to serving end users.
Could Google do it? Yeah, sure, but they are busy doing LLM things.
True, although only if the link is actually cached, which I find to be the case far more rarely than before. But it does work and it's still helpful...
Cool to see the Internet Archive getting some official promotion from a megacorp like Google. I hope there's also proportional financial (and legal?) support.
Doesn't work for me – probably one of those things that Google rolls out over the course of months, with no way of telling whether the feature flag is already enabled for you or not.
In the meantime, I'll keep using this handy bookmarklet: https://gist.github.com/n-st/0dd03b2323e7f9acd98e (which obviously only works for pages that are still available; for others, it requires copy-pasting the URL).
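For reference, a bookmarklet like that boils down to a one-liner. A minimal sketch (not the exact code from the gist) that sends the current page to the Wayback Machine:

    javascript:(function () {
      /* web.archive.org/web/<url> redirects to the newest
         capture of <url>; no API key or encoding needed for
         typical URLs. Drag this to the bookmarks bar. */
      location.href = 'https://web.archive.org/web/' + location.href;
    })();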
Also, Google/IA and I seem to have very different definitions of "easy":
> [...] conduct a search on Google as usual. Next to each search result, you’ll find three dots—clicking on these will bring up the “About this Result” panel. Within this panel, select “More About This Page” to reveal a link to the Wayback Machine page for that website.
The only thing that's missing is the "Beware of the Leopard" sign.
That said, it's one full page-scroll down, and that only after two clicks through not-too-obvious menus. Not exactly in-your-face in terms of discoverability.
Hopefully this is just a first step – keeping archived sites indexed and/or automatically forwarding people to IA links once the original goes down would be amazing!
Within reason, of course; there's possibly a point where it would be a bit "too discoverable" and make people exclude their site from archiving as a general precaution. (I could see people being generally fine with being archived, but not with being google-able forever.)
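Re the auto-forwarding idea: a client-side approximation is already possible with the Archive's public availability API. A rough sketch (the function name and flow are mine; the endpoint and JSON shape are the documented ones):

    // Fall back to the Wayback Machine when the original is gone,
    // using https://archive.org/wayback/available. Caveat: a plain
    // fetch from page JS is subject to CORS, so in practice this
    // belongs in a browser extension or a server-side proxy.
    async function waybackFallback(url) {
      try {
        const res = await fetch(url, { method: 'HEAD' });
        if (res.ok) return url;            // original still up
      } catch (_) { /* DNS/network error: treat as down */ }
      const api = 'https://archive.org/wayback/available?url='
        + encodeURIComponent(url);
      const { archived_snapshots } = await (await fetch(api)).json();
      const snap = archived_snapshots && archived_snapshots.closest;
      return (snap && snap.available) ? snap.url : url;  // best effort
    }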
The Internet Archive is amazing, and vulnerable: it's a single entity in a single jurisdiction that is unfriendly to this kind of effort. Are there efforts to duplicate it? For the book side (and some magazines) there is libgen and such. Is there something for the web side? Music, photos, software? Any current effort by the Internet Archive themselves?
A quick look now at archive.org didn't find much.
A hint that "partner institutions" can maintain local copies, some of which might be one-off copies that are not kept up to date.
There was an IA.BAK project.
There was a useful discussion 4 years ago, here:
https://old.reddit.com/r/DataHoarder/comments/h02jl4/lets_sa...
You can ask for the oldest version of any page that is on the Wayback Machine. That may or may not be the first posting, depending on the date. The further back in time you go, the spottier the archive becomes; the late 1990s and early 2000s are quite patchy. Just enter the URL into the Wayback Machine and you'll see two dates (first and last) and a timeline (all the archived versions).
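If you'd rather do this programmatically, the Wayback Machine's public CDX API returns captures oldest-first by default, so limit=1 yields the earliest snapshot. A quick sketch (the function name is mine; the endpoint and row layout are the documented ones):

    // Fetch the oldest Wayback capture of a URL via the public
    // CDX API (https://web.archive.org/cdx/search/cdx). Captures
    // come back oldest-first by default, so limit=1 is the earliest.
    async function oldestCapture(url) {
      const api = 'https://web.archive.org/cdx/search/cdx'
        + '?url=' + encodeURIComponent(url)
        + '&output=json&limit=1';
      const rows = await (await fetch(api)).json();
      // rows[0] is a header row; rows[1] is the first capture:
      // [urlkey, timestamp, original, mimetype, statuscode, digest, length]
      if (!rows[1]) return null;  // never archived
      const [, timestamp, original] = rows[1];
      return 'https://web.archive.org/web/' + timestamp + '/' + original;
    }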
I don't know about "easier than ever". Easier than it has been since Google killed their own version of this, which worked well until they killed it, sure.
My guess is that Google doesn't want to share the version of each page that they get served for indexing.
This is a step in the right direction, even though navigating the Wayback Machine often results in the tab crashing.
https://imgur.com/a/z4D8aDo