18 comments

  • OuterVale 15 hours ago
    Also worth a mention is Wiby.

    "The Wiby search engine is building a web of pages as it was in the earlier days of the internet."

    Its main indexing requirements are:

    - "Pages must be simple in design. Simple HTML, non-commerical sites are preferred."

    - "Pages should not use much scripts/css for cosmetic effect. Some might squeak through."

    - "Don't use ads that are intrusive (such as ads that appear overtop of content)."

    - "Don't submit a page which serves primarily as a portal to other bloated websites."

    https://wiby.me

    • lelanthran 14 hours ago
      This is most definitely not the same thing. The indexing requirements are not "This site must be an independent or personal site", it's "This site must lean towards being a plain HTTP document".

      The Search My Site, from what I can tell, has the goal of surfacing personal/independent websites, while Wiby has the goal of surfacing minimally styled documents.

      Two different goals.

      • codetrotter 11 hours ago
        > In the early days of the web, pages were made primarily by hobbyists, academics, and computer savvy people about subjects they were personally interested in. Later on, the web became saturated with commercial pages that overcrowded everything else. All the personalized websites are hidden among a pile of commercial pages.

        > […]

        > The Wiby search engine is building a web of pages as it was in the earlier days of the internet.

        https://wiby.me/about/

        Sounds to me like Wiby is more similar to Search My Site than what your comment makes it sound like.

      • danlitt 14 hours ago
        I wonder if this is why they said "worth a mention" rather than "the same thing".
        • lelanthran 14 hours ago
          I dunno; something is worth a mention if it's in the same category being discussed. Wiby most certainly isn't.
          • danlitt 14 hours ago
            "alternative search engines"?
            • lelanthran 14 hours ago
              The title says, as of writing,

                   for personal and independent websites
              
              To my mind, that excludes things like Kagi, and Wiby, etc which would ordinarily be included if the title said

                   alternative search engine
      • OuterVale 13 hours ago
        I'm certainly not presenting Wiby as being the same thing, merely as something that is worth a mention due to likely being of interest to anyone interested in Search My Site.

        It is relevant and has vaguely aligned intent.

      • notachatbot123 14 hours ago
        And that is ok. It is still contextually relevant to some people and a nice project to boost.
  • danlitt 14 hours ago
  • renegat0x0 13 hours ago
  • rumgewieselt 14 hours ago
    I love the simplicity of https://pagefind.app/
    • kilroy123 6 hours ago
      Me too! I'm a huge fan. I use it for all my static sites.
    • junto 14 hours ago
      This is what I’m using with my Astro personal blog. It’s awesome.
    • brontosaurusrex 12 hours ago
      Interesting, is that a more complete variation of fuse.js? (I just plugged fuse.js into my static Jekyll blog.)
      • 7952 2 hours ago
        Pagefind uses an index that is created ahead of time and stored as numerous files on a static site. It then downloads just the part of the index needed to complete the search. This means that you can search vastly more data than could be loaded onto a browser.
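
        A minimal Python sketch of the idea described above, not Pagefind's actual index format: the index is split ahead of time into chunks (here keyed by a term's first letter, which is an assumption made purely for illustration), and a search fetches only the chunk it needs. `build_chunks`, `ChunkedSearch`, and `fetch_chunk` are hypothetical names; `fetch_chunk` stands in for downloading one static index file.

```python
from collections import defaultdict


def build_chunks(pages):
    """Build an inverted index ahead of time, split into chunks by first letter.

    `pages` maps a URL to its text. Each chunk would be written out as a
    separate static file in a real deployment (assumption for illustration).
    """
    chunks = defaultdict(lambda: defaultdict(set))
    for url, text in pages.items():
        for word in text.lower().split():
            chunks[word[0]][word].add(url)
    return {
        key: {word: sorted(urls) for word, urls in words.items()}
        for key, words in chunks.items()
    }


class ChunkedSearch:
    """Client side: loads only the index chunk needed for each query."""

    def __init__(self, fetch_chunk):
        self.fetch_chunk = fetch_chunk  # stands in for an HTTP GET of one file
        self.cache = {}
        self.fetched = []  # records which chunks were actually downloaded

    def search(self, term):
        term = term.lower()
        if not term:
            return []
        key = term[0]
        if key not in self.cache:
            self.cache[key] = self.fetch_chunk(key)
            self.fetched.append(key)
        return self.cache[key].get(term, [])
```

        Because only one chunk is fetched per query, the total index can be far larger than what a browser would comfortably load in one go.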
      • wonger_ 10 hours ago
        I think Pagefind is focused on the whole experience of searching pages, like with default UI widgets, easy page indexing, and handling larger sites. fuse.js seems to be a fuzzy-filter function on JS data, not handling the site integration.
    • ozornin 12 hours ago
      This is just what I wanted, thank you for that!
  • kreelman 16 hours ago
    Thanks for putting this together. I wonder, is Postgres a bit of a large DB if it's just a personal website search tool? I'll have to give it a go. We need more tools like this.
    • m-i-l 12 hours ago
      Postgres is just used for the site admin, i.e. keeping track of submissions, review status, subscriptions etc. The actual search index is in Apache Solr. In theory you could use Solr to store all the admin data, but it is generally not recommended to use a Solr style document store to master data. I guess something more lightweight like SQLite could be used, but it is intended to be deployed on servers and Postgres isn't too resource intensive.
  • 1dom 10 hours ago
    I like this, thank you! I just lost an hour of time to the exact sort of random but considered personal websites that I think made the Web great in the first place.
    • m-i-l 7 hours ago
      Thanks for the great feedback:-) This is what searchmysite.net is attempting to do - help make "surfing the web" a fun leisure activity once more. It is good to see more people seem to get that point now. When it was on HN nearly 3 years ago[0], many people saw a search box and thought it must be a Google replacement, but were disappointed to find it wasn't. And I guess now more than ever it is useful to have a way of finding content on the web which has been made by humans rather than AI.

      [0] https://news.ycombinator.com/item?id=31395231

  • unfixed 14 hours ago
    These kinds of projects are really good for finding interesting blogs and obscure sites.

    My go-to choice is https://marginalia-search.com/

  • _puk 14 hours ago
    Great to see this.

    Ironically, given Google's stronghold over the past decade, I strongly feel that one of the big winners in the AI space is going to be the backend search engine.

    Modern web search has become so polluted, with so many tricks to get to the front page of Google, that a lot (most?) of the good content is lost.

    Now that many of the big models are capable of calling out to the web, this bloat is appearing in AI search too. A proper data-first engine, without ads, with less focus on presentation and more on structured data, is what is needed.

  • saltysalt 5 hours ago
    I'd also suggest https://greppr.org

    (Disclaimer: I built it).

  • ThinkBeat 12 hours ago
    I am a bit confused. Solr is the search engine.

    An LLM model is loaded. What does the LLM model add to the solution?

    • m-i-l 11 hours ago
      The LLM was for an experiment in retrieval augmented generation, i.e. "a chat with your website" style interface, using Apache Solr as the vector store. Results (on a small self-hosted LLM to keep costs manageable) weren't good enough for the functionality to be fully rolled out, so the LLM has been disabled and is likely to be fully removed.
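
      A rough Python sketch of the retrieval step in retrieval augmented generation, under loud assumptions: an in-memory list stands in for the vector store (the comment above says Apache Solr played that role), the vectors are toy values rather than real embeddings, and `cosine`, `retrieve`, and `build_prompt` are hypothetical helper names, not part of searchmysite.net.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, docs, k=2):
    """Return the text of the k documents most similar to the query vector.

    `docs` is a list of (text, vector) pairs standing in for a vector store.
    """
    ranked = sorted(docs, key=lambda d: cosine(query_vec, d[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(question, passages):
    """Assemble the prompt that would be sent to the LLM (call not shown)."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

      The quality of the final answer depends on both the retrieved passages and the model, which fits the observation above that a small self-hosted LLM did not give good enough results.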
  • nelsonfigueroa 13 hours ago
    This is awesome. I love anything that helps me discover new personal sites/blogs.
  • eviks 12 hours ago
    No basics like typo-resistance?

    digiatl

    > No results found for digiatl.

    • amanaplanacanal 10 hours ago
      In my imagination, I think I prefer a search engine which searches for what I ask, rather than one which tries to guess what I really want.

      It's been so long since I had one that really worked that way that I might turn out to hate it though.

      • 1dom 10 hours ago
        Best of both worlds:

        > No results found for "digiatl". Did you mean to search for "digital" instead?
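
        A minimal Python sketch of that behaviour, assuming a toy index that maps terms to URLs: the engine searches for exactly what was typed and only offers a suggestion when nothing matches. It uses the standard library's `difflib.get_close_matches`; the `search` function, the index layout, and the 0.8 cutoff are assumptions for illustration, not how searchmysite.net or Solr work.

```python
import difflib


def search(query, index):
    """Exact-match search with a "did you mean" fallback.

    `index` maps a term to a list of URLs. Returns (results, suggestion);
    the suggestion is only set when the exact query found nothing.
    """
    term = query.lower()
    results = index.get(term, [])
    if results:
        return results, None
    # Suggest the closest known term, but never silently substitute it.
    close = difflib.get_close_matches(term, list(index), n=1, cutoff=0.8)
    return [], (close[0] if close else None)
```

        With an index containing the term "digital", the query "digiatl" returns no results plus the suggestion "digital", leaving the choice to the user.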

        • m-i-l 7 hours ago
          At a big corporate, we had an Apache Solr based search which had some reasonably clever lemmatization and stats analysis and spell check config to suggest alternative searches if not many results were found for the original query, but one day someone reported an unfortunate edge case which caused a bit of a panic - if you searched "annual report" it returned "did you mean anal report?" (we were in the finance sector rather than medical sector, but there were a lot more documents in the corpus containing words like analysts, analysis, analytics etc). Anyway, the point is yes, it is great to have that sort of functionality, but it does come at a cost, and a small project like this might prefer to keep it simple.
          • 1dom 5 hours ago
            Generating suggestions from something other than what your users have already given you is inevitably going to result in something different and potentially offensive being shown to them.

            One solution is to offer suggestions from a list of previous searches.

            Also, that is very much a big corporate problem: I imagine most searchmysite users are mature and stable enough not to have a meltdown at the word "anal".

            But I agree with your point, sometimes seemingly small features take a disproportionate amount of support, and this could be one of them!

          • busymom0 5 hours ago
            Couldn't you just add an extra step to check if the suggestion is offensive, then don't show it?
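
            A small Python sketch of that extra step, assuming a hand-maintained blocklist: candidate suggestions come from the standard library's `difflib.get_close_matches`, and any candidate on the blocklist is skipped. `BLOCKLIST` and `safe_suggestion` are hypothetical names, and the cost of maintaining such a list is exactly the kind of support overhead discussed above.

```python
import difflib

# Hypothetical blocklist; a real one would need ongoing curation.
BLOCKLIST = {"anal"}


def safe_suggestion(query, vocabulary, blocklist=BLOCKLIST):
    """Return the closest vocabulary term that is not blocklisted, or None."""
    candidates = difflib.get_close_matches(
        query.lower(), vocabulary, n=5, cutoff=0.6
    )
    # Candidates arrive best-first; take the first acceptable one.
    for candidate in candidates:
        if candidate not in blocklist:
            return candidate
    return None
```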
      • eviks 10 hours ago
        Most of the search engines you encounter fail here (press Ctrl+F in your browser and make a typo); it's the web search that's different. Though even here it's easy to check without relying only on imagination: how often do you add quotes for literals?
  • aleken 15 hours ago
    This is exactly what I have been looking for. Like the other commenter, I am a bit surprised by having to drag along psql for this. I like the design of the site, though.
  • csprimer-in 14 hours ago
    I fail to understand the complexity of it. Can you help me understand how it is different from other search engines? Thanks in advance!
    • _puk 14 hours ago
      Sites are ranked higher when they have no ads. Fully open source.

      That's a good starting point..
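
      A toy Python sketch of ranking ad-free sites higher, with every detail assumed for illustration: the `Hit` fields, the `AD_PENALTY` multiplier, and the `rank` function are hypothetical and do not reflect how searchmysite.net actually computes its ranking.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    url: str
    relevance: float  # base relevance score from the search engine
    has_ads: bool     # whether ads were detected on the site


# Hypothetical multiplier applied to pages from sites that carry ads.
AD_PENALTY = 0.5


def rank(hits, ad_penalty=AD_PENALTY):
    """Order hits by relevance, downweighting those that have ads."""

    def score(hit):
        return hit.relevance * (ad_penalty if hit.has_ads else 1.0)

    return sorted(hits, key=score, reverse=True)
```

      With the penalty at 0.5, an ad-free page with relevance 0.8 outranks a page with ads and relevance 1.0.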

      • pjerem 14 hours ago
        > Sites are ranked higher when they have no ads.

        That’s a pretty clever filter tbh. Sounds so evident that I’m amazed nobody thought of it before.

        I’d love Kagi to have such an option.

        • lelanthran 12 hours ago
          > That’s a pretty clever filter tbh. Sounds so evident that I’m amazed nobody thought of it before.

          I did. I posted it on HN as a comment. It was a very popular (by my standards) comment: https://news.ycombinator.com/item?id=40438288

          The thread was interesting, with a lot of people posting rebuttals for why such a scheme would obviously not work. Equally obviously, someone else thought it was a good idea and went and implemented it.

          • 1dom 10 hours ago
            > Equally obviously, someone else thought it was a good idea and went and implemented it.

            Maybe I misunderstood, but it's not obvious to me that someone read your idea, thought it was good, and then went to implement it.

            https://github.com/searchmysite/searchmysite.net/graphs/code... looks like the bulk of the code was added 2 - 4 years ago.

      • p3rls 9 hours ago
        So then definitionally not independent, because their funding comes from elsewhere.
        • Sophira 8 hours ago
          According to the site, the funding comes from its "Search as a Service" feature[0], where anybody can pay them in order to have a search service focused on their site (which does not have to be in the public index and thus doesn't have to be personal/independent).

          So, in the sense that the funding comes (or aims to come) from larger companies, you are correct. It's not VC, but it does seem like it could end up relying on payments from large companies, making it potentially vulnerable.

          [0] https://searchmysite.net/pages/about/#search-as-a-service

          • m-i-l 7 hours ago
            That's right. Most search engines are funded by advertising, where there is a clear conflict of interest[0], not to mention incentive for spam etc. Alternative models include a subscription fee (which I don't think would work for a small niche search like this) and donations (which may or may not be sustainable). Looking through some of the support forums for the big search engines, I'm pretty sure that enough site owners would pay a fee for support to pay the running costs for a large search engine, although for a smaller search engine like this there needs to be something more than just support, hence the search as a service features.

            [0] "Advertising funded search engines will be inherently biased towards the advertisers and away from the needs of consumers", to quote Sergey Brin and Lawrence Page in their "The Anatomy of a Large-Scale Hypertextual Web Search Engine" paper from 1998.

          • p3rls 8 hours ago
            That too, but I was referring to the sites in the search themselves. They are not independents but in the pay of someone else if they're not generating revenue.

            As someone with a commercial digital garden with ads (and a $3/mo sub for the ad-averse) I like to point out the tension in that whenever possible.

            • Sophira 55 minutes ago
              Not necessarily. As a counterpoint, I have a site (<https://www.automidiflip.com/>, which I posted to HN on launch 8 years ago[0]) which provides a service and does not have (and has never had) any ads whatsoever. (It inserts a reference to automidiflip.com in the MIDIs it creates as a credit, but no ads in the sense that most people mean.)

              Granted, it's a niche service, but over the past year it's still been used to flip MIDIs about 23 times a day on average, so it's definitely not unknown. I don't see any need to monetise it, though.

              [0] https://news.ycombinator.com/item?id=13553224

  • tobiasnvdw 15 hours ago
    This is wonderful. I immediately found an interesting new blog I had never seen before.
  • idiotsecant 10 hours ago
    It's like the internet is made of sincere weirdos again, I love this.
  • kittikitti 5 hours ago
    Thank you very much for sharing this project. After digging around, I found this blog post of yours to be the most insightful into the technical details about the search engine, https://blog.searchmysite.net/posts/searchmysite.net-buildin...
  • misonic 18 hours ago
    The login with password doesn't seem to be working properly.