Putting a full power search engine in Ecto

(moosie.us)

144 points | by philippemnoel 81 days ago

5 comments

  • FrancoisBosun 79 days ago
    This looks very, very interesting! Good work. My only nitpick is the ligatures. I believe pipelining in Elixir uses the |> operator, but the blog post uses a kind of triangle pointing to the right. Due to my previous exposure to Elixir, I guessed that it must have been |>, but if I hadn’t known, I would have been really confused when I tried to type that in my editor to replicate the code.
    • sbuttgereit 79 days ago
      I agree. I use ligatures in my own coding and so to my eyes, the presentation was very natural... but for someone that doesn't/hasn't I think your point is completely correct.

      It's better to not use ligatures for publication, such as in this scenario.

      (Now that I've said that, I better go check and see if I've made this mistake due to just not thinking about it.... hmm.....)

    • lawn 79 days ago
      I personally like some types of ligatures, but I think it's good to not use them when others should read the code.
    • mise_en_place 78 days ago
      100% agree, it's just jarring to anyone who's developed in Elixir before. It's just like dquote characters on macOS ("smart quotes").
    • LorenzoGood 78 days ago
      I, as a new Elixir user, was personally confused by this exact thing as well.
    • carrja99 78 days ago
      A lot of folks configure their editor to render |> as a rotated triangle.
      • Kamq 78 days ago
        Sure, but putting it in a code sample is similar to putting opening/closing quotes in a code sample instead of "

        It makes it harder for people to copy and paste and play with.

        • dpatterbee 78 days ago
          If you copy and paste that triangle you will get "|>".
    • brightball 79 days ago
      There is some editor plugin that converts it visually to a triangle. I have seen other people use it.
      • arrowsmith 79 days ago
        It’s a font, not an editor plugin.

        Not sure which font specifically is used in the article but an example of a monospace font with ligatures is Fira: https://github.com/tonsky/FiraCode

      • lvass 78 days ago
        prettify-symbols-mode in Emacs.
  • dugmartin 79 days ago
    If you are looking for a great blog post describing how to build a native full text search engine in Elixir that you can drop into your app:

    https://culttt.com/2023/03/22/building-a-full-text-search-en...

    (I’ve no affiliation with the author)

  • skybrian 78 days ago
    Context: Ecto [1] seems to be a database layer for Elixir and Elixir is a programming language for Erlang’s virtual machine.

    [1] https://github.com/elixir-ecto/ecto

  • conradfr 79 days ago
    Quite interesting, except the fork aspect.
    • Moosieus 78 days ago
      Yeah, it’s a non-starter in a lot of ways. I’m working on an alternate implementation that uses fragments, enabled by ongoing updates to ParadeDB’s syntax. Gonna explore all options and see where the cards land.
  • latch 79 days ago
    I don't understand why people use Ecto (or ActiveRecord, or...)

    Back in the day, I'm pretty sure we were using Hibernate and friends because our software was shipped and we wanted it to work with whatever database the client was using.

    But for hosted software, what's the point? Not having to know SQL or details about PostgreSQL / the underlying DB? Apps should be using SQL directly, and for cases where you need dynamic SQL (like, your where clause is different based on some query string parameters), you can have a low-level query builder (1)

    (1) I'm not affiliated with it and have never used it, but a good search came up with https://github.com/robconery/moebius which, at least from the readme, is roughly what I'm talking about.
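
The "dynamic where clause from query string parameters" case can be sketched without any ORM. This is only an illustration, written in Python with the standard-library sqlite3 module so it runs without dependencies; the `orders` table, the `ALLOWED_FILTERS` whitelist and `build_query` are hypothetical names, not part of Moebius, Ecto or any other library mentioned here.

```python
import sqlite3

# Hypothetical whitelist: query-string key -> SQL condition with a placeholder.
# Only keys listed here can ever reach the SQL text.
ALLOWED_FILTERS = {
    "status": "status = ?",
    "min_total": "total >= ?",
    "customer": "customer = ?",
}


def build_query(params):
    """Return (sql, args) containing only the filters present in `params`."""
    clauses, args = [], []
    for key, condition in ALLOWED_FILTERS.items():
        if key in params:
            clauses.append(condition)
            args.append(params[key])
    sql = "SELECT id, customer, status, total FROM orders"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql + " ORDER BY id", args


# Throwaway in-memory database with a few made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, status TEXT, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, status, total) VALUES (?, ?, ?)",
    [("ann", "open", 10.0), ("bob", "open", 250.0), ("ann", "closed", 99.0)],
)

# Pretend these came from a parsed query string.
sql, args = build_query({"status": "open", "min_total": 100})
rows = conn.execute(sql, args).fetchall()
print(sql)
print(rows)
```

The user-supplied values only ever travel as bound parameters; the SQL text itself is assembled from fixed fragments, which is roughly the job a low-level query builder does for you.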

    • isodev 79 days ago
      > we wanted it to work with whatever database

      While this is an option, making an app work with different database backends is not usually why one picks a library like Ecto or ActiveRecord.

      You see, these frameworks bring a ton of facilities for working with data, adding common solutions so we don't have to reinvent the wheel every time. Everything from sanitization of user inputs and query building to creating update statements, validating input data and generating validation messages. Ecto is smart enough to even offer facilities for creating and handling forms based on the provided schema of the data. It also works with virtual data (not backed by a database) and all that can happen in Elixir, the same language as the rest of the application. For me, this has the advantage that I can focus on building what my app is supposed to do.

      Of course, if you really want or need to, there is always an option to submit your "direct SQL" query, and you will be free to write all the boilerplate needed to handle that.

    • prophesi 79 days ago
      Ecto isn't an ORM like ActiveRecord or Hibernate. It's a DSL for building queries and you can always drop down into raw SQL if you'd like. Moebius looks like Ecto except now you no longer have schema validations and the like.
      • sph 79 days ago
        Yes. It's more of a data mapper and composable DSL over SQL than an "object-relational mapper", since indeed there are no stateful objects in Elixir.
    • brightball 79 days ago
      ActiveRecord Scopes are actually incredible for this. You can separate pieces of queries and recombine them, use parameters for some, etc.

      I implemented an entire categorized search with Postgres and ActiveRecord with nothing but Scopes.

      Scopes are my primary reason for using Rails these days honestly. It makes it so easy to tap into the DB in a reusable way.

      https://guides.rubyonrails.org/active_record_querying.html#s...

    • sph 79 days ago
      Sounds like you have never used Ecto.
    • sbuttgereit 79 days ago
      I've done extensive database development work, including writing schemas, queries, and stored procedures over the course of almost 30 years. This is in the ERP space where database schemas tend to be quite large, highly normalized, and overall complex. And despite this experience I very much enjoy using Ecto.

      I say this having survived ORMs, including Hibernate. Ecto is not an ORM. I've also done a lot of work with applications which just used raw queries and I still elect to use Ecto. More than that, in my own Elixir application I was "database abstraction skeptic" enough that I was not going to use Ecto at all, just as you suggest, but was very quickly sold on its advantages and some of my fears about such tools just didn't materialize.

      First: Ecto is not actually one thing, and there are use cases in applications where there is no database at all. As I see it, there really are four (related) tools in Ecto which you can elect to use... or not, though it's safe to say the most common pattern is to use all of them: 1) a database migration tool; 2) a data mapper; 3) a data validation library; 4) a query building DSL. The database migration tool and query builder are clearly database related, but the data mapper and data validation parts of Ecto have uses outside of the database, such as mapping and validating web form data.

      The migrator and query builder are, unsurprisingly, very database focused. The DSLs of both are very close to SQL, however, and, especially with the query builder, I've found that for any query I build in the DSL, I can clearly know what the generated database queries will be, and I can do this at a fine-tuning level, where I can write specific query DSL and know that I'll get a specific query (or queries) at the database. The reason I choose to write the query DSL rather than just sending raw queries is because, while the query DSL is very SQL-like anyway, I get all of the advantages of functional composition and natural usage within Elixir that SQL doesn't offer on its own. I guess the key to winning my trust is that it's not so abstracted that the database is truly a black box. In cases where the Ecto DSL isn't up to the challenge, you can always write and process a raw database query, including into the data mapper defined schemas for further processing.

      I also do use the data mapping with Ecto to define and map virtual data schemas which back web forms, forms which do not relate directly to database tables in any one-to-one way, and I validate web form data using Ecto Changesets (the validation part of Ecto). Again, this is independent of any database related functionality.

      I will say I don't use the database migrator. Not because it's bad, but because I was able to create a migration scheme which better matched my application's development style and because I use many database features not directly supported by the Ecto database migrator... and if you're going to be writing a lot of raw SQL anyway, why wrap it all in a bunch of Elixir?

      Finally, I will say that people sometimes err and try to use the Ecto query DSL in cases where they really would be better off just writing a raw SQL query. Over at the Elixir Forum (https://elixirforum.com/) I sometimes see people asking, "how do I do <some complex SQL query> in Ecto?!", and I see some pretty tortured Ecto DSL trying to get there. I do think there is a point where you say: just because you can, doesn't mean you should. In those cases, I'm betting they'd be better off just writing the raw SQL and moving on. Nothing in Ecto forces you to use the query DSL exclusively and not using it can be the simpler option in a number of complex query scenarios.
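
The point about validating web form data with no database involved can be illustrated with a small changeset-style pipeline. This is a rough analogue in Python, not Ecto's API: the function names are borrowed loosely from Ecto.Changeset for familiarity, and the `Changeset` class, the signup form and its fields are all made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Changeset:
    """Accumulates accepted changes and validation errors for one form."""
    changes: dict = field(default_factory=dict)
    errors: dict = field(default_factory=dict)

    @property
    def valid(self):
        return not self.errors


def cast(params, types):
    """Keep only permitted fields and convert each to its declared type."""
    cs = Changeset()
    for name, convert in types.items():
        if name not in params or params[name] == "":
            continue
        try:
            cs.changes[name] = convert(params[name])
        except (TypeError, ValueError):
            cs.errors[name] = "is invalid"
    return cs


def validate_required(cs, names):
    """Flag fields that are absent and have no earlier error."""
    for name in names:
        if name not in cs.changes and name not in cs.errors:
            cs.errors[name] = "can't be blank"
    return cs


def validate_range(cs, name, low, high):
    """Flag a numeric field that falls outside [low, high]."""
    value = cs.changes.get(name)
    if value is not None and not (low <= value <= high):
        cs.errors[name] = f"must be between {low} and {high}"
    return cs


# Hypothetical "virtual" schema for a signup form; no table backs it.
SIGNUP_TYPES = {"email": str, "age": int}


def signup_changeset(params):
    cs = cast(params, SIGNUP_TYPES)
    cs = validate_required(cs, ["email", "age"])
    return validate_range(cs, "age", 13, 120)


ok = signup_changeset({"email": "a@example.com", "age": "42", "admin": "true"})
bad = signup_changeset({"age": "7"})
print(ok.valid, ok.changes)
print(bad.valid, bad.errors)
```

Note that the unpermitted `admin` field is silently dropped by `cast`, and nothing here touches a database; the same validated data could then be handed to a query, an email, or anything else.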

    • stephen 78 days ago
      Using raw SQL directly is doable, but it means you're responsible for maintaining the business logic & validation rules of every single hand-written INSERT, UPDATE, and DELETE query in your codebase.

      Personally I don't trust myself to remember to do that :-) hence preferring entity-based ORMs:

      https://joist-orm.io/docs/modeling/why-entities

      (That said, I definitely "know SQL" and use raw SQL queries for the ~5% of queries in a CRUD/SaaS app that are actually unique/non-boilerplate, instead of forcing them to go through an obtuse query builder DSL.)

    • mise_en_place 78 days ago
      You can store most things in an ETS table, or Mnesia if you want something distributed. But a lot of times, your customers or end users will end up abusing a system that was meant for fast read/write access and small data sizes. Then people like me get paged at 4 AM because ERTS happily consumes all available memory. You should plan to use Ecto and an RDBMS for most use-cases.
    • throwawaymaths 78 days ago
      ORMs and Ecto will do a lot of things for you, but the biggest one is sanitization. If you default to "go to raw SQL" it's too easy to miss those things and cause a Bobby Tables incident. Better to default to the framework and opt in to raw SQL when the queries need fine tuning.
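
For anyone who hasn't seen a "Bobby Tables" incident, here is a minimal sketch of the difference between interpolating input into SQL and binding it as a parameter. It uses Python's standard-library sqlite3 module with an in-memory database purely so it is runnable as-is; the `students` table and the helper names are assumptions for the demo, and query layers like Ecto do the binding for you by default.

```python
import sqlite3


def fresh_db():
    """Create a throwaway in-memory database with one table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE students (name TEXT)")
    return conn


def table_exists(conn, name):
    row = conn.execute(
        "SELECT count(*) FROM sqlite_master WHERE type = 'table' AND name = ?",
        (name,),
    ).fetchone()
    return row[0] == 1


hostile = "Robert'); DROP TABLE students;--"

# Unsafe: the input is pasted into the SQL text, so it can close the string
# literal and smuggle in a second statement. executescript() is used on
# purpose here because, unlike execute(), it accepts multiple statements.
unsafe = fresh_db()
unsafe.executescript(f"INSERT INTO students (name) VALUES ('{hostile}')")
unsafe_survived = table_exists(unsafe, "students")

# Safe: the input is sent as a bound parameter and is only ever treated as data.
safe = fresh_db()
safe.execute("INSERT INTO students (name) VALUES (?)", (hostile,))
safe_survived = table_exists(safe, "students")
stored = safe.execute("SELECT name FROM students").fetchone()[0]

print("interpolated:", "table survived" if unsafe_survived else "table dropped")
print("parameterized:", "table survived" if safe_survived else "table dropped")
```

Raw SQL is fine as long as every value goes through a placeholder like this; the risk is that with hand-written queries nothing forces you to.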