DuckDB Community Extensions

(duckdb.org)

139 points | by isaacbrodsky 92 days ago

8 comments

  • xnx 92 days ago
    Very cool. The shellfs extension (https://github.com/rustyconover/duckdb-shellfs-extension) that allows shell commands to be used for input and output will make DuckDB even more useful as a command line analysis tool. I'm not sure how I'll use it yet, but I'm betting I can streamline some multi-step data processes.
    • rustyconover 91 days ago
      As the author I'm happy to answer any questions. I'm glad you like the idea of the extension.
  • ec109685 91 days ago
    If anyone is curious, WebAssembly is also supported: https://duckdb.org/2023/12/18/duckdb-extensions-in-wasm.html
  • netcraft 92 days ago
    >DuckDB Labs and the DuckDB Foundation do not vet the code within community extensions and, therefore, cannot guarantee that DuckDB community extensions are safe to use. The loading of community extensions can be explicitly disabled with the following one-way configuration option:

    So we should think of this like NPM.

    Still, very cool and very useful. Would love a way to query the available community extensions directly from inside DuckDB.
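
For reference, a minimal sketch of the statements the quoted passage refers to. It assumes the `INSTALL ... FROM community` syntax and the `allow_community_extensions` setting described in the DuckDB documentation; the helper names `community_install_sql`, `lock_down_sql`, and `run_all` are hypothetical and exist only for this illustration. The `duckdb_extensions()` query lists the extensions the local build knows about, which is not necessarily the full community catalog.

```python
def community_install_sql(name: str) -> list[str]:
    """Statements that install and load one community extension."""
    # Conservative check for this sketch: only plain identifier-like names.
    if not name.isidentifier():
        raise ValueError(f"unexpected extension name: {name!r}")
    return [f"INSTALL {name} FROM community;", f"LOAD {name};"]


def lock_down_sql() -> list[str]:
    """The one-way option from the quote: disables community extension loading."""
    return ["SET allow_community_extensions = false;"]


# Lists extensions known to the local build (assumption: this may not
# include every extension published in the community repository).
LIST_EXTENSIONS_SQL = (
    "SELECT extension_name, installed, loaded FROM duckdb_extensions();"
)


def run_all(con, statements: list[str]) -> None:
    """Execute statements on any connection object exposing execute()."""
    for statement in statements:
        con.execute(statement)
```

With the third-party `duckdb` package installed, something like `run_all(duckdb.connect(), lock_down_sql())` should apply the setting for that connection; the helpers themselves only build SQL strings.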

    • nerdponx 92 days ago
      And like NPM or PyPI, it's still at least marginally better than downloading compiled packages from opaque file servers. For example, we avoided using the H3 (https://h3geo.org) extension for that reason. It is safer (but slower) to use Python UDFs with the official H3 Python library than to fetch a file from an R2 instance, which is what the instructions currently state on GitHub (https://github.com/isaacbrodsky/h3-duckdb/blob/3c8a5358e42ab...)
  • 9cb14c1ec0 92 days ago
    > What happens behind the scenes is that DuckDB downloads an extension binary

    The baser part of me wonders how hard it would be to compromise that supply chain.

    • 1egg0myegg0 92 days ago
      Extension downloads are validated using a signature check to prevent tampering!

      (I work for DuckDB Labs and MotherDuck)
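
As a rough illustration of the verify-before-load idea, here is a minimal sketch. The thread does not describe DuckDB's actual signing scheme, so this uses HMAC-SHA256 from the Python standard library as a stand-in for a real public-key signature; the key, the payload bytes, and the function names are all hypothetical. A check like this detects bytes that changed after signing; it says nothing about whether the signed bytes were trustworthy to begin with.

```python
import hashlib
import hmac


def sign(payload: bytes, key: bytes) -> bytes:
    """Produce a tag over the payload (stand-in for the publisher's signature)."""
    return hmac.new(key, payload, hashlib.sha256).digest()


def verify_before_load(payload: bytes, tag: bytes, key: bytes) -> bytes:
    """Return the payload only if its tag matches; otherwise refuse to load it."""
    expected = sign(payload, key)
    # compare_digest avoids leaking the mismatch position through timing.
    if not hmac.compare_digest(expected, tag):
        raise ValueError("signature mismatch: refusing to load extension")
    return payload


key = b"demo-key"                      # hypothetical key for this sketch
binary = b"placeholder extension bytes"  # stands in for a downloaded binary
tag = sign(binary, key)

verify_before_load(binary, tag, key)   # untouched bytes are accepted
try:
    verify_before_load(binary + b"!", tag, key)  # tampered bytes are rejected
except ValueError as err:
    print(err)
```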

      • immibis 91 days ago
        The backdoored version of xz was also signed.
    • metadat 92 days ago
      define: baser

      > 1. (of a person or a person's actions or feelings) without moral principles; ignoble.

      > 2. denoting or befitting a person of low social class.

      (New term, to me)

    • sitkack 92 days ago
      Same as PyPI. Maybe upload left-pad?
  • shubhamjain 91 days ago
    Honest question: how feasible would it be for DuckDB to release a non-columnar version of their DB (or at least make DuckDB a decent choice for a typical web app)? I don't know any other DB that makes installing extensions this easy. The rate at which they're shipping awesome features makes me wonder if they could eventually become a great generic database.

    I know, I know, this could just as easily be a double-edged sword. A database should prioritize stability above everything else, but there is no reason why we shouldn't expect them to get there.

    • wild_egg 91 days ago
      Are we certain that it's _not_ a decent choice for a typical web app? I'm tempted to swap it into one of mine and see how it behaves. Even if some operations are internally slower, that might be offset by having zero network latency to deal with

      It would be nice though if other DBs made extensions this easy. There are a handful of package managers for Postgres but they're not generally supported on managed platforms like RDS.

      Anyone know if there are comparable options for SQLite? Seems like an obvious thing that should exist but a quick search isn't showing me any

    • 1egg0myegg0 91 days ago
      Hello! I would recommend trying out DuckDB's SQLite attach feature! You can read or write data, and even make schema changes, all with DuckDB's engine and syntax. The storage then uses SQLite, which is row oriented!

      https://duckdb.org/docs/extensions/sqlite

      (I work at MotherDuck and DuckDB Labs)
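
A minimal sketch of what that could look like. The SQLite file is created with Python's standard `sqlite3` module; the DuckDB statements follow the `ATTACH ... (TYPE sqlite)` form from the linked documentation. The file name, the `users` table, the `app` alias, and the helper `attach_sqlite_sql` are hypothetical names for this illustration.

```python
import os
import sqlite3
import tempfile

# Create a small row-oriented SQLite file with the standard library.
path = os.path.join(tempfile.mkdtemp(), "app.db")
db = sqlite3.connect(path)
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO users (name) VALUES ('ada')")
db.commit()
db.close()


def attach_sqlite_sql(sqlite_path: str, alias: str = "app") -> list[str]:
    """DuckDB statements that attach a SQLite file and read one table from it."""
    escaped = sqlite_path.replace("'", "''")  # escape quotes inside the literal
    return [
        "INSTALL sqlite;",
        "LOAD sqlite;",
        f"ATTACH '{escaped}' AS {alias} (TYPE sqlite);",
        f"SELECT * FROM {alias}.users;",
    ]


for statement in attach_sqlite_sql(path):
    print(statement)
```

With the third-party `duckdb` package installed, each statement could be passed to `duckdb.connect().execute(...)`; the sketch only prints them so that it runs without DuckDB.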

      • wild_egg 91 days ago
        This is excellent. Do you have any content on the performance impact of this compared with using SQLite directly? I could see DuckDB's engine being faster for some cases, but the SQLite storage format might hinder it. Curious if there's any analysis around this.
    • snidane 91 days ago
      What do you need non-columnar layout for? Do you expect thousands of concurrent single row writes at a time?

      If you use embedded DuckDB on the client, unless the person goes crazy clicking their mouse at 60 clicks/s, DuckDB should handle it fine.

      If you run it on the backend and expect concurrent writes, you can buffer the writes in concatenated Arrow tables, one per minibatch, and merge them into DuckDB every, say, 10 seconds. You'd just need to query both the historical DuckDB data and the realtime Arrow tables separately and combine the results later.

      I agree that native support for this so-called Lambda architecture would be cool to have in DuckDB, especially when drinking fast-moving data from a firehose.
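
The buffer-and-merge pattern described above can be sketched roughly as follows. To stay runnable with only the standard library, `sqlite3` stands in for the embedded DuckDB database and a plain Python list stands in for the concatenated Arrow minibatches; the class name `BufferedStore`, the `events` table, and its columns are hypothetical.

```python
import sqlite3
import time


class BufferedStore:
    """Buffer rows in memory and merge them into the database in batches."""

    def __init__(self, flush_interval_s: float = 10.0):
        # sqlite3 stands in for the embedded DuckDB database in this sketch.
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE events (user_id INTEGER, amount REAL)")
        # A plain list stands in for the concatenated Arrow minibatches.
        self.buffer: list[tuple[int, float]] = []
        self.flush_interval_s = flush_interval_s
        self.last_flush = time.monotonic()

    def write(self, user_id: int, amount: float) -> None:
        """Append to the buffer; merge only once the interval has elapsed."""
        self.buffer.append((user_id, amount))
        if time.monotonic() - self.last_flush >= self.flush_interval_s:
            self.flush()

    def flush(self) -> None:
        """Merge all buffered rows into the historical table in one transaction."""
        with self.db:
            self.db.executemany("INSERT INTO events VALUES (?, ?)", self.buffer)
        self.buffer.clear()
        self.last_flush = time.monotonic()

    def total_amount(self) -> float:
        """Query historical and realtime data separately, then combine."""
        (historical,) = self.db.execute(
            "SELECT COALESCE(SUM(amount), 0) FROM events"
        ).fetchone()
        realtime = sum(amount for _, amount in self.buffer)
        return historical + realtime


store = BufferedStore(flush_interval_s=10.0)
store.write(1, 2.5)
store.write(2, 4.0)
before = store.total_amount()  # both rows are still in the buffer
store.flush()
after = store.total_amount()   # same rows, now served from the table
print(before, after)
```

The point of the design is that readers see a consistent total whether or not a merge has happened yet, while the database only receives one batched write per interval.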

    • mgaunard 91 days ago
      Most of my web apps are built around tabular data.
  • gigatexal 92 days ago
    This is the coolest thing! I’m very excited to see what we will have next. Hah, maybe an extension that embeds vim and then I’ll never leave DuckDB lol
  • victor106 91 days ago
    Does DuckDB (natively or through extensions) support Delta tables?