Very cool. The shellfs extension (https://github.com/rustyconover/duckdb-shellfs-extension) that allows shell commands to be used for input and output will make DuckDB even more useful as a command line analysis tool. I'm not sure how I'll use it yet, but I'm betting I can streamline some multi-step data processes.
> DuckDB Labs and the DuckDB Foundation do not vet the code within community extensions and, therefore, cannot guarantee that DuckDB community extensions are safe to use. The loading of community extensions can be explicitly disabled with the following one-way configuration option:
So we should think of this like NPM.
Still, very cool and very useful. Would love a way to query the available community extensions directly from inside DuckDB.
And like NPM or PyPI, it's still at least marginally better than downloading compiled packages from opaque file servers. For example, we avoided using the H3 (https://h3geo.org) extension for that reason. It's safer (but slower) to use Python UDFs with the official H3 Python library than to fetch a file from an R2 instance, which is what the instructions currently state on GitHub (https://github.com/isaacbrodsky/h3-duckdb/blob/3c8a5358e42ab...).
Honest question: how feasible would it be for DuckDB to release a non-columnar version of their DB (or at least make DuckDB a decent choice for a typical web app)? I don't know any other DB that makes installing extensions this easy. The rate at which they're shipping awesome features makes me wonder if they could eventually become a great generic database.
I know, I know, this could just as easily be a double-edged sword. A database should prioritize stability above everything else, but there is no reason why we shouldn't expect them to get there.
Are we certain that it's _not_ a decent choice for a typical web app? I'm tempted to swap it into one of mine and see how it behaves. Even if some operations are internally slower, that might be offset by having zero network latency to deal with.
It would be nice though if other DBs made extensions this easy. There are a handful of package managers for Postgres but they're not generally supported on managed platforms like RDS.
Anyone know if there are comparable options for SQLite? Seems like an obvious thing that should exist, but a quick search isn't showing me any.
Hello! I would recommend trying out DuckDB's SQLite attach feature! You can read or write data, and even make schema changes, all with DuckDB's engine and syntax. The storage then uses SQLite, which is row oriented!
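For anyone who wants to try the attach route from Python, here is a rough sketch rather than an official recipe. The file path, the `users` table, and both function names are made up for illustration. The DuckDB half assumes the third-party `duckdb` package is installed and that the `sqlite` extension can be downloaded on first use.

```python
import sqlite3
from contextlib import closing


def make_sqlite_db(path):
    """Create a small row-oriented SQLite file with the stdlib driver.

    The `users` schema is hypothetical, just something to query.
    """
    with closing(sqlite3.connect(path)) as con:
        con.execute(
            "CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, name TEXT)"
        )
        con.executemany(
            "INSERT INTO users (name) VALUES (?)", [("ada",), ("grace",)]
        )
        con.commit()


def count_users_via_duckdb(path):
    """Query the same SQLite file through DuckDB's engine.

    Assumes `pip install duckdb` and network access for INSTALL.
    """
    import duckdb  # third-party; imported lazily so the helper above stays stdlib-only

    con = duckdb.connect()  # in-memory DuckDB; storage stays in the SQLite file
    con.execute("INSTALL sqlite")
    con.execute("LOAD sqlite")
    # Naive string interpolation: fine for a sketch, not for untrusted paths.
    con.execute(f"ATTACH '{path}' AS app (TYPE sqlite)")
    return con.execute("SELECT count(*) FROM app.users").fetchone()[0]
```

Calling `make_sqlite_db("app.db")` followed by `count_users_via_duckdb("app.db")` should run the count with DuckDB's engine while the rows stay in SQLite's file format.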
This is excellent. Do you have any content on the performance effect of this compared with using SQLite directly? I could see DuckDB's engine being faster for some cases, but the SQLite storage format might hinder it. Curious if there's any analysis around this.
What do you need non-columnar layout for? Do you expect thousands of concurrent single row writes at a time?
If you use embedded DuckDB on the client, unless the person goes crazy clicking their mouse at 60 clicks/s, DuckDB should handle it fine.
If you run it on the backend and expect concurrent writes, you can buffer the writes in concatenated Arrow tables, one per minibatch, and merge them into DuckDB every 10 seconds or so. You'd just need to query the historical DuckDB data and the realtime Arrow tables separately and combine the results afterwards.
I agree that native support for this so-called Lambda architecture would be cool to have in DuckDB, especially when drinking fast-moving data from a firehose.
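A minimal sketch of that buffering idea, under some loud assumptions: it swaps plain Python lists in for Arrow tables to stay dependency-free, and `MicroBatchBuffer`, `duckdb_sink`, and the `(ts, value)` schema are all hypothetical names. The DuckDB sink assumes the third-party `duckdb` package. If the sink raises, the batch is lost, so a real version would need retry handling.

```python
import threading
import time


class MicroBatchBuffer:
    """Collect rows in memory and hand them to `flush_fn` in batches."""

    def __init__(self, flush_fn, max_age_s=10.0, clock=time.monotonic):
        self._flush_fn = flush_fn
        self._max_age_s = max_age_s
        self._clock = clock
        self._lock = threading.Lock()
        self._rows = []
        self._last_flush = clock()

    def add(self, row):
        """Buffer one row; flush if the current batch is older than max_age_s."""
        with self._lock:
            self._rows.append(row)
            due = self._clock() - self._last_flush >= self._max_age_s
        if due:
            self.flush()

    def pending(self):
        """Snapshot of unflushed rows, to combine with historical query results."""
        with self._lock:
            return list(self._rows)

    def flush(self):
        """Swap out the buffered rows and pass them to the sink; returns batch size."""
        with self._lock:
            batch, self._rows = self._rows, []
            self._last_flush = self._clock()
        if batch:
            self._flush_fn(batch)
        return len(batch)


def duckdb_sink(db_path, table):
    """Return a flush_fn that appends (ts, value) rows to `table` in DuckDB."""
    import duckdb  # third-party; imported lazily so the buffer needs only the stdlib

    con = duckdb.connect(db_path)
    con.execute(f"CREATE TABLE IF NOT EXISTS {table} (ts DOUBLE, value DOUBLE)")

    def flush(batch):
        con.executemany(
            f"INSERT INTO {table} VALUES (?, ?)", [list(r) for r in batch]
        )

    return flush
```

Wiring it up would look something like `buf = MicroBatchBuffer(duckdb_sink("events.duckdb", "events"))`, with readers merging a query over the table and `buf.pending()` to see both the historical and the realtime rows.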
The baser part of me wonders how hard it would be to compromise that supply chain.
(I work for DuckDB Labs and MotherDuck)
> 1. (of a person or a person's actions or feelings) without moral principles; ignoble.
> 2. denoting or befitting a person of low social class.
(New term, to me)
https://duckdb.org/docs/extensions/sqlite
(I work at MotherDuck and DuckDB Labs)