Launch HN: Airweave (YC X25) – Let agents search any app

btown · 2025-09-30T19:48:48

How do you compare to Onyx? We've used it for some limited use cases, but one of the real challenges - and one I hope to see a lot of innovation on in the space - was permissioning.

I see in another comment that you encourage each user to build their own dataset with their own permissions, but often this breaks for founders. If I have a Super Secret Personnel Planning Google Doc at a founder level, how can I be the one to set up the system for our company, but ensure that only files that I've explicitly shared with the company are ingested? What if a file needs to be made anyone-with-link-can-access for sharing with a strategic partner, but that shouldn't be indexed for the entire company?

Far too much of the world relies on the security-by-obscurity of public-but-unindexed links, and communications that might look public from a metadata perspective but were carefully designed for a very specific group of people who have verbal/mental context about confidentiality expectations. Being able to categorize by likely confidentiality, and allowing an administrator to partition access on a project and sub-project basis based on that, might be crucial for growth.
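A minimal sketch of the project/sub-project partitioning idea. It assumes each indexed document carries a hypothetical slash-separated partition path (e.g. "product/offsite-2025") and that an administrator grants users whole subtrees. Document, can_access, and visible_docs are illustrative names, not part of Onyx or Airweave, and the confidentiality-classification half of the idea is not covered.

```python
from dataclasses import dataclass


@dataclass
class Document:
    doc_id: str
    partition: str  # hypothetical path, e.g. "product/offsite-2025"


def _segments(path: str) -> list[str]:
    return [part for part in path.strip("/").split("/") if part]


def can_access(grants: set[str], doc: Document) -> bool:
    """True if any granted partition is a whole-segment prefix of the doc's partition."""
    doc_parts = _segments(doc.partition)
    for grant in grants:
        grant_parts = _segments(grant)
        # A grant on "product" covers "product/offsite-2025" but not "production".
        if doc_parts[: len(grant_parts)] == grant_parts:
            return True
    return False


def visible_docs(grants: set[str], docs: list[Document]) -> list[str]:
    return [doc.doc_id for doc in docs if can_access(grants, doc)]


docs = [
    Document("roadmap", "product"),
    Document("offsite-agenda", "product/offsite-2025"),
    Document("personnel-plan", "people/exec"),
]
print(visible_docs({"product"}, docs))      # ['roadmap', 'offsite-agenda']
print(visible_docs({"people/exec"}, docs))  # ['personnel-plan']
```

Matching on whole path segments rather than raw string prefixes keeps a grant on "prod" from exposing "product".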

My recollection is that Onyx had some support for security use cases, but it was very rudimentary. Hoping you can solve this in a thoughtful way!

Onyx links for comparison:

https://www.onyx.app/

https://docs.onyx.app/developers/guides/chat_guide

https://docs.onyx.app/admin/connectors/official/

raufakdemir · 2025-09-30T21:12:03

It’s a good point. It IS hard to map the various “off-market RBACs” onto a unified model, and that is part of the reason we've deferred it. Instead, we handle it with per-user syncs that include the q=“sharedWithMe” query parameter.
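A rough sketch of what a per-user Drive sync request could look like, assuming the Google Drive v3 files.list endpoint and the syncing user's own OAuth access token. The q="sharedWithMe" filter comes from the comment; the fields and pageSize values and the helper name shared_with_me_request are illustrative assumptions, not Airweave's code. The request is only built here, never sent.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request

DRIVE_FILES_URL = "https://www.googleapis.com/drive/v3/files"


def shared_with_me_request(user_access_token: str, page_token: Optional[str] = None) -> Request:
    """Build (but do not send) a files.list request scoped to one user's credentials."""
    params = {
        "q": "sharedWithMe",  # only items in this user's "Shared with me" view
        "fields": "nextPageToken, files(id, name, mimeType)",
        "pageSize": "100",
    }
    if page_token:
        params["pageToken"] = page_token  # continue a paginated listing
    url = f"{DRIVE_FILES_URL}?{urlencode(params)}"
    # Using the syncing user's own token means Drive itself enforces their ACLs.
    return Request(url, headers={"Authorization": f"Bearer {user_access_token}"})


req = shared_with_me_request("user-token")
print(req.full_url)
```

Because every sync runs with one user's token, the index for that user can only ever contain what Drive already lets that user see.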

As for intelligently (but probabilistically) determining confidentiality, if I read that correctly, that does sound pretty interesting in scenarios where metadata alone is insufficient. It's also tricky. Sounds like you've thought about these problems pretty deeply.

lennertjansen · 2025-10-01T09:56:41

@btown: Biggest difference: Airweave is infra for devs, i.e., connectors, sync, indexing (semantic + keyword), and a retrieval API/MCP designed with LLMs as the consumers. You bring the agent/UI. Onyx is an end-to-end search app that owns the agentic reasoning layer orchestrating its search. You can think of Airweave as a dev tool you would use if you were building an agentic application, and Onyx as a good example of such an application.
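Since the index is both semantic and keyword-based, a retrieval API has to merge two ranked lists somehow. One common, generic way to do that is reciprocal rank fusion (RRF). The sketch below assumes RRF with the conventional constant k = 60; it is not a description of how Airweave actually ranks results.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of doc IDs; each list contributes 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


semantic = ["a", "b", "c"]  # e.g. nearest neighbours from a vector index
keyword = ["b", "d", "a"]   # e.g. BM25 hits
print(reciprocal_rank_fusion([semantic, keyword]))  # ['b', 'a', 'd', 'c']
```

RRF only uses ranks, so it sidesteps the problem that cosine similarities and BM25 scores live on incomparable scales.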

On permissioning: we default to per-user syncs that adopt the permissions of the syncing user and mirror source ACLs (e.g., Drive items a user owns or that are sharedWithMe). In practice, founders avoid leaking private docs by either (a) having each user sync their own corpus, or (b) using a centrally-scoped token limited to Shared Drives/team folders and excluding personal “My Drive.” You can also keep separate collections and only expose cross-user search behind your own checks. We’re exploring richer org-level RBAC mapping on a per-customer basis (e.g., mapping Drive/SharePoint groups to index ACLs), but the above works today.
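A minimal sketch of option (b), combined with the "only files I've explicitly shared with the company" rule from the parent comment. It assumes file dicts shaped like Google Drive v3 file resources (driveId is only set for items in a shared drive; each permission has a type such as "user", "group", "domain", or "anyone"). The should_ingest policy itself is hypothetical, not Airweave's.

```python
def should_ingest(file: dict, company_domain: str) -> bool:
    """Hypothetical org-wide ingestion policy for a centrally-scoped sync."""
    # Items in a shared drive belong to the organization, not to one person's My Drive.
    if file.get("driveId"):
        return True
    # Otherwise require an explicit share with the whole company domain.
    # "anyone" (anyone-with-link) and individual "user" shares do not qualify.
    return any(
        perm.get("type") == "domain" and perm.get("domain") == company_domain
        for perm in file.get("permissions", [])
    )


files = [
    {"id": "team-wiki", "driveId": "0AbC"},
    {"id": "all-hands", "permissions": [{"type": "domain", "domain": "example.com"}]},
    {"id": "partner-deck", "permissions": [{"type": "anyone"}]},
    {"id": "personnel-plan", "permissions": [{"type": "user", "emailAddress": "founder@example.com"}]},
]
print([f["id"] for f in files if should_ingest(f, "example.com")])  # ['team-wiki', 'all-hands']
```

Under this policy the anyone-with-link partner deck and the founder's private plan both stay out of the company index.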

@Weves: Thanks, appreciate it!

Weves · 2025-09-30T23:00:21

Don't mean to hijack (one of the Onyx founders here), but the example you described should be doable with Drive service accounts. Admittedly, our permissioning system is only implemented for a handful of connectors like Drive.

Congratulations on the launch Rauf & Lennert! Always great to have more innovation in the open source AI space :D. It looks like Airweave works well with Cursor, something we don't have nailed down yet!

ashu1461 · 2025-09-30T20:51:03

Great release,

1. How do you decide whether to cache the data in a vector database or fetch it at runtime using a tool call?

2. All the major players like OpenAI / Claude are slowly trying to provide a roughly equivalent offering: connecting your workspaces and then providing search on top of them, either via direct integrations or MCP servers. How do you see that playing out?

raufakdemir · 2025-09-30T20:59:25

Airweave always indexes everything. We do not do any direct tool calling currently.

lennertjansen · 2025-10-01T10:06:53

re 2.: I agree that there's a trend of OpenAI/Anthropic adding Airweave-like connectors to ChatGPT and Claude Desktop. IMHO this is a good thing for us because it shows the utility of our use case.

suprnurd · 2025-09-30T16:58:48

Looks great! It's cool how you are able to unify multiple sources into a single searchable layer. I’m curious how you chose which connectors to support first (e.g. GitHub, Notion, Slack) and how you plan to scale connector coverage? Thanks!

lennertjansen · 2025-09-30T17:13:51

It's currently guided by community feedback, GitHub issues, and conversations with users. We rely on private e2e test suites to maintain quality as we scale coverage.

ameyamk · 2025-09-30T18:16:00

Looks good. Curious: how is auth handled? A lot of docs have permissions, etc. Can you clarify how this is handled on both the indexing side and the searching side?

raufakdemir · 2025-09-30T18:48:43

Great question. We usually sync per user in cases where this matters. That seems inefficient until you realize that for most teams, workspace data is pretty small, at least compared to other data workloads (CRMs << 1 GB).

We plan to implement unified ACL syncs to dedupe the data or even have 1 sync per org, but that’s mostly a cost optimization; Airweave will just scale horizontally until then.
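One way the "unified ACL sync" cost optimization could work, sketched under assumptions: each source document (keyed by a hypothetical source ID, such as a Drive file ID) is stored once, together with the set of users whose per-user syncs returned it, and every search is filtered by that set. DedupedIndex and its substring matching are illustrative only.

```python
class DedupedIndex:
    """Store each source document once; remember which users may see it."""

    def __init__(self) -> None:
        self._docs: dict[str, str] = {}
        self._acl: dict[str, set[str]] = {}

    def upsert(self, user_id: str, source_id: str, text: str) -> None:
        # Called once per document per per-user sync; the content is stored once.
        self._docs[source_id] = text
        self._acl.setdefault(source_id, set()).add(user_id)

    def search(self, user_id: str, term: str) -> list[str]:
        # Only documents that this user's own sync returned are candidates.
        needle = term.lower()
        return sorted(
            source_id
            for source_id, text in self._docs.items()
            if user_id in self._acl[source_id] and needle in text.lower()
        )

    def __len__(self) -> int:
        return len(self._docs)


idx = DedupedIndex()
idx.upsert("alice", "drive:1", "Q3 roadmap")
idx.upsert("bob", "drive:1", "Q3 roadmap")  # same file seen by a second user
idx.upsert("alice", "drive:2", "Exec comp plan")
print(len(idx))                   # 2
print(idx.search("bob", "plan"))  # []
```

Storage scales with unique documents instead of users × documents, while each user's result set is still what their own sync would have produced.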

Blahah · 2025-10-01T04:10:15

This is pretty cool and I could see myself recommending this to our team for some applications. Congrats on the launch!

A couple of bits of feedback:

1. Code samples on the site have broken whitespace on mobile (Android/Brave) so look a bit intense.

2. The pricing is complex to reason about - I have to consider the technical aspects and the number of users? Why don't I just get an API key?

lennertjansen · 2025-10-01T09:59:43

1. thanks for the feedback, testing it on a smartphone and fixing that asap. 2. What about the pricing do you find complex, and what would make it easier to understand for you? I'd also add that you can get an API key right away by using the free developer version or a local instance (the API key is shown immediately in the top-right panel). You can also create more in your org settings.

and ofc, feel free to reach out if your team needs help with setup

candiddevmike · 2025-09-30T18:27:50

Seems like Google Agentspace but without the UI. Do you folks keep a persistent copy of the data being ingested? How are you planning on solving RBAC? IMO, all of these "search anything" apps are going to be leaky by design unless you're indexing/gathering on the fly using passthrough credentials...

raufakdemir · 2025-09-30T18:54:00

Great question. We do index the data!

We usually sync per user. That way we make sure that no information leaks to another interface.

janwilmake · 2025-09-30T19:24:48

Hey Lennert, congrats on the launch! Still open to chat about uithub

lennertjansen · 2025-10-01T10:00:28

thanks Jan, let's definitely chat

hommes-r · 2025-10-01T07:48:01

Awesome to see the Cursor Airweave example!

lennertjansen · 2025-10-01T10:00:10

thanks!

ripped_britches · 2025-09-30T18:20:58

Cool deal. How is this different from Glean?

raufakdemir · 2025-09-30T18:35:38

Glean is enterprise search for humans. Airweave is built for agent developers who want to access their users' information (i.e., the people using the agent product).

andric · 2025-09-30T20:38:37

Your pricing currently seems prohibitive for that kind of use case. Shouldn't it be usage-based so one can build a product where users can connect their apps without having to worry about arbitrary limits on plans? There should be a PAYG option that simply charges per connection, and automatic volume discounts.
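To make the per-connection PAYG idea concrete, here is a sketch of graduated volume pricing. All tier boundaries and prices are made-up placeholders, not Airweave's actual pricing.

```python
from decimal import Decimal
from typing import Optional

# Hypothetical graduated tiers: (inclusive upper bound on connections, price per connection).
# None marks the final, unbounded tier.
TIERS: list[tuple[Optional[int], Decimal]] = [
    (10, Decimal("5.00")),
    (100, Decimal("4.00")),
    (None, Decimal("3.00")),
]


def monthly_bill(connections: int, tiers=TIERS) -> Decimal:
    """Charge each connection at the rate of the tier it falls into."""
    if connections < 0:
        raise ValueError("connections must be non-negative")
    total = Decimal("0")
    lower = 0
    for upper, price in tiers:
        if connections <= lower:
            break
        top = connections if upper is None else min(connections, upper)
        total += (top - lower) * price
        if upper is None:
            break
        lower = upper
    return total


print(monthly_bill(5))    # 25.00
print(monthly_bill(150))  # 560.00
```

With graduated tiers, 150 connections cost 10 × 5 + 90 × 4 + 50 × 3 = 560, and adding a connection never lowers the total, unlike an all-units discount that reprices everything once a threshold is crossed.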

lennertjansen · 2025-10-01T05:24:07

the custom-priced tier has usage-based pricing. Admittedly, we’re still trying to nail down the unit economics of it all, which is pretty tricky in our case. That’s partly why we wanted to release the free dev tier and cheap pro tier, so people can already get started building lightweight projects. But I 100% agree that the next step is a self-serve PAYG tier.

raufakdemir · 2025-09-30T20:54:13

Definitely worth looking into.

orliesaurus · 2025-09-30T23:04:30

is this like RAG for cloud services that store my content?

lennertjansen · 2025-10-01T10:02:41

Yes, that's an accurate description. I'd add the nuance that our retrieval is designed to give agents the right actionable context to perform work in users' workspaces, not necessarily to synthesize a final answer for a human end user. But ofc you can use it for that.

ori_b · 2025-09-30T22:55:43

Can I have the application search without the LLM shit?

lennertjansen · 2025-10-01T05:29:31

yes, you don’t have to use LLM operations during search. You can set the search endpoint to use only BM25 keyword search.
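For readers unfamiliar with BM25, here is a minimal, self-contained version of the scoring. It assumes naive whitespace tokenization and typical parameter values k1 = 1.5 and b = 0.75; it illustrates the standard Okapi BM25 formula and is not Airweave's implementation.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of every document for a whitespace-tokenized query."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs if n_docs else 0.0
    doc_freq = Counter()  # number of documents containing each term
    for toks in tokenized:
        doc_freq.update(set(toks))

    scores = []
    for toks in tokenized:
        term_freq = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            tf = term_freq[term]
            if tf == 0:
                continue
            # Non-negative IDF variant (the one Lucene uses).
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Term-frequency saturation (k1) plus document-length normalization (b).
            norm = tf + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


def keyword_search(query: str, docs: list[str], top_k: int = 3) -> list[int]:
    """Indices of the best-matching docs, dropping docs that match no query term."""
    scores = bm25_scores(query, docs)
    ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return [i for i in ranked if scores[i] > 0][:top_k]


docs = ["quarterly roadmap for the sales team", "sales compensation plan", "office lunch menu"]
print(keyword_search("sales plan", docs))  # [1, 0]
```

No embeddings or model calls are involved: ranking depends only on term counts, document frequencies, and document lengths.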

ori_b · 2025-10-01T14:35:33

Can I have a UI for this, instead of having the clunkiness of trying to make an LLM do what I want?

EGreg · 2025-09-30T17:53:48

"Give us access to any information on your computer."

And who is "us"?

"Well, our agents, of course. We'll send the information down to our servers, because -- surprise -- we have the GPU infrastructure to run it, and you don't. Don't worry, it's secure."

"Alright, well--"

https://www.wiz.io/blog/38-terabytes-of-private-data-acciden...

"Oops! Well don't worry, it's not like we're the first ones to sell your usage data..."

https://ferrumit.com/resources/it-s-now-legal-for-isps-to-se...

"You see! Well, just send us your DNA we'll analyze it -- with science! I mean with AI..."

"Alright, here is--"

https://www.nytimes.com/2025/05/19/business/regeneron-pharma...

"Oops! Well don't worry, it's not like the company that bought us will do anything with your data, that we wouldn't have done."

Here's my question...

1) How much can we feasibly run on a consumer-grade GPU today, on board the computer, either the latest MacBook or the latest iPhone? Does Apple Metal + Apple Silicon ship with any on-board models in the latest iOS 26?

2) How can we extend the security boundary to GPU servers that are attested black boxes that store data encrypted at rest, guaranteed not to train on it and are not owned by some corporation that can peek at the data?