Hacker News | past | comments | ask | show | jobs | submit | Tsarp's comments

Voice interfaces. Currently this dictation app https://carelesswhisper.app

Managed to make long dictations, even >10 minutes, appear in under 2 seconds by pushing what is possible with current STT models.

All processing is done locally, with zero network calls.


If you were to classify how you landed the 170+ clients into a few buckets, what would the top channels be?


Upwork for the majority of them (I'm a top ~40 freelancer there, and at one point was top ~10). The rest: non-repeatable, non-systematic randos.


Sounds like you solved the trust issue with your Upwork profile reputation.


Making locally running dictation transcribe faster than cloud tools, even for longer dictations. Yes, it's possible.

https://carelesswhisper.app/blog/latency-demo


Building Arivu: CLI/library that normalizes fetch/search across a bunch of sources (arXiv, PubMed, HN, GitHub, Reddit, YouTube transcripts, RSS, web pages…).

I use it as a context fetcher, i.e. grab an abstract/transcript/thread as clean text/JSON and pipe it into summaries or scripts.

Also runs as an MCP server (experimental), so tools like Claude Desktop or CLI assistants can call the connectors directly.

  arivu fetch hn:38500000
  arivu fetch PMID:12345678
  arivu fetch https://arxiv.org/abs/2301.07041
https://github.com/srv1n/arivu


Just tried it, this is going to be super helpful with Claude skills.


If we are already comfortable with our enterprise ChatGPT subscription, how might this be of value, given that it does RAG and tool calling and has all the SSO/collab stuff? Or are we not the target customer?

Just curious, especially with both OpenAI and Anthropic also outpacing startups in release cadence, unlike previous cycles.

Guessing your selling point is any model, no lock-in (assuming we are happy with the privacy/SOC 2 etc. guarantees on enterprise contracts here).


A few reasons:

1/ No model lock-in / ability to use the ideal model for each use case.

2/ More connectivity: a fuller connector library (contributed to by the open-source community) and more built-in tools (likewise community-contributed).

3/ Customizability and flexibility. If you really need a feature, you can build it rather than waiting months (years?) for your request to go through.

4/ White-labeling. You can make it feel like a product built for/by your company rather than generic.


What is the use case for a desktop app? Just local storage, using local compute for RAG, or perhaps privacy?


I built a prototype using native messaging (the same way apps like password managers interact with browsers and drive actions with pure JS).

I have a lot of actions done but am not fully there yet. Essentially the goal is to use a CLI or an external desktop app to drive your already logged-in Chrome profile without navigator.webdriver or enabling --remote-debugging-port. In all my testing it never got flagged by captcha/bot protection. The CLI can interact with LLMs and the local file system (despite OPFS, this is easier).

CLI(host app) <-> Native messaging daemon <-> Chrome extension

The extension just executes pure JS for extraction and navigation. I've gotten basic Google searching and image downloading working; looking at complex form interactions next.
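For reference, Chrome's native messaging wire format is documented: each message is UTF-8 JSON preceded by a 32-bit length prefix in native byte order. A minimal sketch of the daemon side in Python (the echo behavior and `action` field are hypothetical illustrations, not the actual prototype's protocol):

```python
import json
import struct
import sys

# Chrome native messaging frames each message as a 32-bit length
# prefix (native byte order; little-endian on common platforms)
# followed by UTF-8 encoded JSON.
def encode_message(obj):
    body = json.dumps(obj).encode("utf-8")
    return struct.pack("=I", len(body)) + body

def decode_message(stream):
    raw_len = stream.read(4)
    if len(raw_len) < 4:
        return None  # extension closed the pipe
    (length,) = struct.unpack("=I", raw_len)
    return json.loads(stream.read(length).decode("utf-8"))

def daemon_loop():
    # Read commands from the extension over stdin, reply over stdout.
    # A real daemon would dispatch on msg["action"] (navigate,
    # extract, download, ...) instead of just echoing.
    while True:
        msg = decode_message(sys.stdin.buffer)
        if msg is None:
            break
        sys.stdout.buffer.write(encode_message({"ok": True, "echo": msg}))
        sys.stdout.buffer.flush()
```

The host binary also has to be registered in a per-OS native messaging host manifest, which is where much of the cross-platform setup friction lives.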


Native messaging is a huge headache to set up reliably across all the OSes and (often headless) Chrome setups, in our experience; that's why we've avoided it.

Just using some remote message-bus endpoint service is an easier solution, or something like ElectricSQL/RxDB/Replicache/etc.

We also can't really use in-page JS for much because it's easily detected by bot-blockers, though isolated worlds help in CDP.


Wouldn't having chrome.debugger = true also flag your requests?


I'm using LanceDB (and pretty happy with it so far). I am looking at the Chroma docs and have a few questions:

1. I see the core is OSS; any chance of it being pushed up to crates.io? (I see you already have a placeholder.)

2. Is it embeddable, or only usable as an Axum server?

Do you see all providers converging on a similar alpha, i.e. cheap object storage, NVMe drives, and an SSD cache, to solve this?

Cheers and congrats on the launch


Hey there

Chroma is fully OSS: embedded, single-node, and distributed (data and control plane). AFAIK Lance distributed is not OSS.

We do have plans to release the crate (enabling embedded Chroma in Rust), but we haven't gotten around to it yet. Hopefully soon!

> Do you see all providers converging on a similar alpha, i.e. cheap object storage, NVMe drives, and an SSD cache, to solve this?

It's not only a new pattern in search workloads; it's happening in streaming, KV, OLTP, OLAP, etc. Yeah, it's the future.
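The storage pattern in question, cheap object storage as the durable tier with an SSD/NVMe cache in front of it, is essentially a read-through cache. A toy sketch of that pattern (an in-memory dict stands in for object storage; nothing here is Chroma's actual implementation):

```python
import collections

class ReadThroughCache:
    """Hot reads served from a bounded local cache (stand-in for an
    SSD/NVMe tier); misses fall through to a slower backing store
    (stand-in for object storage like S3)."""

    def __init__(self, backing_store, capacity=1024):
        self.backing = backing_store            # e.g. an object-store client
        self.capacity = capacity
        self.cache = collections.OrderedDict()  # maintained in LRU order

    def get(self, key):
        if key in self.cache:
            self.cache.move_to_end(key)         # hit: mark as recently used
            return self.cache[key]
        value = self.backing[key]               # miss: slow durable tier
        self.cache[key] = value
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)      # evict least recently used
        return value
```

The economics follow from the split: the durable tier is priced for capacity, the cache tier for latency, which is why the same shape keeps appearing across search, streaming, and OLAP systems.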


https://github.com/srv1n/kurpod

Lets you hide thousands of images, PDFs, videos, secrets, and keys in a single portable, innocent-looking file like "Vacation_Summer_2024.mp4".


Would be cool to get a WASM build that runs fully in the browser.


