Hacker News | badlogic's comments

Neat. Any reason why the MCP server doesn't expose a JavaScript/eval tool? Current models excel at writing JS to drive and inspect the DOM. They aren't great at driving browsers via screenshots.


FWIW, if you have Claude Code or the like, you can quickly prompt your way to an eval function in MCP. It already exists in clicker and the client API. You can use it to get the accessibility tree, for example, and use that to find what to fill out and click.


> why the MCP server doesn't expose a JavaScript/eval tool?

no reason other than my #1 goal was "ship something". i only started the actual coding on dec 11. it's been a bit of a sprint the last two weeks!

though "image-based" vs "dom-based" testing approaches is a very big topic! (look forward to researching that more in the future.)

v1 announcement: https://github.com/VibiumDev/vibium/blob/main/docs/updates/2...


Create a markdown file. For each SKILL.md of the skills you want to use, put the frontmatter in that single markdown file along with the full path to the SKILL.md file. On session start, tell Gemini to read that file. If you put it in your AGENTS.md, you don't have to instruct Gemini. And if you have your skills in a known folder, let Gemini write a small script that generates that markdown file for you.
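A minimal sketch of such a generator script, in case it helps. It assumes each skill lives in its own folder containing a SKILL.md whose frontmatter sits between two leading `---` lines; the function names, the index layout, and the `SKILLS_INDEX.md` file name are made up for illustration.

```python
from pathlib import Path


def extract_frontmatter(text: str) -> str:
    """Return the raw frontmatter between the leading '---' lines, or ''."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return ""
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            return "\n".join(lines[1:i])
    return ""  # opening '---' without a closing one


def build_index(skills_root: Path) -> str:
    """Collect frontmatter and full path for every SKILL.md under skills_root."""
    sections = []
    for skill_file in sorted(skills_root.rglob("SKILL.md")):
        frontmatter = extract_frontmatter(skill_file.read_text(encoding="utf-8"))
        sections.append(
            f"## {skill_file.parent.name}\n\n"
            f"Path: {skill_file.resolve()}\n\n"
            f"{frontmatter}\n"
        )
    return "# Available skills\n\n" + "\n".join(sections)


def write_index(skills_root: Path, out_file: Path) -> None:
    """Write the combined index (hypothetical layout) to out_file."""
    out_file.write_text(build_index(skills_root), encoding="utf-8")
```

Calling `write_index(Path("skills"), Path("SKILLS_INDEX.md"))` and then referencing the resulting file from AGENTS.md would match the setup described above.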


Loved the fun write up. Now that we know that LLM-based vision is lossy, here's a different challenge:

Give the LLM access to the site's DOM and let it recreate the site with modern CSS. LLMs are much better with source code, aka text, right? :)


I can speak for the government site in my European home country: they too are buying GPUs for chat ...


Oh, I didn't intend this to come across as MCP being useless. I've written this from the perspective of someone who uses LLMs mostly for coding/computer tasks, where I found MCP to be less than ideal for my use cases.

I actually think MCP can be a multiplier for non-technical users, were it not for some nits like being a bit too technical and the various security footguns many MCP servers hand you.


That makes sense to me, thanks for the clarification.


Also not disagreeing with your argument. Just want to point out that you can achieve the same by putting minimal info about your CLI tools in your global or project specific CLAUDE.md.

The only downside here is that it's more work than `claude mcp add x -- npx x@latest`. But you get composability in return, as well as the intermediate tool outputs not having to pass through the model's context.


Yes, the only reason they are building a browser is to gobble up more data.

https://x.com/badlogicgames/status/1980698199649317287


Yikes!

Market capture, again. :sigh: It's such a common motivator for digital (-adjacent) product decisions in business these days.


I run a few production RAG systems, some dating back to the end of 2023, and arrived at the same conclusions.

Query expansion and non-naive chunking give the biggest bang for the buck, with chunking being the most resource intensive task, if the input data is chunk (pun intended).
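The comment doesn't say which chunking strategy is meant. As one possible reading of "non-naive", here is a rough sketch that packs whole paragraphs into chunks instead of cutting at fixed character offsets; the function name and the `max_chars` default are assumptions, not anything from the systems mentioned.

```python
def chunk_by_paragraph(text: str, max_chars: int = 1000) -> list[str]:
    """Pack whole paragraphs into chunks of at most max_chars where possible."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
            # A single oversized paragraph becomes its own chunk, unsplit.
            current = para
    if current:
        chunks.append(current)
    return chunks
```

Real input usually needs more than this (headings, tables, messy extraction output), which is probably where the resource cost comes from.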


I love this! Not just because I also grew up in the 90s and like your music choice :)

As we drown in media and slop, I think it's super important to teach kids how to be selective, develop taste. And I too found that physical connection does help with that.

Great project and execution. It would be great if you could also introduce a social aspect, i.e. kids sharing/swapping cards.

(Did something similar for our then 3yo. Since it's one of a kind, the social aspect is kinda not there. Yet! https://mariozechner.at/posts/2025-04-20-boxie/)


"game boy for audiobooks" is so cool. Thanks for sharing. (dad) rock on.


Chrome dev console has Gemini integrated as well. Otherwise, pick any coding agent (Claude Code, Codex, opencode, ...), give it the Playwright MCP, and ask away.

