Just used it for French right now. The design is excellent, but the LLM's task-orientedness needs some work: the tutor needs to follow the curriculum closely. This is the same issue I have in my day job, i.e. keeping the LLM on topic. It's not strict. For example, after I ask it to make sure to remind me to reply in French, it very easily forgets to do so. It doesn't follow a structured approach, and even in casual conversation it isn't correcting my mistakes unless I ask.
The tricky question is what environment state you compare in order to decide whether it's a cache hit or miss: just the screen state? The screen state plus all open apps? That plus all running processes? You know what I mean. I think it's a solvable problem, and a very interesting one too. You now need to think about what a human would consider while relying on muscle memory, and that varies by action, e.g. "rm -rf ." requires knowing what directory I am in, whereas "click close + don't save" requires knowing that I don't want the recent change.
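For concreteness, a rough sketch of what an action-scoped cache key could look like (the state fields and the ACTION_SCOPES table are hypothetical, just to illustrate that different actions need different slices of the environment):

    # Hypothetical sketch: cache keys scoped to the state each action depends on.
    import hashlib
    import json

    ACTION_SCOPES = {
        "rm -rf .": ["cwd"],                            # needs the working directory
        "click close + don't save": ["dirty_buffer"],   # needs unsaved-change status
    }

    def cache_key(action: str, env_state: dict) -> str:
        """Hash only the parts of the environment the action actually depends on."""
        scope = ACTION_SCOPES.get(action, ["screen", "open_apps"])  # coarse fallback
        relevant = {k: env_state.get(k) for k in scope}
        payload = json.dumps({"action": action, "state": relevant}, sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()

    # Two environments differing only in irrelevant state hit the same cache entry:
    env_a = {"cwd": "/home/me/project", "screen": "editor", "open_apps": ["term"]}
    env_b = {"cwd": "/home/me/project", "screen": "browser", "open_apps": ["term", "slack"]}
    assert cache_key("rm -rf .", env_a) == cache_key("rm -rf .", env_b)

The hard part, of course, is deciding something like ACTION_SCOPES automatically rather than writing it by hand.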
I don't understand what you mean by 'non-locality of convolutions'. Isn't convolution inherently a local operation? Isn't that probably one of the main reasons that CNNs are biased towards texture [0] rather than shape?
Convolutions in a hierarchy of layers, especially dilated convolutions, provide long-range connections between inputs (handwavily, the path length is logarithmic in the distance). In an RNN, two inputs are separated by however many time steps lie between them, linearly, so gradients vanish more easily. Some paper I don't recall examined them side by side and found that RNNs quickly forget inputs, even with LSTMs, which means the theoretically unlimited long-range connections through their hidden state don't wind up being that useful.
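To make the handwavy "logarithmic" claim concrete, here's a rough PyTorch sketch (layer count, channel sizes, and kernel size are arbitrary choices for illustration) showing how exponentially increasing dilation grows the receptive field exponentially with depth, so the path between distant inputs stays short:

    # Stacking kernel-size-2 convolutions with dilation 1, 2, 4, ... doubles the
    # receptive field at each layer: 6 layers already connect 64 time steps.
    import torch
    import torch.nn as nn

    layers = []
    receptive_field = 1
    for i in range(6):
        dilation = 2 ** i
        layers.append(nn.Conv1d(16, 16, kernel_size=2, dilation=dilation))
        receptive_field += dilation      # a kernel-size-2 conv adds `dilation` steps
    net = nn.Sequential(*layers)

    x = torch.randn(1, 16, 128)          # (batch, channels, time)
    y = net(x)
    print(receptive_field)               # 64
    print(y.shape)                       # torch.Size([1, 16, 65]): time shrinks by 63

An RNN spanning the same 64 steps has to push the signal through 64 sequential hidden-state updates, which is where the vanishing gradients bite.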
In an RNN, you could connect each hidden state at time step t, h(t), to h(t-N), instead of, or in addition to, h(t-1), making it analogous to dilated convolutions, but with hidden-to-hidden connections at the same layer.
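A rough sketch of that idea, assuming a simple Elman-style cell (the cell, the sizes, and the delay N=4 are made up for illustration; this isn't any particular published architecture):

    # RNN whose update uses both h(t-1) and h(t-N):
    # h(t) = tanh(W_x x(t) + W_1 h(t-1) + W_N h(t-N))
    import torch
    import torch.nn as nn

    class SkipRNN(nn.Module):
        def __init__(self, input_size, hidden_size, skip=4):
            super().__init__()
            self.skip = skip
            self.in_proj = nn.Linear(input_size, hidden_size)
            self.recent = nn.Linear(hidden_size, hidden_size)    # h(t-1) path
            self.delayed = nn.Linear(hidden_size, hidden_size)   # h(t-N) path

        def forward(self, x):                  # x: (time, batch, input_size)
            T, B, _ = x.shape
            H = self.recent.in_features
            hs = [x.new_zeros(B, H)]           # zero initial state
            for t in range(T):
                h_prev = hs[-1]
                h_delayed = hs[max(0, len(hs) - self.skip)]
                hs.append(torch.tanh(self.in_proj(x[t]) + self.recent(h_prev) + self.delayed(h_delayed)))
            return torch.stack(hs[1:])         # (time, batch, hidden_size)

    out = SkipRNN(8, 16, skip=4)(torch.randn(20, 3, 8))
    print(out.shape)                           # torch.Size([20, 3, 16])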
So I don't think RNNs are fundamentally more myopic than CNNs (just that there may be practical advantages to using the latter).
Hierarchical RNNs, Clockwork RNNs, Hierarchical Multiscale RNNs, and probably others do things of this nature.
You could, but it's not equivalent, and no one seems to have been able to use Clockwork RNNs or related architectures to achieve similar performance, so the differences would seem to make a difference.
That’s an awful lot of woo to describe something “theoretical” in the sense of being imaginary, but not theoretical in the sense of ever having been proven in a mathematically rigorous way.
This is really cool. I'm working in systems research, but I'm fascinated by mathematical research in CS. Did you continue with similar work after your PhD, if I may ask?
I've since switched my research focus to machine learning (look up latent Dirichlet allocation, very cool stuff). I find a lot of parallels between the two fields: prob. theory, matrices, uncertainty, ...
I'm still following quantum information theory research, but more as a spectator from the sidelines. However, a couple of weeks ago I had to come back to quantum info theory to "defend" my academic reputation. Our colleagues from TIFR found a bug in one of our papers (http://arxiv.org/abs/1111.3645v3), so my coauthor and I had to fix it. It was kind of cool to see I hadn't "lost my quantum skills" after two years of running a business. I guess once you go quantum, you never go back? :)