Just used it for French right now. The design is excellent, but the LLM's task-orientedness needs some work: the tutor needs to follow the curriculum closely. This is the same issue I have in my day job, i.e. keeping the LLM on topic. It's not strict. For example, after I ask it to make sure to remind me to reply in French, it very easily forgets to do so. It doesn't follow a structured approach, and even in casual conversation it isn't correcting my mistakes unless I ask.
The tricky question is what environment state you compare in order to decide whether it's a cache hit or miss: just the screen state? The screen state plus all open apps? That plus all running processes? You know what I mean. I think it's a solvable problem, and a very interesting one too. You now need to think about what a human would consider while relying on muscle memory, and that varies by action, e.g. "rm -rf ." requires knowing what directory I am in, whereas "click close + don't save" requires knowing that I don't want the recent change.
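For concreteness, a rough sketch of what an action-scoped cache key could look like (the state fields and the ACTION_SCOPES table are hypothetical, just to illustrate that different actions need different slices of the environment):

    # Hypothetical sketch: cache keys scoped to the state each action depends on.
    import hashlib
    import json

    ACTION_SCOPES = {
        "rm -rf .": ["cwd"],                            # needs the working directory
        "click close + don't save": ["dirty_buffer"],   # needs unsaved-change status
    }

    def cache_key(action: str, env_state: dict) -> str:
        """Hash only the parts of the environment the action actually depends on."""
        scope = ACTION_SCOPES.get(action, ["screen", "open_apps"])  # coarse fallback
        relevant = {k: env_state.get(k) for k in scope}
        payload = json.dumps({"action": action, "state": relevant}, sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()

    # Two environments differing only in irrelevant state hit the same cache entry:
    env_a = {"cwd": "/home/me/project", "screen": "editor", "open_apps": ["term"]}
    env_b = {"cwd": "/home/me/project", "screen": "browser", "open_apps": ["term", "slack"]}
    assert cache_key("rm -rf .", env_a) == cache_key("rm -rf .", env_b)

The hard part, of course, is deciding something like ACTION_SCOPES automatically rather than writing it by hand.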
I don't understand what you mean by 'non-locality of convolutions'. Isn't convolution inherently a local operation? Isn't that probably one of the main reasons that CNNs are biased towards texture [0] rather than shape?
Convolutions in a hierarchy of layers, especially dilated convolutions, provide long-range connections between inputs (handwavily, the path length is logarithmic in the distance). In an RNN, two inputs are separated by however many time steps lie between them, linearly, so gradients vanish more easily. Some paper I don't recall examined them side by side and found that RNNs quickly forget inputs, even with LSTMs, which means the theoretically unlimited long-range connections through their hidden state don't wind up being that useful.
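To make the handwavy "logarithmic" claim concrete, here's a rough PyTorch sketch (layer count, channel sizes, and kernel size are arbitrary choices for illustration) showing how exponentially increasing dilation grows the receptive field exponentially with depth, so the path between distant inputs stays short:

    # Stacking kernel-size-2 convolutions with dilation 1, 2, 4, ... doubles the
    # receptive field at each layer: 6 layers already connect 64 time steps.
    import torch
    import torch.nn as nn

    layers = []
    receptive_field = 1
    for i in range(6):
        dilation = 2 ** i
        layers.append(nn.Conv1d(16, 16, kernel_size=2, dilation=dilation))
        receptive_field += dilation      # a kernel-size-2 conv adds `dilation` steps
    net = nn.Sequential(*layers)

    x = torch.randn(1, 16, 128)          # (batch, channels, time)
    y = net(x)
    print(receptive_field)               # 64
    print(y.shape)                       # torch.Size([1, 16, 65]): time shrinks by 63

An RNN spanning the same 64 steps has to push the signal through 64 sequential hidden-state updates, which is where the vanishing gradients bite.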
In an RNN, you could connect each hidden state at time step t, h(t), to h(t-N), instead of, or in addition to, h(t-1), making it analogous to dilated convolutions, but with hidden-to-hidden connections at the same layer.
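A rough sketch of that idea, assuming a simple Elman-style cell (the cell, the sizes, and the delay N=4 are made up for illustration; this isn't any particular published architecture):

    # RNN whose update uses both h(t-1) and h(t-N):
    # h(t) = tanh(W_x x(t) + W_1 h(t-1) + W_N h(t-N))
    import torch
    import torch.nn as nn

    class SkipRNN(nn.Module):
        def __init__(self, input_size, hidden_size, skip=4):
            super().__init__()
            self.skip = skip
            self.in_proj = nn.Linear(input_size, hidden_size)
            self.recent = nn.Linear(hidden_size, hidden_size)    # h(t-1) path
            self.delayed = nn.Linear(hidden_size, hidden_size)   # h(t-N) path

        def forward(self, x):                  # x: (time, batch, input_size)
            T, B, _ = x.shape
            H = self.recent.in_features
            hs = [x.new_zeros(B, H)]           # zero initial state
            for t in range(T):
                h_prev = hs[-1]
                h_delayed = hs[max(0, len(hs) - self.skip)]
                hs.append(torch.tanh(self.in_proj(x[t]) + self.recent(h_prev) + self.delayed(h_delayed)))
            return torch.stack(hs[1:])         # (time, batch, hidden_size)

    out = SkipRNN(8, 16, skip=4)(torch.randn(20, 3, 8))
    print(out.shape)                           # torch.Size([20, 3, 16])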
So I don't think RNNs are fundamentally more myopic than CNNs (just that there may be practical advantages to using the latter).
Hierarchical RNNs, Clockwork RNNs, Hierarchical Multiscale RNNs, and probably others do things of this nature.
You could, but it's not equivalent, and no one seems to have been able to use Clockwork RNNs or related architectures to achieve similar performance, so the differences would seem to make a difference.
That’s an awful lot of woo to describe something “theoretical” in the sense of being imaginary, but not theoretical in the sense of ever having been proven in a mathematically rigorous way.
This is really cool. I'm working in systems research, but I'm fascinated by mathematical research in CS. Did you continue with similar work after your PhD, if I may ask?
I've since switched my research focus to machine learning (look up latent Dirichlet allocation, very cool stuff). I find a lot of parallels between the two fields: prob. theory, matrices, uncertainty, ...
I'm still following quantum information theory research, but more as a spectator from the sidelines. However, a couple of weeks ago I had to come back to quantum info theory to "defend" my academic reputation. Our colleagues from TIFR found a bug in one of our papers (http://arxiv.org/abs/1111.3645v3), so my coauthor and I had to fix it. It was kind of cool to see I hadn't "lost my quantum skills" after two years of running a business. I guess once you go quantum, you never go back? :)