Hacker News | kami8845's comments

Editing the prompts (which are currently submitted via the system message, similar to your linked app) is a great idea. I'll add it to the to-do list :)


Hey, "run locally" in this case means YakGPT has no backend. Whether you use the React app through https://yakgpt.vercel.app/ or run it on your own machine, I store none of your data. I will try and make this wording clearer!


In that case you're basically offering a browser-based client. 'Locally' strongly suggests this is running entirely on the machine (vs. making API calls). Going to break a lot of hearts out there with the wording as it is.


Hey! I would love to. I seriously considered adding my own key to the app and implementing some rate limiting, e.g. to allow you to send 3 messages for free. But unfortunately that would require me to store some backend data about you, which I do not want: I want this to be a completely "private", FE-only application that stores no data on anyone.


Testing YakGPT right now, excellent work! I would recommend adding some screenshots to the GitHub README so that people can get an idea of how it looks before entering their API key.


Hey! I definitely understand the reservation; I'm usually the same way. My reasons for using this UI at this point:

* GPT-4 is decently faster when talking straight to the API

* The API is so stupidly cheap that it's basically a rounding error for me. Half an hour of chatting with GPT-3.5 costs me $0.02

I'd be curious what you mean by integrating the backend API?


GPT-3.5 is really cheap (prompt and completion = $0.002 / 1K tokens), but GPT-4 is around 20 times more expensive (prompt = $0.03 / 1K tokens + completion = $0.06 / 1K tokens).

But the benefit of using the API is that you can change the model on the fly: you chat with 3.5 until you notice it's not responding well, then, with all the history you have (probably stored in your database), you resend the conversation once with GPT-4 as the selected model and likely get a better response (sketch below).

I really wish the interface on chat.openai.com would let me switch between models in the same conversation, in order to 1) not use up the quota of GPT-4 interactions per 3 hours as quickly and 2) not strain the backend unnecessarily when starting a conversation with GPT-3.5 is efficient enough until you notice you'd better switch models.

OpenAI already has this implemented: when you use up your quota of GPT-4 chats, it offers to drop you down to GPT-3.5 in that same conversation.
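To make the "change the model on the fly" part concrete, here is a minimal sketch against the API. It assumes the 2023-era openai Python client; the history contents and the escalation heuristic are invented for illustration:

  import openai

  openai.api_key = "sk-..."  # your key

  # The full history is resent with every request, so "switching models"
  # is nothing more than changing the `model` argument.
  history = [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Summarize this contract clause: ..."},
  ]

  def ask(model):
      resp = openai.ChatCompletion.create(model=model, messages=history)
      reply = resp["choices"][0]["message"]["content"]
      history.append({"role": "assistant", "content": reply})
      return reply

  answer = ask("gpt-3.5-turbo")
  if "I'm not sure" in answer:   # made-up "not responding well" heuristic
      history.pop()              # drop the weak answer from the history
      answer = ask("gpt-4")      # resend the same conversation to GPT-4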


Sure, but GPT-4 through the UI costs $20 per month, which is a lot of API calls.


Isn’t it 10 per hour?


25 / 3 hrs


How is it that cheap?! I ran three queries on LangChain yesterday with two ConstitutionalPrompts and it cost $0.22, which made me realize that deploying my project cheaply could get expensive quickly.


GPT-3.5 Turbo pricing works out to 10K tokens, or ~7,500 words, for $0.02. Note, though, that every API request resends the entire chat context, and you are charged for both input and output tokens. https://openai.com/pricing
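To make the "entire chat context" point concrete, here's a back-of-the-envelope sketch; the per-message token count and number of turns are assumptions, and the price is hardcoded from the pricing page above:

  # gpt-3.5-turbo (March 2023): $0.002 per 1K tokens, input and output alike.
  PRICE_PER_TOKEN = 0.002 / 1000

  turns = 30                # user/assistant exchanges in one session
  tokens_per_message = 100  # assumed average message size

  total = 0    # tokens billed across the whole session
  context = 0  # tokens in the conversation history so far

  for _ in range(turns):
      context += tokens_per_message  # the new user message
      total += context               # input: the entire history is resent
      total += tokens_per_message    # output: the model's reply
      context += tokens_per_message  # the reply joins the history

  print(f"{total} tokens ~= ${total * PRICE_PER_TOKEN:.2f}")
  # -> 93000 tokens ~= $0.19: cost grows quadratically with the turn count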


You need to check which model you are using. Also, LangChain runs through the model several times, with an increased token count on each successive call.


Yeah, I assumed it would be doing several passes, but it's still more expensive than OP mentioned. I think the issue is that I'm using davinci-003.


Yeah, davinci-003 is gonna be gpt3, which is more expensive than 3.5.

One more anecdote: I've been running a half dozen GPT-3.5 IRC bots for a few weeks and their total cost was less than a dollar. A few hours of playing around with LangChain on GPT-3 cost me almost $4 before I realized I needed to switch to 3.5, though even then it still uses a ton of tokens on every chain.


Thanks, I'll do that later


I'd love to see a comparison of the average cost of using this with the OpenAI API versus subscribing to ChatGPT Plus.

Maybe I'll have to try this for a month and see if it ends up costing more than $20. Thanks for creating it!


Wow! Is it really that cheap? GPT4 is much more expensive, I imagine?


GPT-4 is decently more expensive. I personally really like and use the therapist character a lot; in that scenario a session would cost me less than $1, which is still much cheaper than any therapist I've used previously :)


What is your setup?


If you're still running on Heroku after everything that went down last year, you really have no one to blame but yourself! We are happily on AWS now.


> We are happily on AWS now.

Hate to break it to you but you've always been on AWS ;)


Did your total devops cost go up or down? I'm sure AWS is less expensive than Heroku, but I assume you're now paying people more to do what Heroku did?


We are spending about 60% less. Workload has actually lessened since AWS is so much more stable. Getting to a similar DX as Heroku was quite the lift, but once it's done, it's done. These days we generally only have outages when we screw something up ourselves. I recommend https://github.com/aws/copilot-cli for starting out on ECS.


For all Heroku's done wrong, it has been rock solid for us for nearly 10 years. The only outage I can remember in all that time is when they were having issues with their upstream DNS a little while ago. Even this incident isn't causing any downtime for us; we just can't deploy updates until it is fixed. I guess if we had a critical fix on deck it could be considered an "outage", though.

Copilot CLI does look cool.


Withcoherence.com is another way to get a great developer experience without doing all the work yourself.

Disclosure - I’m a cofounder


If you want help with a kubernetes paas type setup, check out argonaut.dev

Disclaimer: I'm the founder


AWS outages are, per the status page, quite rare.


Nothing is more reliable than the AWS status page.


I thought just the free tier went away.


At my current job (sr. eng) I have an average of 45 minutes of meetings per day


A few issues I have with this blog post:

1. It doesn't show off the unique capabilities of Firecracker very well.

2. The comparison is not very fair.

2a. The docker-build step (which dominates the runtime) is run without any caching. Just by adding two lines to your build-push-action ("cache-from: type=gha" and "cache-to: type=gha,mode=max") you can make it a lot faster; see the workflow sketch below.

2b. ~1m20s of the time is just "VM start". GitHub Actions has had a rough time recently, but you should never wait that long to get your CI running in day-to-day operation.

2c. The tests are unrealistically short at 20s, which allows the author to get to their 10x-faster number.

Let's say the GitHub Action starts in 5 seconds, the GitHub Actions cache reduces the build time to 2 minutes and the tests take 10 minutes to run. Now Firecracker is 20% faster ...
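For reference, the caching setup from 2a looks roughly like this; a sketch, with the action version, image name, and tag as placeholders:

  - uses: docker/build-push-action@v4
    with:
      push: true
      tags: user/app:latest      # placeholder image/tag
      # Pull cached layers from the GitHub Actions cache backend, and
      # write all layers back to it after the build (mode=max).
      cache-from: type=gha
      cache-to: type=gha,mode=max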

You can also get comparable performance out of https://buildkite.com/, which lets you self-host runners on AWS, meaning you're almost guaranteed to get a hot Docker cache (running against locally attached SSDs). You can then start running your tests (almost) as fast, with much more mature tooling.


> You can also get comparable performance out of https://buildkite.com/ which lets you self-host runners on AWS

You can self-host GitHub runners as well, with a few caveats, the most serious one being that you are then responsible for cleaning up the state of your self-hosted runner between runs.

https://docs.github.com/en/actions/hosting-your-own-runners/...

structural isolation guarantees of the form "build execution during run N cannot possibly impact build execution of run N+1" are tremendously helpful -- they reduce both the number of weird CI failures and the cost to triage and fix each one (by reducing the space of possible interactions). If you cannot offer similar guarantees when self-hosting your own CI infrastructure, then it may not be wise to self-host.


> structural isolation guarantees of the form "build execution during run N cannot possibly impact build execution of run N+1" are tremendously helpful

Yes, although the flip side is that it can make caching much less efficient. My experience is that the caching layer can often take several minutes to download, dominating CI run time, whereas it might be instant on a self-hosted box that just leaves the previous build artefacts in place.


I tried to get Docker layer caching working within GHA for a second benchmark, but it seems like none of the approaches work particularly well for a "docker-compose build". I'd happily amend the post with a second benchmark if you wouldn't mind opening a PR based on the existing one [1].

[1] https://github.com/webappio/livechat-example/blob/be7c9121c1...

The point still stands for 2c: you can very easily parallelize with Firecracker, by taking a snapshot of the state right before the tests run, then loading it a bunch of times.
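For the curious, Firecracker exposes that snapshot flow over its API socket. Here is a rough Python sketch; the socket and snapshot paths are placeholders, and the exact request shapes vary between Firecracker versions, so check the docs for yours:

  import http.client
  import json
  import socket

  class FirecrackerAPI(http.client.HTTPConnection):
      """Minimal client for Firecracker's HTTP-over-Unix-socket API."""

      def __init__(self, sock_path):
          super().__init__("localhost")
          self.sock_path = sock_path

      def connect(self):
          self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
          self.sock.connect(self.sock_path)

  def call(api, method, path, body):
      api.request(method, path, json.dumps(body),
                  {"Content-Type": "application/json"})
      return api.getresponse().read()

  api = FirecrackerAPI("/tmp/firecracker.sock")  # placeholder socket path

  # Pause the booted VM right before the tests would start...
  call(api, "PATCH", "/vm", {"state": "Paused"})

  # ...and write out a full snapshot of its memory and device state.
  call(api, "PUT", "/snapshot/create", {
      "snapshot_type": "Full",
      "snapshot_path": "/snapshots/vm.snap",  # placeholder paths
      "mem_file_path": "/snapshots/vm.mem",
  })

  # Each parallel test shard can then boot a fresh Firecracker process
  # and restore from this snapshot via PUT /snapshot/load.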


I did something similar with VMware nearly 8-10 years ago: just separate storage from OS in your thinking, and automate provisioning as needed.

It's possible to do something similar with EC2, but if you want speed, move your build farm on-prem to utilise storage provider technologies, e.g. https://tintri.com/blog/. Portworx+Stork would probably get you most of the way if you went down the k8s route, but that's not something I've seen or looked at in detail.

In my experience, if OS boot time is your main bottleneck then you are likely over-optimising before expanding your testing environments. Fun, but not the best use of development time; it is easier to have a minimum number of systems online before they are needed and just attach new storage to them as required.


I have to mention GitLab here. Their runners are extremely easy to self-host.


I'm currently waiting for my EAD (applied for a GC via marriage to a citizen). What's your opinion on expedite requests with the reason being financial loss (I have a pending job offer from a startup)? Is it worth trying?


It's definitely worth trying - there's no downside - but the chance of success is low because there are so many people in the same situation. The limited success I've seen is where someone has a family and is the sole wage earner or someone is working in a field where there are real timing issues/deadlines, such as education, or where someone is working in a national interest field, such as healthcare.


Thank you for the reply!


In those instances, do the managers at BiggerCorp who initiate the deal get the equity, or does the company?


Usually, the manager.


Something I've noticed as well. Might be related to geographic diversification, or that "[App] for [Emerging Market]" better answers the "Why now?" question.

