Hacker News
Ask HN: Why does validating a user require 14000 files?
61 points by yashg on Aug 30, 2021 | 49 comments
I thought of using Sign in with Google instead of creating my own authentication for a new project I am building. I don't want my users to remember yet another password. I myself have been using Google to sign into many of the new services these days. So I went about implementing the Google Identity SDK. Easy to set up. But when it comes to validating the Google ID token on the server side, it requires the Google Client SDK.

The PHP SDK I am using has 14800 files totaling 37 MB!

I just want to validate a user.

I tried removing unnecessary services from composer.json. Still 14000+ files.

I just want to validate a user.

This just reinforces my general dislike of frameworks. Just so much unnecessary fluff that your project is never going to use.

I just want to validate a user. It shouldn't require including 14000 files in my project.



Slightly off topic, but beware when using Google to authenticate everywhere. The day Google decides to lock your account and they don't care about you, you are locked out of your digital life.

By the way, I go out of my way to avoid signing into services using third parties for this reason. I don't use a Google account either, so I would be locked out of any service requiring one.

Your service will be highly dependent on Google if you only allow signing in with Google, and if they have a failure, your service will be unusable. Allowing other providers would greatly improve this.


Seconding, and I'd like to refine the point further:

> The day Google decides to lock your account (..) you are locked out from your digital life.

It takes getting banned because of mischief, perceived mischief, or algorithmic error on one of their services to instantly, and usually permanently, lose access to all of their services, and any other third party service you logged into with Google.

In other words: make a "wrong" comment on YouTube, and you lose decade(s) of your e-mails. If you were running your small business off Google's Office Suite on your personal account, well, you can kiss your business goodbye.

(In case anyone doubts that happens, I recommend checking out https://9to5google.com/2019/11/09/google-account-bans-youtub... for a recent example.)

And, of course, this pattern of behavior is not limited to Google. Big cloud services all have problematic or nonexistent customer support experience - a consequence of scaling up without paying the associated costs. It's thus more important than ever to diversify one's digital footprint and minimize dependencies.


I wanted to remove the profile picture on my account, but could not find such a button anywhere in the account settings. So instead of removing it, I replaced it with a blank white square. I thought I was clever.

Instead, I got a strike on my account.

IIRC it takes three strikes and you're out. But if a strike is handed out due to such a silly thing, I wonder what the next silly thing is going to be?

Clearly I cannot rely on such a platform, considering I would likely NEVER get my account back. From that point onwards, I've tried to reduce my dependency on Google accounts as much as possible.


Yes, that's a risk. Our digital lives depend so much on the mercy of the tech overlords. I use third-party sign-ins only for non-essential services, like someone building a Twitter thread maker, or an online design tool. There are many tools and services which shouldn't require an account but still insist on having one; in that case I use a third-party sign-in. For anything important, I prefer to create a password.


Or at least, always use a custom domain if you’re going to do this. Although even then some places make it a faff to change to another workflow with the same address.


> I go out of my way to avoid signing into services using third parties

This. I make a separate account everywhere. When a site doesn't allow signing in with a new account, I don't bother.


The way I implement Google logins on my pages is using the email to create a traditional user account that can also log in with a password. Sometimes I even force the user to set up a password.

I am no fan of the idea either, but I like the option to log in with a single click.


No one reads the readme anymore? https://github.com/googleapis/google-api-php-client#cleaning... "Cleaning up unused services There are over 200 Google API services. The chances are good that you will not want them all. In order to avoid shipping these dependencies with your code, you can run the Google\Task\Composer::cleanup task and specify the services you want to keep in composer.json:"


It feels like some sort of a tree shaking solution in most of our modern development stacks is long overdue.

If only we could get rid of dynamically called code and instead evaluate which code paths the code can actually call, then we'd be able to throw out about 90% of the contents of libraries, since most of the time no one actually uses all of that functionality.

Doing that with separate scripts is another approach, but it feels like the problem should be solved at its root.


It sounds like what's already available in C#.

https://docs.microsoft.com/en-us/dotnet/core/deploying/trimm...


This sounds like it could be equivalent to the halting problem


With a conservative analysis it's possible to compute an underestimate of the set of names one can remove. This shouldn't be done on object files but at a level where program flow information is still preserved. A flow-insensitive analysis using a superset lattice over the callable names should do the trick for a first-order implementation, and it will look good enough compared to shipping 37 MB for Google authentication. Further, you could make it flow-sensitive for a more accurate analysis, with the caveat that the runtime will be significantly longer.
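A minimal sketch of that first-order, flow-insensitive version, assuming the call graph has already been extracted from the code (the `calls` mapping, the `DYNAMIC` marker and all function names are hypothetical). It walks call edges from the entry points with a worklist; any call it cannot resolve statically makes it give up and keep everything, so it can only ever err on the side of keeping code.

```python
from __future__ import annotations

from collections import deque

# Hypothetical marker for a call whose target can't be resolved statically,
# e.g. PHP's $obj->$method() or call_user_func($name).
DYNAMIC = "<dynamic>"


def reachable(calls: dict[str, set[str]], entry_points: set[str]) -> set[str]:
    """Worklist search for every name that may be called from the entry points.

    `calls` maps each defined function to the names it calls directly. The
    result only ever grows, so this computes the least fixed point over the
    powerset of names (a flow-insensitive, conservative analysis).
    """
    seen: set[str] = set()
    queue = deque(entry_points)
    while queue:
        name = queue.popleft()
        if name in seen:
            continue
        seen.add(name)
        callees = calls.get(name, set())
        if DYNAMIC in callees:
            # An unresolvable call could target anything: keep every function.
            return set(calls) | seen
        queue.extend(callees)
    return seen


def removable(calls: dict[str, set[str]], entry_points: set[str]) -> set[str]:
    """Defined functions that no entry point can reach; safe to strip."""
    return set(calls) - reachable(calls, entry_points)


# Hypothetical call graph for an SDK where only token verification is used.
calls = {
    "verifyIdToken": {"fetchCerts", "decodeJwt"},
    "fetchCerts": {"httpGet"},
    "decodeJwt": set(),
    "httpGet": set(),
    "listYouTubeVideos": {"httpGet"},
    "createSpreadsheet": {"httpGet"},
}
print(sorted(removable(calls, {"verifyIdToken"})))
# ['createSpreadsheet', 'listYouTubeVideos']
```

The hard part in practice is building `calls` for a dynamic language like PHP; the more dynamic calls a library makes, the closer this gets to keeping everything.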


Control flow graphs are well-studied, critical parts of most compilers. The halting problem involves complexity of semantics in loops and jumps, but it’s possible to understand what loops may run just fine, without being able to decide exactly when a loop may exit.

In many cases, the Clang and GCC compilers can easily understand loops, and even reduce them to constant time.


But Composer is the cause of all those file dependencies. Things like Composer and NPM encourage devs to re-use scores of 3rd-party libraries rather than writing new code; those libraries also rely on NPM/Composer; so they also have scores of dependencies. And it's impractical to audit them all.

I don't understand why people are so willing to depend on these crazy dependency trees, where most of the code was written by people you don't even know the names of.


Before dependency managers, you downloaded a zip of PHP files from people you don't even know the names of (and never updated any of the external libraries).

These managers don't replace the chain of trust: if you don't trust a dependency and the developer behind it, don't install it.

"And it's impractical to audit them all." I don't use NPM, but for composer.json: https://github.com/fabpot/local-php-security-checker

I agree, composer is not perfect, but before it was worse.


"I tried removing unnecessary services from composer.json. Still 14000+ files."


Validate a User Only: composer.json

    {
        "require": {
            "google/apiclient": "^2.10"
        },
        "scripts": {
            "pre-autoload-dump": "Google\\Task\\Composer::cleanup"
        },
        "extra": {
            "google/apiclient-services": [
            ]
        }
    }
=> vendor folder size: 6MB, 1012 Files
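To compare numbers like these (or the 14800 files / 37 MB from the original question) before and after a cleanup, a short script that walks the directory is enough. A sketch; it defaults to `vendor`, Composer's default install directory, and reports MB as 10^6 bytes.

```python
from __future__ import annotations

import os
import sys


def tree_stats(root: str) -> tuple[int, int]:
    """Count regular files under `root` and sum their sizes in bytes."""
    files = 0
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks so linked files aren't counted twice.
            if os.path.isfile(path) and not os.path.islink(path):
                files += 1
                total += os.path.getsize(path)
    return files, total


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "vendor"
    count, size = tree_stats(target)
    print(f"{target}: {count} files, {size / 1_000_000:.1f} MB")
```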


Cool. I did not remove enough services it seems.


To joke about this absurd situation, I would say it would be a lot easier to go find the user yourself face-to-face and validate his or her data.

To paraphrase what @js4ever said, it would make more sense to implement it yourself in your backend than to depend on Google Sign In.

@jraph, I completely agree with you; it happened to me and I stopped using Google, apart from an email address I have for my mobile apps that is rarely used.


Just use OAuth 2.0, which Google supports, no need to use the identity SDK.


Just some ramble... Fintech goes out of its way to identify a user, but governments pretend a picture of a passport or ID card is sufficient(?!) Talking with IT folk since the early days of the web, I kept hearing high praise for anonymity, which of course has its place... until it doesn't? The Orwellian "track people everywhere" scheme did, at the time, seem like a good reason not to, but today we get identified mostly for sinister purposes? (How many files would Facebook have in their project to track you?) As a developer, most often I just want a user to be unique. That can't really be done without major effort. I prefer not to store any information about people, but if I don't, it gets easier to make a thousand accounts. Everything kinda works, but if we are honest, it doesn't? It works, but the solution isn't to inform Google when and where people log in(!?), or to hand out your email address to everyone(!?), or to allow anyone to send registration mails from your server and get blacklisted(!?) Geo data? One can simply configure that. Facial recognition? One can have as many faces as one likes.

It seems so simple to have a global standard for some government API with perhaps some [portable] card reader and perhaps some ISP logic. Nice developer tokens with permission levels that can be revoked?

Or is that [again] one more naive idea?


> but governments pretend a picture of a passport or ID card is sufficient(?!)

That's because forging the passport or ID card, or impersonating someone else with borrowed/stolen passport or ID card, are serious offenses that will land you in prison.

What's often forgotten when comparing digital and meatspace security is that "angry men with guns will come for you and put you in front of a judge" is a critical part of security systems in meatspace. The very possibility of it acts as a powerful deterrent for casual attacks. Governments can rely more on this than private companies, so at least in meatspace, you don't need to jump through as many authentication hoops as you need with banks.

(Digital is different, governments sometimes overdo the hoops there, because for any Internet-connected system the pool of would-be attackers consists mostly of people from outside the reach of law enforcement of the country that hosts the system.)


> What's often forgotten when comparing digital and meatspace security is that "angry men with guns will come for you and put you in front of a judge" is a critical part of security systems in meatspace.

This sounds like governments and government 'officials' (ooh, you work for the government, now you are 'official') somehow automatically lose their jurisdiction somewhere between meatspace and cyberspace. They don't.

I mean, ask Kim Dotcom about angry men with guns. Don't ask him about the judges, I guess.


https://en.m.wikipedia.org/wiki/EIDAS

There was also a French government plan (there was an app in beta; I should take a look at it again) to validate user identity with a smartphone and a conformant passport / ID card (the recent ones with chips and NFC): you use the app to read the biometric info from the passport / ID card, and then take a video of yourself which is then matched, thus confirming who you are.


What SDK are you using? That sounds like it's capable of doing a lot more than just token validation.

As a comparison: looking at Google's docs for "Google Identity Toolkit" (which is abandoned, thanks Google!), the google/identity-toolkit-php-client[1] Composer package used there installs ~196 files with ~212k LoC of PHP source, totaling around 7 MB. Still a lot for validating a token, but most of that is from the google/apiclient library, which seems to be a generic client for Google APIs; google/identity-toolkit-php-client itself just adds 10 files and 729 lines of PHP code, which seems pretty reasonable.

[edit]: The more recent (and still alive) google/auth[2] seems to add around 200 files with 12k LoC of PHP, including the guzzlehttp client, a Firebase client and some PSR API stubs.

[1]: https://packagist.org/packages/google/identity-toolkit-php-c...
[2]: https://packagist.org/packages/google/auth



Oh wow, yeah. Looks like the culprit here is google/apiclient-services which provides a fat client for EVERY google API [1]

In classic Google monorepo culture, that's one package for all the things. A great example of how not to structure your SDK.

So yeah, you are right. That's a lot of code if you only use like 0.25% of it.

[1]: https://github.com/googleapis/google-api-php-client-services...


I liked what jQuery UI does. It provides a web-based configurator to choose what components you want to use, and then you download a custom file with only those components. Really useful to avoid wasteful loading of code you are never going to use.


This insanity is far more pervasive and has infected software distribution as well.

A couple of years back I installed a copy of Sonic Pi[1] on my machine and was horrified to see that the distribution basically dumped tens of thousands of extremely tiny Ruby files on the hard disk. Ever tried copying hundreds of thousands of sub 1 KB files from one disk to another and noticed the file system crying under the load?

Game developers solved this problem decades ago.[2][3] I don't know why other developers continue to be so backward in their thinking when distributing their software. You don't have to do anything special. Just use SQLite as a VFS and end the insanity.[4]

[1] https://github.com/sonic-pi-net/sonic-pi

[2] https://quakewiki.org/wiki/.pak

[3] https://wiki.totalwar.com/w/Community_hints,_tips_and_tutori...

[4] https://sqlite.org/appfileformat.html
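A minimal sketch of the single-file approach from [4], using Python's built-in `sqlite3` module: pack a directory tree into one database (one row per file, keyed by relative path) and read files back by name. The `files` table and both helpers are illustrative, not an established format; SQLite's own "SQLite Archive" (sqlar) format is a ready-made version of the same idea.

```python
import os
import sqlite3


def pack(root: str, db_path: str) -> int:
    """Store every file under `root` as one row of a single SQLite database."""
    con = sqlite3.connect(db_path)
    count = 0
    try:
        with con:  # one transaction: commit on success, roll back on error
            con.execute(
                "CREATE TABLE IF NOT EXISTS files "
                "(path TEXT PRIMARY KEY, data BLOB NOT NULL)"
            )
            for dirpath, _dirnames, filenames in os.walk(root):
                for name in filenames:
                    full = os.path.join(dirpath, name)
                    # Store portable, '/'-separated paths relative to root.
                    rel = os.path.relpath(full, root).replace(os.sep, "/")
                    with open(full, "rb") as f:
                        con.execute(
                            "INSERT OR REPLACE INTO files (path, data) VALUES (?, ?)",
                            (rel, f.read()),
                        )
                    count += 1
    finally:
        con.close()
    return count


def read_file(db_path: str, rel_path: str) -> bytes:
    """Fetch one packed file by its relative path."""
    con = sqlite3.connect(db_path)
    try:
        row = con.execute(
            "SELECT data FROM files WHERE path = ?", (rel_path,)
        ).fetchone()
    finally:
        con.close()
    if row is None:
        raise FileNotFoundError(rel_path)
    return row[0]
```

Copying the result is then one large file instead of tens of thousands of tiny ones; the trade-off is that the program has to read its files through something like `read_file` instead of the file system.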


> I tried removing unnecessary services from composer.json. Still 14000+ files.

Ugh, please don't do that. The whole point of using this SDK is that people can trust it works, and if there are bugs you can just bump your dependency.

It's a fair point to make that libraries are bloated these days, but manually hacking down dependencies, or re-inventing the wheel are far worse practices!


> ... manually hacking down dependencies, or re-inventing the wheel are far worse practices!

Disagree. Allowing such bloat to permeate your codebase, and supporting such practices as this fat SDK, are far worse. Short-term gain vs long-term loss.


> I thought of using Sign in with Google instead of creating my own authentication for a new project I am building.

Besides the topic regarding "roll your own stuff" vs "use a huge library", I would ask a more business-oriented question: would your users be alright regarding signing in to your app using their Google accounts? What if they don't have one? Email address + password is usually a good approach when it comes to login: I can always use a fake email address if I don't trust your service, whereas if only "login via Google" is enabled, I'm just going to pass.


I guess it boils down to personal preferences. I have come to detest passwords in general, especially a new one for each website (because you don't know how the site is storing the passwords). I have created a password manager (HexaVault.com). If I am using a new service and it asks me to create an account, I go, "Oh man, not one more password." Sure, I can save it in my password manager, which has around 500 passwords now, I guess.

So if I see an option to login via some other service - Google/Facebook/Twitter/LinkedIn/Microsoft I prefer that.

What I have come to like is the password-less email only login on a few sites. They either mail you a link or a code to login. Which I think is the way forward. Nothing to remember as a user.


The Google Client SDK does an awful lot of stuff. In order to do them all in an efficient way it shares code.

It's definitely the case that if you only want the one call out of the whole thing then you could get by with only a small chunk of that code, but writing the Client SDK so that each feature was entirely independent would be a lot more work for them.

So they make their lives easier, and you get to have an extra 37MB of code on your client.


I decided not to use Google Sign In eventually. I will implement a password-less Email auth. Basically will mail the user an OTP to sign in every time.
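A sketch of the emailed-code approach, assuming codes live server-side with an expiry (the in-memory `_pending` dict stands in for a database table, and actually sending the email is left out). The details that are easy to miss: codes come from a cryptographically secure generator, are single-use, expire quickly, and get burned after a few wrong guesses, because a 6-digit code only has a million possible values. The 10-minute lifetime and 5-attempt limit are assumptions.

```python
from __future__ import annotations

import hashlib
import hmac
import secrets
import time

OTP_TTL_SECONDS = 10 * 60  # assumption: codes expire after 10 minutes
MAX_ATTEMPTS = 5           # assumption: a code is burned after 5 wrong guesses

# email -> [sha256 hex of the code, expiry timestamp, attempts left].
# An in-memory stand-in for a database table.
_pending: dict[str, list] = {}


def _digest(code: str) -> str:
    return hashlib.sha256(code.encode()).hexdigest()


def issue_code(email: str, now: float | None = None) -> str:
    """Create a 6-digit code for `email`; the caller emails it to the user."""
    now = time.time() if now is None else now
    code = f"{secrets.randbelow(1_000_000):06d}"
    _pending[email] = [_digest(code), now + OTP_TTL_SECONDS, MAX_ATTEMPTS]
    return code


def check_code(email: str, code: str, now: float | None = None) -> bool:
    """Accept a code once, before it expires and before attempts run out."""
    now = time.time() if now is None else now
    entry = _pending.get(email)
    if entry is None:
        return False
    digest, expires, attempts_left = entry
    if now > expires or attempts_left <= 0:
        del _pending[email]
        return False
    if hmac.compare_digest(digest, _digest(code)):
        del _pending[email]  # single use
        return True
    entry[2] -= 1
    return False
```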


This is what I usually do too, it works well for projects where user management is minimal and the only way to use the product is visiting the web site.

I mail the user a login link with a tamper-proof token; clicking the link gives the user a cookie.


That SDK is intended to include support for a bunch of different things, not just login. The actual login can be done in a couple of hundred LoC in PHP using cURL. I've done that for one of my projects. Search for OAuth 2.0 sample code. Once you understand it and get it working, other one-click logins like FB function in pretty much the same way. Adding support for them is trivial.
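For a sense of scale, the first leg of that flow is just a redirect. A sketch of building Google's authorization URL for the authorization-code flow; the endpoint and parameter names follow Google's OAuth 2.0 web-server documentation, while the client ID and redirect URI are placeholders. The returned `state` should be kept in the user's session and compared when Google redirects back.

```python
from __future__ import annotations

import secrets
from urllib.parse import urlencode

AUTH_ENDPOINT = "https://accounts.google.com/o/oauth2/v2/auth"


def authorization_url(client_id: str, redirect_uri: str) -> tuple[str, str]:
    """Return (URL to redirect the browser to, state to store in the session)."""
    state = secrets.token_urlsafe(32)
    params = {
        "client_id": client_id,
        "redirect_uri": redirect_uri,  # must be registered for this client ID
        "response_type": "code",       # authorization-code flow
        "scope": "openid email",       # just enough to learn the user's email
        "state": state,                # anti-CSRF value, checked on the callback
    }
    return f"{AUTH_ENDPOINT}?{urlencode(params)}", state


url, state = authorization_url(
    "1234567890-example.apps.googleusercontent.com",  # placeholder client ID
    "https://example.com/auth/callback",              # placeholder redirect URI
)
print(url)
```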


I agree with you! I had the same issue with Google SDK and replaced that with 80 lines of code in my backend instead.


Frameworks and SDKs serve three populations:

1. Developers who understand it's not efficient to reinvent the wheel, and also very often more secure.

2. Developers who wouldn't be able to do it without the framework.

3. Personal data stealing businesses.

Learn to recognize each :)


To be fair, OAuth works completely without an SDK as well; you just have to call the right URLs with the right parameters.

Aren't there any third-party OAuth libs for PHP?


Why do you need users to sign in? Do they really want to sign in? What do they get out of it versus what do you get out of it?

Using Sign in with Google just encourages use of Google. Is that really the best thing for users? Google is a privacy disaster.

If you and your users truly both need authentication of each other, why not let your users use X.509 client certificates?

You are probably already using X.509 server certificates.

The term "SDK" is synonymous with "unnecessary fluff". It's been that way since the 1990s.


The Ruby ones are split better, but still pull in the grpc gem, which weighs in at 409 MB (yup).


Is it possible to use a generic OAuth library?


OAuth basically redirects the user to the provider, providing a callback URL to your site, and sends along a token that you should verify with the provider (Google). That's basically it. There must be several generic OAuth libraries for PHP for this, but rolling your own is far from impossible and not that complicated.
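In Google's authorization-code flow, the callback carries a short-lived `code` (plus the `state` you sent), which your server exchanges for tokens at the token endpoint. A sketch of that step; it only checks the callback and builds the POST body, so any HTTP client will do (cURL in PHP, or `urllib.request.urlopen(url, data=body)` in Python). Endpoint and field names follow Google's OAuth 2.0 documentation; the client credentials are placeholders you supply.

```python
from __future__ import annotations

import hmac
from urllib.parse import urlencode

TOKEN_ENDPOINT = "https://oauth2.googleapis.com/token"


def token_request(query: dict[str, str], expected_state: str, client_id: str,
                  client_secret: str, redirect_uri: str) -> tuple[str, bytes]:
    """Check the callback's query parameters; return (URL, form body) to POST."""
    if "error" in query:
        raise ValueError(f"user denied access or request failed: {query['error']}")
    state = query.get("state", "")
    if not expected_state or not hmac.compare_digest(
        state.encode(), expected_state.encode()
    ):
        raise ValueError("state mismatch (possible CSRF)")
    code = query.get("code")
    if not code:
        raise ValueError("missing authorization code")
    body = urlencode({
        "code": code,
        "client_id": client_id,
        "client_secret": client_secret,
        "redirect_uri": redirect_uri,  # same value as in the initial redirect
        "grant_type": "authorization_code",
    }).encode()
    return TOKEN_ENDPOINT, body
```

The JSON response contains an `id_token`, which still has to pass the signature and claim checks Google lists.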


This is what Google suggests:

To verify that the token is valid, ensure that the following criteria are satisfied:

The ID token is properly signed by Google. Use Google's public keys (available in JWK or PEM format) to verify the token's signature. These keys are regularly rotated; examine the Cache-Control header in the response to determine when you should retrieve them again.

The value of aud in the ID token is equal to one of your app's client IDs. This check is necessary to prevent ID tokens issued to a malicious app being used to access data about the same user on your app's backend server.

The value of iss in the ID token is equal to accounts.google.com or https://accounts.google.com.

The expiry time (exp) of the ID token has not passed.

If you want to restrict access to only members of your G Suite domain, verify that the ID token has an hd claim that matches your G Suite domain name.

Rather than writing your own code to perform these verification steps, we strongly recommend using a Google API client library for your platform, or a general-purpose JWT library. For development and debugging, you can call our tokeninfo validation endpoint.
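For a sense of how much code the signature step needs, here is a pure-Python sketch of RS256 verification (RSASSA-PKCS1-v1_5 with SHA-256, RFC 8017) against Google's keys in JWK form. It assumes the JWK set has already been fetched and cached (honouring Cache-Control as described above) and is passed in as a `kid` to JWK mapping; it verifies only the signature, not the claims. As Google's note says, a maintained JWT library is the safer choice in production; this is for understanding, not a drop-in replacement.

```python
from __future__ import annotations

import base64
import hashlib
import hmac
import json

# DER-encoded DigestInfo prefix for SHA-256 (RFC 8017, section 9.2, note 1).
SHA256_DIGEST_INFO = bytes.fromhex("3031300d060960864801650304020105000420")


def b64url_decode(s: str) -> bytes:
    """Decode unpadded base64url, as used in JWTs and JWKs."""
    return base64.urlsafe_b64decode(s + "=" * (-len(s) % 4))


def pkcs1_sha256_encode(message: bytes, k: int) -> bytes:
    """EMSA-PKCS1-v1_5 encoding of SHA-256(message), k bytes long."""
    t = SHA256_DIGEST_INFO + hashlib.sha256(message).digest()
    if k < len(t) + 11:
        raise ValueError("RSA modulus too short")
    return b"\x00\x01" + b"\xff" * (k - len(t) - 3) + b"\x00" + t


def verify_rs256(token: str, jwks: dict[str, dict]) -> dict:
    """Return the JWT payload if its RS256 signature matches a key in `jwks`.

    `jwks` maps kid -> JWK dict with base64url "n" (modulus) and "e" (exponent).
    Raises ValueError on failure. aud/iss/exp are NOT checked here.
    """
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("not a three-part JWT")
    header_b64, payload_b64, sig_b64 = parts
    header = json.loads(b64url_decode(header_b64))
    # Never let the token pick its own algorithm (e.g. "none" or "HS256").
    if header.get("alg") != "RS256":
        raise ValueError("unexpected alg")
    jwk = jwks.get(header.get("kid"))
    if jwk is None:
        raise ValueError("unknown kid; refresh the cached keys")
    n = int.from_bytes(b64url_decode(jwk["n"]), "big")
    e = int.from_bytes(b64url_decode(jwk["e"]), "big")
    k = (n.bit_length() + 7) // 8
    sig = b64url_decode(sig_b64)
    s = int.from_bytes(sig, "big")
    if len(sig) != k or s >= n:
        raise ValueError("malformed signature")
    # RSA verification primitive: recover the encoded message from the signature.
    em = pow(s, e, n).to_bytes(k, "big")
    signing_input = f"{header_b64}.{payload_b64}".encode()
    if not hmac.compare_digest(em, pkcs1_sha256_encode(signing_input, k)):
        raise ValueError("signature mismatch")
    return json.loads(b64url_decode(payload_b64))
```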


That does sound like a little too complicated to implement myself.


“Validating a user” isn’t a simple thing.

The tone of your post implies it should be easy.

There’s lots of stuff to do when validating a user.

Sure it’s not going to use 14,000 files but it’ll use lots of libraries to get things done.


I don't think you're being clear on all the details. You say this:

> I just want to validate one user.

This is important enough to you that you say it multiple times.

And yet, if I have just a single user, I can validate them easily: we agree on terms for exchanging auth info as a one-off, job done. I might agree to meet them in person and they show me their driver's license, or whatever. The details are irrelevant: we just use whatever we agree verifies this person.

I think that, just perhaps, you might have more than one user, and you're trying to scale this.

So, where are the details?

How much are you trying to scale, and what are the constraints around privacy and so forth.

You can't just say "I want a cheap perfect auth system with no constraints except for some hidden constraints which I'll explain later."


Of course I am not trying to validate just one user. But all I want to do is verify the token that Google has posted back to my auth endpoint and make sure that 1) it has come from Google and 2) it allows me to fetch the user's data (just the email), so I know whose account I have to show.
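Once the signature has been verified (for example by a JWT library), the remaining items on Google's list are plain comparisons on the decoded payload. A sketch, assuming `payload` is the already signature-verified claims dict and `client_ids` holds your own OAuth client IDs; the `email_verified` check is an extra precaution rather than one of the quoted steps.

```python
from __future__ import annotations

import time

GOOGLE_ISSUERS = {"accounts.google.com", "https://accounts.google.com"}


def email_from_claims(payload: dict, client_ids: set[str],
                      hosted_domain: str | None = None,
                      now: float | None = None) -> str:
    """Apply Google's aud/iss/exp (and optional hd) checks; return the email.

    `payload` must come from a token whose signature was already verified.
    """
    now = time.time() if now is None else now
    aud = payload.get("aud")
    if not isinstance(aud, str) or aud not in client_ids:
        raise ValueError("token was issued to a different app")
    iss = payload.get("iss")
    if not isinstance(iss, str) or iss not in GOOGLE_ISSUERS:
        raise ValueError("token was not issued by Google")
    exp = payload.get("exp")
    if not isinstance(exp, (int, float)) or exp <= now:
        raise ValueError("token has expired")
    if hosted_domain is not None and payload.get("hd") != hosted_domain:
        raise ValueError("user is not in the required G Suite domain")
    # Extra precaution, not in Google's list: accept the boolean or string form.
    if payload.get("email_verified") not in (True, "true"):
        raise ValueError("email address is not verified")
    return payload["email"]
```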

I can write my own authentication using a simple username, hashed password in php easily. I just don't want my user to remember yet one more password for yet another service on the web.
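For the password route, the "hashed password" part is where home-grown auth most often goes wrong. A sketch using a salted, deliberately slow key-derivation function from Python's standard library (PBKDF2-HMAC-SHA256); the iteration count is an assumption to tune for your hardware, and the `iterations$salt$hash` storage layout is illustrative. In PHP, the built-in `password_hash()` / `password_verify()` pair covers the same ground.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 600_000  # assumption: tune so one hash takes a noticeable fraction of a second


def hash_password(password: str, iterations: int = ITERATIONS) -> str:
    """Return 'iterations$salt_hex$hash_hex' for storage in the users table."""
    salt = secrets.token_bytes(16)  # fresh random salt per password
    dk = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return f"{iterations}${salt.hex()}${dk.hex()}"


def verify_password(password: str, stored: str) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    iterations, salt_hex, hash_hex = stored.split("$")
    dk = hashlib.pbkdf2_hmac(
        "sha256", password.encode(), bytes.fromhex(salt_hex), int(iterations)
    )
    return hmac.compare_digest(dk.hex(), hash_hex)
```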



