
It’s a bit of a mess right now. AWS is eating this market, but frankly their products are not that great. Their ETL tool, which handles parallel execution and so on, is called Glue, and it is essentially a cloud version of Spark. Glue is supposed to integrate with SageMaker, which is basically your standard Jupyter notebook experience. Spark, though, is not that intuitive and is not the tool data scientists use for exploration. So data scientists explore and build models, and then they rebuild them to run in Spark. What we really need is a way to seamlessly scale pandas or R dataframes across clusters. Dask looks promising, but it is facing an uphill battle against AWS and company and their inferior but convenient tooling.


A friend of mine is trying to build the Databricks of Dask for exactly that reason.



