
Your issue is that you're using the default (old) GDAL binding, Fiona [0].

You need to use pyogrio [1], its vectorized counterpart, instead. Make sure you pass `engine="pyogrio"` when calling `to_file` [2]. Fiona loops over features in Python, while pyogrio does the work entirely in compiled code, so pyogrio is usually about 10-15x faster than Fiona. Soon, in pyogrio 0.8, it will get another ~2-4x faster on top of that [3].
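
A minimal sketch (the file names here are hypothetical):

        import geopandas as gpd

        gdf = gpd.read_file('input.gpkg', engine='pyogrio')   # vectorized read
        gdf.to_file('output.gpkg', engine='pyogrio')          # vectorized write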

[0]: https://github.com/Toblerity/Fiona

[1]: https://github.com/geopandas/pyogrio

[2]: https://geopandas.org/en/stable/docs/reference/api/geopandas...

[3]: https://github.com/geopandas/pyogrio/pull/346



CSV is still faster than geo formats, even with pyogrio. From what I saw, it writes most of the file quickly, then spends a lot of time on what I think is building the spatial index (a way to check that is sketched below the listings).

        > %%timeit -n 1
        > d.to_csv('/tmp/test.csv')
        10.8 s ± 1.05 s per loop (mean ± std. dev. of 7 runs, 1 loop each)

        > %%timeit -n 1
        > d2.to_file('/tmp/test.gpkg', engine='pyogrio')
        1min 15s ± 5.96 s per loop (mean ± std. dev. of 7 runs, 1 loop each)

        > %%timeit -n 1
        > d.to_csv('/tmp/test.csv.gz')
        35.3 s ± 1.37 s per loop (mean ± std. dev. of 7 runs, 1 loop each)

        > %%timeit -n 1
        > d2.to_file('/tmp/test.fgb', driver='FlatGeobuf', engine='pyogrio')
        19.9 s ± 512 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

        > ls -lah /tmp/test*
        -rw-rw-r-- 1 culebron culebron 228M Mar 27 11:02 /tmp/test.csv
        -rw-rw-r-- 1 culebron culebron  63M Mar 27 11:27 /tmp/test.csv.gz
        -rw-rw-r-- 1 culebron culebron 545M Mar 27 11:52 /tmp/test.fgb
        -rw-r--r-- 1 culebron culebron 423M Mar 27 11:14 /tmp/test.gpkg


Still, CSV is 2x smaller than GPKG with this kind of data, and CSV.gz is 7x smaller.
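
If the spatial index really is the bottleneck, one way to check is to disable it: GDAL's GPKG driver has a SPATIAL_INDEX layer creation option, and as far as I know pyogrio forwards extra keyword arguments to OGR as creation options (the output path here is hypothetical):

        # Hypothetical check: write the GPKG without a spatial index.
        # pyogrio passes unknown kwargs to OGR as layer creation options.
        d2.to_file('/tmp/test_noindex.gpkg', engine='pyogrio', SPATIAL_INDEX='NO')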


That's why I'm working on the GeoParquet spec [0]! It gives you both compression by default and very fast reads and writes. So it's usually as small as gzipped CSV, if not smaller, while being faster to read and write than GeoPackage.

Try using `GeoDataFrame.to_parquet` and `geopandas.read_parquet`.
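
A rough sketch (paths are hypothetical):

        import geopandas as gpd

        gdf = gpd.read_file('input.gpkg', engine='pyogrio')
        gdf.to_parquet('/tmp/test.parquet')            # snappy-compressed by default
        gdf2 = gpd.read_parquet('/tmp/test.parquet')   # fast columnar read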

[0]: https://github.com/opengeospatial/geoparquet


...but this has spared me some irritation at work today. Thanks!



