That's true. Parquet went through some of the weirdest changes between its revisions, and because it was used for Hadoop data lakes, there's a whole bunch of data still stored in the legacy formats. Off the top of my head:
- different physical types to store timestamps: INT96 vs INT64 (first sketch below)
- different ways to interpret timestamps that predate the tzdb records (extrapolate the current rules vs apply the earliest tzdb record)
- different ways to handle proleptic Gregorian vs hybrid Julian/Gregorian dates and timestamps (second sketch below)
- different ways to handle time zones (since Parquet only has the equivalents of LocalDateTime and Instant, with no OffsetDateTime or ZonedDateTime, and earlier versions of Hive 3 were terribly confused about which is which)
- the decimal type was written differently: as a fixed-length byte array in older versions, and as int32/int64/byte array/binary in newer ones (third sketch below)
- the Hadoop ecosystem doesn't support decimals with more than 38 digits of precision, but the file format does
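
To make the timestamp mess concrete, here's a minimal sketch of decoding both physical layouts, assuming the usual Impala/Hive INT96 convention (8 little-endian bytes of nanos-of-day followed by 4 little-endian bytes of Julian day number); the class and method names are mine:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.time.Instant;

public class TimestampDecode {
    // Julian day number of the Unix epoch (1970-01-01).
    private static final long UNIX_EPOCH_JULIAN_DAY = 2_440_588L;

    // Legacy INT96 layout: 8 LE bytes of nanoseconds-of-day,
    // then 4 LE bytes of Julian day number. Always a UTC instant.
    static Instant fromInt96(byte[] raw12) {
        ByteBuffer buf = ByteBuffer.wrap(raw12).order(ByteOrder.LITTLE_ENDIAN);
        long nanosOfDay = buf.getLong();
        long julianDay = Integer.toUnsignedLong(buf.getInt());
        long epochSeconds = (julianDay - UNIX_EPOCH_JULIAN_DAY) * 86_400L;
        // ofEpochSecond normalizes a nano adjustment larger than one second.
        return Instant.ofEpochSecond(epochSeconds, nanosOfDay);
    }

    // Modern layout: INT64 annotated as TIMESTAMP(MICROS, ...), i.e. plain
    // microseconds since the epoch. Whether it means an Instant or a
    // LocalDateTime depends on the isAdjustedToUTC flag in the annotation.
    static Instant fromInt64Micros(long micros) {
        return Instant.ofEpochSecond(Math.floorDiv(micros, 1_000_000L),
                Math.floorMod(micros, 1_000_000L) * 1_000L);
    }
}
```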
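
And the calendar problem in a nutshell: old Hive/Spark writers used the hybrid Julian/Gregorian calendar (what java.util.GregorianCalendar does by default), while java.time is proleptic Gregorian, so the same pre-1582 date label maps to different day counts. A rough illustration, not anyone's actual rebase code:

```java
import java.time.LocalDate;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.TimeZone;

public class CalendarMismatch {
    public static void main(String[] args) {
        // Hybrid writer: dates before 1582-10-15 follow the Julian calendar.
        GregorianCalendar hybrid = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
        hybrid.clear();
        hybrid.set(1000, Calendar.JANUARY, 1);
        long hybridEpochDay = Math.floorDiv(hybrid.getTimeInMillis(), 86_400_000L);

        // Proleptic Gregorian reader: java.time extends the Gregorian
        // leap-year rules backwards indefinitely.
        long prolepticEpochDay = LocalDate.of(1000, 1, 1).toEpochDay();

        // The same "1000-01-01" is several days apart between the two,
        // which is why readers have to rebase old files.
        System.out.println(hybridEpochDay - prolepticEpochDay);
    }
}
```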
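
The decimal variants are at least easy to decode once you know which one you're looking at; all of them carry an unscaled integer plus a scale taken from the schema. A sketch (helper names are mine):

```java
import java.math.BigDecimal;
import java.math.BigInteger;

public class DecimalDecode {
    // Older writers: FIXED_LEN_BYTE_ARRAY/BINARY holding the unscaled
    // value as a big-endian two's-complement integer. This is also the
    // only encoding with no 38-digit ceiling baked into the engines' types.
    static BigDecimal fromBytes(byte[] unscaled, int scale) {
        return new BigDecimal(new BigInteger(unscaled), scale);
    }

    // Newer writers may use INT32/INT64 for small precisions:
    // the integer itself is the unscaled value.
    static BigDecimal fromLong(long unscaled, int scale) {
        return BigDecimal.valueOf(unscaled, scale);
    }
}
```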