PyIceberg
PyIceberg is a Python implementation for accessing Iceberg tables, without the need for a JVM.
Install
Before installing PyIceberg, make sure that you're on an up-to-date version of pip:
You can install the latest release version from PyPI:
You can also install it directly from GitHub (not recommended, but sometimes handy):
Or clone the repository for local development:
You can mix and match optional dependencies depending on your needs:
Key | Description |
---|---|
hive | Support for the Hive metastore |
glue | Support for AWS Glue |
dynamodb | Support for AWS DynamoDB |
pyarrow | PyArrow as a FileIO implementation to interact with the object store |
pandas | Installs both PyArrow and Pandas |
duckdb | Installs both PyArrow and DuckDB |
ray | Installs PyArrow, Pandas, and Ray |
s3fs | S3FS as a FileIO implementation to interact with the object store |
adlfs | ADLFS as a FileIO implementation to interact with the object store |
snappy | Support for Snappy Avro compression |
gcs | GCS as a FileIO implementation to interact with the object store |
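As a minimal sketch, assuming you want to assemble the install command programmatically (for example in a setup script), the extras from the table can be combined into a single pip requirement such as pyiceberg[hive,s3fs]. The helper build_install_args and the EXTRAS mapping are hypothetical, not part of PyIceberg; the mapping simply mirrors the table and may lag behind newer releases.

```python
import sys

# Optional extras as listed in the table (may not cover every release).
EXTRAS = {
    "hive": "Support for the Hive metastore",
    "glue": "Support for AWS Glue",
    "dynamodb": "Support for AWS DynamoDB",
    "pyarrow": "PyArrow as a FileIO implementation",
    "pandas": "Installs both PyArrow and Pandas",
    "duckdb": "Installs both PyArrow and DuckDB",
    "ray": "Installs PyArrow, Pandas, and Ray",
    "s3fs": "S3FS as a FileIO implementation",
    "adlfs": "ADLFS as a FileIO implementation",
    "snappy": "Support for Snappy Avro compression",
    "gcs": "GCS as a FileIO implementation",
}


def build_install_args(extras):
    """Return a pip argv list that installs pyiceberg with the given extras.

    Hypothetical helper: it only builds the command, it does not run it.
    """
    selected = set(extras)
    unknown = sorted(selected - set(EXTRAS))
    if unknown:
        raise ValueError(f"unknown extras: {', '.join(unknown)}")
    requirement = "pyiceberg"
    if selected:
        # Sort the de-duplicated extras so the requirement string is stable.
        requirement += "[" + ",".join(sorted(selected)) + "]"
    # `python -m pip` targets the interpreter that runs this script.
    return [sys.executable, "-m", "pip", "install", requirement]


if __name__ == "__main__":
    print(" ".join(build_install_args(["s3fs", "hive"])))
```

Passing the result to subprocess.run(..., check=True) would perform the install. When typing the requirement in a shell such as zsh, quote it, because square brackets are glob characters there.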
You need to install at least one of s3fs, adlfs, gcs, or pyarrow to fetch files (the pandas, duckdb, and ray extras also install PyArrow).
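The FileIO requirement can be checked against a chosen set of extras before installing. This sketch relies only on the table: pandas, duckdb, and ray are counted as FileIO providers because the table says they install PyArrow. FILEIO_PROVIDERS and has_fileio are hypothetical names, not PyIceberg APIs.

```python
# Extras that provide a FileIO implementation, per the table.
# pandas, duckdb, and ray count because they also install PyArrow.
FILEIO_PROVIDERS = {"pyarrow", "s3fs", "adlfs", "gcs", "pandas", "duckdb", "ray"}


def has_fileio(extras):
    """Return True if at least one selected extra can fetch files."""
    return not FILEIO_PROVIDERS.isdisjoint(extras)


if __name__ == "__main__":
    print(has_fileio({"glue"}))          # False: catalog support only
    print(has_fileio({"glue", "s3fs"}))  # True: catalog plus S3FS
```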
Both a CLI and a Python API are available.
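To see which of the two interfaces is present in the current environment, a quick check like the following may help. It is a sketch that assumes the Python API is importable as the pyiceberg package and that the CLI is installed as an executable named pyiceberg on PATH; available_interfaces is a hypothetical helper, not part of PyIceberg.

```python
import importlib.util
import shutil


def available_interfaces():
    """Report whether the PyIceberg Python API and CLI can be found."""
    return {
        # find_spec returns None when the top-level package is not installed.
        "python_api": importlib.util.find_spec("pyiceberg") is not None,
        # Assumption: the CLI entry point is a `pyiceberg` executable on PATH.
        "cli": shutil.which("pyiceberg") is not None,
    }


if __name__ == "__main__":
    print(available_interfaces())
```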