PyIceberg is a Python implementation for accessing Iceberg tables, without the need for a JVM.
You can install the latest release from PyPI:
pip3 install "pyiceberg[s3fs,hive]"
Installing directly from GitHub is not recommended, but is sometimes handy:
pip install "git+https://github.com/apache/iceberg.git#subdirectory=python&egg=pyiceberg[s3fs]"
Or clone the repository for local development:
git clone https://github.com/apache/iceberg.git
cd iceberg/python
pip3 install -e ".[s3fs,hive]"
You can mix and match optional dependencies depending on your needs:
| Key | Description |
| --- | --- |
| hive | Support for the Hive metastore |
| glue | Support for AWS Glue |
| pyarrow | PyArrow as a FileIO implementation to interact with the object store |
| duckdb | Installs both PyArrow and DuckDB |
| s3fs | S3FS as a FileIO implementation to interact with the object store |
| adlfs | ADLFS as a FileIO implementation to interact with the object store |
| snappy | Support for snappy Avro compression |
You need to install at least one of s3fs, adlfs, or pyarrow for fetching files.
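Because fetching files depends on a FileIO backend being present, it can help to check which ones are importable in the current environment. The following is a minimal sketch using only the standard library. The helper name `available_file_io_backends` and the constant `FILE_IO_BACKENDS` are made up for illustration and are not part of PyIceberg, and the sketch assumes the importable module names match the extras `s3fs`, `adlfs`, and `pyarrow` listed in the table above.

```python
from importlib.util import find_spec

# Assumption: the importable module names match the extras in the table above.
FILE_IO_BACKENDS = ("s3fs", "adlfs", "pyarrow")


def available_file_io_backends(candidates=FILE_IO_BACKENDS):
    """Return the candidate modules that can be located in this environment.

    Hypothetical helper for illustration; it only checks that the module
    can be found, it does not import or configure it.
    """
    return [name for name in candidates if find_spec(name) is not None]


if __name__ == "__main__":
    found = available_file_io_backends()
    if found:
        print("FileIO backends available:", ", ".join(found))
    else:
        print('No FileIO backend found; try: pip3 install "pyiceberg[pyarrow]"')
```

If the list comes back empty, installing any one of the three extras should be enough to read files.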
Both a CLI and a Python API are available.