
PyIceberg

PyIceberg is a Python implementation for accessing Iceberg tables, without the need for a JVM.

Install

Before installing PyIceberg, make sure that you're on an up-to-date version of pip:

pip install --upgrade pip

You can install the latest release version from PyPI:

pip install "pyiceberg[s3fs,hive]"

Installing directly from GitHub is not recommended, but can sometimes be handy:

pip install "git+https://github.com/apache/iceberg.git#subdirectory=python&egg=pyiceberg[s3fs]"
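The GitHub URL above packs several pieces of information into one string. As a rough illustration, the standard library can split it apart: the `git+https` scheme tells pip to clone with git over HTTPS, `subdirectory` points pip at the folder containing the Python package, and `egg` names the project together with the extras to install. This is only a parsing sketch for explanation, not something pip requires you to run.

```python
from urllib.parse import urlsplit, parse_qs

# The VCS URL from the install command above.
url = "git+https://github.com/apache/iceberg.git#subdirectory=python&egg=pyiceberg[s3fs]"

parts = urlsplit(url)
# The fragment (after '#') holds key=value pairs separated by '&'.
fragment = parse_qs(parts.fragment)

print(parts.scheme)                 # git+https: clone with git over HTTPS
print(parts.netloc + parts.path)    # github.com/apache/iceberg.git: the repository
print(fragment["subdirectory"][0])  # python: the package lives in iceberg/python
print(fragment["egg"][0])           # pyiceberg[s3fs]: project name plus extras
```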

Or clone the repository for local development:

git clone https://github.com/apache/iceberg.git
cd iceberg/python
pip3 install -e ".[s3fs,hive]"

You can mix and match optional dependencies depending on your needs:

Key       Description
hive      Support for the Hive metastore
glue      Support for AWS Glue
dynamodb  Support for AWS DynamoDB
pyarrow   PyArrow as a FileIO implementation to interact with the object store
pandas    Installs both PyArrow and Pandas
duckdb    Installs both PyArrow and DuckDB
ray       Installs PyArrow, Pandas, and Ray
s3fs      S3FS as a FileIO implementation to interact with the object store
adlfs     ADLFS as a FileIO implementation to interact with the object store
snappy    Support for Snappy compression of Avro files
gcs       GCS as a FileIO implementation to interact with the object store
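Extras are combined inside the square brackets of the requirement, separated by commas. The sketch below assembles such a requirement string from the extras listed above and rejects names that are not in that list. The helper `build_install_spec` is hypothetical and not part of PyIceberg; it only illustrates the bracket syntax pip expects.

```python
# Extras taken from the list above. build_install_spec is a hypothetical
# helper for illustration, not a PyIceberg API.
PYICEBERG_EXTRAS = {
    "hive", "glue", "dynamodb", "pyarrow", "pandas", "duckdb",
    "ray", "s3fs", "adlfs", "snappy", "gcs",
}


def build_install_spec(extras):
    """Return a pip requirement string such as 'pyiceberg[hive,s3fs]'."""
    unknown = set(extras) - PYICEBERG_EXTRAS
    if unknown:
        raise ValueError(f"unknown extras: {sorted(unknown)}")
    # De-duplicate and sort so the same selection always yields the same string.
    selected = sorted(set(extras))
    if not selected:
        return "pyiceberg"
    return f"pyiceberg[{','.join(selected)}]"


print(build_install_spec(["s3fs", "hive"]))  # pyiceberg[hive,s3fs]
```

The resulting string is what goes inside the quotes of `pip install "..."`; quoting matters because some shells treat square brackets as glob patterns.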

You need to install at least one of s3fs, adlfs, gcs, or pyarrow to fetch files.
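One way to confirm that a FileIO backend is present in the current environment is to ask the import system whether its module can be found, without importing it. This is a minimal sketch; the module names are assumptions about what each extra installs (in particular, that the `gcs` extra provides the `gcsfs` package), and the function `available_fileio_backends` is hypothetical.

```python
import importlib.util

# Assumed import names of the FileIO backends mentioned above.
# Assumption: the gcs extra installs the gcsfs package, imported as "gcsfs".
FILEIO_MODULES = ("s3fs", "adlfs", "gcsfs", "pyarrow")


def available_fileio_backends():
    """Return the FileIO backend modules that can be imported here."""
    # find_spec returns None for a top-level module that is not installed,
    # and does not execute the module itself.
    return [
        name for name in FILEIO_MODULES
        if importlib.util.find_spec(name) is not None
    ]


backends = available_fileio_backends()
if backends:
    print("FileIO backends available:", ", ".join(backends))
else:
    print("No FileIO backend found; install one of: s3fs, adlfs, gcs, pyarrow")
```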

Both a CLI and a Python API are available.