io

Base FileIO classes for implementing reading and writing table files.

The FileIO abstraction covers only a subset of what a full filesystem implementation provides. Specifically, Iceberg needs to read or write a file at a given location (as a seekable stream) and check whether a file exists. An implementation of the FileIO abstract base class is responsible for returning an InputFile instance, returning an OutputFile instance, and deleting a file given its location.
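The three responsibilities described above can be sketched with a dict-backed store. This is a standalone illustration, not the library's implementation: the names `MemoryFileIO`, `MemoryInputFile`, `MemoryOutputFile`, `_MemoryOutputStream`, and the `mem://` location are hypothetical, and the classes only mirror the method names documented on this page rather than subclassing the pyiceberg base classes, so the sketch runs without pyiceberg installed.

```python
import io
from typing import Dict, Union


class MemoryInputFile:
    """Hypothetical stand-in for an InputFile, backed by a shared dict."""

    def __init__(self, location: str, store: Dict[str, bytes]):
        self.location = location
        self._store = store

    def __len__(self) -> int:
        return len(self._store[self.location])

    def exists(self) -> bool:
        return self.location in self._store

    def open(self, seekable: bool = True) -> io.BytesIO:
        if not self.exists():
            raise FileNotFoundError(self.location)
        # BytesIO is always seekable, so the flag is ignored in this sketch.
        return io.BytesIO(self._store[self.location])


class _MemoryOutputStream(io.BytesIO):
    """Buffer that copies its contents into the store when it is closed."""

    def __init__(self, location: str, store: Dict[str, bytes]):
        super().__init__()
        self._location = location
        self._store = store

    def close(self) -> None:
        if not self.closed:
            self._store[self._location] = self.getvalue()
        super().close()


class MemoryOutputFile:
    """Hypothetical stand-in for an OutputFile."""

    def __init__(self, location: str, store: Dict[str, bytes]):
        self.location = location
        self._store = store

    def exists(self) -> bool:
        return self.location in self._store

    def to_input_file(self) -> MemoryInputFile:
        return MemoryInputFile(self.location, self._store)

    def create(self, overwrite: bool = False) -> _MemoryOutputStream:
        if self.exists() and not overwrite:
            raise FileExistsError(self.location)
        return _MemoryOutputStream(self.location, self._store)


class MemoryFileIO:
    """Hypothetical FileIO-shaped class: hands out files and deletes them."""

    def __init__(self) -> None:
        self._store: Dict[str, bytes] = {}

    def new_input(self, location: str) -> MemoryInputFile:
        return MemoryInputFile(location, self._store)

    def new_output(self, location: str) -> MemoryOutputFile:
        return MemoryOutputFile(location, self._store)

    def delete(self, location: Union[str, MemoryInputFile, MemoryOutputFile]) -> None:
        key = location if isinstance(location, str) else location.location
        if key not in self._store:
            raise FileNotFoundError(key)
        del self._store[key]


file_io = MemoryFileIO()
with file_io.new_output("mem://table/data.bin").create() as out:
    out.write(b"iceberg")

source = file_io.new_input("mem://table/data.bin")
size = len(source)
with source.open() as stream:
    payload = stream.read()

file_io.delete(source)  # accepts the file object as well as a plain string
still_there = file_io.new_input("mem://table/data.bin").exists()
```

A real implementation would subclass the abstract base classes so that missing methods are caught at instantiation time; the duck-typed version here only shows how the pieces relate.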

FileIO

Bases: ABC

A base class for FileIO implementations.

Source code in pyiceberg/io/__init__.py
class FileIO(ABC):
    """A base class for FileIO implementations."""

    properties: Properties

    def __init__(self, properties: Properties = EMPTY_DICT):
        self.properties = properties

    @abstractmethod
    def new_input(self, location: str) -> InputFile:
        """Get an InputFile instance to read bytes from the file at the given location.

        Args:
            location (str): A URI or a path to a local file.
        """

    @abstractmethod
    def new_output(self, location: str) -> OutputFile:
        """Get an OutputFile instance to write bytes to the file at the given location.

        Args:
            location (str): A URI or a path to a local file.
        """

    @abstractmethod
    def delete(self, location: Union[str, InputFile, OutputFile]) -> None:
        """Delete the file at the given path.

        Args:
            location (Union[str, InputFile, OutputFile]): A URI or a path to a local file--if an InputFile instance or
                an OutputFile instance is provided, the location attribute for that instance is used as the URI to delete.

        Raises:
            PermissionError: If the file at location cannot be accessed due to a permission error.
            FileNotFoundError: When the file at the provided location does not exist.
        """

delete(location) abstractmethod

Delete the file at the given path.

Parameters:

Name Type Description Default
location Union[str, InputFile, OutputFile]

A URI or a path to a local file. If an InputFile or OutputFile instance is provided, the location attribute of that instance is used as the URI to delete.

required

Raises:

Type Description
PermissionError

If the file at location cannot be accessed due to a permission error.

FileNotFoundError

When the file at the provided location does not exist.
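One way an implementation might normalize the three accepted argument types is sketched below. It assumes local paths only (no URI scheme handling), and `resolve_location`, `delete_local`, and `FakeInputFile` are hypothetical names introduced for illustration. The sketch relies on `os.remove`, which itself raises `FileNotFoundError` and `PermissionError`, matching the exceptions listed above.

```python
import os
import tempfile
from typing import Any


def resolve_location(location: Any) -> str:
    """Return a plain string for a str, or for any object exposing `.location`."""
    return location if isinstance(location, str) else location.location


def delete_local(location: Any) -> None:
    """Delete a local file; os.remove raises FileNotFoundError/PermissionError."""
    os.remove(resolve_location(location))


class FakeInputFile:
    """Hypothetical object with only the `location` attribute."""

    def __init__(self, location: str):
        self.location = location


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "a.bin")
    with open(path, "wb") as f:
        f.write(b"x")

    delete_local(FakeInputFile(path))  # file-like object form
    removed = not os.path.exists(path)

    try:
        delete_local(path)  # plain string form, file is already gone
        missing_raised = False
    except FileNotFoundError:
        missing_raised = True
```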

Source code in pyiceberg/io/__init__.py
@abstractmethod
def delete(self, location: Union[str, InputFile, OutputFile]) -> None:
    """Delete the file at the given path.

    Args:
        location (Union[str, InputFile, OutputFile]): A URI or a path to a local file--if an InputFile instance or
            an OutputFile instance is provided, the location attribute for that instance is used as the URI to delete.

    Raises:
        PermissionError: If the file at location cannot be accessed due to a permission error.
        FileNotFoundError: When the file at the provided location does not exist.
    """

new_input(location) abstractmethod

Get an InputFile instance to read bytes from the file at the given location.

Parameters:

Name Type Description Default
location str

A URI or a path to a local file.

required

Source code in pyiceberg/io/__init__.py
@abstractmethod
def new_input(self, location: str) -> InputFile:
    """Get an InputFile instance to read bytes from the file at the given location.

    Args:
        location (str): A URI or a path to a local file.
    """

new_output(location) abstractmethod

Get an OutputFile instance to write bytes to the file at the given location.

Parameters:

Name Type Description Default
location str

A URI or a path to a local file.

required

Source code in pyiceberg/io/__init__.py
@abstractmethod
def new_output(self, location: str) -> OutputFile:
    """Get an OutputFile instance to write bytes to the file at the given location.

    Args:
        location (str): A URI or a path to a local file.
    """

InputFile

Bases: ABC

A base class for InputFile implementations.

Parameters:

Name Type Description Default
location str

A URI or a path to a local file.

required

Attributes:

Name Type Description
location str

The URI or path to a local file for an InputFile instance.

exists bool

Whether the file exists or not.

Source code in pyiceberg/io/__init__.py
class InputFile(ABC):
    """A base class for InputFile implementations.

    Args:
        location (str): A URI or a path to a local file.

    Attributes:
        location (str): The URI or path to a local file for an InputFile instance.
        exists (bool): Whether the file exists or not.
    """

    def __init__(self, location: str):
        self._location = location

    @abstractmethod
    def __len__(self) -> int:
        """Return the total length of the file, in bytes."""

    @property
    def location(self) -> str:
        """The fully-qualified location of the input file."""
        return self._location

    @abstractmethod
    def exists(self) -> bool:
        """Check whether the location exists.

        Raises:
            PermissionError: If the file at self.location cannot be accessed due to a permission error.
        """

    @abstractmethod
    def open(self, seekable: bool = True) -> InputStream:
        """Return an object that matches the InputStream protocol.

        Args:
            seekable: Whether the stream should support seek, or is consumed sequentially.

        Returns:
            InputStream: An object that matches the InputStream protocol.

        Raises:
            PermissionError: If the file at self.location cannot be accessed due to a permission error.
            FileNotFoundError: If the file at self.location does not exist.
        """

location: str property

The fully-qualified location of the input file.

__len__() abstractmethod

Return the total length of the file, in bytes.

Source code in pyiceberg/io/__init__.py
@abstractmethod
def __len__(self) -> int:
    """Return the total length of the file, in bytes."""

exists() abstractmethod

Check whether the location exists.

Raises:

Type Description
PermissionError

If the file at self.location cannot be accessed due to a permission error.

Source code in pyiceberg/io/__init__.py
@abstractmethod
def exists(self) -> bool:
    """Check whether the location exists.

    Raises:
        PermissionError: If the file at self.location cannot be accessed due to a permission error.
    """

open(seekable=True) abstractmethod

Return an object that matches the InputStream protocol.

Parameters:

Name Type Description Default
seekable bool

Whether the stream should support seek, or is consumed sequentially.

True

Returns:

Name Type Description
InputStream InputStream

An object that matches the InputStream protocol.

Raises:

Type Description
PermissionError

If the file at self.location cannot be accessed due to a permission error.

FileNotFoundError

If the file at self.location does not exist.
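To illustrate why a seekable stream matters, the sketch below reads the trailing bytes of a stream using only `seek`, `tell`, and `read`. It uses `io.BytesIO` as a stand-in for whatever object an `open()` implementation returns, and `read_tail` is a hypothetical helper, not part of the library.

```python
import io
from os import SEEK_END, SEEK_SET


def read_tail(stream, n: int) -> bytes:
    """Read the last n bytes of a seekable stream, then restore its position."""
    start = stream.tell()
    size = stream.seek(0, SEEK_END)          # seek returns the new absolute position
    stream.seek(max(size - n, 0), SEEK_SET)  # clamp so short streams still work
    tail = stream.read(n)
    stream.seek(start, SEEK_SET)
    return tail


with io.BytesIO(b"header-body-PAR1") as stream:
    magic = read_tail(stream, 4)
    position_after = stream.tell()
```

A stream opened with `seekable=False` could not support this access pattern and would have to be consumed from start to end.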

Source code in pyiceberg/io/__init__.py
@abstractmethod
def open(self, seekable: bool = True) -> InputStream:
    """Return an object that matches the InputStream protocol.

    Args:
        seekable: Whether the stream should support seek, or is consumed sequentially.

    Returns:
        InputStream: An object that matches the InputStream protocol.

    Raises:
        PermissionError: If the file at self.location cannot be accessed due to a permission error.
        FileNotFoundError: If the file at self.location does not exist.
    """

InputStream

Bases: Protocol

A protocol for the file-like object returned by InputFile.open(...).

This outlines the minimally required methods for a seekable input stream returned from an InputFile implementation's open(...) method. These methods are a subset of IOBase/RawIOBase.
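Because the protocol is structural, any object with the right methods satisfies it without inheriting from anything. The sketch below uses a hypothetical local protocol, `SeekableReader`, that mirrors the method names shown in the source listing for this class, so it runs without pyiceberg installed. Note that an `isinstance` check against a runtime-checkable protocol only verifies that the members exist, not that their signatures match.

```python
import io
from typing import Protocol, runtime_checkable


@runtime_checkable
class SeekableReader(Protocol):
    """Hypothetical local mirror of the methods listed for InputStream."""

    def read(self, size: int = 0) -> bytes: ...
    def seek(self, offset: int, whence: int = 0) -> int: ...
    def tell(self) -> int: ...
    def close(self) -> None: ...
    def __enter__(self): ...
    def __exit__(self, exctype, excinst, exctb) -> None: ...


class ReadOnly:
    """Has read() but none of the other required members."""

    def read(self, size: int = 0) -> bytes:
        return b""


# io.BytesIO provides every member, so it matches structurally.
bytes_io_matches = isinstance(io.BytesIO(b"abc"), SeekableReader)
# ReadOnly lacks seek/tell/close and the context-manager methods.
read_only_matches = isinstance(ReadOnly(), SeekableReader)
```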

Source code in pyiceberg/io/__init__.py
@runtime_checkable
class InputStream(Protocol):
    """A protocol for the file-like object returned by InputFile.open(...).

    This outlines the minimally required methods for a seekable input stream returned from an InputFile
    implementation's `open(...)` method. These methods are a subset of IOBase/RawIOBase.
    """

    @abstractmethod
    def read(self, size: int = 0) -> bytes: ...

    @abstractmethod
    def seek(self, offset: int, whence: int = SEEK_SET) -> int: ...

    @abstractmethod
    def tell(self) -> int: ...

    @abstractmethod
    def close(self) -> None: ...

    def __enter__(self) -> InputStream:
        """Provide setup when opening an InputStream using a 'with' statement."""

    @abstractmethod
    def __exit__(
        self, exctype: Optional[Type[BaseException]], excinst: Optional[BaseException], exctb: Optional[TracebackType]
    ) -> None:
        """Perform cleanup when exiting the scope of a 'with' statement."""

__enter__()

Provide setup when opening an InputStream using a 'with' statement.

Source code in pyiceberg/io/__init__.py
def __enter__(self) -> InputStream:
    """Provide setup when opening an InputStream using a 'with' statement."""

__exit__(exctype, excinst, exctb) abstractmethod

Perform cleanup when exiting the scope of a 'with' statement.

Source code in pyiceberg/io/__init__.py
@abstractmethod
def __exit__(
    self, exctype: Optional[Type[BaseException]], excinst: Optional[BaseException], exctb: Optional[TracebackType]
) -> None:
    """Perform cleanup when exiting the scope of a 'with' statement."""

OutputFile

Bases: ABC

A base class for OutputFile implementations.

Parameters:

Name Type Description Default
location str

A URI or a path to a local file.

required

Attributes:

Name Type Description
location str

The URI or path to a local file for an OutputFile instance.

exists bool

Whether the file exists or not.

Source code in pyiceberg/io/__init__.py
class OutputFile(ABC):
    """A base class for OutputFile implementations.

    Args:
        location (str): A URI or a path to a local file.

    Attributes:
        location (str): The URI or path to a local file for an OutputFile instance.
        exists (bool): Whether the file exists or not.
    """

    def __init__(self, location: str):
        self._location = location

    @abstractmethod
    def __len__(self) -> int:
        """Return the total length of the file, in bytes."""

    @property
    def location(self) -> str:
        """The fully-qualified location of the output file."""
        return self._location

    @abstractmethod
    def exists(self) -> bool:
        """Check whether the location exists.

        Raises:
            PermissionError: If the file at self.location cannot be accessed due to a permission error.
        """

    @abstractmethod
    def to_input_file(self) -> InputFile:
        """Return an InputFile for the location of this output file."""

    @abstractmethod
    def create(self, overwrite: bool = False) -> OutputStream:
        """Return an object that matches the OutputStream protocol.

        Args:
            overwrite (bool): If the file already exists at `self.location`
                and `overwrite` is False a FileExistsError should be raised.

        Returns:
            OutputStream: An object that matches the OutputStream protocol.

        Raises:
            PermissionError: If the file at self.location cannot be accessed due to a permission error.
            FileExistsError: If the file at self.location already exists and `overwrite=False`.
        """

location: str property

The fully-qualified location of the output file.

__len__() abstractmethod

Return the total length of the file, in bytes.

Source code in pyiceberg/io/__init__.py
@abstractmethod
def __len__(self) -> int:
    """Return the total length of the file, in bytes."""

create(overwrite=False) abstractmethod

Return an object that matches the OutputStream protocol.

Parameters:

Name Type Description Default
overwrite bool

Whether to replace an existing file. If a file already exists at self.location and overwrite is False, a FileExistsError should be raised.

False

Returns:

Name Type Description
OutputStream OutputStream

An object that matches the OutputStream protocol.

Raises:

Type Description
PermissionError

If the file at self.location cannot be accessed due to a permission error.

FileExistsError

If the file at self.location already exists and overwrite=False.
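For local files, the overwrite contract described above maps onto Python's built-in exclusive-creation mode. The following is a minimal sketch under that assumption; `create_local` is a hypothetical helper and says nothing about how any particular FileIO implementation does it.

```python
import os
import tempfile


def create_local(path: str, overwrite: bool = False):
    """Open a local file for binary writing, refusing to clobber unless asked."""
    # "xb" is exclusive creation: open() raises FileExistsError if the path exists.
    return open(path, "wb" if overwrite else "xb")


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "metadata.json")

    with create_local(path) as out:
        out.write(b"v1")

    try:
        create_local(path)  # second create without overwrite
        refused = False
    except FileExistsError:
        refused = True

    with create_local(path, overwrite=True) as out:
        out.write(b"v2")

    with open(path, "rb") as f:
        final = f.read()
```

Using `"xb"` leaves the existence check to `open()` itself, which avoids the race between a separate `exists()` call and the subsequent open.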

Source code in pyiceberg/io/__init__.py
@abstractmethod
def create(self, overwrite: bool = False) -> OutputStream:
    """Return an object that matches the OutputStream protocol.

    Args:
        overwrite (bool): If the file already exists at `self.location`
            and `overwrite` is False a FileExistsError should be raised.

    Returns:
        OutputStream: An object that matches the OutputStream protocol.

    Raises:
        PermissionError: If the file at self.location cannot be accessed due to a permission error.
        FileExistsError: If the file at self.location already exists and `overwrite=False`.
    """

exists() abstractmethod

Check whether the location exists.

Raises:

Type Description
PermissionError

If the file at self.location cannot be accessed due to a permission error.

Source code in pyiceberg/io/__init__.py
@abstractmethod
def exists(self) -> bool:
    """Check whether the location exists.

    Raises:
        PermissionError: If the file at self.location cannot be accessed due to a permission error.
    """

to_input_file() abstractmethod

Return an InputFile for the location of this output file.

Source code in pyiceberg/io/__init__.py
@abstractmethod
def to_input_file(self) -> InputFile:
    """Return an InputFile for the location of this output file."""

OutputStream

Bases: Protocol

A protocol for the file-like object returned by OutputFile.create(...).

This outlines the minimally required methods for a writable output stream returned from an OutputFile implementation's create(...) method. These methods are a subset of IOBase/RawIOBase.
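A writer needs very little to satisfy this shape: `write`, `close`, and the two context-manager methods. The sketch below is a hypothetical `ListOutputStream` that collects chunks in memory. It does not import pyiceberg, and is only meant to show the minimal surface area.

```python
from typing import List


class ListOutputStream:
    """Hypothetical writer that collects written chunks in a list."""

    def __init__(self) -> None:
        self.chunks: List[bytes] = []
        self.closed = False

    def write(self, b: bytes) -> int:
        if self.closed:
            raise ValueError("write to closed stream")
        self.chunks.append(bytes(b))
        return len(b)  # number of bytes accepted, as with io.RawIOBase.write

    def close(self) -> None:
        self.closed = True

    def __enter__(self) -> "ListOutputStream":
        return self

    def __exit__(self, exctype, excinst, exctb) -> None:
        # Always close, whether or not the with-body raised.
        self.close()


with ListOutputStream() as out:
    written = out.write(b"ice") + out.write(b"berg")
```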

Source code in pyiceberg/io/__init__.py
@runtime_checkable
class OutputStream(Protocol):  # pragma: no cover
    """A protocol for the file-like object returned by OutputFile.create(...).

    This outlines the minimally required methods for a writable output stream returned from an OutputFile
    implementation's `create(...)` method. These methods are a subset of IOBase/RawIOBase.
    """

    @abstractmethod
    def write(self, b: bytes) -> int: ...

    @abstractmethod
    def close(self) -> None: ...

    @abstractmethod
    def __enter__(self) -> OutputStream:
        """Provide setup when opening an OutputStream using a 'with' statement."""

    @abstractmethod
    def __exit__(
        self, exctype: Optional[Type[BaseException]], excinst: Optional[BaseException], exctb: Optional[TracebackType]
    ) -> None:
        """Perform cleanup when exiting the scope of a 'with' statement."""

__enter__() abstractmethod

Provide setup when opening an OutputStream using a 'with' statement.

Source code in pyiceberg/io/__init__.py
@abstractmethod
def __enter__(self) -> OutputStream:
    """Provide setup when opening an OutputStream using a 'with' statement."""

__exit__(exctype, excinst, exctb) abstractmethod

Perform cleanup when exiting the scope of a 'with' statement.

Source code in pyiceberg/io/__init__.py
@abstractmethod
def __exit__(
    self, exctype: Optional[Type[BaseException]], excinst: Optional[BaseException], exctb: Optional[TracebackType]
) -> None:
    """Perform cleanup when exiting the scope of a 'with' statement."""