reader
Classes for building the Reader tree.
Constructing a reader tree from the schema makes it easy to decouple the reader implementation from the schema.
The reader tree can be changed so that data is read with a schema that differs from the one it was written with, while respecting the read schema.
BinaryReader
Bases: Reader
Read a binary value.
First reads an integer to get the length of the binary value, then reads the binary value itself.
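The length-prefixed layout can be illustrated with a minimal sketch. It assumes the length is stored as an Avro zigzag varint and reads from a plain `io.BytesIO` rather than the library's decoder; `read_zigzag_long` and `read_binary` are hypothetical names used only for this illustration.

```python
import io


def read_zigzag_long(stream: io.BytesIO) -> int:
    """Decode one Avro int/long (variable-length, zigzag-encoded)."""
    result = 0
    shift = 0
    while True:
        chunk = stream.read(1)
        if not chunk:
            raise EOFError("stream ended inside a varint")
        byte = chunk[0]
        result |= (byte & 0x7F) << shift  # low 7 bits carry the payload
        if not byte & 0x80:  # high bit clear marks the last byte
            break
        shift += 7
    return (result >> 1) ^ -(result & 1)  # undo the zigzag mapping


def read_binary(stream: io.BytesIO) -> bytes:
    """Read the length prefix, then that many raw bytes."""
    length = read_zigzag_long(stream)
    return stream.read(length)


# Length 3 is zigzag-encoded as 0x06, followed by the three payload bytes.
payload = read_binary(io.BytesIO(b"\x06foo"))
```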
Source code in pyiceberg/avro/reader.py
DateReader
DecimalReader
dataclass
Bases: Reader
Reads a value as a decimal.
The decimal's bytes are decoded as a signed short, int, or long, depending on the number of bytes.
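As a rough illustration of turning such bytes into a decimal, the sketch below assumes the unscaled value is stored as a big-endian two's-complement integer and that the scale comes from the schema. It uses `int.from_bytes` instead of fixed-width short/int/long decoding, and `bytes_to_decimal` is a hypothetical helper, not the library's function.

```python
from decimal import Decimal


def bytes_to_decimal(raw: bytes, scale: int) -> Decimal:
    """Interpret raw as a signed big-endian integer and apply the scale."""
    unscaled = int.from_bytes(raw, byteorder="big", signed=True)
    sign = 1 if unscaled < 0 else 0
    digits = tuple(int(d) for d in str(abs(unscaled)))
    # Building from a (sign, digits, exponent) tuple is exact and does not
    # depend on the precision of the current decimal context.
    return Decimal((sign, digits, -scale))


price = bytes_to_decimal(b"\x30\x39", scale=2)  # 0x3039 is 12345
```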
Source code in pyiceberg/avro/reader.py
FixedReader
dataclass
Bases: Reader
Source code in pyiceberg/avro/reader.py
__len__()
IntegerReader
Bases: Reader
Longs and ints are encoded the same way, and there is no long in Python.
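To make the shared encoding concrete, here is a small round-trip sketch of Avro's zigzag varint format. `zigzag_encode` and `zigzag_decode` are hypothetical helper names, and the encoder assumes values within the 64-bit signed range. Values that fit in 32 bits and values that need 64 bits go through the same code path and come back as plain Python `int` objects.

```python
def zigzag_encode(value: int) -> bytes:
    """Encode a signed integer (64-bit range assumed) as a zigzag varint."""
    zigzag = (value << 1) ^ (value >> 63)  # maps the sign into the low bit
    out = bytearray()
    while zigzag > 0x7F:
        out.append((zigzag & 0x7F) | 0x80)  # set the continuation bit
        zigzag >>= 7
    out.append(zigzag)
    return bytes(out)


def zigzag_decode(data: bytes) -> int:
    """Decode a single zigzag varint from the start of data."""
    result = 0
    for index, byte in enumerate(data):
        result |= (byte & 0x7F) << (7 * index)
        if not byte & 0x80:  # last byte of this varint
            break
    return (result >> 1) ^ -(result & 1)


small = zigzag_decode(zigzag_encode(64))
large = zigzag_decode(zigzag_encode(2**40))
```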
Source code in pyiceberg/avro/reader.py
ListReader
dataclass
Bases: Reader
Source code in pyiceberg/avro/reader.py
MapReader
dataclass
Bases: Reader
Source code in pyiceberg/avro/reader.py
__hash__()
_read_int_int(decoder)
Read a mapping from int to int from the decoder.
Because a map of ints to ints is such a common data type, this reader is optimized to be faster than the generic map reader by using a lazy dict.
Creating the Python dictionary takes much longer than reading the data from the decoder as an array, so the lazy dict defers creating the dictionary until it is actually accessed.
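The deferral idea can be sketched as follows. This is not the library's lazy dict; `LazyIntMap` is a hypothetical class that assumes the decoded data arrives as one flat sequence of alternating keys and values.

```python
from collections.abc import Mapping


class LazyIntMap(Mapping):
    """Hold decoded ints as a flat sequence; build the dict on first use."""

    def __init__(self, flat):
        self._flat = flat  # assumed layout: [k0, v0, k1, v1, ...]
        self._dict = None  # nothing is materialized yet

    def _build(self):
        if self._dict is None:
            # Pair every even-indexed key with the value that follows it.
            self._dict = dict(zip(self._flat[::2], self._flat[1::2]))
        return self._dict

    def __getitem__(self, key):
        return self._build()[key]

    def __iter__(self):
        return iter(self._build())

    def __len__(self):
        return len(self._build())


lazy = LazyIntMap([1, 10, 2, 20])
built_before_access = lazy._dict is not None  # still False at this point
value = lazy[2]  # first access triggers the dictionary construction
```

If a caller never touches the map, for example because the column is read but not used, the cost of building the dictionary is never paid.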
Source code in pyiceberg/avro/reader.py
StructReader
Bases: Reader
Source code in pyiceberg/avro/reader.py
TimeReader
Bases: IntegerReader
Reads a microsecond-granularity time from the stream.
The long is decoded as a Python integer which represents the number of microseconds from midnight.
Source code in pyiceberg/avro/reader.py
TimestampReader
Bases: IntegerReader
Reads a microsecond-granularity timestamp from the stream.
The long is decoded as a Python integer which represents the number of microseconds from the Unix epoch, 1 January 1970.
Source code in pyiceberg/avro/reader.py
TimestamptzReader
Bases: IntegerReader
Reads a microsecond-granularity timestamptz from the stream.
The long is decoded as a Python integer which represents the number of microseconds from the Unix epoch, 1 January 1970.
Adjusted to UTC.
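The timestamp readers yield integers. As an illustration of how a consumer might interpret those values, the sketch below converts microseconds since the epoch into `datetime` objects. `micros_to_timestamp` and `micros_to_timestamptz` are hypothetical helpers written for this example, not functions taken from this module.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1)
EPOCH_UTC = datetime(1970, 1, 1, tzinfo=timezone.utc)


def micros_to_timestamp(micros: int) -> datetime:
    """Naive datetime for a timestamp without time zone."""
    return EPOCH + timedelta(microseconds=micros)


def micros_to_timestamptz(micros: int) -> datetime:
    """Timezone-aware datetime in UTC for a timestamptz."""
    return EPOCH_UTC + timedelta(microseconds=micros)


one_second = micros_to_timestamp(1_000_000)
one_day_utc = micros_to_timestamptz(86_400_000_000)
```

Adding a `timedelta` to a fixed epoch keeps the arithmetic in integers, so microsecond precision is not lost to floating-point rounding.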
Source code in pyiceberg/avro/reader.py
_skip_map_array(decoder, skip_entry)
Skips over an array or map.
Arrays and maps are encoded similarly, so the skipping logic can be reused efficiently for both.
From the Avro spec:
Maps (and arrays) are encoded as a series of blocks. Each block consists of a long count value, followed by that many key/value pairs in the case of a map, and followed by that many array items in the case of an array. A block with count zero indicates the end of the map. Each item is encoded per the map's value schema.
If a block's count is negative, its absolute value is used, and the count is followed immediately by a long block size indicating the number of bytes in the block. This block size permits fast skipping through data, e.g., when projecting a record to a subset of its fields.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
decoder | BinaryDecoder | The decoder that reads the types from the underlying data. | required |
skip_entry | Callable[[], None] | Function to skip over the underlying data: an element in the case of an array, or a key/value pair in the case of a map. | required |
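The block structure quoted from the Avro spec can be walked with a loop like the one below. This is a sketch over a plain `io.BytesIO`, not the library's decoder; `read_long` and `skip_blocks` are hypothetical names, and `skip_entry` follows the `Callable[[], None]` shape from the parameter table.

```python
import io
from typing import Callable


def read_long(stream: io.BytesIO) -> int:
    """Decode one Avro zigzag varint."""
    result = 0
    shift = 0
    while True:
        byte = stream.read(1)[0]
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            break
        shift += 7
    return (result >> 1) ^ -(result & 1)


def skip_blocks(stream: io.BytesIO, skip_entry: Callable[[], None]) -> None:
    """Skip every block of an array or map, up to and including the end marker."""
    count = read_long(stream)
    while count != 0:
        if count < 0:
            # Negative count: the byte size follows, so jump over the block.
            size = read_long(stream)
            stream.seek(size, io.SEEK_CUR)
        else:
            # No byte size available: skip the entries one at a time.
            for _ in range(count):
                skip_entry()
        count = read_long(stream)


# Array [1, 2] as one block of count 2, an end marker, then one trailing byte.
plain = io.BytesIO(b"\x04\x02\x04\x00\x2a")
skip_blocks(plain, lambda: read_long(plain))
remaining_plain = plain.read()

# The same array with count -2 (0x03) followed by a block size of 2 bytes (0x04).
calls = []
sized = io.BytesIO(b"\x03\x04\x02\x04\x00\x2a")
skip_blocks(sized, lambda: calls.append(1))
remaining_sized = sized.read()
```

In the second case `skip_entry` is never called, which is the fast path the block size is meant to enable.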