metadata
TableMetadataCommonFields
Bases: IcebergBaseModel
Metadata for an Iceberg table as specified in the Apache Iceberg spec.
https://iceberg.apache.org/spec/#iceberg-table-spec
Source code in pyiceberg/table/metadata.py
current_schema_id: int = Field(alias='current-schema-id', default=DEFAULT_SCHEMA_ID)
class-attribute
instance-attribute
ID of the table’s current schema.
current_snapshot_id: Optional[int] = Field(alias='current-snapshot-id', default=None)
class-attribute
instance-attribute
ID of the current table snapshot.
default_sort_order_id: int = Field(alias='default-sort-order-id', default=UNSORTED_SORT_ORDER_ID)
class-attribute
instance-attribute
Default sort order id of the table. Note that this could be used by writers, but is not used when reading because reads use the specs stored in manifest files.
default_spec_id: int = Field(alias='default-spec-id', default=INITIAL_SPEC_ID)
class-attribute
instance-attribute
ID of the “current” spec that writers should use by default.
last_column_id: int = Field(alias='last-column-id')
class-attribute
instance-attribute
An integer; the highest assigned column ID for the table. This is used to ensure fields are always assigned an unused ID when evolving schemas.
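A minimal sketch of how a writer might use this counter when adding columns, assuming fresh IDs are handed out by incrementing from `last_column_id`. The function name `assign_new_column_ids` is hypothetical and is not part of pyiceberg.

```python
from typing import Dict, Iterable, Tuple


def assign_new_column_ids(
    last_column_id: int, new_names: Iterable[str]
) -> Tuple[Dict[str, int], int]:
    """Hypothetical helper: give each new column the next unused ID."""
    assigned: Dict[str, int] = {}
    for name in new_names:
        last_column_id += 1  # never reuse an ID at or below the previous maximum
        assigned[name] = last_column_id
    # The second value is what would be stored back as last-column-id.
    return assigned, last_column_id


ids, new_last = assign_new_column_ids(3, ["city", "zip"])
```

Because the counter only ever grows, an ID that belonged to a dropped column is never handed out again.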
last_partition_id: Optional[int] = Field(alias='last-partition-id', default=None)
class-attribute
instance-attribute
An integer; the highest assigned partition field ID across all partition specs for the table. This is used to ensure partition fields are always assigned an unused ID when evolving specs.
last_updated_ms: int = Field(alias='last-updated-ms', default_factory=lambda: datetime_to_millis(datetime.datetime.now().astimezone()))
class-attribute
instance-attribute
Timestamp in milliseconds from the unix epoch when the table was last updated. Each table metadata file should update this field just before writing.
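To illustrate the default factory shown in the signature, the sketch below converts a timezone-aware datetime to milliseconds since the Unix epoch. The helper `to_epoch_millis` is a hypothetical stand-in for `datetime_to_millis`, whose actual implementation is not shown on this page.

```python
import datetime

EPOCH = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)


def to_epoch_millis(dt: datetime.datetime) -> int:
    """Hypothetical stand-in: whole milliseconds between the Unix epoch and dt."""
    if dt.tzinfo is None:
        # Assumption for this sketch: naive datetimes are interpreted as local time.
        dt = dt.astimezone()
    delta = dt - EPOCH
    return (delta.days * 86_400 + delta.seconds) * 1_000 + delta.microseconds // 1_000


# Mirrors the shape of the default factory: "now", made timezone-aware.
last_updated_ms = to_epoch_millis(datetime.datetime.now().astimezone())
```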
location: str = Field()
class-attribute
instance-attribute
The table’s base location. This is used by writers to determine where to store data files, manifest files, and table metadata files.
metadata_log: List[MetadataLogEntry] = Field(alias='metadata-log', default_factory=list)
class-attribute
instance-attribute
A list (optional) of timestamp and metadata file location pairs that encodes changes to the previous metadata files for the table. Each time a new metadata file is created, a new entry of the previous metadata file location should be added to the list. Tables can be configured to remove oldest metadata log entries and keep a fixed-size log of the most recent entries after a commit.
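The append-then-trim behaviour described above can be sketched as follows. `LogEntrySketch`, `append_metadata_log`, and the `max_entries` parameter are hypothetical names used only for illustration; how a table is actually configured to cap the log is not covered on this page.

```python
from typing import List, NamedTuple


class LogEntrySketch(NamedTuple):
    """Hypothetical stand-in for MetadataLogEntry."""

    timestamp_ms: int
    metadata_file: str


def append_metadata_log(
    log: List[LogEntrySketch], previous_file: str, timestamp_ms: int, max_entries: int
) -> List[LogEntrySketch]:
    """Record the previous metadata file, then keep only the newest entries."""
    if max_entries < 1:
        raise ValueError("max_entries must be at least 1")
    updated = [*log, LogEntrySketch(timestamp_ms, previous_file)]
    return updated[-max_entries:]  # the oldest entries fall off the front


log: List[LogEntrySketch] = []
for version in range(5):
    log = append_metadata_log(log, f"v{version}.metadata.json", version, max_entries=3)
```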
partition_specs: List[PartitionSpec] = Field(alias='partition-specs', default_factory=list)
class-attribute
instance-attribute
A list of partition specs, stored as full partition spec objects.
properties: Dict[str, str] = Field(default_factory=dict)
class-attribute
instance-attribute
A string to string map of table properties. This is used to control settings that affect reading and writing and is not intended to be used for arbitrary metadata. For example, commit.retry.num-retries is used to control the number of commit retries.
refs: Dict[str, SnapshotRef] = Field(default_factory=dict)
class-attribute
instance-attribute
A map of snapshot references. The map keys are the unique snapshot reference names in the table, and the map values are snapshot reference objects. There is always a main branch reference pointing to the current-snapshot-id even if the refs map is null.
schemas: List[Schema] = Field(default_factory=list)
class-attribute
instance-attribute
A list of schemas, stored as objects with schema-id.
snapshot_log: List[SnapshotLogEntry] = Field(alias='snapshot-log', default_factory=list)
class-attribute
instance-attribute
A list (optional) of timestamp and snapshot ID pairs that encodes changes to the current snapshot for the table. Each time the current-snapshot-id is changed, a new entry should be added with the last-updated-ms and the new current-snapshot-id. When snapshots are expired from the list of valid snapshots, all entries before a snapshot that has expired should be removed.
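One way to read the expiry rule above: find the most recent log entry that refers to an expired snapshot and drop it together with everything before it. The sketch below models log entries as plain `(timestamp_ms, snapshot_id)` tuples; the function name is hypothetical and this is not pyiceberg's implementation.

```python
from typing import List, Set, Tuple


def prune_snapshot_log(
    log: List[Tuple[int, int]], valid_snapshot_ids: Set[int]
) -> List[Tuple[int, int]]:
    """Drop every entry up to and including the last one whose snapshot has expired."""
    last_expired = -1
    for index, (_timestamp_ms, snapshot_id) in enumerate(log):
        if snapshot_id not in valid_snapshot_ids:
            last_expired = index
    return log[last_expired + 1:]


history = [(1, 10), (2, 11), (3, 12), (4, 13)]
pruned = prune_snapshot_log(history, valid_snapshot_ids={10, 12, 13})
```

Here snapshot 11 has expired, so the entry for snapshot 10 is removed as well, even though snapshot 10 itself is still valid.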
snapshots: List[Snapshot] = Field(default_factory=list)
class-attribute
instance-attribute
A list of valid snapshots. Valid snapshots are snapshots for which all data files exist in the file system. A data file must not be deleted from the file system until the last snapshot in which it was listed is garbage collected.
sort_orders: List[SortOrder] = Field(alias='sort-orders', default_factory=list)
class-attribute
instance-attribute
A list of sort orders, stored as full sort order objects.
table_uuid: uuid.UUID = Field(alias='table-uuid', default_factory=uuid.uuid4)
class-attribute
instance-attribute
A UUID that identifies the table, generated when the table is created. Implementations must throw an exception if a table’s UUID does not match the expected UUID after refreshing metadata.
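The UUID check described above amounts to a simple comparison after a refresh. A minimal sketch, assuming the caller holds the expected UUID from the previously loaded metadata; `check_table_uuid` is a hypothetical name and the exception type is an assumption.

```python
import uuid


def check_table_uuid(expected: uuid.UUID, refreshed: uuid.UUID) -> None:
    """Hypothetical check: fail loudly if refreshed metadata belongs to another table."""
    if expected != refreshed:
        raise ValueError(f"Table UUID does not match: expected {expected}, got {refreshed}")
```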
current_snapshot()
Get the current snapshot for this table, or None if there is no current snapshot.
name_mapping()
Return the table's field-id NameMapping.
new_snapshot_id()
Generate a new snapshot-id that's not in use.
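A sketch of one way to generate an unused snapshot ID: draw random non-negative 63-bit integers until one is not already taken. The random-draw strategy and the helper name `pick_unused_snapshot_id` are assumptions for illustration; the method's actual source is not shown on this page.

```python
import random
from typing import Iterable


def pick_unused_snapshot_id(existing_ids: Iterable[int]) -> int:
    """Hypothetical helper: return a random 63-bit ID that is not in existing_ids."""
    used = set(existing_ids)
    while True:
        candidate = random.getrandbits(63)  # fits in a signed 64-bit long
        if candidate not in used:
            return candidate


existing = {1, 2, 3}
fresh_id = pick_unused_snapshot_id(existing)
```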
schema()
schema_by_id(schema_id)
snapshot_by_id(snapshot_id)
snapshot_by_name(name)
Return the snapshot referenced by the given name, or None if no such reference exists.
sort_order_by_id(sort_order_id)
Get the sort order by sort_order_id.
spec()
specs()
specs_struct()
Produce a struct of all the combined PartitionSpecs.
The partition fields should be optional: partition fields may be added later, in which case not all files will contain the resulting field, and its value may be null.
Returns: A StructType that represents all the combined PartitionSpecs of the table.
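The idea of combining every spec into one struct of optional fields can be sketched with plain dataclasses. `PartitionFieldSketch`, `StructFieldSketch`, and `combine_specs` are hypothetical stand-ins rather than pyiceberg types, and deduplicating by field ID is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class PartitionFieldSketch:
    field_id: int
    name: str
    result_type: str


@dataclass(frozen=True)
class StructFieldSketch:
    field_id: int
    name: str
    field_type: str
    required: bool


def combine_specs(specs: Iterable[List[PartitionFieldSketch]]) -> List[StructFieldSketch]:
    """Union the partition fields of all specs; every resulting field is optional."""
    combined: Dict[int, StructFieldSketch] = {}
    for spec in specs:
        for field in spec:
            # The first occurrence of a field ID wins; later specs may repeat it.
            combined.setdefault(
                field.field_id,
                StructFieldSketch(field.field_id, field.name, field.result_type, required=False),
            )
    return [combined[field_id] for field_id in sorted(combined)]


spec_0 = [PartitionFieldSketch(1000, "event_day", "date")]
spec_1 = [
    PartitionFieldSketch(1000, "event_day", "date"),
    PartitionFieldSketch(1001, "user_bucket", "int"),
]
struct_fields = combine_specs([spec_0, spec_1])
```

Files written under `spec_0` have no `user_bucket` value, which is why no field in the combined struct can be required.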
TableMetadataUtil
Helper class for parsing TableMetadata.
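One thing a parsing helper of this kind has to do is decide which metadata version a document uses. The sketch below, using only the standard library, reads `format-version` from a JSON document and rejects unsupported versions. The function name and the tuple of supported versions are assumptions, not pyiceberg's implementation.

```python
import json

SUPPORTED_FORMAT_VERSIONS = (1, 2)  # assumption for this sketch


def detect_format_version(raw_json: str) -> int:
    """Hypothetical helper: return the format version or raise if it is unusable."""
    data = json.loads(raw_json)
    if "format-version" not in data:
        raise ValueError("Missing format-version in table metadata")
    version = data["format-version"]
    if version not in SUPPORTED_FORMAT_VERSIONS:
        raise ValueError(f"Unsupported format-version: {version}")
    return version
```

A caller could then use the returned version to choose between the `TableMetadataV1` and `TableMetadataV2` models.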
TableMetadataV1
Bases: TableMetadataCommonFields, IcebergBaseModel
Represents version 1 of the Table Metadata.
More information about the specification: https://iceberg.apache.org/spec/#version-1-analytic-data-tables
format_version: Literal[1] = Field(alias='format-version', default=1)
class-attribute
instance-attribute
An integer version number for the format. Currently, this can be 1 or 2 based on the spec. Implementations must throw an exception if a table’s version is higher than the supported version.
partition_spec: List[Dict[str, Any]] = Field(alias='partition-spec', default_factory=list)
class-attribute
instance-attribute
The table’s current partition spec, stored as only fields. Note that this is used by writers to partition data, but is not used when reading because reads use the specs stored in manifest files. (Deprecated: use partition-specs and default-spec-id instead).
schema_: Schema = Field(alias='schema')
class-attribute
instance-attribute
The table’s current schema. (Deprecated: use schemas and current-schema-id instead).
construct_partition_specs(data)
Convert the partition_spec into partition_specs.
For V1, partition_specs is optional; if it isn't set, this validator sets it. This way we can always use partition_specs when reading table metadata, without having to worry about whether the format is v1 or v2.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | Dict[str, Any] | The raw data after validation, meaning that the aliases are applied. | required |
Returns:
Type | Description |
---|---|
Dict[str, Any] | The TableMetadata with the partition_specs set, if not provided. |
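A rough sketch of what such a validator could do on the raw dictionary: when `partition_specs` is absent, wrap the deprecated `partition_spec` field list in a single spec. The function name, the dictionary key layout, and the value of `INITIAL_SPEC_ID` are assumptions for illustration only.

```python
from typing import Any, Dict

INITIAL_SPEC_ID = 0  # assumed value for this sketch


def fill_partition_specs(data: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical validator body: derive partition_specs from partition_spec."""
    if data.get("partition_specs"):
        return data  # already provided, nothing to do
    fields = data.get("partition_spec") or []
    filled = dict(data)  # shallow copy so the caller's dict is left untouched
    filled["partition_specs"] = [{"spec-id": INITIAL_SPEC_ID, "fields": fields}]
    return filled


v1_data = {"partition_spec": [{"name": "event_day", "transform": "day", "source-id": 1}]}
normalized = fill_partition_specs(v1_data)
```

The `construct_schemas` and `set_sort_orders` validators described next follow the same fill-if-missing pattern for their respective fields.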
construct_schemas(data)
Convert the schema into schemas.
For V1, schemas is optional; if it isn't set, this validator sets it. This way we can always use schemas when reading table metadata, without having to worry about whether the format is v1 or v2.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | Dict[str, Any] | The raw data after validation, meaning that the aliases are applied. | required |
Returns:
Type | Description |
---|---|
Dict[str, Any] | The TableMetadata with the schemas set, if not provided. |
set_sort_orders(data)
Set the sort_orders if not provided.
For V1, sort_orders is optional; if it isn't set, this validator sets it.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | Dict[str, Any] | The raw data after validation, meaning that the aliases are applied. | required |
Returns:
Type | Description |
---|---|
Dict[str, Any] | The TableMetadata with the sort_orders set, if not provided. |
set_v2_compatible_defaults(data)
Set default values to be compatible with the format v2.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | Dict[str, Any] | The raw arguments when initializing a V1 TableMetadata. | required |
Returns:
Type | Description |
---|---|
Dict[str, Any] | The TableMetadata with the defaults applied. |
TableMetadataV2
Bases: TableMetadataCommonFields, IcebergBaseModel
Represents version 2 of the Table Metadata.
This extends Version 1 with row-level deletes, and adds some additional information to the schema, such as all the historical schemas, partition-specs, sort-orders.
For more information: https://iceberg.apache.org/spec/#version-2-row-level-deletes
format_version: Literal[2] = Field(alias='format-version', default=2)
class-attribute
instance-attribute
An integer version number for the format. Currently, this can be 1 or 2 based on the spec. Implementations must throw an exception if a table’s version is higher than the supported version.
last_sequence_number: int = Field(alias='last-sequence-number', default=INITIAL_SEQUENCE_NUMBER)
class-attribute
instance-attribute
The table’s highest assigned sequence number, a monotonically increasing long that tracks the order of snapshots in a table.
check_partition_specs(table_metadata)
Check if the default-spec-id is present in partition-specs.
check_schemas(table_metadata)
Check if the current-schema-id is actually present in schemas.
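A consistency check of this kind can be sketched as a membership test over the schema IDs. Schemas are modelled here as plain dictionaries with a `schema-id` key; the function name and the exception type are assumptions, not pyiceberg's implementation.

```python
from typing import Any, Dict, List


def check_current_schema_present(current_schema_id: int, schemas: List[Dict[str, Any]]) -> None:
    """Hypothetical check: the current schema ID must refer to a known schema."""
    known_ids = {schema["schema-id"] for schema in schemas}
    if current_schema_id not in known_ids:
        raise ValueError(
            f"current-schema-id {current_schema_id} can't be found in the schemas"
        )
```

The partition spec and sort order checks in this class follow the same pattern against `partition-specs` and `sort-orders`.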
check_sort_orders(table_metadata)
Check if the default-sort-order-id is present in sort-orders.
cleanup_snapshot_id(data)
Run before validation.
construct_refs(table_metadata)
Set the main branch if missing.
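A minimal sketch of filling in the main branch, assuming refs are modelled as a plain dictionary and that a branch reference only needs a snapshot ID and a type. The function name and the shape of the reference dictionary are hypothetical.

```python
from typing import Any, Dict, Optional

MAIN_BRANCH = "main"


def ensure_main_ref(
    current_snapshot_id: Optional[int], refs: Dict[str, Dict[str, Any]]
) -> Dict[str, Dict[str, Any]]:
    """Hypothetical helper: add a main branch pointing at the current snapshot."""
    updated = dict(refs)  # leave the caller's mapping untouched
    if current_snapshot_id is not None and MAIN_BRANCH not in updated:
        updated[MAIN_BRANCH] = {"snapshot-id": current_snapshot_id, "type": "branch"}
    return updated


refs_with_main = ensure_main_ref(42, {})
```

An existing `main` entry is left as it is, and a table without a current snapshot gets no `main` reference in this sketch.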