Row Filter Syntax¶
In addition to the primary Expression DSL, PyIceberg provides a string-based statement interface for filtering rows in Iceberg tables. This guide explains the syntax and provides examples for supported operations.
The row filter syntax is designed to be similar to SQL WHERE clauses. Here are the basic components:
Column References¶
Columns can be referenced using either unquoted or quoted identifiers:
Literals¶
The following literal types are supported:
- Strings:
'hello world'
- Numbers:
42
,-42
,3.14
- Booleans:
true
,false
(case insensitive)
Comparison Operations¶
Basic Comparisons¶
Note
The ==
operator is an alias for =
and <>
is an alias for !=
String Comparisons¶
NULL Checks¶
Check for NULL values using the IS NULL
and IS NOT NULL
operators:
NaN Checks¶
For floating-point columns, you can check for NaN values:
IN and NOT IN¶
Check if a value is in a set of values:
LIKE Operations¶
The LIKE operator supports pattern matching with a wildcard %
at the end of the string:
Important
The %
wildcard is only supported at the end of the pattern. Using it in the middle or beginning of the pattern will raise an error.
BETWEEN¶
The BETWEEN operator filters a numeric value against an inclusive range, e.g. a between 1 and 2
is equivalent to a >= 1 and a <= 2
.
Logical Operations¶
Combine multiple conditions using logical operators:
Tip
Parentheses can be used to group logical operations for clarity:
Complete Examples¶
Here are some complete examples showing how to combine different operations:
-- Complex filter with multiple conditions
status = 'active' AND age > 18 AND NOT (country IN ('US', 'CA'))
-- Filter with string pattern matching
name LIKE 'John%' AND age >= 21
-- Filter with NULL checks and numeric comparisons
price IS NOT NULL AND price > 100 AND quantity > 0
-- Filter with multiple logical operations
(status = 'pending' OR status = 'processing') AND NOT (priority = 'low')
Common Pitfalls¶
- String Quoting: Always use single quotes for string literals. Double quotes are reserved for column identifiers.
- Wildcard Usage: The
%
wildcard in LIKE patterns can only appear at the end.
- Case Sensitivity: Boolean literals (
true
/false
) are case insensitive, but string comparisons are case sensitive.
-- All valid
is_active = true
is_active = TRUE
is_active = True
-- Case sensitive
status = 'Active' -- Will not match 'active'
Best Practices¶
- For complex use cases, use the primary Expression DSL
- When using multiple conditions, consider the order of operations (NOT > AND > OR)
- For string comparisons, be consistent with case usage