API Reference

All APIs are exported from polyglot_sql.

Core Functions

transpile

transpile(
    sql: str,
    read: str | None = None,
    write: str | None = None,
    *,
    identity: bool = True,
    error_level: str | None = None,
    pretty: bool = False,
) -> list[str]

Transpile SQL from one dialect to another.

import polyglot_sql

polyglot_sql.transpile(
    "SELECT IFNULL(a, b) FROM t",
    read="mysql",
    write="postgres",
)
# ["SELECT COALESCE(a, b) FROM t"]

parse

parse(
    sql: str,
    read: str | None = None,
    dialect: str | None = None,
    *,
    error_level: str | None = None,
) -> list[Expression]

Parse SQL into a list of typed Expression AST nodes.

stmts = polyglot_sql.parse("SELECT 1; SELECT 2", dialect="postgres")
len(stmts)  # 2
isinstance(stmts[0], polyglot_sql.Select)  # True

parse_one

parse_one(
    sql: str,
    read: str | None = None,
    dialect: str | None = None,
    *,
    into: Any | None = None,
    error_level: str | None = None,
) -> Expression

Parse a single SQL statement into an Expression AST node. Raises ParseError if the input contains zero or multiple statements.

ast = polyglot_sql.parse_one("SELECT a, b FROM t", dialect="postgres")
isinstance(ast, polyglot_sql.Select)  # True

generate

generate(
    ast: Expression | dict | list[Expression] | list[dict],
    dialect: str = "generic",
    *,
    pretty: bool = False,
) -> list[str]

Generate SQL strings from AST nodes. Accepts Expression objects or their dict equivalents (as returned by to_dict()).

ast = polyglot_sql.parse_one("SELECT 1 + 2")
polyglot_sql.generate(ast, dialect="mysql")
# ["SELECT 1 + 2"]

format_sql / format

format_sql(
    sql: str,
    dialect: str = "generic",
    *,
    max_input_bytes: int | None = None,
    max_tokens: int | None = None,
    max_ast_nodes: int | None = None,
    max_set_op_chain: int | None = None,
) -> str

Parse and pretty-print SQL. format is an alias for format_sql.

polyglot_sql.format_sql("SELECT a,b FROM t WHERE c>1", dialect="postgres")
# "SELECT\n  a,\n  b\nFROM t\nWHERE\n  c > 1"

validate

validate(sql: str, dialect: str = "generic") -> ValidationResult

Validate SQL syntax. Does not raise on invalid SQL — check result.valid instead.

result = polyglot_sql.validate("SELCT 1")
result.valid   # False
result.errors  # [ValidationErrorInfo(...)]

optimize

optimize(sql: str, dialect: str | None = None, *, read: str | None = None) -> str

Apply basic SQL optimizations (predicate simplification, etc.).

lineage / lineage_with_schema / source_tables

lineage(column: str, sql: str, dialect: str = "generic") -> dict
lineage_with_schema(column: str, sql: str, schema: dict, dialect: str = "generic") -> dict
source_tables(column: str, sql: str, dialect: str = "generic") -> list[str]

Column lineage analysis. source_tables returns a flat list of table names contributing to a column.

diff

diff(sql1: str, sql2: str, dialect: str = "generic") -> list[dict]

Compute a structural diff between two SQL statements.

dialects

dialects() -> list[str]

Returns the list of supported dialect names (e.g. ["athena", "bigquery", "clickhouse", ...]).
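
Because unknown dialect names raise ValueError, it can be useful to check a user-supplied name against dialects() before calling the other functions. The following is a minimal sketch: resolve_dialect is a hypothetical helper, not part of polyglot_sql, and it assumes dialect names are lowercase strings like those in the example list. The supported list is passed in so the snippet runs without the library installed.

```python
import difflib


def resolve_dialect(name: str, supported: list[str]) -> str:
    """Return the normalized dialect name, or raise ValueError with a suggestion."""
    # Assumption: names are lowercase, as in the sample output of dialects().
    candidate = name.strip().lower()
    if candidate in supported:
        return candidate
    close = difflib.get_close_matches(candidate, supported, n=1)
    hint = f" Did you mean {close[0]!r}?" if close else ""
    raise ValueError(f"Unknown dialect {name!r}.{hint}")


# In real use this would be: supported = polyglot_sql.dialects()
supported = ["athena", "bigquery", "clickhouse"]  # truncated sample for illustration
print(resolve_dialect(" BigQuery ", supported))  # bigquery
```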


Expression

All parsed SQL is represented as typed Expression subclasses. The base Expression class provides a rich API for inspection and traversal.

Creating Expressions

# Parse SQL into an AST
ast = polyglot_sql.parse_one("SELECT a AS x, b FROM t WHERE c > 1")
isinstance(ast, polyglot_sql.Select)  # True

Type Dispatch with isinstance

Every AST node is an instance of a specific subclass — Select, Column, Literal, Add, etc. — enabling idiomatic Python isinstance checks:

col = ast.find(polyglot_sql.Column)
isinstance(col, polyglot_sql.Column)     # True
isinstance(col, polyglot_sql.Expression) # True (every node class subclasses Expression)
type(col).__name__                       # "Column"

Core Identifiers

| Property | Type | Description |
| --- | --- | --- |
| kind | str | Snake-case variant name: "select", "column", "add", etc. |
| key | str | Alias for kind (sqlglot compatibility). |
| tree_depth | int | Maximum depth of the sub-tree (0 for leaves). |

SQL Generation

ast.sql()                          # "SELECT a AS x, b FROM t WHERE c > 1"
ast.sql("mysql")                   # MySQL-specific output
ast.sql("postgres", pretty=True)   # Formatted PostgreSQL output
str(ast)                           # Same as ast.sql()

Child Accessors

These properties provide fast, no-serialization access to child nodes:

| Property | Type | Description |
| --- | --- | --- |
| this | Expression \| None | Primary child: operand for unary ops, left for binary ops, aliased expr for Alias, predicate for Where/Having. |
| expression | Expression \| None | Secondary child: right operand for binary ops, second arg for binary functions. |
| expressions | list[Expression] | List children: columns in Select, args in Function, tables in From, etc. |
| args | dict | All fields as a dict (uses serialization). |

ast = polyglot_sql.parse_one("SELECT a, b, c FROM t")
ast.expressions                    # [Column(a), Column(b), Column(c)]

binop = polyglot_sql.parse_one("SELECT 1 + 2").find(polyglot_sql.Add)
binop.this                         # Literal(1)  — left operand
binop.expression                   # Literal(2)  — right operand

Name & Alias Properties

| Property | Type | Description |
| --- | --- | --- |
| name | str | Short name: column name, table name, function name, literal value, "*" for Star. |
| alias | str | Alias identifier if present (from Alias, Table, Subquery). |
| alias_or_name | str | Alias if non-empty, otherwise name. |
| output_name | str | Name this expression produces in a result set. |

ast = polyglot_sql.parse_one("SELECT a AS x FROM my_table")
alias_node = ast.find(polyglot_sql.Alias)
alias_node.name         # "a"  (delegates to aliased expression)
alias_node.alias        # "x"
alias_node.alias_or_name  # "x"
alias_node.output_name  # "x"

tbl = ast.find(polyglot_sql.Table)
tbl.name                # "my_table"

Type Predicates

| Property / Method | Type | Description |
| --- | --- | --- |
| is_string | bool | True if this is a string literal. |
| is_number | bool | True if this is a numeric literal (or negated). |
| is_int | bool | True if this is an integer literal (or negated). |
| is_star | bool | True if this is a * wildcard. |
| is_leaf() | bool | True if this node has no children. |

lit = polyglot_sql.parse_one("SELECT 'hello'").find(polyglot_sql.Literal)
lit.is_string  # True
lit.is_number  # False

num = polyglot_sql.parse_one("SELECT 42").find(polyglot_sql.Literal)
num.is_number  # True
num.is_int     # True

Comments

ast.comments  # list[str] — SQL comments attached to this node

Parent Tracking

Parent references are set lazily when you access children via .this, .expression, .expressions, or .children():

| Property / Method | Type | Description |
| --- | --- | --- |
| parent | Expression \| None | Parent node, or None for root. |
| depth | int | Number of hops to root (0 for root). |
| root() | Expression | Walk parent chain to root. |
| find_ancestor(*types) | Expression \| None | First ancestor matching any given type. |
| parent_select | Expression \| None | Shorthand for find_ancestor(Select). |

ast = polyglot_sql.parse_one("SELECT a FROM t")
col = ast.expressions[0]     # Column(a) — parent is set
col.parent.kind              # "select"
col.depth                    # 1
col.root().kind              # "select"
col.parent_select.kind       # "select"
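
The relationship between parent, depth, root() and find_ancestor() can be sketched with a stand-in class. This is an illustration of the documented behavior, not polyglot_sql's implementation: Node is hypothetical, ancestors are matched by a kind string rather than by class, and it assumes find_ancestor() never returns the node itself.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Node:
    """Stand-in for an AST node; only `kind` and `parent` are modeled."""
    kind: str
    parent: Node | None = None

    @property
    def depth(self) -> int:
        # Count hops along the parent chain; the root has depth 0.
        hops, node = 0, self
        while node.parent is not None:
            hops, node = hops + 1, node.parent
        return hops

    def root(self) -> Node:
        node = self
        while node.parent is not None:
            node = node.parent
        return node

    def find_ancestor(self, *kinds: str) -> Node | None:
        # Start at the parent, so the node itself is never a match (assumption).
        node = self.parent
        while node is not None:
            if node.kind in kinds:
                return node
            node = node.parent
        return None


select = Node("select")
where = Node("where", parent=select)
col = Node("column", parent=where)

print(col.depth)                         # 2
print(col.root().kind)                   # select
print(col.find_ancestor("select").kind)  # select
print(select.find_ancestor("select"))    # None
```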

Traversal

| Method | Returns | Description |
| --- | --- | --- |
| children() | list[Expression] | Immediate children (with parent refs). |
| walk(order="dfs") | list[Expression] | All nodes in DFS or BFS order (including self). |
| find(*types) | Expression \| None | First descendant matching any type (DFS, skips self). |
| find_all(*types) | list[Expression] | All descendants matching any type (DFS, skips self). |
| iter_expressions() | list[Expression] | Alias for children(). |

find() and find_all() accept class objects or strings:

# Using class objects (recommended)
ast.find(polyglot_sql.Column)
ast.find_all(polyglot_sql.Column, polyglot_sql.Literal)

# Using strings
ast.find("column")
ast.find_all("column", "literal")
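
The self-inclusion rules listed above (walk() includes self, while find() and find_all() skip it) can be illustrated with a small stand-in tree. This is a sketch, not polyglot_sql's implementation: Node and kids are hypothetical names, nodes are matched by a kind string rather than by class, and pre-order DFS is assumed because the reference only says "DFS".

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    """Stand-in for an AST node with a kind and a list of children."""
    kind: str
    kids: list["Node"] = field(default_factory=list)

    def walk(self, order: str = "dfs") -> list["Node"]:
        out, pending = [], deque([self])
        while pending:
            # BFS takes from the front (queue), DFS from the back (stack).
            node = pending.popleft() if order == "bfs" else pending.pop()
            out.append(node)
            # For DFS, push children reversed so the leftmost child is visited first.
            pending.extend(node.kids if order == "bfs" else reversed(node.kids))
        return out

    def find_all(self, *kinds: str) -> list["Node"]:
        # walk() returns self first, so slicing it off skips self.
        return [n for n in self.walk()[1:] if n.kind in kinds]

    def find(self, *kinds: str) -> "Node | None":
        matches = self.find_all(*kinds)
        return matches[0] if matches else None


tree = Node("select", [Node("add", [Node("column"), Node("literal")]), Node("column")])

print([n.kind for n in tree.walk()])       # ['select', 'add', 'column', 'literal', 'column']
print([n.kind for n in tree.walk("bfs")])  # ['select', 'add', 'column', 'column', 'literal']
print(tree.find("select"))                 # None, because the root itself is skipped
print(len(tree.find_all("column")))        # 2
```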

Unwrapping

| Method | Returns | Description |
| --- | --- | --- |
| unnest() | Expression | Recursively unwrap Paren(...) wrappers. |
| unalias() | Expression | Unwrap one Alias layer. |
| flatten() | list[Expression] | Flatten same-type chains, e.g. And(And(a,b),c) → [a, b, c]. |

# Flatten chained AND conditions
where = polyglot_sql.parse_one("SELECT * FROM t WHERE a AND b AND c")
and_node = where.find(polyglot_sql.And)
conditions = and_node.flatten()  # [Column(a), Column(b), Column(c)]
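
To make the flattening rule concrete, here is a minimal sketch using stand-in classes. Leaf and And below are hypothetical and only model the this/expression accessors described earlier; they are not polyglot_sql's classes. The sketch also assumes that operands of a different type are returned as-is rather than descended into.

```python
from dataclasses import dataclass


@dataclass
class Leaf:
    """Stand-in for a Column node."""
    name: str


@dataclass
class And:
    """Stand-in for a binary AND node."""
    this: object        # left operand
    expression: object  # right operand

    def flatten(self) -> list:
        out = []
        for side in (self.this, self.expression):
            # Recurse only into operands of the same type; anything else is kept whole.
            if type(side) is type(self):
                out.extend(side.flatten())
            else:
                out.append(side)
        return out


# a AND b AND c nests to the left: And(And(a, b), c)
chain = And(And(Leaf("a"), Leaf("b")), Leaf("c"))
print([leaf.name for leaf in chain.flatten()])  # ['a', 'b', 'c']
```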

Other Methods

| Method | Returns | Description |
| --- | --- | --- |
| to_dict() | dict | Full serialization to nested dict. |
| arg(name) | Any | Single field by name from serialized payload. |
| text(key) | str | Extract a field value as a plain string. |
| sql(dialect, pretty) | str | Generate SQL for this node. |

String Representations

str(ast)    # SQL string: "SELECT a FROM t"
repr(ast)   # Tree repr: "Select(expressions=[Column(this=Identifier(...))])"

Expression Subclasses

Every AST node type has a corresponding Python class that inherits from Expression. There are 919 subclasses covering all SQL constructs. Here are the most commonly used ones:

Query Structure

Select, Union, Intersect, Except, Subquery, Values, With, Cte

DML

Insert, Update, Delete, Merge

DDL

CreateTable, DropTable, AlterTable, CreateView, DropView, CreateIndex, DropIndex, CreateFunction, DropFunction

Clauses

From, Join, Where, GroupBy, Having, OrderBy, Limit, Offset, Qualify, Window, Over

Expressions

Column, Table, Identifier, Literal, Star, Alias, Cast, Case, Paren, DataType, Interval, Boolean, Null

Operators

And, Or, Not, Add, Sub, Mul, Div, Eq, Neq, Lt, Lte, Gt, Gte, Like, ILike, In, Between, IsNull, Exists, Concat

Functions

Function, AggregateFunction, WindowFunction, Count, Sum, Avg, Min, Max, Coalesce, Upper, Lower, Substring, Cast, TryCast, SafeCast

Window Functions

RowNumber, Rank, DenseRank, Lead, Lag, FirstValue, LastValue, NthValue, PercentRank, CumeDist

All subclasses inherit every property and method from Expression.


Errors

| Exception | Description |
| --- | --- |
| PolyglotError | Base exception. |
| ParseError | SQL parsing failed. |
| GenerateError | SQL generation from AST failed. |
| TranspileError | SQL transpilation failed. |
| ValidationError | Fatal validation error. |

Unknown dialect names raise Python ValueError.
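
One way to structure error handling around these exceptions is to catch the most specific class first, then the base class, and treat ValueError separately. The sketch below uses stand-in classes so it runs without the library. It assumes the specific exceptions subclass PolyglotError, which "Base exception" suggests but this reference does not spell out; classify and fail are hypothetical helpers.

```python
class PolyglotError(Exception):
    """Stand-in for polyglot_sql.PolyglotError."""


class ParseError(PolyglotError):
    """Stand-in; assumed to subclass PolyglotError."""


class TranspileError(PolyglotError):
    """Stand-in; assumed to subclass PolyglotError."""


def classify(action) -> str:
    """Run a zero-argument callable and describe how it failed, if it did."""
    try:
        action()
    except ParseError as exc:       # most specific handler first
        return f"parse error: {exc}"
    except PolyglotError as exc:    # any other library error
        return f"library error: {exc}"
    except ValueError as exc:       # e.g. an unknown dialect name
        return f"bad argument: {exc}"
    return "ok"


def fail(exc: Exception):
    raise exc


print(classify(lambda: fail(ParseError("unexpected token"))))  # parse error: unexpected token
print(classify(lambda: fail(TranspileError("unsupported"))))   # library error: unsupported
print(classify(lambda: fail(ValueError("unknown dialect"))))   # bad argument: unknown dialect
print(classify(lambda: None))                                  # ok
```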

Validation Result Types

validate(...) returns ValidationResult:

  • valid: bool - True when the SQL is syntactically valid
  • errors: list[ValidationErrorInfo] - list of findings (may be empty when valid)
  • bool(result) - allows if validate(...): usage

ValidationErrorInfo fields:

  • message: str - human-readable description
  • line: int - 1-based line number
  • col: int - 1-based column number
  • code: str - machine-readable error code
  • severity: str - "error" or "warning"
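
The fields above are enough to build compiler-style diagnostics. The sketch below uses stand-in dataclasses that mirror the documented fields so it runs without the library; the report helper, the "E001" code, and the message text are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ValidationErrorInfo:
    """Stand-in with the documented fields."""
    message: str
    line: int
    col: int
    code: str
    severity: str


@dataclass
class ValidationResult:
    """Stand-in: truthy exactly when `valid` is True."""
    valid: bool
    errors: list[ValidationErrorInfo] = field(default_factory=list)

    def __bool__(self) -> bool:
        return self.valid


def report(result) -> list[str]:
    """Format every finding as 'line:col: severity [code] message'."""
    return [f"{e.line}:{e.col}: {e.severity} [{e.code}] {e.message}" for e in result.errors]


# In real use this would be: result = polyglot_sql.validate("SELCT 1")
result = ValidationResult(
    valid=False,
    errors=[ValidationErrorInfo("unexpected token", 1, 1, "E001", "error")],
)

if not result:
    for line in report(result):
        print(line)  # 1:1: error [E001] unexpected token
```

Because errors may also carry warnings, report formats every entry rather than returning early when the result is valid.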