The Snowpark library provides intuitive APIs for querying and processing data in a data pipeline. Using this library, you can build applications that process data in Snowflake without having to move data to the system where your application code runs.
Source code | Snowpark Python developer guide | Snowpark Python API reference | Snowpark pandas developer guide | Snowpark pandas API reference | Product documentation | Samples
If you don't have a Snowflake account yet, you can sign up for a 30-day free trial account.
You can use miniconda, anaconda, or virtualenv to create a Python 3.9, 3.10, 3.11, or 3.12 virtual environment.
For Snowpark pandas, only Python 3.9, 3.10, or 3.11 is supported.
To have the best experience when using Snowpark Python with UDFs, creating a local conda environment with the Snowflake channel is recommended.
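For example, a conda environment targeting the Snowflake Anaconda channel can be created along these lines (the environment name and Python version are illustrative):
conda create --name snowpark-env -c https://repo.anaconda.com/pkgs/snowflake python=3.11
conda activate snowpark-env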
pip install snowflake-snowpark-python
The Snowpark pandas API provides a familiar interface for pandas users to query and process data directly in Snowflake. To use it, optionally install the [modin] extra, which installs modin in the same environment:
pip install "snowflake-snowpark-python[modin]"
Create a session and use the Snowpark Python API
from snowflake.snowpark import Session
connection_parameters = {
    "account": "<your snowflake account>",
    "user": "<your snowflake user>",
    "password": "<your snowflake password>",
    "role": "<snowflake user role>",
    "warehouse": "<snowflake warehouse>",
    "database": "<snowflake database>",
    "schema": "<snowflake schema>",
}
session = Session.builder.configs(connection_parameters).create()
# Create a Snowpark dataframe from input data
df = session.create_dataframe([[1, 2], [3, 4]], schema=["a", "b"])
df = df.filter(df.a > 1)
result = df.collect()
df.show()
# -------------
# |"A" |"B" |
# -------------
# |3 |4 |
# -------------
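Beyond collecting results locally, the filtered DataFrame can also be saved back to a Snowflake table with DataFrameWriter.save_as_table; the table name below is illustrative:
# Save the filtered result as a Snowflake table, replacing it if it already exists
df.write.save_as_table("my_filtered_table", mode="overwrite")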
Create a session and use the Snowpark pandas API
import modin.pandas as pd
import snowflake.snowpark.modin.plugin
from snowflake.snowpark import Session
CONNECTION_PARAMETERS = {
    'account': '<myaccount>',
    'user': '<myuser>',
    'password': '<mypassword>',
    'role': '<myrole>',
    'database': '<mydatabase>',
    'schema': '<myschema>',
    'warehouse': '<mywarehouse>',
}
session = Session.builder.configs(CONNECTION_PARAMETERS).create()
# Create a Snowpark pandas dataframe from input data
df = pd.DataFrame([['a', 2.0, 1],['b', 4.0, 2],['c', 6.0, None]], columns=["COL_STR", "COL_FLOAT", "COL_INT"])
# COL_STR COL_FLOAT COL_INT
# 0 a 2.0 1.0
# 1 b 4.0 2.0
# 2 c 6.0 NaN
df.shape
# (3, 3)
df.head(2)
# COL_STR COL_FLOAT COL_INT
# 0 a 2.0 1
# 1 b 4.0 2
df.dropna(subset=["COL_INT"], inplace=True)
# COL_STR COL_FLOAT COL_INT
# 0 a 2.0 1
# 1 b 4.0 2
df.shape
# (2, 3)
df.head(2)
# COL_STR COL_FLOAT COL_INT
# 0 a 2.0 1
# 1 b 4.0 2
# Save the result back to Snowflake with a row_pos column.
df.reset_index(drop=True).to_snowflake('pandas_test2', index=True, index_label=['row_pos'])
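If needed, the table written above can be read back into a lazy Snowpark pandas DataFrame with pd.read_snowflake, for example:
# Read the saved table back into a Snowpark pandas DataFrame
df2 = pd.read_snowflake('pandas_test2')
df2.head()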
Samples
The Snowpark Python developer guide, Snowpark Python API reference, Snowpark pandas developer guide, and Snowpark pandas API reference contain basic sample code.
Snowflake-Labs has more curated demos.
Logging
Configure the logging level for snowflake.snowpark to capture Snowpark Python API logs.
Snowpark uses the Snowflake Python Connector, so you may also want to configure the logging level for snowflake.connector when the error originates in the Python Connector.
For instance:
import logging
for logger_name in ('snowflake.snowpark', 'snowflake.connector'):
    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.DEBUG)
    ch = logging.StreamHandler()
    ch.setLevel(logging.DEBUG)
    ch.setFormatter(logging.Formatter('%(asctime)s - %(threadName)s %(filename)s:%(lineno)d - %(funcName)s() - %(levelname)s - %(message)s'))
    logger.addHandler(ch)
Reading and writing to pandas DataFrame
The Snowpark Python API supports reading from and writing to a pandas DataFrame via the to_pandas and write_pandas methods.
To use these operations, ensure that pandas is installed in the same environment. You can install pandas alongside Snowpark Python by executing the following command:
pip install "snowflake-snowpark-python[pandas]"
Once pandas is installed, you can convert between a Snowpark DataFrame and pandas DataFrame as follows:
df = session.create_dataframe([[1, 2], [3, 4]], schema=["a", "b"])
# Convert Snowpark DataFrame to pandas DataFrame
pandas_df = df.to_pandas()
# Write pandas DataFrame to a Snowflake table and return Snowpark DataFrame
snowpark_df = session.write_pandas(pandas_df, "new_table", auto_create_table=True)
The Snowpark pandas API also supports writing to pandas:
import modin.pandas as pd
df = pd.DataFrame([[1, 2], [3, 4]], columns=["a", "b"])
# Convert Snowpark pandas DataFrame to pandas DataFrame
pandas_df = df.to_pandas()
Note that the above Snowpark pandas commands work if Snowpark is installed with the [modin] option; the additional [pandas] installation is not required.
Verifying Package Signatures
To ensure the authenticity and integrity of the Python package, follow the steps below to verify the package signature using cosign.
Steps to verify the signature:
# replace the version number with the version you are verifying
./cosign verify-blob snowflake_snowpark_python-1.22.1-py3-none-any.whl \
--certificate snowflake_snowpark_python-1.22.1-py3-none-any.whl.crt \
--certificate-identity https://github.com/snowflakedb/snowpark-python/.github/workflows/python-publish.yml@refs/tags/v1.22.1 \
--certificate-oidc-issuer https://token.actions.githubusercontent.com \
--signature snowflake_snowpark_python-1.22.1-py3-none-any.whl.sig
Verified OK
Contributing
Please refer to CONTRIBUTING.md.
Release History
1.34.0 (2025-07-15)
Snowpark Python API Updates
New Features
TRY_CAST to DataFrameReader. When TRY_CAST is True columns are wrapped in a TRY_CAST statement rather than a hard cast when loading data.USE_RELAXED_TYPES to the INFER_SCHEMA_OPTIONS of DataFrameReader. When set to True this option casts all strings to max length strings and all numeric types to DoubleType.snowflake.snowpark.context.configure_development_features().snowflake.snowpark.dataframe.map_in_pandas that allows users map a function across a dataframe. The mapping function takes an iterator of pandas dataframes as input and provides one as output.fetch_with_process to DataFrameReader.dbapi (PrPr) to enable multiprocessing for parallel data fetching in
local ingestion. By default, local ingestion uses multithreading. Multiprocessing may improve performance for CPU-bound tasks like Parquet file generation.snowflake.snowpark.functions.model that allows users to call methods of a model.rowValidationXSDPath option when reading XML files with a row tag using rowTag option.session.table().sample() to generate a flat SQL statement.functions.explode.snowflake.snowpark.context.configure_development_features(). This feature also depends on AST collection to be enabled in the session which can be done using session.ast_enabled = True.to_snowpark_pandas() from a snowpark dataframe containing DML/DDL queries instead of throwing a NotImplementedError.DataFrameReader.dbapi (PrPr) where closing the cursor or connection could unexpectedly raise an error and terminate the program.DataFrame.select() that have output columns matching the input DataFrame's columns. This improvement works when dataframe columns are provided as Column objects.DataFrame.to_excel and Series.to_excel.pd.read_feather, pd.read_orc, and pd.read_stata.pd.explain_switch() to return debugging information on hybrid execution decisions.pd.read_snowflake when the global modin backend is Pandas.pd.to_dynamic_table, pd.to_iceberg, and pd.to_view.modin or pandas version does not match our requirements.modin versions to >=0.33.0 and <0.35.0 (was previously >= 0.32.0 and <0.34.0).TypeError: numpy.ndarray object is not callable.np.where on modin objects with the Pandas backend would raise an AttributeError. This fix requires modin version 0.34.0 or newer.df.melt where the resulting values have an additional suffix applied.DataFrameWriter.dbapi (PrPr) for both Parquet and UDTF-based ingestion.DataFrameReader.dbapi (PrPr) for both Parquet and UDTF-based ingestion.DataFrameWriter.dbapi (PrPr) for UDTF-based ingestion.DataFrameReader to enable use of PATTERN when reading files with INFER_SCHEMA enabled.functions.py:
ai_completeai_similarityai_summarize_agg (originally summarize_agg)ai_classifyrowTag option:
ignoreNamespace option.attributePrefix option.excludeAttributes option.valueTag option.null value using nullValue option.charset option.ignoreSurroundingWhitespace option.return_dataframe in Session.call, which can be used to set the return type of the functions to a DataFrame object.Dataframe.describe called strings_include_math_stats that triggers stddev and mean to be calculated for String columns.Edge.properties when retrieving lineage from DGQL in DataFrame.lineage.trace.table_exists to DataFrameWriter.save_as_table that allows specifying if a table already exists. This allows skipping a table lookup that can be expensive.DataFrameReader.dbapi (PrPr) where the create_connection defined as local function was incompatible with multiprocessing.DataFrameReader.dbapi (PrPr) where databricks TIMESTAMP type was converted to Snowflake TIMESTAMP_NTZ type which should be TIMESTAMP_LTZ type.DataFrameReader.json where repeated reads with the same reader object would create incorrectly quoted columns.DataFrame.to_pandas() that would drop column names when converting a dataframe that did not originate from a select statement.DataFrame.create_or_replace_dynamic_table raises error when the dataframe contains a UDTF and SELECT * in UDTF not being parsed correctly.Session.write_pandas() and Session.create_dataframe() when the input pandas DataFrame does not have a column.DataFrame.select when the arguments contain a table function with output columns that collide with columns of current dataframe. With the improvement, if user provides non-colliding columns in df.select("col1", "col2", table_func(...)) as string arguments, then the query generated by snowpark client will not raise ambiguous column error.DataFrameReader.dbapi (PrPr) to use in-memory Parquet-based ingestion for better performance and security.DataFrameReader.dbapi (PrPr) to use MATCH_BY_COLUMN_NAME=CASE_SENSITIVE in copy into table operation.Column.isin that would cause incorrect filtering on joined or previously filtered data.snowflake.snowpark.functions.concat_ws that would cause results to have an incorrect index.modin dependency constraint from 0.32.0 to >=0.32.0, <0.34.0. The latest version tested with Snowpark pandas is modin 0.33.1.from modin.config import AutoSwitchBackend; AutoSwitchBackend.enable(), Snowpark pandas will automatically choose whether to run certain pandas operations locally or on Snowflake. This feature is disabled by default.index parameter to False for DataFrame.to_view, Series.to_view, DataFrame.to_dynamic_table, and Series.to_dynamic_table.iceberg_version option to table creation functions.insert, repr, and groupby, that previously issued a query to retrieve the input data's size.Series.where when the other parameter is an unnamed Series.describe procedure call to check the return type of the procedure.Session.create_dataframe() with the stage URL and FILE data type.session.read.option('mode', <mode>), option('rowTag', <tag_name>).xml(<stage_file_path>). Currently PERMISSIVE, DROPMALFORMED and FAILFAST are supported.Dataframe.drop to use SELECT * EXCLUDE () to exclude the dropped columns. 
To enable this feature, set session.conf.set("use_simplified_query_generation", True).VariantType to StructType.from_jsonDataFrameWriter.dbapi (PrPr) that unicode or double-quoted column name in external database causes error because not quoted correctly.native_app_params parameters in register udaf function.snowflake.snowpark.functions.rank that would cause sort direction to not be respected.snowflake.snowpark.functions.to_timestamp_* that would cause incorrect results on filtered data.Series.str.get, Series.str.slice, and Series.str.__getitem__ (Series.str[...]).DataFrame.to_html.DataFrame.to_string and Series.to_string.pd.read_csv.ENFORCE_EXISTING_FILE_FORMAT option to the DataFrameReader, which allows to read a dataframe only based on an existing file format object when used together with FORMAT_NAME.iceberg_config a required parameter for DataFrame.to_iceberg and Series.to_iceberg.restricted caller permission of execute_as argument in StoredProcedure.register().DataFrame.to_pandas().artifact_repository parameter to Session.add_packages, Session.add_requirements, Session.get_packages, Session.remove_package, and Session.clear_packages.session.read.option('rowTag', <tag_name>).xml(<stage_file_path>) (experimental).
col(a.b.c).DataFrameReader.dbapi (PrPr):
fetch_merge_count parameter for optimizing performance by merging multiple fetched data into a single Parquet file.functions.py (Private Preview):
promptai_filter (added support for prompt() function and image files, and changed the second argument name from expr to file)ai_classifyrelaxed_ordering param into enforce_ordering for DataFrame.to_snowpark_pandas. Also the new default values is enforce_ordering=False which has the opposite effect of the previous default value, relaxed_ordering=False.DataFrameReader.dbapi (PrPr) reading performance by setting the default fetch_size parameter value to 1000.session.table.DataFrameAnalyticsFunctions.time_series_agg().DataFrame.group_by().pivot().agg when the pivot column and aggregate column are the same.DataFrameReader.dbapi (PrPr) where a TypeError was raised when create_connection returned a connection object of an unsupported driver type.df.limit(0) call would not properly apply.DataFrameWriter.save_as_table that caused reserved names to throw errors when using append mode.sliding_interval in DataFrameAnalyticsFunctions.time_series_agg().Window.range_between.array_construct function.__pycache__ directory was unintentionally copied during stored procedure execution via import.Column.like calls.Column.getItem and snowpark.snowflake.functions.get to raise IndexError rather than return null.df.limit(0) call would not properly apply.Table.merge into an empty table would cause an exception.modin from 0.30.1 to 0.32.0.numpy 2.0 and above.DataFrame.create_or_replace_view and Series.create_or_replace_view.DataFrame.create_or_replace_dynamic_table and Series.create_or_replace_dynamic_table.DataFrame.to_view and Series.to_view.DataFrame.to_dynamic_table and Series.to_dynamic_table.DataFrame.groupby.resample for aggregations max, mean, median, min, and sum.pd.read_excelpd.read_htmlpd.read_picklepd.read_saspd.read_xmlDataFrame.to_iceberg and Series.to_iceberg.Series.str.len.DataFrame.groupby.apply and Series.groupby.apply by avoiding expensive pivot step.OrderedDataFrame to enable better engine switching. This could potentially result in increased query counts.relaxed_ordering param into enforce_ordering for pd.read_snowflake. Also the new default value is enforce_ordering=False which has the opposite effect of the previous default value, relaxed_ordering=False.pd.read_snowflake when reading iceberg tables and enforce_ordering=True.Dataframe.to_snowpark_pandas by introducing the new parameter relaxed_ordering.DataFrameReader.dbapi (PrPr) now accepts a list of strings for the session_init_statement parameter, allowing multiple SQL statements to be executed during session initialization.Dataframe.stat.sample_by to generate a single flat query that scales well with large fractions dictionary compared to older method of creating a UNION ALL subquery for each key in fractions. To enable this feature, set session.conf.set("use_simplified_query_generation", True).DataFrameReader.dbapi by enable vectorized option when copy parquet file into table.DataFrame.random_split in the following ways. They can be enabled by setting session.conf.set("use_simplified_query_generation", True):
cache_result in the internal implementation of the input dataframe resulting in a pure lazy dataframe operation.seed argument now behaves as expected with repeatable results across multiple calls and sessions.DataFrame.fillna and DataFrame.replace now both support fitting int and float into Decimal columns if include_decimal is set to True.files.py as a result of their General Availability.
SnowflakeFile.writeSnowflakeFile.writelinesSnowflakeFile.writeableSnowflakeFile and SnowflakeFile.open().cast() is applied to their output
from_jsonDataframe.except_ that would cause rows to be incorrectly dropped.to_timestamp to fail when casting filtered columns.Series.str.__getitem__ (Series.str[...]).pd.Grouper objects in group by operations. When freq is specified, the default values of the sort, closed, label, and convention arguments are supported; origin is supported when it is start or start_day.pd.read_snowflake for both named data sources (e.g., tables and views) and query data sources by introducing the new parameter relaxed_ordering.QUOTED_IDENTIFIERS_IGNORE_CASE is found to be set, ask user to unset it.index_label in DataFrame.to_snowflake and Series.to_snowflake is handled when index=True. Instead of raising a ValueError, system-defined labels are used for the index columns.groupby or DataFrame or Series.agg when the function name is not supported.DataFrameReader.dbapi (PrPr) that prevents usage in stored procedure and snowbooks.functions.py (Private Preview):
ai_filterai_aggsummarize_aggfunctions.py (Private Preview):
fl_get_content_typefl_get_etagfl_get_file_typefl_get_last_modifiedfl_get_relative_pathfl_get_scoped_file_urlfl_get_sizefl_get_stagefl_get_stage_file_urlfl_is_audiofl_is_compressedfl_is_documentfl_is_imagefl_is_videoartifact_repository and artifact_repository_packages to specify your artifact repository and packages respectively when registering stored procedures or user defined functions.Session.sproc.registerSession.udf.registerSession.udaf.registerSession.udtf.registerfunctions.sprocfunctions.udffunctions.udaffunctions.udtffunctions.pandas_udffunctions.pandas_udtfUnsupported feature 'SCOPED_TEMPORARY'. error if thread-safe session was disabled.df.describe raised internal SQL execution error when the dataframe is created from reading a stage file and CTE optimization is enabled.df.order_by(A).select(B).distinct() would generate invalid SQL when simplified query generation was enabled using session.conf.set("use_simplified_query_generation", True).snowflake-snowpark-python package compatibility when registering stored procedures. Now, warnings are only triggered if the major or minor version does not match, while bugfix version differences no longer generate warnings.cloudpickle==3.0.0 in addition to previous versions.range_between window function.Series.str.slice.ClassifyText, Translate, and ExtractAnswer.Series.hist.DataFrame.groupby.transform and Series.groupby.transform by avoiding expensive pivot step.pd.to_snowflake, DataFrame.to_snowflake, and Series.to_snowflake when the table does not exist.if_exists parameter in pd.to_snowflake, DataFrame.to_snowflake, and Series.to_snowflake.Series.rename_axis where an AttributeError was being raised.pd.get_dummies didn't ignore NULL/NaN values by default.pd.get_dummies results in 'Duplicated column name error'.pd.get_dummies where passing list of columns generated incorrect column labels in output DataFrame.pd.get_dummies to return bool values instead of int.functions.py
normalrandnallow_missing_columns parameter to Dataframe.union_by_name and Dataframe.union_all_by_name.Dataframe.distinct to generate SELECT DISTINCT instead of SELECT with GROUP BY all columns. To disable this feature, set session.conf.set("use_simplified_query_generation", False).snowflake_cortex_summarize. Users can install snowflake-ml-python and use the snowflake.cortex.summarize function instead.snowflake_cortex_sentiment. Users can install snowflake-ml-python and use the snowflake.cortex.sentiment function instead.session.conf.set("collect_stacktrace_in_query_tag", True).Session._write_pandas where it was erroneously passing use_logical_type parameter to Session._write_modin_pandas_helper when writing a Snowpark pandas object.Session.catalog where empty strings for database or schema were not handled correctly and were generating erroneous sql statements.Summarize and Sentiment.Series.str.get.apply where kwargs were not being correctly passed into the applied function.minutedate_format, datetime_format, and timestamp_format options when loading csvs.functions.py
array_reversedivnullmap_catmap_contains_keymap_keysnullifzerosnowflake_cortex_sentimentacoshasinhatanhbit_lengthbitmap_bit_positionbitmap_bucket_numberbitmap_construct_aggbitshiftright_unsignedequal_nullfrom_jsonifnulllocaltimestampmax_bymin_bynth_valueoctet_lengthpositionregr_avgxregr_avgyregr_countregr_interceptregr_r2regr_sloperegr_sxxregr_sxyregr_syytry_to_binarybase64base64_decode_stringbase64_encodeeditdistancehex_encodeinstrlog1plog10percentile_approxunbase64seed argument in DataFrame.stat.sample_by. Note that it only supports a Table object, and will be ignored for a DataFrame object.DataFrame.create_dataframe.DataFrameWriter.insert_into/insertInto. This method also supports local testing mode.DataFrame.create_temp_view to create a temporary view. It will fail if the view already exists.map_cat and map_concat.keep_column_order for keeping original column order in DataFrame.with_column and DataFrame.with_columns.contains_null parameter to ArrayType.DataFrame.create_or_replace_temp_view from a DataFrame created by reading a file from a stage.value_contains_null parameter to MapType.Column object in Column.in_ and functions.in_.interactive to telemetry that indicates whether the current environment is an interactive one.session.file.get in a Native App to read file paths starting with / from the current versionDataFrame.pivot.Catalog class to manage snowflake objects. It can be accessed via Session.catalog.
snowflake.core is a dependency required for this feature.DataFrame.create_dataframe.cosign.StructField.from_json that prevented TimestampTypes with tzinfo from being parsed correctly.date_format that caused an error when the input column was date type or timestamp type.replace and lit which raised type hint assertion error when passing Column expression objects.pandas_udf and pandas_udtf where session parameter was erroneously ignored.session.call.Series.str.ljust and Series.str.rjust.Series.str.center.Series.str.pad.snowflake_cortex_sentiment.DataFrame.map.DataFrame.from_dict and DataFrame.from_records.SeriesGroupBy.uniqueSeries.dt.strftime with the following directives:
Series.between.include_groups=False in DataFrameGroupBy.apply.expand=True in Series.str.split.DataFrame.pop and Series.pop.first and last in DataFrameGroupBy.agg and SeriesGroupBy.agg.Index.drop_duplicates."count", "median", np.median,
"skew", "std", np.std "var", and np.var in
pd.pivot_table(), DataFrame.pivot_table(), and pd.crosstab().DataFrame.map, Series.apply and Series.map methods by mapping numpy functions to snowpark functions if possible.DataFrame.map.DataFrame.apply by mapping numpy functions to snowpark functions if possible.Series.map, Series.apply and DataFrame.map if type-hint is not provided.call_count to telemetry that counts method calls including interchange protocol calls.version and class method get_active_session for Session class.DataType, its derived classes, and StructField:
type_name: Returns the type name of the data.simple_string: Provides a simple string representation of the data.json_value: Returns the data as a JSON-compatible value.json: Converts the data to a JSON string.ArrayType, MapType, StructField, PandasSeriesType, PandasDataFrameType and StructType:
from_json: Enables these types to be created from JSON data.MapType:
keyType: keys of the mapvalueType: values of the mapappName in SessionBuilder.include_nulls argument in DataFrame.unpivot.functions.py:
size to get size of array, object, or map columns.collect_list an alias of array_agg.substring makes len argument optional.ast_enabled to session for internal usage (default: False).DataFrame.create_or_replace_dynamic_table:
iceberg_config A dictionary that can hold the following iceberg configuration options:
external_volumecatalogbase_locationcatalog_syncstorage_serialization_policyDataFrame.print_schemalevel parameter to DataFrame.print_schemaDataFrameReader and DataFrameWriter API by adding support for the following:
format method to DataFrameReader and DataFrameWriter to specify file format when loading or unloading results.load method to DataFrameReader to work in conjunction with format.save method to DataFrameWriter to work in conjunction with format.options method for DataFrameReader and DataFrameWriter.cloudpickle==2.2.1 remains the only supported version.session.read.options where False Boolean values were incorrectly parsed as True in the generated file format.python-dateutil.Series.map when arg is a pandas Series or a
collections.abc.Mapping. No support for instances of dict that implement
__missing__ but are not instances of collections.defaultdict.DataFrame.align and Series.align for axis=1 and axis=None.pd.json_normalize.GroupBy.pct_change with axis=0, freq=None, and limit=None.DataFrameGroupBy.__iter__ and SeriesGroupBy.__iter__.np.sqrt, np.trunc, np.floor, numpy trig functions, np.exp, np.abs, np.positive and np.negative.DataFrame.__dataframe__().df.loc where setting a single column from a series results in unexpected None values.snowflake.snowpark.dataframe:
include_error to Session.query_history to record queries that have error during execution.Session.get_session_stage is used instead of raising SnowparkSQLException.Session.stored_procedure_profiler.set_active_profiler.DataFrame:
cache_resultIn expression were used in selects.AttributeError while calling Session.stored_procedure_profiler.get_output when Session.stored_procedure_profiler is disabled.protobuf>=5.28 and tzlocal at runtime.protoc-wheel-0 for the development profile.snowflake-connector-python>=3.12.0, <4.0.0 (was >=3.10.0).modin from 0.28.1 to 0.30.1.pandas 2.2.x versions.Index.to_numpy.DataFrame.align and Series.align for axis=0.size in GroupBy.aggregate, DataFrame.aggregate, and Series.aggregate.snowflake.snowpark.functions.windowpd.read_pickle (Uses native pandas for processing).pd.read_html (Uses native pandas for processing).pd.read_xml (Uses native pandas for processing)."size" and len in GroupBy.aggregate, DataFrame.aggregate, and Series.aggregate.Series.str.len.pd.DataFrame([0]).agg(np.mean)) would fail to transpose the result.DataFrame.dropna() would:
subset (e.g. []) as if it specified all columns instead of no columns.TypeError for a scalar subset instead of filtering on just that column.ValueError for a subset of type pandas.Index instead of filtering on the columns in the index.TableNotFoundError when using dynamic pivot in notebook environment.snowflake.snowpark.functions module.snowflake.snowpark.functions.any_valueTable.update could not handle VariantType, MapType, and ArrayType data types.DataFrame.join, causing errors when selecting columns from a joined DataFrame.Table.update and Table.merge could fail if the target table's index was not the default RangeIndex.Session class to be thread-safe. This allows concurrent DataFrame transformations, DataFrame actions, UDF and stored procedure registration, and concurrent file uploads when using the same Session object.
FEATURE_THREAD_SAFE_PYTHON_SESSION to True for account.DataFrame.queries API are not deterministic, and may be different when DataFrame actions are executed. This does not affect explicit user-created temporary tables.session.lineage.trace API.copy_grants parameter when registering UDxF and stored procedures.DataFrameWriter to support daisy-chaining:
optionoptionspartition_bysnowflake_cortex_summarize.snowflake.snowpark.functions.array_remove it is now possible to use in python.df.sort().limit() and df.limit().sort() generates the same query with sort in front of limit. Now, df.limit().sort() will generate query that reads df.limit().sort().df.limit().sort(), because limit stops table scanning as soon as the number of records is satisfied.DataFrame.analytics.time_series_agg function to handle multiple data points in same sliding interval.np.subtract, np.multiply, np.divide, and np.true_divide.__array_ufunc__.np.float_power, np.mod, np.remainder, np.greater, np.greater_equal, np.less, np.less_equal, np.not_equal, and np.equal.np.log, np.log2, and np.log10DataFrameGroupBy.bfill, SeriesGroupBy.bfill, DataFrameGroupBy.ffill, and SeriesGroupBy.ffill.on parameter with Resampler.value_counts().snowflake_cortex_summarize.DataFrame.attrs and Series.attrs.DataFrame.style.np.full_likehead and iloc when the row key is a slice.tz_convert and tz_localize in Series, DataFrame, Series.dt, and DatetimeIndex.tz_convert and tz_localize in Series, DataFrame, Series.dt, and DatetimeIndex to specify the supported timezone formats.df.apply and series.apply ( as well as map and applymap ) when using snowpark functions. This allows for some position independent compatibility between apply and functions where the first argument is not a pandas object.iloc and iat when the row key is a scalar.iterrows.Series.map to reflect the unsupported features.np.may_share_memory which is used internally by many scikit-learn functions. This method will always return false when called with a Snowpark pandas object.DataFrame and Series pct_change() would raise TypeError when input contained timedelta columns.replace() would sometimes propagate Timedelta types incorrectly through replace(). Instead raise NotImplementedError for replace() on Timedelta.DataFrame and Series round() would raise AssertionError for Timedelta columns. Instead raise NotImplementedError for round() on Timedelta.reindex fails when the new index is a Series with non-overlapping types from the original index.__getitem__ on a DataFrameGroupBy object always returned a DataFrameGroupBy object if as_index=False.NotImplementedError.DataFrame.shift() on axis=0 and axis=1 would fail to propagate timedelta types.DataFrame.abs(), DataFrame.__neg__(), DataFrame.stack(), and DataFrame.unstack() now raise NotImplementedError for timedelta inputs instead of failing to propagate timedelta types.DataFrame.alias raises KeyError for input column name.to_csv on Snowflake stage fails when data contains empty strings.snowflake.snowpark.functions:
make_intervalWindow.range_between() when the order by column is TIMESTAMP or DATE type.thread_id to QueryRecord to track the thread id submitting the query history.Session.stored_procedure_profiler.'NoneType' has no len() when trying to read default values from function.TimedeltaIndex.mean method.Timedelta columns on axis=0 with agg or aggregate.by, left_by, right_by, left_index, and right_index for pd.merge_asof.include_describe to Session.query_history.DatetimeIndex.mean and DatetimeIndex.std methods.Resampler.asfreq, Resampler.indices, Resampler.nunique, and Resampler.quantile.resample frequency W, ME, YE with closed = "left".DataFrame.rolling.corr and Series.rolling.corr for pairwise = False and int window.window and min_periods = None for Rolling.DataFrameGroupBy.fillna and SeriesGroupBy.fillna.Series and DataFrame objects with the lazy Index object as data, index, and columns arguments.Series and DataFrame objects with index and column values not present in DataFrame/Series data.pd.read_sas (Uses native pandas for processing).rolling().count() and expanding().count() to Timedelta series and columns.tz in both pd.date_range and pd.bdate_range.Series.items.errors="ignore" in pd.to_datetime.DataFrame.tz_localize and Series.tz_localize.DataFrame.tz_convert and Series.tz_convert.sin) in Series.map, Series.apply, DataFrame.apply and DataFrame.applymap.to_pandas to persist the original timezone offset for TIMESTAMP_TZ type.dtype results for TIMESTAMP_TZ type to show correct timezone offset.dtype results for TIMESTAMP_LTZ type to show correct timezone.numeric_only for groupby aggregations.sort_values.convert_dtype in Series.apply.Index object created from a Series/DataFrame incorrectly updates the Series/DataFrame's index name after an inplace update has been applied to the original Series/DataFrame.SettingWithCopyWarning that sometimes appeared when printing Timedelta columns.inplace argument for Series objects derived from other Series objects.Series.sort_values failed if series name overlapped with index column name.Timedelta index levels to integer column levels.Resampler methods on timedelta columns would produce integer results.pd.to_numeric() would leave Timedelta inputs as Timedelta instead of converting them to integers.loc set when setting a single row, or multiple rows, of a DataFrame with a Series value.date_add and date_sub functions failed for NULL values.equal_null could fail inside a merge statement.row_number could fail inside a Window function.This is a re-release of 1.22.0. Please refer to the 1.22.0 release notes for detailed release content.
snowflake.snowpark.functions:
array_removeSession.write_pandas by making use_logical_type option more explicit.DataFrameWriter.save_as_table:
enable_schema_evolutiondata_retention_timemax_data_extension_timechange_trackingcopy_grantsiceberg_config A dicitionary that can hold the following iceberg configuration options:
external_volumecatalogbase_locationcatalog_syncstorage_serialization_policyDataFrameWriter.copy_into_table:
iceberg_config A dicitionary that can hold the following iceberg configuration options:
external_volumecatalogbase_locationcatalog_syncstorage_serialization_policyDataFrame.create_or_replace_dynamic_table:
refresh_modeinitializeclustering_keysis_transientdata_retention_timemax_data_extension_timesession.read.csv that caused an error when setting PARSE_HEADER = True in an externally defined file format.session.get_session_stage that referenced a non-existing stage after switching database or schema.DataFrame.to_snowpark_pandas without explicitly initializing the Snowpark pandas plugin caused an error.explode function in dynamic table creation caused a SQL compilation error due to improper boolean type casting on the outer parameter.Index.identical.DataFrameWriter.save_as_table incorrectly handled DataFrames containing only a subset of columns from the existing table.to_timestamp does not set the default timezone of the column datatype.Timedelta type, including the following features. Snowpark pandas will raise NotImplementedError for unsupported Timedelta use cases.
copy, cache_result, shift, sort_index, assign, bfill, ffill, fillna, compare, diff, drop, dropna, duplicated, empty, equals, insert, isin, isna, items, iterrows, join, len, mask, melt, merge, nlargest, nsmallest, to_pandas.astype.NotImplementedError will be raised for the rest of methods that do not support Timedelta.Timedelta.Timedelta values.Timedelta values and numeric values.TimedeltaIndex.pd.to_timedelta.GroupBy aggregations min, max, mean, idxmax, idxmin, std, sum, median, count, any, all, size, nunique, head, tail, aggregate.GroupBy filtrations first and last.TimedeltaIndex attributes: days, seconds, microseconds and nanoseconds.diff with timestamp columns on axis=0 and axis=1TimedeltaIndex methods: ceil, floor and round.TimedeltaIndex.total_seconds method.Series.dt.round.DatetimeIndex.Index.name, Index.names, Index.rename, and Index.set_names.Index.__repr__.DatetimeIndex.month_name and DatetimeIndex.day_name.Series.dt.weekday, Series.dt.time, and DatetimeIndex.time.Index.min and Index.max.pd.merge_asof.Series.dt.normalize and DatetimeIndex.normalize.Index.is_boolean, Index.is_integer, Index.is_floating, Index.is_numeric, and Index.is_object.DatetimeIndex.round, DatetimeIndex.floor and DatetimeIndex.ceil.Series.dt.days_in_month and Series.dt.daysinmonth.DataFrameGroupBy.value_counts and SeriesGroupBy.value_counts.Series.is_monotonic_increasing and Series.is_monotonic_decreasing.Index.is_monotonic_increasing and Index.is_monotonic_decreasing.pd.crosstab.pd.bdate_range and included business frequency support (B, BME, BMS, BQE, BQS, BYE, BYS) for both pd.date_range and pd.bdate_range.Index objects as labels in DataFrame.reindex and Series.reindex.Series.dt.days, Series.dt.seconds, Series.dt.microseconds, and Series.dt.nanoseconds.DatetimeIndex from an Index of numeric or string type.Timedelta objects.Series.dt.total_seconds method.DataFrame.apply(axis=0).Series.dt.tz_convert and Series.dt.tz_localize.DatetimeIndex.tz_convert and DatetimeIndex.tz_localize.quoted_identifier_to_snowflake_type to avoid making metadata queries if the types have been cached locally.pd.to_datetime to handle all local input cases.NotImplementedError for Index bitwise operators.Index.names is set to a non-like-like object.pd.read_snowflake include the creation reason when temp table creation is triggered.DataFrame.set_index, or setting DataFrame.index or Series.index by avoiding checks require eager evaluation. As a consequence, when the new index that does not match the current Series/DataFrame object length, a ValueError is no longer raised. Instead, when the Series/DataFrame object is longer than the provided index, the Series/DataFrame's new index is filled with NaN values for the "extra" elements. 
Otherwise, the extra values in the provided index are ignored.NotImplementedError when ambiguous/nonexistent are non-string in ceil/floor/round.pd.Timedelta scalars.Series.dt.isocalendar using a named Seriesinplace argument for Series objects derived from DataFrame columns.Series.reindex and DataFrame.reindex did not update the result index's name correctly.Series.take did not error when axis=1 was specified.to_pandas_batches with async jobs caused an error due to improper handling of waiting for asynchronous query completion.snowflake.snowpark.testing.assert_dataframe_equal that is a utility function to check the equality of two Snowpark DataFrames.INFER_SCHEMA options to DataFrameReader via INFER_SCHEMA_OPTIONS.parameters parameter to Column.rlike and Column.regexp.df.cache_result() in the current session, when the DataFrame is no longer referenced (i.e., gets garbage collected). It is still an experimental feature not enabled by default, and can be enabled by setting session.auto_clean_up_temp_table_enabled to True.fmt parameter of snowflake.snowpark.functions.to_date.* column has an incorrect subquery.DataFrame.to_pandas_batches where the iterator could throw an error if certain transformation is made to the pandas dataframe due to wrong isolation level.DataFrame.lineage.trace to split the quoted feature view's name and version correctly.Column.isin that caused invalid sql generation when passed an empty list.dense_rankpercent_rankcume_distntiledatediffarray_aggrlike and regexp changes above.ignore_nulls properly.DataFrame.backfill, DataFrame.bfill, Series.backfill, and Series.bfill.DataFrame.compare and Series.compare with default parameters.Series.dt.microsecond and Series.dt.nanosecond.Index.is_unique and Index.has_duplicates.Index.equals.Index.value_counts.Series.dt.day_name and Series.dt.month_name.df.index[:10].DataFrame.unstack and Series.unstack.DataFrame.asfreq and Series.asfreq.Series.dt.is_month_start and Series.dt.is_month_end.Index.all and Index.any.Series.dt.is_year_start and Series.dt.is_year_end.Series.dt.is_quarter_start and Series.dt.is_quarter_end.DatetimeIndex.Series.argmax and Series.argmin.Series.dt.is_leap_year.DataFrame.items.Series.dt.floor and Series.dt.ceil.Index.reindex.DatetimeIndex properties: year, month, day, hour, minute, second, microsecond,
nanosecond, date, dayofyear, day_of_year, dayofweek, day_of_week, weekday, quarter,
is_month_start, is_month_end, is_quarter_start, is_quarter_end, is_year_start, is_year_end
and is_leap_year.Resampler.fillna and Resampler.bfill.Timedelta type, including creating Timedelta columns and to_pandas.Index.argmax and Index.argmin.SnowflakeQueryCompiler.is_series_like method.Dataframe.columns now returns native pandas Index object instead of Snowpark Index object.query_compiler argument in Index constructor to create Index from query compiler.pd.to_datetime now returns a DatetimeIndex object instead of a Series object.pd.date_range now returns a DatetimeIndex object instead of a Series object.pivot_table raise NotImplementedError instead of KeyError.Series.drop_duplicates and DataFrame.drop_duplicates when called after sort_values.Index.to_frame where the result frame's column name may be wrong where name is unspecified.Series.reset_index(drop=True) where the result name may be wrong.Groupby.first/last ordering by the correct columns in the underlying window expression.DataFrame:
_execute_and_get_query_idarrays_zip function.df._in by avoiding unnecessary cast for numeric values. You can enable this optimization by setting session.eliminate_numeric_sql_value_cast_enabled = True.write_pandas when the target table does not exist and auto_create_table=False.format_json to the Session.SessionBuilder.app_name function that sets the app name in the Session.query_tag in JSON format. By default, this parameter is set to False.lag(x, 0) was incorrect and failed with error message argument 1 to function LAG needs to be constant, found 'SYSTEM$NULL_TO_FIXED(null)'.patch function when registering a mocked function:
distinct allows an alternate function to be specified for when a sql function should be distinct.pass_column_index passes a named parameter column_index to the mocked function that contains the pandas.Index for the input data.pass_row_index passes a named parameter row_index to the mocked function that is the 0 indexed row number the function is currently operating on.pass_input_data passes a named parameter input_data to the mocked function that contains the entire input dataframe for the current expression.column_order parameter to method DataFrameWriter.save_as_table.DataFrameGroupBy.all, SeriesGroupBy.all, DataFrameGroupBy.any, and SeriesGroupBy.any.DataFrame.nlargest, DataFrame.nsmallest, Series.nlargest and Series.nsmallest.replace and frac > 1 in DataFrame.sample and Series.sample.read_excel (Uses local pandas for processing)Series.at, Series.iat, DataFrame.at, and DataFrame.iat.Series.dt.isocalendar.Series.case_when except when condition or replacement is callable.Index and its APIs.DataFrame.assign.DataFrame.stack.DataFrame.pivot and pd.pivot.DataFrame.to_csv and Series.to_csv.Series.str.translate where the values in the table are single-codepoint strings.DataFrame.corr.df.plot() and series.plot() to be called, materializing the data into the local clientDataFrameGroupBy and SeriesGroupBy aggregations first and lastDataFrameGroupBy.get_group.limit parameter when method parameter is used in fillna.Series.str.translate where the values in the table are single-codepoint strings.DataFrame.corr.DataFrame.equals and Series.equals.DataFrame.reindex and Series.reindex.Index.astype.Index.unique and Index.nunique.Index.sort_values.DataFrame or Series with dtype=np.uint64.values is set to index when index and columns contain all columns in DataFrame during pivot_table.Index.copy()dtype, values, item(), tolist(), to_series() and to_frame()pd.pivot_table and DataFrame.pivot_table.inplace parameter in DataFrame.sort_index and Series.sort_index.to_boolean function.RecursionError: maximum recursion depth exceeded when the DataFrame has more than 500 columns.AsyncJob.result("no_result") doesn't wait for the query to finish execution.strict parameter when registering UDFs and Stored Procedures.DateType raises AttributeError.to_char that raises IndexError when incoming column has nonconsecutive row index.CaseExpr expressions that raises IndexError when incoming column has nonconsecutive row index.Column.like that raises IndexError when incoming column has nonconsecutive row index.iff.DataFrame.pct_change and Series.pct_change without the freq and limit parameters.Series.str.get.Series.dt.dayofweek, Series.dt.day_of_week, Series.dt.dayofyear, and Series.dt.day_of_year.Series.str.__getitem__ (Series.str[...]).Series.str.lstrip and Series.str.rstrip.DataFrameGroupBy.size and SeriesGroupBy.size.DataFrame.expanding and Series.expanding for aggregations count, sum, min, max, mean, std, var, and sem with axis=0.DataFrame.rolling and Series.rolling for aggregation count with axis=0.Series.str.match.DataFrame.resample and Series.resample for aggregations size, first, and last.DataFrameGroupBy.all, SeriesGroupBy.all, DataFrameGroupBy.any, and SeriesGroupBy.any.DataFrame.nlargest, DataFrame.nsmallest, Series.nlargest and Series.nsmallest.replace and frac > 1 in DataFrame.sample and Series.sample.read_excel (Uses local pandas for processing)Series.at, Series.iat, DataFrame.at, and DataFrame.iat.Series.dt.isocalendar.Series.case_when except when condition or replacement is callable.Index and its 
APIs.DataFrame.assign.DataFrame.stack.DataFrame.pivot and pd.pivot.DataFrame.to_csv and Series.to_csv.Index.T.DataFrame.describe on a frame with duplicate columns of differing dtypes could cause an error or incorrect results.DataFrame.rolling and Series.rolling so window=0 now throws NotImplementedError instead of ValueErrorDataFrame.aggregate and Series.aggregate with axis=0.pd.read_csv reads using the native pandas CSV parser, then uploads data to snowflake using parquet. This enables most of the parameters supported by read_csv including date parsing and numeric conversions. Uploading via parquet is roughly twice as fast as uploading via CSV.pd.Index directly in Snowpark pandas. Support for pd.Index as a first-class component of Snowpark pandas is coming soon.len, shape, size, empty, to_pandas() and names. For df.index, Snowpark pandas creates a lazy index object.df.columns, Snowpark pandas supports a non-lazy version of an Index since the data is already stored locally.{"infer_schema": True} when reading csv file without specifying its schema.Session.create_dataframe when called with more than 512 rows and using format or pyformat paramstyle.DataFrame.cache_result and Series.cache_result methods for users to persist DataFrames and Series to a temporary table lasting the duration of the session to improve latency of subsequent operations.DataFrame.pivot_table with no index parameter, as well as for margins parameter.DataFrame.shift/Series.shift/DataFrameGroupBy.shift/SeriesGroupBy.shift to match pandas 2.2.1. Snowpark pandas does not yet support the newly-added suffix argument, or sequence values of periods.Series.str.split.Series.str.*).csv and json:
FalseUTF8DataFrame.analytics.moving_agg and DataFrame.analytics.cumulative_agg_agg.if_not_exists parameter during UDF and stored procedure registration.* to fail.date_add was unable to handle some numeric types.TimestampType casting resulted in incorrect data.DecimalType data to have incorrect precision in some cases.IndexError.to_timestamp_ntz can not handle None data.DataFrame.with_column_renamed ignores attributes from parent DataFrames after join operations.Column.equal_nan where null data is handled incorrectly.DataFrame.drop ignore attributes from parent DataFrames after join operations.date_part where Column type is set wrong.DataFrameWriter.save_as_table does not raise exceptions when inserting null data into non-nullable columns.DataFrameWriter.save_as_table where
pyarrow as it is not used.Column.cast, adding support for casting to boolean and all integral types.is_permanent and anonymous options in UDFs and stored procedures registration to make it more clear that those features are not yet supported.NotImplementedError instead of warnings and unclear error information.DataFrameWriter.save_as_tableDataFrame.create_or_replace_viewDataFrame.create_or_replace_temp_viewDataFrame.create_or_replace_dynamic_table{"infer_schema": True} when reading CSV file without specifying its schema.to_timestamp_ltz, to_timestamp_ntz, to_timestamp_tz and to_timestamp.to_char.snowflake.snowpark.mock.exceptions.SnowparkLocalTestingException.sys.path during the clean-up step.Session.get_current_[schema|database|role|user|account|warehouse] returns upper-cased identifiers when identifiers are quoted.substr and substring can not handle 0-based start_expr.SnowparkLocalTestingException in error cases which is on par with SnowparkSQLException raised in non-local execution.Session.write_pandas method that NotImplementError will be raised when called.to_date.NaT and NaN values to not be recognized.DataFrameReader.csv was unable to handle quoted values containing a delimiter.None value in an arithmetic calculation, the output should remain None instead of math.nan.sum and covar_pop that when there is math.nan in the data, the output should also be math.nan.DataFrame.to_pandas should take Snowflake numeric types with precision 38 as int64.truncate save mode in DataFrameWrite to overwrite existing tables by truncating the underlying table instead of dropping it.DataFrame into one or more files in a stage:
DataFrame.write.jsonDataFrame.write.csvDataFrame.write.parquetDataFrame and DataFrameWriter:
snowflake.snowpark.Session.file.get and snowflake.snowpark.Session.file.get_streamcomment.session.cte_optimization_enabled to True.statement_params was not passed to query executions that register stored procedures and user defined functions.snowflake.snowpark.Session.file.get_stream to fail for quoted stage locations.utils.py might raise AttributeError in case the underlying module can not be found.to_time.Session.builder.getOrCreate should return the created mock session.process method.SnowflakePlanBuilder that save_as_table does not filter column that name start with '$' and follow by number correctly.field_optionally_enclosed_by is specified.pattern is a Column.KeyError when updating null values in the rows.DataFrame.collect.count_distinct does not work correctly when counting.TypeError.DataFrameReader to raise FileNotFound error when reading a path that does not exist or when there are no files under the path.date_part argument in function last_day.SessionBuilder.app_name will set the query_tag after the session is created.DataFrame.to_local_iterator where the iterator could yield wrong results if another query is executed before the iterator finishes due to wrong isolation level. For details, please see #945.Session.range returns empty result when the range is large.split_blocks=True by default during to_pandas conversion, for optimal memory allocation. This parameter is passed to pyarrow.Table.to_pandas, which enables PyArrow to split the memory allocation into smaller, more manageable blocks instead of allocating a single contiguous block. This results in better memory management when dealing with larger datasets.DataFrame.to_pandas that caused an error when evaluating on a Dataframe with an IntergerType column with null values.statement_params in StoredProcedure.__call__.Session.add_import.
chunk_size: The number of bytes to hash per chunk of the uploaded files.whole_file_hash: By default only the first chunk of the uploaded import is hashed to save time. When this is set to True each uploaded file is fully hashed instead.external_access_integrations and secrets when creating a UDAF from Snowpark Python to allow integration with external access.Session.append_query_tag. Allows an additional tag to be added to the current query tag by appending it as a comma separated value.Session.update_query_tag. Allows updates to a JSON encoded dictionary query tag.SessionBuilder.getOrCreate will now attempt to replace the singleton it returns when token expiration has been detected.snowflake.snowpark.functions:
array_exceptcreate_mapsign/signumDataFrame.analytics:
moving_agg function in DataFrame.analytics to enable moving aggregations like sums and averages with multiple window sizes.cummulative_agg function in DataFrame.analytics to enable commulative aggregations like sums and averages on multiple columns.compute_lag and compute_lead functions in DataFrame.analytics for enabling lead and lag calculations on multiple columns.time_series_agg function in DataFrame.analytics to enable time series aggregations like sums and averages with multiple time windows.Fixed a bug in DataFrame.na.fill that caused Boolean values to erroneously override integer values.
Fixed a bug in Session.create_dataframe where the Snowpark DataFrames created using pandas DataFrames were not inferring the type for timestamp columns correctly. The behavior is as follows:
LongType(), but will now be correctly maintained as timestamp values and be inferred as TimestampType(TimestampTimeZone.NTZ).TimestampType(TimestampTimeZone.NTZ) and loose timezone information but will now be correctly inferred as TimestampType(TimestampTimeZone.LTZ) and timezone information is retained correctly.PYTHON_SNOWPARK_USE_LOGICAL_TYPE_FOR_CREATE_DATAFRAME to revert back to old behavior. It is recommended that you update your code to align with correct behavior because the parameter will be removed in the future.Fixed a bug that DataFrame.to_pandas gets decimal type when scale is not 0, and creates an object dtype in pandas. Instead, we cast the value to a float64 type.
Fixed bugs that wrongly flattened the generated SQL when one of the following happens:
DataFrame.filter() is called after DataFrame.sort().limit().DataFrame.sort() or filter() is called on a DataFrame that already has a window function or sequence-dependent data generator column.
For instance, df.select("a", seq1().alias("b")).select("a", "b").sort("a") won't flatten the sort clause anymore.DataFrame.limit(). For instance, df.limit(10).select(row_number().over()) won't flatten the limit and select in the generated SQL.Fixed a bug where aliasing a DataFrame column raised an error when the DataFame was copied from another DataFrame with an aliased column. For instance,
df = df.select(col("a").alias("b"))
df = copy(df)
df.select(col("b").alias("c")) # threw an error. Now it's fixed.
Fixed a bug in Session.create_dataframe that the non-nullable field in a schema is not respected for boolean type. Note that this fix is only effective when the user has the privilege to create a temp table.
Fixed a bug in SQL simplifier where non-select statements in session.sql dropped a SQL query when used with limit().
Fixed a bug that raised an exception when session parameter ERROR_ON_NONDETERMINISTIC_UPDATE is true.
Behavior Changes (API Compatible)
to_pandas operation, we rely on GS precision value to fix precision issues for large integer values. This may affect users where a column that was earlier returned as int8 gets returned as int64. Users can fix this by explicitly specifying precision values for their return column.Session.call in case of table stored procedures where running Session.call would not trigger stored procedure unless a collect() operation was performed.StoredProcedureRegistration will now automatically add snowflake-snowpark-python as a package dependency. The added dependency will be on the client's local version of the library and an error is thrown if the server cannot support that version.snowflake.snowpark.functions:
from_utc_timestampto_utc_timestampAdd the conn_error attribute to SnowflakeSQLException that stores the whole underlying exception from snowflake-connector-python.
Added support for RelationalGroupedDataframe.pivot() to access pivot in the following pattern Dataframe.group_by(...).pivot(...).
Added experimental feature: Local Testing Mode, which allows you to create and operate on Snowpark Python DataFrames locally without connecting to a Snowflake account. You can use the local testing framework to test your DataFrame operations locally, on your development machine or in a CI (continuous integration) pipeline, before deploying code changes to your account.
Added support for arrays_to_object new functions in snowflake.snowpark.functions.
Added support for the vector data type.
Updated the cloudpickle dependency to cloudpickle==2.2.1.
Updated snowflake-connector-python to 3.4.0.
Fixed a bug where session.read.with_metadata creates an inconsistent table when doing df.write.save_as_table.
DataFrame.to_local_iterator().
Added support for input_names in UDTFRegistration.register/register_file and functions.pandas_udtf. By default, RelationalGroupedDataFrame.applyInPandas will infer the column names from the current dataframe schema.
Added sql_error_code and raw_message attributes to SnowflakeSQLException when it is caused by a SQL exception.
Fixed a bug in DataFrame.to_pandas() where converting snowpark dataframes to pandas dataframes was losing precision on integers with more than 19 digits.
Fixed a bug where session.add_packages could not handle requirement specifiers that contain a project name with an underscore and a version.
Fixed a bug in DataFrame.limit() when offset is used and the parent DataFrame uses limit. Now the offset won't impact the parent DataFrame's limit.
Fixed a bug in DataFrame.write.save_as_table where dataframes created from the read api could not save data into snowflake because of an invalid column name $1.
Behavior change in date_format: the format argument changed from optional to required.
When a data generator function (normal, zipf, uniform, seq1, seq2, seq4, seq8) is used, the sort and filter operation will no longer be flattened when generating the query.
typing-extensions.
Dataframe.writer.save_as_table does not need insert permission for writing tables.
Added support for PythonObjJSONEncoder json-serializable objects for ARRAY and OBJECT literals.
Added support for VOLATILE/IMMUTABLE keyword when registering UDFs.
Added support for specifying clustering keys when saving dataframes using DataFrame.save_as_table.
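A sketch of how this might look; the clustering_keys parameter name, table name and columns are assumptions for illustration rather than details taken from the entry above:
from snowflake.snowpark.functions import col

# Save the dataframe and cluster the resulting table on column A (names are hypothetical)
df.write.save_as_table("my_table", mode="overwrite", clustering_keys=[col("a")])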
Accept Iterable objects as input for schema when creating dataframes using Session.create_dataframe.
Added the property DataFrame.session to return a Session object.
Added the property Session.session_id to return an integer that represents session ID.
Added the property Session.connection to return a SnowflakeConnection object.
Added support for creating a Snowpark session from a configuration file or environment variables.
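For instance, if a default connection is already defined in a configuration file or through environment variables, a session can be created with no explicit parameters; a minimal sketch:
from snowflake.snowpark import Session

# Picks up the default connection from the configuration file or environment variables
session = Session.builder.create()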
Updated snowflake-connector-python to 3.2.0.
Fixed a bug where a ValueError was raised even when compatible package versions were added in session.add_packages.
register_from_file.
invalid_identifier error.
Fixed a bug where DataFrame.copy disables the SQL simplifier for the returned copy.
Fixed a bug where session.sql().select() would fail if any parameters are specified to session.sql().
Added support for external_access_integrations and secrets when creating a UDF, UDTF or Stored Procedure from Snowpark Python to allow integration with external access.
Added the following functions in snowflake.snowpark.functions: array_flatten, flatten.
Added support for apply_in_pandas in snowflake.snowpark.relational_grouped_dataframe.
Added Session.replicate_local_environment.
Fixed a bug where session.create_dataframe fails to properly set nullable columns when nullability was affected by the order or the data given.
Fixed a bug where DataFrame.select could not identify and alias columns in the presence of table functions when output columns of the table function overlapped with columns in the dataframe.
Creating UDFs, UDTFs or stored procedures with is_permanent=False will now create temporary objects even when stage_name is provided. The default value of is_permanent is False, which is why, if this value is not explicitly set to True for permanent objects, users will notice a change in behavior.
types.StructField now enquotes column identifiers by default.
Added the following functions in snowflake.snowpark.functions: array_sort, sort_array, array_min, array_max, explode_outer.
Added support for pure Python packages via Session.add_requirements or Session.add_packages. They are now usable in stored procedures and UDFs even if packages are not present on the Snowflake Anaconda channel.
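A small sketch of the pure Python package workflow described above; the package name is a placeholder and an existing session is assumed:
# Add a package that is not on the Snowflake Anaconda channel (placeholder name)
session.add_packages(["some_pure_python_package"])
# Or list everything in a requirements file
session.add_requirements("requirements.txt")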
Added parameters custom_packages_upload_enabled and custom_packages_force_upload_enabled to enable the support for the pure Python packages feature mentioned above. Both parameters default to False.
Session.add_requirements.
DataFrame.rename.
Added support for params in session.sql() in stored procedures.
Added support for timestamp variants (TIMESTAMP_NTZ, TIMESTAMP_LTZ, TIMESTAMP_TZ):
Added TimestampTimezone as an argument in the TimestampType constructor.
Added type hints NTZ, LTZ, TZ and Timestamp to annotate functions when registering UDFs.
typing-extensions.
DataFrame.cache_result now creates temp tables with fully qualified names under the current database and current schema.
numpy.ufunc.
Fixed a bug where DataFrame.union was not generating the correct Selectable.schema_query when the SQL simplifier is enabled.
DataFrameWriter.save_as_table now respects the nullable field of the schema provided by the user or the inferred schema based on data from user input.
Updated snowflake-connector-python to 3.0.4.
DataFrame.agg and DataFrame.describe no longer strip away non-printing characters from column names.
Added the following functions in snowflake.snowpark.functions: array_generate_range, array_unique_agg, collect_set, sequence.
Added support for stored procedures with a TABLE return type.
Added support for length in StringType() to specify the maximum number of characters that can be stored by the column.
Added the alias functions.element_at() for functions.get().
Added the alias Column.contains for functions.contains.
Added DataFrame.alias.
DataFrame using DataFrameReader.
Added support for StructType.add to append more fields to existing StructType objects.
Added support for the parameter execute_as in StoredProcedureRegistration.register_from_file() to specify stored procedure caller rights.
Fixed a bug where Dataframe.join_table_function did not run all of the necessary queries to set up the join table function when the SQL simplifier was enabled.
Fixed type hint declarations for ColumnOrName, ColumnOrLiteralStr, ColumnOrSqlExpr, LiteralType and ColumnOrLiteral that were breaking mypy checks.
Fixed a bug where DataFrameWriter.save_as_table and DataFrame.copy_into_table failed to parse fully qualified table names.
Added session.getOrCreate.
Added Column.getField.
Added the following functions in snowflake.snowpark.functions:
date_add and date_sub to make add and subtract operations easier.
daydiff
explode
array_distinct
regexp_extract
struct
format_number
bround
substring_index
Added parameter skip_upload_on_content_match when creating UDFs, UDTFs and stored procedures using register_from_file to skip uploading files to a stage if the same version of the files are already on the stage.
Added support for the DataFrameWriter.save_as_table method to take table names that contain dots.
The generated SQL is now flattened when DataFrame.filter() or DataFrame.order_by() is followed by a projection statement (e.g. DataFrame.select(), DataFrame.with_column()).
Added Dataframe.create_or_replace_dynamic_table.
Added optional argument params in session.sql() to support binding variables. Note that this is not supported in stored procedures yet.
Fixed a bug in strtok_to_array where an exception was thrown when a delimiter was passed in.
Fixed a bug in session.add_import where the module had the same namespace as other dependencies.
Added the delimiters parameter in functions.initcap().
Added support for functions.hash() to accept a variable number of input expressions.
Added API Session.RuntimeConfig for getting/setting/checking the mutability of any runtime configuration.
Added support for managing case sensitivity in Row results from DataFrame.collect using the case_sensitive parameter.
Added API Session.conf for getting, setting or checking the mutability of any runtime configuration.
Added support for managing case sensitivity in Row results from DataFrame.collect using the case_sensitive parameter.
snowflake.snowpark.types.StructType.
Added parameter log_on_exception to Dataframe.collect and Dataframe.collect_nowait to optionally disable error logging for SQL exceptions.
Fixed a bug where calling a DataFrame set operation (DataFrame.subtract, DataFrame.union, etc.) after another DataFrame set operation and DataFrame.select or DataFrame.with_column threw an exception.
Join queries now use constant subquery aliases (SNOWPARK_LEFT, SNOWPARK_RIGHT) by default. Users can disable this at runtime with session.conf.set('use_constant_subquery_alias', False) to use randomly generated alias names instead.
session.call().
source_code_display=False at registration.
Added keyword argument if_not_exists when creating a UDF, UDTF or Stored Procedure from Snowpark Python to ignore creating the specified function or procedure if it already exists.
Added function snowflake.snowpark.functions.get to extract value from array.
Added functions.reverse in functions to open access to the Snowflake built-in function reverse.
Added parameter require_scoped_url in snowflake.snowpark.files.SnowflakeFile.open() (in Private Preview) to replace is_owner_file, which is marked for deprecation.
paramstyle to qmark when creating a Snowpark session.
Fixed a bug where df.join(..., how="cross") fails with SnowparkJoinException: (1112): Unsupported using join type 'Cross'.
Fixed a bug where querying a DataFrame column created from chained function calls used a wrong column name.
Added asc, asc_nulls_first, asc_nulls_last, desc, desc_nulls_first, desc_nulls_last, date_part and unix_timestamp in functions.
Added DataFrame.dtypes to return a list of column name and data type pairs.
Added the following aliases: functions.expr() for functions.sql_expr(), functions.date_format() for functions.to_date(), functions.monotonically_increasing_id() for functions.seq8(), and functions.from_unixtime() for functions.to_timestamp().
Session parameter PYTHON_SNOWPARK_USE_SQL_SIMPLIFIER is True after Snowflake 7.3 was released. In snowpark-python, session.sql_simplifier_enabled reads the value of PYTHON_SNOWPARK_USE_SQL_SIMPLIFIER by default, meaning that the SQL simplifier is enabled by default after the Snowflake 7.3 release. To turn this off, set PYTHON_SNOWPARK_USE_SQL_SIMPLIFIER in Snowflake to False or run session.sql_simplifier_enabled = False from Snowpark. It is recommended to use the SQL simplifier because it helps to generate more concise SQL.
Added Session.generator() to create a new DataFrame using the Generator table function.
Added a parameter secure to the functions that create a secure UDF or UDTF.
Added Session.create_async_job() to create an AsyncJob instance from a query id.
AsyncJob.result() now accepts argument result_type to return the results in different formats.
AsyncJob.to_df() returns a DataFrame built from the result of this asynchronous job.
AsyncJob.query() returns the SQL text of the executed query.
DataFrame.agg() and RelationalGroupedDataFrame.agg() now accept variable-length arguments.
Added parameters lsuffix and rsuffix to DataFrame.join() and DataFrame.cross_join() to conveniently rename overlapping columns.
Added Table.drop_table() so you can drop the temp table after DataFrame.cache_result(). Table is also a context manager, so you can use the with statement to drop the cache temp table after use.
Added Session.use_secondary_roles().
Added functions first_value() and last_value(). (contributed by @chasleslr)
Added on as an alias for using_columns and how as an alias for join_type in DataFrame.join().
Fixed a bug in Session.create_dataframe() that raised an error when schema names had special characters.
Fixed a bug in which options set in Session.read.option() were not passed to DataFrame.copy_into_table() as default values.
Fixed a bug in which DataFrame.copy_into_table() raises an error when a copy option has single quotes in the value.
Session.add_packages() now raises ValueError when the version of a package cannot be found in Snowflake Anaconda channel. Previously, Session.add_packages() succeeded, and a SnowparkSQLException exception was raised later in the UDF/SP registration step.
Added method FileOperation.get_stream() to support downloading stage files as stream.
Added support for functions.ntiles() to accept int argument.
Added the following aliases: functions.call_function() for functions.call_builtin(), functions.function() for functions.builtin(), DataFrame.order_by() for DataFrame.sort(), and DataFrame.orderBy() for DataFrame.sort().
DataFrame.cache_result() now returns a more accurate Table class instead of a DataFrame class.
Added support for passing session as the first argument when calling StoredProcedure.
Improved the generated query when Session.sql_simplifier_enabled = True:
DataFrame.select(), DataFrame.with_column(), DataFrame.drop() and other select-related APIs have more flattened SQLs.
DataFrame.union(), DataFrame.union_all(), DataFrame.except_(), DataFrame.intersect(), DataFrame.union_by_name() have flattened SQLs generated when multiple set operators are chained.
Fixed a bug where Table.update(), Table.delete(), Table.merge() try to reference a temp table that does not exist.
Added parameter block to the following action APIs on Snowpark dataframes (which execute queries) to allow asynchronous evaluations:
DataFrame.collect(), DataFrame.to_local_iterator(), DataFrame.to_pandas(), DataFrame.to_pandas_batches(), DataFrame.count(), DataFrame.first().
DataFrameWriter.save_as_table(), DataFrameWriter.copy_into_location().
Table.delete(), Table.update(), Table.merge().
Added method DataFrame.collect_nowait() to allow asynchronous evaluations.
Added class AsyncJob to retrieve results from asynchronously executed queries and check their status.
Added support for table_type in Session.write_pandas(). You can now choose from these table_type options: "temporary", "temp", and "transient".
Added support for using Python structured data (list, tuple and dict) as literal values in Snowpark.
Added keyword argument execute_as to functions.sproc() and session.sproc.register() to allow registering a stored procedure as a caller or owner.
Fixed a bug where DataFrame.copy_into_table() and DataFrameWriter.save_as_table() mistakenly created a new table if the table name is fully qualified and the table already exists.
Deprecated create_temp_table in Session.write_pandas().
Updated snowflake-connector-python to 2.7.12.
Added support for displaying source code as comments in the generated scripts when registering UDFs. This feature is turned on by default. To turn it off, pass source_code_display as False when calling register() or @udf().
DataFrame.select(), DataFrame.with_column() and DataFrame.with_columns() now take parameters of type table_function.TableFunctionCall for columns.
Added parameter overwrite to session.write_pandas() to allow overwriting contents of a Snowflake table with that of a pandas DataFrame.
Added parameter column_order to df.write.save_as_table() to specify the matching rules when inserting data into a table in append mode.
Added method FileOperation.put_stream() to upload local files to a stage via file stream.
Added methods TableFunctionCall.alias() and TableFunctionCall.as_() to allow aliasing the names of columns that come from the output of table function joins.
Added function get_active_session() in module snowflake.snowpark.context to get the current active Snowpark session.
Fixed a bug in which batch insert should not raise an error when statement_params is not passed to the function.
Fixed a bug in which column names should be quoted when session.create_dataframe() is called with dicts and a given schema.
df.write.save_as_table().
Improved function.uniform() to infer the types of inputs max_ and min_ and cast the limits to IntegerType or FloatType correspondingly.
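A minimal sketch of the asynchronous evaluation APIs listed earlier (the block parameter, DataFrame.collect_nowait() and AsyncJob); an existing DataFrame df is assumed:
# Start the query without blocking; an AsyncJob is returned
job = df.collect_nowait()  # equivalent to df.collect(block=False)
if not job.is_done():
    print("query still running")
rows = job.result()  # blocks until the rows are available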
Added parameter statement_params to the following methods to allow for specifying statement level parameters:
collect, to_local_iterator, to_pandas, to_pandas_batches, count, copy_into_table, show, create_or_replace_view, create_or_replace_temp_view, first, cache_result and random_split on class snowflake.snowpark.DataFrame.
update, delete and merge on class snowflake.snowpark.Table.
save_as_table and copy_into_location on class snowflake.snowpark.DataFrameWriter.
approx_quantile, corr, cov and crosstab on class snowflake.snowpark.DataFrameStatFunctions.
register and register_from_file on class snowflake.snowpark.udf.UDFRegistration.
register and register_from_file on class snowflake.snowpark.udtf.UDTFRegistration.
register and register_from_file on class snowflake.snowpark.stored_procedure.StoredProcedureRegistration.
udf, udtf and sproc in snowflake.snowpark.functions.
Added support for Column as an input argument to session.call().
Added support for table_type in df.write.save_as_table(). You can now choose from these table_type options: "temporary", "temp", and "transient".
session.use_* methods.
session.create_dataframe().
Fixed a bug in which session.create_dataframe() mistakenly converted 0 and False to None when the input data was only a list.
Fixed a bug in which session.create_dataframe() using a large local dataset sometimes created a temp table twice.
Aligned the definition of function.trim() with the SQL function definition.
sum vs. the Snowpark function.sum().
Deprecated create_temp_table in df.write.save_as_table().
Added snowflake.snowpark.functions.udtf() to register a UDTF, or use it as a decorator to register the UDTF.
Added Session.udtf.register() to register a UDTF.
Added Session.udtf.register_from_file() to register a UDTF from a Python file.
Added snowflake.snowpark.functions.table_function() to create a callable representing a table function and use it to call the table function in a query.
Added snowflake.snowpark.functions.call_table_function() to call a table function.
Added support for an over clause that specifies partition by and order by when lateral joining a table function.
Updated Session.table_function() and DataFrame.join_table_function() to accept TableFunctionCall instances.
In functions.udf() and functions.sproc(), you can now specify an empty list for the imports or packages argument to indicate that no import or package is used for this UDF or stored procedure. Previously, specifying an empty list meant that the function would use session-level imports or packages.
Improved the __repr__ implementation of data types in types.py. The unused type_name property has been removed.
Added a Snowpark-specific exception class for SQL errors. It used to be ProgrammingError from the Python connector.
DataFrame.to_pandas().
Fixed a bug in which DataFrameReader.parquet() failed to read a parquet file when its column contained spaces.
Fixed a bug in which DataFrame.copy_into_table() failed when the dataframe is created by reading a file with inferred schemas.
Deprecated Session.flatten() and DataFrame.flatten().
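A minimal sketch of registering and calling a UDTF with the APIs listed above; the class name, output column and input value are made up, and an existing session is assumed:
from snowflake.snowpark.functions import lit, udtf
from snowflake.snowpark.types import IntegerType, StructField, StructType

@udtf(output_schema=StructType([StructField("number", IntegerType())]), input_types=[IntegerType()])
class GenerateNumbers:
    def process(self, n):
        # Emit one row per value from 0 to n-1
        for i in range(n):
            yield (i,)

# Call the UDTF as a table function
session.table_function(GenerateNumbers(lit(3))).show()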
Updated the dependency on cloudpickle to cloudpickle <= 2.0.0.
Added the following functions to snowflake.snowpark.functions: current_session(), current_statement(), current_user(), current_version(), current_warehouse(), date_from_parts(), date_trunc(), dayname(), dayofmonth(), dayofweek(), dayofyear(), grouping(), grouping_id(), hour(), last_day(), minute(), next_day(), previous_day(), second(), month(), monthname(), quarter(), year(), current_database(), current_role(), current_schema(), current_schemas(), current_region(), current_available_roles(), add_months(), any_value(), bitnot(), bitshiftleft(), bitshiftright(), convert_timezone(), uniform(), strtok_to_array(), sysdate(), time_from_parts(), timestamp_from_parts(), timestamp_ltz_from_parts(), timestamp_ntz_from_parts(), timestamp_tz_from_parts(), weekofyear(), percentile_cont().
DataFrame.groupByGroupingSets(), DataFrame.naturalJoin(), DataFrame.joinTableFunction, DataFrame.withColumns(), Session.getImports(), Session.addImport(), Session.removeImport(), Session.clearImports(), Session.getSessionStage(), Session.getDefaultDatabase(), Session.getDefaultSchema(), Session.getCurrentDatabase(), Session.getCurrentSchema(), Session.getFullyQualifiedCurrentSchema().
Added support for creating a DataFrame with a specific schema using the Session.create_dataframe() method.
Changed the logging level from INFO to DEBUG for several logs (e.g., the executed query) when evaluating a dataframe.
Session.create_dataframe() method.
Added typing-extension as a new dependency with the version >= 4.1.0.
Added the Session.sproc property and sproc() to snowflake.snowpark.functions, so you can register stored procedures.
Added Session.call to call stored procedures by name.
Added UDFRegistration.register_from_file() to allow registering UDFs from Python source files or zip files directly.
Added UDFRegistration.describe() to describe a UDF.
Added DataFrame.random_split() to provide a way to randomly split a dataframe.
Added functions md5(), sha1(), sha2(), ascii(), initcap(), length(), lower(), lpad(), ltrim(), rpad(), rtrim(), repeat(), soundex(), regexp_count(), replace(), charindex(), collate(), collation(), insert(), left(), right(), endswith() to snowflake.snowpark.functions.
Allowed call_udf() to accept literal values.
Added support for the distinct keyword in array_agg().
Fixed an issue that caused DataFrame.to_pandas() to have a string column if Column.cast(IntegerType()) was used.
Fixed a bug in DataFrame.describe() when there is more than one string column.
Added methods add_packages(), get_packages(), clear_packages(), and remove_package() to class Session.
Added add_requirements() to Session so you can use a requirements file to specify which packages this session will use.
Added parameter packages to function snowflake.snowpark.functions.udf() and method UserDefinedFunction.register() to indicate UDF-level Anaconda package dependencies when creating a UDF.
Added parameter imports to snowflake.snowpark.functions.udf() and UserDefinedFunction.register() to specify UDF-level code imports.
Added parameter session to function udf() and UserDefinedFunction.register() so you can specify which session to use to create a UDF if you have multiple sessions.
Added types Geography and Variant to snowflake.snowpark.types to be used as type hints for Geography and Variant data when defining a UDF.
Added Table, a subclass of DataFrame for table operations:
Methods update and delete update and delete rows of a table in Snowflake.
Method merge merges data from a DataFrame to a Table.
Overrides DataFrame.sample() with an additional parameter seed, which works on tables but not on views and sub-queries.
Added DataFrame.to_local_iterator() and DataFrame.to_pandas_batches() to allow getting results from an iterator when the result set returned from the Snowflake database is too large.
Added DataFrame.cache_result() for caching the operations performed on a DataFrame in a temporary table. Subsequent operations on the original DataFrame have no effect on the cached result DataFrame.
Added property DataFrame.queries to get the SQL queries that will be executed to evaluate the DataFrame.
Added Session.query_history() as a context manager to track SQL queries executed on a session, including all SQL queries to evaluate DataFrames created from a session. Both query ID and query text are recorded.
You can now create a Session instance from an existing established snowflake.connector.SnowflakeConnection. Use parameter connection in Session.builder.configs().
Added use_database(), use_schema(), use_warehouse(), and use_role() to class Session to switch database/schema/warehouse/role after a session is created.
Added DataFrameWriter.copy_into_location() to unload a DataFrame to stage files.
Added DataFrame.unpivot().
Added Column.within_group() for sorting the rows by columns with some aggregation functions.
Added functions listagg(), mode(), div0(), acos(), asin(), atan(), atan2(), cos(), cosh(), sin(), sinh(), tan(), tanh(), degrees(), radians(), round(), trunc(), and factorial() to snowflake.snowpark.functions.
Added an optional argument ignore_nulls in function lead() and lag().
The condition parameter of function when() and iff() now accepts SQL expressions.
Removed the following APIs from Session and replaced them with their snake case equivalents: getImports(), addImports(), removeImport(), clearImports(), getSessionStage(), getDefaultDatabase(), getDefaultSchema(), getCurrentDatabase(), getFullyQualifiedCurrentSchema().
Removed the following APIs from DataFrame and replaced them with their snake case equivalents: groupingByGroupingSets(), naturalJoin(), withColumns(), joinTableFunction().
The property DataFrame.columns is now consistent with DataFrame.schema.names and the Snowflake database Identifier Requirements.
Column.__bool__() now raises a TypeError. This will ban the use of logical operators and, or, not on Column object, for instance col("a") > 1 and col("b") > 2 will raise the TypeError. Use (col("a") > 1) & (col("b") > 2) instead.
Changed PutResult and GetResult to subclass NamedTuple.
Changed DataFrame.describe() so that non-numeric and non-string columns are ignored instead of raising an exception.
Updated snowflake-connector-python to 2.7.4.
Added Column.isin(), with an alias Column.in_().
Added Column.try_cast(), which is a special version of cast(). It tries to cast a string expression to other types and returns null if the cast is not possible.
Added Column.startswith() and Column.substr() to process string columns.
Column.cast() now also accepts a str value to indicate the cast type in addition to a DataType instance.
Added DataFrame.describe() to summarize stats of a DataFrame.
Added DataFrame.explain() to print the query plan of a DataFrame.
DataFrame.filter() and DataFrame.select_expr() now accept a sql expression.
Added a bool parameter create_temp_table to methods DataFrame.saveAsTable() and Session.write_pandas() to optionally create a temp table.
Added DataFrame.minus() and DataFrame.subtract() as aliases to DataFrame.except_().
Added regexp_replace(), concat(), concat_ws(), to_char(), current_timestamp(), current_date(), current_time(), months_between(), cast(), try_cast(), greatest(), least(), and hash() to module snowflake.snowpark.functions.
Fixed an issue where Session.createDataFrame(pandas_df) and Session.write_pandas(pandas_df) raise an exception when the pandas DataFrame has spaces in the column name.
DataFrame.copy_into_table() sometimes prints an error level log entry while it actually works. It's fixed now.
Fixed an API docs issue where some DataFrame APIs are missing from the docs.
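A short sketch of the Session.query_history() context manager mentioned above; an existing session is assumed:
# Track the SQL issued while evaluating a DataFrame
with session.query_history() as history:
    session.create_dataframe([[1, 2]], schema=["a", "b"]).collect()
# Each record carries the query id and the SQL text that was executed
for record in history.queries:
    print(record.query_id, record.sql_text)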
Updated snowflake-connector-python to 2.7.2, which upgrades the pyarrow dependency to 6.0.x. Refer to the python connector 2.7.2 release notes for more details.
Added the Session.createDataFrame() method for creating a DataFrame from a pandas DataFrame.
Added the Session.write_pandas() method for writing a pandas DataFrame to a table in Snowflake and getting a Snowpark DataFrame object back.
Added new functions cume_dist(), to find the cumulative distribution of a value with regard to other values within a window partition, and row_number(), which returns a unique row number for each row within a window partition.
Added the DataFrameStatFunctions class.
Added the DataFrameNaFunctions class.
Added new methods rollup(), cube(), and pivot() to the DataFrame class.
Added the GroupingSets class, which you can use with the DataFrame groupByGroupingSets method to perform a SQL GROUP BY GROUPING SETS.
Added the new FileOperation(session) class that you can use to upload and download files to and from a stage.
Added the DataFrame.copy_into_table() method for loading data from files in a stage into a table.
In CASE expressions, the functions when() and otherwise() now accept Python types in addition to Column objects.
When you register a UDF you can now optionally set the replace parameter to True to overwrite an existing UDF with the same name.
Fixed an issue where df.select(when(col("a") == 1, 4).otherwise(col("a"))), [Row(4), Row(2), Row(3)] raised an exception.
Fixed an issue where df.toPandas() raised an exception when a DataFrame was created from large local data.
Start of Private Preview