Using Data Adapters
This is an index of all the available data adapters, both savers and loaders. Some savers and loaders share an implementation (certain classes can handle both), while others are separate. You will want to reference this when doing either of the following:
- Using load_from (or dataloader, if you just want to expose metadata).
- Using materializers.
To read these tables, you want to first look at the key to determine which format you want – these should be human-readable and familiar to you. Then you’ll want to look at the types field to figure out which is the best for your case (the object you want to load from or save to).
Finally, look up the adapter params to see what parameters you can pass to the data adapters. The optional params come with their default value specified.
If you want more information, click on the module. It links to the code that implements the adapter, where you can see how the parameters are used.
As an example, say we wanted to save a pandas dataframe to a CSV file. We would first find the key csv, which would inform us that we want to call save_to.csv (or to.csv in the case of materialize). Then, we would look at the types field, finding that there is a pandas dataframe adapter. Finally, we would look at the params field, finding that we can pass path, and (optionally) sep (which we’d realize defaults to , when looking at the code).
All together, we’d end up with:
import pandas as pd
from hamilton.function_modifiers import value, save_to

@save_to.csv(path=value("my_file.csv"))
def my_data(...) -> pd.DataFrame:
    ...
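To make the required/optional distinction in the params column concrete, here is a minimal sketch that models an adapter's parameters as a dataclass and splits them into required names and optional names with defaults. The class CsvSaverParams and the helper split_params are hypothetical names made up for illustration; they are not part of Hamilton, and the field list only mirrors the path, sep, and index entries shown for the pandas CSV saver.

```python
import dataclasses
from typing import Any


@dataclasses.dataclass
class CsvSaverParams:
    """Hypothetical parameter list for a CSV saver (illustrative, not Hamilton's class)."""

    path: str  # required: appears in the table without a default
    sep: str = ","  # optional: appears in the table with its default
    index: bool = False  # optional: appears in the table with its default


def split_params(cls: type) -> tuple[list[str], dict[str, Any]]:
    """Return (required parameter names, optional parameters with their defaults)."""
    required: list[str] = []
    optional: dict[str, Any] = {}
    for field in dataclasses.fields(cls):
        if field.default is not dataclasses.MISSING:
            optional[field.name] = field.default
        elif field.default_factory is not dataclasses.MISSING:
            optional[field.name] = field.default_factory()
        else:
            required.append(field.name)
    return required, optional


required, optional = split_params(CsvSaverParams)
print("required:", required)
print("optional:", optional)
```

Read this way, a table entry with no default is something you must pass, and an entry of the form name Type=default can be omitted.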
For a less “abstracted” approach, where you just expose metadata from saving and loading, you can annotate your saving/loading functions to do so. For example, analogous to the above, you could do:
import pandas as pd
from hamilton.function_modifiers import datasaver

def my_data(...) -> pd.DataFrame:
    # your function
    ...
    return _df  # return some df

@datasaver
def my_data_saver(my_data: pd.DataFrame, path: str) -> dict:
    # code to save my_data
    return {"path": path, "type": "csv", ...}  # add other metadata
See dataloader for more information on how to load data and expose metadata in this lighter-weight way.
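As a rough illustration of what the bodies of such functions might look like, the sketch below writes and reads a CSV file with only the standard library and returns a metadata dict alongside. It deliberately leaves out the Hamilton decorators and uses a list of dicts in place of a pandas dataframe so that it runs on its own. The function names save_rows and load_rows, the extra metadata keys, and the assumption that a loader returns the data together with its metadata are ours, not taken from Hamilton.

```python
import csv
import os
import tempfile


def save_rows(rows: list[dict], path: str) -> dict:
    """Write rows to a CSV file and return metadata describing what was saved."""
    fieldnames = list(rows[0].keys()) if rows else []
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    # Metadata in the same spirit as {"path": path, "type": "csv", ...}
    return {
        "path": path,
        "type": "csv",
        "rows": len(rows),
        "size_bytes": os.path.getsize(path),
    }


def load_rows(path: str) -> tuple[list[dict], dict]:
    """Read a CSV file and return (data, metadata)."""
    with open(path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    return rows, {"path": path, "type": "csv", "rows": len(rows)}


# Small round trip in a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "my_file.csv")
    save_metadata = save_rows([{"a": "1", "b": "2"}], csv_path)
    loaded_rows, load_metadata = load_rows(csv_path)
    print(save_metadata["type"], load_metadata["rows"])
```

In a real dataflow you would decorate the saving function as shown above and type its first argument as the object produced upstream.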
If you want to extend the @save_to or @load_from decorators, see Using Data Adapters for documentation and the example in the repository for how to do so.
Note that you will need to call registry.register_adapters (or import a module that does that) prior to dynamically referring to these in the code – otherwise we won’t know about them, and won’t be able to access that key!
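To see why registration has to happen first, consider this toy registry. It is a sketch of the general idea only (a key maps to a list of adapters, and the adapter is then chosen by the type of the object), not Hamilton's actual implementation. All names in it (register_saver, resolve_saver, dict_to_json, text_to_json) are hypothetical.

```python
import json
from typing import Any, Callable

# Hypothetical registry: key (e.g. "json") -> list of (applicable type, saver function).
_SAVERS: dict[str, list[tuple[type, Callable[[Any], str]]]] = {}


def register_saver(key: str, applicable_type: type, saver: Callable[[Any], str]) -> None:
    """Make a saver discoverable under the given key."""
    _SAVERS.setdefault(key, []).append((applicable_type, saver))


def resolve_saver(key: str, obj: Any) -> Callable[[Any], str]:
    """Pick the first saver registered under key whose type matches obj."""
    if key not in _SAVERS:
        raise KeyError(f"no adapter registered under key {key!r}; register it first")
    for applicable_type, saver in _SAVERS[key]:
        if isinstance(obj, applicable_type):
            return saver
    raise TypeError(f"no {key!r} adapter accepts objects of type {type(obj).__name__}")


def dict_to_json(obj: dict) -> str:
    return json.dumps(obj)


def text_to_json(obj: str) -> str:
    return json.dumps({"text": obj})


# Until these calls run, resolve_saver("json", ...) raises KeyError.
register_saver("json", dict, dict_to_json)
register_saver("json", str, text_to_json)

print(resolve_saver("json", {"a": 1})({"a": 1}))
```

The same key can therefore serve several object types, which is why the tables below list some keys more than once.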
Data Loaders
| key | loader params | types | module |
|---|---|---|---|
| json | str | dict, list | hamilton.io.default_data_loaders |
| json | Union; chunksize Optional=None; compression Union=infer; convert_axes Optional=None; convert_dates Union=True; date_unit Optional=None; dtype Union=None; dtype_backend Optional=None; encoding Optional=None; encoding_errors Optional=strict; engine str=ujson; keep_default_dates bool=True; lines bool=False; nrows Optional=None; orient Optional=None; precise_float bool=False; storage_options Optional=None; typ str=frame | DataFrame | hamilton.plugins.pandas_extensions |
| json | Union; schema Union=None; schema_overrides Union=None | DataFrame | hamilton.plugins.polars_post_1_0_0_extensions |
| json | Union | XGBModel, Booster | hamilton.plugins.xgboost_extensions |
| literal | Any | Any | hamilton.io.default_data_loaders |
| file | str; encoding str=utf-8 | str | hamilton.io.default_data_loaders |
| file | Union | LGBMModel, Booster, CVBooster | hamilton.plugins.lightgbm_extensions |
| pickle | str | object, Any | hamilton.io.default_data_loaders |
| pickle | Union=None; path Union=None; compression Union=infer; storage_options Optional=None | DataFrame | hamilton.plugins.pandas_extensions |
| environment | Tuple | dict | hamilton.io.default_data_loaders |
| yaml | Union | str, int, float, bool, dict, list | hamilton.plugins.yaml_extensions |
| npy | Union; mmap_mode Optional=None; allow_pickle Optional=None; fix_imports Optional=None; encoding Literal=ASCII | ndarray | hamilton.plugins.numpy_extensions |
| csv | Union; sep Optional=,; delimiter Optional=None; header Union=infer; names Optional=None; index_col Union=None; usecols Union=None; dtype Union=None; engine Optional=None; converters Optional=None; true_values Optional=None; false_values Optional=None; skipinitialspace Optional=False; skiprows Union=None; skipfooter int=0; nrows Optional=None; na_values Union=None; keep_default_na bool=True; na_filter bool=True; verbose bool=False; skip_blank_lines bool=True; parse_dates Union=False; keep_date_col bool=False; date_format Optional=None; dayfirst bool=False; cache_dates bool=True; iterator bool=False; chunksize Optional=None; compression Union=infer; thousands Optional=None; decimal str=.; lineterminator Optional=None; quotechar Optional=None; quoting int=0; doublequote bool=True; escapechar Optional=None; comment Optional=None; encoding str=utf-8; encoding_errors Union=strict; dialect Union=None; on_bad_lines Union=error; delim_whitespace bool=False; low_memory bool=True; memory_map bool=False; float_precision Optional=None; storage_options Optional=None; dtype_backend Literal=numpy_nullable | DataFrame | hamilton.plugins.pandas_extensions |
| csv | Union; has_header bool=True; include_header bool=True; columns Union=None; new_columns Sequence=None; separator str=,; comment_char str=None; quote_char str="; skip_rows int=0; dtypes Union=None; null_values Union=None; missing_utf8_is_empty_string bool=False; ignore_errors bool=False; try_parse_dates bool=False; n_threads int=None; infer_schema_length int=100; batch_size int=8192; n_rows int=None; encoding Union=utf8; low_memory bool=False; rechunk bool=True; use_pyarrow bool=False; storage_options Dict=None; skip_rows_after_header int=0; row_count_name str=None; row_count_offset int=0; sample_size int=1024; eol_char str=\n; raise_if_empty bool=True | DataFrame | hamilton.plugins.polars_post_1_0_0_extensions |
| csv | Union; has_header bool=True; columns Union=None; new_columns Sequence=None; separator str=,; comment_char str=None; quote_char str="; skip_rows int=0; dtypes Union=None; null_values Union=None; missing_utf8_is_empty_string bool=False; ignore_errors bool=False; try_parse_dates bool=False; n_threads int=None; infer_schema_length int=100; batch_size int=8192; n_rows int=None; encoding Union=utf8; low_memory bool=False; rechunk bool=True; use_pyarrow bool=False; storage_options Dict=None; skip_rows_after_header int=0; row_count_name str=None; row_count_offset int=0; eol_char str=\n; raise_if_empty bool=True | LazyFrame | hamilton.plugins.polars_lazyframe_extensions |
| csv | SparkSession; path str; header bool=True; sep str=, | DataFrame | hamilton.plugins.spark_extensions |
| parquet | Union; engine Literal=auto; columns Optional=None; storage_options Optional=None; use_nullable_dtypes bool=False; dtype_backend Literal=numpy_nullable; filesystem Optional=None; filters Union=None | DataFrame | hamilton.plugins.pandas_extensions |
| parquet | Union; columns Union=None; n_rows int=None; use_pyarrow bool=False; memory_map bool=True; storage_options Dict=None; parallel Any=auto; row_count_name str=None; row_count_offset int=0; low_memory bool=False; pyarrow_options Dict=None; use_statistics bool=True; rechunk bool=True | DataFrame | hamilton.plugins.polars_post_1_0_0_extensions |
| parquet | Union; columns Union=None; n_rows int=None; use_pyarrow bool=False; memory_map bool=True; storage_options Dict=None; parallel Any=auto; row_count_name str=None; row_count_offset int=0; low_memory bool=False; use_statistics bool=True; rechunk bool=True | LazyFrame | hamilton.plugins.polars_lazyframe_extensions |
| parquet | SparkSession; path str | DataFrame | hamilton.plugins.spark_extensions |
| sql | str; db_connection Union; chunksize Optional=None; coerce_float bool=True; columns Optional=None; dtype Union=None; dtype_backend Optional=None; index_col Union=None; params Union=None; parse_dates Union=None | DataFrame | hamilton.plugins.pandas_extensions |
| xml | Union; xpath Optional=./*; namespace Optional=None; elems_only Optional=False; attrs_only Optional=False; names Optional=None; dtype Optional=None; converters Optional=None; parse_dates Union=False; encoding Optional=utf-8; parser str=lxml; stylesheet Union=None; iterparse Optional=None; compression Union=infer; storage_options Optional=None; dtype_backend str=numpy_nullable | DataFrame | hamilton.plugins.pandas_extensions |
| html | Union; match Optional=.+; flavor Union=None; header Union=None; index_col Union=None; skiprows Union=None; attrs Optional=None; parse_dates Optional=None; thousands Optional=,; encoding Optional=None; decimal str=.; converters Optional=None; na_values Iterable=None; keep_default_na bool=True; displayed_only bool=True; extract_links Optional=None; dtype_backend Literal=numpy_nullable; storage_options Optional=None | DataFrame | hamilton.plugins.pandas_extensions |
| stata | Union; convert_dates bool=True; convert_categoricals bool=True; index_col Optional=None; convert_missing bool=False; preserve_dtypes bool=True; columns Optional=None; order_categoricals bool=True; chunksize Optional=None; iterator bool=False; compression Union=infer; storage_options Optional=None | DataFrame | hamilton.plugins.pandas_extensions |
| feather | Union; columns Optional=None; use_threads bool=True; storage_options Optional=None; dtype_backend Literal=numpy_nullable | DataFrame | hamilton.plugins.pandas_extensions |
| feather | Union; columns Union=None; n_rows Optional=None; use_pyarrow bool=False; memory_map bool=True; storage_options Optional=None; row_count_name Optional=None; row_count_offset int=0; rechunk bool=True | DataFrame | hamilton.plugins.polars_post_1_0_0_extensions |
| feather | Union; columns Union=None; n_rows Optional=None; use_pyarrow bool=False; memory_map bool=True; storage_options Optional=None; row_count_name Optional=None; row_count_offset int=0; rechunk bool=True | LazyFrame | hamilton.plugins.polars_lazyframe_extensions |
| orc | Union; columns Optional=None; dtype_backend Literal=numpy_nullable; filesystem Union=None | DataFrame | hamilton.plugins.pandas_extensions |
| excel | Union=None; sheet_name Union=0; header Union=0; names Optional=None; index_col Union=None; usecols Union=None; dtype Union=None; engine Optional=None; converters Union=None; true_values Optional=None; false_values Optional=None; skiprows Union=None; nrows Optional=None; keep_default_na bool=True; na_filter bool=True; verbose bool=False; parse_dates Union=False; date_format Union=None; thousands Optional=None; decimal str=.; comment Optional=None; skipfooter int=0; storage_options Optional=None; dtype_backend Literal=numpy_nullable; engine_kwargs Optional=None | DataFrame | hamilton.plugins.pandas_extensions |
| table | Union; sep Optional=None; delimiter Optional=None; header Union=infer; names Optional=None; index_col Union=None; usecols Optional=None; dtype Union=None; engine Optional=None; converters Optional=None; true_values Optional=None; false_values Optional=None; skipinitialspace bool=False; skiprows Union=None; skipfooter int=0; nrows Optional=None; na_values Union=None; keep_default_na bool=True; na_filter bool=True; verbose bool=False; skip_blank_lines bool=True; parse_dates Union=False; infer_datetime_format bool=False; keep_date_col bool=False; date_parser Optional=None; date_format Optional=None; dayfirst bool=False; cache_dates bool=True; iterator bool=False; chunksize Optional=None; compression Union=infer; thousands Optional=None; decimal str=.; lineterminator Optional=None; quotechar Optional="; quoting int=0; doublequote bool=True; escapechar Optional=None; comment Optional=None; encoding Optional=None; encoding_errors Optional=strict; dialect Optional=None; on_bad_lines Union=error; delim_whitespace bool=False; low_memory bool=True; memory_map bool=False; float_precision Optional=None; storage_options Optional=None; dtype_backend Literal=numpy_nullable | DataFrame | hamilton.plugins.pandas_extensions |
| fwf | Union; colspecs Union=infer; widths Optional=None; infer_nrows int=100; dtype_backend Literal=numpy_nullable | DataFrame | hamilton.plugins.pandas_extensions |
| spss | Union; usecols Union=None; convert_categoricals bool=True; dtype_backend Literal=numpy_nullable | DataFrame | hamilton.plugins.pandas_extensions |
| avro | Union; columns Union=None; n_rows Optional=None | DataFrame | hamilton.plugins.polars_post_1_0_0_extensions |
| database | str; connection Union; iter_batches bool=False; batch_size Optional=None; schema_overrides Optional=None; infer_schema_length Optional=None; execute_options Optional=None | DataFrame | hamilton.plugins.polars_post_1_0_0_extensions |
| spreadsheet | Union; sheet_id Union=None; sheet_name Union=None; engine Literal=xlsx2csv; engine_options Optional=None; read_options Optional=None; schema_overrides Optional=None; raise_if_empty bool=True | DataFrame | hamilton.plugins.polars_post_1_0_0_extensions |
| dlt | DltResource | DataFrame | hamilton.plugins.dlt_extensions |
| mlflow | Optional=None; mode Literal=tracking; run_id Optional=None; path Union=model; model_name Optional=None; version Union=None; version_alias Optional=None; flavor Union=None; mlflow_kwargs Dict=None | Any | hamilton.plugins.mlflow_extensions |
Data Savers
| key | saver params | types | module |
|---|---|---|---|
| json | str | dict, list | hamilton.io.default_data_loaders |
| json | Union; compression str=infer; date_format str=epoch; date_unit str=ms; default_handler Optional=None; double_precision int=10; force_ascii bool=True; index Optional=None; indent int=0; lines bool=False; mode str=w; orient Optional=None; storage_options Optional=None | DataFrame | hamilton.plugins.pandas_extensions |
| json | Union | DataFrame, LazyFrame | hamilton.plugins.polars_post_1_0_0_extensions |
| json | Union | XGBModel, Booster | hamilton.plugins.xgboost_extensions |
| file | str; encoding str=utf-8 | str | hamilton.io.default_data_loaders |
| file | Union | bytes, BytesIO | hamilton.io.default_data_loaders |
| file | Union; num_iteration Optional=None; start_iteration int=0; importance_type Literal=split | LGBMModel, Booster, CVBooster | hamilton.plugins.lightgbm_extensions |
| pickle | str | object | hamilton.io.default_data_loaders |
| pickle | Union; compression Union=infer; protocol int=5; storage_options Optional=None | DataFrame | hamilton.plugins.pandas_extensions |
| memory | | Any | hamilton.io.default_data_loaders |
| yaml | Union | str, int, float, bool, dict, list | hamilton.plugins.yaml_extensions |
| plt | Union; dpi Union=None; format Optional=None; metadata Optional=None; bbox_inches Union=None; pad_inches Union=None; facecolor Union=None; edgecolor Union=None; backend Optional=None; orientation Optional=None; papertype Optional=None; transparent Optional=None; bbox_extra_artists Optional=None; pil_kwargs Optional=None | Figure | hamilton.plugins.matplotlib_extensions |
| npy | Union; allow_pickle Optional=None; fix_imports Optional=None | ndarray | hamilton.plugins.numpy_extensions |
| csv | Union; sep Optional=,; na_rep str= (empty string); float_format Union=None; columns Optional=None; header Union=True; index Optional=False; index_label Union=None; mode str=w; encoding Optional=None; compression Union=infer; quoting Optional=None; quotechar Optional="; lineterminator Optional=None; chunksize Optional=None; date_format Optional=None; doublequote bool=True; escapechar Optional=None; decimal str=.; errors str=strict; storage_options Optional=None | DataFrame | hamilton.plugins.pandas_extensions |
| csv | Union; include_header bool=True; separator str=,; line_terminator str=\n; quote_char str="; batch_size int=1024; datetime_format str=None; date_format str=None; time_format str=None; float_precision int=None; null_value str=None; quote_style Type=None | DataFrame, LazyFrame | hamilton.plugins.polars_post_1_0_0_extensions |
| parquet | Union; engine Literal=auto; compression Optional=snappy; index Optional=None; partition_cols Optional=None; storage_options Optional=None; extra_kwargs Optional=None | DataFrame | hamilton.plugins.pandas_extensions |
| parquet | Union; compression Any=zstd; compression_level int=None; statistics bool=False; row_group_size int=None; use_pyarrow bool=False; pyarrow_options Dict=None | DataFrame, LazyFrame | hamilton.plugins.polars_post_1_0_0_extensions |
| sql | str; db_connection Any; chunksize Optional=None; dtype Union=None; if_exists str=fail; index bool=True; index_label Union=None; method Union=None; schema Optional=None | DataFrame | hamilton.plugins.pandas_extensions |
| xml | Union; index bool=True; root_name str=data; row_name str=row; na_rep Optional=None; attr_cols Optional=None; elems_cols Optional=None; namespaces Optional=None; prefix Optional=None; encoding str=utf-8; xml_declaration bool=True; pretty_print bool=True; parser str=lxml; stylesheet Union=None; compression Union=infer; storage_options Optional=None | DataFrame | hamilton.plugins.pandas_extensions |
| html | Union=None; columns Optional=None; col_space Union=None; header Optional=True; index Optional=True; na_rep Optional=NaN; formatters Union=None; float_format Optional=None; sparsify Optional=True; index_names Optional=True; justify str=None; max_rows Optional=None; max_cols Optional=None; show_dimensions bool=False; decimal str=.; bold_rows bool=True; classes Union=None; escape Optional=True; notebook Literal=False; border int=None; table_id Optional=None; render_links bool=False; encoding Optional=utf-8 | DataFrame | hamilton.plugins.pandas_extensions |
| stata | Union=None; convert_dates Optional=None; write_index bool=True; byteorder Optional=None; time_stamp Optional=None; data_label Optional=None; variable_labels Optional=None; version Literal=114; convert_strl Optional=None; compression Union=infer; storage_options Optional=None; value_labels Optional=None | DataFrame | hamilton.plugins.pandas_extensions |
| feather | Union; dest Optional=None; compression Literal=None; compression_level Optional=None; chunksize Optional=None; version Optional=2 | DataFrame | hamilton.plugins.pandas_extensions |
| feather | Union=None; compression Type=uncompressed | DataFrame, LazyFrame | hamilton.plugins.polars_post_1_0_0_extensions |
| orc | Union; engine Literal=pyarrow; index Optional=None; engine_kwargs Optional=None | DataFrame | hamilton.plugins.pandas_extensions |
| excel | Union; sheet_name str=Sheet1; na_rep str= (empty string); float_format Optional=None; columns Optional=None; header Union=True; index bool=True; index_label Union=None; startrow int=0; startcol int=0; engine Optional=None; merge_cells bool=True; inf_rep str=inf; freeze_panes Optional=None; storage_options Optional=None; engine_kwargs Optional=None; mode Optional=w; if_sheet_exists Optional=None; datetime_format str=None; date_format str=None | DataFrame | hamilton.plugins.pandas_extensions |
| avro | Union; compression Any=uncompressed | DataFrame, LazyFrame | hamilton.plugins.polars_post_1_0_0_extensions |
| database | str; connection Union; if_table_exists Literal=fail; engine Literal=sqlalchemy | DataFrame, LazyFrame | hamilton.plugins.polars_post_1_0_0_extensions |
| spreadsheet | Union; worksheet Optional=None; position Union=A1; table_style Union=None; table_name Optional=None; column_formats Optional=None; dtype_formats Optional=None; conditional_formats Optional=None; header_format Optional=None; column_totals Union=None; column_widths Union=None; row_totals Union=None; row_heights Union=None; sparklines Optional=None; formulas Optional=None; float_precision int=3; include_header bool=True; autofilter bool=True; autofit bool=False; hidden_columns Union=None; hide_gridlines bool=None; sheet_zoom Optional=None; freeze_panes Union=None | DataFrame, LazyFrame | hamilton.plugins.polars_post_1_0_0_extensions |
| png | Union; dpi float=200; format str=png; metadata Optional=None; bbox_inches str=None; pad_inches float=0.1; backend Optional=None; papertype str=None; transparent bool=None; bbox_extra_artists Optional=None; pil_kwargs Optional=None | ConfusionMatrixDisplay, DetCurveDisplay, PrecisionRecallDisplay, PredictionErrorDisplay, RocCurveDisplay, DecisionBoundaryDisplay, LearningCurveDisplay, PartialDependenceDisplay, ValidationCurveDisplay, Figure | hamilton.plugins.sklearn_plot_extensions |
| dlt | Pipeline; table_name str; primary_key Optional=None; write_disposition Optional=None; columns Optional=None; schema Optional=None; loader_file_format Optional=None | Iterable, DataFrame, Table, RecordBatch | hamilton.plugins.dlt_extensions |
| mlflow | Union=model; register_as Optional=None; flavor Union=None; run_id Optional=None; mlflow_kwargs Dict=None | Any | hamilton.plugins.mlflow_extensions |