dask.dataframe.read_sql_table
- dask.dataframe.read_sql_table(table, uri, index_col, divisions=None, npartitions=None, limits=None, columns=None, bytes_per_chunk='256 MiB', head_rows=5, schema=None, meta=None, engine_kwargs=None, **kwargs)[source]
Create dataframe from an SQL table.
If neither divisions nor npartitions is given, the memory footprint of the first few rows will be determined, and partitions of size ~256 MiB will be used.
- Parameters
- table: string or sqlalchemy expression
Select columns from here.
- uri: string
Full sqlalchemy URI for the database connection
- index_col: string
Column which becomes the index, and defines the partitioning. Should be an indexed column in the SQL server, and of any orderable type. If the type is number or time, then partition boundaries can be inferred from npartitions or bytes_per_chunk; otherwise explicit divisions= must be supplied.
index_col can also be a sqlalchemy expression returning a value, e.g., index_col=sql.func.abs(sql.column("value")).label("abs(value)"), or index_col=cast(sql.column("id"), types.BigInteger).label("id") to convert the text field id to BigInteger (see the Examples below).
Note that sql, cast and types come from the sqlalchemy module. Labeling columns created by functions or arithmetic operations is required.
- divisions: sequence
Values of the index column to split the table by. If given, this will override npartitions and bytes_per_chunk. The divisions are the value boundaries of the index column used to define the partitions. For example, divisions=list('acegikmoqsuwz') could be used to partition a string column lexicographically into 12 partitions, with the implicit assumption that each partition contains a similar number of records (see the Examples below).
- npartitions: int
Number of partitions, if divisions is not given. Will split the values of the index column linearly between limits, if given, or the column max/min. The index column must be numeric or time for this to work.
- limits: 2-tuple or None
Manually give upper and lower range of values for use with npartitions; if None, first fetches max/min from the DB. Upper limit, if given, is inclusive.
- columns: list of strings or None
Which columns to select; if None, gets all; can include sqlalchemy functions, e.g., sql.func.abs(sql.column('value')).label('abs(value)'). Labeling columns created by functions or arithmetic operations is recommended (see the Examples below).
- bytes_per_chunk: str or int
If both divisions and npartitions are None, this is the target size of each partition, in bytes.
- head_rows: int
How many rows to load for inferring the data-types, unless passing meta
- meta: empty DataFrame or None
If provided, do not attempt to infer dtypes, but use these, coercing all chunks on load
- schema: str or None
If using a table name, pass this to sqlalchemy to select which DB schema to use within the URI connection
- engine_kwargs: dict or None
Specific db engine parameters for sqlalchemy
- kwargs: dict
Additional parameters to pass to pd.read_sql()
- Returns
- dask.dataframe
Examples
>>> df = dd.read_sql_table('accounts', 'sqlite:///path/to/bank.db',
...                        npartitions=10, index_col='id')
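The index can also be a labeled sqlalchemy expression rather than a plain column name. A minimal sketch, reusing the hypothetical bank.db above and assuming its id column is stored as text, so it is cast to BigInteger for partitioning:

>>> from sqlalchemy import cast, sql, types
>>> df = dd.read_sql_table('accounts', 'sqlite:///path/to/bank.db',
...                        npartitions=10,
...                        index_col=cast(sql.column('id'),
...                                       types.BigInteger).label('id'))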
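Explicit divisions can be used when the index is a string column. A sketch assuming a hypothetical text column name whose values span the alphabet; the 13 boundary values below define 12 partitions:

>>> df = dd.read_sql_table('accounts', 'sqlite:///path/to/bank.db',
...                        index_col='name',
...                        divisions=list('acegikmoqsuwz'))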
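Selected columns can include labeled sqlalchemy expressions. A sketch assuming the accounts table also has hypothetical name and balance columns, and assuming plain names and expressions can be mixed in the same list:

>>> from sqlalchemy import sql
>>> df = dd.read_sql_table('accounts', 'sqlite:///path/to/bank.db',
...                        npartitions=10, index_col='id',
...                        columns=['name',
...                                 sql.func.abs(sql.column('balance')).label('abs(balance)')])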