03_wrangle

df_to_tsv

df_to_tsv(df: pd.DataFrame, include_index: bool) -> str

Converts a DataFrame to a tab-separated string.

Useful for copying to Excel or Google Sheets.

df = pd.DataFrame([
    {"name": "Alice", "age": 30},
    {"name": "Bob", "age": 25}
])
df_to_tsv(df)

'name\tage\nAlice\t30\nBob\t25\n'

df_to_clipboard

df_to_clipboard(df: pd.DataFrame, include_index: bool)

Converts a DataFrame to a tab-separated string and copies it to the clipboard,

so it can be pasted directly into Google Sheets or Excel.

Arguments: - df (pd.DataFrame): The DataFrame to copy. - include_index (bool): Whether to include the index in the output.

flatten_dict

flatten_dict(
   d,
   sep,
   preserve: List[Any],
   keep: List[Union[str, Tuple[str]]],
   discard: List[Union[str, Tuple[str]]]
) -> dict

Flatten a nested dictionary into a single-level dictionary with compound keys.

This function recursively flattens dictionaries and lists. Nested keys are concatenated using a separator (default is ‘.’). Lists are flattened by appending the index to the key. Nested dictionaries and lists within lists are also handled.

Nested keys for preserve, keep, and discard can be specified either as strings in the form “{key}{sep}{child_key}…” or as tuples of keys.

Arguments: - d (dict): The dictionary to flatten. - sep (str): Separator to use between concatenated keys. Defaults to ‘.’. - preserve (List[Any]): List of compound keys to preserve as nested structures. If a key matches, its value will not be flattened further. - keep (List[Union[str, Tuple[str]]]): If provided, only these top-level keys will be kept. Cannot be used with ‘discard’. - discard (List[Union[str, Tuple[str]]]): If provided, these top-level keys will be excluded. Cannot be used with ‘keep’.

Returns: dict: A new flattened dictionary with compound keys.

data = {
    'key1' : 'val',
    'key2' : [1, 2, 3],
    'key3' : {
        'foo' : 'bar',
        'baz' : [1, 2, 3],
        'qux' : {
            'key4' : 'value',
            'key5' : [4, 5, 6]
        }
    }
}

flatten_dict(data)

{'key1': 'val',
 'key2.0': 1,
 'key2.1': 2,
 'key2.2': 3,
 'key3.foo': 'bar',
 'key3.baz.0': 1,
 'key3.baz.1': 2,
 'key3.baz.2': 3,
 'key3.qux.key4': 'value',
 'key3.qux.key5.0': 4,
 'key3.qux.key5.1': 5,
 'key3.qux.key5.2': 6}

flatten_dict(data, preserve=[('key3', 'qux')])

{'key1': 'val',
 'key2.0': 1,
 'key2.1': 2,
 'key2.2': 3,
 'key3.foo': 'bar',
 'key3.baz.0': 1,
 'key3.baz.1': 2,
 'key3.baz.2': 3,
 'key3.qux': {'key4': 'value', 'key5': [4, 5, 6]}}

flatten_dict(data, preserve=['key3.qux'])

{'key1': 'val',
 'key2.0': 1,
 'key2.1': 2,
 'key2.2': 3,
 'key3.foo': 'bar',
 'key3.baz.0': 1,
 'key3.baz.1': 2,
 'key3.baz.2': 3,
 'key3.qux': {'key4': 'value', 'key5': [4, 5, 6]}}

flatten_dict(data, keep=['key2'])

{'key2.0': 1, 'key2.1': 2, 'key2.2': 3}

flatten_dict(data, keep=['key3.qux'])

{'key3.qux.key4': 'value',
 'key3.qux.key5.0': 4,
 'key3.qux.key5.1': 5,
 'key3.qux.key5.2': 6}

flatten_dict(data, keep=[('key3', 'qux')])

{'key3.qux.key4': 'value',
 'key3.qux.key5.0': 4,
 'key3.qux.key5.1': 5,
 'key3.qux.key5.2': 6}

flatten_dict(data, discard=['key3'])

{'key1': 'val', 'key2.0': 1, 'key2.1': 2, 'key2.2': 3}

flatten_records_to_df

flatten_records_to_df(
   records: List[dict],
   col_prefix,
   sep,
   max_cols,
   preserve: List[Any],
   keep: List[Union[str, Tuple[str]]],
   discard: List[Union[str, Tuple[str]]]
)

Flattens a list of (potentially nested) dictionaries into a pandas DataFrame.

Each dictionary in the input list is flattened using compound keys for nested structures. Lists within dictionaries are expanded with indexed keys. The resulting DataFrame has one row per record.

Nested keys for preserve, keep, and discard can be specified either as strings in the form “{key}{sep}{child_key}…” or as tuples of keys.

Arguments: - records (List[dict]): List of dictionaries to flatten. - col_prefix (str): Prefix to add to all column names. Defaults to ’‘. - sep (str): Separator to use between concatenated keys. Defaults to’.’. - max_cols (int): Maximum allowed number of columns. Raises ValueError if exceeded. - preserve (List[Any]): List of compound keys to preserve as nested structures. - keep (List[Union[str, Tuple[str]]]): Only these top-level keys will be kept. Cannot be used with ‘discard’. - discard (List[Union[str, Tuple[str]]]): These top-level keys will be excluded. Cannot be used with ‘keep’.

Returns: pd.DataFrame: DataFrame with flattened records as rows.

records = [
    {'name': 'Alice', 'age': 30, 'address': {'city': 'Wonderland', 'zip': '12345'}},
    {'name': 'Bob', 'age': 25, 'address': {'city': 'Builderland', 'zip': '67890'}},
    {'name': 'Charlie', 'age': 35, 'address': {'city': 'Chocolate Factory', 'zip': '54321'}}
]

flatten_records_to_df(records)

	name	age	address.city	address.zip
0	Alice	30	Wonderland	12345
1	Bob	25	Builderland	67890
2	Charlie	35	Chocolate Factory	54321

flatten_records_to_df(records, preserve=['address'])

	name	age	address
0	Alice	30	{'city': 'Wonderland', 'zip': '12345'}
1	Bob	25	{'city': 'Builderland', 'zip': '67890'}
2	Charlie	35	{'city': 'Chocolate Factory', 'zip': '54321'}

flatten_records_to_df(records, discard=['address'])

	name	age
0	Alice	30
1	Bob	25
2	Charlie	35

try:
    flatten_records_to_df(records, max_cols=2)
except ValueError as e:
    print(e)

Maximum number of columns (2) exceeded.
Cols: ['name', 'age', 'address.city', 'address.zip'].