03_wrangle

df_to_tsv

df_to_tsv(df: pd.DataFrame, include_index: bool) -> str

Converts a DataFrame to a tab-separated string.

Useful for copying to Excel or Google Sheets.


df = pd.DataFrame([
    {"name": "Alice", "age": 30},
    {"name": "Bob", "age": 25}
])
df_to_tsv(df)
'name\tage\nAlice\t30\nBob\t25\n'

df_to_clipboard

df_to_clipboard(df: pd.DataFrame, include_index: bool)

Converts a DataFrame to a tab-separated string and copies it to the clipboard,

so it can be pasted directly into Google Sheets or Excel.

Arguments: - df (pd.DataFrame): The DataFrame to copy. - include_index (bool): Whether to include the index in the output.


flatten_dict

flatten_dict(
   d,
   sep,
   preserve: List[Any],
   keep: List[Union[str, Tuple[str]]],
   discard: List[Union[str, Tuple[str]]]
) -> dict

Flatten a nested dictionary into a single-level dictionary with compound keys.

This function recursively flattens dictionaries and lists. Nested keys are concatenated using a separator (default is ‘.’). Lists are flattened by appending the index to the key. Nested dictionaries and lists within lists are also handled.

Nested keys for preserve, keep, and discard can be specified either as strings in the form “{key}{sep}{child_key}…” or as tuples of keys.

Arguments: - d (dict): The dictionary to flatten. - sep (str): Separator to use between concatenated keys. Defaults to ‘.’. - preserve (List[Any]): List of compound keys to preserve as nested structures. If a key matches, its value will not be flattened further. - keep (List[Union[str, Tuple[str]]]): If provided, only these top-level keys will be kept. Cannot be used with ‘discard’. - discard (List[Union[str, Tuple[str]]]): If provided, these top-level keys will be excluded. Cannot be used with ‘keep’.

Returns: dict: A new flattened dictionary with compound keys.


data = {
    'key1' : 'val',
    'key2' : [1, 2, 3],
    'key3' : {
        'foo' : 'bar',
        'baz' : [1, 2, 3],
        'qux' : {
            'key4' : 'value',
            'key5' : [4, 5, 6]
        }
    }
}

flatten_dict(data)
{'key1': 'val',
 'key2.0': 1,
 'key2.1': 2,
 'key2.2': 3,
 'key3.foo': 'bar',
 'key3.baz.0': 1,
 'key3.baz.1': 2,
 'key3.baz.2': 3,
 'key3.qux.key4': 'value',
 'key3.qux.key5.0': 4,
 'key3.qux.key5.1': 5,
 'key3.qux.key5.2': 6}
flatten_dict(data, preserve=[('key3', 'qux')])
{'key1': 'val',
 'key2.0': 1,
 'key2.1': 2,
 'key2.2': 3,
 'key3.foo': 'bar',
 'key3.baz.0': 1,
 'key3.baz.1': 2,
 'key3.baz.2': 3,
 'key3.qux': {'key4': 'value', 'key5': [4, 5, 6]}}
flatten_dict(data, preserve=['key3.qux'])
{'key1': 'val',
 'key2.0': 1,
 'key2.1': 2,
 'key2.2': 3,
 'key3.foo': 'bar',
 'key3.baz.0': 1,
 'key3.baz.1': 2,
 'key3.baz.2': 3,
 'key3.qux': {'key4': 'value', 'key5': [4, 5, 6]}}
flatten_dict(data, keep=['key2'])
{'key2.0': 1, 'key2.1': 2, 'key2.2': 3}
flatten_dict(data, keep=['key3.qux'])
{'key3.qux.key4': 'value',
 'key3.qux.key5.0': 4,
 'key3.qux.key5.1': 5,
 'key3.qux.key5.2': 6}
flatten_dict(data, keep=[('key3', 'qux')])
{'key3.qux.key4': 'value',
 'key3.qux.key5.0': 4,
 'key3.qux.key5.1': 5,
 'key3.qux.key5.2': 6}
flatten_dict(data, discard=['key3'])
{'key1': 'val', 'key2.0': 1, 'key2.1': 2, 'key2.2': 3}

flatten_records_to_df

flatten_records_to_df(
   records: List[dict],
   col_prefix,
   sep,
   max_cols,
   preserve: List[Any],
   keep: List[Union[str, Tuple[str]]],
   discard: List[Union[str, Tuple[str]]]
)

Flattens a list of (potentially nested) dictionaries into a pandas DataFrame.

Each dictionary in the input list is flattened using compound keys for nested structures. Lists within dictionaries are expanded with indexed keys. The resulting DataFrame has one row per record.

Nested keys for preserve, keep, and discard can be specified either as strings in the form “{key}{sep}{child_key}…” or as tuples of keys.

Arguments: - records (List[dict]): List of dictionaries to flatten. - col_prefix (str): Prefix to add to all column names. Defaults to ’‘. - sep (str): Separator to use between concatenated keys. Defaults to’.’. - max_cols (int): Maximum allowed number of columns. Raises ValueError if exceeded. - preserve (List[Any]): List of compound keys to preserve as nested structures. - keep (List[Union[str, Tuple[str]]]): Only these top-level keys will be kept. Cannot be used with ‘discard’. - discard (List[Union[str, Tuple[str]]]): These top-level keys will be excluded. Cannot be used with ‘keep’.

Returns: pd.DataFrame: DataFrame with flattened records as rows.


records = [
    {'name': 'Alice', 'age': 30, 'address': {'city': 'Wonderland', 'zip': '12345'}},
    {'name': 'Bob', 'age': 25, 'address': {'city': 'Builderland', 'zip': '67890'}},
    {'name': 'Charlie', 'age': 35, 'address': {'city': 'Chocolate Factory', 'zip': '54321'}}
]

flatten_records_to_df(records)
name age address.city address.zip
0 Alice 30 Wonderland 12345
1 Bob 25 Builderland 67890
2 Charlie 35 Chocolate Factory 54321
flatten_records_to_df(records, preserve=['address'])
name age address
0 Alice 30 {'city': 'Wonderland', 'zip': '12345'}
1 Bob 25 {'city': 'Builderland', 'zip': '67890'}
2 Charlie 35 {'city': 'Chocolate Factory', 'zip': '54321'}
flatten_records_to_df(records, discard=['address'])
name age
0 Alice 30
1 Bob 25
2 Charlie 35
try:
    flatten_records_to_df(records, max_cols=2)
except ValueError as e:
    print(e)
Maximum number of columns (2) exceeded.
Cols: ['name', 'age', 'address.city', 'address.zip'].