completions

Also known as chat completions. See the litellm documentation.

The two required arguments for completions are model and messages; all other arguments are optional.

Properties of messages

Each message in the messages array can include the following fields:

  • role: str (required) - The role of the message’s author. Roles can be: system, user, assistant, function or tool.
  • content: Union[str,List[dict],None] (required) - The contents of the message. It is required for all messages, but may be null for assistant messages with function calls.
  • name: str - The name of the author of the message. It is required if the role is “function”, and should match the name of the function referenced in the content. It may contain letters (a-z, A-Z), digits (0-9), and underscores, with a maximum length of 64 characters.
  • function_call: object - The name and arguments of a function that should be called, as generated by the model.
  • tool_call_id: str - The ID of the tool call that this message is responding to. It is required if the role is “tool”.

Explanation of roles

  • system: Sets assistant context. Example: { "role": "system", "content": "You are a helpful assistant." }
  • user: End user input. Example: { "role": "user", "content": "What's the weather like today?" }
  • assistant: AI response. Example: { "role": "assistant", "content": "The weather is sunny and warm." }
  • function: Function call/result (name required). Example: { "role": "function", "name": "get_weather", "content": "{\"location\": \"San Francisco\"}" }
  • tool: Tool/plugin interaction (tool_call_id required). Example: { "role": "tool", "tool_call_id": "abc123", "content": "Tool response here" }
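
For example, a messages list for an ordinary multi-turn exchange (no tool or function calls) might look like this:

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What's the weather like today?"},
    {"role": "assistant", "content": "The weather is sunny and warm."},
    {"role": "user", "content": "And what about tomorrow?"},
]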

Simplified completions: prompt

Use the llm.prompt function (async: llm.async_prompt) to perform a simplified single-turn completion.

completion

completion(
   model: str,
   messages: typing.List[typing.Dict[str, str]],
   *args,
   cache_enabled: bool,
   cache_path: typing.Union[str, pathlib.Path, NoneType],
   cache_key_prefix: typing.Optional[str],
   include_model_in_cache_key: bool,
   return_cache_key: bool,
   return_info: bool,
   enable_retries: bool,
   retry_on_exceptions: typing.Optional[list[Exception]],
   retry_on_all_exceptions: bool,
   max_retries: typing.Optional[int],
   retry_delay: typing.Optional[int],
   **kwargs
)

This function is a wrapper around the corresponding function in the litellm library; see the litellm documentation for a full list of the available arguments.


response, cache_hit, call_log = completion(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"}
    ],
)
response.choices[0].message.content
'The capital of France is Paris.'
print(f"Cache hit: {cache_hit}")
print(f"Input tokens: {call_log['input_tokens']}")
print(f"Output tokens: {call_log['output_tokens']}")
print(f"Cost: {call_log['cost']}")
Cache hit: True
Input tokens: 24
Output tokens: 14
Cost: 7.8e-06
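
The cache-related arguments in the signature above control response caching. A minimal sketch, assuming cache_path points to the on-disk cache location and cache_key_prefix simply namespaces the cache entries (both assumptions inferred from the parameter names, not documented here):

response, cache_hit, call_log = completion(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of Denmark?"}
    ],
    cache_enabled=True,
    cache_path="./llm_cache",      # assumed: where cached responses are stored
    cache_key_prefix="docs-demo",  # assumed: prefix namespacing the cache entries
)
print(f"Cache hit: {cache_hit}")

Setting return_info=False returns just the response object, without the cache-hit flag and call log:
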
response = completion(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"}
    ],
    return_info=False
)
response.choices[0].message.content
'The capital of France is Paris.'
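
The retry-related arguments can make calls more robust against transient provider errors. A hedged sketch, assuming retry_delay is given in seconds and max_retries bounds the number of attempts (semantics inferred from the parameter names, not documented here):

response, cache_hit, call_log = completion(
    model="gpt-4o-mini",
    messages=[
        {"role": "user", "content": "What is the capital of Norway?"}
    ],
    enable_retries=True,
    retry_on_all_exceptions=True,  # assumed: retry regardless of exception type
    max_retries=3,                 # assumed: maximum number of retry attempts
    retry_delay=2,                 # assumed: delay between attempts, in seconds
)

You can also request structured output by passing a pydantic model via the response_format argument:
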
from typing import List
from pydantic import BaseModel

class Recipe(BaseModel):
    name: str
    ingredients: List[str]
    steps: List[str]

response, cache_hit, call_log = completion(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a helpful cooking assistant."},
        {"role": "user", "content": "Give me a simple recipe for pancakes."}
    ],
    response_format=Recipe
)

Recipe.model_validate_json(response.choices[0].message.content).model_dump()
{'name': 'Simple Pancakes',
 'ingredients': ['1 cup all-purpose flour',
  '2 tablespoons sugar',
  '2 teaspoons baking powder',
  '1/2 teaspoon salt',
  '1 cup milk',
  '1 egg',
  '2 tablespoons melted butter',
  '1 teaspoon vanilla extract'],
 'steps': ['In a large bowl, whisk together the flour, sugar, baking powder, and salt.',
  'In a separate bowl, mix the milk, egg, melted butter, and vanilla extract until well combined.',
  "Pour the wet ingredients into the dry ingredients and stir until just combined. Do not overmix; it's okay if there are a few lumps.",
  'Heat a non-stick skillet or griddle over medium heat and grease lightly with butter or oil.',
  'Pour 1/4 cup of batter onto the skillet for each pancake. Cook until bubbles form on the surface, about 2-3 minutes.',
  'Flip the pancakes and cook for another 2-3 minutes, until golden brown.',
  'Remove from skillet and keep warm while cooking the remaining pancakes.']}
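
Note that Recipe.model_validate_json raises pydantic.ValidationError if the model returns JSON that does not match the schema; a minimal guard (illustrative only) could look like this:

from pydantic import ValidationError

try:
    recipe = Recipe.model_validate_json(response.choices[0].message.content)
except ValidationError as e:
    print(f"Model output did not match the Recipe schema: {e}")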

You can save costs during testing by using mock responses:

response, cache_hit, call_log = completion(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of Sweden?"}
    ],
    mock_response="Stockholm"
)
response.choices[0].message.content
'Stockholm'

async_completion (async)

async_completion(
   model: str,
   messages: typing.List[typing.Dict[str, str]],
   *args,
   cache_enabled: bool,
   cache_path: typing.Union[str, pathlib.Path, NoneType],
   cache_key_prefix: typing.Optional[str],
   include_model_in_cache_key: bool,
   return_cache_key: bool,
   return_info: bool,
   enable_retries: bool,
   retry_on_exceptions: typing.Optional[list[Exception]],
   retry_on_all_exceptions: bool,
   max_retries: typing.Optional[int],
   retry_delay: typing.Optional[int],
   timeout: typing.Optional[int],
   **kwargs
)

response, cache_hit, call_log = await async_completion(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"}
    ],
)
response.choices[0].message.content
'The capital of France is Paris.'
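
Since async_completion is a coroutine, several calls can also be run concurrently with standard asyncio tooling (this sketch uses plain asyncio.gather, nothing specific to this library):

import asyncio

results = await asyncio.gather(
    async_completion(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "What is the capital of France?"}],
    ),
    async_completion(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "What is the capital of Germany?"}],
    ),
)
for response, cache_hit, call_log in results:
    print(response.choices[0].message.content)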

single

single(prompt: str, model: str|None, system: str|None, *args, **kwargs)

response, cache_hit, call_log = single(
    model='gpt-4o-mini',
    system='You are a helpful assistant.',
    prompt='What is the capital of France?',
)
response
'The capital of France is Paris.'
class Recipe(BaseModel):
    name: str
    ingredients: List[str]
    steps: List[str]

response, cache_hit, call_log = single(
    model="gpt-4o-mini",
    system="You are a helpful cooking assistant.",
    prompt="Give me a simple recipe for pancakes.",
    response_format=Recipe
)

Recipe.model_validate_json(response)
Recipe(name='Simple Pancakes', ingredients=['1 cup all-purpose flour', '2 tablespoons sugar', '2 teaspoons baking powder', '1/2 teaspoon salt', '1 cup milk', '1 egg', '2 tablespoons melted butter', '1 teaspoon vanilla extract'], steps=['In a large bowl, whisk together the flour, sugar, baking powder, and salt.', 'In a separate bowl, mix the milk, egg, melted butter, and vanilla extract until well combined.', "Pour the wet ingredients into the dry ingredients and stir until just combined. Do not overmix; it's okay if there are a few lumps.", 'Heat a non-stick skillet or griddle over medium heat and grease lightly with butter or oil.', 'Pour 1/4 cup of batter onto the skillet for each pancake. Cook until bubbles form on the surface, about 2-3 minutes.', 'Flip the pancakes and cook for another 2-3 minutes, until golden brown.', 'Remove from skillet and keep warm while cooking the remaining pancakes.'])

You can do multi-turn completions by passing multi=True, which makes the call also return the message context, and then passing that context back via the multi argument on the next call:

(res, _ctx), cache_hit, call_log = single(
    model='gpt-4o-mini',
    system='You are a helpful assistant.',
    prompt='Add 1 and 1',
    multi=True
)
print(res)

(res, _ctx), cache_hit, call_log = single(
    prompt='Multiply that by 10',
    multi=_ctx,
)
print(res)
1 plus 1 equals 2.
2 multiplied by 10 equals 20.

async_single (async)

async_single(prompt: str, model: str|None, system: str|None, *args, **kwargs)

response, cache_hit, call_log = await async_single(
    model='gpt-4o-mini',
    system='You are a helpful assistant.',
    prompt='What is the capital of France?',
)
response
'The capital of France is Paris.'

You can execute a batch of prompt calls using adulib.asynchronous.batch_executor:

results = await batch_executor(
    func=async_single,
    constant_kwargs=as_dict(model='gpt-4o-mini', system='You are a helpful assistant.'),
    batch_kwargs=[
        { 'prompt': 'What is the capital of France?' },
        { 'prompt': 'What is the capital of Germany?' },
        { 'prompt': 'What is the capital of Italy?' },
        { 'prompt': 'What is the capital of Spain?' },
        { 'prompt': 'What is the capital of Portugal?' },
    ],
    concurrency_limit=2,
    verbose=False,
)

print("\n".join([response for response, _, _ in results]))
The capital of France is Paris.
The capital of Germany is Berlin.
The capital of Italy is Rome.
The capital of Spain is Madrid.
The capital of Portugal is Lisbon.