Text-to-Text Task
How to define a text-to-text task
The GPT Task framework has two components:
A generalized schema to define an LLM text generation task.
An execution engine that runs the task defined in the above schema.
The task definition is represented as key-value pairs that can be serialized into, among other formats, a JSON string, which can be validated using a JSON schema. Validation tools exist for most popular programming languages.
The execution engine is integrated into the nodes of the Helium Network, and the JSON string form of the task definition is used to send tasks across the Helium Network.
GPT Task Definition
The following is an intuitive look at a task definition:
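The JSON document below is a sketch assembled from the fields described in the rest of this section (model, messages, generation_config, seed, dtype and quantize_bits); all values are illustrative, and the lowercase key names for seed, dtype and quantize_bits are assumed to follow the same convention as generation_config:

```json
{
    "model": "gpt2",
    "messages": [
        {
            "role": "user",
            "content": "What is the capital of France?"
        }
    ],
    "generation_config": {
        "max_new_tokens": 30,
        "do_sample": true,
        "num_return_sequences": 1
    },
    "seed": 42,
    "dtype": "auto",
    "quantize_bits": 4
}
```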
Model
The base model could be any large language model suitable for text generation tasks.
The model should be a Huggingface model ID.
For example:
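The ID below points to a real model hosted on Huggingface and is used purely as an illustration; any Huggingface text generation model ID could appear here:

```
gpt2
```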
Messages
Messages is a list of message objects comprising the conversation so far.
For example:
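An illustrative conversation with a system instruction followed by a user question (the contents are invented for this example):

```json
[
    {
        "role": "system",
        "content": "You are a helpful assistant."
    },
    {
        "role": "user",
        "content": "What is the capital of France?"
    }
]
```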
Message Object
A message object has two fields: role and content.

The field role represents the role of the message author and can be user, assistant or system.

The field content is the message content.
During execution, the messages will be formatted into a plain string using the model's chat template, and then sent to the model as the input prompt. Depending on the message role, different tags defined by the model will be added around each message. However, some models have no chat template; in that case all the message contents are simply joined into a single string.
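As an illustration only: a model shipping a Llama-2-style chat template would render the example messages above roughly as follows; the exact tags vary from model to model and come from the model's own template:

```
<s>[INST] <<SYS>>
You are a helpful assistant.
<</SYS>>

What is the capital of France? [/INST]
```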
Generation Config
Generation config is a set of parameters to control the text generation behavior of the model.
For example:
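The config below is an illustrative combination of the parameters explained in the rest of this section; the values are example settings, not defaults required by the schema:

```json
{
    "max_new_tokens": 30,
    "do_sample": true,
    "num_beams": 1,
    "temperature": 1.0,
    "typical_p": 1.0,
    "top_k": 50,
    "top_p": 0.95,
    "repetition_penalty": 1.0,
    "num_return_sequences": 1
}
```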
max_new_tokens
The maximum number of tokens to generate, ignoring the number of tokens in the input prompt.
do_sample
Whether or not to use sampling; use greedy decoding otherwise.
num_beams
Number of beams for beam search. 1 means no beam search.
temperature
The value used to modulate the next token probabilities. The higher the temperature, the flatter the next token probability distribution. When the temperature equals 0, sampling falls back to greedy decoding.
typical_p
If set to a float < 1, only the smallest set of the most locally typical tokens with probabilities that add up to typical_p or higher is kept for generation.
top_k
The number of highest probability vocabulary tokens to keep for top-k-filtering.
top_p
If set to float < 1, only the smallest set of most probable tokens with probabilities that add up to top_p or higher are kept for generation.
repetition_penalty
The parameter for repetition penalty. 1.0 means no penalty.
num_return_sequences
The number of independently computed returned sequences for each element in the batch.
Seed
The seed used to initialize the random processes.
Dtype
Optional. Controls the data precision used for the model. Can be float16, bfloat16, float32 or auto. When dtype=auto, the precision will be determined by the model's config file.
Quantize_bits
Optional. Controls the model quantization. Can be 4 or 8: 4 means INT4 quantization and 8 means INT8 quantization.
GPT Task Response
The following is an intuitive look at a task response:
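The JSON document below is a sketch assembled from the fields described in the rest of this section (model, choices and usage); the content and token counts are invented, but they satisfy total_tokens = prompt_tokens + completion_tokens:

```json
{
    "model": "gpt2",
    "choices": [
        {
            "finish_reason": "length",
            "message": {
                "role": "assistant",
                "content": "The capital of France is Paris. Paris is located in the north of the country on the river Seine and"
            },
            "index": 0
        }
    ],
    "usage": {
        "prompt_tokens": 12,
        "completion_tokens": 30,
        "total_tokens": 42
    }
}
```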
Model
The model used for text generation.
Choices
A list of choice objects. The number of choices equals the parameter num_return_sequences in the generation_config of the task definition.
Choice Object
A choice object has three fields: finish_reason, message and index.
finish_reason represents the finish reason of the generated message and can be stop or length. When the finish reason is stop, the generated text ends with an EOS token and stops naturally. When the finish reason is length, the generated text is truncated by the output token length limit, which is defined by the max_new_tokens parameter in the generation_config of the task definition.
message is a message object of the same form as the message object used in the task definition. The role of a response message will always be assistant.
index is the index of the choice object among all the choices, starting from 0.
Usage
Usage represents the tokens consumed by this text generation task. It has three fields: prompt_tokens, completion_tokens and total_tokens.

prompt_tokens is the token count of the input prompt. completion_tokens is the sum of the token counts of all the choices' contents. total_tokens is the sum of prompt_tokens and completion_tokens.