DatasetReader
- class openicl.DatasetReader(dataset: Dataset | DatasetDict | str, input_columns: List[str] | str, output_column: str, name: str | None = None, data_files: str | None = None, input_template: PromptTemplate | None = None, output_template: PromptTemplate | None = None, input_output_template: PromptTemplate | None = None, ds_size: None | int | float = None, split: NamedSplit | None = None, test_split: str | None = 'test')
- In-context Learning Dataset Reader Class
Creates a DatasetReader instance from ‘dataset’.
- dataset
The dataset to be read.
- Type:
Dataset or DatasetDict
- input_columns
A list of column names (or a single column name) in the dataset that represent the input field.
- Type:
List[str] or str
- output_column
A column name in the dataset that represents the prediction field.
- Type:
str
- ds_size
The number of pieces of data to return. When ds_size is an integer greater than or equal to 1, ds_size pieces of data are randomly returned. When 0 < ds_size < 1, int(len(dataset) * ds_size) pieces of data are randomly returned. (Used for testing.)
- Type:
int or float, optional
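The documented sampling rule for ds_size can be sketched in plain Python. This is a minimal sketch, not OpenICL's implementation: subsample is a hypothetical helper, rows are modeled as a list of dicts instead of a datasets.Dataset, and the fixed seed, the cap at the dataset length, and the error on other values are assumptions.

```python
import random
from typing import Any, Dict, List, Optional, Union

Row = Dict[str, Any]


def subsample(rows: List[Row], ds_size: Optional[Union[int, float]], seed: int = 42) -> List[Row]:
    """Hypothetical helper mirroring the documented ds_size rule."""
    if ds_size is None:
        # No ds_size given: keep every row.
        return list(rows)
    if isinstance(ds_size, float) and 0 < ds_size < 1:
        # Fraction: keep int(len(rows) * ds_size) rows.
        n = int(len(rows) * ds_size)
    elif isinstance(ds_size, int) and ds_size >= 1:
        # Count: keep ds_size rows, capped at the dataset length (assumption).
        n = min(ds_size, len(rows))
    else:
        # Assumption: other values are rejected in this sketch.
        raise ValueError(f"unsupported ds_size: {ds_size!r}")
    # Fixed seed so the same call returns the same sample (assumption).
    rng = random.Random(seed)
    return rng.sample(rows, n)


rows = [{"text": f"t{i}", "label": i % 2} for i in range(10)]
print(len(subsample(rows, 3)))    # 3
print(len(subsample(rows, 0.25)))  # 2, since int(10 * 0.25) == 2
```

Fixing the seed trades variety for reproducibility, which suits the "used for testing" purpose of this attribute.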
- references
The list of references, initialized by self.dataset[self.test_split][self.output_column].
- Type:
list, optional
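In other words, references holds the output column of the test split. The sketch below models a DatasetDict as a plain dict mapping split names to lists of row dicts; build_references is a hypothetical helper, not part of OpenICL.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def build_references(dataset: Dict[str, List[Row]], test_split: str, output_column: str) -> List[Any]:
    """Hypothetical equivalent of dataset[test_split][output_column]."""
    # Indexing a split by column name yields that column's values in row order;
    # here the same list is built by reading the column from each row.
    return [row[output_column] for row in dataset[test_split]]


dataset = {
    "train": [{"text": "great film", "label": 1}],
    "test": [{"text": "dull plot", "label": 0}, {"text": "loved it", "label": 1}],
}
print(build_references(dataset, "test", "label"))  # [0, 1]
```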
- input_template
An instance of the PromptTemplate class, used to format the input field content during the retrieval process (in some retrieval methods).
- Type:
PromptTemplate, optional
- output_template
An instance of the PromptTemplate class, used to format the output field content during the retrieval process (in some learnable retrieval methods).
- Type:
PromptTemplate, optional
- input_output_template
An instance of the PromptTemplate class, used to format the input-output field content during the retrieval process (in some retrieval methods).
- Type:
PromptTemplate, optional
- generate_input_field_corpus(dataset: Dataset | DatasetDict, split: str | None = None) → List[str]
Generate a corpus for the input field.
- Parameters:
dataset (Dataset or DatasetDict) – A datasets.Dataset or datasets.DatasetDict instance.
split (str, optional) – The split of the dataset to use. If None, the entire dataset will be used. Defaults to None.
- Returns:
A list of generated input field prompts.
- Return type:
List[str]
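Judging by the return description, each corpus method presumably applies the matching per-entry prompt method to every row of the selected data. The sketch below illustrates that loop with hypothetical names: build_corpus takes the per-entry prompt function as an argument, rows are plain dicts, and a DatasetDict is modeled as a dict of splits. Concatenating all splits when split is None on a DatasetDict is an assumption made for illustration.

```python
from typing import Any, Callable, Dict, List, Optional, Union

Row = Dict[str, Any]
DatasetLike = Union[List[Row], Dict[str, List[Row]]]


def build_corpus(dataset: DatasetLike, make_prompt: Callable[[Row], str],
                 split: Optional[str] = None) -> List[str]:
    """Hypothetical sketch of the generate_*_field_corpus methods."""
    if split is not None:
        # Restrict to one split, like dataset[split] on a DatasetDict.
        rows = dataset[split]
    elif isinstance(dataset, dict):
        # Assumption: with no split given, use every split in insertion order.
        rows = [row for split_rows in dataset.values() for row in split_rows]
    else:
        # A single Dataset: use all of its rows.
        rows = dataset
    # One prompt per row, so the corpus stays index-aligned with the rows.
    return [make_prompt(row) for row in rows]


data = {
    "train": [{"question": "2+2?", "answer": "4"}],
    "test": [{"question": "3+3?", "answer": "6"}],
}
print(build_corpus(data, lambda row: row["question"], split="test"))  # ['3+3?']
print(build_corpus(data, lambda row: row["question"]))  # ['2+2?', '3+3?']
```

Passing the prompt function in keeps one loop for all three corpora; only the per-entry formatter changes.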
- generate_input_field_prompt(entry: Dict) → str
Generate a prompt for the input field based on the provided entry data.
- Parameters:
entry (Dict) – A piece of data to be used for generating the prompt.
- Returns:
The generated prompt.
- Return type:
str
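A plausible reading of this method: format the entry with input_template when one is set, otherwise fall back to the raw values of input_columns. The sketch below assumes a space-joined fallback and models the template as a plain callable; input_field_prompt and the callable stand-in for PromptTemplate are hypothetical, not OpenICL's API.

```python
from typing import Any, Callable, Dict, List, Optional, Union

Row = Dict[str, Any]


def input_field_prompt(entry: Row, input_columns: Union[List[str], str],
                       template: Optional[Callable[[Row], str]] = None) -> str:
    """Hypothetical sketch of generate_input_field_prompt."""
    # Accept a single column name as well as a list, as input_columns does.
    columns = [input_columns] if isinstance(input_columns, str) else list(input_columns)
    if template is not None:
        # Stand-in for formatting the entry with input_template.
        return template(entry)
    # Assumed fallback: join the input column values with spaces.
    return " ".join(str(entry[col]) for col in columns)


entry = {"premise": "It rained.", "hypothesis": "The ground is wet.", "label": 0}
print(input_field_prompt(entry, ["premise", "hypothesis"]))  # It rained. The ground is wet.
print(input_field_prompt(entry, "premise", lambda e: f"Premise: {e['premise']}"))  # Premise: It rained.
```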
- generate_input_output_field_corpus(dataset: Dataset | DatasetDict, split: str | None = None) → List[str]
Generate a corpus for the input-output field.
- Parameters:
dataset (Dataset or DatasetDict) – A datasets.Dataset or datasets.DatasetDict instance.
split (str, optional) – The split of the dataset to use. If None, the entire dataset will be used. Defaults to None.
- Returns:
A list of generated input-output field prompts.
- Return type:
List[str]
- generate_input_output_field_prompt(entry: Dict) → str
Generate a prompt for the input-output field based on the provided entry data.
- Parameters:
entry (Dict) – A piece of data to be used for generating the prompt.
- Returns:
The generated prompt.
- Return type:
str
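The input-output variant combines an entry's input columns and its output column into a single string. Assuming a fallback of input values followed by the output value, space-joined, and a callable in place of input_output_template, a hypothetical input_output_field_prompt could look like this (neither the helper nor the fallback format is confirmed OpenICL behavior):

```python
from typing import Any, Callable, Dict, List, Optional

Row = Dict[str, Any]


def input_output_field_prompt(entry: Row, input_columns: List[str], output_column: str,
                              template: Optional[Callable[[Row], str]] = None) -> str:
    """Hypothetical sketch of generate_input_output_field_prompt."""
    if template is not None:
        # Stand-in for formatting the entry with input_output_template.
        return template(entry)
    # Assumed fallback: input values followed by the output value, space-joined.
    parts = [str(entry[col]) for col in input_columns] + [str(entry[output_column])]
    return " ".join(parts)


entry = {"question": "Capital of France?", "answer": "Paris"}
print(input_output_field_prompt(entry, ["question"], "answer"))  # Capital of France? Paris
print(input_output_field_prompt(entry, ["question"], "answer",
                                lambda e: f"Q: {e['question']}\nA: {e['answer']}"))
```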
- generate_ouput_field_prompt(entry: Dict) → str
Generate a prompt for the output field based on the provided entry data.
- Parameters:
entry (Dict) – A piece of data to be used for generating the prompt.
- Returns:
The generated prompt.
- Return type:
str
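For the output field, a natural fallback without an output_template is the output value itself, while a template can map raw labels to words. In the sketch below, output_field_prompt and the verbalizer dict are hypothetical, converting the value with str() is an assumption, and the template is again modeled as a plain callable.

```python
from typing import Any, Callable, Dict, Optional

Row = Dict[str, Any]


def output_field_prompt(entry: Row, output_column: str,
                        template: Optional[Callable[[Row], str]] = None) -> str:
    """Hypothetical sketch of generate_ouput_field_prompt."""
    if template is not None:
        # Stand-in for formatting the entry with output_template.
        return template(entry)
    # Assumed fallback: the raw output value as a string.
    return str(entry[output_column])


# Hypothetical label-to-word mapping used as the "template".
verbalizer = {0: "negative", 1: "positive"}
entry = {"text": "A joy to watch.", "label": 1}
print(output_field_prompt(entry, "label"))  # 1
print(output_field_prompt(entry, "label", lambda e: verbalizer[e["label"]]))  # positive
```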
- generate_output_field_corpus(dataset: Dataset | DatasetDict, split: str | None = None) → List[str]
Generate a corpus for the output field.
- Parameters:
dataset (Dataset or DatasetDict) – A datasets.Dataset or datasets.DatasetDict instance.
split (str, optional) – The split of the dataset to use. If None, the entire dataset will be used. Defaults to None.
- Returns:
A list of generated output field prompts.
- Return type:
List[str]