DatasetReader

In-context Learning Dataset Reader Class: generates a DatasetReader instance from ‘dataset’.

dataset

The dataset to be read.

Type: Dataset or DatasetDict

input_columns

A list of column names, or a single column name given as a string, identifying the dataset columns that make up the input field.

Type: List[str] or str
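Because input_columns accepts either a list or a single string, code that iterates over the input columns has to normalize it first. A minimal sketch, assuming a hypothetical helper named `normalize_columns` (it is not part of the documented API):

```python
from typing import List, Union


def normalize_columns(input_columns: Union[List[str], str]) -> List[str]:
    """Hypothetical helper: wrap a single column name in a list."""
    if isinstance(input_columns, str):
        return [input_columns]
    # Copy so later mutation does not affect the caller's list.
    return list(input_columns)


# A single name and a list of names end up in the same shape.
print(normalize_columns("question"))                 # ['question']
print(normalize_columns(["premise", "hypothesis"]))  # ['premise', 'hypothesis']
```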

output_column

A column name in the dataset that represents the prediction field.

Type: str

ds_size

The number of examples to randomly sample from the dataset (intended for testing). If ds_size is an integer greater than or equal to 1, ds_size examples are returned. If 0 < ds_size < 1, int(len(dataset) * ds_size) examples are returned.

Type: int or float, optional
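The two ds_size cases map to an absolute sample size as sketched below. This is an illustration under stated assumptions, not the library's implementation: the helper names `resolve_sample_size` and `sample_examples` are hypothetical, the dataset is modeled as a plain list, and capping an integer ds_size at the dataset length is an assumption the description above does not specify.

```python
import random
from typing import List, Sequence, TypeVar, Union

T = TypeVar("T")


def resolve_sample_size(total: int, ds_size: Union[int, float]) -> int:
    """Hypothetical helper: turn ds_size into an absolute number of examples."""
    if 0 < ds_size < 1:
        # Fractional ds_size keeps int(total * ds_size) examples.
        return int(total * ds_size)
    if ds_size >= 1:
        # Assumption: counts larger than the dataset are capped at its length.
        return min(int(ds_size), total)
    raise ValueError("ds_size must be greater than 0")


def sample_examples(rows: Sequence[T], ds_size: Union[int, float], seed: int = 0) -> List[T]:
    """Randomly pick resolve_sample_size(len(rows), ds_size) rows without replacement."""
    k = resolve_sample_size(len(rows), ds_size)
    rng = random.Random(seed)  # fixed seed so repeated test runs draw the same subset
    return rng.sample(list(rows), k)


rows = list(range(10))
print(resolve_sample_size(len(rows), 0.25))  # 2
print(len(sample_examples(rows, 3)))         # 3
```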

references

The list of references, initialized from self.dataset[self.test_split][self.output_column].

Type: list, optional

input_template

A PromptTemplate instance that some retrieval methods use to format the input field content during retrieval.

Type: PromptTemplate, optional

output_template

A PromptTemplate instance that some learnable retrieval methods use to format the output field content during retrieval.

Type: PromptTemplate, optional

input_output_template

A PromptTemplate instance that some retrieval methods use to format the combined input-output field content during retrieval.

Type: PromptTemplate, optional

generate_input_field_corpus(dataset: Dataset | DatasetDict, split: str | None = None) → List[str]

Generate a corpus of prompts for the input field.

Parameters:

dataset (Dataset or DatasetDict) – A datasets.Dataset or datasets.DatasetDict instance.
split (str, optional) – The split of the dataset to use. If None, the entire dataset will be used. Defaults to None.

Returns:

A list of generated input field prompts.

Return type:

List[str]
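The corpus methods can be read as "select the split, then build one prompt per entry". The sketch below illustrates that pattern under assumptions: a list of row dicts stands in for datasets.Dataset, a dict of split names to such lists stands in for datasets.DatasetDict, the prompt builder is passed in as a plain function, and `build_corpus` is a hypothetical name. With split=None the sketch expects a flat list of rows.

```python
from typing import Callable, Dict, List, Optional, Sequence, Union

Entry = Dict[str, object]
# Stand-ins for datasets.Dataset / datasets.DatasetDict (assumption for this sketch):
# a flat list of rows, or a dict mapping split names to lists of rows.
DatasetLike = Union[Sequence[Entry], Dict[str, Sequence[Entry]]]


def build_corpus(dataset: DatasetLike,
                 make_prompt: Callable[[Entry], str],
                 split: Optional[str] = None) -> List[str]:
    """Select the split if one is given, then turn every entry into one prompt."""
    if split is not None:
        dataset = dataset[split]
    return [make_prompt(entry) for entry in dataset]


data = {
    "train": [{"question": "2+2?", "answer": "4"}],
    "test": [{"question": "3+3?", "answer": "6"},
             {"question": "1+1?", "answer": "2"}],
}
print(build_corpus(data, lambda e: str(e["question"]), split="test"))
# ['3+3?', '1+1?']
```

The same pattern applies to the input-output and output corpus methods; only the per-entry prompt function changes.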

generate_input_field_prompt(entry: Dict) → str

Generate a prompt for the input field based on the provided entry data.

Parameters: entry (Dict) – A single data entry used to generate the prompt.
Returns: The generated prompt.
Return type: str
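A minimal sketch of how an input-field prompt might be produced, assuming that input_template, when set, formats the whole entry and that otherwise the input column values are joined with single spaces. A plain callable stands in for PromptTemplate here; it does not reproduce the PromptTemplate API, and `input_field_prompt` is a hypothetical standalone function rather than the method itself.

```python
from typing import Callable, Dict, List, Optional

Entry = Dict[str, object]


def input_field_prompt(entry: Entry,
                       input_columns: List[str],
                       template: Optional[Callable[[Entry], str]] = None) -> str:
    """Format the input field of one entry."""
    if template is not None:
        # A template, when provided, decides the whole prompt.
        return template(entry)
    # Assumed fallback: the input column values joined by single spaces, in column order.
    return " ".join(str(entry[col]) for col in input_columns)


entry = {"premise": "It rained.", "hypothesis": "The ground is wet.", "label": "entailment"}
print(input_field_prompt(entry, ["premise", "hypothesis"]))
# It rained. The ground is wet.
print(input_field_prompt(entry, ["premise"], lambda e: f"Premise: {e['premise']}"))
# Premise: It rained.
```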

generate_input_output_field_corpus(dataset: Dataset | DatasetDict, split: str | None = None) → List[str]

Generate a corpus of prompts for the input-output field.

Parameters:

dataset (Dataset or DatasetDict) – A datasets.Dataset or datasets.DatasetDict instance.
split (str, optional) – The split of the dataset to use. If None, the entire dataset will be used. Defaults to None.

Returns:

A list of generated input-output field prompts.

Return type:

List[str]

generate_input_output_field_prompt(entry: Dict) → str

Generate a prompt for the input-output field based on the provided entry data.

Parameters: entry (Dict) – A single data entry used to generate the prompt.
Returns: The generated prompt.
Return type: str
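An input-output prompt combines an entry's inputs with its output, which is the shape of an in-context demonstration. The sketch below assumes that, without a template, the input column values are followed by the output value, all space-separated; the callable standing in for input_output_template and the function name `input_output_field_prompt` are hypothetical.

```python
from typing import Callable, Dict, List, Optional

Entry = Dict[str, object]


def input_output_field_prompt(entry: Entry,
                              input_columns: List[str],
                              output_column: str,
                              template: Optional[Callable[[Entry], str]] = None) -> str:
    """Format the input and output fields of one entry as a single prompt."""
    if template is not None:
        return template(entry)
    # Assumed fallback: input values followed by the output value, space-separated.
    values = [str(entry[col]) for col in input_columns]
    values.append(str(entry[output_column]))
    return " ".join(values)


entry = {"question": "Capital of France?", "answer": "Paris"}
print(input_output_field_prompt(entry, ["question"], "answer"))
# Capital of France? Paris
```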

generate_ouput_field_prompt(entry: Dict) → str

Generate a prompt for the output field based on the provided entry data.

Parameters: entry (Dict) – A single data entry used to generate the prompt.
Returns: The generated prompt.
Return type: str
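Note that the documented method name is spelled generate_ouput_field_prompt. A minimal sketch of the behavior it describes, assuming that without an output_template the raw output value is converted to a string. The callable standing in for output_template, the label words, and the function name `output_field_prompt` are all hypothetical.

```python
from typing import Callable, Dict, Optional

Entry = Dict[str, object]


def output_field_prompt(entry: Entry,
                        output_column: str,
                        template: Optional[Callable[[Entry], str]] = None) -> str:
    """Format the output field of one entry."""
    if template is not None:
        return template(entry)
    # Assumed fallback: the raw output value converted to a string.
    return str(entry[output_column])


def verbalize(e: Entry) -> str:
    # Hypothetical label words for a sentiment task.
    return "positive" if e["label"] == 1 else "negative"


entry = {"text": "Great movie!", "label": 1}
print(output_field_prompt(entry, "label"))             # 1
print(output_field_prompt(entry, "label", verbalize))  # positive
```

Mapping raw labels to words is one common reason an output template exists: the retriever or model then sees "positive" rather than "1".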

generate_output_field_corpus(dataset: Dataset | DatasetDict, split: str | None = None) → List[str]

Generate a corpus of prompts for the output field.

Parameters:

dataset (Dataset or DatasetDict) – A datasets.Dataset or datasets.DatasetDict instance.
split (str, optional) – The split of the dataset to use. If None, the entire dataset will be used. Defaults to None.

Returns:

A list of generated output field prompts.

Return type:

List[str]

set_references(column: str, split: str | None = None) → None

Set self.references to the values of column, read from split if one is given.

Parameters:

column (str) – The name of the column to read references from.
split (str, optional) – The name of the dataset split to read from. Defaults to None.
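The sketch below shows one way set_references could narrow to a split and then read a whole column. It is illustrative only: `ReferenceReader` is a hypothetical stand-in for DatasetReader, and the dataset is modeled as a dict of column names to value lists (one split) or a dict of split names to such dicts (a DatasetDict-like object), mirroring how a datasets.Dataset returns a column's values when indexed by column name.

```python
from typing import Any, Dict, List, Optional


class ReferenceReader:
    """Hypothetical stand-in for DatasetReader that only tracks references.

    The dataset is modeled as {column: values} for a single split, or
    {split: {column: values}} for a DatasetDict-like object.
    """

    def __init__(self, dataset: Dict[str, Any]) -> None:
        self.dataset = dataset
        self.references: Optional[List[Any]] = None

    def set_references(self, column: str, split: Optional[str] = None) -> None:
        # Narrow to the requested split first, then read the whole column.
        data = self.dataset[split] if split is not None else self.dataset
        self.references = list(data[column])


reader = ReferenceReader({"test": {"text": ["good", "bad"], "label": [1, 0]}})
reader.set_references("label", split="test")
print(reader.references)  # [1, 0]
```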