TopkRetriever

class openicl.icl_retriever.icl_topk_retriever.TopkRetriever(dataset_reader: DatasetReader, ice_separator: str | None = '\n', ice_eos_token: str | None = '\n', prompt_eos_token: str | None = '', sentence_transformers_model_name: str | None = 'all-mpnet-base-v2', ice_num: int | None = 1, index_split: str | None = 'train', test_split: str | None = 'test', tokenizer_name: str | None = 'gpt2-xl', batch_size: int | None = 1, accelerator: Accelerator | None = None)
Topk In-context Learning Retriever Class

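Selects in-context examples by nearest-neighbor search over sentence embeddings. The index dataset is encoded with a SentenceTransformer model (sentence_transformers_model_name) and stored in a FAISS index; for each instance in the test dataset, the ice_num index examples whose embeddings are most similar to its own are retrieved.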

dataset_reader

An instance of the DatasetReader class.

Type:

DatasetReader

ice_separator

A string that separates each in-context example.

Type:

str, optional

ice_eos_token

A string that is added to the end of in-context examples.

Type:

str, optional

prompt_eos_token

A string that is added to the end of the prompt.

Type:

str, optional
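The three string attributes above control how the final prompt text is assembled. The sketch below shows one plausible assembly order, assuming the in-context examples are joined with ice_separator, ice_eos_token is appended once after the last example, and prompt_eos_token is appended after the test input. The helper build_prompt is hypothetical and is not part of the OpenICL API; its defaults mirror the constructor defaults in the signature.

```python
from typing import List


def build_prompt(
    ice_texts: List[str],
    test_text: str,
    ice_separator: str = "\n",
    ice_eos_token: str = "\n",
    prompt_eos_token: str = "",
) -> str:
    """Hypothetical helper: assemble in-context examples and a test input.

    Assumption: examples are joined by ice_separator, ice_eos_token follows
    the whole block of examples, and prompt_eos_token closes the prompt.
    """
    ice_block = ice_separator.join(ice_texts) + ice_eos_token
    return ice_block + test_text + prompt_eos_token


prompt = build_prompt(
    ["great movie Positive", "boring plot Negative"],
    "a delightful film",
)
print(repr(prompt))
# prints 'great movie Positive\nboring plot Negative\na delightful film'
```

With the default prompt_eos_token of '', nothing is appended after the test input.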

ice_num

The number of in-context examples to retrieve for each test instance.

Type:

int, optional

index_split

Name of the dataset split used as the index set, from which in-context examples are selected. Defaults to 'train'.

Type:

str, optional

test_split

Name of the dataset split used as the test set; a prompt is generated for each of its instances. Defaults to 'test'.

Type:

str, optional

index_ds

The index dataset, from which in-context examples are selected.

Type:

Dataset

test_ds

The test dataset; a prompt is generated for each of its instances.

Type:

Dataset

accelerator

An instance of the Accelerator class from Hugging Face Accelerate, used for multi-process execution.

Type:

Accelerator, optional

batch_size

Batch size for the DataLoader used when computing embeddings.

Type:

int, optional

model

An instance of the SentenceTransformer class, loaded from sentence_transformers_model_name and used to compute sentence embeddings.

Type:

SentenceTransformer

tokenizer

Tokenizer loaded from tokenizer_name, used to preprocess input text before it is embedded.

Type:

AutoTokenizer

index

FAISS index built over the embeddings of the index dataset.

Type:

IndexIDMap

retrieve()

Retrieves in-context examples for each instance in test_ds.

Returns:

For each instance in test_ds, a list of the indices (into index_ds) of its in-context examples.

Return type:

List[List]
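The shape of the return value can be illustrated with a self-contained sketch of top-k retrieval. It assumes, without claiming this matches OpenICL's internals, that similarity is the inner product between embeddings, as in a FAISS IndexFlatIP wrapped by IndexIDMap, so that search returns the ids stored alongside the vectors rather than internal positions. The embed function is a hypothetical stand-in for SentenceTransformer.encode (a toy L2-normalized bag-of-words vector), and InnerProductIdIndex and retrieve are hypothetical names used only so the example runs without downloading a model or installing FAISS.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def embed(text: str, vocab: Sequence[str]) -> Vector:
    """Hypothetical stand-in for SentenceTransformer.encode.

    Returns an L2-normalized bag-of-words count vector over `vocab`.
    """
    words = text.lower().split()
    vec = [float(words.count(term)) for term in vocab]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


class InnerProductIdIndex:
    """Pure-Python analogue of faiss.IndexIDMap(faiss.IndexFlatIP(d)):
    exact, exhaustive inner-product search returning caller-supplied ids."""

    def __init__(self) -> None:
        self._entries: List[Tuple[int, Vector]] = []

    def add_with_ids(self, vectors: List[Vector], ids: List[int]) -> None:
        self._entries.extend(zip(ids, vectors))

    def search(self, query: Vector, k: int) -> List[int]:
        # Score every stored vector, then keep the ids of the k highest scores.
        scored = [
            (sum(q * v for q, v in zip(query, vec)), idx)
            for idx, vec in self._entries
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [idx for _, idx in scored[:k]]


def retrieve(
    index_texts: List[str], test_texts: List[str], ice_num: int
) -> List[List[int]]:
    """For each test text, return the ids of its ice_num nearest index texts."""
    vocab = sorted({w for t in index_texts + test_texts for w in t.lower().split()})
    index = InnerProductIdIndex()
    index.add_with_ids(
        [embed(t, vocab) for t in index_texts], list(range(len(index_texts)))
    )
    return [index.search(embed(t, vocab), ice_num) for t in test_texts]


index_texts = [
    "the movie was great",
    "terrible acting and plot",
    "great soundtrack",
    "the plot was dull",
]
test_texts = ["a great movie", "dull plot"]
print(retrieve(index_texts, test_texts, ice_num=2))
# prints [[0, 2], [3, 1]]
```

Because the toy vectors are normalized, the inner product here equals cosine similarity. A flat FAISS index performs the same exact, exhaustive scan; its advantage is vectorized batch search, not a different result.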