TopkRetriever¶

class openicl.icl_retriever.icl_topk_retriever.TopkRetriever(dataset_reader: DatasetReader, ice_separator: str | None = '\n', ice_eos_token: str | None = '\n', prompt_eos_token: str | None = '', sentence_transformers_model_name: str | None = 'all-mpnet-base-v2', ice_num: int | None = 1, index_split: str | None = 'train', test_split: str | None = 'test', tokenizer_name: str | None = 'gpt2-xl', batch_size: int | None = 1, accelerator: Accelerator | None = None)¶

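Top-k in-context learning retriever. For each example in the test dataset, it selects the ice_num most similar examples from the index dataset, using SentenceTransformer embeddings and a FAISS index.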
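As a minimal usage sketch, the constructor and retrieve() can be wrapped as below. The helper name retrieve_topk_indices is hypothetical, the ice_num and batch_size values are illustrative, and the caller is assumed to pass an already constructed DatasetReader. The import is deferred so that the helper can be defined without openicl installed.

```python
def retrieve_topk_indices(dataset_reader, ice_num=4):
    """Hypothetical helper: build a TopkRetriever and run retrieval once.

    `dataset_reader` is assumed to be an existing DatasetReader instance.
    """
    # Deferred import: openicl is only needed when the helper is called.
    from openicl.icl_retriever.icl_topk_retriever import TopkRetriever

    retriever = TopkRetriever(
        dataset_reader,
        ice_num=ice_num,  # illustrative value, the documented default is 1
        sentence_transformers_model_name="all-mpnet-base-v2",
        index_split="train",
        test_split="test",
        tokenizer_name="gpt2-xl",
        batch_size=8,  # illustrative value, the documented default is 1
    )
    # One list of index-dataset indices per test example.
    return retriever.retrieve()
```

Every keyword argument above comes from the signature; only the two values marked as illustrative differ from the defaults.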

dataset_reader¶

An instance of the DatasetReader class.

Type:: DatasetReader

ice_separator¶

A string that separates each in-context example.

Type:: str, optional

ice_eos_token¶

A string that is added to the end of in-context examples.

Type:: str, optional

prompt_eos_token¶

A string that is added to the end of the prompt.

Type:: str, optional
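The three string attributes above suggest how a prompt is put together. The following sketch shows one plausible reading of those descriptions, not the library's actual code: the function assemble_prompt and the sample review strings are hypothetical, and the exact concatenation order is an assumption.

```python
def assemble_prompt(ice_items, test_item,
                    ice_separator="\n", ice_eos_token="\n", prompt_eos_token=""):
    """Hypothetical illustration of how the three strings might combine."""
    # In-context examples are joined by the separator, then closed with ice_eos_token.
    ice_block = ice_separator.join(ice_items) + ice_eos_token
    # The test item follows, and prompt_eos_token ends the whole prompt.
    return ice_block + test_item + prompt_eos_token


prompt = assemble_prompt(
    ["Review: great film\nSentiment: positive",
     "Review: dull plot\nSentiment: negative"],
    "Review: a fun ride\nSentiment:",
)
print(prompt)
```

With the default values, each in-context example and the test item end up on consecutive lines with nothing appended after the test item.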

ice_num¶

The number of in-context examples to retrieve for each test example.

Type:: int, optional

index_split¶

The name of the split used as the index dataset, from which in-context examples are selected. Defaults to train.

Type:: str, optional

test_split¶

The name of the split used as the test dataset, for whose examples prompts are generated. Defaults to test.

Type:: str, optional

index_ds¶

The index dataset, from which in-context examples are selected.

Type:: Dataset

test_ds¶

The test dataset. A prompt is generated for each of its examples.

Type:: Dataset

accelerator¶

An instance of the Accelerator class, used for multiprocessing.

Type:: Accelerator, optional

batch_size¶

Batch size for the DataLoader.

Type:: int, optional

model¶

An instance of the SentenceTransformer class, used to compute embeddings.

Type:: SentenceTransformer

tokenizer¶

The tokenizer loaded from tokenizer_name.

Type:: AutoTokenizer

index¶

Index generated with FAISS.

Type:: IndexIDMap

retrieve()¶

Retrieve in-context examples for each example in test_ds.

Returns:: for each example in test_ds, the list of indices of its in-context examples in index_ds.
Return type:: List[List]
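To make the return shape concrete, here is a self-contained sketch of top-k retrieval over toy embeddings. It swaps in a brute-force cosine-similarity search in plain Python for the FAISS index, and the function names and the two-dimensional vectors are made up for illustration. The real class computes embeddings with a SentenceTransformer model instead.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def topk_indices(index_vecs, query_vecs, k):
    """Return, for each query vector, the indices of its k most similar index vectors."""
    index_unit = [normalize(v) for v in index_vecs]
    result = []
    for query in query_vecs:
        query_unit = normalize(query)
        scores = [sum(a * b for a, b in zip(query_unit, v)) for v in index_unit]
        # Sort index positions by descending similarity and keep the first k.
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        result.append(ranked[:k])
    return result


# Toy stand-ins for the embeddings of index_ds and test_ds.
index_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
test_embeddings = [[0.9, 0.1], [0.1, 0.9]]

idx_lists = topk_indices(index_embeddings, test_embeddings, k=2)
print(idx_lists)  # [[0, 2], [1, 2]]
```

The outer list has one entry per test example and each inner list has k entries, which matches the List[List] return type with k playing the role of ice_num.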