BM25Retriever

class openicl.icl_retriever.icl_bm25_retriever.BM25Retriever(dataset_reader: DatasetReader, ice_separator: str | None = '\n', ice_eos_token: str | None = '\n', prompt_eos_token: str | None = '', ice_num: int | None = 1, index_split: str | None = 'train', test_split: str | None = 'test', accelerator: Accelerator | None = None)
BM25 In-context Learning Retriever Class

Retriever that selects in-context examples from the index dataset by BM25 score.

dataset_reader

An instance of the DatasetReader class.

Type:

DatasetReader

ice_separator

A string that separates each in-context example.

Type:

str, optional

ice_eos_token

A string that is added to the end of in-context examples.

Type:

str, optional

prompt_eos_token

A string that is added to the end of the prompt.

Type:

str, optional

ice_num

The number of in-context examples retrieved for each test item.

Type:

int, optional

index_split

The name of the split used as the index dataset. In-context examples are selected from the index dataset. Defaults to 'train'.

Type:

str, optional

test_split

The name of the split used as the test dataset. A prompt is generated for each item in the test dataset. Defaults to 'test'.

Type:

str, optional

index_ds

The index dataset. In-context examples are selected from it.

Type:

Dataset

test_ds

The test dataset. A prompt is generated for each of its items.

Type:

Dataset

accelerator

An instance of the Accelerator class, used for multiprocessing.

Type:

Accelerator, optional

index_corpus

A corpus created from the input field data of index_ds.

Type:

List[str]

test_corpus

A corpus created from the input field data of test_ds.

Type:

List[str]

bm25

An instance of the BM25Okapi class, initialized using index_ds.

Type:

BM25Okapi
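
The three string attributes above (ice_separator, ice_eos_token, prompt_eos_token) can be hard to picture from their descriptions alone. The following is a minimal sketch of how they might combine into a final prompt. The function build_prompt and its argument names ice_texts and test_text are hypothetical and are not part of OpenICL; the exact assembly order inside the library is an assumption here.

```python
from typing import List


def build_prompt(ice_texts: List[str], test_text: str,
                 ice_separator: str = "\n",
                 ice_eos_token: str = "\n",
                 prompt_eos_token: str = "") -> str:
    """Hypothetical helper: join in-context examples and append the test item."""
    # ice_separator goes between consecutive in-context examples.
    ice_block = ice_separator.join(ice_texts)
    # ice_eos_token closes the block of in-context examples (assumed placement).
    ice_block += ice_eos_token
    # prompt_eos_token closes the whole prompt.
    return ice_block + test_text + prompt_eos_token


prompt = build_prompt(["Q: a\nA: 1", "Q: b\nA: 2"], "Q: c\nA:")
print(prompt)
```

With the default values, each in-context example and the test item end up on consecutive lines, and nothing is appended after the test item.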

retrieve() → List[List]

Retrieve in-context examples for each item in test_ds.

Returns:

For each item in test_ds, the list of indices of its in-context examples.

Return type:

List[List]
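
To illustrate the shape of what retrieve() returns, here is a self-contained sketch of BM25-based selection: each text in a test corpus is scored against every text in an index corpus, and the indices of the ice_num best matches are kept. This is a simplified stand-in written with the standard library only, not OpenICL's code and not the BM25Okapi class. The function name bm25_top_k, the whitespace tokenization, the parameter values k1 and b, and the IDF smoothing are all assumptions made for the example.

```python
import math
from collections import Counter
from typing import List


def bm25_top_k(index_corpus: List[str], test_corpus: List[str],
               ice_num: int = 1, k1: float = 1.5, b: float = 0.75) -> List[List[int]]:
    """Hypothetical helper: indices of the ice_num best index texts per test text."""
    # Lowercased whitespace tokenization (an assumption for this sketch).
    docs = [text.lower().split() for text in index_corpus]
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    term_counts = [Counter(d) for d in docs]
    # Document frequency: number of index texts containing each term.
    doc_freq = Counter(term for d in docs for term in set(d))
    # Non-negative smoothed IDF (an assumption; BM25Okapi handles this differently).
    idf = {t: math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
           for t, df in doc_freq.items()}

    results = []
    for text in test_corpus:
        query = text.lower().split()
        scores = []
        for counts, doc in zip(term_counts, docs):
            # Length normalization term of the BM25 formula.
            norm = k1 * (1 - b + b * len(doc) / avg_len)
            scores.append(sum(idf[t] * counts[t] * (k1 + 1) / (counts[t] + norm)
                              for t in query if t in counts))
        # Rank index texts by descending score and keep the top ice_num.
        ranked = sorted(range(n_docs), key=lambda i: scores[i], reverse=True)
        results.append(ranked[:ice_num])
    return results


index_corpus = ["the movie was great fun",
                "terrible plot and bad acting",
                "a great soundtrack"]
print(bm25_top_k(index_corpus, ["great movie", "bad acting"], ice_num=1))
```

The result has one inner list per test text, which matches the List[List] return type documented above.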