A Simple Example
The following example shows how to perform in-context learning (ICL) on a sentiment classification dataset (SST-2). More examples and tutorials can be found in our GitHub repository.
Step 1: Load and prepare data
from datasets import load_dataset
from openicl import DatasetReader
# Load the dataset from the Hugging Face Hub
dataset = load_dataset('gpt3mix/sst2')
# Define a DatasetReader, specifying the columns that hold the input and the output (label).
data = DatasetReader(dataset, input_columns=['text'], output_column='label')
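As a rough illustration of what input_columns and output_column do, the sketch below splits plain dictionary records into input fields and labels. The helper read_columns is hypothetical and is not OpenICL's DatasetReader; the toy records only mimic the shape of SST-2 rows and their values are made up.

```python
# Hypothetical sketch: NOT OpenICL's DatasetReader, only the idea behind
# input_columns / output_column applied to a list of plain dict records.
from typing import Dict, List, Sequence, Tuple


def read_columns(
    rows: Sequence[Dict[str, object]],
    input_columns: List[str],
    output_column: str,
) -> Tuple[List[Dict[str, object]], List[object]]:
    """Split each record into its input fields and its output (label) field."""
    inputs = [{col: row[col] for col in input_columns} for row in rows]
    references = [row[output_column] for row in rows]
    return inputs, references


# Toy records shaped like sst2 rows (text + label); the values are made up.
rows = [
    {"text": "a gripping, beautifully acted film", "label": 0},
    {"text": "dull and far too long", "label": 1},
]
inputs, references = read_columns(rows, input_columns=["text"], output_column="label")
print(inputs)
print(references)
```

The inputs become the text that fills the prompt, while the references are what predictions are later scored against.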
Step 2: Define the prompt template (Optional)
from openicl import PromptTemplate
tp_dict = {
0: "</E>Positive Movie Review: </text>",
1: "</E>Negative Movie Review: </text>"
}
template = PromptTemplate(tp_dict, {'text': '</text>'}, ice_token='</E>')
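The keys of tp_dict are the label values in the label column, so the template defines one prompt per label. The placeholders </E> and </text> will be replaced by the in-context examples and the test input, respectively.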
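To make the substitution concrete, here is a minimal sketch using plain string replacement. The fill helper is hypothetical (OpenICL's PromptTemplate has its own implementation), and rendering each in-context example with the same template followed by a newline is an assumed format chosen for illustration.

```python
# Hypothetical sketch of placeholder substitution; OpenICL's PromptTemplate
# has its own implementation, this only illustrates the idea.
tp_dict = {
    0: "</E>Positive Movie Review: </text>",
    1: "</E>Negative Movie Review: </text>",
}


def fill(template: str, ice: str, text: str) -> str:
    """Replace </E> with the in-context examples and </text> with the input."""
    return template.replace("</E>", ice).replace("</text>", text)


# One in-context example rendered with its label's template, plus a newline
# separator (assumed format); it gets no examples of its own, so </E> -> "".
ice = fill(tp_dict[1], "", "dull and far too long") + "\n"
test_text = "a gripping, beautifully acted film"

# One candidate prompt per label; a perplexity-based inferencer scores each one.
candidates = {label: t_str for label, t_str in
              ((label, fill(t, ice, test_text)) for label, t in tp_dict.items())}
print(candidates[0])
print(candidates[1])
```

Each label yields a full prompt that differs only in the verbalized label ("Positive" vs. "Negative") attached to the test input.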
Step 3: Initialize the Retriever
from openicl import TopkRetriever
# Define a retriever using the `DatasetReader` from Step 1.
# `ice_num` is the number of in-context examples retrieved for each test input.
retriever = TopkRetriever(data, ice_num=8)
Here we use the TopK method, which retrieves for each test input the ice_num training examples that are most similar to it in sentence-embedding space.
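The core of TopK retrieval can be sketched as "rank training examples by similarity to the test input, keep the best ice_num". In the sketch below, bag-of-words counts stand in for the sentence embeddings a real retriever would use; that substitution, and the helper names embed, cosine and topk_retrieve, are assumptions for illustration only.

```python
# Illustrative TopK retrieval: rank training examples by cosine similarity to
# the test input and keep the ice_num best. Bag-of-words counts stand in for
# sentence embeddings (an assumption so the sketch needs no model).
import math
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def topk_retrieve(train_texts: List[str], query: str, ice_num: int) -> List[int]:
    """Return indices of the ice_num training texts most similar to query."""
    q = embed(query)
    scores = [cosine(q, embed(t)) for t in train_texts]
    ranked = sorted(range(len(train_texts)), key=lambda i: scores[i], reverse=True)
    return ranked[:ice_num]


train_texts = [
    "a beautifully acted film",
    "dull and far too long",
    "the film is beautifully shot",
    "a long and tedious plot",
]
print(topk_retrieve(train_texts, "beautifully acted and gripping film", ice_num=2))
```

A production retriever would precompute the training embeddings once and use an approximate nearest-neighbor index rather than scoring every example per query.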
Step 4: Initialize the Inferencer
from openicl import PPLInferencer
inferencer = PPLInferencer(model_name='distilgpt2')
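PPLInferencer classifies by perplexity: each label's filled prompt is scored by the language model, and the label whose prompt has the lowest perplexity is predicted. The sketch below shows that decision rule with a tiny add-one-smoothed unigram model standing in for distilgpt2; the UnigramLM class and ppl_predict helper are hypothetical and exist only so the example runs without downloading a model.

```python
# Illustrative perplexity-based (PPL) classification. A tiny add-one-smoothed
# unigram model stands in for distilgpt2 (an assumption for runnability).
import math
from collections import Counter
from typing import Dict, List


class UnigramLM:
    def __init__(self, corpus: List[str]) -> None:
        self.counts = Counter(w for line in corpus for w in line.lower().split())
        self.total = sum(self.counts.values())
        self.vocab = len(self.counts) + 1  # +1 slot shared by all unseen words

    def perplexity(self, text: str) -> float:
        """exp of the average negative log-probability per word."""
        words = text.lower().split()
        nll = -sum(math.log((self.counts[w] + 1) / (self.total + self.vocab)) for w in words)
        return math.exp(nll / len(words))


def ppl_predict(lm: UnigramLM, candidates: Dict[int, str]) -> int:
    """Return the label whose candidate prompt the LM finds least surprising."""
    return min(candidates, key=lambda label: lm.perplexity(candidates[label]))


lm = UnigramLM(["positive movie review: a gripping film", "positive movie review: great fun"])
candidates = {
    0: "Positive Movie Review: a gripping film",
    1: "Negative Movie Review: a gripping film",
}
print(ppl_predict(lm, candidates))
```

Because the candidate prompts differ only in the label words, the comparison effectively asks which verbalized label the model finds more plausible in context.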
Step 5: Inference and scoring
from openicl import AccEvaluator
# The inferencer needs the retriever to collect in-context examples, and a template to wrap them into prompts.
predictions = inferencer.inference(retriever, ice_template=template)
# Compute the accuracy of the predictions against the reference labels
score = AccEvaluator().score(predictions=predictions, references=data.references)
print(score)
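Accuracy here is simply the fraction of predictions that equal their reference labels. The accuracy function below is a hypothetical stand-in written for illustration; it does not reproduce AccEvaluator's code or its return format.

```python
# Illustrative accuracy computation: the fraction of predictions matching the
# reference labels (not AccEvaluator's actual code or return format).
from typing import Sequence


def accuracy(predictions: Sequence[int], references: Sequence[int]) -> float:
    """Share of positions where prediction == reference."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("cannot score an empty set")
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)


print(accuracy([0, 1, 1, 0], [0, 1, 0, 0]))  # 0.75
```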