⏺️ Add and update records#

Other datasets#

Note

The record classes covered in this section correspond to three datasets: DatasetForTextClassification, DatasetForTokenClassification, and DatasetForText2Text. These will be deprecated in Argilla 2.0 and replaced by the fully configurable FeedbackDataset class. Not sure which dataset to use? Check out our section on choosing a dataset.

Add records#

The main component of the Argilla data model is called a record. A dataset in Argilla is a collection of these records. Records can be of different types depending on the currently supported tasks:

TextClassificationRecord
TokenClassificationRecord
Text2TextRecord

The most critical attributes of a record that are common to all types are:

text: The input text of the record (Required).
annotation: Annotate your record in a task-specific manner (Optional).
prediction: Add task-specific model predictions to the record (Optional).
metadata: Add some arbitrary metadata to the record (Optional).

Some other cool attributes for a record are:

vectors: Input vectors to enable semantic search.
explanation: Token attributions for highlighting text.

In Argilla, records are created programmatically using the client library within a Python script, a Jupyter notebook, or another IDE.

Let’s see how to create and upload a basic record to the Argilla web app (make sure Argilla is already installed on your machine as described in the setup guide).

We support different tasks within the Argilla ecosystem focused on NLP: Text Classification, Token Classification, and Text2Text.

Text Classification

import argilla as rg

rec = rg.TextClassificationRecord(
    text="beautiful accommodations stayed hotel santa... hotels higher ranked website.",
    prediction=[("price", 0.75), ("hygiene", 0.25)],
    annotation="price"
)
rg.log(records=rec, name="my_dataset")
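
The prediction above is a list of (label, score) tuples. As a minimal sketch, assuming your model returns its scores as a plain dict, the helper below builds that list with the highest score first. The name scores_to_prediction is hypothetical and not part of the Argilla API.

```python
from typing import Dict, List, Tuple


def scores_to_prediction(scores: Dict[str, float]) -> List[Tuple[str, float]]:
    """Turn a {label: score} mapping into (label, score) tuples, highest score first."""
    # Hypothetical helper for illustration; not an Argilla function.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


prediction = scores_to_prediction({"hygiene": 0.25, "price": 0.75})
print(prediction)  # [('price', 0.75), ('hygiene', 0.25)]
```

The resulting list can be passed as the prediction argument of a record like the one above.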


Text Classification (multi-label)

import argilla as rg

rec = rg.TextClassificationRecord(
    text="damn this kid and her fancy clothes make me feel like a bad parent.",
    prediction=[("admiration", 0.75), ("annoyance", 0.25)],
    annotation=["admiration", "annoyance"],
    multi_label=True
)
rg.log(records=rec, name="my_dataset")
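
Note the difference in shape between the two examples: the single-label record takes one string as its annotation, while the multi-label record takes a list of strings. A minimal sketch of a pre-flight check for this, assuming you want to catch mismatches before logging; check_annotation is a hypothetical helper, not an Argilla function.

```python
from typing import List, Union


def check_annotation(annotation: Union[str, List[str]], multi_label: bool) -> bool:
    """Return True when the annotation shape matches the multi_label flag."""
    # Hypothetical helper for illustration; not part of the Argilla API.
    if multi_label:
        # Multi-label: expect a list of label strings.
        return isinstance(annotation, list) and all(
            isinstance(label, str) for label in annotation
        )
    # Single-label: expect exactly one label string.
    return isinstance(annotation, str)


print(check_annotation("price", multi_label=False))
print(check_annotation(["admiration", "annoyance"], multi_label=True))
print(check_annotation(["admiration", "annoyance"], multi_label=False))
```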


Token Classification

import argilla as rg

rec = rg.TokenClassificationRecord(
    text="Michael is a professor at Harvard",
    tokens=["Michael", "is", "a", "professor", "at", "Harvard"],
    prediction=[("NAME", 0, 7, 0.75), ("LOC", 26, 33, 0.8)],
    annotation=[("NAME", 0, 7), ("LOC", 26, 33)],
)
rg.log(records=rec, name="my_dataset")
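
The spans in the example above are character offsets into text, with an exclusive end. Counting them by hand is error-prone, so here is a minimal sketch that derives them from the token list, assuming every token appears in the text in order. The helper token_spans is hypothetical and not part of the Argilla API.

```python
from typing import List, Tuple


def token_spans(text: str, tokens: List[str]) -> List[Tuple[int, int]]:
    """Return the (start, end) character offsets of each token, end exclusive."""
    # Hypothetical helper for illustration; not an Argilla function.
    spans = []
    cursor = 0
    for token in tokens:
        # Search from the end of the previous token; raises ValueError if missing.
        start = text.index(token, cursor)
        end = start + len(token)
        spans.append((start, end))
        cursor = end
    return spans


text = "Michael is a professor at Harvard"
tokens = ["Michael", "is", "a", "professor", "at", "Harvard"]
spans = token_spans(text, tokens)

# Build annotation tuples in the (label, start, end) shape used above.
annotation = [("NAME", *spans[0]), ("LOC", *spans[-1])]
print(annotation)  # [('NAME', 0, 7), ('LOC', 26, 33)]
```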


Text2Text

import argilla as rg

rec = rg.Text2TextRecord(
    text="A giant spider is discovered... how much does he make in a year?",
    prediction=["He has 3*4 trees. So he has 12*5=60 apples."],
)
rg.log(records=rec, name="my_dataset")


Update records#

You can update records in your Argilla datasets using our Python API. This works the same way as an upsert in a database, keyed on the record id. You can change any attribute, and the stored record will be overwritten as long as you log it with the id of the original record.

import argilla as rg

# Read all records in the dataset or define a specific search via the `query` parameter
records = rg.load("my_dataset")

# Modify the first record's metadata (if it has no metadata dict yet, you might need to create it)
records[0].metadata["my_metadata"] = "I'm a new value"

# Log the record to update it; this keeps everything else and adds the my_metadata field and value
rg.log(name="my_dataset", records=records[0])
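
To make the upsert behaviour concrete, here is a minimal sketch in plain Python, assuming an in-memory dict keyed on the record id. It only illustrates the semantics described above and is not how Argilla stores records; the upsert function and the store dict are hypothetical.

```python
from typing import Any, Dict


def upsert(store: Dict[Any, Dict[str, Any]], record: Dict[str, Any]) -> None:
    """Insert the record, or overwrite the stored one that has the same id."""
    # Hypothetical illustration of id-keyed upsert; not Argilla's implementation.
    store[record["id"]] = record


store: Dict[Any, Dict[str, Any]] = {}

# First log: id 1 does not exist yet, so the record is inserted.
upsert(store, {"id": 1, "text": "first record", "metadata": {}})

# Second log with the same id: the stored record is overwritten, not duplicated.
upsert(store, {"id": 1, "text": "first record", "metadata": {"my_metadata": "I'm a new value"}})

# A different id creates a new entry.
upsert(store, {"id": 2, "text": "another record", "metadata": {}})

print(len(store))  # 2
print(store[1]["metadata"])
```

The practical consequence is that logging a record with a new or missing id adds a record instead of updating one, so keep the original id when you want an update.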

Delete records#

You can delete records by passing their ids to the rg.delete_records() function, or by using a query that matches the records.

Delete by id

# Delete records by id
import argilla as rg
rg.delete_records(name="example-dataset", ids=[1, 3, 5])

Delete by query

# Mark the records matching the query as discarded instead of deleting them
import argilla as rg
rg.delete_records(name="example-dataset", query="metadata.code=33", discard_only=True)
