datasets generator

datasets generator

Postby Antonio Linares » Thu Feb 08, 2024 11:01 am

Simple code to use an AI model to build a dataset:

generate.py
Code: Select all  Expand view
import json
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the pre-trained model
model_name = "mlabonne/phixtral-4x2_8"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name,trust_remote_code=True)

# Define the prompt
prompt = "Generate a simple question and answer regarding astrology:"

# Generate 1000 questions and answers
qa_pairs = []
for i in range(1000):
    input_ids = tokenizer.encode(prompt, return_tensors="pt", max_length=512, truncation=True)
    output = model.generate(input_ids, max_length=100, num_return_sequences=1, no_repeat_ngram_size=2, top_k=50, top_p=0.95, temperature=0.7)
    generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
    qa_pairs.append({"question": f"Question {i+1}", "answer": generated_text})

# Save the QA pairs to a JSON file
with open("questions_and_answers.json", "w") as json_file:
    json.dump(qa_pairs, json_file, indent=4)

print("Questions and answers saved to 'questions_and_answers.json'")
regards, saludos

Antonio Linares
www.fivetechsoft.com
User avatar
Antonio Linares
Site Admin
 
Posts: 42098
Joined: Thu Oct 06, 2005 5:47 pm
Location: Spain

Return to Artificial Intelligence examples

Who is online

Users browsing this forum: No registered users and 1 guest