FiveTech Software tech support forums

by **Antonio Linares** » Thu Nov 30, 2023 4:09 pm

Inspired by the idea of AI large language models, this is a very simple and easy-to-understand sentence generator, quite funny :-)

The more sentences you provide, the more "inspired" it will be :-D

llml.prg

Code:

    #include "FiveWin.ch"

    static hTokens := {=>}

    function Main()

       Tokenizer( "el gato subió al arbol y maulló hasta que llegó el bombero" )
       Tokenizer( "Me gusta aprender cosas nuevas todos los días" )
       Tokenizer( "El cielo es azul y el sol brilla" )
       Tokenizer( "La música es una forma de expresión artística" )
       Tokenizer( "El chocolate es un dulce que se hace con cacao" )
       Tokenizer( "La Tierra es el tercer planeta del sistema solar y tiene una luna" )
       Tokenizer( "El agua es un líquido transparente e inodoro que se compone de hidrógeno y oxígeno" )
       Tokenizer( "Los gatos son animales domésticos muy populares" )

       ? Generate( "el" )

    return nil

    // Records, for every word, the list of words that follow it
    function Tokenizer( cSentence )

       local aTokens := hb_ATokens( cSentence )
       local n

       hb_HCaseMatch( hTokens, .F. )   // "El" and "el" share one key

       for n = 1 to Len( aTokens ) - 1
          if ! hb_HHasKey( hTokens, aTokens[ n ] )
             hTokens[ aTokens[ n ] ] = { aTokens[ n + 1 ] }
          else
             AAdd( hTokens[ aTokens[ n ] ], aTokens[ n + 1 ] )
          endif
       next

    return nil

    // Walks the table at random, starting from cToken, for at most 19 steps
    function Generate( cToken )

       local cSentence := cToken, n := 1

       while hb_hHasKey( hTokens, cToken ) .and. ! Empty( hTokens[ cToken ] ) .and. n++ < 20
          // Note: the appended word and the next cToken are drawn independently,
          // so they can differ; the later version in this thread uses a single draw
          cSentence += " " + hTokens[ cToken ][ hb_RandomInt( 1, Len( hTokens[ cToken ] ) ) ]
          cToken = hTokens[ cToken ][ hb_RandomInt( 1, Len( hTokens[ cToken ] ) ) ]
       end

    return cSentence

Some funny results:

"el chocolate es un líquido que llegó compone de expresión artística"

"el agua brilla"

"el gato planeta del sistema solar y el agua"

"el tercer es el y maulló"

"el sol es un y el hasta que se hace de hidrogeno artística"

"el sol planeta del sistema solar y el hasta que se el sol es azul dulce que se el chocolate"

by **Antonio Linares** » Thu Nov 30, 2023 5:22 pm

In this example we load all of William Shakespeare's books into memory:

https://www.fivetechsoft.com/files/shakespeare.txt

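Code:

    #include "FiveWin.ch"

    static hTokens := {=>}

    function Main()

       local cText := hb_memoRead( "shakespeare.txt" )
       local cSentence

       // Treat each "."-delimited chunk of the file as one sentence
       for each cSentence in hb_ATokens( cText, "." )
          Tokenizer( cSentence )
       next

       ? Generate( "the" )

    return nil

    // Tokenizer() and Generate() are not repeated in this post;
    // presumably they are the same as in the first listing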
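Splitting on "." leaves commas and other punctuation attached to the words, and line breaks may remain inside tokens, so "be," and "be" can end up as different keys. Below is a minimal sketch of one way to normalize the text before tokenizing. It assumes a plain Harbour console build (no FiveWin), a hypothetical CleanText() helper that is not part of the original post, and a short inline sample standing in for shakespeare.txt:

```harbour
// Sketch only: CleanText() is a hypothetical helper, not part of the original post.
// Plain Harbour console build assumed, so FiveWin.ch is not included.

procedure Main()

   // Inline sample used instead of hb_memoRead( "shakespeare.txt" )
   local cText := "To be, or not to be: that is the question." + hb_eol() + ;
                  "Whether 'tis nobler in the mind to suffer."
   local cSentence, cClean

   for each cSentence in hb_ATokens( CleanText( cText ), "." )
      cClean := AllTrim( cSentence )
      if ! Empty( cClean )
         ? "[" + cClean + "]"   // in the real program: Tokenizer( cClean )
      endif
   next

   return

function CleanText( cText )

   local cChar

   // Line breaks and tabs become plain spaces
   for each cChar in { Chr( 13 ), Chr( 10 ), Chr( 9 ) }
      cText := StrTran( cText, cChar, " " )
   next

   // Common punctuation is removed; "." is kept as the sentence delimiter
   for each cChar in { ",", ";", ":", "!", "?", '"' }
      cText := StrTran( cText, cChar, "" )
   next

   // Collapse runs of spaces into a single space
   do while "  " $ cText
      cText := StrTran( cText, "  ", " " )
   enddo

   return cText
```

Each non-empty sentence is printed in brackets with the punctuation stripped. Which characters to drop is a judgment call: keeping apostrophes, as here, preserves words like "'tis".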

by **Otto** » Thu Nov 30, 2023 9:19 pm

Dear Antonio,
thank you very much.

I added
#xtranslate hb_HHasKey( [<x,...>] ) => HHasKey( <x> )
and translated:
Tokenizer("The tomcat climbed the tree and meowed until the firefighter arrived.")

Tokenizer("I like learning new things every day.")
Tokenizer("The sky is blue and the sun is shining.")
Tokenizer("Music is a form of artistic expression.")
Tokenizer("Chocolate is a sweet made from cocoa.")
Tokenizer("The Earth is the third planet in the solar system and has a moon.")
Tokenizer("Water is a clear and odorless liquid made of hydrogen and oxygen.")
Tokenizer("Cats are very popular pets.")

However, as a result I only get: el
Should it work with xHarbour?

Best regards,
Otto

by **Antonio Linares** » Fri Dec 01, 2023 3:39 am

Dear Otto,

When you call Generate( <cInitialWord> ), the initial word must exist in your sentences. In your case:

? Generate( "the" )
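Generate() returns just the start word whenever that word is not a key in hTokens, which matches the "el" result above. The following is a minimal sketch of a guard for that case. It assumes a plain Harbour console build (no FiveWin) and a hypothetical SafeGenerate() helper that is not part of the original code; when the requested word is unknown it falls back to a random known key:

```harbour
// Sketch only: SafeGenerate() is a hypothetical helper, not part of the original post.
// Plain Harbour console build assumed, so FiveWin.ch is not included.

static hTokens := {=>}

procedure Main()

   hb_HCaseMatch( hTokens, .F. )   // keys match regardless of case

   Tokenizer( "The sky is blue and the sun is shining" )
   Tokenizer( "Cats are very popular pets" )

   ? SafeGenerate( "the" )   // known start word
   ? SafeGenerate( "el" )    // unknown start word: a random known key is used instead

   return

// Same table-building logic as in the thread
function Tokenizer( cSentence )

   local aTokens := hb_ATokens( cSentence )
   local n

   for n := 1 to Len( aTokens ) - 1
      if ! hb_HHasKey( hTokens, aTokens[ n ] )
         hTokens[ aTokens[ n ] ] := { aTokens[ n + 1 ] }
      else
         AAdd( hTokens[ aTokens[ n ] ], aTokens[ n + 1 ] )
      endif
   next

   return nil

// Random walk over the table, one draw per step, at most 19 steps
function Generate( cToken )

   local cSentence := cToken, n := 1

   do while hb_HHasKey( hTokens, cToken ) .and. n++ < 20
      cToken := hTokens[ cToken ][ hb_RandomInt( 1, Len( hTokens[ cToken ] ) ) ]
      cSentence += " " + cToken
   enddo

   return cSentence

// Falls back to a random known key when cToken is not in the table
function SafeGenerate( cToken )

   local aKeys

   if ! hb_HHasKey( hTokens, cToken )
      aKeys := hb_HKeys( hTokens )
      if Empty( aKeys )
         return ""   // nothing has been tokenized yet
      endif
      cToken := aKeys[ hb_RandomInt( 1, Len( aKeys ) ) ]
   endif

   return Generate( cToken )
```

Reporting an error instead of silently substituting a word would be an equally reasonable choice; the fallback just keeps the demo producing output.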

by **Antonio Linares** » Fri Dec 01, 2023 5:01 am

Dear Otto,

In this example you can visually review how we organize the tokens, so it's easier to understand how it works :-)

llml.prg

Code:

    #include "FiveWin.ch"

    static hTokens := {=>}

    function Main()

       local n

       Tokenizer( "The cat climbed the tree and meowed until the firefighter arrived" )
       Tokenizer( "I like learning new things every day" )
       Tokenizer( "The sky is blue and the sun is shining" )
       Tokenizer( "Music is a form of artistic expression" )
       Tokenizer( "Chocolate is a sweet made from cocoa" )
       Tokenizer( "The Earth is the third planet in the solar system and has a moon" )
       Tokenizer( "Water is a clear and odorless liquid made of hydrogen and oxygen" )
       Tokenizer( "Cats are very popular pets" )
       Tokenizer( "Paris is the capital of France and a popular tourist destination" )
       Tokenizer( "A triangle is a polygon with three sides and three angles" )
       Tokenizer( "I like to read books and watch movies" )
       Tokenizer( "A bicycle is a vehicle that has two wheels and pedals" )
       Tokenizer( "Microsoft is a technology company that makes software and hardware products" )
       Tokenizer( "Apples are a type of fruit that can be red, green, or yellow" )
       Tokenizer( "Elephants are the largest land animals" )
       Tokenizer( "The color wheel consists of primary colors like red, blue, and yellow" )

       XBrowser( hTokens )   // browse the word -> following-words table

       for n = 1 to 5
          ? Generate( "The" )
          ? Generate( "A" )
          ? Generate( "are" )
       next

    return nil

    // Records, for every word, the list of words that follow it
    function Tokenizer( cSentence )

       local aTokens := hb_ATokens( cSentence )
       local n

       hb_HCaseMatch( hTokens, .F. )   // "The" and "the" share one key

       for n = 1 to Len( aTokens ) - 1
          if ! hb_HHasKey( hTokens, aTokens[ n ] )
             hTokens[ aTokens[ n ] ] = { aTokens[ n + 1 ] }
          else
             AAdd( hTokens[ aTokens[ n ] ], aTokens[ n + 1 ] )
          endif
       next

    return nil

    // Walks the table at random; the word appended is also the next key
    function Generate( cToken )

       local cSentence := cToken, n := 1

       while hb_hHasKey( hTokens, cToken ) .and. ! Empty( hTokens[ cToken ] ) .and. n++ < 100
          cSentence += " " + ( cToken := hTokens[ cToken ][ hb_RandomInt( 1, Len( hTokens[ cToken ] ) ) ] )
       end

    return cSentence
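XBrowser() requires FiveWin. For a console build, the same table can be printed as text. Here is a minimal sketch, assuming plain Harbour and a hypothetical DumpTable() helper that is not part of the original code:

```harbour
// Sketch only: DumpTable() is a hypothetical helper, not part of the original post.
// Plain Harbour console build assumed, so FiveWin.ch is not included.

static hTokens := {=>}

procedure Main()

   hb_HCaseMatch( hTokens, .F. )   // "The" and "the" share one key

   Tokenizer( "The sky is blue and the sun is shining" )
   Tokenizer( "The cat climbed the tree" )

   DumpTable()

   return

// Same table-building logic as in the thread
function Tokenizer( cSentence )

   local aTokens := hb_ATokens( cSentence )
   local n

   for n := 1 to Len( aTokens ) - 1
      if ! hb_HHasKey( hTokens, aTokens[ n ] )
         hTokens[ aTokens[ n ] ] := { aTokens[ n + 1 ] }
      else
         AAdd( hTokens[ aTokens[ n ] ], aTokens[ n + 1 ] )
      endif
   next

   return nil

// Prints one line per key: the word, then every word recorded after it
function DumpTable()

   local cKey, cNext, cLine

   for each cKey in hb_HKeys( hTokens )
      cLine := cKey + " ->"
      for each cNext in hTokens[ cKey ]
         cLine += " " + cNext
      next
      ? cLine
   next

   return nil
```

With these two sentences, the entry for "The" should list sky, sun, cat and tree, because case-insensitive matching merges "The" and "the". A successor that appears more than once in the input is stored more than once, so it is proportionally more likely to be picked by Generate().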

by **Antonio Linares** » Fri Dec 01, 2023 6:13 am

You can use ChatGPT to generate the sentences for you:

write 50 simple sentences to feed artificial intelligence using the format Tokenizer( sentence )

Tokenizer("The sun rises in the east.")
Tokenizer("Cats are furry animals.")
Tokenizer("Water freezes at zero degrees Celsius.")
Tokenizer("Mars is known as the red planet.")
Tokenizer("The moon orbits the Earth.")
Tokenizer("Bananas are a good source of potassium.")
Tokenizer("Football is a popular sport worldwide.")
Tokenizer("Stars twinkle in the night sky.")
Tokenizer("The Great Wall of China is a historic landmark.")
Tokenizer("Rivers flow from higher to lower elevations.")
Tokenizer("Bees play a crucial role in pollination.")
Tokenizer("Diamonds are a form of carbon.")
Tokenizer("The Pacific Ocean is the largest ocean on Earth.")
Tokenizer("The Statue of Liberty stands in New York Harbor.")
Tokenizer("The human body has 206 bones.")
Tokenizer("Photosynthesis is the process by which plants make food.")
Tokenizer("The Eiffel Tower is located in Paris.")
Tokenizer("Birds have feathers for flying.")
Tokenizer("The Amazon Rainforest is home to diverse wildlife.")
Tokenizer("Electricity is a form of energy.")
Tokenizer("The Sahara Desert is the largest hot desert.")
Tokenizer("Snowflakes have unique and intricate patterns.")
Tokenizer("Honey is produced by bees from flower nectar.")
Tokenizer("The Earth's atmosphere consists of nitrogen and oxygen.")
Tokenizer("Jupiter is the largest planet in our solar system.")
Tokenizer("A rainbow is formed by the refraction of light.")
Tokenizer("The Mona Lisa is a famous painting by Leonardo da Vinci.")
Tokenizer("Owls are nocturnal birds of prey.")
Tokenizer("Tigers are the largest cats in the world.")
Tokenizer("The Nile River is the longest river in Africa.")
Tokenizer("Penguins live in the Southern Hemisphere.")
Tokenizer("The Mona Lisa is displayed in the Louvre Museum.")
Tokenizer("Astronauts wear spacesuits for protection in space.")
Tokenizer("The Leaning Tower of Pisa is in Italy.")
Tokenizer("The human brain weighs about 3 pounds.")
Tokenizer("Ants are known for their organized colonies.")
Tokenizer("The Taj Mahal is a beautiful marble mausoleum.")
Tokenizer("Volcanoes erupt molten lava.")
Tokenizer("The speed of light is approximately 299,792 kilometers per second.")
Tokenizer("Chameleons can change color to blend into their surroundings.")
Tokenizer("The Celsius and Fahrenheit scales are used to measure temperature.")
Tokenizer("The Sydney Opera House is an iconic landmark in Australia.")
Tokenizer("The Hubble Space Telescope captures stunning images of the universe.")
Tokenizer("Pandas primarily eat bamboo.")
Tokenizer("The coldest temperature ever recorded on Earth was -128.6 degrees Fahrenheit.")
Tokenizer("The Mona Lisa has a mysterious smile.")
Tokenizer("Rainbows are a spectrum of light.")
Tokenizer("Dolphins are highly intelligent marine mammals.")
Tokenizer("The Berlin Wall divided East and West Berlin during the Cold War.")
Tokenizer("The human heart beats about 100,000 times per day.")

by **Otto** » Fri Dec 01, 2023 6:54 am

Dear Antonio,

Thank you, it works.
And thank you very much for your research and development work and for exploring new techniques for us.

Best regards,
Otto

random sentences generator