D2OQZG8l5BI1S06 1 day ago
The post is AI-written, so I did not read it. But based on title and abstract I'll have to disagree.
The native content LLMs understand is text. They were literally trained on it. They much prefer it to any arbitrary structure you could come up with.
We're used to thinking that computers prefer content that is structured, binary, etc., but with LLMs that changed.
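To make that concrete, here is a minimal sketch (assuming OpenAI's tiktoken library is installed; the JSON string is just an invented example): the model's raw input is a flat sequence of token IDs over text, and even "structured" formats like JSON are just more text to it.

    import tiktoken  # pip install tiktoken

    # GPT-4-style BPE encoding; the model only ever sees integer token IDs
    enc = tiktoken.get_encoding("cl100k_base")

    text = '{"user": "alice", "age": 30}'
    tokens = enc.encode(text)
    print(tokens)              # a flat list of token IDs, no schema attached
    print(enc.decode(tokens))  # round-trips back to the original string

    # To the model, JSON is ordinary text: braces and quotes are plain
    # tokens, not structure it parses natively.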
tardedmeme 1 day ago
Their native content is semantic vectors. They had to be trained for a long time to convert between text and semantic vectors, and the conversion is very lossy. The seahorse emoji demonstrates this nicely: the LLM internally holds a semantic vector for seahorse+emoji, but the output translation layer can't match it to any real token.
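As a toy illustration of that translation layer (invented vocabulary and random vectors, numpy only, nothing from a real model): decoding scores every output token against the internal vector and has to emit the best match, even when no token really fits.

    import numpy as np

    rng = np.random.default_rng(0)
    vocab = ["horse", "sea", "fish_emoji", "horse_emoji"]  # no seahorse emoji
    E = rng.normal(size=(len(vocab), 8))  # toy output-embedding matrix

    # Pretend this is the model's internal "seahorse + emoji" concept:
    hidden = rng.normal(size=8)

    logits = E @ hidden                   # output layer: one score per token
    print(vocab[int(np.argmax(logits))])  # forced to pick the nearest token,
                                          # even though none is an exact match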
Alifatisk 1 day ago
> The seahorse emoji demonstrates this nicely: the LLM internally holds a semantic vector for seahorse+emoji, but the output translation layer can't match it to any real token.
I am curious about this: how can the LLM hold the embedding for seahorse+emoji if it doesn't exist? How did it end up like this? Perhaps the dataset had discussions from people about potential new emojis?
tardedmeme 1 day ago
Because it's just the embedding for a seahorse plus the embedding for outputting an emoji symbol.
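In toy form (again invented vectors, not real model weights), the claim is roughly vector addition followed by a nearest-token lookup:

    import numpy as np

    rng = np.random.default_rng(1)
    vocab = ["seahorse", "emoji", "horse_emoji", "fish_emoji"]
    E = {w: rng.normal(size=8) for w in vocab}

    target = E["seahorse"] + E["emoji"]  # the composed concept vector

    def cosine(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

    # There is no "seahorse_emoji" token, so the nearest output token
    # to the composed vector is necessarily something else:
    print(max(vocab, key=lambda w: cosine(target, E[w])))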
halJordan 9 hours ago
The crazy thing is that you can contribute literally nothing, because you chose to stay totally ignorant and just ride your hobby horse.
And you're the same person who would tell me that AI is bad because, what? It might do the same thing you're proud you just did? Hallucinate some BS?
saghm 13 hours ago
If only we had spent any time as an industry coming up with structured formats for text...
binyu 19 hours ago
> It makes individual agents untestable because their inputs and outputs are strings
Strings can't be valid test vectors? Large language models are highly non-deterministic by design, no matter what.
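To sketch what that can look like in practice (hypothetical handle_reply function under test, pytest assumed): you record strings from real runs and assert properties of the output rather than exact bytes, which sidesteps the non-determinism.

    import json

    def handle_reply(raw: str) -> dict:
        """Hypothetical agent step under test: parses an LLM reply."""
        return json.loads(raw)

    def test_reply_has_required_fields():
        # A string captured from a real run makes a perfectly good test vector:
        raw = '{"action": "search", "query": "seahorse emoji"}'
        reply = handle_reply(raw)
        assert reply["action"] in {"search", "answer"}  # property, not exact match
        assert reply["query"]                           # non-empty query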