How to tell your chat LLM to be more natural. Add the text below to its personality:
Prefer specific facts over vague importance. Do not inflate significance with phrases like “plays a pivotal role” or “marks a turning point.” Use numbers, dates, mechanisms, or measurable outcomes. Example: replace “the system changed logistics” with “the system reduced container dwell time from 6.2 to 4.1 days.”
Avoid promotional language. Keep a neutral tone. Do not use adjectives such as vibrant, groundbreaking, renowned, innovative, or powerful. Use plain wording.
Limit AI-typical vocabulary such as crucial, pivotal, intricate, tapestry, underscore, highlighting, emphasizing, showcasing, fostering, or enhance. Prefer simpler words.
Avoid generic commentary and vague attribution. Do not write “this reflects broader trends,” “experts say,” or “researchers suggest” unless a named source is given.
Avoid formulaic structures such as “not only X but also Y” or “despite its success it faces challenges.” Use direct explanations.
Use lists sparingly. Prefer short paragraphs unless bullets improve clarity. Avoid triple-adjective patterns.
Prefer simple sentences like “X is Y” or “the system uses Z.” Minimize formatting. Avoid emojis, decorative headings, and excessive bold.
Remove sentences that add no information. Avoid generic endings such as “in conclusion” or “overall.” Use concrete examples, real actors, workflows, and technologies when possible. Write like technical documentation or a research summary, not marketing or blog prose.
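A quick way to check whether a reply follows the vocabulary rules is to scan it for the listed words. The sketch below is only an illustration: the names `FLAGGED_WORDS` and `find_flagged_words` are hypothetical, the word list is copied from the rules above, and it matches exact word forms only (so "underscores" would not be caught).

```python
import re

# Words copied from the "promotional language" and "AI-typical vocabulary" rules.
FLAGGED_WORDS = [
    "vibrant", "groundbreaking", "renowned", "innovative", "powerful",
    "crucial", "pivotal", "intricate", "tapestry", "underscore",
    "highlighting", "emphasizing", "showcasing", "fostering", "enhance",
]


def find_flagged_words(text):
    """Return the flagged words present in `text`, in the order of FLAGGED_WORDS.

    Matching is case-insensitive and on whole words only.
    """
    lowered = text.lower()
    return [
        word for word in FLAGGED_WORDS
        if re.search(rf"\b{re.escape(word)}\b", lowered)
    ]


if __name__ == "__main__":
    print(find_flagged_words("The system plays a pivotal role and is Innovative."))
    print(find_flagged_words("The system reduced container dwell time from 6.2 to 4.1 days."))
```

A check like this only covers single words. The rules about structure, attribution, and filler sentences still need a human or model review.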
MAD-GRPO: https://huggingface.co/blog/telcom/mad-grpo In R1-Zero-like training, Dr. GRPO modifies GRPO by dropping the std normalization, but that often comes with a hidden side effect: length-weighted updates that can nudge the model toward verbosity. MAD-GRPO uses a robust scale (MAD + epsilon) with per-token normalization, for stability without the verbosity bias.
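To make the "MAD + epsilon" scale concrete, here is a minimal sketch of a group advantage computed that way. It rests on assumptions that are not stated in the post: that MAD means the median absolute deviation, and that rewards are centered on the group mean as in GRPO. The function name `mad_scaled_advantages` and the default `eps` are hypothetical, and per-token normalization is not shown. The linked blog post is the reference for the actual definition.

```python
from statistics import mean, median


def mad_scaled_advantages(rewards, eps=1e-6):
    """Center rewards on the group mean and divide by (MAD + eps).

    MAD here is the median absolute deviation from the group median
    (an assumption; see the lead-in).
    """
    center = mean(rewards)
    med = median(rewards)
    mad = median([abs(r - med) for r in rewards])
    return [(r - center) / (mad + eps) for r in rewards]


if __name__ == "__main__":
    # One group of sampled completions with scalar rewards.
    print(mad_scaled_advantages([0.0, 1.0, 1.0, 0.0]))
    # Identical rewards give zero advantages instead of a division by zero.
    print(mad_scaled_advantages([1.0, 1.0, 1.0]))
```

One property worth knowing: when more than half of the rewards in a group are identical, the MAD is zero and the scale is set by `eps` alone, so the choice of `eps` matters for sparse or binary rewards.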
NVIDIA’s Groq deal ... I think inference efficiency is becoming the main driver of profitability, and NVIDIA’s Groq deal is evidence that the market is moving from “who can train biggest” to “who can serve cheapest and fastest at scale.” That points to a maturing phase of AI: not necessarily the end of a bubble, but definitely a correction in what “wins” long-term. What do you think?
CIFAR-10, your handy image dataset ... CIFAR-10 is a small, standard computer-vision dataset used to quickly test and compare ideas.
- 60,000 color images, each 32×32 pixels, labeled into 10 classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck.
- Label mapping (important):
  - 0 airplane
  - 1 automobile
  - 2 bird
  - 3 cat
  - 4 deer
  - 5 dog
  - 6 frog
  - 7 horse
  - 8 ship
  - 9 truck
- Split: 50,000 train and 10,000 test.
- Why people use it: fast benchmarking for image classifiers (small CNNs, ResNet, ViT), and quick experiments for training pipelines, augmentation, regularization, pruning, distillation, and demos.
- Sizes (downloads): Python version about 163 MB, binary about 162 MB. Hugging Face shows about 144 MB for the dataset files.
- Where to get it: the official CIFAR page (University of Toronto) and the Hugging Face CIFAR-10 dataset page, uoft-cs/cifar10.

If you want something more, check the table below:

| Dataset | Resolution | Classes | Best For |
| --- | --- | --- | --- |
| ImageNet 1K | 224–256×256 | 1000 | Real-world large-scale classification |
| ImageNet-256 | 256×256 | 1000 | Direct high-res training |
| TinyImageNet | 64×64 | 200 | Mid-range benchmark |
| UC Merced Land Use | 256×256 | ~21 | Higher resolution small classification |
| MS COCO | >256×256 | ~80 objects | Detection / segmentation |
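The CIFAR-10 label mapping given earlier in this post can be written down as a lookup list. The sketch below also decodes one record of the binary version, which stores each image as 1 label byte followed by 3,072 pixel bytes (the 1,024-byte red, green, and blue planes, in that order). The names `CIFAR10_CLASSES`, `RECORD_BYTES`, and `parse_record` are hypothetical, and the example uses a fabricated all-zero record rather than real data.

```python
# Index in this list is the integer label used by the dataset.
CIFAR10_CLASSES = [
    "airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck",
]

PLANE_BYTES = 32 * 32                # one color channel of a 32x32 image
RECORD_BYTES = 1 + 3 * PLANE_BYTES   # label byte + red, green, blue planes


def parse_record(record):
    """Split one binary-format record into (class name, (red, green, blue))."""
    if len(record) != RECORD_BYTES:
        raise ValueError(f"expected {RECORD_BYTES} bytes, got {len(record)}")
    label = record[0]
    pixels = record[1:]
    red = pixels[:PLANE_BYTES]
    green = pixels[PLANE_BYTES:2 * PLANE_BYTES]
    blue = pixels[2 * PLANE_BYTES:]
    return CIFAR10_CLASSES[label], (red, green, blue)


if __name__ == "__main__":
    # Fabricated record: label 3 followed by an all-black image.
    fake_record = bytes([3]) + bytes(3 * PLANE_BYTES)
    name, (red, green, blue) = parse_record(fake_record)
    print(name, len(red), len(green), len(blue))
```

In practice most people load the dataset through a library instead of parsing the files by hand. The list is still useful for turning predicted integer labels back into class names.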
To endorse another user to submit to the cs.AI (Artificial Intelligence) subject class, an arXiv submitter must have submitted 3 papers to any of cs.AI, cs.AR, cs.CC, cs.CE, cs.CG, cs.CL, cs.CR, cs.CV, cs.CY, cs.DB, cs.DC, cs.DL, cs.DM, cs.DS, cs.ET, cs.FL, cs.GL, cs.GR, cs.GT, cs.HC, cs.IR, cs.IT, cs.LG, cs.LO, cs.MA, cs.MM, cs.MS, cs.NA, cs.NE, cs.NI, cs.OH, cs.OS, cs.PF, cs.PL, cs.RO, cs.SC, cs.SD, cs.SE, cs.SI or cs.SY earlier than three months ago and less than five years ago.
Recently I was playing with my model. What are your thoughts on "unlearning"? I need it 😀 For telcom/deewaiREALCN, I have the original on the main branch and the trained versions "cp550" and "n_680" on another branch. Both were trained on telcom/deewaiREALCN-training. I got three results with the prompt: "Athlete portrait, 26-year-old woman, post-training sweat, gym ambient light, chalk dust particles, intense gaze, crisp detail." Apparently, the model is sensitive to the word "old". You can see that training on more faces improved on main, but it is still not ideal... I am now working on unlearning and would like to hear your opinion. #unlearning