AI Research Highlights | Week 38, 2023

1. Large Language Models as Optimizers

Source: https://arxiv.org/abs/2309.03409

DeepMind researchers unveiled OPRO (Optimization by PROmpting), a framework in which LLMs serve as optimizers. In OPRO, the optimizer LLM iteratively reads a meta-prompt containing previously generated prompts and their scores, then proposes improved candidates. Prompts optimized this way outperform human-designed prompts by up to 8% on GSM8K and by up to 50% on Big-Bench Hard tasks.

Figure: An overview of the OPRO framework
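In essence, each optimization step shows the optimizer LLM a trajectory of earlier prompts sorted by score and asks it to beat them; the new candidate is evaluated and appended to the trajectory. A minimal sketch of that loop, with hypothetical call_llm and evaluate_prompt helpers standing in for a real LLM API and a task evaluator:

```python
def call_llm(meta_prompt: str) -> str:
    """Hypothetical LLM call that returns one candidate instruction."""
    raise NotImplementedError

def evaluate_prompt(prompt: str) -> float:
    """Hypothetical scorer, e.g. accuracy of the prompt on held-out GSM8K items."""
    raise NotImplementedError

def opro(seed_prompt: str, num_steps: int = 50, top_k: int = 20) -> str:
    # Trajectory of (prompt, score) pairs shown to the optimizer LLM.
    history = [(seed_prompt, evaluate_prompt(seed_prompt))]
    for _ in range(num_steps):
        # Show the top-k prompts so far, sorted low to high (best last),
        # and ask the LLM to propose something better -- the OPRO meta-prompt.
        shown = sorted(history, key=lambda p: p[1])[-top_k:]
        meta_prompt = (
            "Previous instructions and their scores, low to high:\n"
            + "\n".join(f"text: {p}\nscore: {s:.1f}" for p, s in shown)
            + "\nWrite a new instruction that achieves a higher score."
        )
        candidate = call_llm(meta_prompt)
        history.append((candidate, evaluate_prompt(candidate)))
    return max(history, key=lambda p: p[1])[0]
```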

2. When Do Program-of-Thoughts Work for Reasoning?

Source: https://arxiv.org/abs/2308.15452

To answer the question "What kind of data format is crucial for LLMs' reasoning abilities?", researchers propose the complexity-impacted reasoning score (CIRS), which combines structural and logical attributes to quantify the link between code data and reasoning ability. Investigating program-of-thoughts prompting, they find that code with an appropriate level of complexity, defined by certain logical and structural qualities, is the key factor.

Figure: Utilizing the complexity-impacted reasoning score (CIRS) to measure the complexity of code reasoning steps
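To make the idea concrete, here is a simplified, illustrative calculation in that spirit: a structural score derived from the code's AST combined with a logical score approximating cyclomatic complexity. The paper's exact definition and weighting differ; this sketch only mirrors the intuition that both structure and logic contribute.

```python
import ast

# Node types that open a new logical branch (rough cyclomatic proxy).
BRANCHES = (ast.If, ast.For, ast.While, ast.Try, ast.BoolOp)

def ast_depth(node: ast.AST) -> int:
    """Maximum nesting depth of the syntax tree."""
    children = list(ast.iter_child_nodes(node))
    return 1 + (max(map(ast_depth, children)) if children else 0)

def cirs_sketch(code: str) -> float:
    tree = ast.parse(code)
    nodes = list(ast.walk(tree))
    structural = len(nodes) * ast_depth(tree)                  # size x nesting
    logical = 1 + sum(isinstance(n, BRANCHES) for n in nodes)  # ~cyclomatic
    return structural ** 0.5 * logical                         # illustrative blend

print(cirs_sketch("for i in range(3):\n    if i % 2:\n        print(i)"))
```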

3. NExT-GPT: Any-to-Any Multimodal LLM

Source: https://arxiv.org/abs/2309.05519

Researchers from the National University of Singapore proposed NExT-GPT, an end-to-end, general-purpose, any-to-any multimodal LLM (MM-LLM) system. NExT-GPT achieves holistic multimodal understanding with input and output in arbitrary modalities by connecting an LLM with multimodal adaptors and diffusion decoders. Full project details (demo, code, dataset, and more) are available at https://next-gpt.github.io/.
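Conceptually, each input modality passes through an encoder and an input projection into the LLM, and modality signal tokens emitted by the LLM pass through output projections into diffusion decoders. A schematic sketch of that flow (class and module names here are placeholders, not the released code):

```python
# Schematic of NExT-GPT's any-to-any pipeline; all components are injected
# callables, so this illustrates wiring rather than real checkpoints.
class NextGPTSketch:
    def __init__(self, encoders, in_proj, llm, out_proj, decoders):
        self.encoders = encoders  # per-modality encoders, e.g. image/audio/video
        self.in_proj = in_proj    # align encoder features to the LLM token space
        self.llm = llm            # LLM core reasoning over text + projected features
        self.out_proj = out_proj  # map signal-token states to decoder conditions
        self.decoders = decoders  # per-modality diffusion decoders

    def generate(self, text, extra_inputs):
        # 1) Encode non-text inputs and project them into the LLM's space.
        tokens = [self.in_proj[m](self.encoders[m](x)) for m, x in extra_inputs]
        # 2) The LLM produces a textual answer plus modality signal tokens.
        answer, signals = self.llm(text, tokens)
        # 3) Each signal token conditions the matching diffusion decoder.
        outputs = {m: self.decoders[m](self.out_proj[m](s)) for m, s in signals}
        return answer, outputs
```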

4. AGIBench: A Multi-granularity, Multimodal, Human-referenced, Auto-scoring Benchmark for Large Language Models

Source: https://arxiv.org/abs/2309.06495

Researchers at ICT, Chinese Academy of Sciences proposed AGIBench, a multi-dimensional benchmark for LLMs. It uses a four-tuple structure <ability branch, knowledge, difficulty, modal> to automatically label each of its 927 questions, which span 20 core knowledge areas and 68 subdomains. They used AGIBench to evaluate 12 state-of-the-art LLMs; the results are reported in the paper. AGIBench can be downloaded from https://github.com/BenchCouncil/AGIBench.

Figure: AGIBench comprises 927 questions spanning 20 primary knowledge domains and 68 subdomains.
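A plausible representation of an AGIBench item under that four-tuple, with auto-scoring reduced to exact match against the gold label; the field names are illustrative, not the dataset's actual schema:

```python
from dataclasses import dataclass

@dataclass
class AGIBenchQuestion:
    ability_branch: str   # which ability the question targets
    knowledge: str        # one of the 20 knowledge areas / 68 subdomains
    difficulty: str       # question difficulty level
    modal: str            # text-only vs. multimodal
    question: str
    choices: list[str]
    answer: str           # gold choice label used for auto-scoring

def auto_score(prediction: str, item: AGIBenchQuestion) -> int:
    # Auto-scoring: exact match of the predicted label against the gold label.
    return int(prediction.strip().upper() == item.answer.strip().upper())
```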

5. Siren’s Song in the AI Ocean: A Survey on Hallucination in Large Language Models

Source: https://arxiv.org/abs/2309.01219

Tencent AI Lab, in collaboration with several academic institutions, published a survey of work on hallucination in Large Language Models (LLMs), covering its definition, how it differs from hallucination in traditional NLP tasks, evaluation methods, its sources, mitigation strategies, and other pertinent topics.

Figure: The structure of the survey on hallucination in Large Language Models

6. Life-inspired Interoceptive Artificial Intelligence for Autonomous and Adaptive Agents

Source: https://arxiv.org/abs/2309.05999

This paper established an interoceptive AI framework, drawing inspiration from cybernetics, reinforcement learning, artificial life, and active inference. The authors emphasized distinguishing internal states from external states and presented three core ideas for interoceptive AI. The paper is a promising attempt to draw on biology to build more autonomous, adaptive AI agents, and it offers new ideas for integrating cognitive neuroscience, biology, and computer science. For further background, one of the authors has shared an explanatory thread.
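One way to picture the internal/external distinction is a toy homeostatic loop: the agent monitors an internal variable against a setpoint and is rewarded for reducing the deviation (drive reduction). This is my own construction in the paper's spirit, not code from the paper:

```python
import random

SETPOINT = 0.7  # desired internal level, e.g. "energy"

def drive(internal: float) -> float:
    # Interoceptive signal: squared deviation from the homeostatic setpoint.
    return (internal - SETPOINT) ** 2

def step(internal: float, action: str) -> float:
    # External actions perturb the internal state: eating restores energy,
    # exploring depletes it. Dynamics are deliberately simplistic.
    delta = 0.15 if action == "eat" else -0.05
    return min(1.0, max(0.0, internal + delta + random.gauss(0, 0.01)))

internal = 0.5
for t in range(20):
    action = "eat" if internal < SETPOINT else "explore"  # greedy policy
    new_internal = step(internal, action)
    reward = drive(internal) - drive(new_internal)        # drive reduction
    internal = new_internal
    print(f"t={t:2d} action={action:7s} internal={internal:.2f} reward={reward:+.4f}")
```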

7. The Rise and Potential of Large Language Model Based Agents: A Survey

Source: https://arxiv.org/abs/2309.07864

The Fudan NLP Group and miHoYo published a comprehensive survey on LLM-based AI agents, covering the construction of agents, agents in practice, and societies made up of agents. The authors also maintain a list of must-read papers at https://github.com/WooooDyy/LLM-Agent-Paper-List.

Figure: An envisioned agent society
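The survey's conceptual framework decomposes an agent into brain (the LLM plus memory), perception, and action modules. A bare-bones loop in that shape, with a hypothetical llm callable and tool registry (not an API from the survey):

```python
from typing import Callable

def run_agent(llm: Callable[[str], str],
              perceive: Callable[[], str],
              tools: dict[str, Callable[[str], str]],
              max_steps: int = 10) -> str:
    memory: list[str] = []  # the "brain" keeps a running memory of the episode
    for _ in range(max_steps):
        observation = perceive()                      # perception module
        prompt = "\n".join(memory + [observation,
                 "Reply 'ACT <tool> <arg>' or 'FINISH <answer>'."])
        decision = llm(prompt)                        # brain module decides
        memory.append(decision)
        if decision.startswith("FINISH"):
            return decision.removeprefix("FINISH").strip()
        _, tool, arg = decision.split(" ", 2)         # action module executes
        memory.append(tools[tool](arg))
    return "max steps reached"
```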

*The researchers behind the publications deserve full credit for their work.
