Biweekly News

Newsletter 11 | 19 June 2024

AI /ML

  1. New algorithm discovers language just by watching videos. DenseAV, developed at MIT, learns to parse and understand the meaning of language just by watching videos of people talking, with potential applications in multimedia search, language learning, and robotics.
    MIT News

  2. GPU maker tops new MLPerf benchmarks on graph neural nets and LLM fine-tuning.
    IEEE Spectrum

  3. Study demonstrates how AI can develop more personalised cancer treatment strategies.
    University of Oxford

  4. Proofread from Google: Fixes All Errors with One Tap.
    arXiv

  5. ImageInWords: Unlocking Hyper-Detailed Image Descriptions.
    GitHub: google

  6. This is AI's 'next wave,' according to Nvidia CEO Jensen Huang; The chipmaker's chief executive said robots and "AI that understands the laws of physics" are the next wave of the technology.
    QUARTZ

Technology

  1. We are creating new crops five-times faster; Using current and past data, such as from satellite imagery and temperature and rainfall readings, and combining that with future projections, ClimateAi aims to give farmers the most accurate possible, locally-tailored weather forecasts, from one hour to six months ahead.
    BBC News

  2. Mouth-based touchpad enables people living with paralysis to interact with computers; The startup Augmental allows users to operate phones and other devices using their tongue, mouth, and head gestures.
    MIT News

  3. China Is Testing More Driverless Cars Than Any Other Country. Assisted driving systems and robot taxis are becoming more popular with government help, as cities designate large areas for testing on public roads.
    The New York Times

Miscellaneous

  1. MLow: Meta’s low bitrate audio codec
    Engineering at Meta

  2. African elephants address one another with individually specific name-like calls.
    Nature Ecology & Evolution

  3. The Age of the Drone Police Is Here.
    WIRED

Fun

  1. Code: GPT in 60 Lines of NumPy
    Jay Mody

Newsletter 10 | 15 May 2024

AI /ML

  1. AlphaFold 3 predicts the structure and interactions of all of life’s molecules; a new AI model developed by Google DeepMind and Isomorphic Labs. By accurately predicting the structure of proteins, DNA, RNA, ligands and more, and how they interact, we hope it will transform our understanding of the biological world and drug discovery.
    Google (Blog)

  2. OpenAI introduces the "Model Spec" to deepen the public conversation about how AI models should behave. A new document that specifies OpenAI's approach to shaping desired model behavior and how they evaluate tradeoffs when conflicts arise.
    OpenAI

  3. What Is a Virtual Factory, and How They’re Making Industrial Digitalization a Reality? Virtual factories are helping manufacturers unlock new possibilities, from planning to operations.
    NVIDIA (Blog)

  4. AI is gathering a growing amount of training data inside virtual worlds; simulation is increasingly being used to accelerate the development of autonomous vehicles.
    Singularity Hub

  5. From Meta AI "RadOnc-GPT"; Leveraging Meta Llama for a pioneering radiation oncology model. Mayo Clinic’s pioneering RadOnc-GPT is a large language model (LLM) leveraging Meta Llama 2 that has the potential to significantly improve the speed, accuracy, and quality of radiation therapy decision-making, benefiting both medical practitioners and the patients they serve.
    Meta AI

  6. From Microsoft Research, You Only Cache Once (YOCO); Decoder-Decoder Architectures for Language Models. YOCO, for large language models, only caches key-value pairs once.
    arXiv

  7. ScrapeGraphAI: You Only Scrape Once. A web scraping Python library that uses LLMs and direct graph logic to create scraping pipelines for websites and local documents (XML, HTML, JSON, etc.).
    GitHub

Technology

  1. World's 1st "tooth regrowth medicine" to be tested in Japan from September 2024.
    The Mainichi

  2. Atomic Nucleus Excited with Laser: A Breakthrough after Decades. The "thorium transition", which physicists have been looking for for decades, has now been excited for the first time with lasers. This paves the way for revolutionary high precision technologies, including nuclear clocks.
    TU WIEN

  3. Apple introduces M4 chip; M4 enables the breakthrough design and stunning display of the new iPad Pro, while delivering a giant leap in performance.
    Apple

Miscellaneous

  1. Most common 4 digit PIN numbers from an analysis of 3.4 million. The top 20 constitute 27% of all PIN codes!
    Reddit

  2. Microsoft readies new AI model to compete with Google, OpenAI, The Information reports; new model, internally referred to as MAI-1 with roughly 500 billion parameters.
    Reuters

  3. New AI search engine "Upend" emerges from stealth, powered by 100 LLMs.
    VentureBeat

Fun

  1. Watch: Mini Sci-Fi series; showcases alien worlds from the distant past or far future, and focuses on a terrifying turning point in a long-forgotten alien culture.
    YouTube

Newsletter 9 | 01 May 2024

AI /ML

  1. Introducing Meta Llama 3: The most capable openly available LLM to date. The pretrained and instruction-fine-tuned models are the best models existing today at the 8B and 70B parameter scale. The largest models have over 400B parameters and, while still training, already seem to be on par with the best GPT models.
    Meta AI

  2. KAN: Kolmogorov-Arnold Networks as promising alternatives to Multi-Layer Perceptrons (MLPs). Paper shows that this seemingly simple change makes KANs outperform MLPs in terms of accuracy and interpretability. For accuracy, much smaller KANs can achieve comparable or better accuracy than much larger MLPs in data fitting and PDE solving.
    arXiv

  3. VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time. Single portrait photo + speech audio = hyper-realistic talking face video with precise lip-audio sync, lifelike facial behavior, and naturalistic head movements, generated in real time.
    Microsoft

  4. Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone. A 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone.
    arXiv

  5. Apple releases eight small AI language models aimed at on-device use. OpenELM mirrors efforts by Microsoft to make useful small AI language models that run locally (270M-3B parameters).
    Ars Technica

  6. StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation. StoryDiffusion can create impressive comics using consistent self-attention, maintaining character consistency for cohesive storytelling.
    GitHub (storydiffusion)

  7. Automated Social Science: Language Models as Scientist and Subjects; generating and testing, in silico, social scientific hypotheses.
    arXiv

  8. OpenVoice: a versatile instant voice cloning approach that requires only a short audio clip from the reference speaker to replicate their voice and generate speech in multiple languages.
    GitHub

  9. Adobe research presents VideoGigaGAN: Towards Detail-rich Video Super-Resolution. A new generative VSR model that can produce videos with high-frequency details and temporal consistency. VideoGigaGAN builds upon a large-scale image upsampler (GigaGAN).
    GitHub (videogigagan)

Technology

  1. Ray-Ban and Meta AI vision; makes it possible to ask your glasses about what you’re seeing and get helpful information, completely hands-free.
    Meta AI

  2. BirdNET-Pi is a real-time acoustic bird classification system. It uses a USB sound card to pick up bird sounds, and classifies them locally using a pre-trained machine learning model. A-BiRD uses Raspberry Pi to identify different species singing at the same time using BirdNET Sound ID at the Cornell Lab of Ornithology for analysis.
    Raspberry Pi

  3. A Universal Vaccine Against Any Viral Variant? A New Study Suggests It’s Possible.
    Singularity Hub

  4. Startups Say India Is Ideal for Testing Self-Driving Cars; unruly traffic forces innovative approaches to autonomy.
    IEEE Spectrum

  5. World's biggest 3D printer whirs into action. A giant 3D printer, which is big enough to make a house, has been unveiled at the University of Maine.
    BBC News

Miscellaneous

  1. Moderna and OpenAI partner to accelerate the development of life-saving treatments.
    OpenAI

  2. Eric Schmidt-backed Augment, a GitHub Copilot rival, launches out of stealth with $252M.
    TechCrunch

  3. China's Moon atlas is the most detailed ever made; The Geologic Atlas of the Lunar Globe doubles the resolution of Apollo-era maps and will support the space ambitions of China and other countries.
    Nature

  4. A Win–Win Approach: Maximizing Wi-Fi Performance Using Game Theory.
    Shibaura Institute of Technology

  5. The Rise of GQL: A New ISO Standard in Graph Query Language.
    TigerGraph

Fun

  1. Read: Why Everything is Becoming a Game: All the better to control you.
    Gurwinder Blog

Newsletter 8 | 17 Apr 2024

AI /ML

  1. From Google DeepMind: Mixture-of-Depths: Dynamically allocating compute in transformer-based language models. Instead of routing tokens to multiple experts, you "deploy to a single expert which can be dynamically skipped".
    arXiv

  2. Wu's Method can Boost Symbolic AI to Rival Silver Medalists and AlphaGeometry to Outperform Gold Medalists at IMO Geometry. The first AI method which outperforms an IMO gold medalist.
    arXiv

  3. From Meta AI: Schedule-Free Learning - A New Way to Train in PyTorch.
    GitHub (facebookresearch)

  4. Are large language models superhuman chemists? Best models outperformed the best human chemists in our study on average.
    arXiv

  5. Study: AI writing, illustration emits hundreds of times less carbon than humans.
    KU News

  6. The Turing test of online reviews: Can we tell the difference between human-written and GPT-4-written online reviews? It seems predictions boil down to a coin toss.
    Marketing Letters (Springer)

Technology

  1. Open source cryptography: Fully Homomorphic Encryption (FHE) by Zama AI.
    ZAMA

  2. GitHub’s new AI-powered tool auto-fixes vulnerabilities in your code. Called Code Scanning Autofix and powered by GitHub Copilot and CodeQL, it helps deal with over 90% of alert types in JavaScript, TypeScript, Java, and Python.
    Bleeping Computer

  3. Blink to Generate Power for Smart Contact Lenses: a dual-mode power pack harvests energy from light and from tears.
    IEEE Spectrum

  4. Next-generation Meta Training and Inference Accelerator (MTIA).
    Meta AI

  5. This 3D printer can figure out how to print with an unknown material. The advance could help make 3D printing more sustainable, enabling printing with renewable or recyclable materials that are difficult to characterize.
    MIT News

Miscellaneous

  1. The Solution of the Zodiac Killer's 340-Character Cipher.
    arXiv

  2. NASA wants to come up with a new clock for the moon, where seconds tick away faster.
    Phys.org

  3. People quasi-randomly assigned to farm rice are more collectivistic than people assigned to farm wheat.
    Nature Communications

Fun

  1. Dataset: Root System Drawings: the collection holds 1,180 drawings, the outcome of 40 years of root system excavations in Europe.
    Wageningen University & Research

Newsletter 7 | 03 Apr 2024

AI /ML

  1. Claude 3 Opus leading the Elo rating of Chatbot Arena. LMSYS Chatbot Arena is a crowdsourced open platform for LLM evals.
    Hugging Face

  2. SiMBA: Simplified Mamba-Based Architecture for Vision and Multivariate Time series. SiMBA outperforms existing State Space Models (SSMs), bridging the performance gap with state-of-the-art transformers.
    arXiv

  3. Track Everything Everywhere Fast and Robustly. A substantial improvement in training speed (more than 10 times faster), robustness, and accuracy in tracking over the SoTA optimization-based method OmniMotion.
    GitHub (FastOmniTrack)

  4. ViTAR: Vision Transformer with Any Resolution. ViTAR demonstrates impressive adaptability, achieving 83.3% top-1 accuracy at a 1120x1120 resolution and 80.4% accuracy at a 4032x4032 resolution, all while reducing computational costs.
    arXiv

  5. Moirai: A Time Series Foundation Model for Universal Forecasting.
    Salesforce AI Research

  6. NonlinearSolve.jl: High-Performance and Robust Solvers for Systems of Nonlinear Equations in Julia.
    arXiv

  7. Hybrid-Net: Real-time audio source separation; generates lyrics, chords, and beat.
    GitHub (DoMusic)

  8. AI generates high-quality images 30 times faster in a single step. Novel method makes tools like Stable Diffusion and DALL-E-3 faster by simplifying the image-generating process to a single step while maintaining or enhancing image quality.
    MIT News

  9. PERL: Parameter Efficient Reinforcement Learning from Human Feedback; reward model training and reinforcement learning using LoRA from Google Research.
    arXiv

  10. From Google DeepMind, Vid2Robot: End-to-end Video-conditioned Policy Learning with Cross-Attention Transformers. Vid2Robot understands the task from videos and can perform in unseen settings.
    GitHub (vid2robot)

  11. Towards 1-bit Machine Learning Models. Recent works on extreme low-bit quantization such as BitNet and 1.58 bit have attracted a lot of attention in the machine learning community. The main idea is that matrix multiplication with quantized weights can be implemented without multiplications, which can potentially be a game-changer in terms of compute efficiency of large machine learning models.
    GitHub (1bit_blog)
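
    The multiplication-free idea in item 11 can be sketched in a few lines of NumPy. This is only an illustrative sketch, not code from the linked post: it assumes ternary weights in {-1, 0, +1} (the "1.58 bit" case), and the helper name ternary_matvec is hypothetical.

```python
import numpy as np


def ternary_matvec(weights, x):
    """Compute weights @ x for weights in {-1, 0, +1} using only adds and subtracts.

    Hypothetical helper for illustration; not taken from the linked blog post.
    """
    out = np.zeros(weights.shape[0], dtype=x.dtype)
    for i, row in enumerate(weights):
        # Add the inputs where the weight is +1, subtract where it is -1,
        # and skip the inputs where the weight is 0.
        out[i] = x[row == 1].sum() - x[row == -1].sum()
    return out


rng = np.random.default_rng(0)
W = rng.integers(-1, 2, size=(4, 8))  # random ternary weight matrix
x = rng.standard_normal(8)            # full-precision activations

result = ternary_matvec(W, x)
reference = W @ x                     # ordinary matrix-vector product
print(np.allclose(result, reference))
```

    The Python loop is for readability only; any real speed-up would come from dedicated kernels or hardware that exploit the same add/subtract structure.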

Technology

  1. Inkjets are for more than just printing, they can build DNA arrays, 3D structures, and much more.
    IEEE Spectrum

  2. ‘A landmark moment’: scientists use AI to design antibodies from scratch. Modified protein-design tool could make it easier to tackle challenging drug targets — but AI antibodies are still a long way from reaching the clinic.
    Nature

  3. Engineers find a new way to convert carbon dioxide into useful products.
    MIT News

  4. You can make a song for any moment in any major language with just a few short words. v3 is the first model capable of producing radio-quality music.
    Suno AI

  5. WindSpider launches a step-changing crane for use on increasingly larger wind turbines. The self-erecting crane has no weight or height limitations and can be used in very windy locations. The new solution significantly reduces the wind turbine’s life cycle costs for the global wind industry.
    WindSpider

Miscellaneous

  1. Algorithmic improvement is a key factor driving the advance of AI. An analysis showing that since 2012 the amount of compute needed to train a neural net to the same performance on ImageNet classification has been decreasing by a factor of 2 every 16 months. Compared to 2012, it now takes 44 times less compute to train a neural network to the level of AlexNet (by contrast, Moore’s Law would yield an 11x cost improvement over this period). Our results suggest that for AI tasks with high levels of recent investment, algorithmic progress has yielded more gains than classical hardware efficiency.
    OpenAI

  2. AI-generated food images look tastier than real ones.
    University of Oxford News

  3. Memories are made by breaking DNA and fixing it; Nerve cells form long-term memories with the help of an inflammatory response, study in mice finds.
    Nature

  4. These Plants Could Mine Valuable Metals From the Soil With Their Roots.
    Singularity Hub

  5. Start using ChatGPT instantly, making it easier for people to experience the benefits of AI without needing to sign up. It also means protection against any upcoming age verification laws.
    OpenAI

Newsletter 6 | 20 Mar 2024

AI /ML

  1. New algorithm unlocks high-resolution insights for computer vision. FeatUp, developed by MIT CSAIL researchers, boosts the resolution of any deep network or visual foundation for computer vision systems.
    MIT News & Mark T. Hamilton (Paper)

  2. A generalist AI agent for 3D virtual environments. Google DeepMind present new research on a Scalable Instructable Multiworld Agent (SIMA) that can follow natural-language instructions to carry out tasks in a variety of video game settings.
    Google DeepMind

  3. Introducing the next generation of Claude; which sets new industry benchmarks across a wide range of cognitive tasks. All Claude 3 models show increased capabilities in analysis and forecasting, nuanced content creation, code generation, and conversing in non-English languages like Spanish, Japanese, and French.
    Anthropic

  4. Cappy: Outperforming and boosting large multi-task language models with a small scorer. Cappy as a pre-trained model can potentially be used in other creative ways beyond single LLMs.
    Google Research

  5. Stealing Part of a Production Language Model. It was possible to steal part of OpenAI’s ChatGPT or Google’s PaLM-2 (up to an affine transformation) by making queries to their public APIs. This has been a known vulnerability of deployed ML models since 2016; see "Stealing Machine Learning Models via Prediction APIs".
    GitHub (not-just-memorization) & arXiv

  6. Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking. A generalization of Self-Taught Reasoner (STaR) in which LMs learn to generate rationales at each token to explain future text, improving their predictions.
    arXiv

  7. Design2Code: How Far Are We From Automating Front-End Engineering. An open-source Design2Code-18B model that successfully matches the performance of Gemini Pro Vision.
    GitHub (salt-nlp)

  8. Large language models surpass human experts in predicting neuroscience results. LLMs trained on the vast scientific literature could potentially integrate noisy yet interrelated findings to forecast novel results better than human experts.
    arXiv

  9. Amazon reveals Chronos: Learning the Language of Time Series. A simple yet effective framework for pretrained probabilistic time series models.
    arXiv

  10. Using generative AI to improve software testing; MIT spinout DataCebo helps companies bolster their datasets by creating synthetic data that mimic the real thing.
    MIT News

  11. Nvidia Unveils Blackwell, Its Next GPU; a big boost in AI training performance, an even bigger one for AI inference.
    IEEE Spectrum

Technology

  1. Nvidia Announces GR00T, a Foundation Model For Humanoids. GR00T is intended to provide a starting point for specific humanoid robots to do specific tasks.
    IEEE Spectrum

  2. 3D microprinter hacked to fabricate transistors for bioelectronics. The speed of innovation in bioelectronics and critical sensors gets a new boost with the unveiling of a technique for fast-prototyping of devices.
    KTH

  3. Giant "sand battery" holds a week's heat for a whole town. It packs 1 MW of power and a capacity of up to 100 MWh of thermal energy for use during those cold polar winters.
    New Atlas

  4. Figure AI presents full conversations with Figure 01, in partnership with OpenAI.
    Twitter (coreylynch)

  5. Mercedes Hires Humanoid Robots to Work at Its Factories. Apptronik's Apollo robots are 5'8" tall and will complete manual labor tasks like bringing parts to the Mercedes-Benz assembly line.
    PC Magazine

Miscellaneous

  1. Bumblebees socially learn behaviour too complex to innovate alone. Social learning might permit the acquisition of behaviours too complex to ‘re-innovate’ through individual learning.
    Nature

  2. Adaptive immune responses are larger and functionally preserved in a hypervaccinated individual. What happens when someone is vaccinated 217 times against SARS-CoV-2 within a period of 29 months?
    The Lancet

  3. Apple’s AI ambitions could include Google or OpenAI; The iPhone-maker is in ‘active’ talks to bring Gemini to the iPhone, and has also considered using ChatGPT.
    The Verge

  4. Tick-killing pill shows promising results in human trial; the pill would be a new weapon against Lyme disease.
    Ars Technica

Fun

  1. Paper: "Certainly, here is a possible introduction for your topic" in scientific paper... Check first line of the introduction!
    Elsevier (Surfaces and Interfaces)

  2. Movie List: The Mathematical Movie Database.
    QEDCAT

Newsletter 5 | 06 Mar 2024

AI /ML

  1. Stable Diffusion 3 in early preview: a text-to-image model with greatly improved performance in multi-subject prompts, image quality, and spelling abilities. While the model is not yet broadly available, you can join the waitlist for an early preview today. The new Multimodal Diffusion Transformer (MMDiT) architecture uses separate sets of weights for image and language representations, which improves text understanding and spelling capabilities compared to previous versions of Stable Diffusion. Stable Diffusion 3 outperforms state-of-the-art text-to-image generation systems such as DALL·E 3, Midjourney v6, and Ideogram v1 in typography and prompt adherence, based on human preference evaluations.
    Stability AI & Stability AI Research Paper

  2. YOLOv9 is here; "Learning What You Want to Learn Using Programmable Gradient Information". The new version introduces the concepts of Programmable Gradient Information (PGI) and the Generalized Efficient Layer Aggregation Network (GELAN). It can be used to obtain complete information, so that train-from-scratch models can achieve better results than state-of-the-art models pre-trained using large datasets.
    arXiv: Computer Science & GitHub: WongKinYiu

  3. Google has a new 'woke' AI problem with Gemini — and it's going to be hard to fix. The latest version of Google's Gemini artificial intelligence (AI) will frequently produce images of Black, Native American and Asian people when prompted – but refuses to do the same for White people. It wouldn't promote meat; and said it wouldn't help promote fossil fuels... How about the historical facts? George Washington displayed as a black person. In my opinion, Generative AI should be actually unbiased and it should not skew numbers in either direction, or for anyone.
    Business Insider & Fox Business

  4. Gemini Meme

  5. Video ReCap: Recursive Captioning of Hour-Long Videos from Meta AI. A model that can process video inputs of dramatically different lengths (from 1 second to 2 hours) and output video captions at multiple hierarchy levels.
    Video ReCap (Google Sites)

  6. From Google DeepMind: Genie (Generative Interactive Environments). Genie is a foundation world model trained from Internet videos that can generate an endless variety of playable (action-controllable) worlds from synthetic images, photographs, and even sketches.
    Google DeepMind

  7. LENS Project: a tool to provide a quick overview of the main concepts (dictionary of features) employed by a large vision model. Promising development in explainable AI (XAI).
    GitHub: Serre Lab

  8. Elon Musk sues OpenAI over AI threat: OpenAI is not so open now, Musk claims, following the closed-source release of the company's artificial general intelligence technology under Microsoft.
    Courthouse News Service

  9. Approaching Human-Level Forecasting with Language Models. "On average, the system nears the crowd aggregate of competitive forecasters, and in some settings surpasses it. Our work suggests that using LMs to forecast the future could provide accurate predictions at scale and help to inform institutional decision making."
    arXiv

Technology

  1. Samsung unveils the Galaxy Ring as a way to 'simplify everyday wellness'. The Galaxy Ring will be part of the Samsung Health ecosystem and be compatible with the Galaxy Watch. We'll learn more in the months ahead, including the exact sensor suite, pricing and sale date.
    engadget

  2. A new optical metamaterial makes true one-way glass possible. "Just imagine having a window with that glass in your house, office, or car. Regardless of the brightness outside, people wouldn’t be able to see anything inside, while you would enjoy a perfect view from your window".
    Aalto University

  3. New time crystal stable for more than 40 minutes: Nobel Prize on the way? A team of physicists has now built the most robust time crystal ever using solid-state physics.
    YouTube: Science News

  4. Mind-reading devices are revealing the brain’s secrets. Implants and other technologies that decode neural activity can restore people’s abilities to move and speak — and help researchers to understand how the brain works.
    Nature

  5. Your fingerprints can be recreated from the sounds made when you swipe on a touchscreen — Chinese and US researchers show new side channel can reproduce fingerprints to enable attacks. Researchers claim they can successfully attack up to 27.9% of partial fingerprints.
    Tom's Hardware

Miscellaneous

  1. The startup Figure AI Inc. looks set to be the center of attention for its work on human-like robots. Jeff Bezos and Nvidia join OpenAI and Microsoft in backing a humanoid robot unicorn valued at $2 billion.
    Fortune & Twitter: Figure_robot

  2. Although not so new, Apache Superset is evolving to be the de facto open-source modern data exploration and visualization platform. Whether it will dethrone Tableau is still debatable.
    Apache Superset & MergeYourData

  3. Mounting research shows that COVID-19 leaves its mark on the brain, including with significant drops in IQ scores. Those who had mild and resolved COVID-19 showed cognitive decline equivalent to a three-point loss of IQ.
    The Conversation

  4. Good News: Small Nuclear Thorium Reactors are Coming to Europe.
    Science News

  5. Apple to Wind Down Electric Car Effort After Decadelong Odyssey; employees on some car teams will move to Apple’s AI division (generative AI).
    Bloomberg

  6. Spontaneous playful teasing in four great ape species; new research shows that as playful teasing is present in all extant great ape genera, it is likely that the cognitive prerequisites for joking evolved in the hominoid lineage at least 13 million years ago.
    Proceedings of the Royal Society B

Fun

  1. Tutorial: Learn to read Korean in 15 minutes.
    Ryan Estrada

  2. Package: Daft is a distributed query engine for large-scale data processing in Python and is implemented in Rust. Give it a try over Pandas.
    GitHub: Eventual Computing

Newsletter 4 | 21 Feb 2024

AI /ML

  1. Sora is an AI model that can create realistic and imaginative scenes from text instructions (for now, only a minute of high fidelity video). Technically, it is a text-conditional diffusion model trained jointly on videos and images of variable durations, resolutions and aspect ratios. "If you think OpenAI Sora is a creative toy like DALLE, ... think again. Sora is a data-driven physics engine. It is a simulation of many worlds, real or fantastical. The simulator learns intricate rendering, "intuitive" physics, long-horizon reasoning, and semantic grounding, all by some denoising and gradient maths."
    Twitter (DrJimFan) & OpenAI & YouTube (Two Minute Papers)

  2. Google's "Bard" recently became "Gemini". Gemini 1.5 is the next generation model that delivers dramatically enhanced performance with long-context understanding across modalities up to 1M tokens (meaning 1h of video or 30k lines of code) via a new Mixture-of-Experts (MoE) architecture. As an impressive example, Gemini 1.5 learns to translate from English to the Kalamang language purely in context, following a full linguistic manual at inference time. Kalamang is a language spoken by fewer than 200 speakers in western New Guinea. It appears that Gemini 1.5 outperforms OpenAI's GPT-4 in state-of-the-art comparisons.
    Google (Blog) & Google (Blog)

  3. From Google Research "Grandmaster-Level Chess Without Search". The researchers took 10M human-human chess games from an online chess arena, then asked the Stockfish 16 engine for its estimate of the "winning probability" for this board+move, based on Stockfish's analysis of up to 0.05 seconds. They then trained a transformer-based model whose input is the board position+move, and the target output is the winning probability of the move.
    arXiv & GitHub Gist (yoavg)

  4. LLM Agents can Autonomously Hack Websites. Using GPT-4, it is possible to hack websites, performing tasks as complex as blind database schema extraction and SQL injections without human feedback.
    arXiv

  5. Lag-Llama: Towards Foundation Models for Probabilistic Time Series Forecasting: first open-source foundation model for time series forecasting.
    arXiv & GitHub (lag-llama)

  6. How symmetry can come to the aid of machine learning; Exploiting the symmetry within datasets can decrease the amount of data needed for training neural networks.
    MIT News

  7. A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts from Google DeepMind.
    GitHub (read-agent)

  8. Video Joint Embedding Predictive Architecture (V-JEPA): The next step toward Yann LeCun’s vision of advanced machine intelligence (AMI) by Meta AI. Yet another novelty in teaching machines to understand and model the physical world just by watching videos.
    Meta AI

  9. Universal Manipulation Interface (UMI): In-The-Wild Robot Teaching Without In-The-Wild Robots. It is a framework that enables learning capable and generalizable manipulation policies directly from in-the-wild human demonstrations.
    GitHub (umi-gripper)

  10. SERL: A Software Suite for Sample-Efficient Robotic Reinforcement Learning. Ready-to-use software suite for robotic RL, trains policies in just 25 to 50 minutes, outperforming previous benchmarks with high success rates and robustness.
    GitHub (serl-robot)

  11. Tiny Quadrotor Learns to Fly in 18 Seconds. NYU and TII researchers get robots into the air with fast simulations on a consumer laptop.
    IEEE Spectrum

Technology

  1. A team of University of Wisconsin–Madison scientists has developed the first 3D-printed brain tissue that can grow and function like typical brain tissue. First 3D-printed functional human brain tissue grows like the real thing.
    University of Wisconsin–Madison & New Atlas

  2. Mapping the Brain: Google Research is making exciting advances on understanding how our own brains work and how we think.
    YouTube (Google Brain)

  3. Europe’s deepest mine to become giant gravity battery; potential to store up to 2 MW of energy.
    Independent

  4. Smartphone screens are about to become speakers. Piezoelectrics enable displays to provide both high-quality audio and touch feedback.
    IEEE Spectrum

  5. New way of harvesting renewable energy, 1.2 MW tidal kite is now exporting power to the grid.
    New Atlas

  6. $5 device accurately tests for breast cancer in under 5 seconds.
    Journal of Vacuum Science & Technology B

Miscellaneous

  1. Be careful: Deepfake fraud is already here. It’s becoming more common, more convincing, and more dangerous.
    Twitter (Control AI)

  2. Each GPT costs between 25x and 100x the last one, and Sam Altman Wants $7 Trillion. Although it is unlikely that he can get this amount, GPT-8 appears to be impossible to build with the current state of hardware.
    Astral Codex Ten

  3. Turbocharged CAR-T cells melt tumours in mice - using a trick from cancer cells (Immune cells armed with a mutation first identified in cancer cells gain potency but don’t turn cancerous themselves).
    Nature

  4. AI Is Learning to Decode the "Language" of Chickens; opening doors to a new era in animal-human interaction.
    Singularity Hub

  5. How long is “forever”? When it comes to digital media, forever could be as close as a couple of months away. Sony is erasing digital libraries that were supposed to be accessible “forever”.
    Ars Technica

Fun

  1. Post: Introduction to Thompson Sampling: the Bernoulli bandit.
    GitHub (gdmarmerola)

  2. Project: Explore interesting places nearby listed on Wikipedia.
    NearbyWiki
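
Item 1 in the Fun list above covers Thompson Sampling for the Bernoulli bandit. As a rough illustration, here is a minimal sketch, not the linked post's code: each arm keeps a Beta posterior over its unknown success probability, and every round plays the arm whose posterior sample is largest. The function name `thompson_sampling`, the example arm probabilities, and the uniform Beta(1, 1) prior are assumptions chosen for illustration.

```python
import random


def thompson_sampling(true_probs, n_rounds, seed=0):
    """Run Beta-Bernoulli Thompson Sampling on simulated arms.

    true_probs: hypothetical success probability of each arm (hidden from the agent).
    Returns (pulls, wins): per-arm counts of plays and of observed successes.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    wins = [0] * n_arms    # observed successes per arm
    losses = [0] * n_arms  # observed failures per arm
    pulls = [0] * n_arms   # number of times each arm was played

    for _ in range(n_rounds):
        # Draw one sample per arm from its Beta(wins + 1, losses + 1) posterior
        # (assumption: uniform Beta(1, 1) prior on every arm).
        samples = [rng.betavariate(wins[a] + 1, losses[a] + 1) for a in range(n_arms)]
        # Play the arm with the largest sampled success probability.
        arm = max(range(n_arms), key=samples.__getitem__)
        # Simulate a Bernoulli reward from the chosen arm.
        reward = 1 if rng.random() < true_probs[arm] else 0
        wins[arm] += reward
        losses[arm] += 1 - reward
        pulls[arm] += 1

    return pulls, wins


# Example run with three made-up arms; the best arm should attract most plays.
pulls, wins = thompson_sampling([0.2, 0.5, 0.8], n_rounds=1000)
print("pulls per arm:", pulls)
print("wins per arm:", wins)
```

Sampling from the posterior, rather than always picking the arm with the highest estimated mean, is what lets the method keep exploring uncertain arms while gradually concentrating on the best one.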

Newsletter 3 | 07 Feb 2024

  1. A decoder-only foundation model for time-series forecasting. Google introduces TimesFM, a single forecasting model pre-trained on a large time-series corpus of 100 billion real-world time-points. Compared to the latest large language models (LLMs), TimesFM is much smaller (200M parameters), yet even at this scale its zero-shot performance on a variety of unseen datasets of different domains and temporal granularities comes close to that of state-of-the-art supervised approaches trained explicitly on these datasets.
    Google Research & arXiv

  2. Generative expressive robot behaviors using large language models, by Google DeepMind.
    GitHub (Generative Expressive Motion)

  3. A Benchmark for Real-World Planning with Language Agents by Meta AI; “Comprehensive evaluations show that the current language agents are not yet capable of handling such complex planning tasks - even GPT-4 only achieves a success rate of 0.6%. Language agents struggle to stay on task, use the right tools to collect information, or keep track of multiple constraints.”
    GitHub (OSU NLP Group)

  4. MobileDiffusion: Rapid text-to-image generation on-device. Google introduces a novel approach with the potential for rapid on-device text-to-image generation: an efficient latent diffusion model specifically designed for mobile devices.
    Google Research

  5. LUMIERE: A Space-Time Diffusion Model for Video Generation, designed for synthesizing videos that portray realistic, diverse and coherent motion, a pivotal challenge in video synthesis.
    GitHub (Google Research)

  6. DeepSeek Coder comprises a series of code language models trained from scratch on a mix of 87% code and 13% natural language in English and Chinese, with each model pre-trained on 2T tokens across more than 80 programming languages. It offers state-of-the-art performance among open code models while being open source and free for research and commercial use.
    GitHub

  7. SUPIR: Revolutionizing image restoration with cutting-edge large-scale AI. Text-driven, intelligent restoration, blending AI technology with creativity to give every image a brand new life.
    XPixel

  8. OK-Robot: An open, modular framework for zero-shot, language-conditioned pick-and-drop tasks in arbitrary homes.
    GitHub

  9. Roboflow introduces Supervision, an open-source toolkit for any computer vision project. Whether you want to process a video, draw a detection on a frame, or convert labels from one format to another, Supervision includes easy-to-use scripts.
    GitHub (roboflow)

  10. Can AI Unlock the Secrets of the Ancient World? The Vesuvius Challenge set out to solve the ancient problem of the Herculaneum Papyri, a library of scrolls that were flash-fried by the eruption of Mount Vesuvius in 79 AD. “Today we are overjoyed to announce that our crazy project has succeeded. After 2000 years, we can finally read the scrolls.”
    Bloomberg (Archive.is)

  11. Next frontier in AI: Learning World Models. The path to artificial general intelligence (AGI) is leading to AI systems that build an internal representation of an environment and use it to simulate future events within that environment.
    YouTube (Elicit)

  12. Programming light propagation creates highly efficient neural networks.
    Society for Optics and Photonics (SPIE)

  13. Researchers Approach New Speed Limit for Seminal Problem. Researchers have found a much faster way to solve integer linear programming problems.
    Quanta Magazine

  14. How AI is changing gymnastics judging. Proponents say the AI-powered Judging Support System will promote fairness and transparency in the sport.
    MIT Technology Review

  15. Invasive cervical cancer incidence following bivalent human papillomavirus vaccination: a population-based observational study of age at immunization, dose, and deprivation. Analysis from Scotland shows that among women vaccinated at 12 or 13 years of age, there were no cases of invasive cervical cancer.
    The Journal of the National Cancer Institute

  16. Fiber Optics Bring You Internet. Now They’re Also Listening to Trains. Distributed acoustic sensing is already applied to detect earthquakes and insects, and its applications appear to be growing rapidly.
    WIRED (Archive.is)

  17. AI model flags high-risk pancreatic cancer patients 18 months before diagnosis. Novel approach caught 3.5 times as many cases as current screening guidelines would have for the 40-plus group.
    The Harvard Gazette

  18. Detecting future pandemics: sequencing wastewater could be promising, as it allows us to monitor the disease status of millions of people at a single site. Paper: Inferring the sensitivity of wastewater metagenomic sequencing for pathogen early detection.
    medRxiv

  19. Fewer and faster: Global fertility isn't just declining, it's collapsing. If you’re a Millennial or a younger Gen Xer, you’ll probably see the start of a long-term decline in human population due to the global collapse in fertility. That’s something that’s never happened before with Homo sapiens.
    Substack (fasterplease)

  20. Researchers demonstrate rapid 3D printing with liquid metal. Their new technique can produce furniture-sized aluminum parts in only minutes.
    MIT News

  21. A couple of new features are coming to Google Search, starting with the self-explanatory Circle to Search — but only on a handful of Android phones. Now, you’ll be able to add complex questions to refine your visual search. For example, you can take a picture of a plant, add it to your search, and ask, “How often should I water this?”
    The Verge

  22. Twin Labs automates repetitive tasks by letting AI take over your mouse cursor. Examples include reordering items when you’re running out of stock, downloading financial reports across several SaaS products, reaching out to potential prospects and more.
    TechCrunch

  23. Fun Project: Plato. Want to learn something new? Turn your YouTube addiction into a fun learning game.
    Plato Education

Newsletter 2 | 24 Jan 2024

  1. Fingerprint biometrics are integral to digital authentication and forensic science. However, they are based on the unproven assumption that no two fingerprints, even from different fingers of the same person, are alike. Contrary to this prevailing assumption, this study shows with above 99.99% confidence that fingerprints from different fingers of the same person share very strong similarities.
    Science Advances

  2. PEDS: a new technique could efficiently solve partial differential equations for numerous applications.
    MIT News

  3. New EU project NGI TALER will bring private and secure online payments to the Eurozone. An innovative electronic payment system for the greater benefit of European citizens, merchants, and banks. This payment system is different from current online payment methods, like credit cards or bank transfers, in that it offers privacy for the buyer: neither merchants nor banks can trace or link the payments.
    TALER

  4. AlphaGeometry: An Olympiad-level AI system for geometry. This AI system from Google DeepMind surpasses the state-of-the-art approach for geometry problems, advancing AI reasoning in mathematics.
    Google: DeepMind

  5. Researchers Claim First Functioning Graphene-Based Chip. The semiconductor bests silicon alternatives for electron mobility. With silicon reaching its limits as a semiconductor (Moore's law is dead?), the graphene-based chip has huge potential, even though graphene itself is not a semiconductor.
    IEEE Spectrum & YouTube: Science News

  6. Researchers think their AI system could help to democratize medicine. Google AI has better bedside manner than human doctors — and makes better diagnoses.
    Nature

  7. Cloned rhesus monkey lives to adulthood for the first time. A method that provides cloned embryos with a healthy placenta could pave the way for more research involving primates.
    Nature

  8. Bill Gates's opinion on 2024: lifesaving chatbots, the 2024 election, a malnutrition breakthrough, AI-powered innovation, the climate conversation and more.
    The Blog of Bill Gates

  9. Microsoft Adds AI Key in First Change to PC Keyboard in Decades. The new Copilot button is the first addition since the Windows key in 1994.
    Bloomberg Technology

  10. The world's first-ever smart binoculars can identify 9,000 birds thanks to built-in AI.
    Digital Camera World

  11. Version 14 of Wolfram Language and Mathematica brings new functions and updates.
    Stephen Wolfram: Writings

  12. Marimo: a recent attempt at reactive Python notebooks. It allows you to rapidly experiment with data and models, code with confidence in your notebook's correctness, and productionize notebooks as pipelines or interactive web apps.
    GitHub: marimo

  13. As large multi-modal networks arise, hungry for data, the protection of intellectual property becomes murky. Nightshade is a new attempt to help artists prevent their content from being fed into generative AI models. It is a tool that turns any image into a data sample that is unsuitable for model training. More precisely, Nightshade transforms images into "poison" samples, so that models trained on them without consent learn unpredictable behaviors that deviate from expected norms.
    Sand Lab, University of Chicago

  14. From Meta AI, a new approach to enhancing language models: Self-Rewarding Language Models. The language model itself is used via LLM-as-a-Judge prompting to provide its own rewards during training. Although the research is preliminary, the resulting model already outperforms many existing systems on the AlpacaEval 2.0 leaderboard, including Claude 2, Gemini Pro, and GPT-4.
    arXiv: Self-Rewarding Language Models

  15. Mark Zuckerberg’s new goal is to create artificial general intelligence with 600,000 GPUs by the end of 2024.
    The Verge

  16. For Android users: Google is removing some underutilized features from Google Assistant to focus on delivering the best possible user experience.
    Google: Products

  17. After the boom in AI, robotics appears to be booming in the following year. Toyota's Robots Are Learning to Do Housework—By Copying Humans.
    Wired Business (Archive)

  18. Fun application: a periodic table of visualization methods.
    Visual Literacy

Newsletter 1 | 10 Jan 2024

  1. Last year was the breakthrough year for Large Language Models (LLMs). Check out the roundup of the highlights in one blog post.
    Simon Willison’s Weblog

  2. To see the evolution of AI in 2023, check out this visual timeline highlighting the most remarkable AI advancements that shaped the year.
    Everypixel Journal

  3. Using AI, MIT researchers identify a new class of antibiotic candidates. These compounds can kill methicillin-resistant Staphylococcus aureus (MRSA), a bacterium that causes deadly infections.
    MIT News

  4. New research shows that even subtle changes to digital images, designed to confuse computer vision systems, can also affect human perception.
    Google: DeepMind

  5. Microsoft's Phi-2 transformer model becomes open source (MIT Licence). Phi-2 is a 2.7 billion-parameter language model that demonstrates outstanding reasoning and language understanding capabilities, showcasing state-of-the-art performance among base language models with fewer than 13 billion parameters.
    Huggingface: Phi-2

  6. Apple enters the game with ml-ferret, open-sourcing a new multimodal large language model (MLLM) capable of understanding spatial referring of any shape or granularity within an image and accurately grounding open-vocabulary descriptions. In plain English, Mac users may soon be able to give commands such as "zoom to the X object on the Y side of the screen".
    GitHub: ml-ferret

  7. Are you a heavy user of the Python pandas library, and do things get slow after a million rows? Check out RAPIDS cuDF for up to 150x performance improvement.
    YouTube: Tech With Tim

  8. Homomorphic encryption is a form of encryption that allows computations to be performed on encrypted data without first having to decrypt it. Although it has been implemented only in software so far, chips that compute on encrypted data are on the way for fully homomorphic encryption (unhackable data!?).
    IEEE: Spectrum

  9. OpenAI will open its custom ChatGPT store next week. The store to share and sell custom AI agents will launch after being delayed for a month.
    The Verge

  10. New meta-analysis: On average, undergraduate students' intelligence is merely average. The results show that the average IQ of undergraduate students today is a mere 102 IQ points and has declined by approximately 0.2 IQ points per year.
    Frontiers in Psychology

  11. Searching the Internet for information sucks, and things are getting worse. How bad are search results? Let's compare Google, Bing, Marginalia, Kagi (a recent favorite engine of the tech community), Mwmbl, and ChatGPT.
    Danluu blog

  12. The most popular programming languages from 1965 to 2022 (video).
    X (old Twitter)

  13. The Splatter Image is an ultra-fast method for single- and few-view 3D reconstruction. It can be trained using only 1 GPU, and reconstruction runs at 38 FPS.
    GitHub: Splatter Image

  14. More and more unified frameworks are coming to the field; one interesting example is GLEE. It is an object-level foundation model for locating and identifying objects in images and videos. Through a unified framework, GLEE accomplishes detection, segmentation, tracking, grounding, and identification of arbitrary objects in open-world scenarios for various object perception tasks.
    GitHub: GLEE

  15. Following EfficientSAM, here is another improvement on the Segment Anything Model (SAM). TinySAM: Pushing the Envelope for Efficient Segment Anything Model.
    arXiv: Computer Vision and Pattern Recognition

  16. Fun project: Surprised by the outcome of a Python snippet? Check out this fun project, which attempts to explain what exactly is happening under the hood for some counter-intuitive snippets and lesser-known features in Python.
    GitHub: wtfpython