Xinze Zhang
arXiv:2606.09860v1 Announce Type: new Abstract: Non-alcoholic fatty liver disease (NAFLD) affects roughly 25% of global adults, posing substantial hepatic and cardiovascular risks. Yet, population-...
preprint
Juan M. Huerta
arXiv:2606.09877v1 Announce Type: new Abstract: LLM wiki systems compile knowledge into pre-filled KV caches for efficient inference, but assume a static corpus -- an assumption that fails whenever...
preprint
Ningyuan Xi, Hao Xu, Hongsheng Xin, Ning Miao
arXiv:2606.09883v1 Announce Type: new Abstract: Large language models (LLMs) have made remarkable progress in reasoning tasks, largely driven by post-training paradigms, especially reinforcement le...
preprint
Zirui Liu, Jie Ouyang, Qi Liu, Xianquan Wang, Jiayu Liu, Tingyue Pan, Qingchuan Li, Jing Sha, Zhenya Huang, Shijin Wang, Enhong Chen
arXiv:2606.09887v1 Announce Type: new Abstract: Reinforcement learning (RL) for large language models usually supervises reasoning with scalar outcome rewards, such as binary correctness. Such rewa...
preprint
Sujoy Banik, Sayantan Chakraborty, Boishakhi Das Toma, Zainab Ghafoor, Ushashi Bhattacharjee, Koushik Howlader, Tirtho Roy
arXiv:2606.09898v1 Announce Type: new Abstract: Cancer treatment planning requires decisions across multiple clinical dimensions at once. Clinicians must determine whether a patient should receive ...
preprint
Maxx Richard Rahman, Prakhar Kumar, Wolfgang Maass
arXiv:2606.09907v1 Announce Type: new Abstract: Multimodal clinical learning is increasingly important for integrating diverse patient data, including imaging, text, and personalised health records...
preprint
Michael Chin
arXiv:2606.09923v1 Announce Type: new Abstract: Neural operators such as the Fourier Neural Operator (FNO) have emerged as powerful surrogates for solving partial differential equations (PDEs), ach...
preprint
Vadim Popov, Wenju Gu, Tasnima Sadekova, Georgii Aparin, Assel Yermekova
arXiv:2606.09962v1 Announce Type: new Abstract: Continuous diffusion for categorical data is a framework belonging to the diffusion family and aiming at generating discrete data. The scientific int...
preprint
Michael Yu, Matthew L. Olson
arXiv:2606.10080v1 Announce Type: new Abstract: Generative models have shown remarkable progress in a variety of domains such as protein design, but such power enables the opaque generation of haza...
preprint
Zhen Qin, Yang Chen
arXiv:2606.10085v1 Announce Type: new Abstract: Matrix-valued time series arise in a wide range of applications, such as spatio-temporal data from medical imaging and geophysics. Existing methods a...
preprint
Mashrur M. Morshed, Vishnu Naresh Boddeti
arXiv:2606.10153v1 Announce Type: new Abstract: Learning the compositional nature of the physical world requires joint observation of interacting factors. However, because practical data is often d...
preprint
Muhammad Umer Sheikh, Hassan Abid, Khawar Shehzad, Ufaq Khan, Muhammad Haris Khan
arXiv:2606.10194v1 Announce Type: new Abstract: Climate change research increasingly requires AI systems that reason across text, dynamic visual content, and scientific figures, yet existing climat...
preprint
Muhammad Ahmed
arXiv:2606.10435v1 Announce Type: new Abstract: Transformers achieve strong language modeling performance by providing direct token-to-token communication paths, but causal self-attention scales qu...
preprint
Haorui Wang, Parshin Shojaee, Kazem Meidani, Kunyang Sun, José Miguel Hernández-Lobato, Teresa Head-Gordon, Jiajun He, Chandan K. Reddy, Chao Zhang, Yuanqi Du
arXiv:2606.10587v1 Announce Type: new Abstract: Large language models (LLMs) are on the rise for accelerating scientific discovery, most recently in advanced tasks such as generating valid scientif...
preprint
Junbo Ding, Xin Zang, Chenchen Pan, Donghao Song, Jiaxin Zhu, Danhuai Guo
arXiv:2606.10632v1 Announce Type: new Abstract: Lipschitz-style individual fairness formalizes the idea that semantically similar examples should receive similar predictions, but its evaluation in ...
preprint
Emma Kasteleyn, Timo Maier, Axel Lauer, Veronika Eyring, Pierre Gentine, Ana Lucic
arXiv:2606.10642v1 Announce Type: new Abstract: Machine learning weather prediction (MLWP) models have achieved impressive forecasting performance at a small fraction of the computational costs req...
preprint
Fateme Mohammad Mohammadi, Hector Budman, Joshua L. Pulsipher
arXiv:2606.10682v1 Announce Type: new Abstract: While physics-informed neural networks (PINNs) have shown strong potential for process modeling, physical equations are only enforced as soft constra...
preprint
Olga Shakhmatova, Dmitrii Kriukov, Daniil Larionov, Nikita Khromov, Iaroslav Bespalov, Alexander Zolotarev, Kirill Grishchenkov, Ekaterina Ivanova, Miron Kuznetsov, Ilya Sochenkov, Elizaveta Panchenko, Artem Shelmanov, Dmitry V. Dylov
arXiv:2606.10725v2 Announce Type: new Abstract: Background. Atrial fibrillation (AF) is the most prevalent cardiac arrhythmia and a major determinant of prognosis. Established AF risk scores rely o...
preprint
Naoki Nonaka, Jun Seita
arXiv:2606.10802v1 Announce Type: new Abstract: Deep Neural Networks (DNNs) typically require extensive datasets for effective training. In the medical domain, acquiring large-scale data is often c...
preprint
Gal Bloch, Ariel Gera, Matan Orbach, Ohad Eytan, Assaf Toledo
arXiv:2606.10896v1 Announce Type: new Abstract: We present Flash-GMM, a fused Triton kernel for efficient computation of Gaussian Mixture Models (GMMs) over large-scale data in a single GP...
preprint
Dennis Wu, Yi-Chun Hung, Braden Yuille, James E. Fitzgerald, Han Liu
arXiv:2606.10238v1 Announce Type: new Abstract: Neural population geometry shapes downstream computation. Recent empirical findings in neurobiology suggest that a hyperbolic structure underlies pop...
preprint
Stanisław Narębski, Tomasz Komendziński, Tomasz M. Rutkowski
arXiv:2606.10889v1 Announce Type: new Abstract: Early detection of neurodegeneration remains a critical clinical challenge. This study investigates whether sleep EEG signal criticality, quantified ...
preprint
Xiangsheng Ge, Yang Xie
arXiv:2606.11066v1 Announce Type: cross Abstract: Neural population activity models can recover rich temporal structure from binned spikes, but their read-in and readout layers often remain tied to...
preprint
Sovesh Mohapatra, Christoffer G. Alexandersen, Panagiotis Fotiadis, Max B. Kelz, John A. Detre, Fabio Pasqualetti, Dani S. Bassett
arXiv:2606.11091v1 Announce Type: cross Abstract: Network control theory can be used to model intrinsic and extrinsic strategies to steer neural dynamics. Standard approaches are node-centric, stru...
preprint
Abhijoy Sarkar, Aarchi Singh Thakur
arXiv:2606.11144v1 Announce Type: cross Abstract: Resistance to first-line osimertinib in EGFR-mutant non-small-cell lung cancer (NSCLC) is the canonical example of predictable clonal evolution und...
preprint
Stefano De Carli, Nicola Licini, Davide Previtali, Fabio Previdi, Antonio Ferramosca
arXiv:2503.19158v3 Announce Type: replace-cross Abstract: Type 1 Diabetes (T1D) management is a complex task due to many variability factors. Artificial Pancreas (AP) systems have alleviated patien...
preprint
Florian P. Mahner, Ka Chun Lam, Francisco Pereira, Martin N. Hebart
arXiv:2605.26921v2 Announce Type: replace-cross Abstract: The study of representations is widespread across fields, including neuroscience, psychology, and artificial intelligence. While representa...
preprint
Sahil Rahman, Maxx Richard Rahman
arXiv:2606.02386v2 Announce Type: replace-cross Abstract: Protein language models (PLMs) are passive oracles: they generate sequences in a single forward pass with no mechanism to consult external ...
preprint
Zambaldi et al.
AlphaProteo generates novel protein binders with state-of-the-art binding affinities across diverse targets.
protein-design binder deepmind
bioRxiv
CURATED
2024-06-25 Hayes et al.
A multimodal generative language model that reasons over the sequence, structure, and function of proteins.
protein-lm generative foundation-model
Nature
CURATED
2024-05-08 Abramson et al.
AlphaFold 3 can predict the joint structure of complexes including proteins, nucleic acids, small molecules, ions, and modified residues.
structure-prediction protein deepmind
Nature
CURATED
2023-07-11 Watson et al.
A structure denoising diffusion probabilistic model for protein backbone generation.
protein-design diffusion baker-lab
ICLR 2023
CURATED
2023-02-01 Corso et al.
A diffusion generative model over the non-Euclidean manifold of ligand poses for molecular docking.
docking diffusion drug-discovery
Zhao, Y., Cai, Q., Chen, D., Chen, J.
Datasets in the Gene Expression Omnibus (GEO) remain difficult to reuse at scale because sample annotations are heterogeneous and raw sequencing data require assay-specific preprocessing. We presen...
preprint
Jing, B., Bafna, M., Diaz, D. J., Klivans, A. R., Berger, B.
Generative models have become staple tools for modeling and designing biomolecular structures. However, although these tools have improved in structural prediction accuracy, their ability to filter...
preprint
Nourisa, J., Passemiers, A., Moreau, Y., Raimondi, D.
Batch correction is essential for integrating datasets and enabling population-level insights into health and disease. Embedding-based approaches are among the most widely used solutions, but here ...
preprint
Pham, T. D.
Objective: To develop a cross-domain spatial AI framework for identifying conserved tissue-state organisation across trauma, oral disease, and cardiovascular tissue using spatial transcriptomic dat...
preprint
Dreisler, M. W., Michael, R., Hatzakis, N. S., Boomsma, W.
Small-molecule lead refinement is constrained by the cost of synthesizing and assaying candidates, making the surrogate models that prioritize compounds for experimental testing central to the desi...
preprint
Dai, J., Molloy, E.
Hybridization is an important evolutionary process, commonly modeled by the network multispecies coalescent. Reconstructing evolutionary histories under this model is notoriously costly, even for l...
preprint
Kiwitz, L., Turiello, R., Effern, M., Toma, M., Landsberg, J., Hoelzel, M., Thurley, K.
Detailed spatial analysis of the tumor micro-environment (TME) through multiplexed fluorescence imaging requires quantitative image-processing and data-analysis methods. While data-preprocessing do...
preprint
Honeybrook, L.
Roughly half the cells in the human body are microbial, and changes in these communities are increasingly implicated in cardiovascular, metabolic, and oncological diseases. Yet identifying which ta...
preprint
Petrov, P. B., Oshinjo, A., Roning, J., Izzi, V.
The extracellular matrix (ECM) is a fundamental metazoan innovation that provides structural support and regulatory cues essential for multicellular life. While core matrisome components are subjec...
preprint
Pulido-Quetglas, C., Fasshauer, D.
Motivation: SNARE proteins catalyse membrane fusion across the eukaryotic endomembrane system, from synaptic vesicle exocytosis to intracellular trafficking, endosomal and vacuolar transport, and a...
preprint
Shengyi, Z.
Domains are the basic units of protein structure and function. Appropriate inter-domain organization is critical to enable cooperative execution of multiple related functions. It is thus a crucial ...
preprint
Figueroa, J. L., White, R. A.
We now exist in the era of massive datasets from genomics, large language models, and all the known knowledge of humanity right at our fingertips. Much of this data is becoming more accessible; how...
preprint
Charvel, E., Alves Monteiro, H. J., Mirarab, S., Bafna, V.
Ecologists and conservation biologists rely on genetic diversity as a key essential biodiversity variable (EBV) used to track population health and dynamics, and utilize the population parameter {t...
preprint
Gardeux, V., Carsanaro, S., Chen, W. J., David, F. P. A., Goutte-Gattat, D., Hilton, J. A., Lubiana, T., Patel, N., Raymor, B., Zucchi, I., Deplancke, B., Ernst, C., Osumi-Sutherland, D., Robinson-Rechavi, M., Sternberg, P. W., Bastian, F. B.
The rapid accumulation of single-cell RNA-Seq (scRNA-seq) data across multiple repositories presents major challenges for data accessibility, integration, and reproducibility. While primary reposit...
preprint
Naranjo Rincon, S., Ahmad, F., Easley, T., Shoushtari, S., Glatard, T., Kiar, G., Modi, H., Dahan, S., Robinson, E., Kamilov, U., Bijsterbosch, J.
As the field expands from early research into the human connectome, there has been a fast expansion in the number of analytical approaches to study resting state functional MRI (rsfMRI) data. With ...
preprint
Melton, G., Negron, D. A., Hauser, K., Jagannathan, S., Tolli, N., Jennings, K., Necciai, B., Sozhamannan, S., Abramson, B.
Loop-mediated isothermal amplification (LAMP) is a cost-effective and portable assay technique for performing nucleic acid-based diagnostics in the field whose adoption is hindered by design and re...
preprint
Semenov, A., Gupta, S., Roberts, A. M. P., Boginski, V., Aksenov, A. A.
Spectral similarity search is the basis of mass spectrometry-based metabolomics, underpinning library matching, molecular networks construction, and repository searches such as MASST. Until recentl...
preprint
Fernandes, G. M. d. M., Wang, W., Parwani, A., Ahmadian, S. S., Alves, M. J., Philips, J. J., Otero, J. J.
The reproducibility of immunohistochemistry in tumor tissue analysis across reference labs remains a persistent challenge. We tested the extent to which an intra-slide calibration technology mitiga...
preprint
Fumagalli, L., Becchi, T., Cereda, M., Pozzoli, U.
Motif discovery and binding-site prediction in DNA and RNA sequences are central tasks in regulatory genomics, yet the methodological landscape is split between interpretable but rigid position wei...
preprint
Moore, B. M., Freeman, J., Millikin, R. J., Mohanty, C., George, K. S., Bal, A., Lock, C., Sauer, J.-D., Spurgeon, M. E., Moore, D. L., Travers, B. G., Stewart, R.
Science fundamentally depends on the generation and testing of hypotheses, many of them controversial. An explosion in scientific literature has made evaluating hypotheses even within a domain a pr...
preprint