Get trending papers in your email inbox once a day!
Get trending papers in your email inbox!
SubscribeMatQnA: A Benchmark Dataset for Multi-modal Large Language Models in Materials Characterization and Analysis
Recently, large language models (LLMs) have achieved remarkable breakthroughs in general domains such as programming and writing, and have demonstrated strong potential in various scientific research scenarios. However, the capabilities of AI models in the highly specialized field of materials characterization and analysis have not yet been systematically or sufficiently validated. To address this gap, we present MatQnA, the first multi-modal benchmark dataset specifically designed for material characterization techniques. MatQnA includes ten mainstream characterization methods, such as X-ray Photoelectron Spectroscopy (XPS), X-ray Diffraction (XRD), Scanning Electron Microscopy (SEM), Transmission Electron Microscopy (TEM), etc. We employ a hybrid approach combining LLMs with human-in-the-loop validation to construct high-quality question-answer pairs, integrating both multiple-choice and subjective questions. Our preliminary evaluation results show that the most advanced multi-modal AI models (e.g., GPT-4.1, Claude 4, Gemini 2.5, and Doubao Vision Pro 32K) have already achieved nearly 90% accuracy on objective questions in materials data interpretation and analysis tasks, demonstrating strong potential for applications in materials characterization and analysis. The MatQnA dataset is publicly available at https://huggingface.co/datasets/richardhzgg/matQnA.
Polar nano-clusters in nominally paraelectric ceramics demonstrating high microwave tunability for wireless communication
Dielectric materials, with high tunability at microwave frequencies, are key components in the design of microwave communication systems. Dense Ba0.6Sr0.4TiO3 (BST) ceramics, with different grain sizes, were prepared in order to optimise the dielectric tunability via polar nano cluster effects. Dielectric permittivity and loss measurements were carried at both high and low frequencies and were supported by results from X-ray powder diffraction, scanning and transmission electron microscopies, Raman spectroscopy and piezoresponse force microscopy. The concentration of polar nano clusters, whose sizes are found to be in the range 20 to 50 nm, and the dielectric tunability increase with increasing grain size. A novel method for measurement of the microwave tunability in bulk dielectrics is presented. The highest tunability of 32% is achieved in ceramics with an average grain size of 10 um. The tunability of BST ceramics with applied DC field is demonstrated in a prototype small resonant antenna.
Microscale stress-geometry interactions in an additively manufactured NiTi cardiovascular stent: A synchrotron dual imaging tomography and diffraction study
This study explores cardiovascular stents fabricated using laser powder bed fusion (LPBF); an emerging method to offer patient-specific customisable parts. Here, the shape memory alloy NiTi, in a near equiatomic composition, was investigated to deconvolve the material response from macroscopic component effects. Specifically, stress-geometry interactions were revealed, in-situ, for a minaturised cardiovascular stent subjected to an externally applied cylindrical stress whilst acquiring synchrotron X-ray imaging and diffraction data. The approach enabled the collection of spatially resolved micromechanical deformation data; the formation of stress-induced martensite and R-phase was evident, occurring in locations near junctions between stent ligaments where stress concentrations exist. In the as-fabricated condition, hardness maps were obtained through nanoindentation, demonstrating that the localised deformation and deformation patterning is further controlled by porosity and microstructural heterogeneity. Electron backscatter diffraction (EBSD) supported these observations, showing a finer grain structure near stent junctions with higher associated lattice curvature. These features, combined with stress concentrations when loaded will initiate localised phase transformations. If the stent was subjected to repeated loading, representing in-vivo conditions, these regions would be susceptible to cyclic damage through transformation memory loss, leading to premature component failure. This study highlights the challenges that must be addressed for the post-processing treatment of LABF-processed stents for healthcare-related applications.
BS-Diff: Effective Bone Suppression Using Conditional Diffusion Models from Chest X-Ray Images
Chest X-rays (CXRs) are commonly utilized as a low-dose modality for lung screening. Nonetheless, the efficacy of CXRs is somewhat impeded, given that approximately 75% of the lung area overlaps with bone, which in turn hampers the detection and diagnosis of diseases. As a remedial measure, bone suppression techniques have been introduced. The current dual-energy subtraction imaging technique in the clinic requires costly equipment and subjects being exposed to high radiation. To circumvent these issues, deep learning-based image generation algorithms have been proposed. However, existing methods fall short in terms of producing high-quality images and capturing texture details, particularly with pulmonary vessels. To address these issues, this paper proposes a new bone suppression framework, termed BS-Diff, that comprises a conditional diffusion model equipped with a U-Net architecture and a simple enhancement module to incorporate an autoencoder. Our proposed network cannot only generate soft tissue images with a high bone suppression rate but also possesses the capability to capture fine image details. Additionally, we compiled the largest dataset since 2010, including data from 120 patients with high-definition, high-resolution paired CXRs and soft tissue images collected by our affiliated hospital. Extensive experiments, comparative analyses, ablation studies, and clinical evaluations indicate that the proposed BS-Diff outperforms several bone-suppression models across multiple metrics. Our code can be accessed at https://github.com/Benny0323/BS-Diff.
Radiative Gaussian Splatting for Efficient X-ray Novel View Synthesis
X-ray is widely applied for transmission imaging due to its stronger penetration than natural light. When rendering novel view X-ray projections, existing methods mainly based on NeRF suffer from long training time and slow inference speed. In this paper, we propose a 3D Gaussian splatting-based framework, namely X-Gaussian, for X-ray novel view synthesis. Firstly, we redesign a radiative Gaussian point cloud model inspired by the isotropic nature of X-ray imaging. Our model excludes the influence of view direction when learning to predict the radiation intensity of 3D points. Based on this model, we develop a Differentiable Radiative Rasterization (DRR) with CUDA implementation. Secondly, we customize an Angle-pose Cuboid Uniform Initialization (ACUI) strategy that directly uses the parameters of the X-ray scanner to compute the camera information and then uniformly samples point positions within a cuboid enclosing the scanned object. Experiments show that our X-Gaussian outperforms state-of-the-art methods by 6.5 dB while enjoying less than 15% training time and over 73x inference speed. The application on sparse-view CT reconstruction also reveals the practical values of our method. Code and models will be publicly available at https://github.com/caiyuanhao1998/X-Gaussian . A video demo of the training process visualization is at https://www.youtube.com/watch?v=gDVf_Ngeghg .
Statistics of X-Ray Polarization Measurements
The polarization of an X-ray beam that produces electrons with velocity components perpendicular to the beam generates an azimuthal distribution of the ejected electrons. We present methods for simulating and for analyzing the angular dependence of electron detections which enable us to derive simple analytical expressions for useful statistical properties of observable data. The derivations are verified by simulations. While we confirm the results of previous work on this topic, we provide an extension needed for analytical treatment of the full range of possible polarization amplitudes.
SV-DRR: High-Fidelity Novel View X-Ray Synthesis Using Diffusion Model
X-ray imaging is a rapid and cost-effective tool for visualizing internal human anatomy. While multi-view X-ray imaging provides complementary information that enhances diagnosis, intervention, and education, acquiring images from multiple angles increases radiation exposure and complicates clinical workflows. To address these challenges, we propose a novel view-conditioned diffusion model for synthesizing multi-view X-ray images from a single view. Unlike prior methods, which are limited in angular range, resolution, and image quality, our approach leverages the Diffusion Transformer to preserve fine details and employs a weak-to-strong training strategy for stable high-resolution image generation. Experimental results demonstrate that our method generates higher-resolution outputs with improved control over viewing angles. This capability has significant implications not only for clinical applications but also for medical education and data extension, enabling the creation of diverse, high-quality datasets for training and analysis. Our code is available at https://github.com/xiechun298/SV-DRR.
Illicit object detection in X-ray imaging using deep learning techniques: A comparative evaluation
Automated X-ray inspection is crucial for efficient and unobtrusive security screening in various public settings. However, challenges such as object occlusion, variations in the physical properties of items, diversity in X-ray scanning devices, and limited training data hinder accurate and reliable detection of illicit items. Despite the large body of research in the field, reported experimental evaluations are often incomplete, with frequently conflicting outcomes. To shed light on the research landscape and facilitate further research, a systematic, detailed, and thorough comparative evaluation of recent Deep Learning (DL)-based methods for X-ray object detection is conducted. For this, a comprehensive evaluation framework is developed, composed of: a) Six recent, large-scale, and widely used public datasets for X-ray illicit item detection (OPIXray, CLCXray, SIXray, EDS, HiXray, and PIDray), b) Ten different state-of-the-art object detection schemes covering all main categories in the literature, including generic Convolutional Neural Network (CNN), custom CNN, generic transformer, and hybrid CNN-transformer architectures, and c) Various detection (mAP50 and mAP50:95) and time/computational-complexity (inference time (ms), parameter size (M), and computational load (GFLOPS)) metrics. A thorough analysis of the results leads to critical observations and insights, emphasizing key aspects such as: a) Overall behavior of the object detection schemes, b) Object-level detection performance, c) Dataset-specific observations, and d) Time efficiency and computational complexity analysis. To support reproducibility of the reported experimental results, the evaluation code and model weights are made publicly available at https://github.com/jgenc/xray-comparative-evaluation.
DiffCrysGen: A Score-Based Diffusion Model for Design of Diverse Inorganic Crystalline Materials
Crystal structure generation is a foundational challenge in materials discovery, particularly in designing functional inorganic crystalline materials with desired properties. Most existing diffusion-based generative models for crystals rely on complex, hand-crafted priors and modular architectures to separately model atom types, atomic positions, and lattice parameters. These methods often require customized diffusion processes and conditional denoising, which can introduce additional model complexities and inconsistencies. Here we introduce DiffCrysGen, a fully data-driven, score-based diffusion model that jointly learns the distribution of all structural components in crystalline materials. With crystal structure representation as unified 2D matrices, DiffCrysGen bypasses the need for task-specific priors or decoupled modules, enabling end-to-end generation of atom types, fractional coordinates, and lattice parameters within a single framework. Our model learns crystallographic symmetry and chemical validity directly from large-scale datasets, allowing it to scale to complex materials discovery tasks. As a demonstration, we applied DiffCrysGen to the design of rare-earth-free magnetic materials with high saturation magnetization, showing its effectiveness in generating stable, diverse, and property-aligned candidates for sustainable magnet applications.
GL-LCM: Global-Local Latent Consistency Models for Fast High-Resolution Bone Suppression in Chest X-Ray Images
Chest X-Ray (CXR) imaging for pulmonary diagnosis raises significant challenges, primarily because bone structures can obscure critical details necessary for accurate diagnosis. Recent advances in deep learning, particularly with diffusion models, offer significant promise for effectively minimizing the visibility of bone structures in CXR images, thereby improving clarity and diagnostic accuracy. Nevertheless, existing diffusion-based methods for bone suppression in CXR imaging struggle to balance the complete suppression of bones with preserving local texture details. Additionally, their high computational demand and extended processing time hinder their practical use in clinical settings. To address these limitations, we introduce a Global-Local Latent Consistency Model (GL-LCM) architecture. This model combines lung segmentation, dual-path sampling, and global-local fusion, enabling fast high-resolution bone suppression in CXR images. To tackle potential boundary artifacts and detail blurring in local-path sampling, we further propose Local-Enhanced Guidance, which addresses these issues without additional training. Comprehensive experiments on a self-collected dataset SZCH-X-Rays, and the public dataset JSRT, reveal that our GL-LCM delivers superior bone suppression and remarkable computational efficiency, significantly outperforming several competitive methods. Our code is available at https://github.com/diaoquesang/GL-LCM.
Crystal Structure Prediction by Joint Equivariant Diffusion
Crystal Structure Prediction (CSP) is crucial in various scientific disciplines. While CSP can be addressed by employing currently-prevailing generative models (e.g. diffusion models), this task encounters unique challenges owing to the symmetric geometry of crystal structures -- the invariance of translation, rotation, and periodicity. To incorporate the above symmetries, this paper proposes DiffCSP, a novel diffusion model to learn the structure distribution from stable crystals. To be specific, DiffCSP jointly generates the lattice and atom coordinates for each crystal by employing a periodic-E(3)-equivariant denoising model, to better model the crystal geometry. Notably, different from related equivariant generative approaches, DiffCSP leverages fractional coordinates other than Cartesian coordinates to represent crystals, remarkably promoting the diffusion and the generation process of atom positions. Extensive experiments verify that our DiffCSP significantly outperforms existing CSP methods, with a much lower computation cost in contrast to DFT-based methods. Moreover, the superiority of DiffCSP is also observed when it is extended for ab initio crystal generation.
Towards Predicting Temporal Changes in a Patient's Chest X-ray Images based on Electronic Health Records
Chest X-ray imaging (CXR) is an important diagnostic tool used in hospitals to assess patient conditions and monitor changes over time. Generative models, specifically diffusion-based models, have shown promise in generating realistic synthetic X-rays. However, these models mainly focus on conditional generation using single-time-point data, i.e., typically CXRs taken at a specific time with their corresponding reports, limiting their clinical utility, particularly for capturing temporal changes. To address this limitation, we propose a novel framework, EHRXDiff, which predicts future CXR images by integrating previous CXRs with subsequent medical events, e.g., prescriptions, lab measures, etc. Our framework dynamically tracks and predicts disease progression based on a latent diffusion model, conditioned on the previous CXR image and a history of medical events. We comprehensively evaluate the performance of our framework across three key aspects, including clinical consistency, demographic consistency, and visual realism. We demonstrate that our framework generates high-quality, realistic future images that capture potential temporal changes, suggesting its potential for further development as a clinical simulation tool. This could offer valuable insights for patient monitoring and treatment planning in the medical field.
Zero-Shot Low-dose CT Denoising via Sinogram Flicking
Many low-dose CT imaging methods rely on supervised learning, which requires a large number of paired noisy and clean images. However, obtaining paired images in clinical practice is challenging. To address this issue, zero-shot self-supervised methods train denoising networks using only the information within a single image, such as ZS-N2N. However, these methods often employ downsampling operations that degrade image resolution. Additionally, the training dataset is inherently constrained to the image itself. In this paper, we propose a zero-shot low-dose CT imaging method based on sinogram flicking, which operates within a single image but generates many copies via random conjugate ray matching. Specifically, two conjugate X-ray pencil beams measure the same path; their expected values should be identical, while their noise levels vary during measurements. By randomly swapping portions of the conjugate X-rays in the sinogram domain, we generate a large set of sinograms with consistent content but varying noise patterns. When displayed dynamically, these sinograms exhibit a flickering effect due to their identical structural content but differing noise patterns-hence the term sinogram flicking. We train the network on pairs of sinograms with the same content but different noise distributions using a lightweight model adapted from ZS-NSN. This process is repeated to obtain the final results. A simulation study demonstrates that our method outperforms state-of-the-art approaches such as ZS-N2N.
DDGS-CT: Direction-Disentangled Gaussian Splatting for Realistic Volume Rendering
Digitally reconstructed radiographs (DRRs) are simulated 2D X-ray images generated from 3D CT volumes, widely used in preoperative settings but limited in intraoperative applications due to computational bottlenecks, especially for accurate but heavy physics-based Monte Carlo methods. While analytical DRR renderers offer greater efficiency, they overlook anisotropic X-ray image formation phenomena, such as Compton scattering. We present a novel approach that marries realistic physics-inspired X-ray simulation with efficient, differentiable DRR generation using 3D Gaussian splatting (3DGS). Our direction-disentangled 3DGS (DDGS) method separates the radiosity contribution into isotropic and direction-dependent components, approximating complex anisotropic interactions without intricate runtime simulations. Additionally, we adapt the 3DGS initialization to account for tomography data properties, enhancing accuracy and efficiency. Our method outperforms state-of-the-art techniques in image accuracy. Furthermore, our DDGS shows promise for intraoperative applications and inverse problems such as pose registration, delivering superior registration accuracy and runtime performance compared to analytical DRR methods.
Synthetically Enhanced: Unveiling Synthetic Data's Potential in Medical Imaging Research
Chest X-rays (CXR) are the most common medical imaging study and are used to diagnose multiple medical conditions. This study examines the impact of synthetic data supplementation, using diffusion models, on the performance of deep learning (DL) classifiers for CXR analysis. We employed three datasets: CheXpert, MIMIC-CXR, and Emory Chest X-ray, training conditional denoising diffusion probabilistic models (DDPMs) to generate synthetic frontal radiographs. Our approach ensured that synthetic images mirrored the demographic and pathological traits of the original data. Evaluating the classifiers' performance on internal and external datasets revealed that synthetic data supplementation enhances model accuracy, particularly in detecting less prevalent pathologies. Furthermore, models trained on synthetic data alone approached the performance of those trained on real data. This suggests that synthetic data can potentially compensate for real data shortages in training robust DL models. However, despite promising outcomes, the superiority of real data persists.
Weak localization in radiative transfer of acoustic waves in a randomly-fluctuating slab
This paper concerns the derivation of radiative transfer equations for acoustic waves propagating in a randomly fluctuating slab (between two parallel planes) in the weak-scattering regime, and the study of boundary effects through an asymptotic analysis of the Wigner transform of the wave solution. These radiative transfer equations allow to model the transport of wave energy density, taking into account the scattering by random heterogeneities. The approach builds on the method of images, where the slab is extended to a full-space, with a periodic map of mechanical properties and a series of sources located along a periodic pattern. Two types of boundary effects, both on the (small) scale of the wavelength, are observed: one at the boundaries of the slab, and one inside the domain. The former impact the entire energy density (coherent as well as incoherent) and is also observed in half-spaces. The latter, more specific to slabs, corresponds to the constructive interference of waves that have reflected at least twice on the boundaries of the slab and only impacts the coherent part of the energy density.
Multi-view X-ray Image Synthesis with Multiple Domain Disentanglement from CT Scans
X-ray images play a vital role in the intraoperative processes due to their high resolution and fast imaging speed and greatly promote the subsequent segmentation, registration and reconstruction. However, over-dosed X-rays superimpose potential risks to human health to some extent. Data-driven algorithms from volume scans to X-ray images are restricted by the scarcity of paired X-ray and volume data. Existing methods are mainly realized by modelling the whole X-ray imaging procedure. In this study, we propose a learning-based approach termed CT2X-GAN to synthesize the X-ray images in an end-to-end manner using the content and style disentanglement from three different image domains. Our method decouples the anatomical structure information from CT scans and style information from unpaired real X-ray images/ digital reconstructed radiography (DRR) images via a series of decoupling encoders. Additionally, we introduce a novel consistency regularization term to improve the stylistic resemblance between synthesized X-ray images and real X-ray images. Meanwhile, we also impose a supervised process by computing the similarity of computed real DRR and synthesized DRR images. We further develop a pose attention module to fully strengthen the comprehensive information in the decoupled content code from CT scans, facilitating high-quality multi-view image synthesis in the lower 2D space. Extensive experiments were conducted on the publicly available CTSpine1K dataset and achieved 97.8350, 0.0842 and 3.0938 in terms of FID, KID and defined user-scored X-ray similarity, respectively. In comparison with 3D-aware methods (pi-GAN, EG3D), CT2X-GAN is superior in improving the synthesis quality and realistic to the real X-ray images.
BS-LDM: Effective Bone Suppression in High-Resolution Chest X-Ray Images with Conditional Latent Diffusion Models
Lung diseases represent a significant global health challenge, with Chest X-Ray (CXR) being a key diagnostic tool due to their accessibility and affordability. Nonetheless, the detection of pulmonary lesions is often hindered by overlapping bone structures in CXR images, leading to potential misdiagnoses. To address this issue, we developed an end-to-end framework called BS-LDM, designed to effectively suppress bone in high-resolution CXR images. This framework is based on conditional latent diffusion models and incorporates a multi-level hybrid loss-constrained vector-quantized generative adversarial network which is crafted for perceptual compression, ensuring the preservation of details. To further enhance the framework's performance, we introduce offset noise and a temporal adaptive thresholding strategy. These additions help minimize discrepancies in generating low-frequency information, thereby improving the clarity of the generated soft tissue images. Additionally, we have compiled a high-quality bone suppression dataset named SZCH-X-Rays. This dataset includes 818 pairs of high-resolution CXR and dual-energy subtraction soft tissue images collected from a partner hospital. Moreover, we processed 241 data pairs from the JSRT dataset into negative images, which are more commonly used in clinical practice. Our comprehensive experimental and clinical evaluations reveal that BS-LDM excels in bone suppression, underscoring its significant clinical value.
Taming Generative Synthetic Data for X-ray Prohibited Item Detection
Training prohibited item detection models requires a large amount of X-ray security images, but collecting and annotating these images is time-consuming and laborious. To address data insufficiency, X-ray security image synthesis methods composite images to scale up datasets. However, previous methods primarily follow a two-stage pipeline, where they implement labor-intensive foreground extraction in the first stage and then composite images in the second stage. Such a pipeline introduces inevitable extra labor cost and is not efficient. In this paper, we propose a one-stage X-ray security image synthesis pipeline (Xsyn) based on text-to-image generation, which incorporates two effective strategies to improve the usability of synthetic images. The Cross-Attention Refinement (CAR) strategy leverages the cross-attention map from the diffusion model to refine the bounding box annotation. The Background Occlusion Modeling (BOM) strategy explicitly models background occlusion in the latent space to enhance imaging complexity. To the best of our knowledge, compared with previous methods, Xsyn is the first to achieve high-quality X-ray security image synthesis without extra labor cost. Experiments demonstrate that our method outperforms all previous methods with 1.2% mAP improvement, and the synthetic images generated by our method are beneficial to improve prohibited item detection performance across various X-ray security datasets and detectors. Code is available at https://github.com/pILLOW-1/Xsyn/.
Vision-Language Generative Model for View-Specific Chest X-ray Generation
Synthetic medical data generation has opened up new possibilities in the healthcare domain, offering a powerful tool for simulating clinical scenarios, enhancing diagnostic and treatment quality, gaining granular medical knowledge, and accelerating the development of unbiased algorithms. In this context, we present a novel approach called ViewXGen, designed to overcome the limitations of existing methods that rely on general domain pipelines using only radiology reports to generate frontal-view chest X-rays. Our approach takes into consideration the diverse view positions found in the dataset, enabling the generation of chest X-rays with specific views, which marks a significant advancement in the field. To achieve this, we introduce a set of specially designed tokens for each view position, tailoring the generation process to the user's preferences. Furthermore, we leverage multi-view chest X-rays as input, incorporating valuable information from different views within the same study. This integration rectifies potential errors and contributes to faithfully capturing abnormal findings in chest X-ray generation. To validate the effectiveness of our approach, we conducted statistical analyses, evaluating its performance in a clinical efficacy metric on the MIMIC-CXR dataset. Also, human evaluation demonstrates the remarkable capabilities of ViewXGen, particularly in producing realistic view-specific X-rays that closely resemble the original images.
VinDr-CXR: An open dataset of chest X-rays with radiologist's annotations
Most of the existing chest X-ray datasets include labels from a list of findings without specifying their locations on the radiographs. This limits the development of machine learning algorithms for the detection and localization of chest abnormalities. In this work, we describe a dataset of more than 100,000 chest X-ray scans that were retrospectively collected from two major hospitals in Vietnam. Out of this raw data, we release 18,000 images that were manually annotated by a total of 17 experienced radiologists with 22 local labels of rectangles surrounding abnormalities and 6 global labels of suspected diseases. The released dataset is divided into a training set of 15,000 and a test set of 3,000. Each scan in the training set was independently labeled by 3 radiologists, while each scan in the test set was labeled by the consensus of 5 radiologists. We designed and built a labeling platform for DICOM images to facilitate these annotation procedures. All images are made publicly available (https://www.physionet.org/content/vindr-cxr/1.0.0/) in DICOM format along with the labels of both the training set and the test set.
LIMITR: Leveraging Local Information for Medical Image-Text Representation
Medical imaging analysis plays a critical role in the diagnosis and treatment of various medical conditions. This paper focuses on chest X-ray images and their corresponding radiological reports. It presents a new model that learns a joint X-ray image & report representation. The model is based on a novel alignment scheme between the visual data and the text, which takes into account both local and global information. Furthermore, the model integrates domain-specific information of two types -- lateral images and the consistent visual structure of chest images. Our representation is shown to benefit three types of retrieval tasks: text-image retrieval, class-based retrieval, and phrase-grounding.
VinDr-SpineXR: A deep learning framework for spinal lesions detection and classification from radiographs
Radiographs are used as the most important imaging tool for identifying spine anomalies in clinical practice. The evaluation of spinal bone lesions, however, is a challenging task for radiologists. This work aims at developing and evaluating a deep learning-based framework, named VinDr-SpineXR, for the classification and localization of abnormalities from spine X-rays. First, we build a large dataset, comprising 10,468 spine X-ray images from 5,000 studies, each of which is manually annotated by an experienced radiologist with bounding boxes around abnormal findings in 13 categories. Using this dataset, we then train a deep learning classifier to determine whether a spine scan is abnormal and a detector to localize 7 crucial findings amongst the total 13. The VinDr-SpineXR is evaluated on a test set of 2,078 images from 1,000 studies, which is kept separate from the training set. It demonstrates an area under the receiver operating characteristic curve (AUROC) of 88.61% (95% CI 87.19%, 90.02%) for the image-level classification task and a mean average precision ([email protected]) of 33.56% for the lesion-level localization task. These results serve as a proof of concept and set a baseline for future research in this direction. To encourage advances, the dataset, codes, and trained deep learning models are made publicly available.
Explainable Knowledge Distillation for On-device Chest X-Ray Classification
Automated multi-label chest X-rays (CXR) image classification has achieved substantial progress in clinical diagnosis via utilizing sophisticated deep learning approaches. However, most deep models have high computational demands, which makes them less feasible for compact devices with low computational requirements. To overcome this problem, we propose a knowledge distillation (KD) strategy to create the compact deep learning model for the real-time multi-label CXR image classification. We study different alternatives of CNNs and Transforms as the teacher to distill the knowledge to a smaller student. Then, we employed explainable artificial intelligence (XAI) to provide the visual explanation for the model decision improved by the KD. Our results on three benchmark CXR datasets show that our KD strategy provides the improved performance on the compact student model, thus being the feasible choice for many limited hardware platforms. For instance, when using DenseNet161 as the teacher network, EEEA-Net-C2 achieved an AUC of 83.7%, 87.1%, and 88.7% on the ChestX-ray14, CheXpert, and PadChest datasets, respectively, with fewer parameters of 4.7 million and computational cost of 0.3 billion FLOPS.
TorchXRayVision: A library of chest X-ray datasets and models
TorchXRayVision is an open source software library for working with chest X-ray datasets and deep learning models. It provides a common interface and common pre-processing chain for a wide set of publicly available chest X-ray datasets. In addition, a number of classification and representation learning models with different architectures, trained on different data combinations, are available through the library to serve as baselines or feature extractors.
Aperture Diffraction for Compact Snapshot Spectral Imaging
We demonstrate a compact, cost-effective snapshot spectral imaging system named Aperture Diffraction Imaging Spectrometer (ADIS), which consists only of an imaging lens with an ultra-thin orthogonal aperture mask and a mosaic filter sensor, requiring no additional physical footprint compared to common RGB cameras. Then we introduce a new optical design that each point in the object space is multiplexed to discrete encoding locations on the mosaic filter sensor by diffraction-based spatial-spectral projection engineering generated from the orthogonal mask. The orthogonal projection is uniformly accepted to obtain a weakly calibration-dependent data form to enhance modulation robustness. Meanwhile, the Cascade Shift-Shuffle Spectral Transformer (CSST) with strong perception of the diffraction degeneration is designed to solve a sparsity-constrained inverse problem, realizing the volume reconstruction from 2D measurements with Large amount of aliasing. Our system is evaluated by elaborating the imaging optical theory and reconstruction algorithm with demonstrating the experimental imaging under a single exposure. Ultimately, we achieve the sub-super-pixel spatial resolution and high spectral resolution imaging. The code will be available at: https://github.com/Krito-ex/CSST.
MCL: Multi-view Enhanced Contrastive Learning for Chest X-ray Report Generation
Radiology reports are crucial for planning treatment strategies and enhancing doctor-patient communication, yet manually writing these reports is burdensome for radiologists. While automatic report generation offers a solution, existing methods often rely on single-view radiographs, limiting diagnostic accuracy. To address this problem, we propose MCL, a Multi-view enhanced Contrastive Learning method for chest X-ray report generation. Specifically, we first introduce multi-view enhanced contrastive learning for visual representation by maximizing agreements between multi-view radiographs and their corresponding report. Subsequently, to fully exploit patient-specific indications (e.g., patient's symptoms) for report generation, we add a transitional ``bridge" for missing indications to reduce embedding space discrepancies caused by their presence or absence. Additionally, we construct Multi-view CXR and Two-view CXR datasets from public sources to support research on multi-view report generation. Our proposed MCL surpasses recent state-of-the-art methods across multiple datasets, achieving a 5.0% F1 RadGraph improvement on MIMIC-CXR, a 7.3% BLEU-1 improvement on MIMIC-ABN, a 3.1% BLEU-4 improvement on Multi-view CXR, and an 8.2% F1 CheXbert improvement on Two-view CXR.
Evolution of the Accretion Disk and Corona During the Outburst of the Neutron Star Transient MAXI J1807+132
Low-mass X-ray binaries with a neutron star as the primary object show a complex array of phenomenology during outbursts. The observed variability in X-ray emission primarily arises from changes in the innermost regions of the accretion disk, neutron star surface, and corona. In this work, we present the results of a comprehensive X-ray spectral and timing analysis of the neutron star transient MAXI J1807+132 during its 2023 outburst using data from the NICER observatory. The outburst is marked by a very rapid rise in the count rate by about a factor of 20 in a day. The source undergoes full state transitions and displays hysteresis effect in the hardness and rms intensity diagrams. Spectral analysis with a three-component model is consistent with disk truncation during the hard states and reaching the last stable orbit during the intermediate and soft states. We discuss the different values of the last stable radius in the context of possible distance of the source and magnetic field strength. The characteristic frequencies throughout the hard and intermediate states are found to be strongly correlated with the inner radius of the disk. Together with the spectral and fast variability properties, we attempt to trace the evolution of the size of the corona along the outburst. Following the main outburst, the source undergoes a high amplitude reflare wherein it shows a complex behavior with relatively high variability (10 %), but low hardness.
CheXagent: Towards a Foundation Model for Chest X-Ray Interpretation
Chest X-rays (CXRs) are the most frequently performed imaging test in clinical practice. Recent advances in the development of vision-language foundation models (FMs) give rise to the possibility of performing automated CXR interpretation, which can assist physicians with clinical decision-making and improve patient outcomes. However, developing FMs that can accurately interpret CXRs is challenging due to the (1) limited availability of large-scale vision-language datasets in the medical image domain, (2) lack of vision and language encoders that can capture the complexities of medical data, and (3) absence of evaluation frameworks for benchmarking the abilities of FMs on CXR interpretation. In this work, we address these challenges by first introducing CheXinstruct - a large-scale instruction-tuning dataset curated from 28 publicly-available datasets. We then present CheXagent - an instruction-tuned FM capable of analyzing and summarizing CXRs. To build CheXagent, we design a clinical large language model (LLM) for parsing radiology reports, a vision encoder for representing CXR images, and a network to bridge the vision and language modalities. Finally, we introduce CheXbench - a novel benchmark designed to systematically evaluate FMs across 8 clinically-relevant CXR interpretation tasks. Extensive quantitative evaluations and qualitative reviews with five expert radiologists demonstrate that CheXagent outperforms previously-developed general- and medical-domain FMs on CheXbench tasks. Furthermore, in an effort to improve model transparency, we perform a fairness evaluation across factors of sex, race and age to highlight potential performance disparities. Our project is at https://stanford-aimi.github.io/chexagent.html.
IXPE Observation of the Low-Synchrotron Peaked Blazar S4 0954+65 During An Optical-X-ray Flare
The X-ray polarization observations made possible with the Imaging X-ray Polarimetry Explorer (IXPE) offer new ways of probing high-energy emission processes in astrophysical jets from blazars. Here we report on the first X-ray polarization observation of the blazar S4 0954+65 in a high optical and X-ray state. During our multi-wavelength campaign on the source, we detected an optical flare whose peak coincided with the peak of an X-ray flare. This optical-X-ray flare most likely took place in a feature moving along the parsec-scale jet, imaged at 43 GHz by the Very Long Baseline Array. The 43 GHz polarization angle of the moving component underwent a rotation near the time of the flare. In the optical band, prior to the IXPE observation, we measured the polarization angle to be aligned with the jet axis. In contrast, during the optical flare the optical polarization angle was perpendicular to the jet axis; after the flare, it reverted to being parallel to the jet axis. Due to the smooth behavior of the optical polarization angle during the flare, we favor shocks as the main acceleration mechanism. We also infer that the ambient magnetic field lines in the jet were parallel to the jet position angle. The average degree of optical polarization during the IXPE observation was (14.3pm4.1)%. Despite the flare, we only detected an upper limit of 14% (at 3sigma level) on the X-ray polarization degree; although a reasonable assumption on the X-ray polarization angle results in an upper limit of 8.8% (3sigma). We model the spectral energy distribution (SED) and spectral polarization distribution (SPD) of S4 0954+65 with leptonic (synchrotron self-Compton) and hadronic (proton and pair synchrotron) models. The constraints we obtain with our combined multi-wavelength polarization observations and SED modeling tentatively disfavor hadronic models for the X-ray emission in S4 0954+65.
Unaligned 2D to 3D Translation with Conditional Vector-Quantized Code Diffusion using Transformers
Generating 3D images of complex objects conditionally from a few 2D views is a difficult synthesis problem, compounded by issues such as domain gap and geometric misalignment. For instance, a unified framework such as Generative Adversarial Networks cannot achieve this unless they explicitly define both a domain-invariant and geometric-invariant joint latent distribution, whereas Neural Radiance Fields are generally unable to handle both issues as they optimize at the pixel level. By contrast, we propose a simple and novel 2D to 3D synthesis approach based on conditional diffusion with vector-quantized codes. Operating in an information-rich code space enables high-resolution 3D synthesis via full-coverage attention across the views. Specifically, we generate the 3D codes (e.g. for CT images) conditional on previously generated 3D codes and the entire codebook of two 2D views (e.g. 2D X-rays). Qualitative and quantitative results demonstrate state-of-the-art performance over specialized methods across varied evaluation criteria, including fidelity metrics such as density, coverage, and distortion metrics for two complex volumetric imagery datasets from in real-world scenarios.
ChestX-ray8: Hospital-scale Chest X-ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases
The chest X-ray is one of the most commonly accessible radiological examinations for screening and diagnosis of many lung diseases. A tremendous number of X-ray imaging studies accompanied by radiological reports are accumulated and stored in many modern hospitals' Picture Archiving and Communication Systems (PACS). On the other side, it is still an open question how this type of hospital-size knowledge database containing invaluable imaging informatics (i.e., loosely labeled) can be used to facilitate the data-hungry deep learning paradigms in building truly large-scale high precision computer-aided diagnosis (CAD) systems. In this paper, we present a new chest X-ray database, namely "ChestX-ray8", which comprises 108,948 frontal-view X-ray images of 32,717 unique patients with the text-mined eight disease image labels (where each image can have multi-labels), from the associated radiological reports using natural language processing. Importantly, we demonstrate that these commonly occurring thoracic diseases can be detected and even spatially-located via a unified weakly-supervised multi-label image classification and disease localization framework, which is validated using our proposed dataset. Although the initial quantitative results are promising as reported, deep convolutional neural network based "reading chest X-rays" (i.e., recognizing and locating the common disease patterns trained with only image-level labels) remains a strenuous task for fully-automated high precision CAD systems. Data download link: https://nihcc.app.box.com/v/ChestXray-NIHCC
AdverX-Ray: Ensuring X-Ray Integrity Through Frequency-Sensitive Adversarial VAEs
Ensuring the quality and integrity of medical images is crucial for maintaining diagnostic accuracy in deep learning-based Computer-Aided Diagnosis and Computer-Aided Detection (CAD) systems. Covariate shifts are subtle variations in the data distribution caused by different imaging devices or settings and can severely degrade model performance, similar to the effects of adversarial attacks. Therefore, it is vital to have a lightweight and fast method to assess the quality of these images prior to using CAD models. AdverX-Ray addresses this need by serving as an image-quality assessment layer, designed to detect covariate shifts effectively. This Adversarial Variational Autoencoder prioritizes the discriminator's role, using the suboptimal outputs of the generator as negative samples to fine-tune the discriminator's ability to identify high-frequency artifacts. Images generated by adversarial networks often exhibit severe high-frequency artifacts, guiding the discriminator to focus excessively on these components. This makes the discriminator ideal for this approach. Trained on patches from X-ray images of specific machine models, AdverX-Ray can evaluate whether a scan matches the training distribution, or if a scan from the same machine is captured under different settings. Extensive comparisons with various OOD detection methods show that AdverX-Ray significantly outperforms existing techniques, achieving a 96.2% average AUROC using only 64 random patches from an X-ray. Its lightweight and fast architecture makes it suitable for real-time applications, enhancing the reliability of medical imaging systems. The code and pretrained models are publicly available.
Deep Learning Based Defect Detection for Solder Joints on Industrial X-Ray Circuit Board Images
Quality control is of vital importance during electronics production. As the methods of producing electronic circuits improve, there is an increasing chance of solder defects during assembling the printed circuit board (PCB). Many technologies have been incorporated for inspecting failed soldering, such as X-ray imaging, optical imaging, and thermal imaging. With some advanced algorithms, the new technologies are expected to control the production quality based on the digital images. However, current algorithms sometimes are not accurate enough to meet the quality control. Specialists are needed to do a follow-up checking. For automated X-ray inspection, joint of interest on the X-ray image is located by region of interest (ROI) and inspected by some algorithms. Some incorrect ROIs deteriorate the inspection algorithm. The high dimension of X-ray images and the varying sizes of image dimensions also challenge the inspection algorithms. On the other hand, recent advances on deep learning shed light on image-based tasks and are competitive to human levels. In this paper, deep learning is incorporated in X-ray imaging based quality control during PCB quality inspection. Two artificial intelligence (AI) based models are proposed and compared for joint defect detection. The noised ROI problem and the varying sizes of imaging dimension problem are addressed. The efficacy of the proposed methods are verified through experimenting on a real-world 3D X-ray dataset. By incorporating the proposed methods, specialist inspection workload is largely saved.
Segmentation of 3D pore space from CT images using curvilinear skeleton: application to numerical simulation of microbial decomposition
Recent advances in 3D X-ray Computed Tomographic (CT) sensors have stimulated research efforts to unveil the extremely complex micro-scale processes that control the activity of soil microorganisms. Voxel-based description (up to hundreds millions voxels) of the pore space can be extracted, from grey level 3D CT scanner images, by means of simple image processing tools. Classical methods for numerical simulation of biological dynamics using mesh of voxels, such as Lattice Boltzmann Model (LBM), are too much time consuming. Thus, the use of more compact and reliable geometrical representations of pore space can drastically decrease the computational cost of the simulations. Several recent works propose basic analytic volume primitives (e.g. spheres, generalized cylinders, ellipsoids) to define a piece-wise approximation of pore space for numerical simulation of draining, diffusion and microbial decomposition. Such approaches work well but the drawback is that it generates approximation errors. In the present work, we study another alternative where pore space is described by means of geometrically relevant connected subsets of voxels (regions) computed from the curvilinear skeleton. Indeed, many works use the curvilinear skeleton (3D medial axis) for analyzing and partitioning 3D shapes within various domains (medicine, material sciences, petroleum engineering, etc.) but only a few ones in soil sciences. Within the context of soil sciences, most studies dealing with 3D medial axis focus on the determination of pore throats. Here, we segment pore space using curvilinear skeleton in order to achieve numerical simulation of microbial decomposition (including diffusion processes). We validate simulation outputs by comparison with other methods using different pore space geometrical representations (balls, voxels).
Reconstruction of three-dimensional porous media using generative adversarial neural networks
To evaluate the variability of multi-phase flow properties of porous media at the pore scale, it is necessary to acquire a number of representative samples of the void-solid structure. While modern x-ray computer tomography has made it possible to extract three-dimensional images of the pore space, assessment of the variability in the inherent material properties is often experimentally not feasible. We present a novel method to reconstruct the solid-void structure of porous media by applying a generative neural network that allows an implicit description of the probability distribution represented by three-dimensional image datasets. We show, by using an adversarial learning approach for neural networks, that this method of unsupervised learning is able to generate representative samples of porous media that honor their statistics. We successfully compare measures of pore morphology, such as the Euler characteristic, two-point statistics and directional single-phase permeability of synthetic realizations with the calculated properties of a bead pack, Berea sandstone, and Ketton limestone. Results show that GANs can be used to reconstruct high-resolution three-dimensional images of porous media at different scales that are representative of the morphology of the images used to train the neural network. The fully convolutional nature of the trained neural network allows the generation of large samples while maintaining computational efficiency. Compared to classical stochastic methods of image reconstruction, the implicit representation of the learned data distribution can be stored and reused to generate multiple realizations of the pore structure very rapidly.
Calculation of Femur Caput Collum Diaphyseal angle for X-Rays images using Semantic Segmentation
This paper investigates the use of deep learning approaches to estimate the femur caput-collum-diaphyseal (CCD) angle from X-ray images. The CCD angle is an important measurement in the diagnosis of hip problems, and correct prediction can help in the planning of surgical procedures. Manual measurement of this angle, on the other hand, can be time-intensive and vulnerable to inter-observer variability. In this paper, we present a deep-learning algorithm that can reliably estimate the femur CCD angle from X-ray images. To train and test the performance of our model, we employed an X-ray image dataset with associated femur CCD angle measurements. Furthermore, we built a prototype to display the resulting predictions and to allow the user to interact with the predictions. As this is happening in a sterile setting during surgery, we expanded our interface to the possibility of being used only by voice commands. Our results show that our deep learning model predicts the femur CCD angle on X-ray images with great accuracy, with a mean absolute error of 4.3 degrees on the left femur and 4.9 degrees on the right femur on the test dataset. Our results suggest that deep learning has the potential to give a more efficient and accurate technique for predicting the femur CCD angle, which might have substantial therapeutic implications for the diagnosis and management of hip problems.
On the generation of periodic discrete structures with identical two-point correlation
Strategies for the generation of periodic discrete structures with identical two-point correlation are developed. Starting from a pair of root structures, which are not related by translation, phase inversion or axis reflections, child structures of arbitrary resolution (i.e., pixel or voxel numbers) and number of phases (i.e., material phases/species) can be generated by means of trivial embedding based phase extension, application of kernels and/or phase coalescence, such that the generated structures inherit the two-point-correlation equivalence. Proofs of the inheritance property are provided by means of the Discrete Fourier Transform theory. A Python 3 implementation of the results is offered by the authors through the Github repository https://github.com/DataAnalyticsEngineering/EQ2PC in order to make the provided results reproducible and useful for all interested readers. Examples for the generation of structures are demonstrated, together with applications in the homogenization theory of periodic media.
Face X-ray for More General Face Forgery Detection
In this paper we propose a novel image representation called face X-ray for detecting forgery in face images. The face X-ray of an input face image is a greyscale image that reveals whether the input image can be decomposed into the blending of two images from different sources. It does so by showing the blending boundary for a forged image and the absence of blending for a real image. We observe that most existing face manipulation methods share a common step: blending the altered face into an existing background image. For this reason, face X-ray provides an effective way for detecting forgery generated by most existing face manipulation algorithms. Face X-ray is general in the sense that it only assumes the existence of a blending step and does not rely on any knowledge of the artifacts associated with a specific face manipulation technique. Indeed, the algorithm for computing face X-ray can be trained without fake images generated by any of the state-of-the-art face manipulation methods. Extensive experiments show that face X-ray remains effective when applied to forgery generated by unseen face manipulation techniques, while most existing face forgery detection or deepfake detection algorithms experience a significant performance drop.
Space Group Constrained Crystal Generation
Crystals are the foundation of numerous scientific and industrial applications. While various learning-based approaches have been proposed for crystal generation, existing methods seldom consider the space group constraint which is crucial in describing the geometry of crystals and closely relevant to many desirable properties. However, considering space group constraint is challenging owing to its diverse and nontrivial forms. In this paper, we reduce the space group constraint into an equivalent formulation that is more tractable to be handcrafted into the generation process. In particular, we translate the space group constraint into two parts: the basis constraint of the invariant logarithmic space of the lattice matrix and the Wyckoff position constraint of the fractional coordinates. Upon the derived constraints, we then propose DiffCSP++, a novel diffusion model that has enhanced a previous work DiffCSP by further taking space group constraint into account. Experiments on several popular datasets verify the benefit of the involvement of the space group constraint, and show that our DiffCSP++ achieves promising performance on crystal structure prediction, ab initio crystal generation and controllable generation with customized space groups.
RISING a new framework for few-view tomographic image reconstruction with deep learning
This paper proposes a new two-step procedure for sparse-view tomographic image reconstruction. It is called RISING, since it combines an early-stopped Rapid Iterative Solver with a subsequent Iteration Network-based Gaining step. So far, regularized iterative methods have widely been used for X-ray computed tomography image reconstruction from low-sampled data, since they converge to a sparse solution in a suitable domain, as upheld by the Compressed Sensing theory. Unfortunately, their use is practically limited by their high computational cost which imposes to perform only a few iterations in the available time for clinical exams. Data-driven methods, using neural networks to post-process a coarse and noisy image obtained from geometrical algorithms, have been recently studied and appreciated for both their computational speed and accurate reconstructions. However, there is no evidence, neither theoretically nor numerically, that neural networks based algorithms solve the mathematical inverse problem modeling the tomographic reconstruction process. In our two-step approach, the first phase executes very few iterations of a regularized model-based algorithm whereas the second step completes the missing iterations by means of a neural network. The resulting hybrid deep-variational framework preserves the convergence properties of the iterative method and, at the same time, it exploits the computational speed and flexibility of a data-driven approach. Experiments performed on a simulated and a real data set confirm the numerical and visual accuracy of the reconstructed RISING images in short computational times.
MedShift: Implicit Conditional Transport for X-Ray Domain Adaptation
Synthetic medical data offers a scalable solution for training robust models, but significant domain gaps limit its generalizability to real-world clinical settings. This paper addresses the challenge of cross-domain translation between synthetic and real X-ray images of the head, focusing on bridging discrepancies in attenuation behavior, noise characteristics, and soft tissue representation. We propose MedShift, a unified class-conditional generative model based on Flow Matching and Schrodinger Bridges, which enables high-fidelity, unpaired image translation across multiple domains. Unlike prior approaches that require domain-specific training or rely on paired data, MedShift learns a shared domain-agnostic latent space and supports seamless translation between any pair of domains seen during training. We introduce X-DigiSkull, a new dataset comprising aligned synthetic and real skull X-rays under varying radiation doses, to benchmark domain translation models. Experimental results demonstrate that, despite its smaller model size compared to diffusion-based approaches, MedShift offers strong performance and remains flexible at inference time, as it can be tuned to prioritize either perceptual fidelity or structural consistency, making it a scalable and generalizable solution for domain adaptation in medical imaging. The code and dataset are available at https://caetas.github.io/medshift.html
ELIXR: Towards a general purpose X-ray artificial intelligence system through alignment of large language models and radiology vision encoders
Our approach, which we call Embeddings for Language/Image-aligned X-Rays, or ELIXR, leverages a language-aligned image encoder combined or grafted onto a fixed LLM, PaLM 2, to perform a broad range of tasks. We train this lightweight adapter architecture using images paired with corresponding free-text radiology reports from the MIMIC-CXR dataset. ELIXR achieved state-of-the-art performance on zero-shot chest X-ray (CXR) classification (mean AUC of 0.850 across 13 findings), data-efficient CXR classification (mean AUCs of 0.893 and 0.898 across five findings (atelectasis, cardiomegaly, consolidation, pleural effusion, and pulmonary edema) for 1% (~2,200 images) and 10% (~22,000 images) training data), and semantic search (0.76 normalized discounted cumulative gain (NDCG) across nineteen queries, including perfect retrieval on twelve of them). Compared to existing data-efficient methods including supervised contrastive learning (SupCon), ELIXR required two orders of magnitude less data to reach similar performance. ELIXR also showed promise on CXR vision-language tasks, demonstrating overall accuracies of 58.7% and 62.5% on visual question answering and report quality assurance tasks, respectively. These results suggest that ELIXR is a robust and versatile approach to CXR AI.
Doming and spin cascade in Ferric Haems: Femtosecond X-ray Absorption and X-ray Emission Studies
The structure-function relationship is at the heart of biology and major protein deformations are correlated to specific functions. In the case of heme proteins, doming is associated with the respiratory function in hemoglobin and myoglobin, while ruffling has been correlated with electron transfer processes, such as in the case of Cytochrome c (Cyt c). The latter has indeed evolved to become an important electron transfer protein in humans. In its ferrous form, it undergoes ligand release and doming upon photoexcitation, but its ferric form does not release the distal ligand, while the return to the ground state has been attributed to thermal relaxation. Here, by combining femtosecond Fe K-edge X-ray absorption near-edge structure (XANES) studies and femtosecond Fe Kalpha and Kbeta X-ray emission spectroscopy (XES), we demonstrate that the photocycle of ferric Cyt c is entirely due to a cascade among excited spin states of the Iron ion, causing the ferric heme to undergo doming, which we identify for the first time. We also argue that this pattern is common to all ferric haems, raising the question of the biological relevance of doming in such proteins.
Toward quantitative fractography using convolutional neural networks
The science of fractography revolves around the correlation between topographic characteristics of the fracture surface and the mechanisms and external conditions leading to their creation. While being a topic of investigation for centuries, it has remained mostly qualitative to date. A quantitative analysis of fracture surfaces is of prime interest for both the scientific community and the industrial sector, bearing the potential for improved understanding on the mechanisms controlling the fracture process and at the same time assessing the reliability of computational models currently being used for material design. With new advances in the field of image analysis, and specifically with machine learning tools becoming more accessible and reliable, it is now feasible to automate the process of extracting meaningful information from fracture surface images. Here, we propose a method of identifying and quantifying the relative appearance of intergranular and transgranular fracture events from scanning electron microscope images. The newly proposed method is based on a convolutional neural network algorithm for semantic segmentation. The proposed method is extensively tested and evaluated against two ceramic material systems (Al_2O_3,MgAl_2O_4) and shows high prediction accuracy, despite being trained on only one material system (MgAl_2O_4). While here attention is focused on brittle fracture characteristics, the method can be easily extended to account for other fracture morphologies, such as dimples, fatigue striations, etc.
Longitudinal Data and a Semantic Similarity Reward for Chest X-Ray Report Generation
Chest X-Ray (CXR) report generation is a promising approach to improving the efficiency of CXR interpretation. However, a significant increase in diagnostic accuracy is required before that can be realised. Motivated by this, we propose a framework that is more inline with a radiologist's workflow by considering longitudinal data. Here, the decoder is additionally conditioned on the report from the subject's previous imaging study via a prompt. We also propose a new reward for reinforcement learning based on CXR-BERT, which computes the similarity between reports. We conduct experiments on the MIMIC-CXR dataset. The results indicate that longitudinal data improves CXR report generation. CXR-BERT is also shown to be a promising alternative to the current state-of-the-art reward based on RadGraph. This investigation indicates that longitudinal CXR report generation can offer a substantial increase in diagnostic accuracy. Our Hugging Face model is available at: https://huggingface.co/aehrc/cxrmate and code is available at: https://github.com/aehrc/cxrmate.
