new

Get trending papers in your email inbox!

Subscribe

Daily Papers

byAK and the research community

Jul 31

Tucano: Advancing Neural Text Generation for Portuguese

Significant advances have been made in natural language processing in recent years. However, our current deep learning approach to language modeling requires substantial resources in terms of data and computation. One of the side effects of this data-hungry paradigm is the current schism between languages, separating those considered high-resource, where most of the development happens and resources are available, and the low-resource ones, which struggle to attain the same level of performance and autonomy. This study aims to introduce a new set of resources to stimulate the future development of neural text generation in Portuguese. In this work, we document the development of GigaVerbo, a concatenation of deduplicated Portuguese text corpora amounting to 200 billion tokens. Via this corpus, we trained a series of decoder-transformers named Tucano. Our models perform equal or superior to other Portuguese and multilingual language models of similar size in several Portuguese benchmarks. The evaluation of our models also reveals that model performance on many currently available benchmarks used by the Portuguese NLP community has little to no correlation with the scaling of token ingestion during training, highlighting the limitations of such evaluations when it comes to the assessment of Portuguese generative language models. All derivatives of our study are openly released on GitHub and Hugging Face. See https://nkluge-correa.github.io/Tucano/

A JWST Project on 47 Tucanae: Kinematics, energy equipartition and anisotropy of multiple populations

Recent work with JWST has demonstrated its capability to identify and chemically characterize multiple populations in globular clusters down to the H-burning limit. In this study, we explore the kinematics of multiple populations in the globular cluster 47 Tucanae by combining data from JWST, HST, and Gaia. We analyzed velocity dispersion and anisotropy profiles from the cluster center out to sim10R_h. Our findings indicate that while 1G stars are isotropic, 2G stars are significantly radially anisotropic. These results align with the predictions of simulations of the dynamical evolution of clusters where 2G stars are initially more centrally concentrated than 1G stars. Furthermore, we subdivided the 2G population into two subpopulations: 2G_A and 2G_B, with the latter being more chemically extreme. We compared their dynamical profiles and found no significant differences. For the first time, we measured the degree of energy equipartition among the multiple populations of 47 Tucanae. Overall, within the analyzed radial range (sim2-4R_h), both populations exhibit a low degree of energy equipartition. The most significant differences between 1G and 2G stars are observed in the tangential velocity component, where 2G stars are characterized by a stronger degree of energy equipartition than 1G stars. In the radial component, the behavior of 1G and 2G stars is more variable, with differences largely dependent on radius. Finally, our analysis reveals that the ratio of rotational velocity to velocity dispersion is larger for the 2G population, while 1G stars exhibit higher skewness in their tangential proper motions, providing further evidence of differences in the kinematic properties of the 1G and 2G populations.