This is the prototype version of the tableParser package. The built-in table-to-text parser first unifies, then classifies, and finally translates the tabular content of scientific articles into a text vector. The result is then processed with the established function standardStats() of the JATSdecoder package to extract the reported statistical standard results (t, Z, χ2, F, r, d, β, SE, η2, ω2, OR, RR, p), the coded p-values, and the recomputed p-values. A comprehensive description of the underlying processing concepts is currently being prepared.

How does it work?

The following process tree is a simplified representation of how HTML tables are converted into extracted statistical standard results. The HTML table is first converted into a matrix. The matrix is then parsed into text lines, taking into account the legend codings detected in the caption and footnotes (e.g. '*' for p<.05). The representations of the numerical results within the text lines are further unified and then processed with JATSdecoder's function standardStats() to detect all statistical standard results and, where possible, recompute p-values.
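The legend decoding and matrix-to-text steps described above can be illustrated with a small Python sketch. It is not tableParser's R implementation: the function names parse_legend() and matrix_to_text(), the output line format, and the example table are all assumptions made for illustration, and only star codes of the form '* p < .05' are handled.

```python
import re


def parse_legend(note):
    """Map star codes such as '**' to the p-value statement given in a note.

    Hypothetical helper: only recognises patterns like '* p < .05'.
    """
    pattern = re.compile(r"(\*+)\s*p\s*([<=>])\s*(0?\.\d+)")
    return {stars: f"p{op}{value}" for stars, op, value in pattern.findall(note)}


def matrix_to_text(matrix, legend):
    """Turn a table matrix (first row = header, first column = row label)
    into one text line per cell, expanding star codes via the legend."""
    header, *rows = matrix
    lines = []
    for row in rows:
        label = row[0]
        for col_name, cell in zip(header[1:], row[1:]):
            cell = cell.strip()
            # Split a trailing run of stars from the reported value.
            match = re.fullmatch(r"(.*?)(\*+)", cell)
            if match and match.group(2) in legend:
                value, stars = match.groups()
                cell = f"{value.strip()}, {legend[stars]}"
            lines.append(f"{label}: {col_name}={cell}")
    return lines


# Made-up example table and footnote.
legend = parse_legend("* p < .05; ** p < .01")
table = [
    ["", "t", "d"],
    ["Group A", "2.31*", "0.45"],
    ["Group B", "3.10**", "0.62"],
]
for line in matrix_to_text(table, legend):
    print(line)
```

Text lines of this kind carry the statistic and its coded p-value together, which is what a pattern-based extractor such as standardStats() needs to pick up results.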

How the matrix content is parsed into text depends on the decision of the classifier. For correlation matrices, the sample size reported in the caption or footnote is reduced by two and then imputed as the degrees of freedom. This enables a subsequent recomputation of the p-values.
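To illustrate why imputing the degrees of freedom makes the p-value recoverable, here is a minimal Python sketch, assuming the usual t-test for a Pearson correlation (t = r * sqrt(df / (1 - r^2)) with df = N - 2) and a two-sided p-value. It is not the package's R code: the function names are hypothetical, and the t distribution is integrated numerically with Simpson's rule only to keep the example free of external libraries.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p_from_t(t, df, steps=2000):
    """Two-sided p-value, using Simpson's rule on [0, |t|] (steps must be even)."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3          # P(0 < T < |t|)
    return max(0.0, 1.0 - 2.0 * central_area)


def p_from_correlation(r, n):
    """Impute df = n - 2 and recompute the p-value of a correlation r."""
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    if n < 3:
        raise ValueError("n must be at least 3")
    df = n - 2                            # sample size reduced by two
    t = r * math.sqrt(df / (1 - r * r))
    return df, t, two_sided_p_from_t(t, df)


# Made-up example: r = .30 reported with N = 50 in the table note.
df, t, p = p_from_correlation(0.30, 50)
print(f"df={df}, t={t:.2f}, p={p:.4f}")
```

For this made-up example the imputed degrees of freedom are 48 and the recomputed two-sided p-value falls below .05, so it could be compared against a '*' code in the table.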

Important note: This is a test site for prototype software. The software handles many simple table structures. Complex tables and tables that are not barrier-free may be processed incorrectly.

Installation via: www.github.com/ingmarboeschen/tableParser


To help you understand how tableParser works, you can test it here with some randomly selected articles in HTML format that were published in Frontiers in Psychology.



The article's original HTML table(s), visualized:

Parsed table content produced by table2text():



Extracted standard statistics with table2stats(file,T2t=TRUE,estimateZ=T):

Standard statistics are t-, F-, χ2-, Z-, G2-, H-, U-, and Q-statistics, as well as p-, r-, R2-, d-, and η2-values.

Hosted by:
http://www.uni-hamburg.de
Contact:
Department of Research Methods and Statistics

Dr. Ingmar Böschen
Von-Melle-Park 5
20146 Hamburg
Germany
Empowered by:
https://github.com/ingmarboeschen/JATSdecoder