This section presents some of the research lines we are currently working on and that, given the
amount of work devoted to them, are producing results that will be published
soon:
- Comparison of different text sizes with the
“centroid” and “Fold-in” techniques
We test the robustness of both techniques by
comparing pseudodocuments of variable size, under the hypothesis that
as the size of the pseudodocuments increases, the cosine-based
comparisons will increase more noticeably with centroid
than with “Fold-in”. In addition, we analyze
the role of
the first dimension in this effect. The validity of these
techniques matters because of the difference in
resource consumption between the two.
- Different parameters to implement autotutors with LSA
a) University students: using
domain-specific corpora of small and medium size, made up of
highly structured texts, loosely structured texts, or both combined. On
these corpora, different semantic spaces built under different
parameters are tested. These parameters include
the preprocessing method (pruning and weighting measures for the
terms), the percentage of dimension reduction, the way
pseudodocuments are constructed (centroid and Fold-in) and the measures
of text similarity (cosines or Euclidean distances). The students
evaluated form two groups, one of experts and one of
non-experts. Both groups answer an open question that is
graded by the LSA system under each of the semantic spaces and by a
group of human graders. An Analysis of Variance is used to determine
which parameters of the spaces contribute most to the correlation with the human
criterion.
b) Students of Primary, Secondary and University levels: summaries written by
students of different academic years. Here we also look for the
parameters that improve the evaluation of the summaries, comparing
the LSA evaluations with those made by expert judges. See the
first results in [pdf].
- Discrimination between different emotional tones.
Text segments with an optimistic emotional style and text segments
with a depressive emotional style are put to the test under different
techniques, among them LSA and cluster analysis.
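
As an illustration of the first line above (centroid versus Fold-in), the following is a minimal sketch of how the two kinds of pseudodocument can be built and compared at different text sizes. It assumes the usual definitions, which are not spelled out above: Fold-in projects a term-frequency vector d as d^T U_k S_k^{-1}, while centroid averages the term vectors (rows of U_k S_k) of the words in the text. The function names, the random toy matrix and the number of dimensions are hypothetical choices for the example and do not correspond to the corpora or parameters of the studies.

```python
import numpy as np

def build_space(X, k):
    """Truncated SVD of a terms x documents matrix.
    Returns the first k left singular vectors and singular values."""
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k], s[:k]

def fold_in(d, U_k, s_k):
    """Fold-in pseudodocument (assumed definition): d^T U_k S_k^{-1}."""
    return (d @ U_k) / s_k

def centroid(d, U_k, s_k):
    """Centroid pseudodocument (assumed definition): frequency-weighted
    mean of the term vectors, i.e. of the rows of U_k S_k."""
    return (d @ U_k) * s_k / d.sum()

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy data (assumption): random counts standing in for a real corpus.
rng = np.random.default_rng(0)
n_terms, n_docs, k = 50, 20, 5
X = rng.poisson(1.0, size=(n_terms, n_docs)).astype(float)
U_k, s_k = build_space(X, k)

# Compare two random pseudodocuments of increasing size under both methods.
uniform = np.full(n_terms, 1 / n_terms)
for size in (5, 20, 80):
    a = rng.multinomial(size, uniform).astype(float)
    b = rng.multinomial(size, uniform).astype(float)
    cos_centroid = cosine(centroid(a, U_k, s_k), centroid(b, U_k, s_k))
    cos_fold_in = cosine(fold_in(a, U_k, s_k), fold_in(b, U_k, s_k))
    print(size, round(cos_centroid, 3), round(cos_fold_in, 3))
```

Under these definitions the centroid coordinates equal the Fold-in coordinates multiplied by the squared singular values (up to a constant factor), so the first dimension weighs most heavily in centroid. Dropping it from both vectors before taking the cosine (for example with `[1:]`) is one simple way to inspect its contribution.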