Mining the LAMOST Spectral Archive - PowerPoint PPT Presentation

About This Presentation
Title:

Mining the LAMOST Spectral Archive

Description:

Mining the LAMOST Spectral Archive. A-Li Luo, Yan-Xia Zhang, Jian-Nan Zhang, and Yong-Heng Zhao, National Astronomical Observatories, Chinese Academy of Sciences.

Slides: 21
Provided by: Luo81
Learn more at: http://www.lamost.org

Transcript and Presenter's Notes


1
Mining the LAMOST Spectral Archive
  • A-Li Luo, Yan-Xia Zhang, Jian-Nan Zhang, and
    Yong-Heng Zhao
  • National Astronomical Observatories
  • Chinese Academy of Sciences
  • lal_at_lamost.bao.ac.cn

2
Brief Introduction About LAMOST Telescope
  • Aperture: 4 m
  • Focal length: 20 m
  • Field of view: 5 degrees
  • Focal plane: 1.75 m
  • Number of fibers: 4000
  • Spectra of objects as faint as magnitude 20.5
    with an exposure of 1.5 hours
  • Wavelength range: 370 nm to 900 nm
  • Resolution: R = 1000 and 2000
  • (high-R spectrographs for stellar spectra are
    planned)

[Figure: a meridian reflecting Schmidt telescope, with labels for the focal plane, the spherical primary mirror (MB), and the reflecting Schmidt corrector (MA)]
3
Introduction About the LAMOST Archive
  • Two main data sets
  • a spectroscopic catalogue (catalogue
    subsets)
  • a set of individual spectra
  • Expected size of LAMOST archive

The telescope will be used to survey 1,000,000
QSOs, 10,000,000 galaxies, and
1,000,000 stars. The final one-dimensional
spectral dataset will exceed 1 terabyte in size.
4
The archive will provide
  • The archive will provide reference data for
    stars, galaxies and quasars in FITS format, and
    will also provide a variety of services, including
    a flexible user interface on the web that will
    allow sophisticated queries within the database.
    Mining tools are also indispensable in order to
    extract novel information.
  • Two main kinds of query
  • 1. Simple and advanced search
  • Enables users to retrieve data subsets based on
    search limits that they choose.
  • 2. SQL search
  • Gives users the option of supplying
    their search criteria in standard SQL.
  • Mining tools
  • The LAMOST software system will contain a
    spectra-based knowledge miner, which
    incorporates data mining functions such as
    clustering, characterization, and classification.
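As a rough illustration of the SQL search option, the sketch below runs a standard SQL query against a small in-memory table. The table name `spectra`, its columns, and the sample rows are all hypothetical; the slides do not describe the archive's actual schema or interface.

```python
import sqlite3

# Hypothetical catalogue table; the real LAMOST schema is not given in the slides.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE spectra (obj_id INTEGER, obj_class TEXT, redshift REAL, magnitude REAL)"
)
conn.executemany(
    "INSERT INTO spectra VALUES (?, ?, ?, ?)",
    [
        (1, "QSO", 2.10, 19.8),
        (2, "GALAXY", 0.08, 17.5),
        (3, "STAR", 0.00, 15.2),
        (4, "QSO", 0.45, 18.9),
    ],
)

# User-supplied search criteria expressed in standard SQL.
query = """
    SELECT obj_id, redshift
    FROM spectra
    WHERE obj_class = 'QSO' AND magnitude < 20.5
    ORDER BY redshift
"""
rows = conn.execute(query).fetchall()
print(rows)
```

A form-based "simple and advanced search" could plausibly build the same kind of WHERE clause from the limits a user enters, while the SQL search would pass the user's own statement through.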

5
Introduction to Data Mining
  • Data mining (DM) is also known as
  • knowledge discovery in databases,
  • knowledge extraction,
  • data archaeology,
  • data dredging,
  • information harvesting,
  • business intelligence, etc.
  • Two kinds of DM
  • Event-based mining
  • Relationship-based mining.

6
Mining the LAMOST archive
  • Clustering
  • Characterization
  • Classification

7
1. Clustering
  • What is clustering?
  • It divides a database into different groups.
  • The goal of clustering?
  • To find groups of objects that are very
    different from one another, while the objects
    within each group are similar.
  • Same as classification?
  • No, one does not know a priori either which
    objects one's clusters will include or by which
    attributes the data will be clustered.
  • How to use clusters?
  • When we find clusters that segment the
    database meaningfully, these clusters may then be
    used to classify the new data.

8
Clustering
  • Some of the common algorithms used to perform
    clustering include
  • (1) partitioning-based algorithms, which
    enumerate various partitions and then score them
    by some criterion, e.g. K-means, K-medoids, etc.;
  • (2) hierarchy-based algorithms, which create a
    hierarchical decomposition of the set of data (or
    objects) using some criterion;
  • (3) model-based algorithms, in which a model is
    hypothesized for each of the clusters.
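To make the partitioning-based family concrete, here is a minimal K-means sketch using NumPy. The function name `kmeans`, its parameters, and the toy data are illustrative assumptions, not part of the LAMOST software.

```python
import numpy as np

def kmeans(points, k, n_iter=100, seed=0):
    """Basic Lloyd-style K-means; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest centroid.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its members (keep it if empty).
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels

# Toy data: two well-separated blobs in 2-d.
rng = np.random.default_rng(1)
blob_a = rng.normal(loc=[0.0, 0.0], scale=0.3, size=(50, 2))
blob_b = rng.normal(loc=[5.0, 5.0], scale=0.3, size=(50, 2))
data = np.vstack([blob_a, blob_b])
centroids, labels = kmeans(data, k=2)
print(centroids)
```

The score being minimized here is the sum of squared distances from each point to its assigned centroid; K-medoids differs mainly in restricting cluster centres to actual data points.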

9
An example of clustering: Searching for NLQs
  • What is NLQ?
  • QSOs are active galactic nuclei (AGN) in which two
    different regions of ionized gas can be
    distinguished: a broad-line region (BLR) and a
    narrow-line region (NLR).
  • The full widths at half maximum (FWHM) of emission
    lines in spectra of broad-line QSOs (BLQs) often
    exceed 5000 km/s, whereas in narrow-line QSOs
    (NLQs) the FWHMs are generally narrower than
    1000 km/s.
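A line width measured in wavelength units can be converted to a velocity width with v = c * FWHM / wavelength (both in the same frame). The sketch below applies that conversion and then the two rough limits quoted above; the function names, the "intermediate" label, and the sample measurement are illustrative assumptions.

```python
C_KM_S = 299_792.458  # speed of light in km/s

def fwhm_velocity(fwhm_wavelength, line_wavelength):
    """Convert a line width in wavelength units to a velocity width in km/s."""
    return C_KM_S * fwhm_wavelength / line_wavelength

def line_width_class(fwhm_km_s, narrow_limit=1000.0, broad_limit=5000.0):
    """Label a width using the rough limits quoted on the slide."""
    if fwhm_km_s < narrow_limit:
        return "narrow"
    if fwhm_km_s > broad_limit:
        return "broad"
    return "intermediate"  # label chosen here; the slide does not name this range

# Hypothetical measurement: an emission line at 486.1 nm with a FWHM of 1.0 nm.
v = fwhm_velocity(1.0, 486.1)
print(round(v), line_width_class(v))
```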

10
An example of clustering: Searching for NLQs
  • Why do we select this example?
  • In the LAMOST archive, there will be 1
    million QSO spectra, including large numbers of
    NLQs amongst them. While NLRs in Seyfert galaxies
    are already relatively well studied, there are no
    comparable studies of NLRs in quasars.
  • Why do we use a clustering technique?
  • Under the framework of the unified AGN model,
    we will need to compare statistically the spectra
    of NLQs with those of Seyfert galaxies.
  • In which space do we do the clustering?
  • The basic goal of principal component
    analysis (PCA) is to reduce the dimensions of the
    multi-parameter space defined by one's data
    with as little loss of information as possible.
    Such a reduction in
    dimensions has important benefits, especially as
    projection onto a 2-d or 3-d subspace is often
    useful for visualizing the data.
  • What kind of clustering algorithm will we use?
  • The K-means clustering algorithm.
  • Experimental data?
  • 15,000 QSO spectra from SDSS DR1
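The projection step can be sketched with a singular value decomposition. This is a generic PCA on synthetic data, not the authors' pipeline; the function name `pca_project` and the stand-in "spectra" matrix are assumptions made for illustration.

```python
import numpy as np

def pca_project(spectra, n_components=2):
    """Project rows of `spectra` onto their leading principal components."""
    centered = spectra - spectra.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt[:n_components].T
    explained = singular_values**2 / np.sum(singular_values**2)
    return scores, explained[:n_components]

# Synthetic stand-in for a spectral matrix: 200 "spectra" x 50 "wavelength bins".
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))   # two underlying parameters
mixing = rng.normal(size=(2, 50))    # how they shape each bin
spectra = latent @ mixing + 0.01 * rng.normal(size=(200, 50))

scores, explained = pca_project(spectra)
print(scores.shape, explained.sum())
```

The columns of `scores` play the role of PC1 and PC2, and `explained` reports the fraction of variance each retained component carries, which is one way to judge how much information the 2-d projection keeps.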

11
An example of clustering: Searching for NLQs
  • SDSS DR1 QSO spectra for more than 15,000
    objects projected onto a 2-d PCA subspace. The
    x-axis is the first principal component, PC1,
    while the y-axis is the second principal
    component, PC2. Each small asterisk in the figure
    represents a projection of a spectrum. We found
    that most of the spectra were located within a
    spherical space. A quick check revealed that most
    BLQs lie within the spherical space, while most
    NLQs (which are less numerous) lie outside it.
    Using a K-means algorithm, we altered the size of
    the spherical space in order to achieve an
    optimal separation between BLQs and NLQs.

12
2. Characterization
  • What is data characterization?
  • It is a summarization of general features of
    objects in target classes, and produces
    characteristic rules.
  • How does characterization work?
  • The data relevant to a user-specified class
    are normally retrieved by a database query and
    run through a summarization module to extract the
    essence of the data at different levels of
    abstraction.
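The query-then-summarize flow can be sketched in a few lines. Everything below is hypothetical: the catalogue rows, the temperatures, and the helper names `query` and `summarize` are stand-ins chosen only to show summaries at several levels of abstraction.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical catalogue rows: (class, spectral type, effective temperature in K).
catalogue = [
    ("STAR", "G2", 5800.0),
    ("STAR", "G5", 5600.0),
    ("STAR", "K0", 5200.0),
    ("STAR", "K5", 4400.0),
    ("GALAXY", None, None),
]

def query(rows, target_class):
    """Stand-in for the database query that retrieves the target class."""
    return [r for r in rows if r[0] == target_class]

def summarize(rows, key):
    """Group rows by `key(row)` and report count, mean, min and max temperature."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row[2])
    return {
        name: {"n": len(t), "mean": mean(t), "min": min(t), "max": max(t)}
        for name, t in groups.items()
    }

stars = query(catalogue, "STAR")
by_subtype = summarize(stars, key=lambda r: r[1])    # finest level: G2, G5, ...
by_letter = summarize(stars, key=lambda r: r[1][0])  # coarser level: G, K
by_class = summarize(stars, key=lambda r: r[0])      # coarsest level: all stars
print(by_letter)
```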

13
An example of characterization: effective temperatures of stars
  • Different from direct measurement
  • DM methods to estimate stellar parameters
    are different from traditional methods based on
    direct measurement. They do not need to measure
    each stellar spectrum.
  • Other automated estimation methods for Teff
  • Bailer-Jones (2000, 2002) has trained an
    artificial neural network (ANN) to estimate
    stellar parameters. Soubiran et al. (1998) and
    Katz et al. (1998) have established a template
    library containing 211 stellar spectra, and used
    cross-correlation techniques to match their
    observations with their templates.
  • Our method
  • Here we present a surface-fitting technique
    to estimate the distribution function of stellar
    effective temperature. We estimate the
    temperature distribution in PCA space, and the
    effective temperature of each star is just one
    point in such a distribution.

14
An example of characterization: effective temperatures of stars
The data set we used is a comprehensive library
of synthetic stellar spectra from Lejeune et
al. (1997), which is based on three original grids
of model atmosphere spectra by Kurucz et
al. (1979), Fluks et al. (1994), and Bessell et
al. (1989, 1991).
  • First of all, the spectra in this data set
    were processed by means of a PCA, yielding the
    figure above, in which all 1599 stellar spectra are
    projected onto a 2-d PCA plane.

15
An example of characterization: effective temperatures of stars
Consider the data distribution in PCA
space to be a locus X, with the effective temperature T
a function of X: T = f(X). Thus, T is a
surface in a 3-d space, as shown in the figure above.
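One way such a surface could be fitted is ordinary least squares on polynomial terms of the two PCA coordinates. The sketch below fits a cubic surface to log10(T), mirroring the polynomial form quoted on slide 16; the function names, the synthetic data, and the choice of least squares are assumptions, since the slides do not say how the fit was obtained.

```python
import numpy as np

def cubic_terms(x, y):
    """Design matrix with all monomials x^i y^j of total degree <= 3."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.column_stack([
        np.ones_like(x), x, x**2, x**3,
        y, x * y, x**2 * y,
        y**2, x * y**2,
        y**3,
    ])

def fit_surface(x, y, log_t):
    """Least-squares coefficients of a cubic surface log_t ~ P(x, y)."""
    coeffs, *_ = np.linalg.lstsq(cubic_terms(x, y), log_t, rcond=None)
    return coeffs

# Synthetic check: sample a known cubic surface and recover its coefficients.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=300)
y = rng.uniform(-1.0, 1.0, size=300)
true_coeffs = np.array([3.7, 0.2, -0.05, 0.01, 0.1, 0.03, -0.02, 0.04, 0.0, -0.01])
log_t = cubic_terms(x, y) @ true_coeffs
fitted = fit_surface(x, y, log_t)
print(np.allclose(fitted, true_coeffs))
```

With real library spectra, `x` and `y` would be the PC1 and PC2 scores and `log_t` the logarithm of each model's effective temperature.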
16
An example of characterization: effective temperatures of stars
  • By experimentation, we found that the following
    equations can fit the surface well.
  • T = 10^{P(x,y)}
  • where P(x,y) is a polynomial of the form
  • P(x,y) = 25.0069 - 1.80461 x + 0.0525264 x^2
    - 0.000450855 x^3 + 3.22394 y - 0.181638 xy
    + 0.00256156 x^2 y + 0.173964 y^2
    - 0.00434289 x y^2 + 0.00358684 y^3.
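The fitted polynomial can be evaluated directly. The sketch below encodes the ten coefficients quoted on this slide and assumes that T = 10^P(x,y), that x and y are the PC1 and PC2 coordinates, and that every operator lost in the transcript was a plus sign. The function names are illustrative.

```python
# Coefficients as quoted on the slide, keyed by (power of x, power of y).
# Assumption: terms whose operator was lost in the transcript are added.
COEFFS = {
    (0, 0): 25.0069,
    (1, 0): -1.80461,
    (2, 0): 0.0525264,
    (3, 0): -0.000450855,
    (0, 1): 3.22394,
    (1, 1): -0.181638,
    (2, 1): 0.00256156,
    (0, 2): 0.173964,
    (1, 2): -0.00434289,
    (0, 3): 0.00358684,
}

def log10_teff(x, y):
    """Evaluate P(x, y), i.e. log10 of the effective temperature."""
    return sum(c * x**i * y**j for (i, j), c in COEFFS.items())

def teff(x, y):
    """Effective temperature T = 10**P(x, y), under the assumptions above."""
    return 10.0 ** log10_teff(x, y)
```

The result is only meaningful inside the region of the PCA plane covered by the library spectra; the transcript does not give that range, so no sample coordinates are shown here.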

17
An example of characterization: effective temperatures of stars
  • This figure shows the isotherms of effective
    temperature in the PCA plane. When an observed
    spectrum is projected onto this PCA space, we can
    estimate the effective temperature of the object
    in question from its position.
  • We are presently working on optimizing
    characterization algorithms in order to obtain
    distributions of stellar parameters, such as
    Teff, g, and [Fe/H].

18
3. Classification
  • What is classification?
  • Classification is also called "predictive
    data mining", in that the aim is to identify the
    characteristics of each group in advance.
  • Which LAMOST data need to be classified?
  • For the LAMOST data archive, the data
    analysis pipeline will give the classification
    result, e.g. QSO, galaxy, or star of a particular
    spectral type. But for galaxies, the pipeline
    will not classify them further. The archive will
    include 10^7 galaxies, and the classification of
    galaxy spectra is a complex problem.
  • Why should we classify galaxy spectra?
  • A good classification scheme should be useful
    in understanding the evolutionary relationship
    between different types.

19
Classification of galaxy spectra
  • Galaxy spectral classifications can depend on
    different methods
  • Line strength
  • Correlation between morphology and spectrum
  • Objective method (ANN or PCA)
  • Evolutionary models
  • We are now looking for objective methods with
    more physical meaning to explain the evolution of
    galaxies.

20
DM and VO
  • An objective of LAMOST DM is to provide software
    tools that will also be useful for the
    development of China's Virtual Observatory(VO).
  • The LAMOST data set, including all its
    sub-catalogues and FITS files of 1-d spectra,
    will of course be another important contribution
    to the VO.
  • The true relationship between LAMOST and the VO
    is in using data mining and knowledge discovery
    to explore the LAMOST data.