"""
Mapping file for ingesting the tignanillo database. For use with
the imptools modules of the Python node-software. This module
consists of one list of mappings plus supporting line-functions.

The source files are one table of levels and one of lines, derived from
the Chianti-7 data. Only Hydrogen, Helium and Oxygen data are present.
Change the value of the basepath variable, below, to select the directory from
which the files are read.

The output is two text files, one for states and one for lines,
suitable for ingesting into MySQL. The files contain the
values to be set into the columns of the table, using the LOAD DATA
command. They are not SQL scripts, so the tables must be created (with the
CREATE TABLE SQL-command) first. Further, the order and type of the columns
created in the table must match the schema described and coded below 
(the names of columns are not so critical).

Each row of the states table describes one electronic state of one ion.
The columns of the table are, in order:

id : integer - a unique ID for the state
species : integer - a code for the ion, (1000 * ioncharge) + nuclearcharge
element : string  - the chemical symbol for the element
nuclearcharge : integer - the atomic number of the ion
ioncharge : integer - the number of positive charges on the ion
configuration : string - the electronic configuration of the ion
s : integer - the angular momentum due to spin, S
l : integer - the orbital angular momentum, L
j : integer - the total angular momentum, J
energy : float - the energy above the ground state, in 1/cm

Parity is not given because it is not trivially available from the source files.
(In the ingestion of the Chianti node, parity is recovered from the electronic
configuration.)

Each row of the lines table describes one radiative transition between states of one ion.
The columns of the table are, in order:
initialstate : integer - code for initial state (foreign key to states table)
finalstate : integer - code for final state (foreign key to states table)
wavelength : float - wavelength of transition in vacuum, in Angstrom.
log10wosc : decimal log of weighted oscillator strength
a : Einstein A coefficient

Each column in the output is generated by one entry in the linemap array
below. Each of these entries maps a column to a "line-function" which
can extract the column's value from a string holding a line of the source file.
Most of the columns are present in the source file in the required form,
and these can be extracted by the standard line-functions imported from
the imptools module of the node software. Five columns need special parsing,
and for each of these there is a dedicated line-function in this module.

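The primary key for the states table is formed from the atomic number, ionic
charge and a state index given in the source file. The latter is unique only
for states of a single ion; therefore it is combined with the former two
numbers to get an identifier that is unique across all ions. This is done by
the line function statesPk.

The species code for each state is formed from the ionic charge and the atomic
number alone, so that all states of one ion share the same code. This is done
by the line function speciesFk.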

The source file gives the chemical symbol for the ion as part of the 
ionization state, e.g. "He II". From this we extract only the symbol. This
is done by the line function elementSymbol.

The initial and final states for each line are foreign-key references to
rows of the states table (i.e. they take the same value as the primary
key of the row describing the appropriate state) and are derived using the 
same coding as the primary key of that table. Because the input file has
two columns giving the indices of the final and initial states, there are two
line functions, initialStates and finalStates.
"""


from imptools.linefuncs import *

def statesPk(linedata, sep):
  """
  Generates a primary key for the state from the given index, the atomic number
  and the ionic charge. The index numbers in the source data are unique only
  for a given ion (because the source files are separate for each ion), whereas we
  need a number unique across all ions. The key is formed as an integer:
  (1000000 * index) + (1000 * ionCharge) + atomicNumber
    linedata The line from the levels file
    sep      The column separator used in the levels file
  """

  # Note the use of standard line-functions from the imptools module
  index = int(bySepNr(linedata, 4, sep))
  ionCharge = int(bySepNr(linedata, 3, sep))
  atomicNumber = int(bySepNr(linedata, 2, sep))

  return (1000000 * index) + (1000 * ionCharge) + atomicNumber



def elementSymbol(linedata, sep):
  """
  Extracts the element symbol from the ionization stage.
  E.g., extracts He from "He II".
    linedata The line from the levels file
    sep      The column separator used in the levels file
  """

  # Assume that the first word, delimited by spaces, in
  # the given column is the element symbol. Note the
  # use of a standard line function twice, first to get the
  # given column and then to get the first word in that column.
  ionizationStage = bySepNr(linedata, 1, sep)
  return bySepNr(ionizationStage, 0, ' ')

def initialStates(linedata, sep):
  """
  Generates a foreign key for the initial state from the given index, the atomic number
  and the ionic charge. The index numbers in the source data are unique only
  for a given ion (because the source files are separate for each ion), whereas we
  need a number unique across all ions. The key is formed as an integer:
  (1000000 * index) + (1000 * ionCharge) + atomicNumber
    linedata The line from the lines file
    sep      The column separator used in the lines file
  """

  # Note the use of standard line-functions from the imptools module
  index = int(bySepNr(linedata, 5, sep))
  ionCharge = int(bySepNr(linedata, 3, sep))
  atomicNumber = int(bySepNr(linedata, 2, sep))

  return (1000000 * index) + (1000 * ionCharge) + atomicNumber

def finalStates(linedata, sep):
  """
  Generates a foreign key for the final state from the given index, the atomic number
  and the ionic charge. The index numbers in the source data are unique only
  for a given ion (because the source files are separate for each ion), whereas we
  need a number unique across all ions. The key is formed as an integer:
  (1000000 * index) + (1000 * ionCharge) + atomicNumber
    linedata The line from the lines file
    sep      The column separator used in the lines file
  """

  # Note the use of standard line-functions from the imptools module
  index = int(bySepNr(linedata, 4, sep))
  ionCharge = int(bySepNr(linedata, 3, sep))
  atomicNumber = int(bySepNr(linedata, 2, sep))

  return (1000000 * index) + (1000 * ionCharge) + atomicNumber

def speciesFk(linedata, sep):
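  """
  Generates a code for the ion from the atomic number and the ionic charge.
  All states of one ion share the same code. The code is formed as an integer:
  (1000 * ionCharge) + atomicNumber
    linedata The line from the levels file
    sep      The column separator used in the levels file
  """

  ionCharge = int(bySepNr(linedata, 3, sep))
  atomicNumber = int(bySepNr(linedata, 2, sep))

  return (1000 * ionCharge) + atomicNumber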
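

# A minimal sketch, not part of the original ingestion: the hypothetical helper
# decodeStateKey (a name assumed here for illustration) inverts the key coding
# used by statesPk, initialStates and finalStates. It assumes that the atomic
# number and the ionic charge are both below 1000, which the coding itself
# needs for the keys to be unique. It may be useful when checking rows of the
# output files by hand; the mappings below do not use it.
def decodeStateKey(key):
  """
  Splits a state key into its parts, returned as the tuple
  (index, ionCharge, atomicNumber).
    key The integer key, (1000000 * index) + (1000 * ionCharge) + atomicNumber
  """

  # Peel off the state index first, then split the remainder
  # into the ionic charge and the atomic number.
  index, remainder = divmod(int(key), 1000000)
  ionCharge, atomicNumber = divmod(remainder, 1000)

  return (index, ionCharge, atomicNumber)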


# The names of the files
# There is an input file and output file
# for each table of the database.
basepath = "/users/guy/vamdc/tignanillo/"
outpath  = "/users/guy/vamdc/tignanillo/"
statesInFile  = basepath + 'states.in'
statesOutFile = outpath + 'states.out'
linesInFile  = basepath + 'lines.in'
linesOutFile = outpath + 'lines.out'


# The mapping details. This is a two-element list, one element each
# for the states and lines files. Each element is a dictionary of
# instructions, of which one, 'linemap', is itself a list of dictionaries,
# with one dictionary per column of the output file.
mapping = [

  # Mappings for the states file.
  {
    'outfile' : statesOutFile,
    'infiles' : statesInFile,
    'commentchar' : '%',

    # Mappings for each column of the output.
    # Note that the value for 'cbyte' is a tuple containing first a
    # reference to the line function (not the name of the function, therefore
    # not quoted) followed by the argument list. It is not an in-line call
    # to the line function. E.g. 'cbyte' : (bySepNr,2,'|') is correct but
    # 'cbyte' : (bySepNr(2,'|')) is not. Note also that 'magic' values
    # indicating missing data are converted to nulls in the output.
    'linemap' : [
      {'cname' : 'id',            'cbyte' : (statesPk,'|')},
      {'cname' : 'species',       'cbyte' : (speciesFk,'|')},
      {'cname' : 'element',       'cbyte' : (elementSymbol,'|')},
      {'cname' : 'nuclearcharge', 'cbyte' : (bySepNr,2,'|')}, 
      {'cname' : 'ioncharge',     'cbyte' : (bySepNr,3,'|')},
      {'cname' : 'configuration', 'cbyte' : (bySepNr,5,'|')},
      {'cname' : 's',             'cbyte' : (bySepNr,6,'|')},
      {'cname' : 'l',             'cbyte' : (bySepNr,7,'|')},
      {'cname' : 'j',             'cbyte' : (bySepNr,8,'|')},
      {'cname' : 'energy',        'cbyte' : (bySepNr,9,'|'), 'cnull' : '-1.0'}
    ]
  },

  # Mappings for the lines file.
  {
    'outfile' : linesOutFile,
    'infiles' : linesInFile,
    'commentchar' : '%',

    # Mappings for each column of the output.
    # Note that the value for 'cbyte' is a tuple containing first a
    # reference to the line function (not the name of the function, therefore
    # not quoted) followed by the argument list. It is not an in-line call
    # to the line function. E.g. 'cbyte' : (bySepNr,2,'|') is correct but
    # 'cbyte' : (bySepNr(2,'|')) is not. Note also that 'magic' values indicating
    # missing data are converted to nulls in the output.
    'linemap' : [
      {'cname' : 'initialstate',  'cbyte' : (initialStates,'|')},
      {'cname' : 'finalstate',    'cbyte' : (finalStates,'|')},
      {'cname' : 'wavelength',    'cbyte' : (bySepNr,6,'|'), 'cnull' : '-1.00000e+00'},
      {'cname' : 'log10wosc',     'cbyte' : (bySepNr,8,'|'), 'cnull' : '-1.000e+00'},
      {'cname' : 'a',             'cbyte' : (bySepNr,9,'|')}
    ]
  }

]
