copying first string into second line ~ Discussion of Coding

I have a text file in this format:

    abacas? Abaca[Noun]+[Prop]+[A3sg]+SH[P3sg]+[Nom] : 20.1748046875
    abac? Aba[Noun]+[Prop]+[A3sg]+SH[P3sg]+[Nom] : 16.3037109375 Aba[Noun]+[Prop]+[A3sg]+[Pnon]+[Nom]-CH[Noun+Agt]+[A3sg]+[Pnon]+[Nom] : 23.0185546875
    abac?larla Aba[Noun]+[Prop]+[A3sg]+[Pnon]+[Nom]-CH[Noun+Agt]+lAr[A3pl]+[Pnon]+YlA[Ins] : 27.8974609375 aba[Noun]+[A3sg]+[Pnon]+[Nom]-CH[Noun+Agt]+lAr[A3pl]+[Pnon]+YlA[Ins] : 23.3427734375 abac?[Noun]+lAr[A3pl]+[Pnon]+YlA[Ins] : 19.556640625

Here I call the string before the first space the word (for example abac?s?).

The string that starts after the first space and ends with a floating-point number is a definition (for example Abaca[Noun]+[Prop]+[A3sg]+SH[P3sg]+[Nom] : 20.1748046875).

I want to do this: if a line contains more than one definition (the first line has one, the second line has two, the third line has three), break the line after each definition and put the word at the beginning of each new line. Expected output:

    abacas? Abaca[Noun]+[Prop]+[A3sg]+SH[P3sg]+[Nom] : 20.1748046875
    abac? Aba[Noun]+[Prop]+[A3sg]+SH[P3sg]+[Nom] : 16.3037109375
    abac? Aba[Noun]+[Prop]+[A3sg]+[Pnon]+[Nom]-CH[Noun+Agt]+[A3sg]+[Pnon]+[Nom] : 23.0185546875
    abac?larla Aba[Noun]+[Prop]+[A3sg]+[Pnon]+[Nom]-CH[Noun+Agt]+lAr[A3pl]+[Pnon]+YlA[Ins] : 27.8974609375
    abac?larla aba[Noun]+[A3sg]+[Pnon]+[Nom]-CH[Noun+Agt]+lAr[A3pl]+[Pnon]+YlA[Ins] : 23.3427734375
    abac?larla abac?[Noun]+lAr[A3pl]+[Pnon]+YlA[Ins] : 19.556640625

I have almost 1,500,000 lines in my text file, and the number of definitions per line varies: it can be anywhere from 1 to 5.
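As a reference point for the answers that follow, the requested transformation can be sketched in Python. This is only a sketch under stated assumptions: every definition is taken to have the shape `analysis : score`, with single spaces around the colon and a decimal score, as in the sample. The names `expand_line` and `expand_file` and the UTF-8 encoding are illustrative choices, not part of the question.

```python
import re

# Assumption: a definition looks like "<analysis> : <decimal score>".
DEFINITION = re.compile(r'\S+ : \d+\.\d+')

def expand_line(line):
    """Return one 'word definition' string per definition on the line."""
    word, _, rest = line.strip().partition(' ')
    return [f'{word} {m.group(0)}' for m in DEFINITION.finditer(rest)]

def expand_file(src, dst):
    # Read and write line by line, so a file with about 1.5 million
    # lines never has to be held in memory at once.
    # Assumption: the file is UTF-8 encoded.
    with open(src, encoding='utf-8') as fin, \
         open(dst, 'w', encoding='utf-8') as fout:
        for line in fin:
            for out in expand_line(line):
                fout.write(out + '\n')
```

Calling `expand_file('input.txt', 'output.txt')` (hypothetical file names) would write one definition per line, each prefixed with its word.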

Answer by glenn jackman for copying first string into second line

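Assuming there are always 4 space-separated words for each definition (the command was cut off after "i" in this copy; the rest of the loop below is a reconstruction from that stated assumption):

    awk '{for (i=1; i<=NF; i+=4) print $i, $(i+1), $(i+2), $(i+3)}' file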

Or, if the split should occur after that floating-point number:

    perl -pe 's/\b\d+\.\d+\K\s+(?=\S)/\n/g' file

(This is the Perl equivalent of Avinash's answer.)
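For readers more at home in Python, the same substitution can be sketched as follows. Python's `re` module has no `\K`, so the score is captured and written back instead; the function name `split_after_scores` is hypothetical. Like the one-liner, it only inserts line breaks and does not repeat the leading word.

```python
import re

def split_after_scores(line):
    # Replace the whitespace that follows a decimal number with a newline,
    # but only when more text follows (the lookahead skips the line end).
    return re.sub(r'\b(\d+\.\d+)\s+(?=\S)', r'\1\n', line)
```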
  
Answer by repzero for copying first string into second line

Here is sed in action:

    sed -r '/^indirger(ken|di)/{s/([0-9]+[.][0-9]+ )(indirge)/\1\n\2/g}' my_file

Output:

    indirgerdi indirge[Verb]+[Pos]+Hr[Aor]+[A3sg]+YDH[Past] : 22.2626953125
    indirge[Verb]+[Pos]+Hr[Aor]+YDH[Past]+[A3sg] : 18.720703125
    indirgerken indirge[Verb]+[Pos]+Hr[Aor]+[A3sg]-Yken[Adv+While] : 19.6201171875
  
Answer by andreas-hofmann for copying first string into second line

A small Python script does the job. Input is expected in input.txt; output goes to output.txt.

    import re

    # First whitespace-delimited token on the line: the "word".
    rf = re.compile(r'(\S+)\s')
    # One definition: analysis, " : ", floating-point score.
    r = re.compile(r'(\S+\s:\s\d+\.\d+)')

    with open("input.txt", "r") as f:
        text = f.read()

    with open("output.txt", "w") as f:
        for l in text.split('\n'):
            offset = 0
            first = ""
            match = rf.search(l)
            if match:
                first = match.group(1)
                offset = match.end()
            while True:
                # Continue searching where the previous definition ended.
                match = r.search(l, offset)
                if not match:
                    break
                offset = match.end()
                f.write(first + " " + match.group(1) + "\n")
  
Answer by Benjamin W. for copying first string into second line

Bash and grep:

    #!/bin/bash
    while IFS=' ' read -r in1 in2 in3 in4; do
        if [[ -n $in4 ]]; then
            prepend="$in1"
            echo "$in1 $in2 $in3 $in4"
        else
            echo "$prepend $in1 $in2 $in3"
        fi
    done < <(grep -o '[[:alnum:]][^:]\+ : [[:digit:].]\+' "$1")

The output of grep -o puts every definition on a separate line, but all definitions except the first from each input line are missing the "word" at the beginning:

    abacas? Abaca[Noun]+[Prop]+[A3sg]+SH[P3sg]+[Nom] : 20.1748046875
    abac? Aba[Noun]+[Prop]+[A3sg]+SH[P3sg]+[Nom] : 16.3037109375
    Aba[Noun]+[Prop]+[A3sg]+[Pnon]+[Nom]-CH[Noun+Agt]+[A3sg]+[Pnon]+[Nom] : 23.0185546875
    abac?larla Aba[Noun]+[Prop]+[A3sg]+[Pnon]+[Nom]-CH[Noun+Agt]+lAr[A3pl]+[Pnon]+YlA[Ins] : 27.8974609375
    aba[Noun]+[A3sg]+[Pnon]+[Nom]-CH[Noun+Agt]+lAr[A3pl]+[Pnon]+YlA[Ins] : 23.3427734375
    abac?[Noun]+lAr[A3pl]+[Pnon]+YlA[Ins] : 19.556640625

The while loop now reads this output, using a space as the input field separator. If in4 is non-empty, the line still has its "word", so we remember it in prepend and print the line unchanged. If in4 is a zero-length string, we're on a line where the "word" is missing, so we prepend the remembered one.

The script takes the input file name as its argument, and saving output to an output file can be done with simple redirection:

    ./script inputfile > outputfile
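The carry-forward step of this approach (remember the most recent word, then prepend it wherever it is missing) can be sketched in Python as well. This is an illustration only: it assumes the definitions have already been split one per item, as grep -o does, and the name `restore_words` is hypothetical.

```python
def restore_words(pieces):
    """Yield each piece with its word, re-adding the word where it is missing."""
    word = ''
    for piece in pieces:
        fields = piece.split()
        if len(fields) == 4:
            # word, analysis, ":", score: remember the word for later pieces
            word = fields[0]
            yield piece
        else:
            # analysis, ":", score: the word is missing, so prepend it
            yield f'{word} {piece}'
```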
  
Answer by Adam Katz for copying first string into second line

I am assuming the following format:

    word definitionkey : definitionvalue [definitionkey : definitionvalue …]

None of those elements may contain a space and they are always delimited by a single space.

The following code should work:

    awk '{ for (i=2; i<=NF; i+=3) print $1, $i, $(i+1), $(i+2) }' file

Explanation (this is the same code but with comments and more spaces):

    awk '
      # match any line
      {
        # iterate over each "key : value"
        for (i=2; i<=NF; i+=3)
          print $1, $i, $(i+1), $(i+2)  # prints each "word key : value"
      }
    ' file

awk has some tricks that you may not be familiar with. It works on a line-by-line basis. Each stanza has an optional conditional before it (awk 'NF >=4 {…}' would make sense here, since a line with fewer than four fields cannot hold a complete definition). NF is the number of fields and a dollar sign ($) indicates we want the value of the given field, so $1 is the value of the first field, $NF is the value of the last field, and $(i+1) is the value of the third field (assuming i=2). print will default to using spaces between its arguments and adds a line break at the end (otherwise, we'd need printf "%s %s %s %s\n", $1, $i, $(i+1), $(i+2), which is a bit harder to read).
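The field arithmetic may be easier to follow in a language with explicit lists. Below is a rough Python sketch of the same idea, under the same format assumption (single spaces, no spaces inside any element). The name `expand_fields` is hypothetical. Unlike the awk one-liner, this sketch returns nothing for lines with fewer than four fields and drops an incomplete trailing group.

```python
def expand_fields(line):
    fields = line.split()
    if len(fields) < 4:
        # Not enough fields for a word plus one "key : value" group.
        return []
    word, rest = fields[0], fields[1:]
    # Walk the remaining fields three at a time: key, ":", value.
    return [' '.join([word, *rest[i:i + 3]])
            for i in range(0, len(rest) - 2, 3)]
```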
  
Answer by Varun for copying first string into second line

Please find the following bash code:

    #!/bin/bash
    # read.sh
    # Disable pathname expansion: tokens such as [Noun] or abac? would
    # otherwise be treated as glob patterns when left unquoted below.
    set -f
    while read -r variable
    do
        for i in "$variable"
        do
            var=`echo "$i" | wc -w`
            array_1=( $i )
            counter=0
            for ((j=1 ; j < $var ; j++))
            do
                if [ $counter = 0 ]  #1
                then
                    echo -ne ${array_1[0]}' '
                fi #1
                echo -ne ${array_1[$j]}' '
                counter=$(expr $counter + 1)
                if [ $counter = 3 ] #2
                then
                    counter=0
                    echo
                fi #2
            done
        done
    done

I have tested it and it works. To test it, run the following command at the bash prompt:

    $ ./read.sh < input.txt > output.txt

where read.sh is the script, input.txt is the input file, and output.txt is where the output is written.
  
Answer by Casimir et Hippolyte for copying first string into second line

With bash:

    while read -r line
    do
        pre=${line%% *}
        echo "$line" | sed 's/\([0-9]\) /\1\n'"$pre"' /g'
    done < "yourfile.txt"

This script reads the file line by line. For each line, the prefix is extracted with a parameter expansion (everything up to the first space), and sed then replaces each space preceded by a digit with a newline followed by the prefix.
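The same two steps (take the prefix, then substitute at each "digit followed by space") can be sketched in Python, which avoids starting one sed process per line. This is an illustration under the same assumption that a space preceded by a digit only occurs between definitions. The name `expand_with_prefix` is hypothetical.

```python
import re

def expand_with_prefix(line):
    # Everything up to the first space, like ${line%% *} in bash.
    prefix = line.split(' ', 1)[0]
    # Replace "digit + space" with "digit + newline + prefix + space".
    # A function is used as the replacement so that any backslashes in
    # the prefix are inserted literally.
    return re.sub(r'(\d) ', lambda m: f'{m.group(1)}\n{prefix} ', line)
```

As with the sed version, a trailing space after the last score would produce an extra line holding only the prefix.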
  
Answer by anishsane for copying first string into second line

Using perl:

    $ perl -nE 'm/([^ ]*) (.*)/; my $word=$1; $_=$2; say $word . " " . $_ for / *(.*?[0-9]+\.[0-9]+)/g;' < input.log

Output:

    abacas? Abaca[Noun]+[Prop]+[A3sg]+SH[P3sg]+[Nom] : 20.1748046875
    abac? Aba[Noun]+[Prop]+[A3sg]+SH[P3sg]+[Nom] : 16.3037109375
    abac? Aba[Noun]+[Prop]+[A3sg]+[Pnon]+[Nom]-CH[Noun+Agt]+[A3sg]+[Pnon]+[Nom] : 23.0185546875
    abac?larla Aba[Noun]+[Prop]+[A3sg]+[Pnon]+[Nom]-CH[Noun+Agt]+lAr[A3pl]+[Pnon]+YlA[Ins] : 27.8974609375
    abac?larla aba[Noun]+[A3sg]+[Pnon]+[Nom]-CH[Noun+Agt]+lAr[A3pl]+[Pnon]+YlA[Ins] : 23.3427734375
    abac?larla abac?[Noun]+lAr[A3pl]+[Pnon]+YlA[Ins] : 19.556640625

Explanation:

  Split the line to separate the first field as word.
  Then split the remaining line using the regex .*?[0-9]+\.[0-9]+.
  Print word concatenated with every match of the above regex.
  
  


Wednesday, January 6, 2016