
I would like to capitalize the first word of every sentence in a string. For example, this string:

apple Park will run one of the largest on-site solar energy installations in the world. it is also the site of the world’s largest naturally ventilated building.

Should become:

Apple Park will run one of the largest on-site solar energy installations in the world. It is also the site of the world’s largest naturally ventilated building.

I would also like the capitalization not to happen when a word already contains a capital letter, for example:

iPad is a mobile device.

should remain unchanged:

iPad is a mobile device.
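A minimal sketch of the "already has a capital" test, written for Swift 5 (it relies on `Character.isUppercase`, which older Swift versions lack). `containsCapital` is a hypothetical helper name. Checking whether a character equals its own uppercased form is not enough here: that is also true for digits and for the apostrophe in "don't", so such words would wrongly be treated as already capitalized.

```swift
// Hypothetical helper (Swift 5+): returns true when `word` already contains
// at least one capital letter. Character.isUppercase is false for digits
// and punctuation, so they never count as capitals.
func containsCapital(_ word: String) -> Bool {
    return word.contains(where: { $0.isUppercase })
}

// The first word of a sentence is only capitalized when it has no capital yet.
for word in ["apple", "iPad", "don't", "it"] {
    print(word, containsCapital(word) ? "-> leave as is" : "-> capitalize")
}
```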

For the first part of this task, I could use this code by rintaro:

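```swift
// Swift 2 syntax (does not compile in Swift 3)
let str = "someSentenceWith UTF text İŞğĞ. anotherSentenceğüÜğ"

var result = ""
// Enumerate the sentences of an uppercased copy, then keep each sentence's
// first character and lowercase the rest
str.uppercaseString.enumerateSubstringsInRange(str.characters.indices, options: .BySentences) { (sub, _, _, _)  in
    result += String(sub!.characters.prefix(1))
    result += String(sub!.characters.dropFirst(1)).lowercaseString
}

print(result)
```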

But it's written for Swift 2 and doesn't work in Swift 3.

Cue
  • @JamesWebster haha, you got it. – LC 웃 Mar 14 '17 at 09:15
  • Just convert your Swift 2 code to Swift 3 and it will work (the docs give you the new method names). Well, it will work as well as it did in Swift 2: in each sentence it uppercases the first letter and lowercases the rest, potentially losing uppercase letters within a sentence. You can address that if required by improving the algorithm. HTH – CRD Mar 14 '17 at 10:05
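To see the loss of inner capitals described in the comment above, here is a sketch in Swift 5 of what the snippet effectively does to each sentence: uppercase the first character, lowercase everything after it. `upperFirstLowerRest` is a hypothetical name, and sentence splitting is deliberately left out.

```swift
// Hypothetical helper approximating the Swift 2 snippet's per-sentence result:
// first character uppercased, the rest of the sentence lowercased.
func upperFirstLowerRest(_ sentence: String) -> String {
    guard let first = sentence.first else { return sentence }
    return String(first).uppercased() + sentence.dropFirst().lowercased()
}

print(upperFirstLowerRest("iPad is a mobile device."))
// Ipad is a mobile device.
print(upperFirstLowerRest("new AAAS president emphasizes making the case for science"))
// New aaas president emphasizes making the case for science
```

Both "iPad" and "AAAS" lose their capitals, which is why the rest of each sentence should be appended unchanged instead of lowercased.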

1 Answer


Is this what you're after?

The code iterates over each sentence and looks at its first word. If that word already contains a capital letter, the sentence is appended to the result unchanged; otherwise its first letter is capitalized and the rest of the sentence is appended as-is.
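The per-sentence rule can also be expressed without the Foundation enumeration calls. Below is a sketch in Swift 5 that treats the first word as the leading run of non-whitespace characters (a simplification of what `.byWords` does). `capitalizeSentence` is a hypothetical name, and splitting the text into sentences is assumed to happen elsewhere.

```swift
// Hypothetical helper (Swift 5+): capitalizes the first letter of one sentence
// unless its first word already contains a capital letter. The rest of the
// sentence is kept unchanged, so capitals such as "iPad" later on survive.
func capitalizeSentence(_ sentence: String) -> String {
    // Skip leading whitespace, e.g. a space left over from a previous sentence.
    guard let start = sentence.firstIndex(where: { !$0.isWhitespace }) else {
        return sentence  // empty or all whitespace
    }
    // First word: the run of non-whitespace characters starting at `start`.
    let firstWord = sentence[start...].prefix(while: { !$0.isWhitespace })
    if firstWord.contains(where: { $0.isUppercase }) {
        return sentence  // e.g. "iPad is a mobile device." stays as is
    }
    let rest = sentence.index(after: start)
    return String(sentence[..<start])
        + String(sentence[start]).uppercased()
        + String(sentence[rest...])
}

print(capitalizeSentence("apple Park will run one of the largest on-site solar energy installations in the world."))
print(capitalizeSentence("iPad is a mobile device."))
```

A sentence that opens with a quote or bracket is left unchanged by this sketch, since only the very first non-space character is uppercased.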

```swift
import Foundation

let str = "this is a sentence without a brand named tablet. this too is a sentence but with iPad in it! iPad at start of sentence here?"
var result = ""

// Iterate over each sentence. .bySentences uses capital letters to detect
// sentence starts, so enumerate an uppercased copy and read the actual text
// back from `str` with the same range (this assumes uppercasing does not
// change the string's length, e.g. no "ß" -> "SS").
str.uppercased().enumerateSubstrings(in: str.startIndex ..< str.endIndex, options: .bySentences) { _, range, _, _ in

    let original = str.substring(with: range)

    var capitalize = true

    // Look at the first word of the original (not uppercased) sentence
    original.enumerateSubstrings(in: original.startIndex ..< original.endIndex, options: .byWords) { word, _, _, stop in

        // If that word already contains a capital letter, don't capitalize it.
        // A character is a capital if lowercasing changes it, so digits and
        // punctuation such as the "'" in "don't" are not counted.
        if let word = word {
            for character in word.characters {
                let letter = String(character)
                if letter.lowercased() != letter {
                    capitalize = false
                    break
                }
            }
        }

        // Always stop after the first word. It's the only one of concern
        stop = true
    }

    // Capitalize the first letter if needed; the rest of the sentence is kept as-is
    if capitalize {
        result += String(original.characters.prefix(1)).uppercased()
        result += String(original.characters.dropFirst(1))
    } else {
        result += original
    }
}
print(result)
```

outputs:

This is a sentence without a brand named tablet. This too is a sentence but with iPad in it! iPad at start of sentence here?

NB. I didn't focus on efficiency here. If you are going to use this for a large amount of data, you may want to profile it first!

Note

I don't think the `.bySentences` option is very robust. During testing I accidentally put two spaces in one of the sentences, and it failed to parse properly. I've also just tried your example "Apple..." sentences, and it finds only one sentence.
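One way to avoid relying on `.bySentences` is to split on sentence-end markers instead, as suggested in the comments below. This is a sketch in Swift 5, assuming a sentence ends at `.`, `?` or `!` followed by whitespace; `splitSentences` is a hypothetical helper. It copes with repeated spaces and with sentences that start in lowercase, but an abbreviation such as "e.g. " followed by a space will be treated as a sentence end.

```swift
// Hypothetical helper (Swift 5+): splits text into sentences at ".", "?" or "!"
// followed by whitespace. It does not depend on the next sentence starting
// with a capital letter. Each piece keeps its trailing whitespace, so
// joined() reproduces the input exactly.
func splitSentences(_ text: String) -> [String] {
    let terminators: Set<Character> = [".", "?", "!"]
    var sentences: [String] = []
    var current = ""
    var seenTerminator = false  // the current piece has just ended in a terminator
    var seenSpace = false       // ...and whitespace has followed that terminator

    for ch in text {
        if seenSpace && !ch.isWhitespace {
            // First non-space character after "terminator + whitespace":
            // a new sentence begins here.
            sentences.append(current)
            current = ""
            seenTerminator = false
            seenSpace = false
        }
        current.append(ch)
        if terminators.contains(ch) {
            seenTerminator = true
        } else if ch.isWhitespace {
            seenSpace = seenTerminator
        } else {
            seenTerminator = false  // e.g. the "." inside "3.14"
        }
    }
    if !current.isEmpty {
        sentences.append(current)
    }
    return sentences
}

let text = "apple Park will run one of the largest on-site solar energy installations in the world. it is also the site of the world’s largest naturally ventilated building."
for sentence in splitSentences(text) {
    print("[\(sentence)]")
}
```

Because no text is dropped, each piece can be passed through a per-sentence capitalization function and the results concatenated with `joined()` to rebuild the string.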

James Webster
  • Sorry, but this doesn't actually work. Try changing your "THIS" to "this" and you'll see it fail. Note the original's use of `uppercaseString`; but that version also munges more than the initial letters, so it doesn't work properly either. – CRD Mar 14 '17 at 09:43
  • I think this is part of the problem with `.bySentences` I mentioned. It seems to use the capital letter as a marker for a new sentence. Are your sentences only ever in English? Perhaps you could look for sentence-end markers (`.`, `?`, `!`, etc.) for the sentence-parsing part. – James Webster Mar 14 '17 at 09:57
  • Yes, it uses the initial capital letter as part of its sentence recognition, which is why the original solution contains uppercase and lowercase conversions. – CRD Mar 14 '17 at 10:09
  • @CRD, I'll have another go! – James Webster Mar 14 '17 at 10:09
  • @CRD, I've updated and this appears to work. It uses an uppercase string for the comparisons, but pulls values from the original where necessary. – James Webster Mar 14 '17 at 10:18
  • It works fine. A next step could be to find a way to avoid lowercasing words that contain capital letters when they are not at the start of a sentence, for example in "new AAAS president emphasizes making the case for science". But that was not part of my question, so that's okay. You did a great job, thanks! – Cue Mar 14 '17 at 10:46
  • @Tel, I've amended it. It should no longer lowercase the rest of the string! :) – James Webster Mar 14 '17 at 10:51
  • James - Good, you saved me adding an answer ;-) – CRD Mar 14 '17 at 17:55