相关文章推荐
Collectives™ on Stack Overflow

Find centralized, trusted content and collaborate around the technologies you use most.

Learn more about Collectives

Teams

Q&A for work

Connect and share knowledge within a single location that is structured and easy to search.

Learn more about Teams

I have tried to load 6.000.000 (6 mio) strings of 64 characters in length in order to sort them in VBA. What I have noticed is : 1. When I use an Array the memory occupied is around 916 MB 2. When I use an ArrayList the memory occupied goes to 1.105 MB None of them is reasonable to me as the strings size is around 380 MB. What I doing wrong ? As the numbers of strings will grow rapidly I will face 'Out of memory' very soon. Any idea will be welcome.

Demetres

I am developing in VBA in Excel. The statement that I use is: 'Set AL = CreateObject("System.Collections.ArrayList")' It was 'Dim AL As Object' declared. Demetres Dec 11, 2013 at 18:39

Most of the issue is the fact that VBA natively uses BSTRs , which are Unicode strings. I assume that your calculation of ~380 mb is based on 6 million * 64 characters @ 1 byte each. In actuality, the math works out to something like this:

  • VBA Strings are Unicode, which in this case means each character is 2 bytes.

  • A String in VBA is 4 bytes for internally storing the length before the string, 2 bytes for a unicode Null at the end of the string, and the 2 bytes per character.

  • That works out to 4 + (64 * 2) + 2 = 134 bytes per 64 character
    String.

  • Each entry in the String array is actually a pointer to the String,
    so that's another 4 bytes per slot, 138 in total so far.

  • Assuming 6 million of these Strings, that's 828,000,000 bytes (using commas US style) which, depending upon your definition of mb , is either 789.6 or 828 mb.

  • I'm not sure about the rest of the overhead, perhaps garbage collector reference counters?

    Anyway, I would suggest that you use 64 slot Byte arrays to load and store your strings, assuming it's ASCII characters. You eliminate (4 + 64 + 2) * 6,000,000 bytes and your code will presumably run faster because it doesn't need to compare as many bytes. You could probably optimize your sort by comparing a Word (32 or 64 bits depending upon your processor) at a time instead of just character by character.

    Update

    I think I was slightly wrong on that calculation. Byte Arrays are SAFEARRAYs which have quite a bit of overhead themselves, about 20 bytes. So the savings would be closer to (4 + 64 + 2 - 20) * 6,000,000.

    Raw ASCII String Sort Example

    Before you look at this example, please, please take my recommendation and import your text into Access to sort instead. 6 million strings for a total of 380 mb is well within Access' limits and Access can (as I understand it) sort them without resorting to loading all the strings into memory at the same time

    Create a text file called "data.txt" with the following text:

    Strings

    In add a code module and call it "mdlQuickSort" and add the following code. I haven't commented much, but if you're curious as to how it works you can read Wikipedia's article on QuickSort or let me know and I'll add better comments.

    Option Explicit
    Public Sub QuickSortInPlace(ByRef arrArray() As Variant)
        If UBound(arrArray) <= 1 Then
            Exit Sub
        End If
        qSort arrArray, 0, UBound(arrArray)
    End Sub
    Private Sub qSort(ByRef arrArray() As Variant, left As Long, right As Long)
        Dim pivot As Long
        Dim newPivotIndex As Long
        If left < right Then
            pivot = MedianOf3(arrArray, left, right)
            newPivotIndex = partition(arrArray, left, right, pivot)
            qSort arrArray, left, newPivotIndex - 1
            qSort arrArray, newPivotIndex + 1, right
        End If
    End Sub
    Private Function partition(ByRef arrArray() As Variant, left As Long, right As Long, pivot As Long) As Long
        Dim pivotValue As Variant
        pivotValue = arrArray(pivot)
        Swap arrArray, pivot, right
        Dim storeIndex As Long
        storeIndex = left
        Dim i As Long
        For i = left To right - 1
            If CompareFunc(arrArray(i), pivotValue) = -1 Then
                Swap arrArray, i, storeIndex
                storeIndex = storeIndex + 1
            End If
        Swap arrArray, storeIndex, right
        partition = storeIndex
    End Function
    Private Sub Swap(ByRef arrArray() As Variant, indexA As Long, indexB As Long)
        Dim temp As Variant
        temp = arrArray(indexA)
        arrArray(indexA) = arrArray(indexB)
        arrArray(indexB) = temp
    End Sub
    Private Function MedianOf3(ByRef arrArray() As Variant, left As Long, right As Long) As Long
        Dim a As Variant, b As Variant, c As Variant
        Dim indexA As Long, indexB As Long, indexC As Long
        Dim ab As Long
        Dim bc As Long
        Dim ac As Long
        indexA = left
        indexB = (left + right) \ 2
        indexC = right
        a = arrArray(indexA)
        b = arrArray(indexB)
        c = arrArray(indexC)
        ab = CompareFunc(a, b)
        bc = CompareFunc(b, c)
        ac = CompareFunc(a, c)
        If ab = -1 Then
            If ac = -1 Then
                If bc = -1 Or bc = 0 Then
                    'a b c
                    'Already in B
                    'a c b
                    Swap arrArray, indexB, indexC
                End If
                'c a b
                Swap arrArray, indexA, indexB
            End If
            If bc = -1 Then
                If ac = -1 Then
                    'b a c
                    Swap arrArray, indexA, indexB
                    'b c a
                    Swap arrArray, indexB, indexC
                End If
                'c b a
                'Already in B
            End If
        End If
        MedianOf3 = indexB
    End Function
    Private Function CompareFunc(str_a As Variant, str_b As Variant) As Long
        Dim a As Byte
        Dim b As Byte
        Dim i As Long
        For i = 0 To 63
            a = str_a(i)
            b = str_b(i)
            If a <> b Then
                Exit For
            End If
        If i <= 63 Then
            If a < b Then
                CompareFunc = -1
                CompareFunc = 1
            End If
            CompareFunc = 0
        End If
    End Function
    

    Finally, add a module called "mdlMain". This is where the Strings are loaded. Here is the code:

    Option Explicit
    Public Sub Main()
        Dim arrStrings() As Variant
        Dim i As Long
        'Get the strings from the file
        FillArrStringsInPlace arrStrings
        'Print the unsorted list
        Debug.Print "Unsorted Strings" & vbCrLf & "---------------------"
        For i = 0 To UBound(arrStrings)
            Debug.Print StrConv(arrStrings(i), vbUnicode)
        'Sort in place
        QuickSortInPlace arrStrings
        'Print the sorted list
        Debug.Print vbCrLf & vbCrLf & "Sorted Strings" & vbCrLf & "---------------------"
        For i = 0 To UBound(arrStrings)
            Debug.Print StrConv(arrStrings(i), vbUnicode)
    End Sub
    Public Sub FillArrStringsInPlace(ByRef arr() As Variant)
        Dim iFile As Integer
        Dim strInput As String
        Dim lineCount As Long
        Dim arrBytes() As Byte
        'Open a file in the same folder as this Access db called "data.txt"
        iFile = FreeFile
        Open ActiveWorkbook.Path & "\data.txt" For Input As iFile
        'Since I already know how many strings there are, I'm assigning it here.
        'The alternatives would be to either "dynamically resize" the array, which
        'is equivalent to copying the entire thing everytime you add a new string,
        'Or to count the number of newlines in the file and dimensioning the array
        'to that size before reading in the strings line by line.  Neither is as
        'efficient as just defining it before-hand.
        ReDim arr(0 To 7)
        While Not EOF(iFile)
            Line Input #iFile, strInput
            arrBytes = StrConv(strInput, vbFromUnicode)
            ReDim Preserve arrBytes(0 To 63)
            arr(lineCount) = arrBytes
            lineCount = lineCount + 1
        Close iFile
    End Sub
    

    I had put some code in there to try and optimize things with CopyMemory, but it was a tad dangerous, so I decided to leave it out.

    Really an answer !!! Thanks a lot. I will also give a try to your suggestion of using Byte arrays. – Demetres Dec 12, 2013 at 13:56 Blackhawk dear, the process you are suggesting is a little bit complex to me. If you can, give me any hint or links to see. – Demetres Dec 12, 2013 at 15:30 @user3092241 If the number of strings really is going to "grow rapidly" (and you are going to forge ahead with the VBA sort instead of Access), I suggest splitting the set of strings up into manageable chunks of about 100 mb each, sort each chunk individually, then finally merge them back together. That will prevent the Out of Memory errors from occurring since you never have more than a couple hundred mb in memory at any one time. – Blackhawk Dec 12, 2013 at 23:31

    Thanks for contributing an answer to Stack Overflow!

    • Please be sure to answer the question. Provide details and share your research!

    But avoid

    • Asking for help, clarification, or responding to other answers.
    • Making statements based on opinion; back them up with references or personal experience.

    To learn more, see our tips on writing great answers.

     
    推荐文章